21.2.12

5.10.11

Location Metric


•The location metric of a page P is a function of its location, not of its contents.

•If URL u leads to P, then the location metric of P is a function of u.

•URLs ending with ".com" may be deemed more useful than URLs with other endings.

•URLs containing the string "home" may be of more interest than other URLs.

•Another location metric that is sometimes used considers URLs with fewer slashes more useful than those with more slashes.
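
The three URL heuristics above can be sketched as a single scoring function. This is only an illustration: the function name location_score and the weights (1.0, 1.0, 0.1) are assumptions made for the example, not values given in these notes.

```python
from urllib.parse import urlparse


def location_score(url: str) -> float:
    """Toy location metric: a higher score means the URL looks more useful.

    The weights below are illustrative assumptions, not prescribed values.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    score = 0.0
    # Heuristic 1: URLs whose host ends with ".com" are deemed more useful.
    if host.endswith(".com"):
        score += 1.0
    # Heuristic 2: URLs containing the string "home" are of more interest.
    if "home" in url.lower():
        score += 1.0
    # Heuristic 3: fewer slashes in the path means more useful.
    score -= 0.1 * parsed.path.count("/")
    return score


if __name__ == "__main__":
    for u in ["http://example.com/home", "http://example.org/a/b/c.html"]:
        print(u, location_score(u))
```

A crawler could use such a score to order its queue of unvisited URLs, since it needs only the URL and not the page contents.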


These five metrics, which define the importance of a web page, may be combined to give better results. For example:
If a crawler's algorithm analyses backlinks, the location metric, and PageRank together, it is likely to give better results to its client than it would with any one of them alone.

Forward Link Count


•Links that emanate from P are called Forward Links.

•A page with many outgoing links is very valuable, since it may be a Web directory.

•Forward links are used in conjunction with other factors to reasonably identify index pages.

•We could also define a weighted forward link metric, analogous to IR(P)
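
As a rough sketch, the forward link count of a page can be obtained by parsing its HTML and counting distinct link targets. The names LinkCollector and forward_link_count are hypothetical, and counting each distinct href once is an assumption made for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def forward_link_count(html: str) -> int:
    """Number of distinct forward links found in the page's HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return len(set(collector.links))


if __name__ == "__main__":
    page = '<a href="/a">A</a> <a href="/b">B</a> <a href="/a">A again</a>'
    print(forward_link_count(page))
```

Unlike the backlink count, this value can be computed from the page alone, without knowing the rest of the Web.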

SEO: PageRank


•The more backlinks a web page has, and the more important the pages they come from, the higher its PageRank.

•The PageRank backlink metric, IR(P), recursively defines the importance of a page to be the weighted sum of the backlinks to it.

•It is very useful in ranking results of user queries. 


Mathematical representation of PageRank:

•Consider a page P that is pointed at by pages T1, ..., Tn.

•Let ci be the number of links going out of page Ti.

•Also, let d be a damping factor (whose intuition is given below). Then, the weighted backlink count of page P is given by:
IR(P) = (1-d) + d ( IR(T1)/c1 + ... + IR(Tn)/cn)

•This leads to one equation per Web page

•The equations can be solved for the IR values
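
One common way to solve these equations is simple iteration: start every page at IR = 1 and reapply the formula until the values settle. The sketch below assumes that the link graph is given as a dictionary, that every linked page is also a key, and that every page has at least one outgoing link. The function name pagerank, the default d = 0.85, and the iteration count are choices made for this example.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Iteratively solve IR(P) = (1-d) + d * sum(IR(Ti)/ci).

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    ir = {p: 1.0 for p in pages}  # initial guess for every page
    for _ in range(iterations):
        new_ir = {}
        for p in pages:
            # Sum IR(T)/c over every page T that links to p.
            total = sum(ir[t] / len(links[t]) for t in pages if p in links[t])
            new_ir[p] = (1 - d) + d * total
        ir = new_ir
    return ir


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, value in pagerank(graph).items():
        print(page, round(value, 3))
```

In this three-page graph, C receives links from both A and B, so it ends up with the highest value, while B, which is linked only from A, ends up with the lowest.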



Damping factor:
Imagine a user "surfing" the Web, starting from any page and randomly selecting a link on that page to follow. When the user reaches a page with no outlinks, he jumps to a random page. Also, when the user is on a page, there is some probability, 1 - d, that the next visited page will be completely random; this matches the (1-d) term in the formula above. The damping factor d makes sense because users will only continue clicking on one task for a finite amount of time before they go on to something unrelated. The IR(P) values computed above are, up to normalization, the probability that our random surfer is at P at any given time.
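
The random-surfer picture can be checked with a small simulation. This sketch follows the formula IR(P) = (1-d) + d ( ... ), so the surfer follows a link with probability d and jumps to a random page otherwise. The function name random_surfer, the step count, and the fixed seed are assumptions made for the example.

```python
import random
from collections import Counter


def random_surfer(links, d=0.85, steps=100_000, seed=0):
    """Estimate how often a random surfer visits each page.

    links maps each page to the list of pages it links to.
    Returns the fraction of steps spent on each page.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pages = list(links)
    current = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[current] += 1
        outlinks = links[current]
        if outlinks and rng.random() < d:
            # Follow a randomly chosen link on the current page.
            current = rng.choice(outlinks)
        else:
            # No outlinks, or the surfer got bored: jump to a random page.
            current = rng.choice(pages)
    return {p: visits[p] / steps for p in pages}


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(random_surfer(graph))
```

With enough steps, the visit fractions should approach the IR values divided by the number of pages.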

Backlink Count


•Backlinks are the links to page ‘P’ that appear over the entire Web.

•A page P that is linked to by many pages is more important than one that is seldom referenced.

•Many links to a web page make it more important because this page may act as a web directory and will help both the client and the web crawler.

•A crawler using this metric treats all links equally. Thus, a link from the Yahoo home page counts the same as a link from some individual’s home page.

•However, since the Yahoo home page is more important (it has a much higher backlink count), it would make sense to value that link more highly.
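
For comparison, the plain backlink count can be sketched as follows, with every link counting as 1 regardless of where it comes from. The function name backlink_counts is hypothetical, and two choices are assumptions made for this example: self-links are ignored, and repeated links from the same page are counted once.

```python
from collections import Counter


def backlink_counts(links):
    """Count, for each page, how many other pages link to it.

    links maps each page to the list of pages it links to.
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return dict(counts)


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C", "C"]}
    print(backlink_counts(graph))
```

A crawler only knows the links on pages it has already downloaded, so in practice this count is an estimate based on the part of the Web seen so far.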