Do search engines' robots search the entire web each time somebody clicks the search button, or do they search indexes stored on, say, Google's servers?
My guess is that it's impossible to search the whole internet in a few seconds.
How does it work in reality?
What happens is that the search bots “crawl” through the Web, taking snapshots of as many Web pages as they can find. The info they gather is then cached and indexed. When someone makes a Web search, the engine will go through the cache of Web pages and display results that match the search terms.
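To make the "cached and indexed" step concrete, here is a minimal sketch of one common way to do it, an inverted index that maps each word to the pages containing it. The page URLs, the `snapshots` dictionary and the function names are made up for illustration; this is not how any particular engine is actually implemented.

```python
from collections import defaultdict

# Hypothetical "crawled" snapshots: URL -> page text as it looked at crawl time.
snapshots = {
    "example.com/a": "search engines crawl the web",
    "example.com/b": "robots index cached web pages",
    "example.com/c": "cooking recipes for pasta",
}

def build_index(pages):
    """Map each word to the set of URLs whose snapshot contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing every query word, looking only at the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(snapshots)
print(sorted(search(index, "web pages")))
```

The point is that `search` never touches the live pages; it only looks things up in a structure that was built ahead of time, which is why an answer can come back quickly.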
This is a simplified account of what basically happens; what actually happens may be much more complicated. For example, some engines can recognize typos or spelling errors and display close-matching results rather than exact-matching results.
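As a rough illustration of close-matching, Python's standard `difflib` module can suggest the nearest known word for a misspelled one. The `vocabulary` list and the `suggest` helper are hypothetical, and real engines very likely use more sophisticated methods; this only sketches the idea.

```python
import difflib

# Hypothetical list of words known to the index.
vocabulary = ["search", "engine", "robots", "index", "cache"]

def suggest(word):
    """Return the closest known word, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(suggest("serach"))
```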
You may notice that when Google displays a search result, it includes a brief quote from the Web page containing some or all of your search terms; when you click the link to go to the Web page, however, you may find that the contents of the page are different from the brief quote accompanying the search result. This shows that Google does not search the live Web for each query, but only its own stored copies of pages. Between the time when the crawler took a snapshot of the Web page and the time you made your search, the Web page might have been updated.
Cached pages are also not permanently stored with the search engine. Each cached page has a limit on how long it's stored; when the time limit has expired, it will be deleted from the cache. This is to prevent the engine's server from becoming overloaded with outdated cached pages.
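The expiry idea can be sketched as a small cache with a time limit per entry. The `SnapshotCache` class, its method names and the 60-second limit are all assumptions chosen for the example; actual engines presumably use their own policies and far longer time spans.

```python
import time

class SnapshotCache:
    """Toy cache that forgets a page once it is older than ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # url -> (time stored, page content)

    def put(self, url, content, now=None):
        now = time.time() if now is None else now
        self._store[url] = (now, content)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(url)
        if entry is None:
            return None
        stored_at, content = entry
        if now - stored_at > self.ttl:
            # Time limit expired: delete the outdated snapshot.
            del self._store[url]
            return None
        return content

cache = SnapshotCache(ttl_seconds=60)
cache.put("example.com/a", "old snapshot", now=0)
print(cache.get("example.com/a", now=30))   # still within the limit
print(cache.get("example.com/a", now=61))   # expired, so nothing is returned
```

The optional `now` argument is only there so the behavior can be shown without actually waiting.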
Well, for something at a large scale like Google, I haven't had the chance to learn yet. But for a flavor of the direction you might head in, say you have a search engine for a small website. You build a matrix in which each column is a normalized vector of keyword frequencies for one page (rows correspond to keywords). The search query is likewise turned into a normalized vector of keywords. Multiplying the transpose of the matrix by the query vector gives a vector whose entries are the cosines of the angles between each page's column and the query vector. Order the pages by angle, smallest first (that is, largest cosine first), to see which pages match the keyword query best.
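Here is a small sketch of that ranking in plain Python, assuming four made-up keywords and three made-up pages with invented keyword counts. Each dot product of a normalized page vector with the normalized query vector is one entry of the matrix-vector product described above.

```python
import math

# Hypothetical keywords (rows) and per-page keyword counts (columns).
keywords = ["search", "engine", "robot", "cache"]
pages = {
    "page1": [3, 2, 0, 0],
    "page2": [0, 0, 4, 1],
    "page3": [1, 1, 1, 1],
}

def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)

def rank(query_counts):
    """Return (page, cosine) pairs, best match first."""
    q = normalize(query_counts)
    scores = {
        name: sum(a * b for a, b in zip(normalize(column), q))
        for name, column in pages.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Query mentioning "search" and "engine" once each.
ranking = rank([1, 1, 0, 0])
for name, cosine in ranking:
    print(name, round(cosine, 3))
```

A cosine near 1 means a small angle, so the page's keyword mix closely matches the query; a cosine of 0 means the page shares none of the query's keywords.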
References:
Steven J. Leon, Linear Algebra with Applications, 2002, pp. 230-232.