A Comparative Study of Various Search Engines

Manab Chetia, Aniruddha Mazumdar, Jugapratim Gohain, Biswajit Choudhury, Aniruddha Deka, Mridul Jyoti Roy
Assam Engineering College, Jalukbari, Guwahati, Assam

Abstract: The purpose of this paper is to study different search engines, their advantages and their disadvantages. In this paper we compare the two most popular search engines, Google and Yahoo, and examine how they differ from each other in terms of technology, user experience and performance.
Although there are other search engines available on the web, they are not as popular among ordinary users, and their page rankings and search content are not as well liked. That is why we have limited our study and comparison in this paper to giants such as Google and Yahoo.

Keywords: search engines, technology

I. INTRODUCTION

The Internet was invented as the ARPANET [1]. There were only four nodes, or computers, connected to it. But today it has grown to such a huge size that no one can remember the names of all the nodes or websites. Today, for a single topic there are thousands of websites that contain information about it. It is impractical to remember all of these websites. This is where search engines play their role and ease our job. If someone wants to know about a topic, typing the query and clicking the search button provides a list of websites matching the query. It is this list of results, or the ranking of pages, that makes each search engine unique. Google uses PageRank and more than 200 algorithms to give good results. Bing was designed to understand natural language queries. Search engines like Google, Bing and Yahoo have become part and parcel of our lives. We use them in our daily lives without considering the performance or technological differences between them. For example, Google displays results that are more popular, while Yahoo displays better results for news. In this paper we present the technology used by Google and Yahoo, their differences and their similarities.

II. GOOGLE: AN OVERVIEW

A. Features:
- Cached page archives
- Results clustered by indentation
- Results displayed option, from 10 to 100

"Google Search" supports:
- Implied Boolean (+) sign and (-) sign
- Double quotes (" ") for phrases
- Stop words
- Google File Search
- Specific file search
- "I'm Feeling Lucky" (goes directly to the top-ranked site for the query)
- "Google Scout" (brings up a list of related sites)
- "Uncle Sam" (searches government and military sites)
- "Search within results" option
- Field searching with 'link' only

B. Other Search Options Available with Google:
- Google Scholar
- Maps
- Images
- News

III. YAHOO: AN OVERVIEW

A. Structure
Yahoo is hierarchically organized as a subject catalogue, or directory, of the web, which is both browsable and searchable. Yahoo indexes web pages, Usenet and e-mail addresses.

B. Features
- Topic-specific and region-specific "Yahoos!"
- Automatic truncation
- No case sensitivity and no stop words
- The syntax that Yahoo follows for searching is fairly standard among all search engines

C. Search Option
We can browse Yahoo simply by clicking on the various categories listed on each page, or we can search Yahoo by entering a word into the search box that appears on every page in the directory. One can also combine the two strategies and "browse and then search" or "search and then browse."

D. "Main page" supports:
- Search in Yahoo's subject categories
- Implied Boolean (+) and (-) signs
- Double quotes (" ") for phrases, i.e. phrase search
- Truncation: use of *, e.g. physic*, denotes suffix or right truncation
- Field-specific search: use of (t:) and (u:) for title and URL respectively

Advanced search (labeled 'search options') supports:
- All features of "main page" search, plus Boolean-type searching
- Yahoo subject categories
- "Usenet newsgroups" searches
- Other search options: Yahoo News

We may combine any of the query syntax as long as it is combined in the proper order, which is +, t:, and *. If Yahoo does not find any entries matching a query in its main database, the query is automatically transferred to the Inktomi database.

IV. PERFORMANCE

Google was and is among the most popular search engines.
Google has determined relevancy primarily through its PageRank algorithm. PageRank essentially says that a site with more inbound links than its competitors is likely a better site and should therefore rank higher. Webmasters soon realized this, and also realized that all they had to do to rank highly was build more links, enough to outpace their competitors. Google, of course, reacted by changing the ranking algorithm somewhat. Now there are elements of authority and relevancy applied to the PageRank algorithm.
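The link-based idea above can be illustrated with a minimal sketch of the PageRank iteration described by Brin and Page, in which a page's score depends not only on how many pages link to it but also on the scores of those linking pages. The four-page link graph, the function name `pagerank` and the iteration count are assumptions made for illustration; this is not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of scores.

    links maps each page to the list of pages it links to.
    The damping factor 0.85 is the commonly quoted textbook value.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: pages a, b and d all link to c; c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this toy graph, page c collects the most inbound links and ends up with the highest score, while page a scores well above b and d because its single inbound link comes from the highly ranked c.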
Google works as follows: once pages are crawled and indexed, they are returned to Google for ranking. Google has employed thousands of servers to calculate these rankings. They look at hundreds of factors, both on the page and off the page (such as inbound links), and use hundreds of algorithms to perform these calculations; essentially, there should be one algorithm per factor. The algorithms weight the pages and assign their values, which are then stored for later use. When we perform a query, yet another set of algorithms weighs the previously calculated values against one another to determine overall relevance.
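The two-phase flow described above (factor values computed and stored ahead of time, then weighed against one another at query time) might be sketched as follows. The page names, the three factors and the weights are all hypothetical; the real factors and their weights are not public.

```python
# Stored per-page factor values, assumed to have been computed offline.
# All names and numbers here are invented for illustration.
STORED_FACTORS = {
    "page_a": {"pagerank": 0.9, "keyword_density": 0.1, "freshness": 0.2},
    "page_b": {"pagerank": 0.4, "keyword_density": 0.8, "freshness": 0.6},
    "page_c": {"pagerank": 0.2, "keyword_density": 0.3, "freshness": 0.9},
}

# Hypothetical query-time weights: how much each stored factor counts.
WEIGHTS = {"pagerank": 0.5, "keyword_density": 0.3, "freshness": 0.2}


def relevance(factors, weights):
    """Combine the stored factor values into one score (a weighted sum)."""
    return sum(weight * factors.get(name, 0.0) for name, weight in weights.items())


def rank_pages(candidates, stored, weights):
    """Order candidate pages by combined relevance, best first."""
    return sorted(candidates, key=lambda page: relevance(stored[page], weights), reverse=True)


ranking = rank_pages(list(STORED_FACTORS), STORED_FACTORS, WEIGHTS)
print(ranking)
```

A weighted sum is only one possible way to combine factors; it is used here because it keeps the expensive per-factor work out of the query path, which is the point the paragraph makes.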
Results are then output to the user's browser. As one can imagine, the processing power this requires must be huge. In addition, given how fast Google returns results, not much of this data can be stored on the hard drives of the individual servers. Therefore, one must assume that most of the Google index, or at least the parts that are served to users, resides in memory. We searched for "search engine" (we intentionally misspelled it) and it returned 68,900 results. In addition, the engine returned some sponsored results along the side of the page, as well as a spelling suggestion.
All in 0.36 seconds. For popular queries the engine is even faster. For example, searches for Hurricane Katrina or the MTV awards (both recent events) took only a fraction of a second each. Google is also famous for decentralization and redundancy. For every single cached page there are likely two or three copies stored, perhaps even more. Google has broken the index into very small parts, as small as 2 megabytes each, and these 2-megabyte sections are stored all over the Google infrastructure. Each 2-megabyte section may be stored next to an unrelated section.
For example, there may be a few pages from a pet site next to pages from a blog, next to pages from an e-commerce site. While each data center acts independently of the others, there is likely some overlap in tasks. Imagine a room with thousands of computers running in unison. Now imagine that same room copied over and over in all the other data centers spread throughout North America. It is because of these different data centers, each acting separately but with the same end goal, that we used to experience the "Google Dance" every month.
The Google Dance was that period of time when Google would update their search results across the data centers. Further, each data center would update on its own, so pages that ranked #1 in one data center may not have appeared in the top 30 in other data centers. Of course, the factors Google uses to rank pages have changed over time. They have placed less emphasis on PageRank, but it is still important. It is important to note that moving different factors around within the calculation can greatly impact a site's rankings.
For example, if a site has a high PageRank but a low keyword density, it may rank #1 if PageRank affects the calculation later; however, the site may disappear from the results if PageRank is considered earlier. This is probably what is happening now: Google has essentially moved the PageRank factor to somewhere else in the final calculation. Remember, there are likely hundreds of factors affecting rankings. Rearranging the order in which they are applied to the final rankings can have a dramatic impact on overall placement on the search results page.
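One way to see why the order of factors could matter is a toy two-stage pipeline: the first factor only selects a shortlist, and the second factor decides the final order of the visible results. This staging model, the five pages and their values are assumptions made purely for illustration; how Google actually sequences its factors is not known.

```python
# Hypothetical pages: A has a high PageRank but a low keyword density.
PAGES = [
    {"name": "A", "pagerank": 0.9, "keyword_density": 0.02},
    {"name": "B", "pagerank": 0.5, "keyword_density": 0.08},
    {"name": "C", "pagerank": 0.4, "keyword_density": 0.06},
    {"name": "D", "pagerank": 0.3, "keyword_density": 0.05},
    {"name": "E", "pagerank": 0.1, "keyword_density": 0.01},
]


def staged_rank(pages, first_factor, second_factor, shortlist_size, result_size):
    """Shortlist by the first factor, then order the shortlist by the second."""
    shortlist = sorted(pages, key=lambda p: p[first_factor], reverse=True)[:shortlist_size]
    final = sorted(shortlist, key=lambda p: p[second_factor], reverse=True)
    return [p["name"] for p in final[:result_size]]


# PageRank applied late: it decides the final order, so A comes out on top.
pagerank_late = staged_rank(PAGES, "keyword_density", "pagerank", 4, 2)

# PageRank applied early: it only builds the shortlist, and A then drops
# out of the visible results because of its low keyword density.
pagerank_early = staged_rank(PAGES, "pagerank", "keyword_density", 4, 2)

print(pagerank_late)
print(pagerank_early)
```

The same five pages and the same two factors give different visible results depending only on which factor is applied last, which mirrors the behaviour the paragraph describes.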
Google also appears to have moved from a once-per-month update to a more continuously updating index. We only rarely notice the changes, but they do happen at a more incremental level, with major updates happening less frequently. One could view Google as a series of layers, each building on the work performed by the layer before. The uppermost layer is the only one we are exposed to via the browser; however, the page that you see would not exist without the work performed by the lower layers.
Yahoo: While no one other than Yahoo's engineers knows for sure, we can speculate that Yahoo's search technology works very similarly to Google's. The reason Yahoo is so difficult to gauge is that it did not really build a search engine from the ground up as Google or MSN did. Of course, the Yahoo search you see is unique to Yahoo; however, Yahoo has built its search on the backs of other technologies it purchased in previous years. It was just around Christmas 2002 that Yahoo purchased the search service Inktomi. Up until then, Yahoo had received its search results either from Inktomi or, more recently, from Google.
In fact, up until the time they purchased Inktomi there was speculation that Yahoo would buy Google. It was just a few months after this that Overture (a pay-per-click advertising company) purchased Altavista, one of the first and strongest search engines out there. Then, just a few weeks after that, Overture purchased Alltheweb.com from FAST. It was clear that Overture was going to move into the algorithmic search space. But shortly after this, rumblings began that Yahoo might be interested in purchasing some or all of Overture's technology. And in July 2003 Yahoo did indeed buy Overture.
We didn't hear much about Yahoo search until February 2004, when the company launched its own version of algorithmic search. And it wasn't what many expected. Some thought that they would simply rebrand Inktomi, while others thought they would rebrand one of the Overture purchases and turn either Altavista or Alltheweb search into Yahoo search. But that isn't what happened. Yahoo built its own search, cobbling together features from all the technology it owned. It had the super-fast Inktomi and Altavista crawlers, as well as the surprisingly good Alltheweb and Altavista ranking algorithms.
So they combined all of that to get Yahoo Search. Yahoo Search isn't much different from Google. Yahoo's own website says that they analyze pages using many factors to determine relevance to a search query, and the results of that analysis are what the user sees when they perform a query. Of course Yahoo, like all the other engines, has spent the past year or more working to improve its ranking algorithms. When Yahoo Search first came out, it seemed to place a lot of emphasis on the home page of a given site, with less emphasis on inbound links or even the site's other pages.
However, over the past few months we have noticed a subtle shift from homepage-only rankings to multiple site pages ranking where the home page once ranked. In addition, Yahoo tends to rank inbound links differently than Google. When you perform a link check on Google and the same check on Yahoo, the Google results almost always tend to be lower. Google says this is because they only show a snapshot of the "relevant" links, whereas Yahoo shows them all regardless of relevance. There are other differences as well, but there are too many to go through in this paper.
Suffice it to say that Google and Yahoo have used roughly the same technology to return similar results. Granted, we have seen differences in the rankings, but this is due to many things. For example, Yahoo appears to update less frequently than Google. We have worked with sites that have new pages indexed and ranking in Google within days of creation, while it can sometimes take months for Yahoo to do the same. Essentially, what we are saying is this: if all we are concerned with is rank, then optimizing for Google will get us decent rankings in Yahoo, but it may just take longer for us to show up in Yahoo search results.
That is because, in the end, the technology behind both Yahoo and Google is very similar.

V. CONCLUSION

With Microsoft's new Bing search engine, there has been close competition among Google, Yahoo and Bing. Each of them is trying to be the best in the field of world-class search. Microsoft has included many features in its search engine, such as Quick Previews, which let users peek at a site before visiting it, the Explorer Pane for enhanced search, Sentiment Extraction and many more. Google is not lagging behind either, providing an extraordinary search service with features such as Street View.
Yahoo, on the other hand, is no weak opponent. With its rich Yahoo content network, Yahoo makes it possible to stream audio and play it within the search results. Summing up, we can conclude that Google is currently in the strongest position and is capable of giving us quality results. But to stay ahead of the others, it has to keep implementing new features and catering to the changing needs of users. The future is unknown to all; no one knows what it holds or how these search engines will fare.

VI. REFERENCES

[1] Wikipedia, History of the Internet, http://en.wikipedia.org/wiki/ARPANET
[2] The evolution of speech: a comparative review, W. Tecumseh Fitch, http://groups.lis.illinois.edu/amag/langev/paper/fitchoospeech.html
[3] Recognition and Correction of Voice Web Search Queries, Keith Vertanen and Per Ola Kristensson
[4] Is Google enough? Comparison of an internet search engine with academic library resources, Jan Brophy and David Bawden, Department of Information Science, City University, London, UK
[5] The Perfect Search Engine Is Not Enough: A Study of Orienteering Behavior in Directed Search, Jaime Teevan, Christine Alvarado, Mark S. Ackerman and David R. Karger
[6] SPEECHBOT: An Experimental Speech-Based Search Engine for Multimedia Content in the Web, Jean-Manuel Van Thong, Pedro J. Moreno, Beth Logan, Blair Fidler, Katrina Maffey and Matthew Moores
[7] The Business and Politics of Search Engines: A Comparative Study of Baidu and Google's Search Results of Internet Events in China
[8] The Anatomy of a Large-Scale Hypertextual Web Search Engine, Sergey Brin and Lawrence Page
[9] http://en.wikipedia.org/wiki/Yahoo_search
[10] Research Paper On Bing Determining Web Page Credibility, http://www.seroundtable.com/bing-site-credibility-13155.html