How to hit in Google
The anatomy of a modern web search engine
Gregor Gisler-Merz
23.07.2003
Content:
Why do we need search engines
Design goals of a search engine
What are the benefits of basic web search engine knowledge?
System Anatomy:
Google Architecture Overview
Searching
How do I practically benefit from the new insights:
Search tips
How do I get listed in Google
References
Why do we need search engines:
• The amount of information is growing rapidly
- over 3 billion indexed documents to date
- over 150 million queries per day
• Human-maintained indices do not cover every topic and are expensive to build and maintain.
• Automated search engines that rely on keyword matching usually return too many low-quality matches.
• Many advertisers take measures to mislead automated search engines.
Design goals of a search engine:
• Improve search quality
• Ease of use
• Support novel research activities on large-scale web data
What are the benefits of basic web search engine knowledge?
• Know what you can expect from your searches.
• Get your own web site listed.
• Build a reasonable intranet search engine.
• Improve the search infrastructure in your own applications.
Google Architecture Overview:
• Most of Google is implemented in C/C++.
• Web pages are downloaded by several distributed web crawlers.
• Every stored web page has an associated ID (docID).
• The indexer reads the repository, uncompresses the documents, and parses them.
• Parsing/scanning is done by a lexical analyzer (generated with flex).
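The pipeline above (store pages under a docID, then uncompress and parse them) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Repository` class, the `index` function, the use of zlib, and the regular expressions standing in for the flex-generated lexer are all hypothetical and are not taken from Google's C/C++ code.

```python
import re
import zlib


class Repository:
    """Hypothetical page store: each page is kept compressed under a sequential docID."""

    def __init__(self):
        self._pages = {}   # docID -> (url, compressed HTML)
        self._next_id = 0

    def add(self, url, html):
        """Store one downloaded page and return its docID."""
        doc_id = self._next_id
        self._pages[doc_id] = (url, zlib.compress(html.encode("utf-8")))
        self._next_id += 1
        return doc_id

    def items(self):
        """Yield (docID, url, uncompressed HTML) for every stored page."""
        for doc_id, (url, blob) in self._pages.items():
            yield doc_id, url, zlib.decompress(blob).decode("utf-8")


TAG = re.compile(r"<[^>]+>")   # crude tag stripper, stands in for a real lexer
WORD = re.compile(r"\w+")      # one token per run of word characters


def index(repository):
    """Read the repository, parse every page, return word -> {docID: hit count}."""
    inverted = {}
    for doc_id, _url, html in repository.items():
        text = TAG.sub(" ", html).lower()
        for word in WORD.findall(text):
            postings = inverted.setdefault(word, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return inverted


repo = Repository()
first = repo.add("http://example.com/a",
                 "<html><title>Soapbox race</title><body>Soapbox derby</body></html>")
second = repo.add("http://example.com/b", "<p>Race results</p>")
inverted = index(repo)
print(inverted["race"])   # docIDs that contain the word "race", with hit counts
```

A regular expression is a much weaker parser than a generated scanner, but it is enough to show how docIDs tie the repository and the index together.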
Searching:
• The Google query evaluation
1 Parse the query.
2 Convert words into wordIDs.
3 Seek to the start of the doclist in the short barrel for every word.
4 Scan through the doclists until there is a document that matches all the search terms.
5 Compute the rank of that document for the query.
6 If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
7 If we are not at the end of any doclist, go to step 4.
8 Sort the documents that have matched by rank and return the top k.
• The ranking system includes hitlists, anchor text, and PageRank. Google always tries to balance these factors.
• PageRank is backed by a lot of mathematics (graph theory, linear algebra, and so on).
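The query evaluation steps and the role of PageRank can be sketched together in Python. The PageRank iteration follows the formula PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) from the "Anatomy" paper listed in the references. Everything else is an assumption made for illustration: barrels are modelled as plain dictionaries (wordID -> {docID: hit count}), doclists are intersected as sets instead of being scanned in order, the full barrels are consulted whenever the short barrels yield fewer than k matches, and the score "hit count plus PageRank" is a made-up stand-in for the real ranking function.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over the pages T linking to A."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        nxt = dict.fromkeys(pages, 1.0 - d)
        for src, targets in links.items():
            for t in targets:
                nxt[t] += d * pr[src] / len(targets)   # C(src) = number of out-links
        pr = nxt
    return pr


def intersect(barrel, word_ids):
    """Steps 3-4: keep the docIDs that appear in the doclist of every word."""
    doclists = [barrel.get(w, {}) for w in word_ids]
    if not doclists:
        return {}
    common = set(doclists[0]).intersection(*doclists[1:])
    return {doc: sum(dl[doc] for dl in doclists) for doc in common}


def search(query, lexicon, short_barrel, full_barrel, pr, k=10):
    words = query.lower().split()                        # step 1: parse the query
    if not words or any(w not in lexicon for w in words):
        return []
    word_ids = [lexicon[w] for w in words]               # step 2: words -> wordIDs
    matches = intersect(short_barrel, word_ids)          # steps 3-4 on the short barrels
    if len(matches) < k:                                 # step 6 (assumed trigger)
        matches.update(intersect(full_barrel, word_ids))
    # Step 5 (hypothetical formula): combine hit count and PageRank.
    score = {doc: hits + pr.get(doc, 0.0) for doc, hits in matches.items()}
    return sorted(score, key=score.get, reverse=True)[:k]   # step 8: top k by rank


links = {1: [2], 2: [1, 3], 3: [1]}                      # docID -> docIDs it links to
pr = pagerank(links)
lexicon = {"soapbox": 0, "race": 1}
short_barrel = {0: {1: 1}, 1: {1: 1}}                    # e.g. title and anchor hits
full_barrel = {0: {1: 3, 2: 2, 3: 1}, 1: {1: 2, 2: 2}}   # all hits
print(search("soapbox race", lexicon, short_barrel, full_barrel, pr))   # prints [1, 2]
```

Document 3 contains only one of the two words, so it never reaches the ranking step. Document 1 wins on both hit count and PageRank.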
Search tips:
• Make your search as specific as you can.
• Use exact phrases in quotation marks: “Säuliämtler Seifenkistenrennen”
• Force a word to be included, even if it would otherwise be ignored as a stop word, with the + operator: +Zürich
• Exclude unwanted words with the - operator.
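The tips above can be combined into a single query string. The following is a minimal sketch: `build_query` and `search_url` are hypothetical helper names, the operators are used exactly as described on this slide (a search engine may treat them differently today), and the `q` parameter on `https://www.google.com/search` is assumed as the target.

```python
from urllib.parse import urlencode


def build_query(terms=(), phrase=None, required=(), excluded=()):
    """Assemble a query: plain terms, one exact phrase, +required and -excluded words."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')              # exact phrase in quotation marks
    parts += [f"+{word}" for word in required]   # force inclusion
    parts += [f"-{word}" for word in excluded]   # exclude unwanted words
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query so it can be opened in a browser."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query(
    phrase="Säuliämtler Seifenkistenrennen",
    required=["Zürich"],
    excluded=["Werbung"],
)
print(query)
print(search_url(query))
```

`urlencode` takes care of the quotation marks, the plus sign, and the umlauts, which would otherwise be misread as part of the URL syntax.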
How do I get listed in Google?
• Choose the correct keywords for your site and raise the keyword density.
• Place your most important keyword phrase toward the beginning of the title tag.
• Use description and keyword meta tags.
• Use header tags.
• Incorporate keywords in the alt attribute of your images and in the text of page links.
• Create a site map and a contact page.
• Put only quality content on your site (250-300 words per page).
• Create only one doorway page per keyword.
• Do not use hidden text, and repair broken links.
• Be careful with FRAMES: add plenty of keyword-rich text to the NOFRAMES tag.
• Get reciprocal links and cross-link your site (if possible).
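Several of these checks can be automated. The sketch below uses Python's standard `html.parser` to report the keyword density (occurrences of the keyword divided by the total number of words), the word count, and whether the title, meta tags, header tags, and image alt attributes are in place. The `PageAudit` class, the `audit` function, and the report keys are hypothetical names chosen for this example. It handles single-word keywords only and also counts text inside script and style elements, so treat the numbers as rough.

```python
import re
from html.parser import HTMLParser

HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageAudit(HTMLParser):
    """Collects the on-page elements mentioned in the checklist."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.headers = []
        self.images_without_alt = 0
        self.words = []
        self._open = None   # currently open title or header tag, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in HEADERS:
            self._open = tag
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""
        elif tag == "img" and not attrs.get("alt"):
            self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open == "title":
            self.title += data
        elif self._open in HEADERS:
            self.headers.append(data)
        self.words += re.findall(r"\w+", data.lower())


def audit(html, keyword):
    """Return a small report for one page and one single-word keyword."""
    page = PageAudit()
    page.feed(html)
    page.close()
    keyword = keyword.lower()
    total = len(page.words)
    return {
        "word_count": total,
        "keyword_density": page.words.count(keyword) / total if total else 0.0,
        "keyword_in_title": keyword in page.title.lower(),
        "has_description": "description" in page.meta,
        "has_keywords": "keywords" in page.meta,
        "keyword_in_header": any(keyword in h.lower() for h in page.headers),
        "images_without_alt": page.images_without_alt,
    }


PAGE = """<html><head><title>Soapbox race results</title>
<meta name="description" content="Results of the soapbox race">
</head><body><h1>Soapbox race</h1>
<p>The soapbox derby was fun.</p><img src="a.jpg"></body></html>"""

report = audit(PAGE, "soapbox")
print(report)
```

For the sample page the report flags the missing keywords meta tag and the image without alt text, and the word count shows that the page is far below the 250-300 words suggested above.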
Now get your web site listed in the major search engines and get a good ranking!!
References:
• Google: http://www.google.com/addurl.html
• AltaVista: http://www.altavista.com/addurl.html
• AlltheWeb: http://www.alltheweb.com/add_url.php
• Tips for getting listed: http://www.totalsubmission.co.uk, http://www.amigos.org
• PageRank Uncovered: http://www.supportforums.org/PageRank.pdf
• PageRank Computation and the Structure of the Web: Experiments and Algorithms: http://www2002.org/CDROM/poster/173.pdf
• The Anatomy of a Large-Scale Hypertextual Web Search Engine: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
• flex scanner generator: http://www.gnu.org/software/flex/flex.html