How To Read Research Papers

Search
• Search Engines
• Search Engine Optimization
• Search Interfaces
• Personalization
• The openness of the Web changes everything
- Access
- Technological progress
- Expectation
Anatomy of a Large-Scale Web Search Engine
• Initial Google Design
• PageRank
- PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
- “A model of user behavior”
• The probability of a random surfer visiting a page is its PageRank
• d is a damping factor (the surfer's boredom)
- Pages point to a page
- Highly ranked pages point to a page
- Anchor text is mined (the label for the link)
- Proximity included
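The PageRank formula above can be computed by repeated substitution until the values settle. The following is a minimal sketch, not the production algorithm: the function name `pagerank`, the three-page graph, the iteration count, and d = 0.85 are all illustrative assumptions, and every page is assumed to have at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    `links` maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # initial guess for every page
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T that links here contributes PR(T) / C(T),
            # where C(T) is the number of links going out of T.
            incoming = sum(pr[t] / len(links[t]) for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this toy graph C is pointed to by both A and B, while B is pointed to only by A, so C ends up ranked above B.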
Anatomy 2
• Repository of page content
• Document index
- Forward (sorted)
- Inverted (sorter)
• Lexicon of words & pointers
• Hit Lists of word occurrence(s)
• Crawlers
• Ranking
• Feedback of selection (~)
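The relationship between the forward index, the inverted index, the lexicon, and hit lists can be sketched in a few lines. This is a simplified illustration under stated assumptions: whitespace tokenization, hits recorded only as word positions, and hypothetical names (`build_forward_index`, `invert`) and sample documents. A real engine stores far more compact and richer structures.

```python
from collections import defaultdict


def build_forward_index(docs):
    """Forward index: doc id -> {word: [positions]} (a simple hit list per word)."""
    forward = {}
    for doc_id, text in docs.items():
        hits = defaultdict(list)
        for position, word in enumerate(text.lower().split()):
            hits[word].append(position)
        forward[doc_id] = dict(hits)
    return forward


def invert(forward):
    """Inverted index: word -> {doc id: [positions]}, regrouped from the forward index."""
    inverted = defaultdict(dict)
    for doc_id, hits in forward.items():
        for word, positions in hits.items():
            inverted[word][doc_id] = positions
    return dict(inverted)


# Hypothetical two-document repository.
docs = {1: "web search engine", 2: "search the web with a search engine"}
forward = build_forward_index(docs)
inverted = invert(forward)
lexicon = sorted(inverted)  # every distinct word, each pointing into the inverted index

print(inverted["search"])
```

A query for "search" is then a lookup in the lexicon followed by a read of that word's hit lists, with no need to scan the page repository.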
Metasearch Issues
• One place for everything?
• First place to look?
• Last place to look?
• Better interface?
• Combined results
• Syntax Errors
• State Information (monitoring)
• Copyright
• User, content and interface mismatches/challenges
Search Engine Optimization
• Found by spiders and submissions
- More links to and from site
- Registration on major directories
- Links to and from major directories
• Real contact information helps prove validity
- META tag
- Header and footer of home page
- About Us or Contact Us pages
- Location/Map page
Good Design is SEO
• Basic interface
• Well-structured links
- Comprehensive Site Navigation
- Updated and accurate links
• Easy to find (via the Web or on the site itself)
• Clear labels
- TITLEs
- Headings
- Term consistency
- Link consistency
• Small sizes to download quickly
Web Search Tests
• Perform searches with targeted keywords
• Compare and contrast top results with your potential site
- Similar terms
- Links (external and internal)
- Popularity (sites that link to the site)
• Use the data to
- Build a keyword list
- Build an introductory text
• Blurbs
• Description (2 sentences max)
• Any page found via a Web search engine should have a search for the site itself
• Regularly monitor search results for your terms
Internal Search
• Robots.txt
• Log and analyze search results
- Measure success and failure
- Tune for click-through productivity
- Keep list of terms
- Match terms to pages
• Add terms
• Script terms to certain pages
- Provide list (links) of most recent search terms
- Provide list (links) of most popular search terms
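Logging and analyzing internal searches can start very small. The sketch below assumes a hypothetical log format of (timestamp, term, number of results) tuples and a hypothetical helper named `summarize_searches`; it derives the most popular terms, the most recent terms, and the terms that returned nothing (one simple measure of failure).

```python
from collections import Counter


def summarize_searches(log, top_n=3):
    """Return (most_popular, most_recent, failed) terms from a site-search log.

    `log` is a list of (timestamp, term, result_count) tuples.
    This record layout is an assumption made for the example.
    """
    # Normalize so that "Hours" and "hours " count as the same term.
    normalized = [(ts, term.strip().lower(), hits) for ts, term, hits in log]

    counts = Counter(term for _, term, _ in normalized)
    popular = [term for term, _ in counts.most_common(top_n)]

    recent = []
    for _, term, _ in sorted(normalized, reverse=True):  # newest first
        if term not in recent:
            recent.append(term)
        if len(recent) == top_n:
            break

    # Terms that returned zero results are candidates for new content or synonyms.
    failed = sorted({term for _, term, hits in normalized if hits == 0})
    return popular, recent, failed


# Hypothetical log entries.
log = [
    (1, "Hours", 4),
    (2, "parking", 0),
    (3, "hours", 4),
    (4, "catalog", 9),
    (5, "hours ", 4),
    (6, "parking", 0),
]
popular, recent, failed = summarize_searches(log, top_n=2)
print(popular, recent, failed)
```

The popular and recent lists can be rendered as links on the search page, and the failed list feeds the "add terms" and "match terms to pages" steps.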
Page Design
• Use CSS
- <style type="text/css">
- Keep content in pages, not CSS templates
• Put JavaScript, etc. in external files
- <script language="JavaScript" src="scripts/myscript.js" type="text/javascript"></script>
- <noscript> tag too for alternate content
• Continually verify external links
• ALT attributes & accessibility compliance
• Index link on splash page (if needed)
• Exact consistency on internal links (ending “/”s)
• <noframes>
• Redirects: <META HTTP-EQUIV="refresh" content="0; URL=http://www.newsite.com/index.html">
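Checking for exact consistency of internal links can be automated. The sketch below uses Python's standard `html.parser` to collect `href` values and reports links that differ only by a trailing "/". The class and function names, the sample markup, and the rule for skipping external links are illustrative assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inconsistent_slashes(html):
    """Return groups of internal links that differ only by a trailing '/'."""
    collector = LinkCollector()
    collector.feed(html)
    groups = defaultdict(set)
    for href in collector.hrefs:
        # Simplifying assumption: absolute URLs, mail links, and fragments are not internal pages.
        if href.startswith(("http://", "https://", "mailto:", "#")):
            continue
        groups[href.rstrip("/")].add(href)
    return {base: sorted(variants) for base, variants in groups.items() if len(variants) > 1}


# Hypothetical page fragment with one inconsistent pair.
page = (
    '<a href="/about/">About</a> '
    '<a href="/about">About us</a> '
    '<a href="/contact/">Contact</a>'
)
problems = inconsistent_slashes(page)
print(problems)
```

Each reported group is a page that is linked under two spellings, which is worth unifying to a single form.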
Search and MIME types
• Flash now supports internal text
• PDF files
- Add comments and authorship info
- Modify existing PDFs
• Check Document PropertiesFonts with fonts shows that
PDF can be indexed (not a group a graphics files)
- Provide text abstract or summary of PDF
• PPT, use text if possible
• Java interfaces prove difficult
• Dynamic pages should have key(word) static elements
• FORMs not always completely indexed
Track your Tracking
• Keep list of sites submitted to
- When, Who, Email address, exact URL submitted
- Suggested categories, Current site description
- Terms and Conditions
• Keep list of “goal” keywords
• Keep list of sites where you check keywords
- Keywords
- Dates
- Successes/Failures