Google-opoly. - Amy N. Langville


Amy N. Langville
Mathematics Department
College of Charleston
Math Meet
2/20/10
Outline
• Short History of Web Search
• Link Analysis and Google’s PageRank
• The Random Surfer
• Google-opoly
• March Madness
• Conclusion

Thesis
1998
Pre-1998 Web
• Trip back in time to 1995
  – How did you find information then?
  – Better question: how old were you then?
Inverted Index
Main tool of pre-1998 search engines
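An inverted index maps each word to the pages that contain it, so a query is answered by looking up its words instead of scanning every page. A minimal sketch, assuming a tiny made-up corpus and simple boolean AND matching (real engines also store positions, frequencies, and much more):

```python
from collections import defaultdict

# Hypothetical toy corpus: page id -> page text (not from the talk).
pages = {
    1: "college of charleston math meet",
    2: "math department home page",
    3: "charleston weather and tides",
}

def build_inverted_index(docs):
    """Map each term to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

def lookup(index, query):
    """Return the ids of pages containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

index = build_inverted_index(pages)
print(sorted(lookup(index, "math charleston")))  # [1]
```

Note that the index only says *which* pages match; it says nothing about which match is best.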
Problems with the Inverted Index
• Too many pages
• Spam: human eyes vs. spider eyes
  – Win an iPod
  – Learn how to make millions
  – Text 8 if you’re awake
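To see why text-only ranking invites spam, here is a sketch that scores pages by how often the query word appears. The pages are hypothetical: the spam page hides a long run of "ipod" that a human would not see (for example, white text on a white background) but a spider reads as ordinary text. Term-frequency scoring is an assumed stand-in for pre-1998 text analysis, not any particular engine's formula.

```python
# Hypothetical pages: "spam" repeats a keyword that a human would never see.
honest = "ipod review battery life and sound quality of the new ipod"
spam = "win a free cruise " + "ipod " * 50

def term_frequency_score(text, query_term):
    """Naive text-only relevance: count occurrences of the query term."""
    return text.lower().split().count(query_term)

scores = {
    "honest": term_frequency_score(honest, "ipod"),
    "spam": term_frequency_score(spam, "ipod"),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'honest': 2, 'spam': 50}
print(ranking)  # ['spam', 'honest']
```

The spider's eyes reward the stuffed page, which is exactly the gap link analysis was meant to close.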
Link Analysis
[Timeline: text analysis before 1998, link analysis from 1998 on]
• Pre-1998 engines used only text analysis.
• Link analysis saved search from SEOs (search engine optimizers) and
built companies such as Google, Yahoo, and Ask.
• Nearly every major search engine uses link analysis.
Moral #1
Sometimes being perceived as an expert
forces you to become one.
What happens when you google?
All the old text analysis + the new link analysis
[Figure: ranked list of results, positions 1 through 8]
Why are rankings so important?
Web as a graph
• Each node is a webpage.
• Each arrow is a hyperlink.
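In code, a web graph is commonly stored as an adjacency list: each page maps to the pages it links to. A minimal sketch using a hypothetical six-page graph (made up for illustration, not the graph drawn on the slide):

```python
# Hypothetical six-page web graph: each key is a page (node),
# each listed value is a hyperlink (arrow) from that page.
web = {
    1: [2, 3],
    2: [],          # a page with no out-links
    3: [1, 2, 5],
    4: [5, 6],
    5: [4, 6],
    6: [4],
}

# Flatten into (source, destination) pairs, one per arrow.
edges = [(src, dst) for src, targets in web.items() for dst in targets]
print(len(web), "pages,", len(edges), "hyperlinks")  # 6 pages, 10 hyperlinks
```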
In-links vs. Out-links
• In-links are votes of endorsement from one page to another.
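Counting in-links treats every hyperlink as one vote for the page it points to. A sketch on a hypothetical six-page graph; note that plain vote counting is only a first approximation, since PageRank also weighs *who* is voting:

```python
from collections import Counter

# Hypothetical six-page graph: page -> pages it links to.
web = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

# Out-links: arrows leaving a page.
out_links = {page: len(targets) for page, targets in web.items()}
# In-links: every hyperlink is one vote for the page it points to.
in_links = Counter(dst for targets in web.values() for dst in targets)

for page in web:
    print(page, "in:", in_links[page], "out:", out_links[page])
```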
A Trip to Google-topia

Emmie

Randy, the Random Surfer [video clip]
A Random Walk on the Web graph
Matrix Notation
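One common matrix notation for the random walk: entry H[i][j] is the probability that the surfer on page i clicks to page j, assuming every out-link on a page is equally likely. The surfer's position after one click is the row vector x times H. A sketch on a hypothetical six-page graph (pages numbered 1..6, stored 0-based in the matrix):

```python
# Hypothetical six-page graph: page -> pages it links to (pages are 1..6).
web = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def hyperlink_matrix(graph):
    """H[i][j]: probability of moving from page i+1 to page j+1,
    assuming each out-link on a page is clicked with equal probability."""
    n = len(graph)
    H = [[0.0] * n for _ in range(n)]
    for i, targets in graph.items():
        for j in targets:
            H[i - 1][j - 1] = 1.0 / len(targets)
    return H

def step(x, M):
    """One click of the surfer: the row vector x times the matrix M."""
    n = len(x)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]

H = hyperlink_matrix(web)
x = step([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], H)   # surfer starts on page 1
print(x)  # [0.0, 0.5, 0.5, 0.0, 0.0, 0.0]
print(step([0.0, 1.0, 0.0, 0.0, 0.0, 0.0], H))  # all zeros: page 2 is a dead end
```

The all-zero row for page 2 is where the probability "disappears", which is the problem the next slides address.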
BUT THERE ARE SOME PROBLEMS!
• The surfer gets stuck on a page with no out-links!
• This is called a dangling node.
• How does Google fix this?
The surfer can “teleport”
• We add a link from the dangling node to every other node.
• When web surfing, this is equivalent to typing an address in the URL bar.
Probability Matrix
• We must also account for teleportation in our probability matrix.
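A sketch of the dangling-node fix on a hypothetical six-page graph: the all-zero row of a dangling page is replaced by a uniform row, so every row becomes a probability distribution. This follows the common convention of spreading the row over all n pages (1/n each, including the dangling page itself); the slide's "every other node" would instead use 1/(n-1) and skip the diagonal.

```python
# Hypothetical six-page graph: page -> out-links (page 2 is dangling).
web = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def stochastic_matrix(graph):
    """Like the hyperlink matrix, but a dangling page's zero row is replaced
    by the uniform row 1/n: the surfer teleports to a random page."""
    n = len(graph)
    S = [[0.0] * n for _ in range(n)]
    for i, targets in graph.items():
        if targets:
            for j in targets:
                S[i - 1][j - 1] = 1.0 / len(targets)
        else:
            S[i - 1] = [1.0 / n] * n   # teleport: every page equally likely
    return S

S = stochastic_matrix(web)
print(["%.3f" % p for p in S[1]])  # ['0.167', '0.167', '0.167', '0.167', '0.167', '0.167']
```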
Dangling nodes and teleportation [video clip]
Let’s look at another problem.
• Our surfer gets stuck in webpages 4, 5, and 6.
• This is called a cycle.
• How do we fix this?
Cycling [video clip]
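A sketch of the trap, assuming a hypothetical six-page graph in which pages 4, 5, and 6 link only among themselves. With just the dangling-node fix, repeatedly applying one click of the walk drains all probability out of pages 1 to 3, so those pages end up with rank zero and cannot be told apart:

```python
# Hypothetical graph: pages 4, 5, 6 link only to each other.
web = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}
n = len(web)

# Probability matrix with only the dangling-node fix (uniform row for page 2).
S = [[0.0] * n for _ in range(n)]
for i, targets in web.items():
    if targets:
        for j in targets:
            S[i - 1][j - 1] = 1.0 / len(targets)
    else:
        S[i - 1] = [1.0 / n] * n

def step(x, M):
    """One click of the surfer: the row vector x times the matrix M."""
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]

x = [1.0 / n] * n          # surfer starts on a random page
for _ in range(200):
    x = step(x, S)
print(["%.3f" % p for p in x])  # ['0.000', '0.000', '0.000', '0.444', '0.222', '0.333']
```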
Full Teleportation
• We must consider the possibility that, at any time, the surfer uses the URL bar to type an address.
• We add an extra link from every vertex to every other vertex.
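One standard way to write full teleportation as a single matrix is the Google matrix G. The notation here is assumed for this sketch, not taken from the slides: S is the probability matrix with dangling rows already made uniform, e is the column vector of all ones, n is the number of pages, and α is the probability of following a hyperlink rather than teleporting. Teleportation is spread uniformly over all n pages. The second equation checks that every row of G still sums to 1, using eᵀe = n:

```latex
G = \alpha S + (1-\alpha)\,\frac{1}{n}\,\mathbf{e}\mathbf{e}^{T}, \qquad 0 < \alpha < 1,

G\mathbf{e} = \alpha S\mathbf{e} + (1-\alpha)\,\frac{1}{n}\,\mathbf{e}\,(\mathbf{e}^{T}\mathbf{e})
            = \alpha\mathbf{e} + (1-\alpha)\,\mathbf{e} = \mathbf{e}.
```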
Surfing vs. teleporting
• Do people use the URL bar as often as they use hyperlinks?
• Google doesn’t think so.
• Google assumes you use the URL bar only about 15% of the time.
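With teleportation 15% of the time, the surfer follows a hyperlink with probability α = 0.85. A sketch of the power method on a hypothetical six-page graph: start from the uniform vector and repeat one step of the walk until it stops changing. Each step uses the hyperlinks with weight α, spreads the mass on dangling pages evenly, and adds the (1 − α)/n teleportation share, without building the full matrix. The graph, tolerance, and iteration cap are assumptions for illustration.

```python
# Hypothetical six-page graph: page -> out-links (page 2 is dangling).
web = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def pagerank(graph, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method for the random surfer who follows a link with
    probability alpha and teleports to a uniform random page otherwise."""
    pages = sorted(graph)
    n = len(pages)
    pos = {p: k for k, p in enumerate(pages)}
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Probability sitting on dangling pages teleports uniformly.
        dangling = sum(x[pos[p]] for p in pages if not graph[p])
        new = [(1 - alpha) / n + alpha * dangling / n] * n
        for p in pages:
            targets = graph[p]
            for q in targets:
                new[pos[q]] += alpha * x[pos[p]] / len(targets)
        converged = sum(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if converged:
            break
    return dict(zip(pages, x))

pr = pagerank(web)
print(sorted(pr, key=pr.get, reverse=True))  # [4, 6, 5, 2, 3, 1]
```

Unlike the trapped walk, every page now gets a positive score, so pages 1 to 3 can be ranked too.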
Computing PageRank by observing Randy [video clip]
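Observing Randy can also be simulated: let a surfer wander for many steps and record the fraction of time spent on each page. Those long-run visit frequencies approximate the PageRank scores. A Monte Carlo sketch, assuming a hypothetical six-page graph, α = 0.85, and a fixed random seed for reproducibility:

```python
import random
from collections import Counter

# Hypothetical six-page graph: page -> out-links (page 2 is dangling).
web = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def simulate_surfer(graph, steps, alpha=0.85, seed=0):
    """Estimate PageRank as the fraction of steps Randy spends on each page."""
    rng = random.Random(seed)
    pages = sorted(graph)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        targets = graph[page]
        if targets and rng.random() < alpha:
            page = rng.choice(targets)   # click a random hyperlink
        else:
            page = rng.choice(pages)     # teleport via the URL bar
    return {p: visits[p] / steps for p in pages}

freq = simulate_surfer(web, steps=200_000)
print(freq)
```

The estimates are noisy but converge slowly toward the power-method scores as the number of steps grows.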
Summary of Ranking
Search query → pull relevant webpages out of the inverted index → use PageRank and other information to rank the webpages
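The pipeline can be sketched in a few lines: the inverted index selects the pages that match the query, and precomputed scores order them. This sketch sorts by PageRank alone, which is a simplification; the slide notes Google also uses other information. The pages and scores are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical pages and made-up, precomputed PageRank scores.
pages = {
    "a.example": "march madness bracket math",
    "b.example": "math meet schedule",
    "c.example": "march madness scores",
}
pagerank = {"a.example": 0.2, "b.example": 0.5, "c.example": 0.3}

def build_index(docs):
    """Inverted index: term -> set of pages containing it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in text.split():
            index[term].add(url)
    return index

def search(query, index, scores):
    """Step 1: pages matching every term; step 2: order them by score."""
    terms = query.split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits, key=scores.get, reverse=True)

index = build_index(pages)
print(search("march madness", index, pagerank))  # ['c.example', 'a.example']
```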
Creators of Google
• Sergey Brin and Larry Page
• Computer Science majors
• Now there are entire PhD programs in information retrieval.
The world’s largest eigenvector computation
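Why an eigenvector? In the Google-matrix notation G (an assumption of this sketch, with π the vector of PageRank scores and e the all-ones vector), PageRank is the left eigenvector of G for eigenvalue 1, scaled so its entries sum to 1. Because G is stochastic with all entries positive, the Perron-Frobenius theorem guarantees this vector is unique and positive, and the power method converges to it from any starting probability vector:

```latex
\pi^{T} G = \pi^{T}, \qquad \pi^{T}\mathbf{e} = 1, \qquad \pi > 0,

\pi^{(k+1)T} = \pi^{(k)T} G \;\longrightarrow\; \pi^{T} \quad \text{as } k \to \infty.
```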
Moral #2
Take a leave of absence for brilliant ideas.
More on PageRank
• SIAM’s WhydoMath? Project: http://dev.whydomath.org/node/google/index.html
• DDL on PageRank: http://spinner.cofc.edu/~langvillea/DISSECTIONLAB/ClarePageRankModule/1_WebLetter.html?referrer=webcluster&
• LOCI: Google-opoly: http://mathdl.maa.org/mathDL/23/?pa=content&sa=viewDocument&nodeId=3355
Moral #3
The more ways you can view a problem, the
more likely you are to truly understand it, and
hence, solve it.
Google-opoly

applets
March Madness
How should teams vote?
• Losing teams give one vote to each team that beats them.
• Losing teams vote with margin of victory.
• Both winning and losing teams vote with the number of points scored.
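The three voting rules can each be turned into a ranking. This sketch is an assumption, not the talk's stated method: it treats votes as weighted links between teams and ranks them with the same PageRank-style iteration used for webpages (α = 0.85, and a team that casts no votes, such as an undefeated team, spreads its share evenly). For the "points" rule it assumes each team votes for its opponent with the points that opponent scored. The games and teams A to D are hypothetical.

```python
from collections import defaultdict

# Hypothetical season: (winner, loser, winner_points, loser_points).
games = [
    ("A", "B", 70, 60),
    ("A", "C", 65, 64),
    ("B", "C", 80, 50),
    ("C", "D", 72, 70),
    ("B", "D", 90, 60),
]

def cast_votes(games, scheme):
    """votes[t][u] = total weight of team t's votes for team u."""
    votes = defaultdict(lambda: defaultdict(float))
    for winner, loser, w_pts, l_pts in games:
        if scheme == "win":        # loser gives one vote to the team that beat it
            votes[loser][winner] += 1
        elif scheme == "margin":   # loser votes with the margin of victory
            votes[loser][winner] += w_pts - l_pts
        elif scheme == "points":   # each team votes with its opponent's points
            votes[loser][winner] += w_pts
            votes[winner][loser] += l_pts
        else:
            raise ValueError("unknown scheme: " + scheme)
    return votes

def rank_teams(games, scheme, alpha=0.85, iters=500):
    """PageRank-style ranking of the weighted vote graph (assumed method)."""
    teams = sorted({t for g in games for t in g[:2]})
    votes = cast_votes(games, scheme)
    n = len(teams)
    x = {t: 1.0 / n for t in teams}
    for _ in range(iters):
        new = {t: (1 - alpha) / n for t in teams}
        for t in teams:
            total = sum(votes[t].values())
            if total == 0:         # no votes cast: spread the share evenly
                for u in teams:
                    new[u] += alpha * x[t] / n
            else:
                for u, w in votes[t].items():
                    new[u] += alpha * x[t] * w / total
        x = new
    return sorted(teams, key=x.get, reverse=True)

for scheme in ("win", "margin", "points"):
    print(scheme, rank_teams(games, scheme))
```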
Point Differential Voting
Moral #4
Now is a great time to do math.
Conclusion
• PageRank is a sophisticated algorithm that set Google apart.
• The Web can be represented with graphs and matrices.
• PageRank’s idea of voting has many applications.

Acknowledgements
Tim Chartier, Carl Meyer, Emmie Douglas, Kathryn Pedings, Clare Rodgers, Erich Kreutzer, Ben Kovanich, Ryan Dumville, Luke Ingram, Anjela Govan, Nick Dovidio, Yoshi Yamamoto, Neil Goodson, Colin Stephenson
