Social Search Engine
Using trusted metadata to improve the
relevance of search results
The motivation factor
The essential idea of the SpaK social search engine is
that people make decisions based primarily on the opinions of a few
people whom they trust. The average person has a
set of experts whom they consult in particular
areas: the computer expert, the car expert, the
fashion expert, the financial expert. If the opinions of
these experts can be collected, they are extremely
useful: it is this metadata (data about other data)
that allows the most intelligent filtering and sorting of
information on the internet.
"We're the search engine, but you're the fuel."
What is a search engine?
A search engine is a program designed to help find
information stored on a computer system, such as
the World Wide Web, a corporate or proprietary
network, or a personal computer.
The search engine allows one to ask for content
meeting specific criteria and retrieves a list of
references that match those criteria.
Search engines use regularly updated indexes to
operate quickly and efficiently.
A program called a ‘crawler’ builds the index: it follows
(‘crawls’) the links on each page and indexes every page it reaches.
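The crawl-and-index idea can be sketched in a few lines of Python. This is a minimal sketch, assuming a tiny hypothetical in-memory web (`WEB`) in place of real HTTP fetching and HTML parsing; `crawl` and `search` are illustrative names, not part of any real search engine.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and parse their HTML.
WEB = {
    "a.html": ("thailand travel photos", ["b.html", "c.html"]),
    "b.html": ("cheap flights to thailand", ["a.html"]),
    "c.html": ("car repair tips", []),
}


def crawl(start: str, web: dict) -> dict[str, set[str]]:
    """Breadth-first crawl from `start`, building an inverted index
    that maps each word to the set of pages containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        for word in text.lower().split():
            index[word].add(url)
        for link in links:
            if link in web and link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()
```

Because the index is built ahead of time, answering `search(crawl("a.html", WEB), "thailand flights")` is a set lookup rather than a scan of every page, which is why search engines can operate quickly.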
Problems with existing search engines
PageRank system: PageRank treats every
incoming link as a valid vote for a website.
Some links are not valid votes at all: people
spam guestbooks and blogs with links, so
link-based ranking is becoming less effective.
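To see why link spam works, here is a simplified textbook form of PageRank computed by power iteration, assuming a damping factor of 0.85 and even redistribution of rank from pages with no outgoing links. It is not any search engine's production algorithm; the page names and the ten spam pages are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Simplified PageRank: every outgoing link passes an equal share of rank."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share  # each link counts as a "vote"
            else:
                # Page with no outgoing links: spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical link graph: page -> pages it links to.
HONEST = {"home": ["blog", "shop"], "blog": ["home"], "shop": ["home"], "target": []}
# The same graph after a spammer adds 10 throwaway pages linking to "target".
SPAMMED = {**HONEST, **{f"spam{i}": ["target"] for i in range(10)}}
```

With `HONEST`, "target" has no incoming links and ranks below "blog"; with `SPAMMED`, the worthless links alone lift it above "blog", because the algorithm cannot tell a spam vote from a genuine one.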
Word frequency: In this method, relevance is
decided by the number of times the query word
occurs in a web page. A page's word frequency
can be inflated by stuffing unrelated keywords
into its ‘meta’ tag.
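A naive word-frequency scorer shows how keyword stuffing games this method. This is a minimal sketch, assuming the scorer simply counts query words over the raw page, meta tags included; both page strings and the function name are hypothetical.

```python
import re


def term_frequency_score(page_text: str, query: str) -> int:
    """Count how often the query words occur anywhere in the page."""
    words = re.findall(r"[a-z]+", page_text.lower())
    return sum(words.count(term) for term in query.lower().split())


# An honest page about Thailand that mentions the word once.
HONEST_PAGE = "A detailed guide to Thailand travel: visas, weather and beaches."
# A spam page whose meta tag is stuffed with an unrelated, repeated keyword.
STUFFED_PAGE = ('<meta name="keywords" content="thailand thailand thailand '
                'cheap pills casino"> Buy pills here.')
```

For the query "thailand", the honest page scores 1 while the stuffed page scores 3, so frequency-based ranking places the spam page first.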
An alternative: Social Search
Social community + Existing search engines = Social search engines
Social community examples: Engineer, Doctor, Student, Actors, Indians, Americans
Existing search engine examples: Yahoo, MSN, Google, AltaVista
About social search engines
Social search engines are a class of search
engines that use social networks to organize,
prioritize or filter search results.
They use ‘metadata’ to judge the relevance of
web pages to a user.
‘Metadata’ is defined as ‘data about data’.
In this case, the metadata refers to the
feedback given by the community about the
web pages.
Social search is really about people indexing the
information we find on the web, instead of the
computational formulae that guide traditional
search sites.
Since relevance is based on trust, users of
such a social search engine are largely
protected from spamming and phishing sites,
which rarely earn good ratings from people they trust.
How it works: an example
A user searches for "Thailand", and the search
engine chooses the page containing photos of a
friend's Thailand vacation.
[Figure: An illustration]
Design and Implementation Details
The registration process
New user → SIGN UP → REGISTRATION PAGE
• Assigned a unique UID and username.
• New entry in the MEMBERS table.
Re-register
NO
YES
Login
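The registration step can be sketched against an in-memory SQLite database. This is a hedged illustration, not the project's PHP/MySQL code: the `members` column names and the helpers `create_members_table` and `register` are hypothetical, and returning `None` for a taken username (so the user must register again) is an assumed reading of the flowchart.

```python
import sqlite3
from typing import Optional


def create_members_table(conn: sqlite3.Connection) -> None:
    """Hypothetical schema for the MEMBERS table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS members ("
        " uid INTEGER PRIMARY KEY AUTOINCREMENT,"  # unique UID assigned by the database
        " username TEXT NOT NULL UNIQUE)"          # usernames must not repeat
    )


def register(conn: sqlite3.Connection, username: str) -> Optional[int]:
    """Insert a new member and return its UID, or None if the username is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute("INSERT INTO members (username) VALUES (?)", (username,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None  # duplicate username: the user has to register again
```

Letting the database enforce `UNIQUE` avoids a race between checking for an existing username and inserting the new row.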
After successful login…
Profile page
1. View Community
2. View Buddies
3. Plain search
4. Search History
5. Categories
Community
View community → List of other registered members → THE BUDDY TABLE IN THE DATABASE IS ACCESSED
NO (not yet a friend) → Rate your friend on a scale of 1 to 5 → Display message: ADDED AS FRIEND
YES (already a friend) → Display message: ALREADY ADDED AS FRIEND
Buddies…
View buddy → CLICK TO DELETE → Database entry deleted → Display message: DATABASE ENTRY HAS BEEN DELETED
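The add-friend and delete-buddy flows above can be sketched with SQLite as follows. The `buddy` table layout and the helpers `create_buddy_table`, `add_buddy` and `delete_buddy` are hypothetical; only the 1 to 5 rating and the displayed messages come from the flowcharts.

```python
import sqlite3


def create_buddy_table(conn: sqlite3.Connection) -> None:
    """Hypothetical buddy table: who added whom, and the trust rating given."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS buddy ("
        " uid INTEGER NOT NULL,"
        " buddy_uid INTEGER NOT NULL,"
        " rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),"
        " PRIMARY KEY (uid, buddy_uid))"
    )


def add_buddy(conn: sqlite3.Connection, uid: int, buddy_uid: int, rating: int) -> str:
    """Add a friend with a 1-5 rating unless the pair already exists."""
    row = conn.execute(
        "SELECT 1 FROM buddy WHERE uid = ? AND buddy_uid = ?", (uid, buddy_uid)
    ).fetchone()
    if row is not None:
        return "ALREADY ADDED AS FRIEND"
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    with conn:
        conn.execute(
            "INSERT INTO buddy (uid, buddy_uid, rating) VALUES (?, ?, ?)",
            (uid, buddy_uid, rating),
        )
    return "ADDED AS FRIEND"


def delete_buddy(conn: sqlite3.Connection, uid: int, buddy_uid: int) -> str:
    """Remove the buddy entry and report it, as in the 'View buddy' flow."""
    with conn:
        conn.execute("DELETE FROM buddy WHERE uid = ? AND buddy_uid = ?", (uid, buddy_uid))
    return "DATABASE ENTRY HAS BEEN DELETED"
```

The rating is validated only on the "not yet a friend" branch, mirroring the flowchart, where the user is asked for a rating only after the buddy table shows no existing entry.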
Search history…
• Personal search history
• Search on a specified topic
• Buddy search history
Search
• The query is stemmed with Porter's stemmer algorithm.
• Stems from the user and his buddies are matched, and the associated queries are selected.
• Similarity function: $KN(p, Q) / \max[kn(p), kn(Q)]$
• User rating, clicks and similarity are aggregated, and a result array is extracted using community feedback.
• The default result array is obtained using the search API.
• The re-ordered results are displayed.
The Similarity Function
For each stemmed word, we select a similar stem
from the database and extract the queries associated
with that stem.
Each extracted query p is compared with the user query Q
using the similarity function
$$\frac{KN(p, Q)}{\max[kn(p),\, kn(Q)]}$$
where KN(p, Q) is the number of words common to the
extracted query p and the user query Q,
kn(p) is the number of words in the extracted query, and
kn(Q) is the number of words in the user query.
The output of the similarity function is a real number
in the interval [0, 1]: it is 0 when p and Q share no words
and 1 when they contain exactly the same words. The
similarity value for a user query is stored in the database
against the corresponding extracted query.
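The similarity function translates directly into code. This sketch assumes both queries are already stemmed and treats words as distinct, case-insensitive, whitespace-separated tokens; the function name and the zero result for an empty query are assumptions, not taken from the project.

```python
def similarity(p: str, q: str) -> float:
    """KN(p, Q) / max[kn(p), kn(Q)] over distinct whitespace-separated words."""
    p_words = set(p.lower().split())
    q_words = set(q.lower().split())
    if not p_words or not q_words:
        return 0.0  # guard against division by zero for an empty query
    common = len(p_words & q_words)  # KN(p, Q): words shared by both queries
    return common / max(len(p_words), len(q_words))
```

For example, "cheap flights thailand" and "thailand flights" share 2 words and the longer query has 3, so the similarity is 2/3.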
We compute an aggregate score for each link as a function of
(similarity × clicks × rating) and order the links in
descending order of this score.
The array of links obtained from community feedback is then
compared with the default results returned by the search API,
and the default results are rearranged based on this
metadata from the community.
The final output for a user is therefore arranged in
decreasing order of this score, based on previous searches
made by his buddies for the same or a related query.
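The re-ranking step can be sketched as below, using the aggregate score described above. Several details are assumptions, not the project's method: the `Feedback` record layout, summing scores when several buddies endorse the same link, reading "rating" as the user's 1 to 5 rating of the buddy, and the merge policy (endorsed links move to the front in score order, all other API results keep their original order, and links absent from the API results are ignored).

```python
from collections import defaultdict
from typing import NamedTuple


class Feedback(NamedTuple):
    """One piece of community metadata (hypothetical record layout)."""
    url: str
    similarity: float  # similarity of a buddy's past query to the user's query
    clicks: int        # how often buddies clicked this link for that query
    rating: int        # user's 1-5 rating of the buddy (assumed meaning)


def community_scores(feedback: list[Feedback]) -> dict[str, float]:
    """Aggregate similarity * clicks * rating per link (summing is an assumption)."""
    scores: dict[str, float] = defaultdict(float)
    for f in feedback:
        scores[f.url] += f.similarity * f.clicks * f.rating
    return scores


def rerank(default_results: list[str], feedback: list[Feedback]) -> list[str]:
    """Move community-endorsed links to the front of the API results,
    highest aggregate score first; other links keep their API order."""
    scores = community_scores(feedback)
    endorsed = [u for u in default_results if scores.get(u, 0.0) > 0.0]
    endorsed.sort(key=lambda u: scores[u], reverse=True)  # stable for ties
    others = [u for u in default_results if scores.get(u, 0.0) <= 0.0]
    return endorsed + others
```

In the "Thailand" example, a friend's vacation-photo page that the API returned third would jump to the top as soon as any trusted buddy's feedback gives it a positive score.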
Stemming...its importance
Stemming is the process of stripping suffixes
from a word to reduce it to its stem.
Stemming is important for our project
because words with a common stem
usually have similar meanings,
for example: predict, prediction, predicted.
Keywords in the search query are grouped
according to their stems.
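Grouping keywords by stem can be illustrated with a deliberately naive suffix stripper. This is not the Porter algorithm the project uses; the four-suffix list, the three-letter minimum stem length and the function names are hypothetical, chosen only to show the grouping step.

```python
from collections import defaultdict

SUFFIXES = ("ing", "ion", "ed", "s")  # hypothetical, deliberately tiny list


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(keywords: list[str]) -> dict[str, list[str]]:
    """Collect keywords that share a stem under that stem."""
    groups: dict[str, list[str]] = defaultdict(list)
    for keyword in keywords:
        groups[naive_stem(keyword)].append(keyword)
    return dict(groups)
```

With this sketch, predict, prediction and predicted all land in the "predict" group, so a search for one can draw on past queries containing the others.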
The Porter Stemmer
Description of Porter Stemmer
A consonant in a word is a letter other than A, E, I, O or U, and other than Y preceded by a
consonant. (So in TOY the consonants are T and Y, and in SYZYGY they are S, Z and G.) If a
letter is not a consonant, then it is a vowel. A consonant will be denoted by c, a vowel by v. A
list ccc... of length greater than 0 will be denoted by C, and a list vvv... of length
greater than 0 will be denoted by V. Any word, or part of a word, therefore has
one of the four forms:
CVCV ... C
CVCV ... V
VCVC ... C
VCVC ... V
These may all be represented by the single form
[C]VCVC ... [V]
where the square brackets denote arbitrary presence of their contents.
Using (VC){m} to denote VC repeated m times, this may again be written as
[C](VC){m}[V].
m will be called the measure of any word or word part when represented in
this form. The case m = 0 covers the null word.
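These definitions translate directly into code. The sketch below computes only the measure m, not the full Porter stemmer: it labels each letter c or v (including the Y rule), collapses runs into C and V, and counts the VC pairs. `cv_pattern` and `measure` are hypothetical helper names.

```python
import re


def cv_pattern(word: str) -> str:
    """Label each letter 'c' (consonant) or 'v' (vowel) per Porter's definition."""
    labels: list[str] = []
    for i, ch in enumerate(word.lower()):
        if ch in "aeiou":
            labels.append("v")
        elif ch == "y" and i > 0 and labels[i - 1] == "c":
            # A Y preceded by a consonant counts as a vowel.
            labels.append("v")
        else:
            labels.append("c")
    return "".join(labels)


def measure(word: str) -> int:
    """Return m, where the word has the form [C](VC){m}[V]."""
    collapsed = re.sub(r"c+", "C", re.sub(r"v+", "V", cv_pattern(word)))
    # Each VC pair in the collapsed form adds 1 to the measure.
    return collapsed.count("VC")
```

Porter's own examples make a good check: TR, EE, TREE, Y and BY have m = 0; TROUBLE, OATS, TREES and IVY have m = 1; TROUBLES, PRIVATE, OATEN and ORRERY have m = 2.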
User interface design
Technologies used
Linux: The operating system
Apache: The web server
MySQL: The RDBMS
PHP: Hypertext Preprocessor
CGI: Common Gateway Interface
CSS: Cascading Style Sheets
The End.
Thank you. Please try it out!