Technology for E-commerce
Helena Ahonen-Myka
In this part...
search tools
metadata
personalization
collaborative filtering
data mining
Search tools
the site has to be accessible
site architecture and navigation
structure is important
… but some users prefer search
search keeps users on the site
search usage can be monitored: useful
knowledge about the users’ needs
Users’ preferences
search: 50%
navigation: 20%
mixed: the rest...
Search tools
Indexer: gathers the words from
documents (HTML pages, local files,
database records) and puts them into
an index file
Search engine: accepts queries, locates
the relevant pages in the index, and
formats the results in an HTML page
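The indexer and search engine described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the page ids, the tokenizer (lowercase alphanumeric words) and the rule that a page must contain every query word are all assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexer: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Search engine: ids of the documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()  # an empty search finds nothing
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

def format_results(doc_ids):
    """Format the matching ids as a minimal HTML result list."""
    items = "".join(f'<li><a href="{d}">{d}</a></li>' for d in sorted(doc_ids))
    return f"<ul>{items}</ul>"

docs = {  # hypothetical pages
    "books.html": "New books on e-commerce and web design",
    "music.html": "New CDs and music downloads",
    "help.html": "How to search the site",
}
index = build_index(docs)
print(search(index, "new books"))  # {'books.html'}
print(format_results(search(index, "new")))
```

The index is built once, ahead of time; each query then only looks up a few sets instead of rescanning every page.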
Remote vs local search
the search tool can reside on a different
server, even in a remote location
indexing may take a lot of processing
time, and the resulting index may need
a lot of space
local software may be faster
Indexer
local: scans directories
web spider: an indexing robot begins at
a given page, then follows the links and
stores words of the pages
‘robots.txt’ file: tells which robots are allowed and which parts of the site they may visit
HTML meta elements:
<meta name="robots" content="noindex, follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
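A polite web spider checks the robots.txt rules before fetching a page. Python's standard `urllib.robotparser` module can do this; the robots.txt content, the robot names and the example.com URLs below are hypothetical. The meta elements, in contrast, are per-page instructions and would still have to be read from each fetched page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" may visit nothing,
# every other robot may visit everything except /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching them

for agent, url in [
    ("MySpider", "http://www.example.com/products.html"),
    ("MySpider", "http://www.example.com/private/orders.html"),
    ("BadBot", "http://www.example.com/products.html"),
]:
    # can_fetch() applies the rules for the given user agent to the URL's path
    print(agent, url, parser.can_fetch(agent, url))
```

In a real spider, `set_url()` and `read()` would load the site's own robots.txt instead of the inline text.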
Indexer
link structure should reach all the pages
that should be indexed
non-text links (imagemaps etc.): robots
may not be able to follow them ->
provide text links as well
frames: provide some navigational links
to give context if the page is retrieved
on its own by a query
Search page
search forms are the user interface of
the search engine
simple form: just a text field and a
button
or an advanced search page: Boolean
search, date ranges, subscopes...
Search results
the occurrences of the query terms are
located from the index
the results are sorted according to their
(assumed) relevance to the query
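Relevance can be estimated in many ways. As one illustration, the sketch below scores each page with a simple TF-IDF weight: how often a query word occurs in the page, weighted up for words that occur in few pages. TF-IDF is a stand-in chosen here, not something these slides prescribe, and the pages are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(docs, query):
    """Return (doc_id, score) pairs for matching pages, best first.
    Score = sum over query words of term frequency * log(1 + N / document frequency)."""
    n = len(docs)
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: in how many pages each word occurs.
    df = Counter(word for counts in tf.values() for word in counts)
    scores = {}
    for doc_id, counts in tf.items():
        score = sum(counts[w] * math.log(1 + n / df[w]) for w in tokenize(query) if counts[w])
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {  # hypothetical pages
    "wines.html": "red wine white wine cheese",
    "bakery.html": "bread and wine",
    "deli.html": "cheese platter cheese",
}
print([doc_id for doc_id, _ in rank(docs, "wine cheese")])
# ['wines.html', 'deli.html', 'bakery.html']
```

Raw counts favor long pages; practical rankers usually also normalize for page length, which this sketch leaves out.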
the results page should have the same
look-and-feel as the other pages on
the site
Why searches fail?
empty searches: people just press the
search button without typing any words
wrong scope: people think they are
searching the entire web
vocabulary mismatch: terms are too
specific, too general, or simply not used on the site
spelling mistakes
query requirements not met
Why searches fail?
problems with query syntax: spaces,
parentheses, etc.
capitalization and special characters:
exact matches required
stopwords: some common words are not
indexed
short words: short words are not
indexed
numbers are not indexed
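Several of these failures come from the indexer's own normalization rules. The sketch below applies a hypothetical stopword list, a hypothetical minimum word length and a no-numbers rule, and reports which words were dropped; a query such as "HD 2001" loses every term and therefore cannot match anything.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}  # hypothetical list
MIN_LENGTH = 3  # hypothetical: shorter words are not indexed

def index_terms(text):
    """Split text into (kept, dropped) words under the hypothetical indexing rules.
    Lowercasing means capitalization never has to match exactly."""
    kept, dropped = [], []
    for word in re.findall(r"\w+", text.lower()):
        if word in STOPWORDS or len(word) < MIN_LENGTH or word.isdigit():
            dropped.append(word)
        else:
            kept.append(word)
    return kept, dropped

kept, dropped = index_terms("The Lord of the Rings DVD 2001 in HD")
print(kept)     # ['lord', 'rings', 'dvd']
print(dropped)  # ['the', 'of', 'the', '2001', 'in', 'hd']
print(index_terms("HD 2001"))  # ([], ['hd', '2001'])
```

Applying the same rules to the query, and telling the user which words were ignored, gives the no-matches page something concrete to explain.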
No-matches pages
pages shown to the user when the search
does not return any matches
should have the same look-and-feel
as the other pages + navigation aids +
a field for searching again
explanations why the search might have
failed and what to do next
Some usability issues
web design: strong sense of structure
and navigation support
some people do not like to search
people who search end up on some
page: they should know where they are
people need to be able to move around in the
neighborhood
search should be available on every
page
Some usability issues
scoped search: it is difficult for users to
understand what the scope is -> the scope
should be stated clearly, and searching
the entire site has to be easily available
Boolean search is difficult: ‘cats and
dogs’ vs ‘cats or dogs’ -> ‘or’ could be
used in the query, ‘and’ in the ordering
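The suggestion of treating the query as ‘or’ while ordering by how many terms match can be sketched as follows. The pages and their texts are hypothetical; a page mentioning both cats and dogs is simply listed before pages mentioning only one of them.

```python
import re

def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def or_search_ranked(docs, query):
    """Match pages containing ANY query term ('or' in the query),
    then list pages containing more terms first ('and' in the ordering)."""
    terms = tokenize(query)
    hits = []
    for doc_id, text in docs.items():
        matched = len(terms & tokenize(text))
        if matched:
            hits.append((doc_id, matched))
    # Most matched terms first; ties broken by id so the order is deterministic.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits

docs = {  # hypothetical pages
    "cat-food.html": "food for cats",
    "pet-beds.html": "beds for cats and dogs",
    "dog-toys.html": "toys for dogs",
}
print(or_search_ranked(docs, "cats dogs"))
# [('pet-beds.html', 2), ('cat-food.html', 1), ('dog-toys.html', 1)]
```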
Metadata
often a search results in a long list of
matches; many of them may be
irrelevant
metadata can make the queries more
powerful
HTML meta elements
<head profile="http://www.acme.com/profiles/core">
<title>How to complete memo cover sheets</title>
<meta name="author" content="John Doe">
<meta name="copyright" content="© 2000 Acme">
<meta name="keywords" content="corporate, guidelines, cataloging">
<meta name="date" content="2000-10-17">
</head>
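An indexer can read these meta elements with Python's standard `html.parser` module and store them as separate fields, so that a query can be restricted to, say, the author. A minimal sketch; lowercasing the meta names and splitting keywords on commas are assumptions.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <title> text and name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser passes attributes as (name, value) pairs
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

page = """<head>
<title>How to complete memo cover sheets</title>
<meta name="author" content="John Doe">
<meta name="keywords" content="corporate, guidelines, cataloging">
<meta name="date" content="2000-10-17">
</head>"""

extractor = MetaExtractor()
extractor.feed(page)
extractor.close()
keywords = [k.strip() for k in extractor.meta["keywords"].split(",")]
print(extractor.title)           # How to complete memo cover sheets
print(extractor.meta["author"])  # John Doe
print(keywords)                  # ['corporate', 'guidelines', 'cataloging']
```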
Metadata
RDF (Resource Description Framework):
– Gives means to define metadata for XML and HTML
documents
– Gives means to interchange it between different applications
on the Web
Example: Dublin Core metadata
– Contains 15 elements (title, creator, date…)
Dublin Core
Dublin Core Metadata Elements:
Content: Title, Subject, Description, Language, Relation, Coverage, Source
Intellectual Property: Creator, Publisher, Contributor, Rights
Instance: Date, Type, Format, Identifier
Dublin Core in RDF
Dublin Core represented in RDF
<RDF:RDF>
<RDF:Description RDF:HREF="URI">
<DC:Relation>
<RDF:Description>
<DC:Relation.Type> isPartOf
</DC:Relation.Type>
<RDF:Value RDF:HREF="URI2"/>
</RDF:Description>
</DC:Relation>
</RDF:Description>
</RDF:RDF>
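The RDF example above uses an early draft syntax. In current RDF/XML the same idea is usually written with declared namespaces, `rdf:about` for the described resource, the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`, and `dcterms:isPartOf` from the DCMI terms namespace for the isPartOf relation. The sketch below builds such a record with Python's `xml.etree.ElementTree`; the URIs and field values are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

def dublin_core_record(uri, fields, part_of=None):
    """Build an RDF/XML description of `uri` with Dublin Core element fields.
    `part_of` optionally links the resource to a larger one via dcterms:isPartOf."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    if part_of:
        ET.SubElement(desc, f"{{{DCTERMS}}}isPartOf", {f"{{{RDF}}}resource": part_of})
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record(
    "http://www.example.com/memo.html",
    {"title": "How to complete memo cover sheets", "creator": "John Doe", "date": "2000-10-17"},
    part_of="http://www.example.com/guidelines/",
)
print(record)
```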
Searching XML documents
structure of XML documents can be
used to make more precise queries, e.g.
find Albert Einstein in Author element
only
problem: how the user specifies the
structure
Searching XML documents
1) The user specifies the hierarchy in
the query: Einstein in Author
2) The user makes a simple query, but
the search engine presents the
alternative contexts: Einstein can be in
Author or in Street or in School
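Both approaches can be sketched with Python's `xml.etree.ElementTree` on a small hypothetical catalog: `search_in` restricts the match to one element type (approach 1), while `contexts` answers a plain query and lists the element types where the term occurs, so the user can choose (approach 2). The element names and case-insensitive substring matching are assumptions.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <book><title>Relativity</title><author>Albert Einstein</author></book>
  <book><title>Einstein: His Life</title><author>Walter Isaacson</author></book>
  <school><name>Einstein School</name><street>Einstein Street 1</street></school>
</catalog>
""")

def search_in(root, term, tag):
    """Approach 1: the user names the element, e.g. 'Einstein in author'."""
    return [el.text for el in root.iter(tag) if el.text and term.lower() in el.text.lower()]

def contexts(root, term):
    """Approach 2: a plain query; report the element types the term occurs in."""
    found = {}
    for el in root.iter():
        if el.text and term.lower() in el.text.lower():
            found.setdefault(el.tag, []).append(el.text)
    return found

print(search_in(catalog, "Einstein", "author"))  # ['Albert Einstein']
print(sorted(contexts(catalog, "Einstein")))     # ['author', 'name', 'street', 'title']
```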
Using links
good site: many links into the site,
particularly from other good sites
text surrounding the link describes
(probably) what the target of the link is
about
the knowledge above + the contents of
the page itself are taken into account
e.g. Google (www.google.com)
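The "many links from good sites" idea is the core of PageRank-style link analysis. The sketch below is a simplified PageRank computed by power iteration over a hypothetical four-page link graph, with the commonly used damping factor 0.85. It ignores the link text and page content mentioned above and is not Google's actual ranking.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores by power iteration.
    `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share; the rest flows along links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * score[page] / n
        score = new
    return score

links = {  # hypothetical site
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
    "blog": ["products"],
}
scores = link_scores(links)
for page, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:10s} {s:.3f}")
```

"home" ends up on top because two pages link to it, and "blog" at the bottom because nothing links to it.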
Natural language queries
E.g. Ask Jeeves
questions and answers prepared by
human editors
user’s query is mapped to the prepared
queries
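How Ask Jeeves matched queries to its prepared questions is not described here. As a stand-in, the sketch maps a user's question to the prepared question with the largest word overlap (Jaccard similarity of the word sets). The questions, answer pages and the 0.2 threshold are hypothetical.

```python
import re

# Hypothetical prepared questions, each with an answer page chosen by an editor.
PREPARED = {
    "how do i track my order": "/help/track-order.html",
    "how do i return a product": "/help/returns.html",
    "what payment methods do you accept": "/help/payment.html",
}

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_prepared_question(query, prepared=PREPARED, threshold=0.2):
    """Return (question, answer) for the most similar prepared question,
    or None if no question is similar enough."""
    q = words(query)
    best, best_sim = None, 0.0
    for question, answer in prepared.items():
        p = words(question)
        sim = len(q & p) / len(q | p) if q | p else 0.0
        if sim > best_sim:
            best, best_sim = (question, answer), sim
    return best if best_sim >= threshold else None

print(best_prepared_question("Can I return a broken product?"))
# ('how do i return a product', '/help/returns.html')
```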
Personalization
goal: the right people receive the right
information at the right time
but: people do not like to state complex
queries, or set up a service (e.g., by
answering a questionnaire)
user profiles have to be generated and
stored, preferably automatically
User profiles
may contain data like: interests,
geographical area, age
could be collected once, and shared
with many services
trust of the user: the profile should only
be used to offer better service, and only
if the user allows a service to
use it
Recommendations
users who bought this book also bought
these books / liked these CDs etc.
rating movies, tv programs, wines…
recommending paths on a site
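"Users who bought this also bought" lists can be produced by counting which items appear together in the same orders. A minimal sketch of this item co-occurrence counting, with hypothetical order data:

```python
from collections import Counter
from itertools import combinations

def also_bought(orders):
    """For each item, count how often every other item appears in the same order."""
    together = {}
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            together.setdefault(a, Counter())[b] += 1
            together.setdefault(b, Counter())[a] += 1
    return together

orders = [  # hypothetical orders (book titles)
    ["Dune", "Foundation"],
    ["Dune", "Foundation", "Hyperion"],
    ["Dune", "Hyperion"],
    ["Emma", "Persuasion"],
    ["Dune", "Foundation"],
]
together = also_bought(orders)
print(together["Dune"].most_common(2))  # [('Foundation', 3), ('Hyperion', 2)]
```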
Recommendations
based on the user’s former behavior
and profile data
based on social (collaborative) filtering:
what similar users liked
User’s former behavior
if used as the only source: the user
never sees anything new
particularly a new user hardly gets any
recommendations
Collaborative filtering
draws on the experiences of a
population or community of users
the profile information of the target user
is compared to the profiles of nearest-neighbor users
look for correlation between users in
terms of their ratings: recommend items
that are included in the neighbors’ profiles
but not in the target user’s profile
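A minimal user-based collaborative filtering sketch, assuming ratings on a 1-5 scale and using Pearson correlation over co-rated items as the similarity measure (one common choice; the slides do not prescribe one). The users, ratings and neighborhood size `k=2` are hypothetical.

```python
import math

ratings = {  # hypothetical ratings, 1-5
    "anna":  {"Alien": 5, "Brazil": 3, "Casablanca": 4, "Dune": 4},
    "ben":   {"Alien": 5, "Brazil": 3, "Casablanca": 4, "Eraserhead": 5},
    "carla": {"Alien": 1, "Brazil": 5, "Casablanca": 2, "Fargo": 4},
    "dave":  {"Alien": 5, "Brazil": 3, "Casablanca": 4},
}

def pearson(a, b):
    """Pearson correlation of two users' ratings over the items both rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def recommend(target, ratings, k=2):
    """Rank items rated by the k most correlated neighbors but not by the target."""
    neighbors = sorted(
        ((pearson(ratings[target], ratings[u]), u) for u in ratings if u != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbors:
        if sim <= 0:
            continue  # ignore users whose tastes are not positively correlated
        for item, r in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("dave", ratings))  # ['Eraserhead', 'Dune']
```

A user whose correlation with everyone is zero or negative gets an empty list, and an item that nobody has rated can never be recommended.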
Collaborative filtering
Problems:
cannot recommend new items (some
users have to rate an item before it can
be recommended)
unusual user may not get (good)
recommendations: no neighbors that
are close enough
Matching engines
Match one set of complex characteristics
against another
e.g., recruiting sites: match a job seeker
and a job
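A recruiting-site matcher can be sketched as a scoring function over two records. Everything here is hypothetical: the fields, the rule that required skills and location are must-haves, and the weighting of desired skills.

```python
def match_score(seeker, job):
    """Score 0..1: 0 if a must-have fails, else 0.5 plus up to 0.5 for desired skills."""
    skills = set(seeker["skills"])
    if not set(job["required"]) <= skills:
        return 0.0  # a required skill is missing
    if job["location"] != seeker["location"] and not seeker["will_relocate"]:
        return 0.0  # wrong place and the seeker will not move
    desired = set(job["desired"])
    bonus = len(desired & skills) / len(desired) if desired else 1.0
    return 0.5 + 0.5 * bonus

seeker = {"skills": ["python", "sql", "html"], "location": "Helsinki", "will_relocate": False}
jobs = {
    "Web developer": {"required": ["html"], "desired": ["python", "javascript"], "location": "Helsinki"},
    "Data analyst": {"required": ["sql"], "desired": ["python", "statistics", "excel"], "location": "Helsinki"},
    "Database administrator": {"required": ["sql", "oracle"], "desired": [], "location": "Helsinki"},
    "Python developer": {"required": ["python"], "desired": [], "location": "Tampere"},
}
ranked = sorted(((title, match_score(seeker, job)) for title, job in jobs.items()),
                key=lambda pair: pair[1], reverse=True)
print([(title, round(score, 2)) for title, score in ranked if score > 0])
# [('Web developer', 0.75), ('Data analyst', 0.67)]
```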
Data mining for e-commerce
users’ behavior on the web site provides
a lot of information:
Which pages do the users view?
Which paths do the users navigate?
How long do the users spend on the site?
How often does viewing a product
lead to purchasing it?
Data mining process
Gathering the data
Cleaning/preprocessing the data
Transforming the data
Analysis / finding general models
Interpreting the results
Using the knowledge
Data collection
clickstream logging: web server logs or
packet sniffers
business event logging
Clickstream logging
web log: page requested, time of
request, client IP address, etc.
many requests for images -> have to be
filtered out
users and user sessions are difficult to
identify
requests for a page: the same URL, but
different dynamic content
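The filtering and session problems above can be illustrated with a small log parser. The sketch assumes the Common Log Format, treats requests ending in common image suffixes as noise, and groups requests into sessions by client IP address with a 30-minute inactivity timeout. These are assumptions; IP addresses are an unreliable user identifier because of proxies and shared addresses.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed heuristic

def page_views(lines):
    """Yield (host, time, path) for page requests, skipping images and unparsable lines."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m or m["path"].lower().endswith(IMAGE_SUFFIXES):
            continue
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield m["host"], ts, m["path"]

def sessions(views):
    """Group page views into sessions: same host, gaps no longer than SESSION_TIMEOUT."""
    last_seen, current, result = {}, {}, []
    for host, ts, path in sorted(views, key=lambda v: (v[0], v[1])):
        if host not in last_seen or ts - last_seen[host] > SESSION_TIMEOUT:
            current[host] = []  # start a new session for this host
            result.append(current[host])
        current[host].append(path)
        last_seen[host] = ts
    return result

LOG = """\
10.0.0.1 - - [17/Oct/2000:10:00:00 +0200] "GET /index.html HTTP/1.0" 200 2326
10.0.0.1 - - [17/Oct/2000:10:00:01 +0200] "GET /logo.gif HTTP/1.0" 200 512
10.0.0.1 - - [17/Oct/2000:10:02:00 +0200] "GET /books.html HTTP/1.0" 200 4100
10.0.0.2 - - [17/Oct/2000:10:05:00 +0200] "GET /index.html HTTP/1.0" 200 2326
10.0.0.1 - - [17/Oct/2000:11:30:00 +0200] "GET /cart.html HTTP/1.0" 200 1800
""".splitlines()

print(sessions(page_views(LOG)))
# [['/index.html', '/books.html'], ['/cart.html'], ['/index.html']]
```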
Clickstream logging
more efficient at the application server
layer
instead of just pages, knowledge of
products
user and session tracking possible
can also track information absent from web
server logs: pages that were aborted
while being downloaded
Business event logging
looking at subsets of requests as one
logical event or episode:
add/remove item to/from shopping cart
initiate/finish checkout
search (log keywords and number of results)
register
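Business events can be logged as structured records rather than raw page requests. The sketch below uses a hypothetical event schema stored as JSON lines; because searches log their keywords and result counts, listing the searches that returned nothing is a one-line filter.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class BusinessEvent:
    """One logical event, e.g. 'search' or 'add_to_cart' (hypothetical schema)."""
    session_id: str
    kind: str
    details: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def log_event(log, event):
    """Append the event to the log as one JSON line."""
    log.append(json.dumps(asdict(event)))

def failed_searches(log):
    """Return the keywords of logged searches that returned no results."""
    events = [json.loads(line) for line in log]
    return [e["details"]["keywords"] for e in events
            if e["kind"] == "search" and e["details"]["results"] == 0]

log = []
log_event(log, BusinessEvent("s1", "search", {"keywords": "dvd player", "results": 12}))
log_event(log, BusinessEvent("s1", "add_to_cart", {"product": "DVD-100", "quantity": 1}))
log_event(log, BusinessEvent("s2", "search", {"keywords": "dvdplayer", "results": 0}))
log_event(log, BusinessEvent("s1", "checkout_start"))
print(failed_searches(log))  # ['dvdplayer']
```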
From order data to customers
collected data is order-oriented
data for each customer is spread into
many records
information on customers is the real
target
information for each customer has to be
aggregated
From order data to customers
What percentage of each customer’s
orders used a VISA credit card?
How much money does each customer
spend on books?
What is the frequency of each
customer’s purchases?
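Aggregating order-oriented rows into one summary per customer answers questions like these. A minimal sketch over hypothetical order tuples; "frequency" is interpreted here as the average number of days between a customer's orders.

```python
from collections import defaultdict
from datetime import date

orders = [  # hypothetical rows: (customer, date, card, category, amount)
    ("c1", date(2000, 1, 5), "VISA", "books", 30.0),
    ("c1", date(2000, 2, 4), "MasterCard", "music", 15.0),
    ("c1", date(2000, 3, 5), "VISA", "books", 45.0),
    ("c2", date(2000, 1, 10), "VISA", "music", 20.0),
]

def customer_profiles(orders):
    """Aggregate order-oriented rows into one summary per customer."""
    by_customer = defaultdict(list)
    for customer, day, card, category, amount in orders:
        by_customer[customer].append((day, card, category, amount))
    profiles = {}
    for customer, rows in by_customer.items():
        days = sorted(day for day, _, _, _ in rows)
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        profiles[customer] = {
            "visa_share": sum(card == "VISA" for _, card, _, _ in rows) / len(rows),
            "books_spend": sum(amount for _, _, cat, amount in rows if cat == "books"),
            "orders": len(rows),
            # None when there is only one order, so no gap can be measured.
            "avg_days_between_orders": sum(gaps) / len(gaps) if gaps else None,
        }
    return profiles

profiles = customer_profiles(orders)
print(profiles["c1"])
```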
Model generation
Answer questions like:
What characterizes heavy spenders?
What characterizes customers that
prefer promotion X over Y?
What characterizes customers that buy
quickly?
What characterizes visitors that do not
buy?
Data mining tools
e.g., classification rules
IF Income > $80,000 AND
Age <= 30 AND
Average Session Duration is between
10 AND 20 minutes
THEN Heavy spender
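A rule like this one is easy to apply as code, and measuring how often it fires and how often it is right helps a business user judge it. In this sketch the bounds of "between 10 and 20 minutes" are assumed to be inclusive, and the labelled customers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    income: float              # yearly income in dollars
    age: int
    avg_session_minutes: float

def is_heavy_spender(c: Customer) -> bool:
    """The example rule; 'between 10 and 20' is assumed to include both bounds."""
    return c.income > 80_000 and c.age <= 30 and 10 <= c.avg_session_minutes <= 20

labelled = [  # hypothetical customers: (features, actually a heavy spender?)
    (Customer(95_000, 28, 15), True),
    (Customer(120_000, 25, 12), True),
    (Customer(85_000, 29, 18), False),
    (Customer(40_000, 22, 14), False),
    (Customer(150_000, 45, 15), True),
]
fired = [heavy for c, heavy in labelled if is_heavy_spender(c)]
coverage = len(fired) / len(labelled)  # share of customers the rule applies to
accuracy = sum(fired) / len(fired)     # share of those it classifies correctly
print(f"coverage {coverage:.2f}, accuracy {accuracy:.2f}")  # coverage 0.60, accuracy 0.67
```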
Understanding the results
result of a data mining process may be
difficult for a business user to
understand: e.g. thousands of rules
visualization is important
tailored for a specific domain
Using the results
site structure can be updated
procedures like registering or checking out can be simplified
metadata can be added to make search
more efficient
personalization rules and recommender
systems