Technology for E-commerce
Helena Ahonen-Myka
In this part...

search tools
metadata
personalization
collaborative filtering
data mining
Search tools

the site has to be accessible
site architecture and navigation structure are important
… but some users prefer search
search keeps users on the site
search usage can be monitored: it gives useful knowledge about the users’ needs
Users’ preferences

search: 50%
navigation: 20%
mixed: the rest (30%)
Search tools

Indexer: gathers the words from documents (HTML pages, local files, database records) and puts them into an index file
Search engine: accepts queries, locates the relevant pages in the index, and formats the results as an HTML page
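The division of labour between the indexer and the search engine can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions (documents held in a dictionary, a lowercase letters-and-digits tokenizer, all query words required); the names build_index, search and format_results are hypothetical and do not describe any particular product.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter or a digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexer: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Search engine: return the ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return set()                      # an empty search matches nothing
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results

def format_results(doc_ids):
    """Format the hits as a minimal HTML list."""
    items = "".join(f"<li>{d}</li>" for d in sorted(doc_ids))
    return f"<ul>{items}</ul>"

docs = {
    "cats.html": "Cats and dogs as pets",
    "dogs.html": "Training dogs",
    "fish.html": "Keeping fish",
}
index = build_index(docs)
print(format_results(search(index, "dogs")))  # <ul><li>cats.html</li><li>dogs.html</li></ul>
```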
Remote vs local search

the search tool can reside on a different server, even at a remote location
indexing may take a lot of processing time, and the resulting index may need a lot of space
local software may be faster
Indexer

local: scans directories
web spider: an indexing robot begins at a given page, then follows the links and stores the words of the pages
'robots.txt' file: tells which robots are allowed to visit which parts of the site
HTML meta elements:
<meta name="robots" content="noindex, follow">
<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">
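A spider that honours both robots.txt and the robots meta element might look roughly like the sketch below. It crawls an in-memory dictionary of pages instead of the network so that it runs offline; the site, the example.com URLs and the names PageParser and crawl are made up for illustration. The robots.txt rules are checked with the standard library's urllib.robotparser, and the meta directives are read with html.parser.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

class PageParser(HTMLParser):
    """Collect links, words and robots meta directives from one page."""
    def __init__(self):
        super().__init__()
        self.links, self.words, self.directives = [], [], set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",")}

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def crawl(start_url, pages, robots_lines, agent="*"):
    """Breadth-first spider over a hypothetical in-memory site (url -> HTML)."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    robots.modified()  # mark robots.txt as read so can_fetch() applies its rules
    index, seen, queue = {}, {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        if url not in pages or not robots.can_fetch(agent, url):
            continue                              # missing page or excluded by robots.txt
        parser = PageParser()
        parser.feed(pages[url])
        parser.close()
        if "noindex" not in parser.directives:
            index[url] = parser.words             # store the words of the page
        if "nofollow" in parser.directives:
            continue                              # do not follow this page's links
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index

site = {
    "http://example.com/": '<a href="a.html">A</a> <a href="private/b.html">B</a> welcome',
    "http://example.com/a.html": '<meta name="robots" content="noindex, follow"><a href="c.html">C</a>',
    "http://example.com/c.html": "hello world",
    "http://example.com/private/b.html": "secret",
}
robots_txt = ["User-agent: *", "Disallow: /private/"]
print(sorted(crawl("http://example.com/", site, robots_txt)))
# ['http://example.com/', 'http://example.com/c.html']
```

Page a.html is visited and its links are followed, but it is not indexed because of noindex; the page under /private/ is never fetched.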
Indexer

the link structure should reach all the pages that should be indexed
non-text links (imagemaps etc.): robots may not be able to follow them -> provide text links as well
frames: provide some navigational links on each framed page to give a context if the page is retrieved by a query
Search page

search forms are the user interface of the search engine
simple form: just a text field and a button
or an advanced search page: boolean search, date ranges, subscopes...
Search results

the occurrences of the query terms are located in the index
the results are sorted according to their (assumed) relevance to the query
the results page should have the same look-and-feel as the other pages on the site
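The slide does not say how relevance is estimated. One common textbook choice is TF-IDF weighting, used below purely as an illustration: a term counts more if it is frequent in the document and rare in the collection. The +1 added to the IDF and the tie-breaking by document id are arbitrary choices of this sketch.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(docs, query):
    """Return ids of matching documents, highest summed TF-IDF score first."""
    tokenized = {d: Counter(tokenize(t)) for d, t in docs.items()}
    n = len(docs)
    scores = {}
    for doc_id, counts in tokenized.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            if term not in counts:
                continue
            df = sum(1 for c in tokenized.values() if term in c)  # document frequency
            tf = counts[term] / total                             # term frequency
            idf = math.log(n / df) + 1.0  # +1 so terms found in every doc still count
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    "a": "dogs dogs dogs cats",
    "b": "dogs and cats and birds",
    "c": "fish",
}
print(rank(docs, "dogs"))  # ['a', 'b']
```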
Why do searches fail?

empty searches: people just press the search button without entering any words
wrong scope: people think they are searching the entire web
vocabulary mismatch: the terms are too specific, too general, or just not used on the site
spelling mistakes
query requirements not met
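Spelling mistakes and vocabulary mismatch can be softened by suggesting indexed words that are close to the query terms. The sketch below uses difflib.get_close_matches from the Python standard library as a simple stand-in for a real spelling corrector; the vocabulary and the 0.8 similarity cutoff are assumptions.

```python
import difflib

def suggest(query_words, vocabulary, cutoff=0.8):
    """For each query word missing from the index vocabulary, propose close matches."""
    suggestions = {}
    for word in query_words:
        if word not in vocabulary:
            suggestions[word] = difflib.get_close_matches(word, vocabulary, n=3, cutoff=cutoff)
    return suggestions

vocabulary = ["recipe", "recipes", "receipt", "printer", "paper"]
print(suggest(["recipies", "paper"], vocabulary))
```

A no-matches page could then offer the suggestions as "Did you mean ...?" links.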
Why do searches fail?

problems with query syntax: spaces, parentheses, etc.
capitalization and special characters: exact matches required
stopwords: some common words are not indexed
short words: words below a minimum length are not indexed
numbers: numbers are not indexed
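The following sketch shows how typical indexer settings make some queries fail before any matching takes place. The stopword list, the minimum word length of 3 and the rule of dropping pure numbers are hypothetical settings chosen to mirror the bullets above; real search tools differ.

```python
import re

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for"}  # illustrative list only
MIN_LENGTH = 3                                                  # hypothetical setting

def index_terms(text, fold_case=True, keep_numbers=False):
    """Return the terms a simple indexer would store for `text`."""
    if fold_case:
        text = text.lower()
    terms = []
    for token in re.findall(r"\w+", text):   # special characters such as '+' are lost
        if token.lower() in STOPWORDS:
            continue                         # stopword: not indexed
        if len(token) < MIN_LENGTH:
            continue                         # short word: not indexed
        if token.isdigit() and not keep_numbers:
            continue                         # number: not indexed
        terms.append(token)
    return terms

print(index_terms("The C++ manual for Windows 2000"))  # ['manual', 'windows']
```

A query for "C++" or "2000" therefore finds nothing, and without case folding the stored term "Windows" does not match the query "windows".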
No-matches pages

the page shown to the user when the search does not return any matches
should have the same look-and-feel as the other pages + navigation aids + a "search again" field
explanations of why the search might have failed and what to do next
Some usability issues

web design: strong sense of structure and navigation support
some people do not like to search
people who search end up on some page: they should know where they are
people need to move around in the neighborhood
search should be available on every page
Some usability issues

scoped search: it is difficult for users to understand what the scope is -> the scope should be stated clearly, and a search of the entire site has to be easy to reach
boolean search is difficult: 'cats and dogs' vs 'cats or dogs' -> 'or' could be used in the query, 'and' in the ordering (pages containing both words first)
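The "or in the query, and in the ordering" idea can be sketched as follows: every page that contains at least one query word is returned, but pages containing more of the words are listed first. Counting matched terms is a deliberately simple ranking assumption.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def or_search_and_ranking(docs, query):
    """Match documents containing ANY query word ('or'), but list documents
    containing more of the words ('and'-like) first; ties by document id."""
    terms = set(tokenize(query))
    hits = []
    for doc_id, text in docs.items():
        matched = terms & set(tokenize(text))
        if matched:
            hits.append((-len(matched), doc_id))
    return [doc_id for _, doc_id in sorted(hits)]

docs = {
    "1": "all about cats",
    "2": "cats and dogs living together",
    "3": "dogs for beginners",
    "4": "goldfish care",
}
print(or_search_and_ranking(docs, "cats dogs"))  # ['2', '1', '3']
```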
Metadata

often a search results in a long list of matches, many of which may be irrelevant
metadata can make queries more powerful
HTML meta elements:
<head profile="http://www.acme.com/profiles/core">
<title>How to complete memo cover sheets</title>
<meta name="author" content="John Doe">
<meta name="copyright" content="© 2000 Acme">
<meta name="keywords" content="corporate, guidelines, cataloging">
<meta name="date" content="2000-10-17">
</head>
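Metadata like the elements above can be extracted at indexing time and then queried field by field. A minimal sketch with Python's html.parser, assuming metadata is stored as a flat dictionary per page; MetaExtractor, keywords and fielded_match are hypothetical helpers.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""
    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.metadata[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data

def keywords(metadata):
    """Split the comma-separated keywords field into a list."""
    return [k.strip() for k in metadata.get("keywords", "").split(",") if k.strip()]

def fielded_match(metadata, field, value):
    """True if the given metadata field contains the value (case-insensitive)."""
    return value.lower() in metadata.get(field, "").lower()

page = """<head>
<title>How to complete memo cover sheets</title>
<meta name="author" content="John Doe">
<meta name="keywords" content="corporate, guidelines, cataloging">
<meta name="date" content="2000-10-17">
</head>"""
parser = MetaExtractor()
parser.feed(page)
parser.close()
meta = parser.metadata
print(meta["author"], keywords(meta))  # John Doe ['corporate', 'guidelines', 'cataloging']
```

A query such as "author contains Doe" then becomes fielded_match(meta, "author", "Doe") instead of a full-text match that would also hit pages merely mentioning the name.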
Metadata

RDF (Resource Description Framework):
– provides a means to define metadata for XML and HTML documents
– provides a means to interchange metadata between different applications on the Web

Example: Dublin Core metadata
– contains 15 elements (title, creator, date…)
Dublin Core

Dublin Core Metadata Elements:
Content: Title, Subject, Description, Source, Language, Relation, Coverage
Intellectual Property: Creator, Publisher, Contributor, Rights
Instance: Date, Type, Format, Identifier
Dublin Core in RDF

Dublin Core represented in RDF:
<RDF:RDF>
  <RDF:Description RDF:HREF="URI">
    <DC:Relation>
      <RDF:Description>
        <DC:Relation.Type>isPartOf</DC:Relation.Type>
        <RDF:Value RDF:HREF="URI2"/>
      </RDF:Description>
    </DC:Relation>
  </RDF:Description>
</RDF:RDF>
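The example above follows an early draft syntax for Dublin Core in RDF. For comparison, the sketch below produces a description in the later W3C RDF/XML syntax, where the described resource is given with rdf:about and the part-of relation is expressed with the DCMI term dcterms:isPartOf. The example.com URIs are placeholders and dublin_core_rdf is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

def dublin_core_rdf(uri, fields, part_of=None):
    """Build an RDF/XML description of `uri` from simple Dublin Core fields."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value   # e.g. dc:title
    if part_of:
        # The Relation/isPartOf of the slide, expressed as a DCMI term
        ET.SubElement(desc, f"{{{DCTERMS}}}isPartOf", {f"{{{RDF}}}resource": part_of})
    return ET.tostring(root, encoding="unicode")

xml = dublin_core_rdf(
    "http://www.example.com/memo.html",
    {"title": "How to complete memo cover sheets", "creator": "John Doe", "date": "2000-10-17"},
    part_of="http://www.example.com/guidelines/",
)
print(xml)
```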
Searching XML documents

the structure of XML documents can be used to make more precise queries, e.g. find Albert Einstein in the Author element only
problem: how does the user specify the structure?
Searching XML documents

1) the user specifies the hierarchy in the query: Einstein in Author
2) the user makes a simple query, but the search engine presents the alternative contexts: Einstein can be in Author, in Street or in School
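Both approaches can be illustrated on a tiny XML collection with Python's xml.etree.ElementTree. The records, and the element names other than Author, Street and School, are invented for the example; the case-insensitive substring test is a simplification of real full-text matching.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""<records>
  <book><Author>Albert Einstein</Author><Title>Relativity</Title></book>
  <person><Name>Hans</Name><Street>Einsteinstrasse 5</Street></person>
  <person><Name>Anna</Name><School>Einstein Gymnasium</School></person>
</records>""")

def search_in_element(root, word, tag):
    """Approach 1: the user names the element, e.g. 'Einstein in Author'."""
    return [el.text for el in root.iter(tag) if el.text and word.lower() in el.text.lower()]

def contexts(root, word):
    """Approach 2: plain query; group the hits by the element they occur in."""
    found = {}
    for el in root.iter():
        if el.text and word.lower() in el.text.lower():
            found.setdefault(el.tag, []).append(el.text)
    return found

print(search_in_element(catalog, "Einstein", "Author"))  # ['Albert Einstein']
print(sorted(contexts(catalog, "Einstein")))            # ['Author', 'School', 'Street']
```

The second function returns the contexts the search engine could present, letting the user pick Author rather than Street or School.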
Using links

good site: many links into the site, particularly from other good sites
the text surrounding a link (probably) describes what the target of the link is about
this link information + the contents of the page itself are taken into account when ranking
e.g. Google (www.google.com)
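One published way to turn "many links from good sites" into a number is a PageRank-style iteration, in which each page passes its score on to the pages it links to. The sketch below is a simplified version (damping factor 0.85, fixed number of iterations) combined with an index of link texts; it is an illustration, not a description of how any particular engine ranks pages today. The link graph and anchor texts are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to. Returns page -> score."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # a page without out-links spreads its score over all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

def anchor_index(anchors):
    """anchors: list of (source, target, link text); index targets by link words."""
    index = {}
    for _source, target, text in anchors:
        for word in text.lower().split():
            index.setdefault(word, set()).add(target)
    return index

def link_search(word, index, scores):
    """Pages described by `word` in incoming link texts, best-linked first."""
    return sorted(index.get(word.lower(), set()), key=lambda p: -scores[p])

links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
anchors = [("a", "c", "cheap flights"), ("b", "c", "flights to Rome"), ("c", "a", "Rome hotels")]
print(max(scores, key=scores.get))                        # c
print(link_search("rome", anchor_index(anchors), scores))  # ['c', 'a']
```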
Natural language queries

e.g. Ask Jeeves
questions and answers are prepared by human editors
the user’s query is mapped to the prepared questions
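Mapping a free-form question to one of the prepared questions can be approximated by word overlap. The sketch uses Jaccard similarity on word sets, an assumption made for illustration only; the slide does not say how Ask Jeeves performed the mapping, and the prepared questions and answers below are placeholders.

```python
import re

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def best_prepared_question(query, prepared):
    """Return the prepared question with the highest Jaccard word overlap, or None."""
    q = words(query)
    best, best_score = None, 0.0
    for question in prepared:
        p = words(question)
        union = q | p
        score = len(q & p) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = question, score
    return best

prepared = {  # hypothetical editor-written question/answer pairs
    "How do I return a product?": "(answer written by an editor)",
    "What are the shipping costs?": "(answer written by an editor)",
}
print(best_prepared_question("how can I return this product", prepared))
# How do I return a product?
```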
Personalization

goal: the right people receive the right information at the right time
but: people do not like to state complex queries or to initialize a service (e.g. by answering a questionnaire)
user profiles have to be generated and stored, preferably automatically
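A profile can be derived from what the user does instead of from a questionnaire. The sketch below assumes each page is tagged with topics (the page_topics mapping and the page paths are hypothetical) and turns a user's page views into a weighted interest profile.

```python
from collections import Counter

def interest_profile(viewed_pages, page_topics, top=3):
    """Derive a user's top interests (topic -> share of views) from page views."""
    counts = Counter(topic for page in viewed_pages for topic in page_topics.get(page, []))
    total = sum(counts.values())
    if not total:
        return {}          # nothing known yet, e.g. a brand-new user
    return {topic: n / total for topic, n in counts.most_common(top)}

page_topics = {
    "/jazz-cds": ["music", "jazz"],
    "/rock-cds": ["music", "rock"],
    "/cookbooks": ["books", "cooking"],
}
profile = interest_profile(["/jazz-cds", "/jazz-cds", "/rock-cds"], page_topics)
print(list(profile))  # ['music', 'jazz', 'rock']
```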
User profiles

may contain data like interests, geographical area, age
could be collected once and shared with many services
trust of the user: the profile should only be used to offer better service, and only if the user allows the service to use it
Recommendations

users who bought this book also bought these books / liked these CDs etc.
rating movies, TV programs, wines…
recommending paths on a site
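"Users who bought this also bought" can be computed by counting how often two items appear in the same customer's purchases. A minimal co-occurrence sketch; each set in the made-up data stands for one customer's purchases.

```python
from collections import Counter
from itertools import permutations

def also_bought(purchases):
    """purchases: list of sets, one per customer.
    Returns item -> Counter of other items bought by the same customers."""
    co = {}
    for basket in purchases:
        for a, b in permutations(basket, 2):   # every ordered pair of distinct items
            co.setdefault(a, Counter())[b] += 1
    return co

purchases = [
    {"book A", "book B"},
    {"book A", "book B", "book C"},
    {"book A", "book C"},
    {"book B", "book D"},
]
co = also_bought(purchases)
print(co["book A"].most_common(2))
```

The most frequent partners of an item are the candidates for its "also bought" list; ties between equally frequent items are broken arbitrarily here.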
Recommendations

based on the user’s former behavior and profile data
based on social (collaborative) filtering: what similar users liked
User’s former behavior

if used as the only source: the user never sees anything new
in particular, a new user hardly gets any recommendations
Collaborative filtering

draws on the experiences of a population or community of users
the profile information of the target user is compared to the profiles of nearest-neighbor users
look for correlation between users in terms of their ratings: recommend items that are included in the neighbors’ profiles but not in the target user’s profile
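The nearest-neighbor idea can be sketched with Pearson correlation over co-rated items, one common similarity choice in collaborative filtering (the slide does not prescribe a measure). The ratings, the neighborhood size k=2 and the threshold min_rating=4 are assumptions for the example.

```python
import math

def pearson(u, v):
    """Pearson correlation of two users' ratings over their co-rated items."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = (math.sqrt(sum((u[i] - mu) ** 2 for i in common))
           * math.sqrt(sum((v[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

def recommend(target, ratings, k=2, min_rating=4):
    """Items rated highly by the k most correlated users but not rated by the target."""
    others = [(pearson(ratings[target], ratings[o]), o) for o in ratings if o != target]
    neighbors = [o for sim, o in sorted(others, reverse=True)[:k] if sim > 0]
    scores = {}
    for o in neighbors:
        for item, r in ratings[o].items():
            if item not in ratings[target] and r >= min_rating:
                scores[item] = scores.get(item, 0) + r
    return sorted(scores, key=lambda i: (-scores[i], i))

ratings = {
    "ann":  {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob":  {"Alien": 5, "Brazil": 5, "Casablanca": 2, "Dune": 5},
    "cid":  {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Emma": 5},
    "dora": {"Alien": 4, "Brazil": 3, "Casablanca": 1, "Dune": 4, "Emma": 2},
}
print(recommend("ann", ratings))  # ['Dune']
```

"Emma" is liked only by cid, whose tastes are opposite to ann's, so it is not recommended; an item that nobody has rated can never be recommended at all.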
Collaborative filtering

Problems:
cannot recommend new items (some users have to rate an item before it can be recommended)
an unusual user may not get (good) recommendations: there are no neighbors that are close enough
Matching engines

apply one set of complex characteristics to another
e.g. recruiting sites: match a job seeker and a job
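A matching engine compares the characteristics of one party with the requirements of the other. The sketch scores a job seeker against jobs on three hypothetical criteria (skills, location, experience) weighted equally; real recruiting sites use their own, richer criteria, and all names and data here are made up.

```python
def match_score(seeker, job):
    """Score in [0, 1]: average of skill, location and experience fit."""
    skill_fit = len(set(job["skills"]) & set(seeker["skills"])) / len(job["skills"])
    location_fit = 1.0 if seeker["location"] == job["location"] or seeker["will_relocate"] else 0.0
    experience_fit = min(seeker["years"] / job["min_years"], 1.0) if job["min_years"] else 1.0
    return (skill_fit + location_fit + experience_fit) / 3

def best_jobs(seeker, jobs, top=2):
    return sorted(jobs, key=lambda j: match_score(seeker, j), reverse=True)[:top]

seeker = {"skills": ["java", "sql"], "location": "Helsinki", "will_relocate": False, "years": 3}
jobs = [
    {"title": "Backend developer", "skills": ["java", "sql"], "location": "Helsinki", "min_years": 2},
    {"title": "Data engineer", "skills": ["python", "sql"], "location": "Helsinki", "min_years": 5},
    {"title": "Java developer", "skills": ["java"], "location": "Tampere", "min_years": 1},
]
print([j["title"] for j in best_jobs(seeker, jobs)])  # ['Backend developer', 'Data engineer']
```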
Data mining for e-commerce

users’ behavior on the web site provides a lot of information:
Which pages do the users view?
Which paths do the users navigate?
How long do the users spend on the site?
How often does viewing a product lead to purchasing it?
Data mining process

Gathering the data
Cleaning/preprocessing the data
Transforming the data
Analysis / finding general models
Interpreting the results
Using the knowledge
Data collection

clickstream logging: web server logs or packet sniffers
business event logging
Clickstream logging

web log: page requested, time of request, client IP address, etc.
many requests are for images -> they have to be filtered out
users and user sessions are difficult to identify
requests for a page: the same page, but different dynamic content
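Filtering image requests and reconstructing sessions from a web server log might be sketched as below. Assumptions: the log is in the Common Log Format, the client IP address is used as a rough stand-in for the user, and a pause of more than 30 minutes starts a new session; the 30-minute timeout is a common heuristic, not a standard. The log lines are made up.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "method path protocol" status bytes
LOG_LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)   # heuristic, adjustable

def page_requests(lines):
    """Yield (client, time, path) for successful GET requests of pages, not images."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue                      # skip malformed lines
        client, stamp, method, path, status = m.groups()
        if method != "GET" or not status.startswith("2"):
            continue
        if path.split("?")[0].lower().endswith(IMAGE_SUFFIXES):
            continue                      # image request: filtered out
        yield client, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"), path

def sessions(requests):
    """Group requests per client; a long pause starts a new session."""
    result = {}
    for client, time, path in sorted(requests, key=lambda r: (r[0], r[1])):
        client_sessions = result.setdefault(client, [])
        if client_sessions and time - client_sessions[-1][-1][0] <= SESSION_TIMEOUT:
            client_sessions[-1].append((time, path))
        else:
            client_sessions.append([(time, path)])
    return result

log = [
    '10.0.0.1 - - [17/Oct/2000:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [17/Oct/2000:10:00:01 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [17/Oct/2000:10:05:00 +0000] "GET /products.html?id=7 HTTP/1.0" 200 2048',
    '10.0.0.1 - - [17/Oct/2000:12:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [17/Oct/2000:10:02:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
]
by_client = sessions(page_requests(log))
print({client: len(s) for client, s in by_client.items()})
```

Note that /products.html?id=7 is kept as one path: the same page with different dynamic content can only be told apart by its parameters.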
Clickstream logging

logging is more efficient at the application server layer
instead of just pages, knowledge of products
user and session tracking possible
also tracks information absent from web server logs: e.g. pages that were aborted while being downloaded
Business event logging

looking at subsets of requests as one logical event or episode:
add/remove item to/from shopping cart
initiate/finish checkout
search (log keywords and number of results)
register
From order data to customers

collected data is order-oriented
data for each customer is spread over many records
information on customers is the real target
information for each customer has to be aggregated
From order data to customers

What percentage of each customer’s orders used a VISA credit card?
How much money does each customer spend on books?
What is the frequency of each customer’s purchases?
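Turning order-oriented records into one record per customer is an aggregation step. The sketch below answers the three questions above for a few hypothetical order lines; the field names and the definition of purchase frequency (average days between consecutive orders) are assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order lines: several rows per order, several orders per customer
order_lines = [
    {"customer": "c1", "order": 1, "date": date(2000, 1, 10), "payment": "VISA", "category": "books", "amount": 30.0},
    {"customer": "c1", "order": 1, "date": date(2000, 1, 10), "payment": "VISA", "category": "music", "amount": 15.0},
    {"customer": "c1", "order": 2, "date": date(2000, 3, 10), "payment": "cash", "category": "books", "amount": 20.0},
    {"customer": "c2", "order": 3, "date": date(2000, 2, 1), "payment": "VISA", "category": "music", "amount": 12.0},
]

def customer_summaries(rows):
    """Aggregate order-line rows into one summary record per customer."""
    acc = defaultdict(lambda: {"orders": {}, "books": 0.0})
    for row in rows:
        c = acc[row["customer"]]
        c["orders"][row["order"]] = (row["date"], row["payment"])  # one entry per order
        if row["category"] == "books":
            c["books"] += row["amount"]
    summaries = {}
    for customer, c in acc.items():
        orders = list(c["orders"].values())
        dates = sorted(d for d, _ in orders)
        summaries[customer] = {
            "visa_share": sum(p == "VISA" for _, p in orders) / len(orders),
            "book_spending": c["books"],
            # purchase frequency as average days between consecutive orders
            "days_between_orders": (dates[-1] - dates[0]).days / (len(dates) - 1)
            if len(dates) > 1 else None,
        }
    return summaries

print(customer_summaries(order_lines)["c1"])
# {'visa_share': 0.5, 'book_spending': 50.0, 'days_between_orders': 60.0}
```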
Model generation

Answer questions like:
What characterizes heavy spenders?
What characterizes customers that prefer promotion X over Y?
What characterizes customers that buy quickly?
What characterizes visitors that do not buy?
Data mining tools

e.g. classification rules:
IF Income > $80,000 AND
   Age <= 30 AND
   Average Session Duration is between 10 AND 20 minutes
THEN Heavy spender
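A rule like the one above is easy to apply and to check against labelled data. The sketch writes the rule as a predicate (reading "between 10 and 20 minutes" as inclusive, which is an assumption) and computes its support and confidence on a few made-up customers.

```python
def heavy_spender_rule(c):
    """IF income > 80,000 AND age <= 30 AND 10 <= session minutes <= 20."""
    return c["income"] > 80_000 and c["age"] <= 30 and 10 <= c["session_minutes"] <= 20

def rule_quality(customers, rule, label="heavy_spender"):
    """Support: share of customers the rule covers.
    Confidence: share of covered customers for whom the conclusion holds."""
    covered = [c for c in customers if rule(c)]
    correct = [c for c in covered if c[label]]
    support = len(covered) / len(customers)
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence

customers = [  # made-up labelled examples
    {"income": 95_000, "age": 28, "session_minutes": 15, "heavy_spender": True},
    {"income": 85_000, "age": 25, "session_minutes": 12, "heavy_spender": False},
    {"income": 60_000, "age": 29, "session_minutes": 14, "heavy_spender": False},
    {"income": 120_000, "age": 45, "session_minutes": 18, "heavy_spender": True},
]
print(rule_quality(customers, heavy_spender_rule))  # (0.5, 0.5)
```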
Understanding the results

the result of a data mining process may be difficult for a business user to understand: e.g. thousands of rules
visualization is important
tailored for a specific domain
Using the results

the site structure can be updated
procedures like registering or checking out can be simplified
metadata can be added to make search more efficient
personalization rules, recommender systems
