Transcript Chapter 13

Chapter 13
How do Web Applications Work?
Typical Web Applications


Web Browser
E-mail
Web Browsing


Web Searching
Pop Up Windows
How does a search engine work?



It doesn’t search the Web.
A search engine contains a database with
information on lots of Web pages.
When you do a search, it looks through its
database to find pages which might be
useful and returns a list of them.
Details for Search Engines




You submit a query.
The search engine looks through its
database.
The search engine orders the likely pages
by relevance.
The search engine returns the list of
pages.
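The four steps above can be sketched in a few lines of Python. This is only an illustration: the DATABASE contents, the example.com URLs, and the word-counting score are all made up for the example and are not how any real search engine stores or ranks pages.

```python
# Hypothetical miniature "database": URL -> text the engine has stored about the page.
DATABASE = {
    "http://example.com/cat-food": "cat food reviews and cat food prices",
    "http://example.com/cats": "pictures of my cat",
    "http://example.com/recipes": "food recipes for the whole family",
    "http://example.com/cars": "used car listings",
}


def search(query):
    """Return URLs from DATABASE ordered by a toy relevance score."""
    terms = query.lower().split()            # step 1: the submitted query
    scored = []
    for url, text in DATABASE.items():       # step 2: look through the database
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, url))
    # step 3: order by relevance (highest score first, URL breaks ties)
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]        # step 4: return the list of pages


results = search("cat food")
print(results)
```

Note that the Web itself is never contacted here; only the stored database is consulted.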
Web Page Information




URL
Title
Keywords
Description
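As a rough sketch of where these four items come from, the Python below pulls the title, keywords, and description out of a page's HTML using the standard html.parser module. The sample page, the URL, and the names PageInfoParser and page_info are assumptions made for this example.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the <title> text and the keywords/description metatags."""

    def __init__(self):
        super().__init__()
        self.info = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.info[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.info["title"] += data


def page_info(url, html):
    """Build one record (URL, title, keywords, description) for a page."""
    parser = PageInfoParser()
    parser.feed(html)
    return {"url": url, **parser.info}


# Made-up sample page for illustration only.
SAMPLE_HTML = """<html><head>
<title>Cat Food Reviews</title>
<meta name="keywords" content="cat, food, reviews">
<meta name="description" content="Reviews of popular cat foods.">
</head><body><p>Hello</p></body></html>"""

record = page_info("http://example.com/cat-food.html", SAMPLE_HTML)
print(record)
```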
Search Engine Database



Search Engines typically use programs
called spiders which crawl the Web.
These spiders examine the Information on
Web pages that they find and save this
information to the database for the Search
Engine.
The spiders work 24/7/365 and they
revisit pages to see if they have changed.
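A spider can be pictured as a loop that follows links and records what it finds. The sketch below crawls a tiny pretend Web held in a dictionary (FAKE_WEB, an assumption for this example) instead of the real Internet, and it leaves out the revisiting step.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (page text, links found on that page).
FAKE_WEB = {
    "a.example": ("home page", ["b.example", "c.example"]),
    "b.example": ("cat food", ["a.example", "d.example"]),
    "c.example": ("dog food", ["d.example"]),
    "d.example": ("contact us", []),
    "e.example": ("unlinked page", []),
}


def crawl(start_url):
    """Follow links from start_url and save each page's text to a database."""
    database = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in database or url not in FAKE_WEB:
            continue                      # already saved, or page does not exist
        text, links = FAKE_WEB[url]
        database[url] = text              # save the page information
        queue.extend(links)               # visit the linked pages later
    return database


database = crawl("a.example")
print(sorted(database))
```

The page e.example never reaches the database because no other page links to it, which is one reason a page can be missing from a search engine.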
Database (continued)


So the database at a typical search engine
contains information on millions of pages
that it can search when you do a query.
The search engine companies have
algorithms to determine how relevant a
page is to your query.
Relevance


Different search engines use different ways of
determining relevance.
For example, suppose you did a search on “cat
food”.


The search engine would look for pages whose titles,
descriptions, or keywords contained “cat food”, “cat”,
or “food” and arrange them in some reasonable order.
Probably it would list the pages with both cat and food
ahead of the pages with only cat or food.
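One simple way to get that ordering, shown here only as a sketch, is to count how many of the query words appear in a page's title, keywords, or description. The PAGES list and the scoring rule are invented for the example; real search engines use their own, more elaborate methods.

```python
def relevance(page, query):
    """Count how many query words appear in the page's stored information."""
    terms = query.lower().split()
    text = " ".join([page["title"], page["keywords"], page["description"]])
    words = set(text.lower().replace(",", " ").split())
    return sum(1 for term in terms if term in words)


# Made-up page records for illustration.
PAGES = [
    {"url": "p1", "title": "Food for dogs", "keywords": "dog, food",
     "description": "Dog food shop"},
    {"url": "p2", "title": "Cat food", "keywords": "cat, food",
     "description": "Cat food shop"},
    {"url": "p3", "title": "My cat", "keywords": "cat, photos",
     "description": "Photos of a cat"},
    {"url": "p4", "title": "Cars", "keywords": "car",
     "description": "Used cars"},
]


def rank(pages, query):
    """Return URLs of matching pages, pages matching more words first."""
    scored = [(relevance(page, query), page["url"]) for page in pages]
    scored = [(score, url) for score, url in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])   # stable sort keeps original order on ties
    return [url for _, url in scored]


ranking = rank(PAGES, "cat food")
print(ranking)
```

The page about cars is left out because “car” is not the same word as “cat”.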
Relevance Continued

Cat food example (cont)



Some search engines might determine the
importance of a particular site based on how many
OTHER sites have a link to it.
Some search engines might determine the importance
of a particular site based on how often other users
who typed the query “cat food” chose a particular
site.
Some search engines might determine the importance
of a particular site based on money paid to the search
engine by the web site.
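The first approach above, counting how many OTHER sites link to a site, can be sketched as follows. The LINKS table is made up, and a plain count like this is a simplification rather than any particular search engine's formula.

```python
from collections import Counter

# Hypothetical link table: site -> sites it links to.
LINKS = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],   # a repeated link counts once
}


def inbound_link_counts(links):
    """Count, for each site, how many other sites link to it."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:               # ignore links to itself
                counts[target] += 1
    return counts


counts = inbound_link_counts(LINKS)
most_linked = counts.most_common(1)[0][0]
print(most_linked, counts[most_linked])
```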
Why don’t Search Engines just
search the Web?



SPEED.
A typical search on Google, for example,
takes a few seconds.
If they searched the Web it would
probably take 5 or 10 seconds EACH for
the Web pages examined. Thus a search
for “cat food” would take several hours
rather than 2 seconds.
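The arithmetic behind “several hours” can be checked quickly. The number of pages examined (2000) is an assumption chosen for the example; the 5 and 10 seconds per page come from the estimate above.

```python
PAGES_EXAMINED = 2000   # assumed number of pages a live search would visit


def live_search_hours(pages, seconds_per_page):
    """Hours needed if every page had to be fetched from the Web one by one."""
    return pages * seconds_per_page / 3600


low = live_search_hours(PAGES_EXAMINED, 5)
high = live_search_hours(PAGES_EXAMINED, 10)
print(f"between {low:.1f} and {high:.1f} hours")
```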
How can Google search billions of
pages in its database in only two
seconds?


The pages are indexed.
So instead of having to look at each of the
pages, the search engine only has to look
through the index to find a page, much
like you’d use the index or the Table of
Contents to search a book.
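An index of this kind is often built as a table from each word to the pages that contain it. The sketch below uses a made-up PAGES dictionary; the names build_index and lookup are chosen for the example.

```python
from collections import defaultdict

# Made-up pages: identifier -> page text.
PAGES = {
    "p1": "cat food for kittens",
    "p2": "dog food",
    "p3": "cat pictures",
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the pages containing every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(PAGES)
both = lookup(index, "cat food")
print(both)
```

The lookup never reads the page text again. It only consults the index, just as a reader goes from a book's index straight to the right page.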
Why do porn sites show up a lot?




Pornography is big business so pornographers want their
sites to have lots of business.
(Lots of people search for porn at work or at school)
But you will often get porn sites even when you search
for something else.
Porn sites can manipulate information about web pages.



The official web site of the White House is www.whitehouse.gov.
www.whitehouse.com used to be a porn site, and may still be.
One porn site added the keywords “windows, windows 95,
windows 98 …” and several others to its keyword metatag
and to its page title.
How do Web sites increase their
visibility?






Use metatags to make their sites more visible to
search engines.
Put relevant words in the page title.
Put relevant words at the beginning of the text
in the page.
Repeat relevant words several times.
Use relevant words as the name of web pages,
e.g. “cat-food.html”.
Error 404 tricks
Pop-ups

Two kinds of pop-ups:




One that comes up when you visit a Web site.
One that comes up from another cause
(which we will not discuss in this chapter)
Web Pages consist of HTML tags which
describe how the information on the page
looks and the information itself.
None of this can cause a pop-up.
Pop-Ups Continued




Pop-ups are generated by scripts which
are part of Web Pages.
If you load my Web page and look at the
source you will see a <script> tag.
Script tags come in several types, the
most common of which is JavaScript.
JavaScript can be used to make Web
pages dynamic.
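To see what “look at the source” reveals, the sketch below scans a made-up page for script tags and checks whether any script calls window.open, the JavaScript function commonly used to open a new window. The sample page and the ScriptFinder class are assumptions for the example.

```python
from html.parser import HTMLParser


class ScriptFinder(HTMLParser):
    """Collects the text inside every <script> tag on a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


# Made-up page source for illustration only.
SAMPLE_PAGE = """<html><body>
<p>Welcome</p>
<script type="text/javascript">window.open("ad.html");</script>
</body></html>"""

finder = ScriptFinder()
finder.feed(SAMPLE_PAGE)
opens_window = any("window.open(" in script for script in finder.scripts)
print(len(finder.scripts), opens_window)
```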
Controlling Pop-Ups


Turn JavaScript off. Unfortunately this will
keep many Web sites from operating
properly.
Pop-up blockers built into Web browsers
can also be used but they also tend to
have problems.
E-Mail



MIME (Multipurpose Internet Mail Extensions) is
a standard that is used to attach files to
e-mail messages.
MIME determines how certain files are
interpreted.
In general, today it’s probably better not to take
advantage of MIME’s capabilities since these
techniques can be used to send viruses.
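To show what MIME does to a message, the sketch below builds an e-mail with one attachment using Python's standard email package and lists the content type of each part. The addresses, file name, and attachment bytes are invented for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"          # made-up addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "Photo attached"
msg.set_content("Here is the picture.")    # the message itself

# Attach some bytes and label them as a PNG image. The label (content type)
# is what tells the receiving program how to interpret the file.
msg.add_attachment(
    b"\x89PNG fake image bytes",
    maintype="image",
    subtype="png",
    filename="cat.png",
)

parts = [part.get_content_type() for part in msg.walk()]
attachment_names = [part.get_filename() for part in msg.iter_attachments()]
print(parts, attachment_names)
```

The label is supplied by the sender, so a file's content type says nothing about whether the file is safe to open.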
What does an E-mail Message
Contain?



The message itself.
Header Information
Attachments
Attachments

Could be anything including sounds,
pictures, other multimedia, programs,
viruses, etc.
Header






Original To: / Deliver to:
From: / Reply to:
Subject:
Return path:
Message ID:
Other stuff
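The header fields listed above can be read out of a raw message with Python's standard email package. The message text below is made up for the example.

```python
from email import message_from_string

# Made-up raw message: header lines, a blank line, then the message itself.
RAW = """\
Return-Path: <alice@example.com>
Delivered-To: bob@example.com
From: Alice <alice@example.com>
Reply-To: alice@example.com
To: bob@example.com
Subject: Lunch?
Message-ID: <12345@example.com>

Are you free at noon?
"""

msg = message_from_string(RAW)
headers = dict(msg.items())            # header name -> value
body = msg.get_payload().strip()       # the message itself

for name, value in headers.items():
    print(f"{name}: {value}")
print(body)
```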
Spamming

Where do the Spammers get addresses:





Web sites
Newsgroups
From you
Purchase lists
Random addresses
How do you control Spam?




Don’t give out your e-mail address.
Keep several addresses including several
that you don’t use.
Firewalls
Spam filters
Spam

Legal system is largely ineffective
because:



Spam may originate from outside the country
The sender of spam can be forged
Laws must be technology based.
Terminology




Event
Event-driven programming
Indexing
Infinite Loop



MIME (Multipurpose Internet Mail Extensions)
Spam
Web Crawler