The Web and E-mail
• Are the Web and the Internet the same thing?
• No
• The Internet is the networking infrastructure
– Hardware and software that connect host
machines and enable communication
• Hardware includes:
– Links such as cables and wireless channels
– Nodes such as user computers, routers, and gateways
• Software includes TCP and IP protocols
Chapter7 The Web and E-mail
1
What is the Web?
• The web (World Wide Web) is a collection
of data (audio, video, text, etc.) typically
connected through hyperlinks
• The web is hosted by servers on the Internet
– Web sites are accessed using a protocol called
HTTP (HyperText Transfer Protocol)
• The Internet supports other protocols
– FTP (File Transfer Protocol); try ftp://ftp.giga.net.tw, for example
Web Basics
• What is a Web site?
– Collection of related and formatted information
• Formatted by HTML (HyperText Markup Language) and
related markup languages (e.g., XML)
– Hypertext is data linked together by logical relationships
– Accessed by web browsers (client software)
– Hosted on web servers (computers) that
respond to client requests
– Sites are composed of web pages
URL
• Uniform Resource Locator
– Each web page has a unique URL
– http://www.fdu.edu – FDU’s domain address
– https://webmail.fdu.edu
• Secure HTTP (SSL/TLS used for encryption)
– http://alpha.fdu.edu/~levine/survey/index.html
• http: web protocol
• alpha.fdu.edu: FDU’s web server
• ~levine and survey: subdirectories (folders)
• index.html: default web page
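Python's standard `urllib.parse.urlparse` can split the example URL into the same parts listed above. The default-port table is an assumption added for illustration: HTTP normally uses port 80 and HTTPS uses 443 when the URL names no port.

```python
from urllib.parse import urlparse

url = "http://alpha.fdu.edu/~levine/survey/index.html"
parts = urlparse(url)

print(parts.scheme)    # http
print(parts.hostname)  # alpha.fdu.edu

# Split the path into its folders and the file name.
*folders, filename = parts.path.strip("/").split("/")
print(folders)   # ['~levine', 'survey']
print(filename)  # index.html

# Assumed defaults: a URL without an explicit port uses the scheme's standard port.
DEFAULT_PORTS = {"http": 80, "https": 443}
port = parts.port or DEFAULT_PORTS[parts.scheme]
print(port)  # 80
```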
HTML, JavaScript, XML
• HTML – formatting & linking instructions
– Use JavaScript to allow client interaction
– HTML5 is the current standard (now maintained as the HTML Living Standard)
• XML – data relationships, dynamic pages
• A page’s HTML source can be seen with the browser’s “View Source” command
• Compare the source code to the rendered web page
Steps in an http data transfer
• Client enters a URL in the browser’s address field
– The client’s browser, TCP/IP software, and Ethernet
or wireless LAN card set up a connection with the
server at the requested URL
• Client software opens a socket (its port is assigned by
the OS) and connects to a “listening” port at the server (80 for HTTP, 443 for HTTPS)
– Client clicks on a link, which sends a GET request for
data to be downloaded to its machine
– Server sends the requested page
– Client and server disconnect (with HTTP/1.1 the connection may instead be kept open for further requests)
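The steps above can be sketched with plain Python sockets. This is a sketch under stated assumptions, not how a real browser is built. So that it runs offline, it starts a tiny local server (the hypothetical `HelloHandler`) on a port the OS chooses, instead of using port 80. The client then connects, sends a GET request, reads the page, and disconnects.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the server's per-request log lines


def http_get(host, port, path="/"):
    """Fetch one page over a plain TCP socket and return the raw response."""
    # Connect: the OS assigns the client's local port automatically.
    with socket.create_connection((host, port)) as sock:
        # Request: a GET for the page, asking the server to close afterwards.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Response: read until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Disconnect: leaving the "with" block closes the client socket.
    return b"".join(chunks)


# Start a local server; port 0 lets the OS choose a free listening port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

response = http_get("127.0.0.1", port, "/index.html")
status_line = response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)  # HTTP/1.0 200 OK

server.shutdown()
server.server_close()
```

The status line reports HTTP/1.0 because Python's `BaseHTTPRequestHandler` answers as HTTP/1.0 by default.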
Web surfing errors
• HTTP status code 404 (Not Found) is returned if the
requested page does not exist on the server
• A broken link (or missing-image icon) is seen if a page or graphic does not
exist or the client does not have the correct access
rights
• Sometimes the server is busy or has crashed,
and you will get a message that the server may
be busy or temporarily down
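Each of these errors reaches the browser as a numeric HTTP status code. Python's `http.HTTPStatus` enum holds the standard reason phrases. The one-line hints in `describe` (a hypothetical helper) are informal summaries, not the official definitions. 403 (Forbidden) and 503 (Service Unavailable) are the standard codes most often behind the access-rights and busy-server cases.

```python
from http import HTTPStatus


def describe(code):
    """Return a short explanation for a few common error status codes."""
    status = HTTPStatus(code)
    hints = {
        HTTPStatus.NOT_FOUND: "the page does not exist on the server",
        HTTPStatus.FORBIDDEN: "the client lacks access rights",
        HTTPStatus.SERVICE_UNAVAILABLE: "the server is busy or temporarily down",
    }
    hint = hints.get(status, "see the HTTP specification")
    return f"{status.value} {status.phrase}: {hint}"


for code in (404, 403, 503):
    print(describe(code))
# 404 Not Found: the page does not exist on the server
# 403 Forbidden: the client lacks access rights
# 503 Service Unavailable: the server is busy or temporarily down
```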
Browsers
• Internet Explorer, Mozilla Firefox, Google
Chrome, Apple Safari, Opera
– Netscape, the first commercial browser with a GUI,
was used by FDU from 1994 to 2009
• Based on Mosaic
– By 1997 Internet Explorer, bundled with
Windows, replaced Netscape as top browser
– Firefox is open source and Chrome is built on the open-source
Chromium project; both include advanced security features
Browsers and file formats
• Browsers need to be updated (patched) to handle new
file formats as they are introduced
• Sometimes additional software is needed
– Adobe Reader (free software) handles PDF
files
– Adobe Flash Player handled Flash content (discontinued at the end of 2020)
– Internet Explorer uses ActiveX controls to run additional
(helper) applications
Web cache
• Pages that you download are stored in your
local browser cache for days or weeks
– Cache is used to make retrieval much faster in case you
reference the site again
– If you make changes to a web page, you are likely to
view the old (cached) copy. Try the Refresh (reload) button
– If you browse on a public computer, others can find
these pages
• Internet Explorer: Tools/Delete browsing history
• Internet Explorer: Tools/Internet Options/Delete temporary files
– Caches are searched by forensic personnel
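Why a cached copy can be stale is easier to see with a toy model. `BrowserCache` below is a hypothetical class, not any real browser's design. Real browsers decide freshness from HTTP headers such as `Cache-Control: max-age`; a single fixed `max_age_seconds` stands in for that here.

```python
import time


class BrowserCache:
    """Toy model of a browser cache (hypothetical, for illustration only)."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.entries = {}  # url -> (time stored, page content)

    def get(self, url, fetch, now=None):
        """Return (content, came_from_cache); call fetch(url) on a miss or expiry."""
        now = time.time() if now is None else now
        if url in self.entries:
            stored_at, content = self.entries[url]
            if now - stored_at < self.max_age:
                return content, True  # fast: no download, but possibly stale
        content = fetch(url)  # slow path: download the page again
        self.entries[url] = (now, content)
        return content, False

    def clear(self):
        """Like 'delete browsing history': forget every stored page."""
        self.entries.clear()


url = "http://www.example.com/"  # hypothetical page
site = {url: "v1"}               # what the server currently holds
cache = BrowserCache(max_age_seconds=7 * 24 * 3600)  # keep pages for a week

first = cache.get(url, lambda u: site[u], now=0)     # ('v1', False)
site[url] = "v2"                                     # the author edits the page
second = cache.get(url, lambda u: site[u], now=60)   # ('v1', True): old copy
cache.clear()
third = cache.get(url, lambda u: site[u], now=120)   # ('v2', False)
```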
What are cookies?
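• Web servers send small sets of data that the browser stores in text
files on the client’s hard disk for later use
– HTTP is “stateless” (each request stands alone), so cookies let a server recognize the same client across requests
• e.g., placing items in a shopping cart one at a time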
– Servers use cookies to monitor client behavior
• Pages visited
• Items purchased
– Cookies help collect information to facilitate targeting clients with
ads tailored to them
– Keep client information for later visits
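Cookies add memory on top of stateless HTTP, as this minimal sketch with Python's standard `http.cookies.SimpleCookie` shows. The `visits` cookie and the `handle_request` function are hypothetical. The server keeps nothing between calls; the count survives only because the simulated browser sends the cookie back with each request.

```python
from http.cookies import SimpleCookie


def handle_request(cookie_header):
    """Hypothetical server logic: count visits using a 'visits' cookie.

    cookie_header is the raw Cookie: header the browser sent ('' on the
    first visit). Returns the Set-Cookie value to send back.
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header)
    visits = int(incoming["visits"].value) + 1 if "visits" in incoming else 1

    outgoing = SimpleCookie()
    outgoing["visits"] = str(visits)
    outgoing["visits"]["max-age"] = 3600  # browser keeps it for one hour
    return outgoing["visits"].OutputString()


# Simulated browser: store the cookie and return it with the next request.
stored = ""
for _ in range(3):
    set_cookie = handle_request(stored)
    stored = set_cookie.split(";", 1)[0]  # keep only "visits=N"
print(stored)  # visits=3
```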
How to write HTML code?
• Use HTML conversion utilities
– Results may need to be changed
– Might be difficult to alter
• On-line page authoring tools are frequently
offered by ISPs that host your web pages
– Very limited templates
• Web authoring software
– Adobe Dreamweaver, SeaMonkey (open source)
Web pages with text editors
• Learn to write HTML code
– Use a text editor for the framework
– Images, video, etc. are inserted separately
through links and tags (e.g., <video>)
Basic HTML code
<!DOCTYPE html>
<!-- The DOCTYPE line marks the file as HTML5; the page goes inside <html> -->
<html>
<head>
<title> My web page </title>
</head>
<body>
<i> This is my first web page </i>
<b> <br> This is my second line </b>
</body>
</html>
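Most HTML tags come in open/close pairs that must nest properly. This sketch uses Python's standard `html.parser.HTMLParser` to report closing tags that don't match and tags never closed. It is a simplified checker, not a full validator. It assumes the short `VOID_TAGS` list below covers the tags that take no closing tag (such as `<br>`).

```python
from html.parser import HTMLParser

# Assumed (partial) list of elements that never take a closing tag.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TagBalanceChecker(HTMLParser):
    """Report closing tags that don't match and tags that are never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags waiting for their closing tag
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing form such as <br/>: nothing to match

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_tags(html):
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in checker.open_tags]


page = """<!DOCTYPE html>
<html>
<head>
<title> My web page </title>
</head>
<body>
<i> This is my first web page </i>
<b> <br> This is my second line </b>
</body>
</html>"""

print(check_tags(page))                  # []
print(check_tags("<b><i>text</b></i>"))  # ['unexpected </b>', 'unclosed <b>']
```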
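Writing an HTML file
• Open Notepad
• Write your code
• Choose File/Save As
• Set “Save as type” to All Files
• Use the .htm (or .html) extension
• Save the file in a folder you will recognize
• Keep both the folder and Notepad open
• Open (double-click) the file to view it in a browser; edit in Notepad, save, and reload to see changes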
Links to pages and images with HTML
<img src="picture.gif"> where picture.gif is
in the same directory as your file
In general, you must give the path to the image.
<a href="environments/index.html">
Information on the Unix and Ada
environments at FDU</a>
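A browser resolves a relative path like `picture.gif` against the URL of the page that contains the tag. Python's standard `urllib.parse.urljoin` follows the same rules. This sketch assumes the page lives at the example URL from the URL slide; `logo.gif` is a hypothetical file name.

```python
from urllib.parse import urljoin

# Assumption: the page containing the tags lives at this URL.
page_url = "http://alpha.fdu.edu/~levine/survey/index.html"

print(urljoin(page_url, "picture.gif"))
# http://alpha.fdu.edu/~levine/survey/picture.gif
print(urljoin(page_url, "environments/index.html"))
# http://alpha.fdu.edu/~levine/survey/environments/index.html
print(urljoin(page_url, "../images/logo.gif"))  # ".." goes up one folder
# http://alpha.fdu.edu/~levine/images/logo.gif
print(urljoin(page_url, "/images/logo.gif"))    # leading "/" starts at the server root
# http://alpha.fdu.edu/images/logo.gif
```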
HTML scripts
• Scripts allow pages to react to user input
– “interactive”
• HTML forms allow users to enter data
– Scripts can check or limit the values that are entered
• Client side scripts are typically written in
JavaScript
• Server side scripts are typically written in
Perl, PHP, Java, C#
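Server-side form checking can be sketched in Python, although the slide lists Perl, PHP, Java and C#; the idea is the same in any of them. The `item` and `quantity` fields and the 1 to 10 limit are invented. The same check could also run in the browser in JavaScript. The server should still repeat it, because client-side checks can be bypassed.

```python
from urllib.parse import parse_qs


def handle_order_form(body):
    """Validate a submitted form body such as 'item=comic&quantity=3'.

    Hypothetical server-side check: field names and limits are invented
    for this sketch. Returns (ok, message).
    """
    fields = parse_qs(body)  # decodes %xx escapes and '+' as a space
    item = fields.get("item", [""])[0].strip()
    quantity_text = fields.get("quantity", [""])[0]
    if not item:
        return False, "item is required"
    if not quantity_text.isdecimal():
        return False, "quantity must be a whole number"
    quantity = int(quantity_text)
    if not 1 <= quantity <= 10:  # limit the values that can be entered
        return False, "quantity must be between 1 and 10"
    return True, f"ordered {quantity} x {item}"


print(handle_order_form("item=comic+book&quantity=2"))  # (True, 'ordered 2 x comic book')
print(handle_order_form("item=comic&quantity=25"))      # (False, 'quantity must be between 1 and 10')
```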
Search Engines
• Search engines include:
– Google, Bing, Yahoo, Ask, AltaVista
– They are automated tools
– They search huge databases
• Some search engines are human-powered
– www.mahalo.com; www.chacha.com
• Screened by human experts
• Many sites have humans answering queries
Process used by search engines
• Web crawlers (spiders) “methodically” visit
sites and download pages for analysis
– They start at well-known sites and follow the links on them
– Algorithms avoid loops and self-references
• Different search engines visit somewhat
different sets of sites
• Only about 20% of pages are “seen”
– Pages that are password protected, dynamically created, etc. are missed
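A crawler's core loop can be sketched as a breadth-first traversal that remembers every page it has already queued. That memory is what breaks loops and skips self-references. The miniature `WEB` link graph is invented. Its `secret` page shows how a page nothing links to is never seen.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the links found on it.
WEB = {
    "home": ["news", "sports", "home"],  # "home" links to itself
    "news": ["world", "home"],           # the link back to "home" forms a loop
    "sports": ["scores"],
    "world": ["news"],
    "scores": [],
    "secret": ["home"],                  # nothing links here, so it is never found
}


def crawl(seeds, get_links, max_pages=100):
    """Breadth-first crawl starting from well-known seed pages."""
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)  # "download" the page for analysis
        for link in get_links(page):
            if link == page or link in seen:
                continue      # skip self-references and pages already queued
            seen.add(link)
            queue.append(link)
    return visited


result = crawl(["home"], lambda page: WEB.get(page, []))
print(result)  # ['home', 'news', 'sports', 'world', 'scores']
```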
Search engine indexing
• The index database lists pages by
keyword
– Web designers try to ensure that keywords are
placed near the top of the document
• Example: a page about old comic books. Choose some keywords:
– Comics
– Comic book
– Superhero
– Batman, Superman, Wonder Woman, Captain America
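A keyword index of this kind is usually built as an inverted index, which maps each word to the pages that contain it. This sketch also records where each word first appears, so pages that place the query words near the top rank first. That ranking rule and the sample pages are assumptions for illustration, not any real engine's formula.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Map each word to {page: position of its first occurrence}."""
    index = defaultdict(dict)
    for name, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(name, position)
    return index


def search(index, query):
    """Pages containing every query word, earliest keyword placement first."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    # Smaller total position means the keywords sit closer to the top.
    return sorted(matches, key=lambda p: (sum(index[w][p] for w in words), p))


pages = {  # hypothetical pages
    "comics.html": "Old comic books: Batman, Superman and Wonder Woman superhero comics",
    "blog.html": "My trip to the beach. Later I read a Batman comic book",
    "recipes.html": "Soup recipes for winter",
}
index = build_index(pages)
print(search(index, "batman comic"))  # ['comics.html', 'blog.html']
```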
Ranking and placement
• Search engines may sell “sponsored links”
– Top rows; right column
• Link popularity – links from and to popular
sites
• Pages may be rated higher by the number of keywords found during
indexing
– A page author may “stuff” popular keywords (e.g.,
sex) into a page even though the page is not about sex (keyword stuffing)
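The slide does not say how link popularity is computed. One well-known method is PageRank: a page scores highly when popular pages link to it. The sketch below uses the commonly cited damping factor of 0.85 and an invented four-page link graph. Real search engines combine many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively share each page's score among the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "hub", which links to "a".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Keyword stuffing cannot raise this score, because the score depends only on who links to the page.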
Query processing
• Keyword queries typically produce many
thousands of results; how can you narrow them?
– Common words (and, the, if) are ignored
– Variations of your keywords are also searched
• Appending * specifies searching for variations (e.g., comic* also matches comics)
– An exact phrase should be entered in quotes
– A NEAR operator (supported by some search engines) specifies that keywords should
appear close together, such as library NEAR/15 congress (within 15 words of each other)
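The query features above can be sketched as small matching functions over a document's word list. The stop-word list is a tiny illustrative sample, and the helper names are hypothetical. Each real search engine defines its own syntax and semantics for `*`, quotes and `NEAR`.

```python
import re

STOP_WORDS = {"a", "and", "if", "of", "the"}  # tiny illustrative list


def words(text, keep_wildcard=False):
    pattern = r"[a-z*]+" if keep_wildcard else r"[a-z]+"
    return re.findall(pattern, text.lower())


def clean_query(query):
    """Drop common words, as search engines typically do."""
    return [w for w in words(query, keep_wildcard=True) if w not in STOP_WORDS]


def positions(doc_words, term):
    """Positions where term occurs; a trailing * matches any word with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        return [i for i, w in enumerate(doc_words) if w.startswith(prefix)]
    return [i for i, w in enumerate(doc_words) if w == term]


def has_phrase(doc_words, phrase):
    """True if the words of an exact (quoted) phrase appear consecutively."""
    n = len(phrase)
    return any(doc_words[i:i + n] == phrase for i in range(len(doc_words) - n + 1))


def near(doc_words, a, b, distance):
    """True if terms a and b occur within `distance` words of each other."""
    return any(abs(i - j) <= distance
               for i in positions(doc_words, a)
               for j in positions(doc_words, b))


doc = words("The Library of Congress holds millions of books and comics")
print(clean_query("the library and congress"))         # ['library', 'congress']
print(positions(doc, "comic*"))                        # [9]
print(has_phrase(doc, ["library", "of", "congress"]))  # True
print(near(doc, "library", "congress", 15))            # True
print(near(doc, "library", "congress", 1))             # False
```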
Privacy and search engines
• Search engines keep records of user
queries (possibly only for a limited time)
• The U.S. Department of Justice has requested
search engines’ query databases, specifically in connection with child
pornography
• In 2006, AOL released a database of about 20
million queries online
– User ID, search keywords, date and time
Search engines and user IDs
• Search engines assign user IDs to
computers during search
– The user ID is stored in a cookie on your computer
• You can delete all cookies after each
session
• You can block cookies from specific sites
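Blocking cookies from specific sites amounts to a per-domain rule list. Python's standard `http.cookiejar.DefaultCookiePolicy` accepts such a list through its `blocked_domains` parameter. In that list a leading dot matches subdomains, and a bare name matches only that exact domain. `tracker.example` is a hypothetical domain. This illustrates the rule, not a browser setting.

```python
from http.cookiejar import CookieJar, DefaultCookiePolicy

# Hypothetical tracker: block the domain itself and all of its subdomains.
policy = DefaultCookiePolicy(blocked_domains=["tracker.example", ".tracker.example"])
jar = CookieJar(policy=policy)

for host in ["tracker.example", "ads.tracker.example", "search.example"]:
    print(host, "blocked" if policy.is_blocked(host) else "allowed")
# tracker.example blocked
# ads.tracker.example blocked
# search.example allowed

# "Delete all cookies after each session": clear_session_cookies() drops
# cookies that have no expiry date; clear() removes every stored cookie.
jar.clear_session_cookies()
jar.clear()
print(len(jar))  # 0
```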
Copying web material
• Sites frequently copy text
from each other without attribution
– In homework or research, doing so is plagiarism
• Words and pictures can be copied for
educational use (copy/paste)
– You must cite the source
– Give the URL at the top of the page, plus the title, author,
and date if available
• Check the site’s terms-of-use link
E-Commerce
• E-commerce uses the Web and Internet for
transactions and information
– Businesses increase profit margins by cutting
costs (on-line methods are estimated to cost about
10% as much as methods that require human interaction)
• Many telephone services today are automated
Advertising on the Web
• Banner ads embedded in (top of) web pages
• Hover ads overlay web pages; the client must
specifically close them
• Pop-up ads appear in a separate window
when the client connects to a site
– Clicking on any of these connects the client to the
advertiser’s web site
• The hosting site is paid according to the click-through rate
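The click-through rate is the number of ad clicks divided by the number of times the ad was shown (impressions). This sketch computes it for invented numbers. It assumes a hypothetical pay-per-click deal of 40 cents per click; actual advertising contracts vary.

```python
def click_through_rate(clicks, impressions):
    """Fraction of ad displays (impressions) that were clicked."""
    return clicks / impressions if impressions else 0.0


def pay_per_click_revenue_cents(clicks, cents_per_click):
    """Hypothetical payment model: a fixed fee for every click-through."""
    return clicks * cents_per_click


impressions, clicks = 20_000, 300  # invented example numbers
ctr = click_through_rate(clicks, impressions)
revenue_cents = pay_per_click_revenue_cents(clicks, 40)
print(f"CTR = {ctr:.1%}")                # CTR = 1.5%
print(f"Revenue = ${revenue_cents / 100:.2f}")  # Revenue = $120.00
```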
Blocking and deleting cookies
• Internet Explorer
– Tools/ Internet Options/ Privacy
• Pop-up blocker: turn on (blocks most pop-ups)
• Advanced Privacy Settings/Override automatic cookie handling
» First-party cookies: accept, block, or prompt
» Third-party cookies: block
– Tools/Internet Options/General
• Delete browsing history on exit
– Tools/Internet Options/Advanced/Security
• Empty Temporary Internet Files folder when browser is closed
E-mail
• Anyone can use e-mail if they have access
to the Internet and an e-mail account
– Yahoo, Google, and the client’s ISP provide accounts
• E-mail messaging includes
– cc
– Forwarding, reply
– Group emails
– Attachments (MIME is an Internet standard that
encodes all types of data as ASCII text)
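Python's standard `email.message.EmailMessage` can build a message with a cc recipient and an attachment, which shows MIME at work. The addresses and file contents are hypothetical. The attachment is raw binary, yet the finished message is plain ASCII text, because MIME encodes the binary part in base64.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "student@example.edu"  # hypothetical addresses
msg["To"] = "professor@example.edu"
msg["Cc"] = "ta@example.edu"         # carbon copy recipient
msg["Subject"] = "Homework 3"
msg.set_content("My homework is attached.")

# Stand-in for a binary file (e.g., an image): every byte value 0-255.
data = bytes(range(256))
msg.add_attachment(data, maintype="application", subtype="octet-stream",
                   filename="homework3.bin")

wire_text = msg.as_string()  # the text that travels between mail servers
attachment = next(msg.iter_attachments())

print(msg.get_content_type())                   # multipart/mixed
print(attachment["Content-Transfer-Encoding"])  # base64
print(wire_text.isascii())                      # True
```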
Email formatting option
• Email software may allow Word or HTML
formatting
– Email can then contain graphics and other
non-ASCII content
– This exposes the receiver to malware
– Word and Excel attachments can contain embedded macros
Netiquette for e-mail
• Include meaningful subject line
– Don’t reuse an old, unrelated subject line when replying
• Use upper and lower case letters
– Reserve all upper case for SHOUTING
• Check spelling
• Don’t “reply to all” unless you need that
• If you discover your computer has a virus, notify
the people you recently emailed
E-mail and privacy
• The FBI uses commercial sniffers to monitor
emails of suspected criminals (and others?)
– USA Patriot Act 2001
– Software scans ALL emails passing through the
ISP, not only those of the suspected terrorist
• Employer has legal access to your email
– “If legitimate business need exists”
• Your email may be forwarded to others
E-mail and privacy (cont.)
• It is easy for you to accidentally send an
email to the wrong address. Would an ISP
do that? I doubt it.
• ISPs store emails even if they have been
deleted.
• Schools store student email messages
• “Think of your email as a postcard.”
Phishing and Pharming
• Phishing is a scam that uses fraudulent links
in emails or web sites that ask for
confidential information
– Web site typically looks legitimate
– Hover the cursor over a link to see the real URL
• Check that the URL is EXACTLY the one you expect
• Pharming poisons the DNS so that a URL is
translated to a fake IP address: the URL
displayed will be correct, but the site will not be
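The "check the real URL" advice can be partly automated. This sketch flags a link unless it uses HTTPS and its host is exactly the expected domain or one of its subdomains. `mybank.example` and the sample links are hypothetical. A check like this cannot catch pharming: in that attack the URL itself is correct and only the DNS answer is forged.

```python
from urllib.parse import urlparse


def link_is_suspicious(href, expected_domain):
    """Flag a link unless it uses HTTPS and its host is expected_domain or a subdomain."""
    parts = urlparse(href)
    host = (parts.hostname or "").lower()  # hostname drops any "user@" prefix
    expected = expected_domain.lower()
    on_expected_site = host == expected or host.endswith("." + expected)
    return parts.scheme != "https" or not on_expected_site


EXPECTED = "mybank.example"  # hypothetical bank domain
links = [
    "https://www.mybank.example/login",               # genuine
    "https://www.mybank.example.evil.example/login",  # real name used as a prefix
    "https://mybank.example@evil.example/login",      # text before @ is a user name, not the host
    "http://www.mybank.example/login",                # not encrypted
]
for href in links:
    print(link_is_suspicious(href, EXPECTED), href)
```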
Packet sniffer & Spam
• If no encryption is used, a sniffer can
“see” and read all traffic
– e.g., to steal credit card numbers
• Spam is unsolicited electronic mail; it may
include requests for email passwords, etc.
• Security protocols that encrypt your data
include SSL and its successor TLS; HTTPS is HTTP running over SSL/TLS
• Spam filters can block spam
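Encryption is what defeats the sniffer, and this sketch shows client-side HTTPS with Python's standard `ssl` module. `create_default_context()` verifies the server's certificate and host name. Raising `minimum_version` to TLS 1.2 is an assumed hardening choice, not a requirement stated in the slides. `https_get` needs network access, so it is defined here but not called.

```python
import socket
import ssl


def make_tls_context():
    """Client-side TLS settings: verify the server and refuse old protocol versions."""
    context = ssl.create_default_context()  # checks certificates and host names
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # no SSL 3.0, TLS 1.0 or 1.1
    return context


def https_get(host, path="/"):
    """Fetch a page over HTTPS; a sniffer on the network sees only ciphertext."""
    context = make_tls_context()
    with socket.create_connection((host, 443)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = tls_sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


context = make_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)  # True True
```

With network access, a call such as `https_get("www.fdu.edu")` would return the raw response. On the wire, the request and the page cross the network only in encrypted form.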