Origins of the Internet
Introduction to Computer Networks
2004, 劉震昌
Review of Lab#2 and Homework#1
“Lab” means “Laboratory”, not “Label”.
Algorithm steps must be executed in order.
You cannot skip any step based on your own decision.
Why?
Please write your homework subject correctly.
No late homework will be accepted.
Outline
Origins of the Internet
Origins of the WWW (World Wide Web)
HTML (Hypertext Markup Language) guide
Searching the Web
Search engines
Web directories
Origins of the Internet
Ref: Chapter 2 of Comer’s book
Origins of the Internet
In 1969, the US DoD’s ARPA (Advanced Research Projects Agency) built the ARPANET
Only 4 nodes
Decentralized system
Data transmission
Reference website
Origins of the Internet (cont.)
TCP/IP was developed in 1974 and became the ARPANET standard in 1983
TCP (Transmission Control Protocol)
IP (Internet Protocol)
The importance of network communication protocols
Growth from the ARPANET into the Internet
Internetworking
No organization owns or controls it
Growth of the Internet
[Figure: number of computers on the Internet by year, log scale]
1M = 1,000,000
Units of measure: http://www.spes.tpc.edu.tw/handouts/B_Basic/ref.htm
Almost exponential growth
Recently driven by the WWW and economic activity
IP Service
Where is your computer on the Internet?
Current Internet (IPv4)
An IP address is represented with 32 bits
Ex. 163.22.20.129
What is your computer’s IP address? Run ipconfig
[Figure: computers with IP addresses 163.22.20.129, 163.22.22.119, and 163.22.20.118]
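The 32 bits of an IPv4 address are normally written as four 8-bit fields in decimal, separated by dots. As a minimal sketch (the function names are illustrative, not from the course), the conversion between dotted-decimal notation and the underlying 32-bit number can be written as:

```python
import ipaddress

def dotted_to_int(addr: str) -> int:
    """Pack four 8-bit fields (octets) into one 32-bit integer."""
    octets = [int(part) for part in addr.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {addr}")
    value = 0
    for o in octets:
        value = (value << 8) | o  # shift earlier bits left, append the next octet
    return value

def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

n = dotted_to_int("163.22.20.129")
print(n)                 # 2736133249
print(int_to_dotted(n))  # 163.22.20.129
# Cross-check against Python's standard ipaddress module
print(int(ipaddress.IPv4Address("163.22.20.129")) == n)  # True
```

Each octet contributes 8 bits, so 4 × 8 = 32 bits in total, which is why each field ranges from 0 to 255.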
Address Resolution Protocol (ARP)
An IP address is an abstraction; the physical network hardware does not know how to locate a computer from its IP address, so each IP address must be mapped to a hardware (physical) address
Techniques
Table look-up
Closed-form computation
Message exchange
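The three address-resolution techniques can be sketched side by side. This is a toy illustration under stated assumptions: the hardware (MAC) addresses are made up, the closed-form case assumes a hypothetical network whose hardware addresses were configured to equal the host part of the IP address, and the "broadcast" is just a loop over simulated hosts. On Ethernet, ARP itself uses message exchange plus a cache.

```python
import ipaddress

# Technique 1: table look-up (hypothetical static table; MAC values are made up)
ARP_TABLE = {
    "163.22.20.129": "00:00:00:00:00:81",
    "163.22.20.118": "00:00:00:00:00:76",
}

def resolve_by_table(ip):
    """Return the hardware address stored for ip, or None if it is absent."""
    return ARP_TABLE.get(ip)

# Technique 2: closed-form computation. Assumes hardware addresses equal
# the host part of the IP address (the bits after the network prefix).
def resolve_by_computation(ip, prefix_len=24):
    net = ipaddress.IPv4Network(f"{ip}/{prefix_len}", strict=False)
    return int(ipaddress.IPv4Address(ip)) - int(net.network_address)

# Technique 3: message exchange. A "who has this IP?" request is broadcast
# to every host on the segment; only the owner of the address answers.
class Host:
    def __init__(self, ip, mac):
        self.ip, self.mac = ip, mac

    def answer(self, target_ip):
        return self.mac if self.ip == target_ip else None

def resolve_by_message(ip, segment, cache):
    if ip in cache:                  # reuse an earlier answer
        return cache[ip]
    for host in segment:             # simulated broadcast
        mac = host.answer(ip)
        if mac is not None:
            cache[ip] = mac          # remember the binding for next time
            return mac
    return None
```

For example, `resolve_by_message("163.22.22.119", [Host("163.22.22.119", "00:00:00:00:00:77")], {})` returns `"00:00:00:00:00:77"` and leaves that binding in the cache, so a second request needs no broadcast.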
Computers on the Net
Every Internet host has a unique IP address, but numeric addresses are hard to remember, so hosts also have host names
e.g., arbor.watson.ibm.com is 9.2.13.20 and
arbor.ee.ntu.edu.tw is 140.112.21.236
Try: nslookup
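Programs can perform the same name-to-address lookup that nslookup shows. A minimal sketch using Python's standard `socket` module follows; note that `gethostbyname` asks the operating system's resolver (which may consult a local hosts file before DNS), whereas nslookup queries a DNS server directly. The historical host names above may no longer resolve.

```python
import socket

def lookup(host_name: str) -> str:
    """Ask the operating system's resolver for one IPv4 address of host_name."""
    return socket.gethostbyname(host_name)  # raises socket.gaierror if unknown

# Usage (needs network access; the answer depends on current DNS data):
#   lookup("www.csie.ncnu.edu.tw")
print(lookup("127.0.0.1"))  # a numeric address is returned unchanged
```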
Domain Name Server
A host name must be converted into an IP address
Domain Name System (DNS) servers
contain a database (look-up table) that maps host names to IP addresses
There are many domain name servers, organized by domain
“.com”, “.gov”, “.edu”, “.tw”
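Because no single server holds the whole table, a lookup starts at the top of the hierarchy and is referred down, domain by domain, until a server knows the host. The sketch below models that idea with a hard-coded, hypothetical set of servers (the server names are invented; the two host records are the historical examples from this slide). Real resolvers also cache answers, which is omitted here.

```python
# Hypothetical zone data. ("NS", server) = referral to a lower-level server,
# ("A", address) = the final host-name-to-IP-address record.
ZONES = {
    "root":          {"tw": ("NS", "tw-server"), "com": ("NS", "com-server")},
    "tw-server":     {"edu.tw": ("NS", "edu-tw-server")},
    "edu-tw-server": {"ntu.edu.tw": ("NS", "ntu-server")},
    "ntu-server":    {"arbor.ee.ntu.edu.tw": ("A", "140.112.21.236")},
    "com-server":    {"ibm.com": ("NS", "ibm-server")},
    "ibm-server":    {"arbor.watson.ibm.com": ("A", "9.2.13.20")},
}

def in_domain(name, domain):
    """True if name is the domain itself or lies inside it."""
    return name == domain or name.endswith("." + domain)

def resolve(name, trace=None):
    """Follow referrals from the root server down to the host's IP address."""
    server = "root"
    while True:
        if trace is not None:
            trace.append(server)          # record which servers were asked
        table = ZONES[server]
        # pick the most specific entry this server knows for the name
        match = max((d for d in table if in_domain(name, d)), key=len, default=None)
        if match is None:
            return None                   # no server knows this name
        kind, value = table[match]
        if kind == "A":
            return value if match == name else None
        server = value                    # "NS": ask the lower-level server
```

Resolving `"arbor.ee.ntu.edu.tw"` asks the root, then the `.tw`, `.edu.tw`, and `ntu.edu.tw` servers in turn before returning `140.112.21.236`.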
Lab#3
Use the commands
ipconfig
nslookup
Internet applications
telnet: a terminal emulation program for TCP/IP networks such as the Internet
telnet 163.22.22.119
ftp (File Transfer Protocol)
ftp 163.22.22.119
(163.22.22.119 runs a telnet server)
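At its core, a telnet session is a TCP connection (to port 23 by default) over which typed lines travel to the remote host and its output comes back. The sketch below imitates that exchange on one machine: a hypothetical echo server on 127.0.0.1 stands in for the remote host, and real telnet's option negotiation and terminal handling are left out.

```python
import socket
import threading

def read_line(sock: socket.socket) -> bytes:
    """Read bytes until a newline (TCP may deliver one line in several pieces)."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:                  # the peer closed the connection
            break
        data += chunk
    return data

def start_echo_server() -> int:
    """Stand-in for a remote server: echo one line back, then hang up."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))         # port 0: the OS picks a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        with conn:
            conn.sendall(b"echo: " + read_line(conn))
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port

def send_line(host: str, port: int, line: str) -> str:
    """What a telnet client does at its core: connect, send a line, read the reply."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(line.encode("ascii") + b"\r\n")
        return read_line(s).decode("ascii")

port = start_echo_server()
print(send_line("127.0.0.1", port, "hello"))  # echo: hello
```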
Origins of WWW
Ref: Chapter 32 of Comer’s book
Outline
Origins of the WWW (World Wide Web)
Web browser
HTML (HyperText Markup Language)
HTTP (HyperText Transfer Protocol)
Origins of WWW
World Wide Web (WWW)
Proposed in 1989 by Tim Berners-Lee at CERN (European Organization for Nuclear Research)
A large-scale, online repository of information
Today the W3C (World Wide Web Consortium) develops interoperable technologies
(specifications, guidelines, software, and tools) for the Web
Origins of WWW (cont.)
Data format: HTML (HyperText Markup Language)
Allows hypertext links (URL: Uniform Resource Locator) to other documents on the Web
URL format: protocol://computer_name:port/document_name
Protocol: HTTP (HyperText Transfer Protocol)
The data exchange standard on the Web: a common data format plus a transfer protocol
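The URL format above can be taken apart mechanically. A minimal sketch using Python's standard `urllib.parse.urlparse`; the port and path in the example URL are hypothetical, and the well-known default ports fill in a port the URL leaves out.

```python
from urllib.parse import urlparse

# Well-known default ports, used when a URL omits the port
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23}

def split_url(url):
    """Split a URL into (protocol, computer_name, port, document_name)."""
    parts = urlparse(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path or "/"

print(split_url("http://www.csie.ncnu.edu.tw:80/index.html"))
print(split_url("http://www.csie.ncnu.edu.tw"))
```

The second call shows why most URLs omit `:port`: for HTTP the browser assumes port 80 and requests the document `/`.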
Origins of WWW (cont.)
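[Figure: the WWW, a web of documents addressed by URLs, built on top of the Internet]
The WWW is like a large database distributed across the Internet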
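Web browser
A tool for reading HTML documents
Client: Web browser
Server: Web server (e.g., running IIS)
The user clicks a link; the browser sends a request to the server
The server finds the document and returns it as an HTML document
The browser displays the document
The connection is terminated after all items have been received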
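The request/response cycle above can be reproduced on a single machine. In this sketch, a toy Web server (a hypothetical handler that returns one fixed HTML page for any path) runs on 127.0.0.1, and the client sends one GET request with Python's standard `http.client`. The server speaks HTTP/1.0, the default for `BaseHTTPRequestHandler`, so it closes the connection after sending the document.

```python
import http.client
import http.server
import socketserver
import threading

PAGE = b"<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY>Hello</BODY></HTML>"

class OnePageHandler(http.server.BaseHTTPRequestHandler):
    """Toy Web server: 'finds' the same HTML page for every requested path."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)          # return the HTML document

    def log_message(self, format, *args):
        pass                            # silence the default per-request log line

def browse_once(path="/"):
    """Client side: send one GET request and read the whole response."""
    server = socketserver.TCPServer(("127.0.0.1", 0), OnePageHandler)
    port = server.server_address[1]     # port 0 above lets the OS choose
    worker = threading.Thread(target=server.handle_request)  # serve one request
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)                  # send request
    resp = conn.getresponse()
    status, body = resp.status, resp.read()    # receive the document
    conn.close()
    worker.join()
    server.server_close()
    return status, body

status, body = browse_once()
print(status, body.decode())
```

A real browser would then parse `body` and display it; here it is simply printed.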
Web browser (cont.)
Text mode browser: lynx
lynx http://www.csie.ncnu.edu.tw
Graphics mode browser
NCSA (National Center for Supercomputing Applications) Mosaic by Marc Andreessen
Netscape
Internet Explorer (IE)
Web browser (cont.)
[Figure: browser architecture]
Document representation
Hypertext: documents containing textual information
Hypermedia: documents that also contain non-textual information, such as images and graphics
Hyper-anything is an abstract idea: a set of documents in which a document can contain pointers to other documents
Page: a hypermedia document on the Web
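Hypertext Markup Language (HTML)
Markup language: hypertext is published with general display guidelines rather than a detailed format, and the browser decides the details
The same HTML document may therefore be displayed differently by different browsers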
HTML
An HTML document is a text file plus tags
Tags format the document
<Tagname>…text…</Tagname>
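Since every `<Tagname>` is normally closed by a matching `</Tagname>`, tags must nest like parentheses. A minimal checker can track open tags on a stack; this sketch uses Python's standard `html.parser.HTMLParser` (which reports tag names in lower case). The set of tags that take no end tag is an assumption covering common cases only, and real browsers are far more forgiving than this checker (HTML also allows some end tags, such as `</P>`, to be omitted).

```python
from html.parser import HTMLParser

# Tags that never take an end tag (an assumption covering common cases only)
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}

class TagChecker(HTMLParser):
    """Track open tags on a stack and report end tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):   # tag arrives lower-cased
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return                            # e.g. the "/" in <br/>
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()                  # properly nested: close it
        else:
            self.errors.append(f"unexpected </{tag}>")

def check_tags(html):
    checker = TagChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"unclosed <{t}>" for t in reversed(checker.stack)]
    return checker.errors + unclosed

print(check_tags("<HTML><HEAD><TITLE>t</TITLE></HEAD><BODY>b</BODY></HTML>"))  # []
print(check_tags("<HTML><BODY>text</HTML>"))
```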
HTML layout
<HTML>
  <HEAD>
    <TITLE>
      …title of the text…
    </TITLE>
  </HEAD>
  <BODY>
    …body of the document…
  </BODY>
</HTML>
*Good indentation makes a document easier for people to read and edit
HTML layout (cont.)
<HTML><HEAD><TITLE>….title of the text….
</TITLE></HEAD><BODY>…body of the document…
</BODY></HTML>
HTML examples
Example1
Example2
Example3: embedding images
Example4: hypertext link (anchor)
<a href="URL">…anything…</a>
Any item can be made a hypertext link
Lab#4 in the afternoon
http://www.csie.nctu.edu.tw/~jglee/teacher/content.htm
HTTP documents
See http://ftp.ics.uci.edu/pub/ietf/http/
HTTP/1.0, RFC 1945, 1996
HTTP/1.1, RFC 2068, 1997
Searching the Web
Ref: Chapter 13 of “Modern Information Retrieval” by Ricardo Baeza-Yates and Berthier Ribeiro-Neto
Outline
Measuring the Web
Methods for searching the Web
Search engines
Web directories
Searching the Web
The WWW started in 1989
The textual data alone is estimated to be on the order of one terabyte
Goal: how can information on the Web be managed, retrieved, and filtered efficiently?
Challenges
Distributed data: data spans many computers interconnected without a predefined topology
High percentage of volatile data: 40% of the Web changes every month
Large volume
Unstructured and redundant data: 30% of Web pages are (near) duplicates
Heterogeneous data: different languages
Measuring the Web
[Figure: Web servers on the Internet, whose documents are reached through URLs]
In 1998: about 3M (3 million) Web servers
Number of Web servers ≈ 1/10 of the number of computers on the Internet
Measuring the Web (cont.)
1998
5 KB per Web page on average
300M Web pages
300M × 5 KB = 1.5 terabytes (TB)
Growing at a rate of 20M pages per month
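The storage estimate is plain arithmetic. This short sketch reproduces it with decimal units (1 KB = 1,000 bytes, matching 1M = 1,000,000 earlier) and projects the page count forward under the assumption that growth stays linear at the stated 20M pages per month.

```python
AVG_PAGE_BYTES = 5 * 1000          # 5 KB per page (1998 average)
PAGES_1998 = 300 * 1000**2         # 300M pages
GROWTH_PER_MONTH = 20 * 1000**2    # 20M new pages per month

def total_bytes(pages, avg_bytes=AVG_PAGE_BYTES):
    """Estimated size of the Web's text: pages × average page size."""
    return pages * avg_bytes

def pages_after(months, start=PAGES_1998, growth=GROWTH_PER_MONTH):
    """Page count after some months, assuming growth stays linear."""
    return start + months * growth

print(total_bytes(PAGES_1998) / 1000**4)  # 1.5 (TB)
print(pages_after(12))                    # 540000000
```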
Growth of the Web
[Figure: number of Web pages and Web sites by year, 1996 to 1998 (y-axis in millions: 100, 200, 300)]
Methods for searching the Web
Search engines
Index Web documents as a full-text database
AltaVista, Google, …
Web directories
Classify selected Web documents by
subject
Yahoo!
Search engines
Model the Web as a database
All queries must be answered without accessing the Web pages
[Figure: users send queries to the database]
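A common way to answer queries "without accessing the Web pages" is an inverted index: a table from each term to the documents containing it, built once and then consulted for every query. The sketch below is a minimal version with hypothetical documents and Boolean AND/OR queries; real engines add much more (positions, compression, ranking).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def query_and(index, *terms):
    """Boolean AND: documents containing every term, read from the index only."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def query_or(index, *terms):
    """Boolean OR: documents containing at least one of the terms."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

DOCS = {  # hypothetical pages
    1: "Origins of the Internet",
    2: "Origins of the WWW",
    3: "Searching the Web",
}
index = build_index(DOCS)
print(query_and(index, "origins", "internet"))  # {1}
print(query_or(index, "internet", "web"))       # {1, 3}
```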
Search engines (cont.)
AltaVista (www.altavista.com)
20 multi-processor machines
130 GB of RAM each
Over 500 GB of disk space each
The query engine uses 75% of these resources
The top search engines
International
Google ( www.google.com )
Yahoo! ( www.yahoo.com )
AltaVista ( www.altavista.com )
Inktomi ( www.inktomi.com )
Taiwan
Yahoo!/Kimo (uses Google)
Openfind ( www.openfind.com.tw ) (Prof. Sun Wu, National Chung Cheng University)
Yam ( www.yam.com.tw )
Statistics on search engines
www.searchenginewatch.com
http://imt.net/~notess/search
Search engines (cont.)
Centralized crawler-indexer architecture
[Figure: the crawler fetches pages from the Web; the indexer builds the index database; users query the index through the user interface and the query engine]
User interface
Query interface
Keywords
Boolean operators
Answer interface
Ranks the retrieved pages using
statistics about term occurrence within the document
popularity
hyperlink information
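The simplest of these ranking signals is term occurrence: a page that mentions the query terms more often is placed higher. This sketch scores hypothetical pages by raw term frequency only; popularity and hyperlink information, which real engines combine with such statistics, are not modeled.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(docs, query):
    """Return ids of documents containing any query term, highest score first.
    Score = total number of occurrences of the query terms (raw term frequency)."""
    terms = tokenize(query)
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # sort by descending score; break ties by document id
    return sorted(scores, key=lambda d: (-scores[d], d))

DOCS = {  # hypothetical pages
    "a": "Web search engines index the Web",
    "b": "Web directories classify pages",
    "c": "Origins of the Internet",
}
print(rank(DOCS, "web"))  # ['a', 'b']
```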
Crawler
Also called robots, spiders, wanderers, walkers, and knowbots
In spite of their name, crawlers run on a local system and send requests to remote Web servers
Method: start with a set of URLs, and from there extract other URLs
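The method above amounts to a breadth-first traversal of the link graph. In this sketch, the "Web" is a hypothetical in-memory dictionary of made-up URLs, and `fetch` is any function that returns a page's HTML (or `None` for an invalid link); a real crawler would fetch over HTTP and also respect servers' crawling rules, which is not shown.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the absolute URLs of all <a href=...> links in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first traversal starting from seed_urls."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:              # invalid link: skip it
            continue
        pages[url] = html
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:  # extract other URLs from this page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

WEB = {  # hypothetical pages
    "http://a.example/": '<a href="/b.html">B</a> <a href="http://c.example/">C</a>',
    "http://a.example/b.html": '<a href="/">home</a> <a href="missing.html">broken</a>',
    "http://c.example/": "no links here",
}
print(list(crawl(["http://a.example/"], WEB.get)))
```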
Crawler (cont.)
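Because of the way the Web is traversed, a search engine's index can be thought of as analogous to the stars in the sky: what we see never existed all at once, since each page was visited at a different time
Between 2% and 9% of the links stored by search engines are invalid
The fastest crawlers of the time could traverse up to 10M Web pages per day
So visiting all 300M pages once takes at least 300M / 10M = 30 days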
Web directories
Classify Web pages by category
Directories are hierarchical taxonomies that classify human knowledge
Yahoo! has close to 1M pages classified
How are pages classified?
Pages have to be submitted to the Web directories
Classification is done manually by a few people
Automatic classification is not yet mature
Not every page is classified
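A hierarchical taxonomy can be modeled as a tree of categories, where browsing a category shows the pages filed under it and its subcategories. The categories and URLs below are invented for illustration; they do not reflect any real directory's structure.

```python
# Hypothetical taxonomy: category names and URLs are invented for illustration
DIRECTORY = {
    "Computers": {
        "Internet": {"_pages": ["http://example.com/tcpip.html"]},
        "Programming": {
            "C": {"_pages": ["http://example.com/c-faq.html"]},
        },
    },
    "Science": {"_pages": ["http://example.com/physics.html"]},
}

def pages_under(node):
    """Collect pages classified in this category and all of its subcategories."""
    found = list(node.get("_pages", []))
    for name, child in node.items():
        if name != "_pages":
            found.extend(pages_under(child))
    return found

def browse(path):
    """Walk a path such as 'Computers/Programming' down the hierarchy."""
    node = DIRECTORY
    for category in path.split("/"):
        node = node[category]   # raises KeyError for an unknown category
    return pages_under(node)

print(browse("Computers"))
```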
Some Web directories
Web directory     URL                  Web sites (K)
Yahoo!            www.yahoo.com        750
LookSmart         www.looksmart.com    300
Lycos Subjects    a2z.lycos.com        50
eBLAST            www.eblast.com       125
NewHoo            www.newhoo.com       100
Magellan          www.mckinley.com     60
Netscape          www.netscape.com     24
Snap              www.snap.com         23
The power of search engines
I have found a homepage that contains the
solutions to the C textbook!
Whoever finds the homepage and emails me first will get a bonus point…