Transcript Document
Module 3 - Internet
Search Engines
Search engine anatomy
Different search engines
Effective searching techniques
Search Engines
Need for it?
Multitude of web pages exist on the web.
How to locate the most relevant to your needs?
Anatomy of a Search Engine
Spider a.k.a robots, webbots
A program that traverses the web and stores the contents of
all searchable web pages.
Web sites can deny access to some resources.
Using a robots.txt file, e.g. try http://www.usask.ca/robots.txt
User-agent: *
Disallow: /testing
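A minimal sketch of how a well-behaved spider could check these two rules before fetching a page, using Python's standard urllib.robotparser module. The robot name "ExampleSpider" and the two page paths are made up for illustration, and the rules are supplied as text so nothing is downloaded.

```python
from urllib.robotparser import RobotFileParser

# The two rules shown above, supplied as text so no network access is needed.
rules = """\
User-agent: *
Disallow: /testing
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleSpider" and both page paths are hypothetical.
allowed_home = parser.can_fetch("ExampleSpider", "http://www.usask.ca/index.html")
allowed_testing = parser.can_fetch("ExampleSpider", "http://www.usask.ca/testing/page.html")

print("index.html allowed:", allowed_home)
print("testing/page.html allowed:", allowed_testing)
```

Anything under /testing is refused to every robot, while the rest of the site stays open.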
Anatomy…
Spider…
Indexing Software
Indexes the web pages into an easily searchable
database collection
Interface for queries
Allows users to enter keywords and other combinations.
Searches are performed within the indexed database
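The three parts above (stored pages, an index, a query interface) can be sketched in a few lines. This is a toy illustration, not how any particular search engine is built; the page names and their text are invented.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a spider might have stored.
pages = {
    "page1.html": "bass fishing tips for the lake",
    "page2.html": "bass guitar music lessons",
    "page3.html": "vacation in paris",
}

def build_index(pages):
    """Indexing step: map each keyword to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, keyword):
    """Query step: look the keyword up in the index, not in the pages."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(pages)
print(search(index, "bass"))
print(search(index, "paris"))
```

The point of the index is that a query never rescans the web pages themselves; it only consults the prebuilt table.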
Different Search Engines
www.yahoo.com
Directory listing organised into various categories
Like the Yellow Pages in our phone book.
All pages are hand linked
“Yet Another Hierarchical Officious Oracle”
Gulliver’s Travels - ‘yahoo’
www.altavista.com
“a view from above”
First truly huge indexed database of web pages
www.google.com
“googol”: 1 followed by 100 zeros
Top search engine today - over 100 million queries a day.
Why Google?
Relevant results are ranked at the top (first) page of a
query.
Why is relevance important?
Typical user rarely goes beyond the first page
How is relevance measured?
Number of links that point to the same page.
Not just by the number of times a keyword is repeated.
Careful here: If enough people say a lie to be true, it
becomes the truth. - Goebbelsian Lies
Googlebomb: “talentless hack”
Googlewhack: ‘the search for the one’!!
Eg. ceremonial overstuffing
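A rough sketch of the link-counting idea above: pages with more inbound links are ranked higher. The link graph is invented, and this plain count is a simplification; real engines weigh links rather than merely counting them, which is also why a coordinated set of links (a Googlebomb) can skew results.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.html": ["c.html", "b.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}

def inbound_link_counts(links):
    """Count how many different pages link to each page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each linking page once
            if target != source:         # ignore links from a page to itself
                counts[target] += 1
    return counts

def rank(pages, links):
    """Order pages by inbound links, most linked first (ties by name)."""
    counts = inbound_link_counts(links)
    return sorted(pages, key=lambda page: (-counts[page], page))

ranking = rank(["a.html", "b.html", "c.html", "d.html"], links)
print(ranking)
```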
Effective Searching
Composing the right keywords in the query
Saves time and frustration
AND OR NOT
AND: combines two keywords
specifies that both keywords should be found on the
resulting web page
OR: combines two keywords
Specifies one or both keywords to be found on the web
page
NOT: operates on a single keyword
Ensures that this keyword should not be found in any page
returned.
Examples: vacation london OR paris
bass AND fishing NOT music
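AND, OR and NOT map directly onto set intersection, union and difference. The sketch below runs the two example queries against a handful of invented pages. Reading "vacation london OR paris" as vacation AND (london OR paris) is an assumption; engines differ in how they group operators.

```python
# Hypothetical pages: name -> text.
pages = {
    "p1": "bass fishing on the lake",
    "p2": "bass guitar music",
    "p3": "vacation in london",
    "p4": "vacation in paris",
    "p5": "fishing vacation",
}

def containing(keyword):
    """Set of pages whose text contains the keyword as a whole word."""
    return {name for name, text in pages.items() if keyword in text.split()}

# bass AND fishing NOT music
bass_query = (containing("bass") & containing("fishing")) - containing("music")

# vacation london OR paris, read as: vacation AND (london OR paris)
vacation_query = containing("vacation") & (containing("london") | containing("paris"))

print(sorted(bass_query))
print(sorted(vacation_query))
```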
Effective Searching..
+/- signs
+ indicates a keyword must be present in the result
- indicates a keyword must not be present
The signs are usually stuck to the keyword
Example: +bass +fish -music
star wars episode +1
Quotation marks “ ”
Groups a set of keywords and the resulting page should
have these in the exact same order
Can be used in combination with other methods
Examples: “star wars episode 1”
“to kill a mockingbird” -movie
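A small sketch of how the +, - and quotation-mark rules could be applied to a page's text. It assumes straight quotes (") in the query, treats a quoted phrase as required, and leaves unsigned single words optional; those choices are assumptions of this sketch, not the behaviour of any specific search engine.

```python
import shlex

def parse_query(query):
    """Split a query into required terms and excluded terms."""
    required, excluded = [], []
    for token in shlex.split(query):      # shlex keeps a "quoted phrase" together
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        elif token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        elif " " in token:                # a quoted phrase: exact word order required
            required.append(token.lower())
        # Unsigned single words are left optional in this sketch.
    return required, excluded

def matches(text, query):
    """True if the text has every required term and no excluded term."""
    required, excluded = parse_query(query)
    padded = " " + " ".join(text.lower().split()) + " "

    def found(term):
        return " " + term + " " in padded

    return all(found(t) for t in required) and not any(found(t) for t in excluded)

print(matches("Bass fish recipes", "+bass +fish -music"))
print(matches("star wars episode 1 review", '"star wars episode 1" -trailer'))
print(matches("episode 1 of star wars", '"star wars episode 1"'))
```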
Networking and
Telecommunication
Topics
Linking Up: Network Basics
Connecting to the Internet
Networks: Near and Far
Communication Software
Linking Up: Network Basics
A computer network is any system of two or
more computers that are linked together.
How do networks impact systems?
People share computer hardware, thus reducing
costs
People share data and software programs, thus
increasing efficiency and production
Linking Up: Network Basics
The Internet is a network of networks
Globally connected network that links various
organisations and individuals.
The Web is not the Internet.
The WWW is one particular use of the Internet.
Email and FTP (File Transfer Protocol) are other such
uses.
Connecting to the Internet
The amount of information that can be
transmitted in a given amount of time is
defined as the bandwidth
Impacted by:
Physical media that make up the network
Amount of network traffic
Software protocols of the network
Communication à la Modem
A modem is a hardware device
that connects a computer’s serial
port to a telephone line (for remote
access).
Modulator-demodulator
May be internal on the system
board or external modem sitting in
a box linked to a serial port.
Modem transmission speed is
measured in bits per second (bps);
modems generally transmit at
28.8 Kbps to 56 Kbps
Connecting to the Internet
Direct connections using T1 or T3 lines.
1.5 Mbps to 45 Mbps
Dial-up connections
Modems
Broadband connections
DSL (Digital Subscriber Line): 300 Kbps to 1.3 Mbps
Cable modems: 10 Mbps
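To make these bandwidth figures concrete, the sketch below estimates how long one file would take over each kind of connection. The 1 MB file size is an arbitrary choice, and the calculation ignores protocol overhead and network traffic, so real transfers would be slower.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: 8 bits per byte, no overhead or congestion."""
    return size_bytes * 8 / bits_per_second

FILE_SIZE = 1_000_000  # hypothetical 1 MB file (10**6 bytes)

speeds_bps = {
    "56 Kbps modem": 56_000,
    "DSL at 1.3 Mbps": 1_300_000,
    "T1 at 1.5 Mbps": 1_500_000,
    "Cable modem at 10 Mbps": 10_000_000,
    "T3 at 45 Mbps": 45_000_000,
}

for name, bps in speeds_bps.items():
    print(f"{name}: {transfer_seconds(FILE_SIZE, bps):.1f} s")
```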
Networks: Near and Far……
Networks Near and Far
Local-area network (LAN)
Computers are linked within a
building or cluster of buildings.
Each computer and peripheral
is an individual node on the
network.
Nodes are connected by
cables which may be either
twisted pair (copper wires) or
coaxial cable.
Wide-Area Networks
A network that extends over a
long distance.
Each network site is a node
on the WAN network
Made up of LANs linked by
phone lines, microwave towers,
and communication satellites.
Data is transmitted over common pathways called
a backbone.
CANet3
http://www.canet3.net/stats/CAnet3map/CAnet3map.htm
CANet3: Canadian backbone
Protocols for
Communication……
Communication Software
Protocol - set of rules for the exchange of
data between a terminal and a computer or
between two computers
TCP/IP Transmission Control Protocol / Internet
Protocol
Messages are broken into Packets - 1500 bytes
Packets are numbered and sent over the network
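A minimal sketch of the splitting and numbering step, using the 1500-byte figure quoted above as the packet size. The function name and the 4000-byte test message are made up, and real TCP/IP packets also carry headers that are left out here.

```python
PACKET_SIZE = 1500  # bytes of data per packet, the figure quoted above

def to_packets(message: bytes, size: int = PACKET_SIZE):
    """Break a message into numbered packets: (number, total, data)."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    total = len(chunks)
    return [(number, total, chunk) for number, chunk in enumerate(chunks, start=1)]

message = b"x" * 4000          # hypothetical 4000-byte message
packets = to_packets(message)

for number, total, chunk in packets:
    print(f"packet {number} of {total}: {len(chunk)} bytes")
```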
Communication Software
IP defines the addressing system
128.236.24.161 - 4 bytes, 0 to 255
Every packet includes the source IP, destination
IP and the packet number (7 of 13)
TCP is an end-to-end protocol.
Packets are reliably transmitted from one
computer to another.
Lost packets are re-transmitted.
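The packet numbers are what make reliable delivery possible: the receiver can spot a gap, ask for that packet again, and put everything back in order. The sketch below is a simplified illustration of that idea, not the actual TCP algorithm; the packet contents and the arrival order are invented.

```python
def missing_numbers(received, total):
    """Packet numbers that have not arrived and need re-transmission."""
    return sorted(set(range(1, total + 1)) - set(received))

def reassemble(received, total):
    """Join the packets in numeric order once none are missing."""
    if missing_numbers(received, total):
        raise ValueError("cannot reassemble: packets are still missing")
    return b"".join(received[number] for number in range(1, total + 1))

# Hypothetical arrival: packet 3 came before packet 1, and packet 2 was lost.
received = {3: b"!", 1: b"Hello, "}
lost = missing_numbers(received, total=3)
print("re-transmit:", lost)

received[2] = b"world"          # the re-transmitted packet arrives
message = reassemble(received, total=3)
print(message)
```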
Communication Software
Communication software establishes a protocol
that is followed by the computer’s hardware
Different forms:
Client/server model - one or more computers act as
dedicated servers and all the remaining computers act as
clients
Web server and client browsers
Peer-to-peer model - every computer on the network is
both client and server
Napster, Gnutella
Many networks are hybrids, using features of the
client/server and peer-to-peer models
Client/Server Model
Server software responds to
client requests by providing
data
Client software sends requests
from the user to the server
eg. http://www.cs.usask.ca
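The request and response pattern can be sketched with two connected sockets on one machine. This is a toy stand-in for a browser and a Web server: socket.socketpair() replaces a real network connection, and the "You asked for" reply format is invented rather than real HTTP.

```python
import socket
import threading

def serve(connection):
    """Server side: read one request and respond with some data."""
    with connection:
        request = connection.recv(1024).decode()
        connection.sendall(f"You asked for {request}".encode())

# A connected pair of sockets stands in for a network link.
server_end, client_end = socket.socketpair()
worker = threading.Thread(target=serve, args=(server_end,))
worker.start()

# Client side: send a request, then wait for the server's reply.
with client_end:
    client_end.sendall(b"/index.html")
    reply = client_end.recv(1024).decode()

worker.join()
print(reply)
```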
Internet Addresses…
Internet Addresses
The host is named using DNS (Domain Name
System), which maps host names to numeric IP
addresses and back.
Address: 128.233.130.63 is www.cs.usask.ca
Address: 216.239.51.101 is www.google.com
Names are easier to remember than strings of
numbers.
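A toy name table showing the two-way mapping. It contains only the two entries listed above, which may no longer match the live Internet; a real program would ask a DNS server instead, for example through Python's socket.gethostbyname.

```python
# Toy name table holding only the two entries from the slide.
NAME_TABLE = {
    "www.cs.usask.ca": "128.233.130.63",
    "www.google.com": "216.239.51.101",
}

def resolve(name):
    """Name to IP address, or None if the name is not in the table."""
    return NAME_TABLE.get(name.lower())

def reverse_lookup(address):
    """IP address back to a name, or None if the address is unknown."""
    for name, known_address in NAME_TABLE.items():
        if known_address == address:
            return name
    return None

print(resolve("www.cs.usask.ca"))
print(reverse_lookup("216.239.51.101"))
```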
Internet Domains
Top level domains include:
.edu - educational sites
.com - commercial sites
.gov - government sites
.mil - military sites
.net - network administration sites
.org - nonprofit organizations
.ca - Canada
Addressing Computers
Unique IP numbers
DNS servers
Arranged in a hierarchy, with 13 root name servers at the top
Multiple computers can be mapped on to the same domain
name
Eg. www.yahoo.com
Gateways
Need for it? – similar to the house address
Take care of routing packets in and out of a LAN
Routers
Take care of routing packets across multiple network nodes
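The decision a computer makes for each outgoing packet can be sketched with Python's standard ipaddress module: if the destination is on the local network, deliver it directly; otherwise hand it to the gateway. The LAN range 128.233.0.0/16 is an assumption chosen for illustration.

```python
import ipaddress

# Assumed LAN address range, chosen only for illustration.
LAN = ipaddress.ip_network("128.233.0.0/16")

def next_hop(destination: str) -> str:
    """Decide whether a packet stays on the LAN or goes out via the gateway."""
    address = ipaddress.ip_address(destination)
    if address in LAN:
        return "deliver on the LAN"
    return "send to the gateway"

print(next_hop("128.233.130.63"))
print(next_hop("216.239.51.101"))

# An IPv4 address is 4 bytes, each between 0 and 255.
print(list(ipaddress.ip_address("128.233.130.63").packed))
```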
Addressing Persons
Examples:
[email protected]
User President whose mail is stored on the host whitehouse in the
government domain of USA
[email protected]
User abc123 at the server for Computer Science,
University of Saskatchewan, Canada.
Internet Email Addresses
An Internet address includes:
username@hostname
username is the person’s “mailbox”
hostname is the name of the host computer and is
followed by one or more domains separated by periods:
– host.subdomain.domain : @mail.usask.ca
– host.domain : @hotmail.com
– host.subdom.subdom.domain : @finance.sk.gov.ca
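A short sketch that takes an address apart along the lines described above. The username abc123 is a placeholder, the function name is made up, and the check is deliberately loose; full email address syntax has more rules than this.

```python
def split_address(address):
    """Return (username, hostname, list of host/domain labels)."""
    username, _, hostname = address.rpartition("@")
    if not username or not hostname:
        raise ValueError(f"not a username@hostname address: {address!r}")
    return username, hostname, hostname.split(".")

# "abc123" is a placeholder username.
username, hostname, labels = split_address("abc123@mail.usask.ca")
print("mailbox:", username)
print("host and domains:", labels)
print("top-level domain:", labels[-1])
```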
Web Addresses
Dissecting a Web page address: http://www.vote-smart.org/help/database.html
http:// - Protocol for Web pages
www.vote-smart.org/ - Path to the host
help/database.html - Resource page
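The same dissection can be done in code with urlsplit from Python's standard urllib.parse module, shown here on the address above.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.vote-smart.org/help/database.html")

print("protocol:", parts.scheme)   # the protocol for Web pages
print("host:", parts.netloc)       # the host that stores the page
print("resource:", parts.path)     # the page requested from that host
```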
Addressing Resources
URL: Uniform Resource Locator
Web: http://www.cs.usask.ca/index.html
A Web server stores Web pages and sends pages to client Web
browsers.
FTP: ftp://ftp.cs.usask.ca
File transfer protocol (FTP) allows users to download
files from remote servers to their computers and to upload
files.
Telnet: telnet://scrooge.usask.ca
Allows users to log in to remote computers.
Other resources like Gopher, NNTP - newsgroups
Cookies
Cookies: what are they?
Are files created on your computer by a website to store
information about you.
To accept or not ?
Benefits:
stores some of the personal information (repeat info)
allows pages to be customised to your preferences
Eg. Layouts, advertisements…
Privacy issues.
Do you want your browsing patterns to be used by a
company/organisation?
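A minimal sketch of a preference cookie, using Python's standard http.cookies module. The cookie name "layout" and its value are invented; the example only shows the text a site would send and how it would later read the value back.

```python
from http.cookies import SimpleCookie

# What a site might send to the browser; the name and value are made up.
outgoing = SimpleCookie()
outgoing["layout"] = "compact"
outgoing["layout"]["path"] = "/"
header = outgoing.output()
print(header)

# What the browser sends back on a later visit, and how the site reads it.
incoming = SimpleCookie()
incoming.load("layout=compact")
preference = incoming["layout"].value
print("remembered layout:", preference)
```

The same mechanism that remembers a layout can just as easily remember which pages you visited, which is where the privacy question comes from.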
Email, Viruses and
Internet Issues
Topics
E-mail: Access Protocols
Other Internet Applications: Chat,
Newsgroups
Netiquette: some tips
Intranets and Extranets
Viruses
Internet: Ethical and Political issues
Email on the Internet
Email formats include:
ASCII text--can be viewed by any mail client program
HTML--displays text formatting, pictures, and links to
Web pages
SMTP – Simple Mail Transfer Protocol
Asynchronous communication form
UUCP – Unix to Unix Copy
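The two formats above can travel in one message: a plain-text part that any mail client can show, plus an HTML alternative. The sketch uses Python's standard email.message module; the addresses and wording are placeholders, and nothing is actually sent.

```python
from email.message import EmailMessage

message = EmailMessage()
message["From"] = "sender@example.com"      # placeholder address
message["To"] = "receiver@example.com"      # placeholder address
message["Subject"] = "Meeting"

# Plain ASCII text first, then an HTML version of the same content.
message.set_content("See you at 3 pm.")
message.add_alternative("<p>See you at <b>3 pm</b>.</p>", subtype="html")

content_types = [part.get_content_type() for part in message.iter_parts()]
print(message.get_content_type())
print(content_types)
```

Handing such a message to a mail server over SMTP is what the standard smtplib module is for.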
Email on the Internet
What appears on the
screen depends on the
type of Internet
connection you have and
the mail program you
use.
Popular graphical email
programs include
Eudora, Outlook and
Netscape Communicator.
Email on the Internet
IMAP Vs POP:
Internet Message Access Protocol Vs Post Office
Protocol
Messages remain on the email server Vs
messages are downloaded to your computer and
deleted from the mail server.
Online Vs Offline access.
Retrieve messages in any order Vs “in-order” retrieval
Storage is limited by the quota set on your e-mail server Vs
storage is limited by your hard-disk size.
Mailing Lists & Network News
Mailing lists allow you to participate in email
discussion groups on special-interest topics.
E-mails are sent to the whole group
A newsgroup is a public discussion on a particular
subject consisting of notes written to a central Internet
site and redistributed through a worldwide newsgroup
network called Usenet.
Protocol used: NNTP - Network News Transfer Protocol
I-HELP is a similar application. - More like a message board.
Could be local interest too: usask.forsale
Real-Time Communication
Users are logged in at the same time.
Instant Messaging for exchanging instant
messages with on-line friends and co-workers
Chat Rooms for conversing with multiple people
in real-time
Internet telephony (IP telephony) for long-distance toll-free telephone service
Videoconferencing for two-way meetings
Rules of Thumb: Netiquette
Say what you mean and say it with care.
Keep it short and to the point.
Proof-read your messages.
Learn the “nonverbal” language of the Net. :)
Keep your cool.
Don’t be a source of spam (Internet junk mail).
Lurk before you leap.
Check your FAQs (Frequently Asked Questions)
Give something back.
Intranets and Extranets
Intranets are self-contained intraorganizational networks that offer email,
newsgroups, file transfer, Web publishing
and other Internet-like services.
Firewalls prevent unauthorized communication
and secure sensitive internal data
Gateways, where the firewalls sit, act as the
gatekeepers.
Intranets and Extranets
Extranets are private TCP/IP networks
designed for outside use by customers,
clients and business partners of the
organisation.
Electronic data interchange - EDI - a set of
specifications for ordering, billing, and paying for
parts and services over private networks
Viruses
Viruses are programs that could damage your data
and hinder a computer’s normal functioning.
Activate itself : executable files, boot sector, macros
Replicate itself: through e-mail attachments
Do “something”: destroy contents
Trojan horses are malicious programs disguised as
useful software.
Worms are programs that could travel across the
network and replicate themselves.
Anti-Virus programs check for known viruses
Strains are identified by “unique” strings and their actions.
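A toy sketch of checking data for known "unique" strings. The two signatures are harmless made-up byte strings; real anti-virus products rely on large, regularly updated databases and also look at what a program does.

```python
# Made-up signatures: strain name -> byte string that identifies it.
SIGNATURES = {
    "Demo.A": b"EVIL-PAYLOAD-A",
    "Demo.B": b"\xde\xad\xbe\xef",
}

def scan(data: bytes):
    """Return the names of all known strains whose signature appears in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)

clean_file = b"just an ordinary document"
infected_file = b"header" + b"EVIL-PAYLOAD-A" + b"trailer"

print(scan(clean_file))
print(scan(infected_file))
```

A scanner like this only finds strains it already has a signature for, which is why it checks for known viruses only.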
Internet Issues: Ethical and
Political Dilemmas
Copyright Laws: how do they apply to online
content?
Filtering software to combat inappropriate content
Especially across international boundaries.
Parental controls.
Digital cash to make on-line transactions easier and
safer
Encryption software to prevent credit card theft
Digital signatures to prevent email forgery
Digital divide: separates the computer haves from the have-nots.
Next Class
HTML
This text coded as HTML ..
<H1>Welcome to Computer Confluence</H1>
<b>Publishing on the Web</b>
Appears like this on
the screen …