Transcript Document

Module 3 - Internet
Search Engines
Search engine anatomy
Different search engines
Effective searching techniques
Search Engines

Why do we need them?

A multitude of web pages exists on the web.
How do you locate the ones most relevant to your needs?
Anatomy of a Search Engine

Spider (a.k.a. robot, webbot)
 A program that traverses the web and stores the contents of
all searchable web pages.
 Web sites can deny access to some resources.
 This is done using a robots.txt file, e.g. try http://www.usask.ca/robots.txt


User-agent: *
Disallow: /testing
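
As a rough sketch of how a well-behaved spider might honour these rules, Python's standard urllib.robotparser module can parse the two lines above. The page URLs under www.usask.ca are assumptions made up for illustration, and the helper name spider_may_fetch is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two rules shown above, parsed locally (no network access needed).
rules = [
    "User-agent: *",
    "Disallow: /testing",
]

parser = RobotFileParser()
parser.parse(rules)


def spider_may_fetch(url):
    """Return True if a spider obeying the rules above may fetch `url`."""
    return parser.can_fetch("*", url)


# Hypothetical URLs, used only for illustration.
print(spider_may_fetch("http://www.usask.ca/testing/page.html"))
print(spider_may_fetch("http://www.usask.ca/index.html"))
```

Anything whose path starts with /testing is refused; everything else is allowed.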
Indexing Software


Indexes the web pages into an easily searchable
database collection
Interface for queries


Allows users to enter keywords and other combinations.
Searches are performed within the indexed database
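
The three parts above (spider output, index, query interface) can be sketched as a tiny inverted index. This is a minimal illustration, not how any particular search engine is built; the page URLs and texts are invented, and build_index and lookup are hypothetical names.

```python
from collections import defaultdict

# Hypothetical pages a spider might have stored (URL -> page text).
pages = {
    "http://example.org/a": "bass fishing tips for the lake",
    "http://example.org/b": "bass guitar music lessons",
    "http://example.org/c": "vacation in paris",
}


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, keyword):
    """Answer a one-keyword query from the index, not from the raw pages."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(pages)
print(lookup(index, "bass"))
```

The point of the design is that a query touches only the index, so the pages themselves never have to be re-read at search time.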
Different Search Engines

www.yahoo.com

Directory listing organised into various categories
 Like the Yellow Pages in our phone book.
 All pages are hand linked
“Yet Another Hierarchical Officious Oracle”
Gulliver’s Travels - ‘yahoo’
www.altavista.com

“a view from above”
First truly huge indexed database of web pages
www.google.com


“googol”: 1 followed by 100 zeros
Top search engine today - over 100 million queries a day.
Why Google?

Relevant results are ranked at the top (first) page of a
query.




Why is relevance important?
 The typical user rarely goes beyond the first page
How is relevance measured?
 By the number of links that point to the same page.
 Not just by the number of times a keyword is repeated.
 Careful here: If enough people say a lie is true, it
becomes the truth. - Goebbelsian Lies
Googlebomb: “talentless hack”
Googlewhack: ‘the search for the one’!!
 Eg. ceremonial overstuffing
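
The link-counting idea above can be sketched as follows. This is plain inbound-link counting, a deliberate simplification and not Google's actual ranking method; the link graph and page names are invented for illustration.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.html": ["c.html", "b.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}


def inbound_link_counts(links):
    """Count how many distinct pages link to each target page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each source once per target
            if target != source:      # ignore self-links
                counts[target] += 1
    return counts


def rank_by_links(links):
    """Order pages from most to fewest inbound links (ties broken by name)."""
    counts = inbound_link_counts(links)
    pages = set(links) | set(counts)
    return sorted(pages, key=lambda page: (-counts[page], page))


print(rank_by_links(links))
```

It also shows why a Googlebomb works: if enough pages agree to link to the same target, that target rises to the top regardless of its content.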
Effective Searching

Composing the right keywords in the query


Saves time and frustration
AND OR NOT




AND: combines two keywords
 specifies that both keywords should be found on the
resulting web page
OR: combines two keywords
 Specifies one or both keywords to be found on the web
page
NOT: operates on a single keyword
 Ensures that this keyword should not be found in any page
returned.
Examples: vacation london OR paris
 bass AND fishing NOT music
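
One way to picture AND, OR and NOT is as set operations on the pages that contain each keyword. The sketch below assumes a made-up keyword index and page names; match_and, match_or and match_not are hypothetical helpers, not any search engine's API.

```python
# Hypothetical index: keyword -> set of pages that contain it.
index = {
    "bass": {"page1", "page2", "page3"},
    "fishing": {"page1", "page3"},
    "music": {"page2", "page3"},
    "london": {"page4"},
    "paris": {"page5"},
}
all_pages = {"page1", "page2", "page3", "page4", "page5"}


def pages_with(keyword):
    """Pages containing the keyword (empty set if the keyword is unknown)."""
    return index.get(keyword, set())


def match_and(a, b):
    return a & b            # both keywords must be present


def match_or(a, b):
    return a | b            # one or both keywords present


def match_not(a):
    return all_pages - a    # keyword must be absent


# bass AND fishing NOT music
result = match_and(match_and(pages_with("bass"), pages_with("fishing")),
                   match_not(pages_with("music")))
print(sorted(result))

# london OR paris
print(sorted(match_or(pages_with("london"), pages_with("paris"))))
```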
Effective Searching

+/- signs





+ indicates a keyword must be present in the result
- indicates a keyword must not be present
The signs are usually stuck to the keyword
Example: +bass +fish -music
 star wars episode +1
Quotation marks “ ”



Groups a set of keywords and the resulting page should
have these in the exact same order
Can be used in combination with other methods
Examples: “star wars episode 1”
 “to kill a mocking bird” -movie
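
A minimal sketch of how the +/- signs and quotation marks might be interpreted. It assumes straight double quotes in the query, whole-word matching, and that a page needs at least one of the unsigned terms; real search engines differ in these details, and parse_query, contains and matches are hypothetical names.

```python
import shlex


def parse_query(query):
    """Split a query into (required, excluded, plain) terms.

    Text inside double quotes is kept together as one exact phrase.
    """
    required, excluded, plain = [], [], []
    for token in shlex.split(query):
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            plain.append(token.lower())
    return required, excluded, plain


def contains(text, term):
    """Whole-word or whole-phrase match, ignoring case and extra spaces."""
    padded = " " + " ".join(text.lower().split()) + " "
    return " " + term + " " in padded


def matches(text, query):
    """Apply the parsed query to one page's text."""
    required, excluded, plain = parse_query(query)
    if not all(contains(text, term) for term in required):
        return False
    if any(contains(text, term) for term in excluded):
        return False
    # Assumption: at least one unsigned term (or phrase) must appear.
    return not plain or any(contains(text, term) for term in plain)


print(parse_query("+bass +fish -music"))
print(matches("To Kill a Mocking Bird is a novel",
              '"to kill a mocking bird" -movie'))
```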
Networking and
Telecommunication
Topics




Linking Up: Network Basics
Connecting to the Internet
Networks: Near and Far
Communication Software
Linking Up: Network Basics


A computer network is any system of two or
more computers that are linked together.
How do networks impact systems?


People share computer hardware, thus reducing
costs
People share data and software programs, thus
increasing efficiency and production
Linking Up: Network Basics

The Internet is a network of networks


A globally connected network that links various
organisations and individuals.
The Web is not the Internet.
 The WWW is one particular use of the Internet.
 Email and FTP (File Transfer Protocol) are other such
uses.
Connecting to the Internet

The amount of information that can be
transmitted in a given amount of time is
defined as the bandwidth

Impacted by:
 Physical media that make up the network
 Amount of network traffic
 Software protocols of the network
Communication á la Modem

A modem is a hardware device
that connects a computer’s serial
port to a telephone line (for remote
access).

Modulator-demodulator

May be internal on the system
board or external modem sitting in
a box linked to a serial port.

Modem transmission speed is
measured in bits per second (bps);
modems generally transmit at 28,800
bps to 56,000 bps (56K)
Connecting to the Internet

Direct connections using T1 or T3 lines.

1.5 Mbps to 45 Mbps
Dial-up connections

Modems
Broadband connections

DSL (Digital Subscriber Line): 300 Kbps to 1.3 Mbps
Cable modems: 10 Mbps.
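
To make the bandwidth figures concrete, the sketch below estimates how long one download takes on each kind of link. It assumes the ideal case (8 bits per byte, no protocol overhead, no competing traffic), takes the dial-up modem as 56,000 bps, and uses a hypothetical 5 MB file.

```python
# Link speeds in bits per second, based on the figures quoted above.
SPEEDS_BPS = {
    "56K modem": 56_000,
    "DSL (upper figure)": 1_300_000,
    "T1": 1_500_000,
    "Cable modem": 10_000_000,
    "T3": 45_000_000,
}


def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: 8 bits per byte, no overhead or congestion."""
    return size_bytes * 8 / bits_per_second


file_size = 5_000_000  # a hypothetical 5 MB download
for name, bps in SPEEDS_BPS.items():
    print(f"{name:20s} {transfer_seconds(file_size, bps):8.1f} s")
```

Real transfers are slower than these figures, for the reasons listed earlier: the physical media, the amount of traffic, and the software protocols.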
Networks Near and Far
Local-area network (LAN)

Computers are linked within a
building or cluster of buildings.

Each computer and peripheral
is an individual node on the
network.

Nodes are connected by
cables which may be either
twisted pair (copper wires) or
coaxial cable.
Wide-Area Networks


A network that extends over a
long distance.
Each network site is a node
on the WAN


Made up of LANs linked by
phone lines, microwave towers,
and communication satellites.
Data is transmitted over common pathways called
a backbone.

CANet3: Canadian backbone
http://www.canet3.net/stats/CAnet3map/CAnet3map.htm
Protocols for Communication
Communication Software

Protocol - set of rules for the exchange of
data between a terminal and a computer or
between two computers

TCP/IP: Transmission Control Protocol / Internet
Protocol
 Messages are broken into packets of up to 1500 bytes
 Packets are numbered and sent over the network
Communication Software

IP defines the addressing system



Example: 128.236.24.161 - 4 bytes, each from 0 to 255
Every packet includes the source IP, the destination
IP and the packet number (e.g. 7 of 13)
TCP is an end-to-end protocol.


Packets are reliably transmitted from one
computer to another.
Lost packets are re-transmitted.
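
The packet idea can be sketched in a few lines: break a message into numbered pieces, let them arrive in any order, and rebuild the original. This is a simplified model in the spirit of the "7 of 13" numbering, not the real TCP wire format; split_into_packets and reassemble are hypothetical names.

```python
import random

PACKET_SIZE = 1500  # bytes of data per packet, as quoted in this section


def split_into_packets(message: bytes, size: int = PACKET_SIZE):
    """Break a message into numbered packets: (number, total, data)."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    total = len(chunks)
    return [(n, total, chunk) for n, chunk in enumerate(chunks, start=1)]


def reassemble(packets):
    """Rebuild the message; return None if any packet is still missing."""
    if not packets:
        return None
    total = packets[0][1]
    by_number = {number: data for number, _total, data in packets}
    if set(by_number) != set(range(1, total + 1)):
        return None  # a reliable protocol would request a re-transmission
    return b"".join(by_number[n] for n in range(1, total + 1))


message = bytes(range(256)) * 20        # 5120 bytes of sample data
packets = split_into_packets(message)
random.shuffle(packets)                 # packets may arrive out of order
print(len(packets), reassemble(packets) == message)
```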
Communication Software


Communication software establishes a protocol
that is followed by the computer’s hardware
Different forms:

Client/server model - one or more computers act as
dedicated servers and all the remaining computers act as
clients
 Web server and client browsers

Peer-to-peer model - every computer on the network is
both client and server
 Napster, Gnutella

Many networks are hybrids, using features of the
client/server and peer-to-peer models
Client/Server Model
Server software responds to
client requests by providing
data
Client software sends requests
from the user to the server
eg. http://www.cs.usask.ca
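
A minimal sketch of the request/response pattern, using plain sockets on the local machine. The reply format ("You asked for: ...") is invented for illustration and is not HTTP; the address 127.0.0.1 and the function names are assumptions.

```python
import socket
import threading


def serve_once(server_sock):
    """Server side: accept one client, read its request, send back data."""
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode("utf-8")
        conn.sendall(f"You asked for: {request}".encode("utf-8"))


def request_from_server(port, message):
    """Client side: connect, send a request, return the server's reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message.encode("utf-8"))
        return client.recv(1024).decode("utf-8")


def demo():
    """Run one server and one client on this machine and return the reply."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,), daemon=True)
    worker.start()
    try:
        return request_from_server(port, "index.html")
    finally:
        worker.join(timeout=5)
        server_sock.close()


print(demo())
```

A Web server and a browser follow the same pattern, with HTTP defining what the request and the reply look like.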
Internet Addresses

The host is named using DNS (Domain Name
System), which translates host names into IP
addresses (and back).
 Address: 128.233.130.63 is www.cs.usask.ca
 Address: 216.239.51.101 is www.google.com
 Names are easier to remember than strings of
numbers.
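
The name-to-number mapping can be illustrated with a toy lookup table built from the two examples above. A real resolver asks DNS servers rather than a local table, and real addresses change over time, so the table is illustrative only; resolve and address_bytes are hypothetical names.

```python
import ipaddress

# Toy name table using the two examples above (illustrative only).
NAME_TABLE = {
    "www.cs.usask.ca": "128.233.130.63",
    "www.google.com": "216.239.51.101",
}


def resolve(hostname):
    """Look up a host name in the toy table; None if it is not listed."""
    return NAME_TABLE.get(hostname.lower())


def address_bytes(address):
    """Return the four numbers (each 0 to 255) that make up an IPv4 address."""
    return list(ipaddress.IPv4Address(address).packed)


print(resolve("www.cs.usask.ca"))
print(address_bytes("128.233.130.63"))
```

ipaddress.IPv4Address rejects anything that is not four numbers in the range 0 to 255, which matches the "4 bytes" description of an IP address.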
Internet Domains
Top level domains include:
 .edu - educational sites
 .com - commercial sites
 .gov - government sites
 .mil - military sites
 .net - network administration sites
 .org - nonprofit organizations
 .ca - Canada
Addressing Computers

Unique IP numbers


DNS servers



Arranged in a hierarchy - 4 top level servers in US
Multiple computers can be mapped on to the same domain
name
 Eg. www.yahoo.com
Gateways


Need for it? – similar to the house address
Takes care of routing packets in and out of a LAN
Routers

Takes care of routing packets across multiple network nodes
Addressing Persons
Examples:
[email protected]
User President, whose mail
is stored on the host
whitehouse in the
government domain of the USA
[email protected]
User abc123 at the server for
Computer Science,
University of Saskatchewan,
Canada.
Internet Email Addresses
An Internet address includes:
[email protected]
 username is the person’s “mailbox”
 hostname is the name of the host computer and is
followed by one or more domains separated by periods:
– host.subdomain.domain : @mail.usask.ca
– host.domain : @hotmail.com
– host.subdom.subdom.domain : @finance.sk.gov.ca
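
The username/hostname structure can be taken apart mechanically. In the sketch below the address abc123@mail.usask.ca is a hypothetical example assembled from pieces mentioned in this section, and split_email_address is an illustrative helper, not a full address validator.

```python
def split_email_address(address):
    """Split an address into its mailbox name and host name parts."""
    username, _, hostname = address.rpartition("@")
    if not username or not hostname:
        raise ValueError(f"not a username@hostname address: {address!r}")
    labels = hostname.split(".")
    return {
        "username": username,
        "hostname": hostname,
        "labels": labels,                 # host, subdomains, domain
        "top_level_domain": labels[-1],
    }


# Hypothetical address, used only for illustration.
print(split_email_address("abc123@mail.usask.ca"))
```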
Web Addresses
Dissecting a Web page address:
http://www.vote-smart.org/help/database.html
 http:// - Protocol for Web pages
 www.vote-smart.org/ - Path to the host
 help/database.html - Resource page
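
The same dissection can be done with Python's standard urllib.parse module. The dictionary keys below (protocol, host, resource) simply mirror the labels used in this section, and dissect is a hypothetical helper name.

```python
from urllib.parse import urlparse


def dissect(url):
    """Split a Web address into protocol, host and resource page."""
    parts = urlparse(url)
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "resource": parts.path.lstrip("/"),
    }


print(dissect("http://www.vote-smart.org/help/database.html"))
```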
Addressing Resources


URL: Uniform Resource Locator
Web: http://www.cs.usask.ca/index.html

A Web server stores Web pages and sends pages to client Web
browsers.
FTP: ftp://ftp.cs.usask.ca

File transfer protocol (FTP) allows users to download
files from remote servers to their computers and to upload
files.
Telnet: telnet://scrooge.usask.ca

Allows users to log in to remote computers.
Other resources like Gopher, NNTP - newsgroups
Cookies
Cookies: what are they?
Files created on your computer by a website to store
information about you.
To accept or not?
Benefits:
Store some of your personal information (so you need not repeat it)
Allow pages to be customised to your preferences
E.g. layouts, advertisements…
Privacy issues:
Do you want your browsing patterns to be used by a
company/organisation?
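
As a small illustration of what a cookie carries, the sketch below uses Python's standard http.cookies module to build the header a site would send and to read back what a browser would return on a later visit. The preference names (layout, language) are made up, and the two helper functions are hypothetical.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(preferences):
    """Site side: build the Set-Cookie header lines for some preferences."""
    cookie = SimpleCookie()
    for name, value in preferences.items():
        cookie[name] = value
    return cookie.output()


def read_cookie_header(header_value):
    """Site side, on a later visit: read what the browser sent back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


print(make_set_cookie_header({"layout": "compact"}))
print(read_cookie_header("layout=compact; language=en"))
```

Because the browser sends the stored values back on every visit, the same mechanism that remembers a layout can also be used to recognise a returning visitor, which is where the privacy question comes from.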
Email, Viruses and
Internet Issues
Topics






E-mail: Access Protocols
Other Internet Applications: Chat,
Newsgroups
Netiquette: some tips
Intranets and Extranets
Viruses
Internet: Ethical and Political issues
Email on the Internet

Email formats include:

ASCII text--can be viewed by any mail client program

HTML--displays text formatting, pictures, and links to
Web pages

SMTP – Simple Mail Transfer Protocol

Asynchronous communication form
UUCP – Unix to Unix Copy
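
The two formats above can travel in one message: a plain-text part that any mail client can show, plus an HTML alternative. The sketch uses Python's standard email package; the addresses and text are invented, and sending the message (for example over SMTP) is deliberately left out.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, text, html):
    """Build one message carrying both a plain-text and an HTML version."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text)                       # ASCII/plain-text version
    msg.add_alternative(html, subtype="html")   # HTML version
    return msg


# Hypothetical addresses and content, used only for illustration.
msg = build_message(
    "sender@example.com",
    "reader@example.com",
    "Class notes",
    "Notes for Module 3 are online.",
    "<p>Notes for <b>Module 3</b> are online.</p>",
)
print(msg.get_content_type())
```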

Email on the Internet

What appears on the
screen depends on the
type of Internet
connection you have and
the mail program you
use.

Popular graphical email
programs include
Eudora, Outlook and
Netscape Communicator.
Email on the Internet

IMAP Vs POP:





Internet Message Access Protocol vs Post Office
Protocol
Messages remain on the email server vs
messages are downloaded to your computer and
deleted from the mail server.
Online vs offline access.
Retrieve messages in any order vs “in-order” retrieval
Number of messages is limited by your e-mail server vs
limited by your hard-disk size.
Mailing Lists & Network News

Mailing lists allow you to participate in email
discussion groups on special-interest topics.


E-mails are sent to the whole group
A newsgroup is a public discussion on a particular
subject consisting of notes written to a central Internet
site and redistributed through a worldwide newsgroup
network called Usenet.



Protocol used: NNTP – Network News Transfer Protocol
I-HELP is a similar application. - More like a message board.
Could be local interest too: usask.forsale
Real-Time Communication

Users are logged in at the same time.




Instant Messaging for exchanging instant
messages with on-line friends and co-workers
Chat Rooms for conversing with multiple people
in real-time
Internet telephony (IP telephony) for long-distance toll-free telephone service
Videoconferencing for two-way meetings
Rules of Thumb: Netiquette









Say what you mean and say it with care.
Keep it short and to the point.
Proof-read your messages.
Learn the “nonverbal” language of the Net. :)
Keep your cool.
Don’t be a source of spam (Internet junk mail).
Lurk before you leap.
Check your FAQs (Frequently Asked Questions)
Give something back.
Intranets and Extranets

Intranets are self-contained intra-organizational networks that offer email,
newsgroups, file transfer, Web publishing
and other Internet-like services.


Firewalls prevent unauthorized communication
and secure sensitive internal data
Gateways, where the firewalls sit, act as the
gatekeepers.
Intranets and Extranets

Extranets are private TCP/IP networks
designed for outside use by customers,
clients and business partners of the
organisation.

Electronic data interchange - EDI - a set of
specifications for ordering, billing, and paying for
parts and services over private networks
Viruses

Viruses are programs that could damage your data
and hinder a computer’s normal functioning.






Activate themselves: executable files, boot sector, macros
Replicate themselves: through e-mail attachments
Do “something”: destroy contents
Trojan horses are malicious programs disguised as
useful software.
Worms are programs that could travel across the
network and replicate themselves.
Anti-Virus programs check for known viruses

Strains are identified by “unique” strings and their actions.
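
Signature matching can be sketched as a search for known byte strings inside a file's contents. The signatures below are made-up placeholders, not real virus definitions, and scan is a hypothetical helper; real anti-virus programs also look at behaviour, as noted above.

```python
# Hypothetical signatures: made-up byte strings, not real virus definitions.
KNOWN_SIGNATURES = {
    "Example.A": b"\xde\xad\xbe\xef-EXAMPLE-A",
    "Example.B": b"HYPOTHETICAL-MACRO-PAYLOAD",
}


def scan(data: bytes):
    """Return the names of all known signatures found in the data."""
    return sorted(name for name, signature in KNOWN_SIGNATURES.items()
                  if signature in data)


clean_file = b"just an ordinary document"
infected_file = b"header..." + KNOWN_SIGNATURES["Example.B"] + b"...trailer"

print(scan(clean_file))
print(scan(infected_file))
```

This also shows the limit of the approach: a strain whose string is not yet in the list is not detected, which is why signature lists need regular updates.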
Internet Issues: Ethical and
Political Dilemmas

Copyright Laws: how do they apply to online
content?

Especially across international boundaries.
Filtering software to combat inappropriate content

Parental controls.
Digital cash to make on-line transactions easier and
safer
Encryption software to prevent credit card theft
Digital signatures to prevent email forgery
Digital divide: separates the computer haves from the have-nots.
Next Class
HTML
This text coded as HTML ..
<H1>Welcome to Computer Confluence</H1>
<b>Publishing on the Web</b>
Appears like this on
the screen …