Module 3 Powerpoint slides - Initial Set Up

Module 3
Internet
Search Engines

Need for them?



Millions of pages on web.
How to locate the most relevant?
Search engines allow you to enter a search request (what
you want to look for) and return a list of
webpages matching your query
Anatomy of a Search Engine

Spider (a.k.a robots, webbots)



Traverses the web and stores the contents of all searchable web
pages.
(this happens *before* you ever enter your query)
Websites can decide to deny access to some resources
• Using a robots.txt file eg. http://www.usask.ca/robots.txt
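A minimal sketch of how a spider might honour robots.txt, using Python's standard urllib.robotparser. The rules and the agent name "ExampleSpider" are made up for illustration; a real spider would download the file from the site first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(path, agent="ExampleSpider"):
    """Return True if the rules above let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)

print(allowed("/index.html"))
print(allowed("/private/report.html"))
```
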

Indexing Software



Indexes the web pages into searchable database
(this happens *before* you ever enter your query)
Query Interface


Allows users to enter keywords and other combinations.
Searches are performed within the indexed database
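The indexing and query steps above can be sketched with a tiny inverted index. The page names and text are hypothetical, and real indexing software is far more elaborate; the point is only that the index is built before any query arrives.

```python
from collections import defaultdict

# Hypothetical pages a spider might have stored (URL -> text).
PAGES = {
    "a.example/fish": "bass fish in the lake",
    "a.example/music": "bass guitar music",
}

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

INDEX = build_index(PAGES)  # built ahead of time, before any query

def lookup(word):
    """Answer a one-word query from the index alone."""
    return sorted(INDEX.get(word.lower(), set()))

print(lookup("bass"))
```
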
Different Search Engines

www.yahoo.com

“Yet Another Hierarchical Officious Oracle”
Directory listing organised into various categories
• Like the Yellow Pages in a phone book.
• All pages are hand linked

www.altavista.com

“a view from above”
First truly huge indexed database of web pages
www.google.com


“googol”: 1 followed by 100 zeros
Top search engine today - over 100 million queries a
day.
Why Google?

Results ranked by relevance

Important because a typical user rarely goes beyond
the first page

How is relevance measured?



# of links that point to the same page.
Not just # of times a keyword is repeated.
Careful here: If enough people say a lie is true,
it becomes the “truth”.
• Googlebomb: “talentless hack”

Googlewhack: ‘the search for the one’!
• Eg. dogmatism unicyclist
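A rough sketch of ranking by the number of links that point to a page. This is plain link counting on a hypothetical link graph, not Google's actual algorithm, which weights links in more involved ways.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["news", "about"],
    "blog": ["news"],
    "about": ["news", "home"],
}

def inbound_counts(links):
    """Count how many different pages link to each page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts

def rank(pages, links):
    """Order pages by inbound links, most linked first."""
    counts = inbound_counts(links)
    return sorted(pages, key=lambda p: (-counts[p], p))

print(rank(["home", "news", "about", "blog"], LINKS))
```

A Googlebomb works against exactly this kind of measure: many pages agreeing to link to one target push it up the ranking.
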
Effective Searching

Select the right keywords
• Saves time and frustration

AND OR NOT




AND: combines two keywords
• specifies that both keywords should be found on the resulting
web page
OR: combines two keywords
• Specifies one or both keywords to be found on the web page
NOT: operates on a single keyword
• Ensures that this keyword should not be found in any page
returned.
Examples:
• vacation london OR paris
• plane AND geometric NOT air
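AND, OR and NOT behave like set operations on the pages that contain each keyword. A small sketch, assuming a hypothetical index of four pages named p1 to p4:

```python
# Hypothetical index: keyword -> set of pages containing it.
INDEX = {
    "plane": {"p1", "p2", "p3"},
    "geometric": {"p1", "p2"},
    "air": {"p2", "p3"},
}
ALL_PAGES = {"p1", "p2", "p3", "p4"}

def pages(keyword):
    """Pages containing the keyword (empty set if unknown)."""
    return INDEX.get(keyword, set())

def both(a, b):       # AND: pages found in both sets
    return a & b

def either(a, b):     # OR: pages found in one or both sets
    return a | b

def excluding(a):     # NOT: every page except those in the set
    return ALL_PAGES - a

# plane AND geometric NOT air
result = both(both(pages("plane"), pages("geometric")), excluding(pages("air")))
print(result)
```
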
Effective Searching

+/- signs




+ indicates a keyword must be present in the result
- indicates a keyword must not be present
The signs are usually stuck to the keyword
Example: +bass +fish -music
• star wars episode +1

Quotation marks “ ”



Groups a set of keywords so that the resulting page
should have these in the exact same order
Can be used in combination with other methods
Examples: “star wars episode 1”
• “to kill a mockingbird” -movie
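A sketch of how +, - and quotation marks could be checked against a page's text. It is a simplification: plain keywords are treated as required, matching is on whole words, and only straight quotes (") are understood, not the curly quotes a word processor produces.

```python
import shlex

def matches(query, text):
    """Check `text` against +word, -word and "exact phrase" terms."""
    padded = " " + " ".join(text.lower().split()) + " "
    for term in shlex.split(query.lower()):  # keeps quoted phrases together
        required = not term.startswith("-")
        term = term.lstrip("+-")
        present = (" " + term + " ") in padded
        if present != required:
            return False
    return True

print(matches('+bass +fish -music', "Bass fish recipes"))
print(matches('"to kill a mockingbird" -movie', "To Kill a Mockingbird the movie"))
```
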
Networking and
Telecommunication
Linking Up: Network Basics
A computer network is any system of two or more computers that are linked together.
Advantages to networks:
• People share computer hardware, thus reducing costs
• People share data and software programs, thus increasing efficiency and production
The Internet

Internet is a network of networks


Globally connected network that links
various organisations and individuals.
The Web is not the Internet.
• WWW is one particular usage of
internet.
• Email, FTP (File Transfer Protocol) are
other such uses.
Connecting to the Internet
Bandwidth: The amount of information that can be transmitted in a given amount of time

Impacted by:
• Physical media that make up the network
• Amount of network traffic
• Software protocols of the network
Connecting to the Internet
Dial up connections
• Modems
Broadband connections
• DSL (Digital Subscriber Line): 300 Kbps to 1.3 Mbps
• Cable modems: 10 Mbps
Direct connections using T1 or T3 lines
• 1.5 Mbps to 45 Mbps
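To get a feel for these speeds, here is a rough calculation of ideal transfer times. The 5 MB file size and the 56,000 bps dial-up figure are assumptions for illustration, and the result ignores protocol overhead and network traffic.

```python
# Line speeds in bits per second.
SPEEDS_BPS = {
    "Dial-up modem (56 Kbps, assumed)": 56_000,
    "DSL (1.3 Mbps)": 1_300_000,
    "Cable modem (10 Mbps)": 10_000_000,
    "T3 line (45 Mbps)": 45_000_000,
}

def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: 8 bits per byte, no overhead or congestion."""
    return size_bytes * 8 / bits_per_second

FILE_SIZE = 5_000_000  # hypothetical 5 MB file
for name, bps in SPEEDS_BPS.items():
    print(f"{name}: {transfer_seconds(FILE_SIZE, bps):.1f} s")
```
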
Communication à la Modem

A modem is a hardware device
that connects a computer to a
telephone line (for remote
access).



Modulator-demodulator
May be internal on the system
board or external modem sitting in
a box linked to a serial port.
Modem transmission speed is
measured in bits per second
(bps); modems generally transmit at
28,800 bps to 56,000 bps
Networks: Near and Far…
Networks Near and Far
Local-area network (LAN)



Computers are linked within a
building or cluster of
buildings.
Each computer and
peripheral is an individual
node on the network.
Nodes are connected by
cables
Wide-Area Networks


WAN is a network that
extends over a long
distance.
Each network site is a node
on the WAN network
Made up of LANs linked by
phone lines, microwave
towers, and communication
satellites.


Data is transmitted over common pathways called
a backbone.
• CANet4
One of the world’s fastest fiber-optic
Networks

Intranets are self-contained intra-organizational
networks that offer email, newsgroups, file
transfer, Web publishing and other Internet-like
services.


Could include LANs and WANs
Firewalls prevent unauthorized communication
with the outside world and secure sensitive
internal data

Gateways act as the gate keepers, letting some
things through the firewall and stopping others.
Communication Protocols
Communication Software
Protocol - set of rules for the exchange of data between computers

TCP/IP Transmission Control Protocol /
Internet Protocol
• Messages are broken into Packets - 1500 bytes
• Packets are numbered and sent over the network
Communication Software

IP defines the addressing system




128.236.24.161 - “dotted quad”, 0 to 255
Each computer on the internet has an IP address
Every packet includes the source IP, destination IP
and the packet number (7 of 13)
TCP is an end-to-end communication protocol.


packets are reliably transmitted from one computer
to another.
Lost packets are re-transmitted.
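The packet idea can be sketched as follows. This is a simplification: real TCP numbers bytes rather than whole packets, and the 1500 bytes here are all payload. The dictionary layout is made up for illustration.

```python
import random

PACKET_SIZE = 1500  # bytes per packet, the figure used in the slides

def to_packets(message, src, dst):
    """Split `message` (bytes) into numbered packets with source/destination."""
    chunks = [message[i:i + PACKET_SIZE]
              for i in range(0, len(message), PACKET_SIZE)]
    return [{"src": src, "dst": dst, "number": n, "total": len(chunks),
             "data": chunk}
            for n, chunk in enumerate(chunks, start=1)]

def reassemble(packets):
    """Put packets back in order; ask for retransmission if any are lost."""
    by_number = {p["number"]: p["data"] for p in packets}
    total = packets[0]["total"]
    missing = [n for n in range(1, total + 1) if n not in by_number]
    if missing:
        raise ValueError(f"need retransmission of packets {missing}")
    return b"".join(by_number[n] for n in range(1, total + 1))

message = bytes(4000)  # 4000 zero bytes, which split into 3 packets
packets = to_packets(message, "128.236.24.161", "128.233.130.63")
random.shuffle(packets)  # packets may arrive out of order
assert reassemble(packets) == message
```
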
Communication Software

Communication software establishes a
protocol that is followed by the computer’s
hardware

Different forms:



Client/server model - one or more computers act as dedicated
servers and all the remaining computers act as clients
• Web server and client browsers
Peer-to-peer model - every computer on the network is both
client and server
• Napster, Gnutella
Many networks are hybrids, using features of the client/server
and peer-to-peer models
Client/Server Model
Server software responds to
client requests by providing
data
Client software sends
requests from the user to the
server
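A minimal sketch of the request/response pattern, run inside one program. socket.socketpair() stands in for a real network link, and the document table is hypothetical; a real web server would listen on a network port instead.

```python
import socket
import threading

# Hypothetical data the server can hand out.
DOCUMENTS = {"index.html": "<h1>Welcome</h1>"}

def serve_one(conn):
    """Server side: read one request, reply with the data (or an error)."""
    with conn:
        name = conn.recv(1024).decode()
        conn.sendall(DOCUMENTS.get(name, "not found").encode())

def request(name):
    """Client side: send a request and return the server's reply."""
    client, server = socket.socketpair()  # stand-in for a network connection
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    with client:
        client.sendall(name.encode())
        reply = client.recv(1024).decode()
    worker.join()
    return reply

print(request("index.html"))
```
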
Internet Addresses…
Addressing Computers

Unique IP numbers
• Need for it? – similar to the house address
Gateways
• Take care of routing packets in and out of a LAN
Routers
• Take care of routing packets across multiple network nodes
Internet Addresses

DNS (Domain Name System) maps
names to IP addresses

Address: 128.233.130.63 is www.cs.usask.ca
Address: 216.239.51.101 is www.google.com
Easier to remember strings of letters than
numbers.
DNS servers

Arranged in a hierarchy - 4 top level servers in US
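A toy version of a name lookup, using a local table in place of real DNS servers. The two name/address pairs are the ones quoted in the slides and may no longer be current; a real program would ask the system resolver, for example with socket.gethostbyname.

```python
import ipaddress

# Name/address pairs quoted in the slides (possibly out of date).
RECORDS = {
    "www.cs.usask.ca": "128.233.130.63",
    "www.google.com": "216.239.51.101",
}

def resolve(name):
    """Look a name up in the local table and validate the dotted quad."""
    address = RECORDS[name.lower()]        # names are case-insensitive
    return ipaddress.IPv4Address(address)  # raises if a part is not 0-255

def octets(address):
    """Split an IPv4 address into its four numbers."""
    return list(address.packed)

print(resolve("www.cs.usask.ca"), octets(resolve("www.cs.usask.ca")))
```
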
Internet Domains
Internet addresses are classified by Domains
Top level domains include:
• .edu - educational sites
• .com - commercial sites
• .gov - government sites
• .mil - military sites
• .org - nonprofit organizations
• .ca - Canada
Multiple computers can be mapped on to the same domain name

Eg. www.yahoo.com
Web Addresses
Dissecting Web Page address:
http://www.vote-smart.org/help/database.html
• http:// - Protocol for Web pages
• www.vote-smart.org - Path to the host
• help/database.html - Resource Page
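The same dissection can be done with Python's standard urllib.parse module. A short sketch, using the address from the slide:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.vote-smart.org/help/database.html")

print(parts.scheme)  # protocol: http
print(parts.netloc)  # host: www.vote-smart.org
print(parts.path)    # resource: /help/database.html
```
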
Addressing Resources

URL: Uniform Resource Locator

A web address like:
• http://www.cs.usask.ca/index.html
A Web server stores webpages and sends pages to client
web browsers on demand.

FTP: File Transfer Protocol

ftp://ftp.cs.usask.ca
Allows users to download and upload files between
remote servers and their computers

Telnet:

telnet://scrooge.usask.ca
Allows users to log in to remote computers.
WWW
World Wide Web not the same
as the internet?
WWW – a definition

The World Wide Web is part of the internet. It
is a collection of multimedia documents created
by organizations and users worldwide.
Documents are linked in a hypertext web that
allows users to explore them with simple mouse
clicks
Surfing the Web

Browser




lets you look at and navigate info on the WWW
Uses HTTP to communicate with web servers
E.g. Netscape Navigator, Internet Explorer,
Mozilla, Opera
HTTP


HyperText Transfer Protocol
A set of rules for exchanging files on the WWW
Cookies
Cookies are files created on your computer by a website to store information about you.
To accept or not?
Benefits:
• stores some of the personal information (repeat info)
• allows pages to be customised to your preferences
• Eg. Layouts, advertisements…
Privacy issues:
• Do you want your browsing patterns to be used by a company / organisation?
Email
Your computer
• Write email to Bob
• Press “Send”
Your mail server
• Gets email
• Sends to Bob’s mail server (may pass through other mail servers)
Bob’s mail server
• Receives email
• Waits until Bob logs on, sends email
Bob’s computer
• Bob logs on
• Checks email
• Receives your message
Email on the Internet

What appears on the
screen depends on the
type of Internet
connection you have
and the mail program
you use.

Popular graphical
email programs
include Eudora,
Outlook and Netscape
Communicator.
Addressing Persons
Examples:
[email protected]
• User President whose mail is stored on the host whitehouse in the government domain of USA
[email protected]
• User abc123 at the server for Computer Science, University of Saskatchewan, Canada.
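An email address reads as user@host. A small sketch of taking one apart; the address used is hypothetical, since the addresses in the examples above did not survive in this transcript.

```python
def split_address(address):
    """Split 'user@host' into the user name and the host's domain labels."""
    user, _, host = address.rpartition("@")
    if not user or not host:
        raise ValueError(f"not an email address: {address!r}")
    return user, host.split(".")

# Hypothetical address, for illustration only.
user, labels = split_address("abc123@mail.example.org")
print(user)    # the person's mailbox name
print(labels)  # host labels; the last one is the top level domain
```
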
Email Protocols
POP vs. IMAP
Online/offline access
• POP: Downloaded locally
• IMAP: Remain on server
Receiving messages
• POP: Queued on server, transferred in same order
• IMAP: Header downloaded, choose messages to transfer
Sending messages
• POP: Use SMTP – different protocol
• IMAP: Done through IMAP
Size of mailbox
• POP: Limited only by local HD
• IMAP: Determined by server
Asynchronous Communication
Mailing lists
• allow you to participate in email discussion groups on special-interest topics.
• emails are sent to the whole group
Message boards and Newsgroups
• public discussion on a particular subject consisting of notes written to a central Internet site
• I-help is a message board
Real-Time Communication

Instant Messaging
• exchanging instant messages with online friends and co-workers
Chat Rooms
• for conversing with multiple people in real-time
Internet telephony
• Used for local and for long-distance toll-free telephone service
Videoconferencing
• for remote face-to-face meetings
Netiquette, Viruses, and
Internet Issues
Email Netiquette

Never say in email something you wouldn’t want quoted
in the news

Cool down, don’t send spur-of-the-moment messages you’ll
soon regret

Give context with Subject lines

Use > to quote messages in responses

Forward messages sparingly



Do not Spam
Actually violation of copyright law
Proof-read!
Email Netiquette

URLs:




Place URLs in < > if they are long
Do not place punctuation marks immediately after a
URL
www.cs.usask.ca.
Attachments:



Send sparingly
Consider whether recipient will be able to read
Don’t overload inboxes
Netiquette


Learn the non-verbal language of the Net :)
Emotions

Surround words by * to *emphasize*

Use CAPITAL letters sparingly
• It is considered SHOUTING and RUDE to write in ALL CAPS


Emoticons
• :) :-) :( ;-) :)- :P
Acronyms
• BTW, LOL, FYI, ROTFL, IMHO, <g>, TTYL
Netiquette - forwards





“forward this to 100 people and Bill Gates will give you $1000”
“Deodorant causes breast cancer”
“This kid is dying – forward to everyone you know as her last wish”
“Sign this petition to save the people of so-and-so”
“Internet cleaning day – please unplug your computer”
THESE ARE HOAXES –
DO NOT FORWARD OR BELIEVE!!!!
If you are still unsure, CHECK FIRST!
http://hoaxbusters.ciac.org
http://www.symantec.com/avcenter/hoax.html
http://www.urbanlegends.com/
Online Community terms…

Lurking:
• Reading, “listening” to the conversation without taking part
Spamming:
• Posting a message numerous times, taking up room and annoying other users
Cancelbot:
• A program running in the background that detects spam and deletes it.
Newbie:
• Someone who is new to being online (or email/internet/etc)
Flaming:
• An angry, nasty, typically vulgar response to someone
Viruses

Viruses
programs that could damage your data and hinder a computer’s
normal functioning.
Can:
• Activate themselves: executable files, boot sector, macros
• Replicate themselves: through e-mail attachments
• Do “something”: destroy contents

Trojan horses
malicious programs disguised as useful software.

Worms
travel across the network and replicate themselves

Antivirus and Security programs check for known viruses
and protect against attacks
Internet Issues: Ethical and
Political Dilemmas

Freedom of Speech?
• Should what is said on the internet be subject to any laws?
• Parental controls?
Privacy
• Should there be limits on what info can be gathered about you online?
Digital divide
• tech haves vs. have-nots
Intellectual Property:
• How do copyright laws apply to online content? Across international boundaries?
• Does the end-use of the copied material make a difference?
Corresponding Textbook Readings
• P. 14 – 18
• P. 36 – 38
• P. 250 – 256
• P. 265 – 277
• P. 288 – 295
• P. 299 – 300
• P. 316 – 318
To Know – Module 3
• Keywords
• How do the internet and web work?
• How do search engines work?
• How does email work?
• Netiquette
• How do viruses work?
• What are some ethical/political/social issues with respect to the internet?