The Digital Deluge - Systems and Computer Engineering

The Digital Deluge
Lecture 4
Learning in Retirement
David Coll
Professor Emeritus
Department of Systems and Computer
Engineering
Winter 2009
Computer Communications
A parallel system of modern networks
started with what some call
“computer communications”,
more properly referred to as
“packet-switched communications” that
grew out of a solution to requirements of the
US DoD for command and control
messaging networks that could survive
nuclear war.
Bursty Data
• The idea of sending information in
packets is based on the observation that
computer data is “bursty”.
• This made perfectly good sense – at the
time.
• After all, the only time when data had to
be communicated was when you hit the
“ENTER” key on your keyboard and the
contents of the keyboard memory buffer
had to be sent to the mainframe, or when
a screen full of characters had to be sent,
or an email sent.
• So, knowing that data is bursty one can
package each burst of data with the
address of the sender and the address of
the recipient written on the face of the
envelope and give it to the network to deliver.
Circuit Sharing
• The idea of using packets – short blocks of
data together with addressing information –
for communications between computers, in
a day when telephone or data line charges
were high, allowed the line to be shared in a
simple way – first you, then me:
• the Alphonse-Gaston protocol for getting
through a narrow doorway.
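The "first you, then me" idea can be sketched in a few lines of Python. This is only an illustration of taking turns on a shared line; the function name share_line and the sample packets are made up for the example and do not come from any real protocol.

```python
from itertools import zip_longest

def share_line(*senders):
    """Interleave packets from several senders onto one line, one packet per turn."""
    missing = object()  # marks a sender that has run out of packets
    line = []
    for turn in zip_longest(*senders, fillvalue=missing):
        # In each turn, every sender that still has a packet sends exactly one.
        line.extend(packet for packet in turn if packet is not missing)
    return line

you = ["you-1", "you-2", "you-3"]
me = ["me-1", "me-2"]
print(share_line(you, me))
```

Because each packet carries its own addressing, the receiver can sort out whose packet is whose even though they arrive interleaved.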
The Origins of Internetworking
• As one of those confluences that occur in
technological evolution, there were groups
of researchers with common backgrounds
(MIT, Berkeley, Stanford) who had used
computers with serial character
communications
– teletype machines as terminals.
• They had extended the access to their
own computers locally over twisted-pair
wiring and remotely over telephone lines
with rudimentary modems, as a
convenience that led to “time-shared”
computing.
• So, they took to interconnecting their local
networks across the continent to keep in
contact and, being lazy souls, developed
the software that enabled their
activities for exchanging email and files.
• On the assumption that computer output
came only in bursts
– when the “Enter” key was depressed
• they invented ways, which met the DoD
requirements, to transmit data in short,
intermittent packages called packets.
• the idea grew …
• and became a dominant mode of data
communications.
• Having a network that was designed for
“bursty” data created a problem when it
came to transmitting “streaming” data, like
voice, audio or television.
• So, means were developed to effectively
create “circuits” through the packet
networks
• By (for nerds) allocating resources at each
switch to allow unimpeded transmission
through them for every stream.
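For the nerds, the idea of allocating resources at each switch might be sketched as follows. This is a minimal sketch, not how any particular network does it: the switch names, the capacity units and the function reserve_circuit are all assumptions made for illustration.

```python
def reserve_circuit(capacity, path, rate):
    """Reserve `rate` units at every switch on `path`, or reserve nothing at all.

    capacity: dict mapping switch name -> spare capacity (hypothetical units)
    path:     list of switch names the stream passes through
    """
    if any(capacity[switch] < rate for switch in path):
        return False  # some switch cannot carry the stream, so refuse it
    for switch in path:
        capacity[switch] -= rate  # set the resources aside for this stream
    return True

spare = {"A": 10, "B": 4, "C": 10}
print(reserve_circuit(spare, ["A", "B", "C"], 3))  # first stream
print(reserve_circuit(spare, ["A", "B", "C"], 3))  # second stream, same path
print(spare)
```

The all-or-nothing check is the design point: a stream is only admitted if every switch along its path can set aside room for it, which is what makes the path behave like a circuit.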
TCP/IP
• From this beginning, the internetworking
protocols have grown into what is now
known as the TCP/IP* or Internet Protocol
Suite.
• Simply referred to as “IP”
• The governing body for this suite of
protocols is the IETF – the Internet
Engineering Task Force.
• The effort was funded, and enabled by the
US DoD through ARPA.
• The packet-switching concept grew into a
suite of protocols, called the Department of
Defense Protocol Standards, developed as
part of the ARPANET.
• *TCP/IP: Transmission Control Protocol/Internet Protocol
DoD Requirements
• The requirements originated with
ARPANET and DDN, to meet the DoD's
needs for:
– Survivability
– Availability
– Security
– Network interoperability
– Handle surge traffic
– Support priority traffic
The Internet
• The Internet, or more correctly,
• a digital packet-switched network of
networks using the TCP/IP set of
protocols,
• is
– A collection of hosts (computers and
other information devices)
– connected by a variety of digital
communications networks, consisting
of
• digital communication links and
• switches, which are called routers.
An Internet
Protocols
• The transmission of information on the
Internet as binary information, 1’s and 0’s,
follows rules specified by standardized
procedures called protocols.
The TCP/IP Suite of Protocols
• Originated with ARPANET and DDN to
meet US DoD requirements
– Survivability
• no central point of failure
– Security
– Network interoperability
• accommodate heterogeneous networks and
equipment
– Ability to handle surge traffic
– Allow priority
– Be always available
Sending Information on the Internet
• Every entity (computer or router) on
the Internet has an address.
• The address is a 32-bit binary number
– 10000110011101011111111011100011
• The IP address is usually expressed
in dotted decimal form
– 134.117.254.227
• Part of the address is the NetID and
the remainder the HostID
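The conversion between the 32-bit form and the dotted decimal form can be checked with Python's standard ipaddress module. This is a small sketch; the helper names are made up, and the 16-bit NetID/HostID split is an assumption for the example, since the real boundary depends on how the network is configured.

```python
import ipaddress

def to_bits(dotted):
    """Return the 32-bit binary string for a dotted-decimal IPv4 address."""
    return format(int(ipaddress.IPv4Address(dotted)), "032b")

def to_dotted(bits):
    """Return the dotted-decimal form of a 32-bit binary string."""
    return str(ipaddress.IPv4Address(int(bits, 2)))

def split_address(dotted, net_bits=16):
    """Split an address into (NetID, HostID) bit strings.

    net_bits=16 is an assumption for this example only.
    """
    bits = to_bits(dotted)
    return bits[:net_bits], bits[net_bits:]

address = "134.117.254.227"
print(to_bits(address))
print(to_dotted(to_bits(address)))
print(split_address(address))
```

Each of the four decimal numbers is simply one group of 8 bits, which is why each lies between 0 and 255.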
Packets
• The message to be sent is packaged into
packets
• A packet is like an envelope that holds a
block of data
• Each packet includes
– The address of the sender
– The address of the destination
• If the message is short
– it is sent in one packet
• If not
– it is fragmented and
– sent in a succession of packets.
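The packaging step can be sketched as below. This is an illustration under stated assumptions, not the real IP packet format: the 8-byte limit, the sequence number (added here so the message can be put back together) and the destination address 192.0.2.1 (an address reserved for documentation) are all choices made for the example.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    source: str       # address of the sender
    destination: str  # address of the destination
    sequence: int     # position in the message (an assumption, to allow reassembly)
    data: bytes       # the block of data carried in the "envelope"

def packetize(message, source, destination, max_data=8):
    """Split `message` into packets holding at most `max_data` bytes each."""
    chunks = [message[i:i + max_data] for i in range(0, len(message), max_data)]
    return [Packet(source, destination, n, chunk) for n, chunk in enumerate(chunks)]

def reassemble(packets):
    """Rebuild the message, even if the packets arrived out of order."""
    return b"".join(p.data for p in sorted(packets, key=lambda p: p.sequence))

packets = packetize(b"The Digital Deluge", "134.117.254.227", "192.0.2.1")
for p in packets:
    print(p)
print(reassemble(reversed(packets)))
```

A short message fits in one packet; a longer one becomes several, each carrying the same pair of addresses.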
Routing (Switching)
• Packets are sent independently from router
to router
• Each router determines the next router to
be used by consulting a local directory
called a routing table
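A routing table lookup might look something like this. It is a sketch only: the table entries and router names are hypothetical, and the rule of preferring the most specific matching network is an assumption added here, not something stated on the slide.

```python
import ipaddress

# Hypothetical routing table: destination network -> next router.
ROUTING_TABLE = {
    ipaddress.ip_network("134.117.0.0/16"): "router-A",
    ipaddress.ip_network("134.117.254.0/24"): "router-B",
    ipaddress.ip_network("0.0.0.0/0"): "default-router",  # matches everything
}

def next_router(destination, table=ROUTING_TABLE):
    """Pick the next router using the most specific matching network."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in table if address in network]
    best = max(matches, key=lambda network: network.prefixlen)
    return table[best]

print(next_router("134.117.254.227"))
print(next_router("134.117.1.1"))
print(next_router("192.0.2.1"))
```

Note that the router only decides the next step; it does not need to know the whole route to the destination.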
Routing
• OK, so how do routers know the routes to
support this internetworking idea?
• Each router in the network has software
that asks every adjacent router to provide
routing information about all the routers that
it is connected to
• This takes place – continuously throughout the entire internetwork!
• This is one of those ideas that is
• Great in concept,
• Obviously impractical in practice
• BUT it is the method used in the Internet
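The "ask your neighbours" idea can be sketched as follows. The slides do not name a specific routing protocol, so this is an assumption: a simplified exchange in the spirit of distance-vector routing, with made-up router names and link costs.

```python
def exchange_routes(links):
    """Let each router learn routes by repeatedly asking its neighbours.

    links: dict mapping router -> {neighbour: link cost}
    Returns a dict mapping router -> {destination: (cost, next router)}.
    """
    # Each router starts out knowing only itself and its direct neighbours.
    tables = {router: {router: (0, router)} for router in links}
    for router, neighbours in links.items():
        for neighbour, cost in neighbours.items():
            tables[router][neighbour] = (cost, neighbour)

    changed = True
    while changed:  # keep exchanging until nobody learns anything new
        changed = False
        for router, neighbours in links.items():
            for neighbour, link_cost in neighbours.items():
                # The router asks this neighbour for everything it can reach.
                for dest, (cost, _) in list(tables[neighbour].items()):
                    new_cost = link_cost + cost
                    known = tables[router].get(dest)
                    if known is None or new_cost < known[0]:
                        tables[router][dest] = (new_cost, neighbour)
                        changed = True
    return tables

links = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1, "D": 2},
    "D": {"C": 2},
}
for destination, route in sorted(exchange_routes(links)["A"].items()):
    print(destination, route)
```

In this example router A never talks to D directly, yet it ends up with a route to D learned second-hand through its neighbours.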
As the Post Office Does
• The first station (router) knows which
circuit to put your letter (packet) on to
get it on its way to the local
distribution center (switch) which
knows which circuits go to Montreal
and points East and North, and which
go to Toronto, and points West and
South.
• A similar situation is true of every
switch the packet goes through.
When it gets to its final destination,
the local host gets it to you.
Recent Status
• The telecommunication industry and its
clients – the world’s telephone companies –
using circuit-emulating, TDM digital
networks, and
• Computer operators using best effort,
TCP/IP packet switched routers to
interconnect networks, mostly consisting of
Ethernet-based local area networks.
More Recent Status
• Recently (which means really recently) we
have the computer communications
community, i.e., the Internet providers
• moving very quickly
• to introduce new technology (equipment and
methodologies) that improves the quality of
service in ‘best effort’ TCP/IP networks
• so that they can carry the output of
continuous sources such as voice and video.
• This is done by creating the equivalent
of end-to-end circuits in the IP
networks.
Convergence
The world’s networks are converging
towards the use of an ever-expanding
TCP/IP suite of protocols with new
mechanisms to provide circuit-like
connections that are capable of
carrying streaming information such
as voice and television to subscribers.
Convergence (again)
• Convergence used to be a future for
communications.
• Today it is the present.
• We are moving to a single portal
– For all services
• radio, TV, telephone, email, file transfer, www
access, space management, personal
command and control, etc.
– Offered by a single network
• wired or wireless or both.
• The improvement in quality has
been so rapid in fact that the world’s
telecommunications companies
(carriers) and cable TV companies
have deployed packet switches in
their core networks and use IP
LAN* technologies (Ethernet) in
their wired and wireless local loops.
• The introduction of IP technology
into the carrier networks has also
been driven by the flexibility and
power it affords.
*Local Area Networks
[Figure: THE INTERNET: TCP/IP protocols, Domain Name Service (DNS), packet-switched networks]
email: Simple Mail Transfer Protocol
(SMTP)
File Transfer Protocol (FTP)
telnet: remote terminal access
Internet Control Message Protocol (ICMP)
The World Wide Web
• The World Wide Web is a system of
inter-linked, hypertext documents
accessed via the Internet.
• With a web browser, a user can view
web pages that may contain text, images,
videos, and other multimedia and
navigate between them using
hyperlinks.
• The World Wide Web was created in 1989
by Berners-Lee and Walker from the UK,
and R. Cailliau from Belgium, working at
CERN in Geneva, Switzerland.
The WWW – according to Webopedia
• A system of Internet servers that
support specially formatted documents.
• The documents are formatted in a
markup language called HTML
(Hypertext Markup Language) that
supports links to other documents, as
well as graphics, audio, and video files.
• This means you can jump from one
place in a document to another, or to
another document, simply by clicking
on activated spots in the document.
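To see what those "activated spots" look like underneath, here is a small sketch that pulls the links out of an HTML document using Python's standard html.parser module. The sample page and the class name LinkCollector are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the targets of all hyperlinks (<a href="...">) in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A made-up page used only for illustration.
page = """
<html><body>
  <h1>The Digital Deluge</h1>
  <p>See <a href="lecture4.html">Lecture 4</a> or
     jump to <a href="#packets">the section on packets</a>.</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

The first link points to another document and the second to another place in the same document, the two kinds of jump described above.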
The WWW and The Internet
• The World Wide Web, often just
called “The Web”, and The
Internet are not the same thing.
• In fact, an Internet and “The
Internet” are not the same thing.
• There are public and private
Internets.
As Webopedia states it:
• “The World Wide Web, or simply Web, is a
way of accessing information over the
medium of the Internet.
• It is an information-sharing model that is
built on top of the Internet.”
Internet Services
• The Web is just one of the ways that
information can be disseminated over the
Internet.
• The Web is only one of many services that
use the communications network of the
public Internet.
• Others include email, instant
messaging, Voice Over IP (VoIP), file
transfer, Video-on-Demand, RSS feeds
…and so on
[Figure: THE WORLD WIDE WEB (hypertext documents, HTML, HTTP) and THE INTERNET (TCP/IP protocols, DNS, packet-switched networks)]
Again, and moving on …
• There are applications* called web
browsers that make it easy to find
information on the World Wide Web,
and to display it.
• Two of the most popular browsers are
Microsoft's Internet Explorer and
Mozilla Firefox.
• Others are
– Safari,
– Opera, and
– Netscape
* “application” is another word for
Web Browser
• Web browsers allow a user to quickly and
easily access information provided on many
Web pages at many websites by traversing
these links.
• Web browsers format HTML information for
display, so the appearance of a Web page
may differ between browsers.
Market Share - Browsers
September 23, 2008
Data creation outstrips storage for first time: IDC - It's not
all doom and gloom, however
By Jon Brodkin, Framingham | Monday, 24 March, 2008
• “Digital information is being created at a
faster pace than previously thought, and for
the first time, the amount of digital
information created each year has exceeded
the world's available storage space,
according to a report from analyst firm IDC.
• "This is our first time ... where we couldn't
store all the information we create even if
we wanted to," states the EMC-sponsored
report, titled The Diverse and Exploding
Digital Universe.
• “The amount of information created,
captured and replicated in 2007 was 281
exabytes (or 281 billion gigabytes), 10%
more than IDC previously believed — and
more than the 264 exabytes of the
estimated available storage on hard drives,
tapes, CDs, DVDs and memory.
• “IDC revised its estimate upward after
realising it had underestimated shipments
of cameras and digital TVs, as well as the
amount of information replication.
• “The 2007 total is well above that of 2006,
when 161 exabytes of digital information
was created or replicated.
• “The world isn't actually running out of
storage space, IDC notes, because a lot of
digital information doesn't need to be
stored.
• “Examples include
– radio and TV broadcasts consumers listen to
and watch but don't record,
– voice call packets that aren't needed when a
call is over, and
– surveillance video that isn't saved.
• “But the gap between available storage and
digital information will only grow, making it
that much harder for vendors and users to
efficiently store information that does need
to be archived.
• “In 2011 there will be nearly 1,800 exabytes
of information created, twice the amount of
available storage, IDC predicts.
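The figures quoted so far can be put side by side with a little arithmetic. This is only a back-of-the-envelope sketch: the inputs are the numbers quoted in the article, while the derived values (the earlier estimate and the implied 2011 storage) are rough inferences from the "10% more" and "twice the amount" statements, not figures from IDC.

```python
created_2006 = 161    # exabytes created or replicated in 2006
created_2007 = 281    # exabytes created or replicated in 2007
storage_2007 = 264    # exabytes of available storage in 2007
created_2011 = 1800   # exabytes predicted for 2011 ("nearly 1,800")

GB_PER_EB = 10**9     # 1 exabyte = one billion gigabytes (decimal units)

shortfall_2007 = created_2007 - storage_2007
growth_2006_to_2007 = (created_2007 - created_2006) / created_2006
earlier_estimate_2007 = created_2007 / 1.10   # "10% more than previously believed"
implied_storage_2011 = created_2011 / 2       # "twice the amount of available storage"

print(f"2007 total in gigabytes: {created_2007 * GB_PER_EB:,}")
print(f"2007 shortfall: {shortfall_2007} exabytes")
print(f"Growth 2006 to 2007: {growth_2006_to_2007:.0%}")
print(f"Earlier 2007 estimate: about {earlier_estimate_2007:.0f} exabytes")
print(f"Implied 2011 storage: about {implied_storage_2011:.0f} exabytes")
```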
• “One long-term experiment planned for the
soon-to-open Large Hadron Collider, the
world's biggest particle acclerator [sic], in
Switzerland by itself will create an amazing
300 exabytes of data per year, IDC says.
• “EMC's president of content management,
Mark Lewis, doesn't think the world will ever
hit the point where the world's available
storage is exceeded by the amount of
information organisations need to store.
• “Organisations and their employees create
about a third of new data, but organisations
are ultimately responsible for maintaining
the security, privacy and reliability of 85% of
all data, according to IDC.
• “About 70% of new information is created
when individuals take actions, such as
– snapping pictures,
– making VoIP calls,
– uploading content to YouTube and
– sending emails.
• “But more than half of the information
related to individuals isn't directly created by
them.
Sources
• BioInformatics
• In the last few decades, advances in
molecular biology and the equipment
available for research in this field have
allowed the increasingly rapid
sequencing of large portions of the
genomes of several species.
• In fact, to date, several bacterial
genomes, as well as those of some
simple eukaryotes (e.g., Saccharomyces
cerevisiae, or baker's yeast) have been
sequenced in full.
• The Human Genome Project, designed to
sequence all 24 of the human
chromosomes, is also progressing. Popular
sequence databases, such as GenBank
and EMBL, have been growing at
exponential rates. This deluge of
information has necessitated the careful
storage, organization and indexing of
sequence information. Information science
has been applied to biology to produce the
field called Bioinformatics.
• The simplest tasks used in bioinformatics
concern the creation and maintenance of
databases of biological information. Nucleic
acid sequences (and the protein sequences
derived from them) comprise the majority of
such databases. While the storage and
organization of millions of nucleotides is far
from trivial, designing a database and
developing an interface whereby
researchers can both access existing
information and submit new entries is only
the beginning.
• The most pressing tasks in bioinformatics
involve the analysis of sequence
information. Computational Biology is the
name given to this process, and it involves
the following:
• Finding the genes in the DNA sequences of
various organisms
• Developing methods to predict the structure
and/or function of newly discovered proteins
and structural RNA sequences.
• Clustering protein sequences into families
of related sequences and the development
of protein models.
• Aligning similar proteins and generating
phylogenetic trees to examine evolutionary
relationships.
Digital Shadow
• “Rather, the bulk of this digital content is a
person's "digital shadow", information about
individual human beings sitting in
cyberspace.
– digital surveillance photos,
– web search histories,
– banking and medical records and
– general backup data
• all contribute to someone's digital shadow.
• “Here's a quick look from IDC at how a few
businesses and industries contribute to
growing data volumes
• Wal-Mart refreshes its customer databases
hourly, adding a billion new rows of data
each hour to a data warehouse that already
holds 600 terabytes.
• The oil and gas industry is developing a
"digital oilfield" to monitor exploration
activity. Chevron's system accumulates 2
terabytes of new data each day.
• The utility industry may develop an
"intelligent grid" with millions of sensors in
the distribution system and power meters.
• Manufacturing companies are rapidly
deploying digital surveillance cameras and
RFID tracking.
• YouTube's 100 million users create nearly
as much digital information as all medical
imaging operations.