Updating Computer Science Education

Jacques Cohen
Brandeis University
Waltham, MA
USA
January 2007
Topics

- Preliminary remarks
- Present state of affairs and concerns
- Objectives of this talk
- Trends (hardware, software, networks, others)
- Illustrative examples
- Suggestions
Present state of affairs and concerns

- Huge increase in PC and internet usage.
- Decreasing enrollment (mainly in the USA).
Possible Reasons

- Previous high school preparation
- The bubble burst (2000) and outsourcing
- Widespread usage of computers by lay persons
- Interest in interdisciplinary topics (e.g., biology, business, economics)
- Public perception about: What is Computer Science?
The Nature of Computer Science

Two main components:
- Theoretical (Mathematics)
- Experimental (Engineering)

What characterizes CS is the notion of Algorithms, with an emphasis on the discrete and on logic.

An interdisciplinary approach with other sciences may well revive interest in the continuous (or the use of qualitative reasoning).
Related fields

- Sciences in general (scientific computing)
- Management
- Psychology (human interaction)
- Business
- Communications
- Journalism
- Arts, etc.
The role of Computer Science among other sciences
(How we are perceived by the other sciences)

- In physics, chemistry, and biology, nature is the ultimate umpire; discovery is paramount.
- In math and engineering, aesthetics, ease of use, acceptance, and permanence play key roles.
Uneasy dialogue with biologists

It is not unusual to hear from a physicist, chemist, or biologist: "If computer scientists do not get involved in our field, we will do it ourselves!!"

It looks very likely that the biological sciences (including, of course, neuroscience) will dominate the 21st century.
Differences in approaches

- Most scientific and creative discoveries proceed in a bottom-up manner.
- Computer scientists are taught to emphasize top-down approaches.
- Polya's "How to Solve It" often suggests: first specialize, then generalize.
- Hacking is beautiful (mostly bottom-up).
Objectives

Provide a bird's-eye view of what is happening in CS education (in the USA) and attempt to make recommendations about possible directions. Hopefully, some of it will be applicable to European universities.
Premise

- Changes ought to be gradual and depend on resources and time constraints.
- First we have to observe current trends: generality, storage, speed, networks, and others.
- Trying to make sense of present directions; it is difficult and risky to foresee the future, e.g., the PC (windows, mouse), the internet, parallelism.
Topics influencing computer science education: trends in hardware, software, and networks.
Huge volume of data (terabytes and petabytes)

- Statistical nature of data
- Clustering, classification
- Probability and statistics become increasingly important
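Clustering is one of the basic statistical operations such data volumes call for. As an illustrative sketch (not from the talk), here is a minimal one-dimensional k-means in pure Python:

```python
# Minimal 1-D k-means sketch: alternately assign points to their nearest
# center, then move each center to the mean of its cluster.
def kmeans_1d(points, k, iters=20):
    # Naive initialization: the first k distinct values in sorted order.
    centers = sorted(set(points))[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center.
            i = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

print(kmeans_1d([1.0, 1.2, 0.8, 9.9, 10.1, 10.0], k=2))  # centers near 1.0 and 10.0
```

Real data would of course be high-dimensional and call for a library implementation, but the loop above is the whole idea.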
Trend towards generality

We need to know more about what is going on in related topics. A few examples:
- Robotics and mechanical engineering
- Hardware, electrical engineering, materials science, nanotechnology
- Multi-field visualization (e.g., medicine)
- Biophysics and bioinformatics
Nature of data structures

- Sequences (strings), streams
- Trees, DAGs, and graphs
- 3D structures
- Emphasis on discrete structures
- Neglect of the continuous should be corrected (e.g., by using MATLAB)
Trends in data growth
How Much Information Is There In the World?

The 20-terabyte estimate for the size of the Library of Congress is derived by assuming that LC has 20 million books, each requiring 1 MB. Of course, LC has much other material besides printed text, and that material would take much more space.

From Lesk, http://www.lesk.com/mlesk/ksg97/ksg.html

Library of Congress data (cont.)
1. Thirteen million photographs, even if compressed to a 1 MB JPEG each, would be 13 terabytes.
2. The 4 million maps in the Geography Division might scan to 200 TB.
3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each would be almost 2,000 TB.

This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).
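The tallies above can be checked with a short script; the per-item sizes are the slide's own assumptions (1 MB per book and photo, roughly 50 MB per scanned map, 1 GB per movie, and about 650 MB per audio CD):

```python
# Rough tally of Library of Congress holdings, in terabytes.
# Per-item sizes follow the slide's assumptions.
holdings_tb = {
    "books":      20_000_000 * 1e-6,   # 20 million books at ~1 MB each  -> 20 TB
    "photos":     13_000_000 * 1e-6,   # 13 million photos at ~1 MB JPEG -> 13 TB
    "maps":       4_000_000 * 50e-6,   # 4 million maps at ~50 MB a scan -> 200 TB
    "movies":     500_000 * 1e-3,      # 500,000 movies at ~1 GB each    -> 500 TB
    "recordings": 3_500_000 * 650e-6,  # 3.5 million CDs at ~650 MB each -> ~2275 TB
}

total_tb = sum(holdings_tb.values())
print(f"total = {total_tb:,.0f} TB, about {total_tb / 1000:.1f} PB")
```

The sum comes to roughly 3,000 TB, matching the "about 3 petabytes" figure.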
How Much Information Is There In the World?
Lesk's Conclusions

There will be enough disk space and tape storage in the world to store everything people write, say, perform, or photograph. For writing this is true already; for the others it is only a year or two away.
Lesk's Conclusions (cont.)

The challenge for librarians and computer scientists is to let us find the information we want in other people's work; and the challenge for the lawyers and economists is to arrange the payment structures so that we are encouraged to use the work of others rather than re-create it.
The huge volume of data implies:

- Linearity of algorithms is a must
- Emphasis on pattern matching
- Increased preprocessing
- Different levels of memory transfer rates
- Algorithmic incrementality (avoid redoing tasks)
- Need for approximate algorithms (optimization)
- Distributed computing
- Centralized parallelism (Blue Gene, Argonne)
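As a small illustration of linearity and incrementality together (not from the talk), Welford's method computes a running mean and variance in a single pass, updating in O(1) per item so nothing is redone as data streams in:

```python
# Welford's one-pass method: running mean and variance in linear time and
# constant memory. Each new item is an O(1) update, so nothing is recomputed
# as the stream grows -- linearity and incrementality in one small example.
def running_stats(stream):
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / n if n else 0.0  # population variance
    return n, mean, variance

n, mean, var = running_stats(range(1, 101))
print(n, mean, var)  # mean of 1..100 is 50.5
```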
The importance of pattern matching (searches) in large numbers of items

- Pattern matching has to be "tolerant" (approximate)
- Find closest matches (dynamic programming, optimization)

Applicable to:
- Sequences
- Pictures
- 3D structures (e.g., proteins)
- Sound
- Photos
- Video
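A minimal sketch of "tolerant" matching (illustrative only): slide a window of the pattern's length over the text and keep the position with the fewest mismatches (Hamming distance). The brute force costs O(nm); preprocessing tricks such as BLAST's seeds exist precisely to avoid this cost on large data:

```python
# Approximate ("tolerant") matching sketch: find the window of the text that
# is closest to the pattern under Hamming distance (number of mismatches).
def closest_window(text, pattern):
    m = len(pattern)
    best_pos, best_dist = 0, m + 1
    for i in range(len(text) - m + 1):
        # Count position-by-position mismatches for this window.
        dist = sum(a != b for a, b in zip(text[i:i + m], pattern))
        if dist < best_dist:
            best_pos, best_dist = i, dist
    return best_pos, best_dist

pos, dist = closest_window("ACGTTTGACCTA", "TTGAC")
print(pos, dist)  # exact occurrence at index 4 -> distance 0
```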
Trends in computer cycles (speed)

Moore's law appears to be applicable until at least 2020.
Use of supercomputers (2006)

Researchers at Los Alamos National Laboratory have set a new world record by performing the first million-atom computer simulation in biology. Using the "Q Machine" supercomputer, Los Alamos computer scientists have created a molecular simulation of the cell's protein-making structure, the ribosome. The project, simulating 2.64 million atoms in motion, is more than six times larger than any biological simulation performed to date.

[Slide: graphical visualization of the simulation of a ribosome at work]
Network transmission speed (Lambda Rail Net)

[Slide: map of the USA backbone]
Trends in Transmission Speed

The High Energy Physics team's demonstration achieved a peak throughput of 151 Gbps and an official mark of 131.6 Gbps, beating their previous mark for peak throughput of 101 Gbps by 50 percent.

Trends in Transmission Speed II

The new record data transfer speed is also equivalent to serving 10,000 MPEG2 HDTV movies simultaneously in real time, or transmitting all of the printed content of the Library of Congress in 10 minutes.
Trends in Languages

- Importance of scripting and string processing
- XML, Java, C++; a trend towards Python, MATLAB, Mathematica
- No ideal languages
- No agreement on what the first language ought to be
A recently proposed language (Fortress, 2006)

From Guy Steele, The Fortress Programming Language, Sun Microsystems:
http://iic.harvard.edu/documents/steeleLecture2006public.pdf

Fortress Language (Sun, Guy Steele)
Meta-level approach to teaching

- Learn 2 or 3 languages and assume that expertise in other languages can be acquired on the fly.
- Hopefully, the same will occur in learning a topic in depth: once in-depth research is taught using a particular area, it can be extrapolated to other areas.
- Increasing usage of canned programs or data banks. Typical examples: GraphViz, WordNet.
Trends in Algorithmic Complexity

- Overcoming the scare of NP problems (it happened before with undecidability)
- 3-SAT lessons
- Mapping polynomial problems within NP
- Optimization, approximate, or random algorithms
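One lesson of 3-SAT is how far randomness alone gets you: a uniformly random truth assignment satisfies each 3-literal clause with probability 7/8, hence 7/8 of all clauses in expectation, with no search at all. A hedged sketch (the literal encoding is my own convention, not from the talk):

```python
import random

# Randomized approximation for MAX-3-SAT: a uniformly random assignment
# fails a 3-literal clause only when all three literals are false, so each
# clause is satisfied with probability 7/8. Literal encoding (my convention):
# v > 0 means variable v-1 is true; v < 0 means its negation.
def random_assignment_score(clauses, n_vars, trials=2000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        assign = [rng.random() < 0.5 for _ in range(n_vars)]
        total += sum(any((lit > 0) == assign[abs(lit) - 1] for lit in cl)
                     for cl in clauses)
    return total / trials  # average number of satisfied clauses

clauses = [(1, 2, 3), (-1, 2, -3), (1, -2, 3), (-1, -2, -3)]
print(random_assignment_score(clauses, n_vars=3))  # close to 4 * 7/8 = 3.5
```

Derandomizing this expectation is the classic 7/8-approximation; the point for a curriculum is that probability turns a scary NP-hard problem into a one-line algorithm with a provable guarantee.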
Three Examples

- Example I: the lessons of BLAST (preprocessing, incrementality, approximation)
- Example II: the importance of analyzing very large networks (probability, sensors, sociological implications)
- Example III: time series (data mining, pattern searches, classification)
Example I (History of BLAST): sequence alignment

- Biologists matched sequences of nucleotides or amino acids empirically, using dot matrices
- Dot matrices: no exact matching
- Alignment with gaps

Dynamic Programming Approach

- Dynamic programming: complexity O(n²)
- Two solutions with gaps
- Complexity can be exponential for determining all solutions
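A minimal version of the dynamic-programming alignment (Needleman-Wunsch style; the scoring values here are illustrative choices, not BLAST's):

```python
# Global alignment with gaps by dynamic programming (Needleman-Wunsch style).
# Filling the (n+1) x (m+1) score table is the O(n^2) step mentioned above.
def align_score(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # align a[:i] against all gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # align b[:j] against all gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            dp[i][j] = max(diag,            # match or mismatch
                           dp[i-1][j] + gap,  # gap in b
                           dp[i][j-1] + gap)  # gap in a
    return dp[n][m]

print(align_score("GATTACA", "GCATGCU"))
```

Recovering one optimal alignment takes a traceback through the table; enumerating all optimal alignments is what can blow up exponentially.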
The BLAST approach: complexity is almost linear

The equivalent dot matrix would have 3 billion columns (the human genome) and Z rows, where Z is the size of the sequence being matched against the genome (possibly tens of thousands).
BLAST Tricks

- Preprocessing: compile the locations in a genome containing all possible "seeds" (combinations of 6 nucleotides or amino acids)
- Hacking: follow diagonals as much as possible (the BLAST strategy)
- Use dynamic programming as a last resort
Lots of approximations, but a very successful outcome

- No multiple solutions
- BLAST may not find the best matches
- The notion of p-values becomes very important (the probability of matches in random sequences)
- Tuning of the BLAST algorithm parameters
- Mixture of hacking and theory
- Advantage: satisfies incrementality
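A toy sketch of the preprocessing trick (with k=3 rather than BLAST's longer seeds, and without the diagonal extension or the local dynamic programming that BLAST adds on top):

```python
from collections import defaultdict

# BLAST-style preprocessing sketch: index every k-mer ("seed") position in the
# genome once, then look up a query's seeds in constant time instead of
# scanning the whole genome for each query.
def build_seed_index(genome, k=3):
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def seed_hits(index, query, k=3):
    # Return (query_offset, genome_position) pairs for every shared seed;
    # real BLAST would extend these hits along diagonals.
    return [(q, g) for q in range(len(query) - k + 1)
                   for g in index.get(query[q:q + k], [])]

genome = "ACGTACGTGACC"
index = build_seed_index(genome)
print(seed_hits(index, "TACG"))
```

The index is built once; every subsequent query pays only for its own seeds, which is the incrementality the slide praises.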
Example II (Networks and Sociology)

- Money travels (bills): probabilities P(time, distance)
- The entire process could be implemented using sensors
- It mimics the spread of disease

"The impact of computing will go deeper into the sciences and spread more into the social sciences." (Jon Kleinberg, 2006)
Example III (Time Series)

Illustrates data mining and how much CS can help other sciences.

Slides from Dr. Eamonn Keogh, University of California, Riverside, CA.

[Slides: examples of time series, parts 1-5]
Using Logic Programming in Multivariate Time Series (Sleep Apnea)
from G. Guimarães and L. Moniz Pereira

[Figure: multivariate recording (04:00:00 to 04:04:00) of airflow, ribcage movements, abdominal movements, and snoring, annotated with events such as "strong airflow with snoring", "no airflow without snoring", "strong ribcage and abdominal movements", "reduced ribcage and abdominal movements without snoring", and tacet periods.]
Back to curricula recommendations

Present status (USA) and suggested changes

Current recommended curricula: ACM, SIGCSE 2001 (USA)
1. Discrete Structures (43 core hours)
2. Programming Fundamentals (54 core hours)
3. Algorithms and Complexity (31 core hours)
4. Programming Languages (6 core hours)
5. Architecture and Organization (36 core hours)
6. Operating Systems (18 core hours)
7. Net-Centric Computing (15 core hours)
8. Human-Computer Interaction (6 core hours)
9. Graphics and Visual Computing (5 core hours)
10. Intelligent Systems (10 core hours)
11. Information Management (10 core hours)
12. Software Engineering (30 core hours)
13. Social and Professional Issues (16 core hours)
14. Computational Science (no core hours)
From Domik, G.: Glimpses into the Future of Computer Science Education, University of Paderborn, Germany
Changing Curricula

Two extremes:
- Increased generality and limited depth
- Limited generality and increased depth

[Slide: the two extremes in graphical form, plotting breadth (generality) against depth]
The MIT pilot program for freshmen

- At MIT there is a unified EECS department
- Two choices for the first-year course:
  - Robotics using probabilistic Bayesian approaches (CS)
  - Study of cell phones inside out (EE)
Concrete suggestions I

- Teaching is inextricably linked to research.
- Time and resources govern curriculum changes.
- Gradual changes are essential.
- Avoid overlap of material among different required courses.
- If possible, introduce an elective course on current trends in computer science.
- Deal with massive data even in intro courses.
Concrete suggestions II

When teaching algorithms, stress the potential of:
- Preprocessing
- Incrementality
- Parallelization
- Approximations
- Taking advantage of sparseness
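For instance, sparseness (an illustrative sketch, not from the talk): storing only the nonzero entries makes a dot product cost time proportional to the nonzeros rather than to the nominal dimension:

```python
# Taking advantage of sparseness: represent a vector as {index: value} for
# its nonzero entries only. A dot product then touches only the nonzeros,
# not the full (possibly million-long) dimension.
def sparse_dot(u, v):
    # Iterate over the smaller dict and probe the larger one.
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v[i] for i, val in u.items() if i in v)

# Two vectors of nominal dimension 1,000,000 with a handful of nonzeros each:
u = {0: 2.0, 500_000: 3.0, 999_999: 1.0}
v = {500_000: 4.0, 7: 5.0}
print(sparse_dot(u, v))  # 3.0 * 4.0 = 12.0
```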
Concrete suggestions III

Emphasize probability and statistics:
- Bayesian approaches
- Hidden Markov models
- Random algorithms
- Clustering and classification
- Machine learning and data mining
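A tiny Bayes'-rule example of the kind of probabilistic reasoning the suggestion points to (the prevalence and test accuracies are illustrative numbers):

```python
# Bayes' rule on a classic diagnostic setup: 1% prevalence, a test with
# 95% sensitivity and 90% specificity. The posterior probability of disease
# given a positive test is far lower than most people's intuition.
def posterior(prior, sensitivity, specificity):
    # P(positive) = P(pos | disease) P(disease) + P(pos | healthy) P(healthy)
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    # P(disease | positive) by Bayes' rule.
    return sensitivity * prior / p_pos

p = posterior(prior=0.01, sensitivity=0.95, specificity=0.90)
print(f"P(disease | positive) = {p:.3f}")  # about 0.088
```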
Finally, …

Encourage interdisciplinary work. It will inspire new directions in computer science.

Thank you!!
Future of Computer Intensive Science in the U.S. (Daniel Reed, 2006)

Ten years: a geological epoch on the computing time scale. Looking back, a decade brought the web and consumer email, digital cameras and music, broadband networking, multifunction cell phones, WiFi, HDTV, telematics, multiplayer games, electronic commerce, and computational science.

It also brought spam, phishing, identity theft, software insecurity, outsourcing and globalization, information warfare, and blurred work-life boundaries. What will a decade of technology advances bring in communications and collaboration, sensors and knowledge management, modeling and discovery, electronic commerce and digital entertainment, critical infrastructure management and security?

What will it mean for research and education?

Daniel A. Reed is the director of the Renaissance Computing Institute. He is also Chancellor's Eminent Professor and Vice-Chancellor for Information Technology at the University of North Carolina at Chapel Hill.
Cyberinfrastructure and Economic Curvature: Creating Curvature in a Flat World (Sangtae Kim, Purdue, 2006)

Cyberinfrastructure is central to scientific advancement in the modern, data-intensive research environment. For example, the recent revolution in the life sciences, including the seminal achievement of sequencing the human genome on an accelerated time frame, was made possible by parallel advances in cyberinfrastructure for research in this data-intensive field.

Beyond the enablement of basic research, cyberinfrastructure is a driver for global economic growth despite the disruptive 'flattening' effect of IT in the developed economies. Even at the regional level, visionary cyber investments to create smart infrastructures will induce 'economic curvature', a gravitational pull to overcome the dispersive effects of the 'flat' world, with a consequential acceleration in economic growth.
Miscellaneous I

- Claytronics
- Game theory (economics, psychology)
- Other examples in bioinformatics
- Beautiful interaction between sequences (strings) and structures
- Reverse engineering
- In biology, geography and phenotype (external structural appearance) are of paramount importance
- Systems biology

Miscellaneous II

- Crossword puzzles using Google
- Skiena and statistical NLP