How is new knowledge discovered?
Statistics in WR: Lecture 1
• Key Themes
– Knowledge discovery in hydrology
– Introduction to probability and statistics
– Definition of random variables
• Reading: Helsel and Hirsch, Chapter 1
How is new knowledge discovered?
After completing the Handbook of Hydrology in 1993, I asked myself the question: how is new knowledge discovered in hydrology?
I concluded:
• By deduction from existing knowledge
• By experiment in a laboratory
• By observation of the natural environment
Deduction – Isaac Newton
• Deduction is the classical path of mathematical physics
– Given a set of axioms
– Then by a logical process
– Derive a new principle or equation
• In hydrology, the St Venant equations for open channel flow and Richards' equation for unsaturated flow in soils were derived in this way.
Three laws of motion and the law of gravitation (1687)
http://en.wikipedia.org/wiki/Isaac_Newton
Experiment – Louis Pasteur
• Experiment is the classical path of laboratory science: a simplified view of the natural world is replicated under controlled conditions
• In hydrology, Darcy's law for flow in a porous medium was found this way.
Pasteur showed that microorganisms cause disease and developed vaccines against anthrax and rabies
Foundations of scientific medicine
http://en.wikipedia.org/wiki/Louis_Pasteur
Observation – Charles Darwin
• Observation: direct viewing and characterization of patterns and phenomena in the natural environment
• In hydrology, Horton discovered stream scaling laws by interpreting stream maps
On the Origin of Species, published Nov 24, 1859
The most accessible book of great scientific imagination ever written
Conclusion for Hydrology
• Deduction and experiment are important, but hydrology is primarily an observational science
• Measurement data on discharge, water quality, and groundwater are collected to support this.
Great Eras of Synthesis
• Scientific progress occurs continuously, but there are great eras of synthesis: many developments happening at once that fuse into knowledge and fundamentally change the science
[Timeline figure, 1900 to 2020]
Physics (relativity, structure of the atom, quantum mechanics)
Geology (observations of seafloor magnetism lead to plate tectonics)
Hydrology (synthesis of water observations leads to knowledge synthesis)
Hydrologic Science
It is as important to represent hydrologic environments precisely with data as it is to represent hydrologic processes with equations.
Physical laws and principles (Mass, momentum, energy, chemistry)
Hydrologic Process Science (Equations, simulation models, prediction)
Hydrologic conditions (Fluxes, flows, concentrations)
Hydrologic Information Science (Observations, data models, visualization)
Hydrologic environment (Physical earth)
A sea change in computing
• Massive data sets: federation, integration, collaboration
• Evolution of many-core and multicore: parallelism everywhere
• The power of the client + cloud: access anywhere, any time
There will be more scientific data generated in the next five years than in the history of humankind
What will you do with 100 times more computing power?
Distributed, loosely coupled applications at scale across all devices will be the norm
Slide from Jeff Dozier, UCSB
Emergence of a fourth research paradigm
1. A thousand years ago: Experimental Science
  – Description of natural phenomena
2. Last few hundred years: Theoretical Science
  – Newton's Laws, Maxwell's Equations…
3. Last few decades: Computational Science
  – Simulation of complex phenomena
4. Today: Data-Intensive Science
  – Scientists overwhelmed with data sets from many different sources
    • Data captured by instruments
    • Data generated by simulations
    • Data generated by sensor networks
  – eScience is the set of tools and technologies to support data federation and collaboration
    • For analysis and data mining
    • For data visualization and exploration
    • For scholarly communication and dissemination
(With thanks to Jim Gray)
Slide from Jeff Dozier, UCSB
Data Cube – What, Where, When
[Figure: a single data value D located on three axes: Space, L ("Where"); Time, T ("When"); Variable, V ("What")]
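To make the cube concrete, here is a minimal sketch that stores each data value D under its (L, T, V) triple. It assumes string labels for location and time and a plain dictionary as storage; the class name `DataCube` and the site and variable names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Key order follows the slide: (Space L "where", Time T "when", Variable V "what").
Key = Tuple[str, str, str]


class DataCube:
    """Minimal sketch: each data value D sits at one (L, T, V) point."""

    def __init__(self) -> None:
        self._values: Dict[Key, float] = {}

    def put(self, location: str, time: str, variable: str, value: float) -> None:
        self._values[(location, time, variable)] = value

    def get(self, location: str, time: str, variable: str) -> Optional[float]:
        # Returns None when nothing was observed at that point of the cube.
        return self._values.get((location, time, variable))


# Hypothetical example values, not taken from any real record.
cube = DataCube()
cube.put("site_A", "2000-01-01", "discharge_cfs", 1200.0)
cube.put("site_A", "2000-01-01", "total_nitrogen_mgl", 1.4)
print(cube.get("site_A", "2000-01-01", "discharge_cfs"))  # 1200.0
print(cube.get("site_B", "2000-01-01", "discharge_cfs"))  # None
```

A dictionary suits sparse observations well, since most (L, T, V) combinations are never measured.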
Continuous Space-Time Data Model: NetCDF
[Figure: data values D on axes Time, T; Space, L; and Variables, V; labeled with coordinate dimensions {X} and variable dimensions {Y}]
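In a continuous model, each variable is an array over shared dimensions, and coordinate variables give the position along each dimension, so a value is located by coordinate values rather than by a feature identifier. The sketch below imitates that idea in plain Python. It is not the netCDF4 library's API; the dataset layout, the variable name `precip`, and the helpers `nearest_index` and `sample` are hypothetical.

```python
from bisect import bisect_left
from typing import List

# Hypothetical NetCDF-like dataset: coordinate variables define each dimension,
# and a data variable is a nested array over those dimensions.
dataset = {
    "coordinates": {
        "time": [0.0, 1.0, 2.0],               # e.g. days since some origin
        "x": [100.0, 200.0, 300.0, 400.0],     # e.g. distance along a transect
    },
    "variables": {
        "precip": {
            "dims": ("time", "x"),             # 3 x 4 values
            "data": [
                [0.0, 0.1, 0.0, 0.2],
                [1.5, 1.2, 0.8, 0.5],
                [0.0, 0.0, 0.3, 0.0],
            ],
        },
    },
}


def nearest_index(coords: List[float], value: float) -> int:
    """Index of the coordinate closest to value (coords sorted ascending)."""
    i = bisect_left(coords, value)
    if i == 0:
        return 0
    if i == len(coords):
        return len(coords) - 1
    # Ties go to the lower neighbor.
    return i if coords[i] - value < value - coords[i - 1] else i - 1


def sample(ds: dict, name: str, **point: float) -> float:
    """Read a variable at the grid cell nearest to the given coordinate values."""
    var = ds["variables"][name]
    value = var["data"]
    for dim in var["dims"]:
        value = value[nearest_index(ds["coordinates"][dim], point[dim])]
    return value


print(sample(dataset, "precip", time=1.2, x=210.0))  # 1.2
```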
Discrete Space-Time Data Model
[Figure: Time, TSDateTime; Space, FeatureID; Variables, TSTypeID; data value, TSValue]
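In the discrete model, each data value is one row that names its feature, its time series type, and its timestamp. A minimal sketch, assuming a simple in-memory list of rows: the field names map to the slide's TSDateTime, TSValue, FeatureID, and TSTypeID (noted in comments), while the class `TimeSeriesRecord`, the function `get_series`, and all IDs and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class TimeSeriesRecord:
    """One row of a discrete space-time table (slide field names in comments)."""
    feature_id: int         # FeatureID: space, which gage or feature
    ts_type_id: int         # TSTypeID: variable, which kind of series
    ts_datetime: datetime   # TSDateTime: time of the observation
    ts_value: float         # TSValue: the data value itself


def get_series(records: List[TimeSeriesRecord], feature_id: int,
               ts_type_id: int) -> List[TimeSeriesRecord]:
    """Return one variable at one feature, ordered by time."""
    hits = [r for r in records
            if r.feature_id == feature_id and r.ts_type_id == ts_type_id]
    return sorted(hits, key=lambda r: r.ts_datetime)


# Hypothetical IDs and values for illustration only.
rows = [
    TimeSeriesRecord(1, 10, datetime(2001, 1, 2), 950.0),
    TimeSeriesRecord(1, 10, datetime(2001, 1, 1), 900.0),
    TimeSeriesRecord(1, 20, datetime(2001, 1, 1), 1.3),
    TimeSeriesRecord(2, 10, datetime(2001, 1, 1), 400.0),
]
print([r.ts_value for r in get_series(rows, feature_id=1, ts_type_id=10)])  # [900.0, 950.0]
```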
Hydrologic Statistics
Time Series Analysis
Geostatistics
Multivariate analysis
How do we understand space-time correlation fields of many variables?
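One way to read these three methods against the data cube is as three ways of slicing it: fix space and variable to get a time series, fix time and variable to get a spatial field, or fix space and time to get a multivariate record. That mapping is our interpretation, not a statement from the slide. The sketch below assumes a dictionary keyed by (location, year, variable); the function names and the sample values are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical cube: keys are (location, year, variable), values are data values D.
Cube = Dict[Tuple[str, int, str], float]


def time_series(cube: Cube, location: str, variable: str) -> Dict[int, float]:
    """Fix space and variable, let time vary (time series analysis)."""
    return {t: d for (loc, t, var), d in cube.items()
            if loc == location and var == variable}


def spatial_field(cube: Cube, year: int, variable: str) -> Dict[str, float]:
    """Fix time and variable, let space vary (geostatistics)."""
    return {loc: d for (loc, t, var), d in cube.items()
            if t == year and var == variable}


def multivariate_record(cube: Cube, location: str, year: int) -> Dict[str, float]:
    """Fix space and time, let the variable vary (multivariate analysis)."""
    return {var: d for (loc, t, var), d in cube.items()
            if loc == location and t == year}


cube: Cube = {
    ("A", 2000, "flow"): 10.0,
    ("A", 2001, "flow"): 12.0,
    ("B", 2000, "flow"): 5.0,
    ("A", 2000, "total_nitrogen"): 1.1,
}
print(time_series(cube, "A", "flow"))        # {2000: 10.0, 2001: 12.0}
print(spatial_field(cube, 2000, "flow"))     # {'A': 10.0, 'B': 5.0}
print(multivariate_record(cube, "A", 2000))  # {'flow': 10.0, 'total_nitrogen': 1.1}
```

Correlation fields then come from comparing these slices, for example correlating time series at two locations.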
288 USGS sites with flow and nitrogen data
These are sites that were used for the SPARROW model and remained operational through 2008
http://water.usgs.gov/nawqa/sparrow/
Colorado River at Austin, TX (08158000)
Mean Annual Flow
[Figure: Mean Annual Flow, Colorado River at Austin (1929-2008); x-axis: year; y-axis: Discharge (cfs)]
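The slide does not say how the annual means were computed. A minimal sketch of one common approach, assuming a record of (date, daily discharge in cfs) pairs: group the daily values by year and average them, optionally using the USGS water year (October 1 to September 30, named for the year in which it ends). The function name `mean_annual_flow` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_annual_flow(daily: Iterable[Tuple[date, float]],
                     water_year: bool = False) -> Dict[int, float]:
    """Average the daily discharge values (cfs) that fall in each year.

    With water_year=True, October to December values are assigned to the
    following year, matching the USGS water-year convention.
    """
    by_year: Dict[int, List[float]] = defaultdict(list)
    for day, q in daily:
        year = day.year + 1 if water_year and day.month >= 10 else day.year
        by_year[year].append(q)
    # Years with missing days are averaged over the days present only.
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Hypothetical daily values, not the Colorado River record.
daily = [
    (date(2000, 1, 1), 100.0),
    (date(2000, 6, 1), 300.0),
    (date(2000, 10, 15), 50.0),
    (date(2001, 2, 1), 70.0),
]
print(mean_annual_flow(daily))                   # {2000: 150.0, 2001: 70.0}
print(mean_annual_flow(daily, water_year=True))  # {2000: 200.0, 2001: 60.0}
```

In practice, years with many missing days would usually be flagged or dropped rather than averaged as-is.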
Is there a relation between flow and water quality?
[Figure, two panels, Colorado River at Austin: (1) Mean Annual Flow (1929-2008); x-axis: year; y-axis: Discharge (cfs). (2) Total Nitrogen in water; x-axis: date (Jun-68 to Oct-95); y-axis: Total Nitrogen (mg/l)]
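Nitrogen is sampled far less often than discharge, so a first step toward answering the question is to pair the two records on dates where both exist and then measure their association. A minimal sketch, assuming both records are dictionaries keyed by ISO date strings and using the Pearson correlation coefficient as the measure (rank-based measures are a common alternative for skewed water-quality data). The function names and values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pair_by_date(flow: Dict[str, float],
                 conc: Dict[str, float]) -> List[Tuple[float, float]]:
    """Keep only dates on which both discharge and concentration were recorded."""
    return [(flow[d], conc[d]) for d in sorted(flow.keys() & conc.keys())]


def pearson_r(pairs: List[Tuple[float, float]]) -> float:
    """Pearson correlation coefficient of paired (x, y) values."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    # Undefined (division by zero) if either series is constant.
    return sxy / sqrt(sxx * syy)


# Hypothetical values (dates as ISO strings), not the Colorado River record.
flow_cfs = {"1970-06-01": 100.0, "1971-06-01": 200.0,
            "1972-06-01": 300.0, "1973-06-01": 400.0}
tn_mgl = {"1970-06-01": 3.0, "1971-06-01": 2.0,
          "1972-06-01": 1.0, "1974-06-01": 9.9}

pairs = pair_by_date(flow_cfs, tn_mgl)
print(pairs)             # [(100.0, 3.0), (200.0, 2.0), (300.0, 1.0)]
print(pearson_r(pairs))  # -1.0
```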
Are Annual Flows Correlated?
[Figure: Correlation of Annual Flows (Colorado River at Austin); x-axis: This Year's Discharge (cfs); y-axis: Last Year's Discharge (cfs)]
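The scatter plot pairs each year's flow with the previous year's, so the question becomes one of lag-1 serial correlation. A minimal sketch, assuming annual means in a dictionary keyed by year: only truly consecutive years are paired, so a gap in the record does not create a false pair. The Pearson estimator here is a simple choice; time series texts often use a slightly different lag-1 autocorrelation estimator based on the overall mean. The function names and values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def lag1_pairs(annual: Dict[int, float]) -> List[Tuple[float, float]]:
    """(last year's flow, this year's flow) for each pair of consecutive years."""
    return [(annual[y - 1], annual[y]) for y in sorted(annual) if y - 1 in annual]


def pearson_r(pairs: List[Tuple[float, float]]) -> float:
    """Pearson correlation coefficient of paired (x, y) values."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Hypothetical annual mean flows (cfs), not the Austin record; 2003 is missing.
annual_flow = {2000: 1000.0, 2001: 2000.0, 2002: 3000.0, 2004: 500.0, 2005: 1500.0}

pairs = lag1_pairs(annual_flow)
print(pairs)  # [(1000.0, 2000.0), (2000.0, 3000.0), (500.0, 1500.0)]
print(round(pearson_r(pairs), 6))
```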