PPT - Climate Code Foundation

Better Science with Python
Nick Barnes at AMS, 2012-01-24
climatecode.org
Copyright Climate Code Foundation, license CC-BY
What is the CCF?
• A UK non-profit founded in 2010;
• “to promote the public understanding of climate science…”
• … through software activities.
• Continuing projects started in 2008;
• A few software consultants, currently unpaid part-time;
• Advisory committee of a dozen experts;
• A growing network of climate scientists.
What is the problem?
Scientists have to write code, but:
• They aren’t well-trained;
• They aren’t properly rewarded;
• There is no incentive to publish it.
So science code looks like the industry 30 years ago:
• No version control or configuration management;
• No issue systems or defect tracking;
• No automated testing or test-driven development.
Critically: code is being written for computers, not people.
Clear Climate Code
• Project started in 2008.
• Over-riding goal is clarity: code which interested members
of the public can download, run, read and understand.
• Open-source, of course.
• First target NASA GISTEMP:
• ccc-gistemp.googlecode.com
• 12 KLOC of Fortran (etc).
• became 3678 lines of Python
• (including 1500 of docstrings)
• fixed minor bugs.
• fosters new science:
• one paper out now, more in draft.
Why clarity?
• Original motivation was to answer critics:
• Not the real code;
• Can’t be run;
• Contains “obvious bugs”;
• “divinci code written by the shortbus crew.”
• But also a key message of software engineering:
Your target audience is people, not compilers
• Those people are, most often, yourselves.
What is clarity?
def step1(record_source):
    """An iterator for step 1. Produces a stream of
    `giss_data.Series` instances.
    :Param record_source:
        An iterable source of `giss_data.Series` instances (which it
        will assume are station records).
    """
    records = comb_records(record_source)
    helena_adjusted = adjust_helena(records)
    combined_pieces = comb_pieces(helena_adjusted)
    without_strange = drop_strange(combined_pieces)
    for record in alter_discont(without_strange):
        yield record
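The pipeline style of step1 can be tried out in isolation. The following is a minimal sketch, not the ccc-gistemp code: the stage names drop_missing and to_anomalies and the toy records are hypothetical, chosen only to show how each stage consumes the previous one lazily.

```python
def drop_missing(records):
    """Skip records that contain no data at all."""
    for station, temps in records:
        if temps:
            yield station, temps


def to_anomalies(records):
    """Subtract each record's own mean from its values."""
    for station, temps in records:
        mean = sum(temps) / len(temps)
        yield station, [t - mean for t in temps]


def pipeline(record_source):
    """Chain the stages; nothing runs until the result is iterated."""
    with_data = drop_missing(record_source)
    for record in to_anomalies(with_data):
        yield record


# Hypothetical toy input: (station name, list of temperatures).
toy_records = [("A", [10.0, 12.0]), ("B", []), ("C", [5.0, 5.0, 8.0])]
result = list(pipeline(toy_records))
```

Each intermediate name says what has already happened to the data, which is much of what makes the original readable.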
Clear how?
Clear to whom?
Unclear how?
for m in range(12):
    sum_new = 0.0    # Sum of data in new
    sum = 0.0        # Sum of data in average
    count = 0        # Number of years where both new and average are valid
    for a,n in itertools.izip(average[first_year*12+m: last_year*12: 12],
                              new[first_year*12+m: last_year*12: 12]):
        if invalid(a) or invalid(n):
            continue
        count += 1
        sum += a
        sum_new += n
    if count < min_overlap:
        continue
    bias = (sum-sum_new)/count
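One way the fragment above could be made clearer is sketched here. This is an illustrative rewrite, not the ccc-gistemp code: the function name monthly_bias, the MISSING sentinel, and this definition of invalid are assumptions, and it uses Python 3's built-in zip where the original uses itertools.izip.

```python
MISSING = 9999.0  # hypothetical sentinel marking an invalid reading


def invalid(value):
    """Assumed validity test; the real one may differ."""
    return value == MISSING


def monthly_bias(average, new, first_year, last_year, min_overlap):
    """Return 12 biases, one per calendar month.

    A bias is the mean of (average - new) over the years in which both
    series are valid, or None when fewer than min_overlap years qualify.
    """
    biases = []
    for month in range(12):
        # Every 12th value, starting at this month of first_year.
        window = slice(first_year * 12 + month, last_year * 12, 12)
        pairs = [(a, n) for a, n in zip(average[window], new[window])
                 if not (invalid(a) or invalid(n))]
        if len(pairs) < min_overlap:
            biases.append(None)
            continue
        biases.append(sum(a - n for a, n in pairs) / len(pairs))
    return biases


# Two years of monthly data, with one missing reading in `new`.
average = [10.0] * 24
new = [8.0] * 24
new[0] = MISSING
biases = monthly_bias(average, new, first_year=0, last_year=2, min_overlap=2)
```

Naming the slice and collecting the valid pairs first removes the three running totals, and the name sum no longer shadows the built-in.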
Clarity enables new science
• By promoting “computational thinking” (Wing, NSF),
• Clear code raises new questions…
• Airport-only trends?
• Effect of US data?
• Effect of restricting to long-record stations?
• Use of land data for ocean cells?
• Adding more data scraped from met sites?
• …and helps answer them…
• …for both original authors and others.
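When the processing is a chain of clear steps, questions such as those above can often be asked by placing a filter in front of the chain. The sketch below is only an illustration under stated assumptions: the Station record, its fields, and the sample stations are hypothetical and do not reflect the real station metadata.

```python
from collections import namedtuple

# Hypothetical record type; real station metadata will differ.
Station = namedtuple("Station", ["name", "is_airport", "years"])


def only_airports(stations):
    """Keep stations flagged as airports."""
    return (s for s in stations if s.is_airport)


def long_records(stations, min_years=50):
    """Keep stations with at least min_years of data."""
    return (s for s in stations if len(s.years) >= min_years)


stations = [
    Station("Alpha", True, range(1900, 2011)),
    Station("Bravo", False, range(1950, 2011)),
    Station("Charlie", True, range(1990, 2011)),
]

airport_names = [s.name for s in only_airports(stations)]
long_airport_names = [s.name for s in long_records(only_airports(stations))]
```

Because each filter takes and returns an iterable of stations, filters can be combined freely before the data reaches the rest of the analysis.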
Why Python?
• Syntax:
• Very small and simple core language;
• Clear syntax (compared with Perl, C++, Fortran, etc);
• Indentation for blocks (huge win although often derided);
• No type declarations or decorations;
• Semantics:
• Garbage collection: no code for memory management;
• First-class functions.
• “Duck-typing” for maximum code flexibility and re-use;
• A simple object system;
• Library (“batteries included”):
• A huge amount of useful functionality;
• Kept out of the way of the core language: explicit import;
• Great documentation;
• One great way to do it (not TMTOWTDI).
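Two of the semantic points above, first-class functions and duck typing, can be seen in a few lines. This is a small sketch with hypothetical names (count_lines, is_data) and made-up sample text, not code from any project mentioned in this talk.

```python
import io


def count_lines(source, keep=lambda line: True):
    """Count the lines in any iterable of strings that pass `keep`.

    `source` is never type-checked (duck typing), and `keep` is an
    ordinary function passed as an argument (first-class functions).
    """
    return sum(1 for line in source if keep(line))


def is_data(line):
    """True for lines that are neither blank nor comments."""
    return bool(line.strip()) and not line.startswith("#")


text = "# header\n1880 13.2\n\n1881 13.4\n"

# The same function works on a list of strings and on a file-like object.
from_list = count_lines(text.splitlines(), keep=is_data)
from_file_like = count_lines(io.StringIO(text), keep=is_data)
all_lines = count_lines(text.splitlines())
```

No type declarations are needed for either argument, and the explicit import of io is the only library code brought into view.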
Wait, there’s more:
• Open-source:
• Zero cost;
• No licensing trap, for you or your audience;
• Future-proof.
• “Interpreted” (i.e. has a really good REPL);
• Long-lived and stable;
• Very portable (and easy to install);
• Easy interfaces to other languages and systems;
• Terrific eco-system;
• A BDFL who is right much more often than he is wrong;
• And probably more.
So: Why not Python?
• Performance;
• Concurrency; With a new implementation?
• Many things not in the library (and may never be);
• … so there’s more than one way to do it!
• Package management (TMTOWTDI!); Use a distribution?
• Some unpleasant corners (e.g. @decorators, **kwargs, old-style classes); Of Python 3?
• 2 vs 3;
• Stability not as good as traditional languages;
• Language direction: (e.g. lambda deprecated!). Committed to compatibility.
A great language is just the start
• Vital software development skills and tools:
• Version control;
• Defect tracking;
• Code inspection;
• Automated testing;
• Automated building;
• Bundling and delivery;
• Documentation;
• Team-work;
• Publication.
• Many free integrated suites of tools, online and offline.
• Beware: “You can write FORTRAN in any language.”
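Of the skills listed above, automated testing is perhaps the easiest to start with, since the standard library's unittest module is enough. The sketch below is an assumption-laden example: monthly_mean, its missing-value sentinel, and the test names are hypothetical and not taken from ccc-gistemp.

```python
import unittest


def monthly_mean(values, missing=9999.0):
    """Mean of the valid values, or None if there are none.

    The `missing` sentinel is an assumption made for this example.
    """
    valid = [v for v in values if v != missing]
    if not valid:
        return None
    return sum(valid) / len(valid)


class MonthlyMeanTest(unittest.TestCase):
    def test_ignores_missing(self):
        self.assertEqual(monthly_mean([1.0, 9999.0, 3.0]), 2.0)

    def test_all_missing(self):
        self.assertIsNone(monthly_mean([9999.0]))


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MonthlyMeanTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Run on every change, a handful of tests like these catches regressions before they reach a result or a paper.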
Google Summer of Code
• Google pays students to write code ($5000 for 3 months);
• Any open-source project;
• Our 2011 projects:
• Hannah Aizenman: Common Climate Project;
• Filipe Fernandes: Extensions to ccc-gistemp;
• Daniel Rothenberg: Homogenization;
• (these names might look familiar if you were here yesterday).
• 2012?
• Program to be announced soon (late Jan);
• we hope to be accepted as a mentoring org (March);
• then we will welcome student proposals,
• or collaborations with scientists.
Open Science
• Accelerating trend towards more openness in science.
• Redefining publication:
• Open Access;
• Open Data;
• Open Knowledge;
• Open Notebooks;
• Data-driven intelligence;
• Workshops, conferences, summits;
• There’s a war on: PRISM, RWA;
• Policy studies at AAAS, NSF, Royal Society, etc;
• But no coherent message about open software in science.
• Michael Nielsen: Reinventing Discovery
Science Code Manifesto
Code: All source code written specifically to process data
for a published paper must be available to the
reviewers and readers of the paper.
Copyright: The copyright ownership and license of any released
source code must be clearly stated.
Citation: Researchers who use or adapt science source code in
their research must credit the code's creators in resulting
publications.
Credit: Software contributions must be included in systems of
scientific assessment, credit, and recognition.
Curation: Source code must remain available, linked to related
materials, for the useful lifetime of the publication.
sciencecodemanifesto.org
Future Plans
• Changing policies:
• Transparency;
• Rewards for all research products.
• Training scientists:
• Basic techniques (testing, version control, agile, etc);
• Code publication and reuse.
• Providing resources:
• White papers, blog posts;
• Directories.
• Building networks, partnering with institutions;
• Leading by example:
• ccc-gistemp;
• ccf-homogenization;
• etc….
Questions?
Funding
• I say “non-profit”. Approximately “non-revenue”.
• All accounts open.
• Total revenue to date £7037.94 (+ GSoC students).
• Total costs to date £3888.55 (as of 2011-11-18).
• All work unpaid (not counting GSoC students).
• Personal lost income to date probably £30-40K.
• Funding model seeks £150K-£500K annually from
corporate or NGO sponsorship (plus some project money
from academic collaborations).
• Too much? Not enough? Depends who you ask.
• Open to suggestions!