eu_20110919_scidsl - Oregon State University

Obstacles and opportunities with using visual and domain-specific languages in scientific programming
Michael Jones, Christopher Scaffidi
School of Electrical Engineering and Computer Science
Oregon State University
Scientists as end-user programmers
• Professional objectives
– Discovering knowledge
– Disseminating knowledge
– Funding knowledge creation
• Reasons for programming
– Data acquisition
– Data analysis
– Data visualization
Photo credit: Microsoft Office (2010)
Background  Study  Results  Conclusion
Results from prior empirical studies
• Testing
– Often reliant on visualization, rarely systematic
• Reuse
– Usually white-box rather than black-box
• Languages
– Usually procedural and compiled
– Rarely change during a project
Few mentions of visual or
domain-specific languages (DSLs)
• Three studies mentioned…
– Matlab was used to pre-process data
– Hypercard, LabVIEW, spreadsheets were used
• More widespread use might be expected!!!
– Matlab: “enjoys wide usage among scientists”
– LabVIEW: over 1 million users worldwide
– Excel: 500 million users… maybe some scientists?
We need more details
• Key research questions:
– To what extent are DSLs used by scientists?
– What concerns do scientists have about using DSLs and other languages?
Interviews of scientists
• Artifact walkthrough
– Interviewee identified a recent programming project
– We didn’t mention visual/domain-specific languages!
– The selected project served as an anchor for questions
• Interviewed 10 scientists
– One later requested that his data not be used
– Recruited by emails to physical/natural scientists
Participant and project overview
Programming        Job title          Field            Project age
experience (yrs)                                       (years)
15                 Professor          Biology          6
5                  Scientist
15                 Scientist          Bioinformatics   6
1                  Graduate student   Bioinformatics   3
15                 Scientist          Meteorology      18
4                  Professor          Chemistry        2
1                  Professor          Chemistry        11
25                 Scientist          Physics          2
15                 Scientist          Physics          6
Participants and their projects
• Most participants were end-user programmers
– 6 produced code for own use
– 2 produced code for others’ use
• Each program performed scientific modeling
– Mining historical data to generate models
• E.g., mining weather + disease data to create models
– Running models to make predictions + visualizations
• E.g., simulating chemical reactions
DSLs were indeed in widespread use
Language / tool    # projects using it
Excel              2
Matlab             2
Minitab            1
Sigma Plot         1
ArcGIS             1
Flash              1
Silverlight        1
MayaVi             1
Only 2 of 8 projects relied exclusively on non-DSLs (C, C++, Perl)
Most projects demonstrated a language transition
• Addition or replacement of languages
– 5 of 8 projects added one or more DSLs
– 4 of 8 projects added a non-DSL
• Lots of “language thrashing”
– Biologists: Excel, Matlab, C, Perl, Minitab, Sigma Plot
– Chemists: Fortran, C, Perl, Java, PHP, ArcGIS
• What concerns motivated language choices?...
Four concerns that drove language choices
[Chart: # projects (0 to 5) for each of the four concerns; legend: DSL, DSL+, Traditional, Traditional+]
Concern #1: Need for specific functionality
• Inputs: physical measurements, historical data, hypothetical values
• 1. Pre-process by transforming (often DSL)
• 2. Instantiate, run models (not DSL)
– Produces models’ output/predictions
• 3. Post-process: transform or visualize (often DSL)
[Chart: # projects (0 to 6), DSL vs. Trad.]
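A minimal sketch of the three-stage workflow on this slide, written in Python purely for illustration. The function names, the sample readings, and the straight-line "model" are all hypothetical stand-ins; the projects in the study typically split these stages across different languages (often a DSL for stages 1 and 3, a traditional language for stage 2).

```python
from statistics import mean


def preprocess(raw_readings):
    """Stage 1: transform raw inputs by dropping gaps and converting to floats."""
    return [float(r) for r in raw_readings if r is not None]


def run_model(values, steps_ahead=1):
    """Stage 2: fit a straight line to the series and extrapolate it.

    The linear trend is a hypothetical stand-in for whatever scientific
    model a project runs. Needs at least two values to define a slope.
    """
    xs = range(len(values))
    x_mean, y_mean = mean(xs), mean(values)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    start = len(values)
    return [(x, intercept + slope * x) for x in range(start, start + steps_ahead)]


def postprocess(predictions):
    """Stage 3: turn model output into a plain-text report."""
    return [f"step {x}: {y:.2f}" for x, y in predictions]


if __name__ == "__main__":
    raw = ["1.0", None, "2.0", "3.0"]  # made-up measurements with one gap
    for line in postprocess(run_model(preprocess(raw), steps_ahead=2)):
        print(line)
```

Keeping each stage behind its own function makes the hand-off points explicit, which is where data crosses language boundaries when the stages are implemented in separate tools.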
Concern #2: Complexity
• Complaints about complexity
– Traditional languages: poor fit, unintuitive
– Regardless of language: algorithmic complexity
– When mixing languages: hard to trace data dependencies and relationships
[Chart: # projects (0 to 6), DSL vs. Trad.]
Concern #3: Cost and time
• Tendency to grab what was familiar
– Little resistance to licensing costs
– Often reliant on what colleagues recommended (including grad students)
• Led to some unpleasant surprises
– Often reliant on outside consultants acting as advisors or developers
• Big trouble when grants run out!
[Chart: # projects (0 to 6), DSL vs. Trad.; legend: DSL, DSL+, Traditional, Traditional+]
Concern #4: Performance
• Traditional languages were preferred over DSLs
– One project team used parallel programming; one team wanted it
– Challenges:
• Shared servers
• Time for revising code
• Insufficient funds
[Chart: # projects (0 to 6), DSL vs. Trad.; legend: DSL, DSL+, Traditional, Traditional+]
Two non-language concerns
• Lack of version control
– Diving into coding
– No planning ahead for managing versions
– No planning ahead for team coordination
• Lack of documentation
– No proactive recognition of the need
– Insufficient secondary notation support in DSLs
– Heavy reliance on naming conventions
Key results
• DSLs are…
– Alive and well among scientists
– Often used for data transformation and visualization
• Opportunities to help scientists with…
– Selecting appropriate languages
– Tracing data flow among multiple programs
– Improving performance
– Managing different versions
Questions?
Princeton pulsar lab, where I worked with astrophysicists (1996-98)
Photo credit: colleague Ingrid Stairs (2006)