Data Curation Issues and Challenges
ARL/CNI Fall Forum 2008
Sayeed Choudhury
[email protected]
Data Flow (Levels of Data)
• Pixel data collected by telescope
• Sent to Fermilab for processing
• Beowulf Cluster produces catalog
• Loaded in a SQL database
Courtesy of Alex Szalay
Key Considerations
• Work with existing scientific systems
• Consider gateways for these systems as part of
infrastructure development
• Focus on both human and technical
components of infrastructure
• Human interoperability is more difficult than
technical interoperability
• Trust
Questions (1)
• How do we transfer principles into new
practices, especially given scale and
complexity?
• What are the fundamental differences
between data and collections? Human
readable vs. machine readable?
• What about the “cloud” or the “crowd”?
• Can Flickr help us with data curation?
Questions (2)
• How does a partnership audit data (and
associated services) distributed across the
network?
• Are audits about “completeness” or perhaps
about transparency and reliability?
• Where are the existing data curators? Maybe
we shouldn’t use the terms data librarian or
data scientist or humanist.
Questions (3)
• What are the requirements? Are there
common requirements, which may be the most
appropriate area for libraries?
• Are there unifying concepts or themes? “One
scientist’s noise is another scientist’s signal…”
• What are we trying to sustain? Data?
Scholarship? Our organizations?