Data Management Needs and Challenges for Telemetry Scientists
Josh M London
Wildlife Biologist, Polar Ecosystems Program
National Marine Mammal Laboratory
NOAA NMFS Alaska Fisheries Science Center
Temptation to identify biologists as the source for the raw data
The Tip of a Complex Iceberg
Publications
Contract reports
Status/Listing Review
derived products
movement model
data quality control synthesis
Data Management
Narrowing Bottleneck
Many biologists lack the skills and training for effective, scalable database design and data management practices
Deployment of tags (location, age/sex, time)
tag design/vendor
tag programming
opportunistic vs. planned
hypothesis
agency needs/mandates
funding initiatives
Field Work and Study Design
Field Work & Tag Deployment
When? Where?
Which Tag/Vendor?
Which Age? Which Sex?
(Do we have a choice?)
Tag Programming
Deployment Length
(attachment type)
Limited Tools for Managing Raw Telemetry Data
‘raw’ data via Argos as CSV/Text
Process w/ Vendor Software (behavior data)
Typically output as CSV
Field data about animal (e.g. ID, species, sex, age, health)
Needs
Explore ‘raw’ data
Address hypotheses
Visualize movement/use
Synthesize w/ dependent (e.g. health, age) and independent data (e.g. other animals, remotely sensed)
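The synthesis step above, combining location records with the field data recorded about each animal, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, deployment IDs, and sample values are hypothetical and do not reflect any actual Argos or vendor file layout.

```python
import csv
import io

# Hypothetical vendor-style location export (column names are assumptions).
LOCATIONS_CSV = """deploy_id,timestamp,lat,lon,quality
D001,2010-06-01T00:12:00,58.10,-152.40,3
D001,2010-06-01T06:40:00,58.22,-152.10,1
D002,2010-06-01T01:05:00,60.55,-147.30,2
"""

# Hypothetical field sheet recorded when each tag was deployed.
ANIMALS_CSV = """deploy_id,species,sex,age_class
D001,seal,F,adult
D002,seal,M,pup
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_locations_to_animals(locations, animals):
    """Attach each animal's field data to its location records."""
    by_deploy = {row["deploy_id"]: row for row in animals}
    joined = []
    for loc in locations:
        animal = by_deploy.get(loc["deploy_id"])
        if animal is None:
            continue  # location with no field record; skipped in this sketch
        joined.append({**loc, **animal})
    return joined


def count_by(rows, key):
    """Count records per distinct value of one column."""
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts


joined = join_locations_to_animals(read_rows(LOCATIONS_CSV), read_rows(ANIMALS_CSV))
print(count_by(joined, "sex"))
```

Keying both files on a shared deployment identifier is the design choice that makes the join possible; without it, each new question means re-matching spreadsheets by hand.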
Biologists Not Trained in Large Scale Data Management
Biologists
Excel and/or Access
ESRI ArcMap (shapefiles)
Google Earth
Mouse Click Interaction
Programming (Visual Basic, R, Python) recipe driven … not developers
Data Manager
Postgres/PostGIS, Oracle, MySQL, SQL Server
Normalization and Efficient Design
Scripting, Jobs, Transactions
Data Integrity
Automation, Reproducible
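To make the data manager's vocabulary concrete, here is a minimal sketch of normalization, data integrity, and transactions. It uses Python's built-in sqlite3 module only as a lightweight stand-in for the server databases listed above; the table names, column names, and sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Normalized design: each fact is stored once and linked by keys,
# rather than repeated on every row of one large, flat table.
conn.executescript("""
CREATE TABLE animal (
    animal_id INTEGER PRIMARY KEY,
    species   TEXT NOT NULL,
    sex       TEXT CHECK (sex IN ('F', 'M', 'U'))
);
CREATE TABLE deployment (
    deploy_id  TEXT PRIMARY KEY,
    animal_id  INTEGER NOT NULL REFERENCES animal(animal_id),
    tag_vendor TEXT NOT NULL
);
CREATE TABLE location (
    deploy_id TEXT NOT NULL REFERENCES deployment(deploy_id),
    fix_time  TEXT NOT NULL,
    lat       REAL NOT NULL CHECK (lat BETWEEN -90 AND 90),
    lon       REAL NOT NULL CHECK (lon BETWEEN -180 AND 180),
    PRIMARY KEY (deploy_id, fix_time)
);
""")


def load_batch(conn, rows):
    """Insert a batch of locations as one transaction: all rows or none."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


with conn:
    conn.execute("INSERT INTO animal VALUES (1, 'seal', 'F')")
    conn.execute("INSERT INTO deployment VALUES ('D001', 1, 'vendor_a')")

ok = load_batch(conn, [("D001", "2010-06-01T00:12:00", 58.10, -152.40)])
bad = load_batch(conn, [
    ("D001", "2010-06-01T06:40:00", 58.22, -152.10),
    ("D999", "2010-06-01T07:00:00", 58.00, -152.00),  # unknown deployment
])
n_rows = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
print(ok, bad, n_rows)
```

The second batch is rejected as a whole because one of its rows points at a deployment that does not exist, so the valid row in that batch is rolled back as well. That all-or-nothing behavior is what keeps a partially failed load from leaving the data in an unknown state.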
My Perspective
To address complex questions related to marine mammal telemetry and understanding animal ecology, I had to become more of a data manager
…And, in the process, I’ve become less of a biologist
Start (2006)
Argos Monthly CDs
SatPack Access Database
Excel Files (limited to 65k rows)
Large, Flat Tables
No Central Repository
Current System
Nightly FTP Argos Push
Nightly Data Processing
CSV/External Oracle Table
PL/SQL Procedures
Developed/Designed with Training via Google Search
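One property that matters for a nightly processing job like the one listed above is that re-running it, or receiving a file that overlaps the previous night's, should not create duplicate records. The following is a minimal sketch of that idea, not the system described in the slides: SQLite stands in for the Oracle external table and PL/SQL procedures, and the table, columns, and file contents are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE location (
    deploy_id TEXT NOT NULL,
    fix_time  TEXT NOT NULL,
    lat       REAL NOT NULL,
    lon       REAL NOT NULL,
    PRIMARY KEY (deploy_id, fix_time)  -- one row per tag per timestamp
)
""")


def process_nightly_file(conn, csv_text):
    """Load one CSV drop and return the number of new rows inserted."""
    rows = [
        (r["deploy_id"], r["fix_time"], float(r["lat"]), float(r["lon"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    before = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
    with conn:  # one transaction per file
        # Rows whose key is already present are skipped instead of duplicated.
        conn.executemany(
            "INSERT OR IGNORE INTO location VALUES (?, ?, ?, ?)", rows
        )
    after = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
    return after - before


# Hypothetical contents of two consecutive nightly files; the second
# repeats one record from the first.
NIGHT_1 = """deploy_id,fix_time,lat,lon
D001,2010-06-01T00:12:00,58.10,-152.40
D001,2010-06-01T06:40:00,58.22,-152.10
"""
NIGHT_2 = """deploy_id,fix_time,lat,lon
D001,2010-06-01T06:40:00,58.22,-152.10
D001,2010-06-02T00:30:00,58.31,-151.95
"""

first = process_nightly_file(conn, NIGHT_1)
second = process_nightly_file(conn, NIGHT_2)
total = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
print(first, second, total)
```

The composite primary key is an assumption about what identifies a record; a real feed may need a different key, for example one that includes the message source or location class.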
My Perspective
Current Limitations
Data access requires a minimum level of technical skills (basic SQL, Oracle framework, Oracle APEX, R spatial tools, ArcMap)
Single Point of Access/Failure (me)
Limited Documentation of Design
Design May Not be Optimal/Appropriate
Main Objective is to Provide Data to Analysts – not necessarily designed for providing data to the public
My Perspective
Greatest Needs – Research Program
Data Management and Design Consultation
Data Design & Documentation Portal (user-friendly metadata)
Low Tech Exploration Tools
Database and Application Developers (data flow and data input)
Training Opportunities
My Perspective
Greatest Needs – External to Program?
Provide Meaningful Public Access to Data
A Clear Data Sharing Policy w/ Best Practices
Encourage/Facilitate Scientific Collaboration
Meet Agency Needs and Requirements
How to Communicate Scientific Knowledge in the Modern/Digital Age – sharing knowledge/expertise just as important as sharing data
Publish Data Once
My Perspective
Challenges / Road Blocks
Limited Funds and Priorities – appropriate resources for doing the priority analysis and science not available, let alone the resources to distribute data responsibly
Database design/management often in the hands of the least skilled users
IT Policies, Investments, and Infrastructure Varied Across Institutions
No standard(s) for communicating and sharing ‘raw’ animal telemetry data. What is ‘raw’ data?