Enhanced Data Description Presentation

Enhanced Data Description
for End Users
ScribeKey, LLC
Brian Hebert, Solutions Architect
www.scribekey.com

ScribeKey Project Experience
• Global FGDC Metadata production for large commercial data provider(s) (200+ Countries, 72 Layers, 100s of Attributes, 100s of Domains, Quarterly Updates)
• Federal Agency Assistance: Assess, describe, and standardize large collection of geospatial datasets (50+ States, 400 Layers, 1000s of Attributes, 100s of Domains, Annual Updates)
• Experience with data cleansing, metadata, integration, presentation, and application development.

Goal: Make Data Easy to Understand and Use
• Data users today have more information than ever to keep track of.
• An individual provider's data may be just one part of a larger data use and mission.
• Learning about data can take considerable time and effort.
• How can we best help data customers understand and use data most effectively?
• Reduce the learning curve.

Multiple Data Description Sources
Users learn how to use data through a variety of sources:
• Website
• Documentation
• Metadata
• Email
• Tech Support
• Data Itself

Data Description Checklist
• Is there a Data User Guide? A glossary and index?
• Are primary data categories and entities fully described?
• Are all acronyms, abbreviations, and provider vocabulary terms explained?
• Are short, cryptic database field names and values explained?
• Are data types, lengths, keys, nulls allowed, formats, and lists clear enough to help the user form SQL queries?
• Is FGDC/ISO Metadata available?
• Are sample values and data profiles available?
• Are data presentations, maps, symbols, and reports prepared for a quick start?
• Is all of this information in one place?
Complete metadata describes Meaning, Structure, and Contents. Maximize understanding by the end user to help write queries/reports.

Solution: Lightweight HTML Data Dictionary
Full descriptions of data categories, entities, attributes, and domain values. Information is integrated from documentation, data profiles, metadata, and the data provider's website. Available as standalone HTML or on a website.
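
A standalone HTML data dictionary of this kind could be produced from structured descriptions along the following lines. This is only a minimal sketch, not ScribeKey's implementation: the function render_data_dictionary, the layers structure, and the sample layer and attribute names are all hypothetical.

```python
from html import escape

# Hypothetical dictionary content; names and definitions are placeholders.
layers = [
    {
        "name": "Roads",
        "definition": "Street centerlines",
        "attributes": [
            {"name": "ROAD_ID", "type": "Integer", "definition": "Unique road identifier"},
            {"name": "NAME", "type": "Text(50)", "definition": "Street name"},
        ],
    },
]


def render_data_dictionary(layers):
    """Return one standalone HTML page describing every layer and its attributes."""
    parts = ["<html><body><h1>Data Dictionary</h1>"]
    for layer in layers:
        parts.append(f"<h2>{escape(layer['name'])}</h2>")
        parts.append(f"<p>{escape(layer['definition'])}</p>")
        parts.append("<table><tr><th>Attribute</th><th>Type</th><th>Definition</th></tr>")
        for attr in layer["attributes"]:
            parts.append(
                "<tr>"
                f"<td>{escape(attr['name'])}</td>"
                f"<td>{escape(attr['type'])}</td>"
                f"<td>{escape(attr['definition'])}</td>"
                "</tr>"
            )
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_data_dictionary(layers)
```

Writing page to a single .html file gives a document that can be opened locally or copied to a website unchanged.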

Dataset Overview
A Library Science Indexing/Abstracting approach is taken to ensure the most important and useful information is seen first.
The focus here is on clearly describing top-level data categories, layers, and tables.
Key data provider terminology and concepts are explained.

Layer and Table Details
Includes Name, Geometry Type, Definition, Attribute List, Keywords, and a link to standard FGDC/ISO Metadata.
Drill down to review Attributes and Domains.
FGDC metadata is typically organized and accessed as a set of separate XML documents. ScribeKey’s approach integrates these separate documents, making all information available at a single access point.
Search/Highlight/Filter/Sort

Attributes and Domain Values
Core Data Info: All dataset metadata including Data Type, Length, Format, Nulls Allowed, Primary and Foreign Keys, Join Information, Sample Values, Percent Complete.
This data profiling information is essential for end users who want to generate information products such as reports, maps, charts, and graphs from SQL queries.

Helping with the Data Provider/End User Communication Gap
Provider Language: “Impute, FROMHN, EDGES, ADDRFN, Internal Point, MTFCC, S1100”
User Language: “Layer, Table, Attribute, Map, Symbol, Centroid, Join, Report”
Data providers and users have different languages and understandings of data. Use of keywords, aliases, and definitions in the data dictionary helps bridge this gap by providing a translation.
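
The translation idea can be sketched as a simple glossary lookup from provider terms to user-facing aliases and definitions. The GLOSSARY entries and the explain helper below are hypothetical, and the aliases are illustrative readings of two provider terms rather than official definitions.

```python
# Illustrative glossary: the aliases and definitions are assumptions for this sketch,
# not the data provider's official wording.
GLOSSARY = {
    "MTFCC": {
        "alias": "Feature Class Code",
        "definition": "Code that classifies the kind of feature",
    },
    "FROMHN": {
        "alias": "From House Number",
        "definition": "House number at the start of an address range",
    },
}


def explain(term):
    """Translate a provider term into user language, if the glossary knows it."""
    entry = GLOSSARY.get(term.upper())
    if entry is None:
        return f"{term}: no description available"
    return f"{term} ({entry['alias']}): {entry['definition']}"
```

A data dictionary front end could call explain for every field name it displays, so that the cryptic name and its plain-language alias always appear together.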

How Does Data Profiling Help?
NUM  FIELD          DESCRIPTION
1    DatasetId      A unique identifier for the dataset
2    DatabaseName   The name of the source database
3    TableName      The name of the source database table
4    RecordCount    The number of records in the table
5    ColumnCount    The number of columns in the table
6    NumberOfNulls  The number of null values in the table
An essential tool for enhanced metadata: shows end user actual sample values, data types, lengths, formats, percent complete, etc. This valuable contents information is typically not found in metadata.
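
A minimal sketch of how such a profile might be computed, assuming the data sits in a SQLite database. The function profile_table and the sample roads table are hypothetical; the output keys reuse the field names listed above (TableName, RecordCount, ColumnCount, NumberOfNulls) and add per-column percent complete and sample values.

```python
import sqlite3


def profile_table(conn, table):
    """Return table-level and column-level profile information for one table.

    The table name is interpolated into SQL, so it must come from a trusted source.
    """
    cur = conn.cursor()
    # PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk)
    columns = [(row[1], row[2]) for row in cur.execute(f'PRAGMA table_info("{table}")')]
    record_count = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    column_profiles = []
    total_nulls = 0
    for name, declared_type in columns:
        # COUNT(column) counts only the non-null values in that column
        non_null = cur.execute(f'SELECT COUNT("{name}") FROM "{table}"').fetchone()[0]
        samples = [r[0] for r in cur.execute(
            f'SELECT DISTINCT "{name}" FROM "{table}" WHERE "{name}" IS NOT NULL LIMIT 3')]
        total_nulls += record_count - non_null
        column_profiles.append({
            "name": name,
            "type": declared_type,
            "percent_complete": 100.0 * non_null / record_count if record_count else 0.0,
            "sample_values": samples,
        })

    return {
        "TableName": table,
        "RecordCount": record_count,
        "ColumnCount": len(columns),
        "NumberOfNulls": total_nulls,
        "columns": column_profiles,
    }


# Hypothetical sample data for demonstration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roads (road_id INTEGER, name TEXT, mtfcc TEXT)")
conn.executemany(
    "INSERT INTO roads VALUES (?, ?, ?)",
    [(1, "Main St", "S1100"), (2, None, "S1200"), (3, "Elm St", None), (4, "Oak Ave", "S1100")],
)
profile = profile_table(conn, "roads")
```

The per-column entries are the part that standard metadata usually lacks: they show the end user what values a field actually contains and how completely it is filled in.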

ScribeKey Metadata Generation
• Sample data is reviewed and profiled. Any existing metadata is imported into the repository.
• From the profile, existing user documentation, technical support staff, and website, a metadata repository is populated and metadata document templates are developed.
• FGDC/ISO Metadata is generated, as XML/HTML reports, from the metadata repository.
[Diagram: Metadata Templates, Metadata Repository, Metadata Export App; output formats: PDF, FGDC XML, HTML, DOC]
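
The export step could look roughly like the sketch below, which builds a small FGDC-style XML fragment from one repository record. The function build_fgdc_fragment and the record layout are hypothetical. The element names follow FGDC CSDGM short names, but the fragment omits most mandatory elements and is not a complete or valid metadata record.

```python
import xml.etree.ElementTree as ET


def build_fgdc_fragment(record):
    """Build a partial FGDC-style XML tree from one repository record (sketch only)."""
    root = ET.Element("metadata")

    # Identification information: title and abstract
    idinfo = ET.SubElement(root, "idinfo")
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(citeinfo, "title").text = record["title"]
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = record["abstract"]

    # Entity and attribute information: one entity with its attributes
    detailed = ET.SubElement(ET.SubElement(root, "eainfo"), "detailed")
    enttyp = ET.SubElement(detailed, "enttyp")
    ET.SubElement(enttyp, "enttypl").text = record["entity_label"]
    ET.SubElement(enttyp, "enttypd").text = record["entity_definition"]
    for label, definition in record["attributes"]:
        attr = ET.SubElement(detailed, "attr")
        ET.SubElement(attr, "attrlabl").text = label
        ET.SubElement(attr, "attrdef").text = definition
    return root


# Hypothetical repository record
record = {
    "title": "Roads",
    "abstract": "Street centerlines for the study area.",
    "entity_label": "roads",
    "entity_definition": "One row per road segment",
    "attributes": [("ROAD_ID", "Unique road identifier"), ("NAME", "Street name")],
}
xml_text = ET.tostring(build_fgdc_fragment(record), encoding="unicode")
```

Because the record comes from the repository rather than from hand-edited XML, the same record could also feed HTML or other report formats.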

Map, Query, Report Preparation
• .MXD Preparation: Prepared for end user quick start; can include symbol setup, joins/relates, maps, queries, and reports.
• Metadata Layers: Use metadata to create GIS layers that allow a variety of map presentations, reports, etc. to summarize and highlight datasets by metadata values.

The Geospatial Metadata Repository
[Diagram labels: Metadata Repository; Data Layers; Metadata; Enhanced User Views; Pivot Tables; Areas; Documents; Assessments; Data Dictionary; Entities; Attributes; Domains; Derivative Datasets; Meta-Maps; Schemas]
The Metadata Repository, implemented as an RDBMS, is populated with automated tools and then used to generate metadata outputs, data dictionary content, schemas, maps, etc.
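
As a rough illustration of a repository held in an RDBMS, the sketch below uses SQLite with three tables for entities, attributes, and domain values. The schema, table names, and sample rows are assumptions made for this example and do not describe ScribeKey's actual repository design.

```python
import sqlite3

# Hypothetical repository schema: entities own attributes, attributes own domain values.
SCHEMA = """
CREATE TABLE entity (
    entity_id    INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    definition   TEXT
);
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    entity_id    INTEGER NOT NULL REFERENCES entity(entity_id),
    name         TEXT NOT NULL,
    data_type    TEXT,
    definition   TEXT
);
CREATE TABLE domain_value (
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    code         TEXT NOT NULL,
    meaning      TEXT
);
"""

repo = sqlite3.connect(":memory:")
repo.executescript(SCHEMA)

# Sample rows for demonstration only
repo.execute("INSERT INTO entity VALUES (1, 'roads', 'Street centerlines')")
repo.execute("INSERT INTO attribute VALUES (1, 1, 'surface', 'TEXT', 'Road surface type')")
repo.executemany(
    "INSERT INTO domain_value VALUES (1, ?, ?)",
    [("P", "Paved"), ("U", "Unpaved")],
)

# One query yields the rows a data dictionary page would list for each domain value
rows = repo.execute("""
    SELECT e.name, a.name, d.code, d.meaning
    FROM entity e
    JOIN attribute a ON a.entity_id = e.entity_id
    JOIN domain_value d ON d.attribute_id = a.attribute_id
    ORDER BY d.code
""").fetchall()
```

Keeping entities, attributes, and domains in related tables is what lets a single store drive several outputs, such as metadata documents, data dictionary pages, and schemas.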

Recap: ScribeKey Data Description Support
• Generate or Upgrade FGDC/ISO Metadata
• Profile Data to provide user with actual contents information
• Help develop Data User Guides (PDF) and Website Copy
• Help author Indexes, Abstracts, and Glossaries
• Integrate multiple and separate data description materials in a single lightweight HTML front end.
• Help prepare ArcMap, .mxd, symbols, joins, reports, and maps
• Result: Data is as easy to understand and use as possible

About www.scribekey.com
• ScribeKey, LLC: Massachusetts Corporation
• Brian Hebert, PMP, 30+ years designing and building desktop and web DB/GIS solutions
• Extensive experience producing metadata and data dictionaries for data providers and end users
• Extensive experience with data integration, data quality assessments, data cleansing, ETL, and application development with ESRI/ArcObjects, .NET, SQL, XML, HTML
• Small focused teams, template approach, quick turnarounds, practical approach