SASDAY_Maricom_InfoMaps_Final_20110930
Download
Report
Transcript SASDAY_Maricom_InfoMaps_Final_20110930
SAS Enterprise Business
Intelligence (SAS/EBI) Information
Maps Best Practices
CMS SAS Users Group Conference
Learn more about THE POWER TO KNOW®
October 17, 2011
Presented by:
Letha Christian, Lead SAS/EBI Implementation Project
Vivek Sethunatesan, Lead BI Architect
Maricom Systems, Inc.
1
Agenda
•
•
•
•
•
•
•
•
•
Objective
SAS/EBI Project Implementation
Integrated Data Repository (IDR) and Challenges
SAS/EBI Information Maps and Features
Design Approach and Best Practices for Part D
Information Map
SAS Information Map Studio
Verification of Information Map and Challenges for
Development
Best Practices for Information Map Development
Questions
2
Objective
This presentation provides an overview of
Information Map development in support
of the SAS/EBI Implementation
project. The presentation will touch on the
development challenges faced and
overcome, and will also provide an
overview of recommended best practices.
3
SAS/EBI Project Implementation
The SAS/EBI project has three major goals:
1. Implement a hardware infrastructure that supports
the SAS/EBI environments in the CMS Data Center.
2. Implement the SAS/EBI Tool Suite. This provides
additional tools for CMS users that will allow decision
makers to spend less time looking for answers and
more time driving strategic decisions.
3. Implement SAS/EBI capabilities to provide query and
reporting functionality for data that primarily resides
in the IDR and uses a Teradata database. Connectivity
to other data sources such as Oracle and DB2 are also
available.
4
Integrated Data Repository (IDR)
• Contains claims, provider, and beneficiary data from 2006 to the
present in a highly normalized Teradata Data Warehouse.
• Contains millions of beneficiaries and billions of claims.
• Consists of detailed Part A, B, and D data, including every version
of Claims Data.
• Consists of supporting reference data for Beneficiary, Provider,
Drug, Contract, Geography, and Product (HCPCS, Diagnosis, Rev
Center, IDE, and Procedure).
• Contains V2 views as the access layer for applications and users.
• Supports As-Is and As-Was reporting.
• Provides reporting capabilities based on National Provider
Identifier (NPI) crosswalk data.
• Contains drug data from Medi-Span® and First DataBank.
5
Integrated Data Repository (IDR)
6
IDR Data Model for Medicare View
7
IDR Challenges for Business
Reporting
• Users must write their own queries to pull information
from the IDR.
• Users need a strong understanding of the data
sources, relationships, and join conditions.
• Possibility of missing joins, business rules, and
inconsistent results.
• Increased level of effort.
• Complexity of software – not simple drag and drop.
• Performance degradation can result when pulling
detailed level data that is not required.
• May require increased capacity requirements for the
IDR.
8
SAS/EBI Information Maps
• Provides (business) metadata about the data contained
within a data warehouse (IDR).
• Provides information to the user about how data tables are
related.
• Acts as a semantic layer built on top of the Medicare Virtual
Data Mart (VDM).
• Ensures users are querying the IDR in a consistent and
accurate manner.
• Created and managed by information architects/
development teams with input from information
consumers.
• Information Maps change as the IDR changes. These
changes require a structured approach for implementation.
9
SAS/EBI Information Maps
• Generates join conditions, data summarizations, and query
code against the IDR (using Information Map Studio).
• Provides security and guard rails.
• Allows targeted view of data.
• Allows use of data filters (pre-defined or custom).
• Allows for ‘in database processing’ and data summarization
at the appropriate levels, based on the requirements (e.g.,
state-level summarized data).
• Reduces report building and analysis time.
• Supports a wide range of users.
• Must be used as data sources in SAS Web Report Studio.
• Integrates across the SAS Enterprise suite of products.
10
Information Map Design Approach
• Data sources for Information Maps were predetermined to be the IDR Medicare VDM. The
initial approach was to design by functionality and
subject area. The first map was developed for Part
D subject area.
• Used SAS Information Map Studio to build and
manage Information Maps.
• Validated access of Information Maps using
business queries through SAS Web Report Studio
and Enterprise Guide applications.
11
Information Map Development
Best Practices
• Developed customized labels that were
understandable by business users.
• Established business rules that provided
consistent delivery of information by using
standard filters and calculations.
• Provided appropriate joins within the data
sources so SAS could automatically
generate the appropriate query code.
12
Part D Information Map
• Accesses Medicare Part D claims information only.
• Thirty-seven commonly used Part D data
elements. Data elements use IDR naming
conventions understood by business users.
• Data elements grouped by business area, such as
Provider, Beneficiary, Geographic, and Drugs.
• Contains four pre-defined filters ̶ Final Action,
Claim Type, Reconciliation Year, and Date Ranges ̶
which will be used to limit the data results. Use
of these filters will be critical to overall
performance due to the size of the IDR tables.
13
Part D Information Map
• Provides calculations for the following aggregates and
metrics:
–
–
–
–
–
–
–
Total Drug Cost
Distinct Beneficiary Count
Count of PDEs
Claim Total Drug Costs
Claim Line Amount Totals
Contract PBP Amount Totals
NDC Amount Totals
• Over 120 joins have been mapped to support the join
relationships of tables in the IDR Part D Information
Map. SAS will use these join relationships to help
generate SQL to satisfy end-user requests.
14
Information Map Studio Presentation
SAS Information Map Studio provides an easy-to-use
interface that allows development teams to manage
Information Maps.
15
Information Map Studio Properties
The Properties tab enables developers to select attributes for each
element within the Information Map. Developers can also describe
and determine the use of each element.
16
Information Map Studio Relationships
The Relationships tab enables developers to define the
join criteria for each table included within the
Information Map.
17
Validating the Part D Information
Map
• Ensured all joins defined in the Information Map
are supported by the IDR relational data model.
• Solicited queries from business users from their
previous reports.
• Used queries generated for the Part D Dashboard
and validated the data results against the results
using the Information Map.
• Documented the performance of the queries
across environments executed to note the
behavior of different filters and aggregations.
• Performed User Acceptance Testing with users
familiar with Part D data and received sign-off .
18
Challenges of Information Map
Development
• Determining the Information Map data source.
– Use of detailed level data warehouse results in complexities
in SQL generations and performance implications.
• Deriving requirements was challenging due to
aggressive timelines and the availability of
stakeholders.
• Obtaining valid/realistic SQL for testing.
– Limited query reports on Part D.
• Environment/performance issues.
– Data availability in the DEV, VAL, and PRD environments.
– Different data volumes across environments.
• Scope of model, 120+ joins to verify – detailed
analysis.
19
SAS/EBI Best Practices
and Considerations When
Accessing BIG DATA
20
Typical Access Using SAS/EBI Tools
Information Delivery Portal
End users
SAS Enterprise Guide
SAS Web Report Studio
SAS Power Users
SAS Standard Users/Power Users
Information Maps
Ad hoc and
extracts
SAS
Tool
Developers
Ad hoc and
canned reports
Data Mart
Data Architects
21
Information Maps Best Practices Overview
• Need to ensure that the time to return data does
not suffer as the complexity and size of the data
source grows.
• Predefined filters should be used to limit the data
returned.
• New calculated columns bring consistency to
reporting.
• Additional guidelines should be considered for
larger data sources.
• Appropriate measures should be taken to
thoroughly test Information Maps from a
performance perspective.
22
Accessing Source Data Using SAS
Information Maps
METADATA
Data Source
CLIENT ACCESS
Data Mart Views
INFORMATION MAP
Categories, Measures
Filters , Relationships etc..
SAS LIBNAME Engine
Translates to SQL
Data returned to client
23
Factors that Influence Information
Map Performance
BIG DATA
Data Source Design
Work done is SAS
Server vs. Database
(Cube, VDMs, PDMs, SAS/EBI Data Access via
SAS Datasets)
Information Maps
Number of database calls
when executing reports
24
Information Map Data Source
Considerations
Data sources can serve one Information Map or support multiple
Information Maps.
25
SAS OLAP Cubes
SAS OLAP Cubes - An OLAP cube provides access to data that has
been summarized into multi-dimensional views and hierarchies.
Accessing data from an OLAP cube is typically fast because most
resource-intensive calculations are performed as part of the
generation of the summarized data.
26
Teradata Stored Procedures
Teradata Stored Procedures - In more complicated
database structures, it may be necessary to create
multiple sub-queries or other complex code. In this case,
an optimally tuned Teradata stored procedure can be
used. Once created, it can be accessed like any other
table or a database view from the Information Map
because the Libname Engine sees it like a single database
table.
Information
Maps
VDMs
Teradata Stored Procedures
27
Information Map Data Source
Considerations
De-normalized Views - In SAS/EBI, the Libname Engine decides
how to request data from Teradata; that is, whether to send
one query or send multiple queries and join the results on the
SAS side. If there is only one table/view to query from, then
the Libname Engine sends a single query to the database.
PRVDR_DM_VIEW
PRVDR_DM_VIEW is created by
Joining all xref tables and PRVDR.
28
SAS Datasets
SAS Datasets - SAS datasets are similar to Teradata tables
but are stored in a SAS Server location. They can be used
across multiple projects in Information Map Studio with
similar properties as a Teradata table and can be created
using Enterprise Guide.
Information
Maps
EG Process
Flow
SAS datasets
29
Physical Data Marts
Physical Data Marts - Physical Data Marts are the best
practice structures for supporting multiple users and
applications with business-critical data. They generally
take the form of Star Schema designs that aggregate
detail data from the main tables and views.
Source Database Foundation
Information
Maps
30
SAS Stored Process
SAS Stored Process - SAS code or program flow that is
registered in the Metadata server using the Management
Console. Stored processes can have parameters for user
input and produce various output such as datasets,
reports, or charts. Once created, they can be called from
the Enterprise Guide or Information Map.
31
Web Report Studio - Best Practices
The following WRS options can help the performance of a
report.
• Report Layout
– Use Group Breaks to eliminate scrolling. WRS will only query
and process one element of the Group Break field at a time,
so performance is an added benefit.
– Cross-tabs are memory intensive; use list reports, if possible.
List reports will paginate the output one page at a time,
greatly reducing system overhead.
– Limit size to a maximum two pages of generated output. Use
Group Breaks in conjunction with the Cross-tab, or in
conjunction with a list report. If the total output (including all
Group Breaks) is over 20 pages, consider scheduling the
report. If the total output is over 100 pages, do not use WRS.
32
Web Report Studio - Best Practices
• Scheduling
– If the report is to be used by separate groups (Bene state,
Bene City, etc.) and does not require a prompted response
from the user, you can schedule the WRS report and use the
distribution option to deliver a PDF to a subscription channel.
• Prompts
– Do not store pick-list values that will change frequently within
the Information Map. You will need to re-deploy the
Information Map every time the list needs updating.
– Avoid using database-driven lists greater than 3000. Dynamic
lists do not paginate; use a user-entered filter.
– Do not use the main/fact table as a source for a dynamic picklist filter. The system needs to use a ‘select distinct’ and if it is
doing that on a multi-million row table, users will wait too
long for the pick-list to generate.
33
SAS BI Performance Tuning Conclusion
• SAS has many options for building robust BI
applications.
• Thorough analysis must be conducted in order to
determine which options should be leveraged.
Level of
Development
Level of
Administration
Reusability
Teradata Stored Procedures
Medium
Medium
Medium
SAS Stored Process
Medium
Medium
Low
SAS Datasets
Medium
Medium
Medium
SAS OLAP Cubes
High
High
Medium
IDR De-normalized Views
Low
Low
High
IDR Physical Data Marts
High
High
High
Data Sources
34
Questions?
35