Technology of Data Analytics
INTRODUCTION
OBJECTIVE
Data Analytics mindset – shallow and wide, deep when you need it
Quick overview, useful tidbits, provide a jumping-off point
AGENDA/ TOPICS
Excel
Tableau
VBA
Hadoop
Access
Analytical Packages: R/ SAS/ SPSS/ Minitab
SQL
SQUARE 1
Business and Technology
Entity
Attributes
Schema
Relational Database
ETL - Extract Transform Load
Data Mining
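A minimal sketch of how these terms fit together, using Python's built-in sqlite3 module. The customer entity, its attributes, and the sample rows are hypothetical, made up only to show one extract, transform, load pass into a relational table.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: one entity (customer) per row, one attribute per column.
RAW_CSV = """customer_id,name,signup_date,region
1, Ada Lovelace ,2015-01-12,east
2,Grace Hopper,2015-02-03,WEST
3,Alan Turing,2015-02-20,East
"""

def extract(text):
    """Extract: read raw rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, normalize casing, convert types."""
    return [
        (
            int(r["customer_id"]),
            r["name"].strip(),
            r["signup_date"],
            r["region"].strip().lower(),
        )
        for r in rows
    ]

def load(conn, records):
    """Load: write cleaned records into a table whose schema defines the attributes."""
    conn.execute(
        "CREATE TABLE customer ("
        " customer_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " signup_date TEXT NOT NULL,"
        " region TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
east_count = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE region = 'east'"
).fetchone()[0]
print(east_count)
```

The CREATE TABLE statement is the schema; each column is an attribute of the customer entity.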
START WITH EXCEL
It’s the easiest and most available platform
Can teach others to maintain
Collect
Store
Analyze
Report/ Visualize
• Data Validation Drop Downs
• VLOOKUPs
• Formulas
• IF, AND
• Pivot Table
• Charts
• Conditional Formatting
• OFFSET
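For readers who think in code, a rough sketch of what an exact-match VLOOKUP, an IF guard, and a pivot table do, written in plain Python. The price table, order rows, and function names are hypothetical and only illustrate the ideas; this is not how Excel implements them.

```python
from collections import defaultdict

# Hypothetical lookup table: product code -> unit price.
price_table = {"A100": 2.50, "B200": 4.00, "C300": 1.25}

# Hypothetical order rows: (region, product code, quantity).
orders = [
    ("East", "A100", 10),
    ("West", "B200", 3),
    ("East", "C300", 8),
    ("West", "A100", 4),
]

def vlookup(key, table, default=None):
    """Exact-match lookup, similar in spirit to VLOOKUP with an exact match."""
    return table.get(key, default)

def order_total(code, quantity):
    """IF-style guard: return 0 when the product code is not in the table."""
    price = vlookup(code, price_table)
    return price * quantity if price is not None else 0.0

def pivot_sum(rows):
    """Pivot-table-style summary: total revenue per region."""
    totals = defaultdict(float)
    for region, code, quantity in rows:
        totals[region] += order_total(code, quantity)
    return dict(totals)

revenue_by_region = pivot_sum(orders)
print(revenue_by_region)
```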
VISUAL BASIC FOR APPLICATIONS
Microsoft language
Object-oriented: noun.verb (object.Method); noun.adjective = "adjective" (object.Property = value)
Record macro and play around
Modules and Userforms
Cell referencing: Cells(row, column).Select
For loop: For index = startNumber To endNumber ... Next index
If statement: If condition Then ... End If
Use it for:
Moving data
Changing charts
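The loop-and-condition pattern above, sketched in Python rather than VBA so it can be run anywhere. The worksheet is modeled as a plain dictionary keyed by (row, column); that model, the sample values, and the function name are assumptions for illustration, not part of any Excel API.

```python
# A worksheet modeled as a dict keyed by (row, column), both 1-based
# like VBA's Cells(row, column).
sheet = {
    (1, 1): "Item", (1, 2): "Amount",
    (2, 1): "Rent", (2, 2): 1200,
    (3, 1): "Coffee", (3, 2): 45,
    (4, 1): "Travel", (4, 2): 800,
}

def move_large_amounts(cells, first_row, last_row, threshold, target_col):
    """Copy each amount above the threshold into target_col on the same row."""
    moved = 0
    # VBA equivalent: For r = first_row To last_row ... Next r (both ends inclusive)
    for r in range(first_row, last_row + 1):
        amount = cells[(r, 2)]
        # VBA equivalent: If amount > threshold Then ... End If
        if amount > threshold:
            cells[(r, target_col)] = amount
            moved += 1
    return moved

moved_count = move_large_amounts(
    sheet, first_row=2, last_row=4, threshold=500, target_col=3
)
print(moved_count)
```

Note that a VBA For loop includes its end value, while Python's range stops one short, hence the + 1.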
GOOGLE DOCS: COLLECTION
Somebody already did everything for you
Google people are smarter than you
You can store and share files on the web instead of a local drive
ACCESS
Beginning of databasing
Table
• Like Excel spreadsheet
• Tightly defined values allowed
View
• Pulling info from tables using logic
• A lasting query that is used to populate reports
Form
• Data input
Report
• Generates reports
SQL
Big Boy Access
Same as Access without the bumpers and hand holding
The real deal: used throughout the software world
Can be used for maintaining and diagnosing software back ends
Table
• Like Excel spreadsheet
• Tightly defined values allowed
View
• Pulling info from tables using logic
• A lasting query that is used to populate reports
Query
• Viewing data
Stored Procedures
• Loading and moving data
• I don’t really know
SRS
• Web based reports
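A small sketch of the table, view, and query ideas above, run through Python's built-in sqlite3 module. The sales table, its columns, and the sample rows are hypothetical. SQLite has no stored procedures, so that item is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table: like a spreadsheet, but each column only accepts tightly defined values.
conn.execute(
    "CREATE TABLE sales ("
    " id INTEGER PRIMARY KEY,"
    " region TEXT NOT NULL CHECK (region IN ('East', 'West')),"
    " amount REAL NOT NULL CHECK (amount >= 0))"
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0)],
)

# View: a lasting, named query that reports can read from.
conn.execute(
    "CREATE VIEW sales_by_region AS"
    " SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# Query: a one-off look at the data, here reading from the view.
rows = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)

# The table rejects a value outside its definition.
try:
    conn.execute("INSERT INTO sales (region, amount) VALUES ('North', 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The CHECK constraints are what "tightly defined values allowed" looks like in practice: a spreadsheet would accept 'North' without complaint, the table does not.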
TABLEAU
Connections
Worksheets
Views
Dashboards
Stories
HADOOP
Pools multiple computers/servers into a cluster for distributed storage and processing
Hadoop Common – contains libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
Hadoop MapReduce – a programming model for large scale data processing.
Get started at: http://hadoop.apache.org/docs/current/
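To make the MapReduce programming model concrete, here is a toy word count in plain Python that runs on one machine. It does not use Hadoop or any Hadoop API; the input chunks and function names are hypothetical, and the point is only the map, shuffle, reduce shape of the computation.

```python
from collections import defaultdict

# Hypothetical input, already split into chunks as if stored on different machines.
chunks = [
    "big data big cluster",
    "data moves to the cluster",
]

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

On a real cluster the map calls would run in parallel on the machines holding each chunk, which is why the model scales to large data.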
Analyze: SAS/ R/ SPSS/ Minitab
SAS
• Academic/ Common
R
• Open source
SPSS
• IBM
Minitab
• Analytical Excel
Other
iTunes U: Data Visualization
Coursera: Introduction to Data Science
Codecademy: other programming languages
EDUCATION PROJECTS
Open Source Education – BDAA Book of Knowledge
Stats Cheat Sheet
Excel Guide
SQL Guide
How-to guides in general