Amazon ML – MODEL

PREDICTIVE ANALYTICS
AN INTRO TO AWS MACHINE LEARNING
AGENDA
• About me
• Predictive Analytics
• Amazon Machine Learning (ML)
• Amazon ML – Key Concepts
• Amazon ML – Datasources
• Amazon ML – Models
• Amazon ML – Evaluations
• Amazon ML – Demo
ABOUT ME
Naveen VK
• Principal Architect at NVISIA, a regional software development company
• Worked for NVISIA for over 17 years
• Designed and built custom multi-tier applications using the Java Enterprise stack for various companies
• Involved in the entire application development lifecycle, including requirements gathering, architecture, design, implementation, integration, testing and deployment
• Some clients: ETF - State of WI, American Family, Harley Davidson, Cumulus Media
• Currently working at ETF (Employee Trust Fund)
  • Manages pensions, insurance and other benefits for state and local employees
  • Involved in multiple projects (5) and currently supporting multiple applications (7)
• Has deep expertise in databases like Oracle (since 1994) and DB2 (since 1999), and with SQL queries and PL/SQL stored procedures
• 3 fun facts about myself
NVISIA® Confidential 2016
PREDICTIVE ANALYTICS
What is Predictive Analytics? Some use cases/examples
PREDICTIVE ANALYTICS
What is it?
• Mining data with statistical algorithms and machine learning to predict trends or probabilities
• Uses historical data, and the patterns in it, to predict the future
• Creates models based on patterns in data to predict the probability of something happening in the future
• The better the model and the training data, the better the prediction
Examples
• Is this email spam?
• Will this product sell?
• How many units of this product will sell?
• Is this product a piece of clothing, a book or a movie?
• What price will this house sell for?
• What will be the temperature here tomorrow?
AMAZON MACHINE LEARNING (ML)
What is it? When to use it?
AMAZON MACHINE LEARNING (ML)
• AWS (Amazon Web Services) cloud-based service for predictive analytics
• Use tools and wizards to create machine learning models
• Use simple APIs to obtain predictions for your application
• No need to write custom code or maintain supporting infrastructure
• Finds patterns in your existing data
• Uses models to process new data and generate predictions
When to use ML?
• ML is not a solution for every type of problem
  • Skip ML when the target value can be determined by coding simple rules, computations and steps, without any data-driven learning
• Use ML when the rules cannot be programmed easily
  • Too many factors
  • Too many overlapping rules
  • Too much fine-tuning of rules
• Use ML when a manual or hand-coded solution cannot scale
  • Hundreds of millions of items vs. hundreds (example: manual vs. automated spam filter)
AMAZON ML – KEY CONCEPTS
Terms and concepts
AMAZON ML – KEY CONCEPTS
Datasources
• Contain metadata about the data inputs to Amazon ML
• Spreadsheets, CSV files, streaming data, relational databases
ML Models
• Find patterns in data and use them to generate predictions
Evaluations
• Measure the quality of ML models
Batch Predictions
• Multiple data inputs at once, aka batch data
• Asynchronous
Realtime Predictions
• Individual data inputs
• Synchronous
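The synchronous, one-observation-at-a-time shape of a realtime prediction can be sketched in Python. This is a minimal sketch, assuming a boto3 "machinelearning" client is passed in; the function name `predict_one`, the model ID, the endpoint URL and the attribute names are hypothetical placeholders, not values from this presentation.

```python
def predict_one(client, model_id, endpoint_url, observation):
    """Request one realtime prediction and return (label, score).

    `client` is assumed to be a boto3 "machinelearning" client, for example
    boto3.client("machinelearning"). It is passed in as a parameter so the
    function can also be exercised with a stand-in object.
    """
    # Assumption: the record is sent as a map of string names to string values.
    record = {str(name): str(value) for name, value in observation.items()}

    # The call blocks until the service answers: that is what makes it synchronous.
    response = client.predict(
        MLModelId=model_id,
        Record=record,
        PredictEndpoint=endpoint_url,
    )

    prediction = response["Prediction"]
    label = prediction.get("predictedLabel")
    scores = prediction.get("predictedScores", {})
    return label, scores.get(label)
```

A batch prediction, by contrast, would be submitted as a job over a whole datasource and its results collected later, which is why it is described as asynchronous.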
AMAZON ML – DATASOURCES
Details of datasources in Amazon ML
AMAZON ML – DATASOURCES
• In Amazon ML, a datasource contains only the metadata about the actual input data
• Actual data may be stored in
  • Amazon S3 buckets
  • Amazon Redshift databases
  • MySQL databases in Amazon Relational Database Service (RDS)
  • Amazon Kinesis
• Attributes
  • Column headings represent attributes
  • Unique
  • Required
• Target Attribute
  • The value that is being predicted
  • Training data must include the target attribute, with its value already known for every observation
• Observation
  • Single row of data
• Input data
  • All observations, aka rows in a spreadsheet/CSV file or database
AMAZON ML – DATASOURCES CONTINUED
• Schema
  • All attributes and corresponding data types of the input data
• Location
  • Location of the input data, stored in, say, an Amazon S3 bucket
• Row ID
  • Attribute flagged to be included in the prediction output
  • Helps cross-reference the prediction with the observation
  • Unique for each observation
  • Optional
• Datasource Name
  • Human-readable name of the datasource
  • Optional
• Statistics
  • Summary stats for each attribute of the input data
• Status
  • Current state of the datasource (for example: pending, in progress, completed or failed)
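How attributes, the target attribute and the row ID come together in a schema can be illustrated with a short Python sketch. It assumes the JSON field names of the Amazon ML data schema format (`version`, `targetAttributeName`, `rowId`, `dataFormat`, `dataFileContainsHeader`, `attributes`) and the four attribute types BINARY, CATEGORICAL, NUMERIC and TEXT. The helper `build_schema` and the product attributes are hypothetical, made up for illustration.

```python
import json

# Assumption: these are the attribute data types the schema format accepts.
ALLOWED_TYPES = {"BINARY", "CATEGORICAL", "NUMERIC", "TEXT"}


def build_schema(attributes, target, row_id=None, has_header=True):
    """Build a schema dict from a list of (attribute name, data type) pairs."""
    names = [name for name, _ in attributes]
    if len(set(names)) != len(names):
        raise ValueError("attribute names must be unique")
    for name, attr_type in attributes:
        if attr_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown type for {name}: {attr_type}")
    if target not in names:
        raise ValueError("the target attribute must be one of the attributes")
    if row_id is not None and row_id not in names:
        raise ValueError("the row ID must be one of the attributes")

    schema = {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": has_header,
        "attributes": [
            {"attributeName": name, "attributeType": attr_type}
            for name, attr_type in attributes
        ],
    }
    if row_id is not None:
        schema["rowId"] = row_id  # optional, echoed back in prediction output
    return schema


# Hypothetical "will this product sell?" training data with four columns.
product_schema = build_schema(
    [
        ("product_id", "CATEGORICAL"),
        ("price", "NUMERIC"),
        ("description", "TEXT"),
        ("sold", "BINARY"),
    ],
    target="sold",
    row_id="product_id",
)
print(json.dumps(product_schema, indent=2))
```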
AMAZON ML – MODEL
Details of mathematical model in Amazon ML
AMAZON ML – MODEL
• In Amazon ML, a model finds patterns in data and generates predictions
• Three distinct types of models
  • Binary
  • Multiclass
  • Regression
• The type of model is chosen based on the type of target to predict
• Binary Model
  • Predicts values that have one of two states: true/false, 1/0, win/lose, alive/dead, pass/fail, healthy/sick
  • Uses the industry-standard logistic regression learning algorithm
  • Statistical model used to predict the probability of a binary response based on one or more input variables
  • Examples
    • Is this email spam?
    • Will this product sell?
• Multiclass Model
  • Predicts values that belong to a pre-defined, limited set of three or more states
  • Uses the industry-standard multinomial logistic regression learning algorithm
  • Examples
    • Is this product a book, a movie or apparel?
    • Is this movie a thriller, a documentary or a comedy?
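The way the two classification models turn input variables into probabilities can be sketched in plain Python. This is an illustrative sketch of binary and multinomial logistic regression at prediction time, not the service's implementation; the function names, weights and class labels are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_probability(weights, bias, features):
    """Binary logistic regression: probability that the target is 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def multiclass_probabilities(weights_by_class, bias_by_class, features):
    """Multinomial logistic regression: one probability per class (softmax)."""
    scores = {
        label: bias_by_class[label] + sum(w * x for w, x in zip(weights, features))
        for label, weights in weights_by_class.items()
    }
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical weights for two input variables.
spam_probability = binary_probability([1.5, -0.5], 0.1, [2.0, 1.0])

genre_probabilities = multiclass_probabilities(
    {"thriller": [1.0, 0.0], "documentary": [0.0, 1.0], "comedy": [0.5, 0.5]},
    {"thriller": 0.0, "documentary": 0.0, "comedy": 0.0},
    [2.0, 0.5],
)
predicted_genre = max(genre_probabilities, key=genre_probabilities.get)
```

The binary model yields a single score between 0 and 1, while the multiclass model yields one probability per state and the predicted state is the one with the highest probability.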
AMAZON ML – MODEL
• Regression Model
  • Predicts a numeric value
  • Used for regression problems
  • Uses the industry-standard linear regression learning algorithm
  • Statistical model that predicts the value of y based on a number of variables x1, x2, x3, etc.
  • Examples:
    • What will the temperature be tomorrow?
    • How many units of this product will sell?
    • How much will this house sell for?
• Recipe
  • Attributes and attribute transformations available to train the model
• Model size
  • Measured in MB
  • Grows with the number of patterns stored in the model
• Number of passes
  • The number of times the training data is read when training the model
• Regularization
  • ML technique that yields higher-quality models by penalizing extreme weight values, which reduces overfitting
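The roles of the number of passes and of regularization can be illustrated with a small linear regression trained by stochastic gradient descent. This is a sketch under stated assumptions, not the algorithm as Amazon ML implements it: the function name, the learning rate, the L2 penalty form and the toy data are all chosen here for illustration.

```python
def train_linear_regression(rows, targets, passes=10, learning_rate=0.01, l2=0.0):
    """Fit y = bias + w1*x1 + w2*x2 + ... with stochastic gradient descent.

    passes: how many times the full training data is read.
    l2:     strength of the L2 regularization penalty on the weights.
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(passes):
        for features, target in zip(rows, targets):
            predicted = bias + sum(w * x for w, x in zip(weights, features))
            error = predicted - target
            # Gradient step; the l2 * w term pulls each weight toward zero.
            weights = [
                w - learning_rate * (error * x + l2 * w)
                for w, x in zip(weights, features)
            ]
            bias -= learning_rate * error
    return weights, bias


# Toy data generated from y = 2*x + 1.
rows = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 7.0]

weights, bias = train_linear_regression(rows, targets, passes=2000, learning_rate=0.05)
shrunk_weights, _ = train_linear_regression(
    rows, targets, passes=2000, learning_rate=0.05, l2=1.0
)
```

With enough passes and no penalty, the fitted weight and bias approach 2 and 1. With the penalty switched on, the weight is held below its unregularized value, which is the trade that regularization makes to avoid overfitting.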
AMAZON ML – EVALUATIONS
Evaluate the model in Amazon ML
AMAZON ML – EVALUATIONS
• In Amazon ML, an evaluation measures the quality of the ML model
  • A model must be evaluated to determine whether it will do a good job predicting the target on new/future data
  • Training and evaluating a model requires data in which the target value is already known
• Max size of training data: 100 GB
• Model Insight
  • Amazon ML provides metrics and insights to review the accuracy of the model
  • Overall success metric of the model
  • Visualizations to explore the accuracy of the model
  • Alerts to check the validity of the evaluation
• This presentation focuses on binary insights only
AMAZON ML – EVALUATIONS – BINARY INSIGHTS
• Prediction score
  • Actual output of the binary prediction
  • Indicates the system’s certainty that the given observation has a target value of 1
  • Output scores of observations are between 0 and 1
  • Default threshold score, aka cut-off, is 0.5; this can be changed
  • Any observation that scores above the cut-off is predicted as target = 1; any that scores below is predicted as target = 0
• Correct predictions
  • True Positive (TP)
    • Predicted value of target = 1, true value of target = 1
  • True Negative (TN)
    • Predicted value of target = 0, true value of target = 0
• Incorrect predictions
  • False Positive (FP)
    • Predicted value of target = 1, true value of target = 0
  • False Negative (FN)
    • Predicted value of target = 0, true value of target = 1
• Area Under the Curve (AUC)
  • Measures the ability of the model to score positive observations (target = 1) higher than negative observations (target = 0)
  • AUC near 1 indicates the model is highly accurate
  • AUC near 0.5 indicates the model is no better than random guessing
  • AUC near 0 is unusual and typically points to a problem with the data
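The binary insights on this slide can be reproduced on a handful of scored observations. The sketch below counts TP, TN, FP and FN at a given cut-off and computes AUC as the fraction of positive/negative pairs in which the positive observation receives the higher score (ties count as half). The function names and the sample scores are hypothetical, and treating a score exactly equal to the cut-off as a prediction of 0 is an assumption.

```python
def confusion_counts(scores, labels, cutoff=0.5):
    """Count correct and incorrect predictions at the given cut-off."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, actual in zip(scores, labels):
        # Assumption: a score exactly at the cut-off is predicted as 0.
        predicted = 1 if score > cutoff else 0
        if predicted == 1 and actual == 1:
            counts["TP"] += 1
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction scores and true target values.
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]

counts = confusion_counts(scores, labels)  # uses the default cut-off of 0.5
area = auc(scores, labels)
```

Moving the cut-off changes the four counts, trading false positives against false negatives, but leaves the AUC unchanged because AUC depends only on how the scores are ordered.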
AMAZON ML – EVALUATIONS – BINARY INSIGHTS – AUC (AWS TUTORIAL)
AMAZON ML – DEMO – BINARY MODEL
• Demo
  • Simple: will this product sell?
  • Not so simple: will this person survive?
• Checklist
  • Predictive Analytics
  • Amazon Machine Learning (ML)
  • Amazon ML – Key Concepts
  • Amazon ML – Datasources
  • Amazon ML – Models
  • Amazon ML – Evaluations
  • Amazon ML – Demo
• Pricing
  • https://aws.amazon.com/machine-learning/pricing/
  • Data analysis and model building: $0.42/hr
  • Batch predictions: $0.10 per 1,000 predictions (rounded up to the next 1,000)
  • Realtime predictions: $0.0001 per prediction (rounded up to the nearest penny)
  • S3 Standard storage: $0.03/GB/month
• Questions
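The Amazon ML charges listed under Pricing can be worked through as arithmetic. This is a rough sketch using the 2016 figures from this slide; the function names are hypothetical, and rounding realtime charges up to the next cent is an assumption about how the rounding is applied.

```python
from decimal import Decimal, ROUND_CEILING

COMPUTE_PER_HOUR = Decimal("0.42")           # data analysis and model building
BATCH_PER_1000 = Decimal("0.10")             # per started block of 1,000 predictions
REALTIME_PER_PREDICTION = Decimal("0.0001")  # per realtime prediction


def compute_cost(hours):
    """Cost of data analysis and model building for the given hours."""
    return COMPUTE_PER_HOUR * Decimal(str(hours))


def batch_cost(predictions):
    """Batch predictions are billed per 1,000, rounded up to the next 1,000."""
    blocks = -(-predictions // 1000)  # ceiling division on integers
    return BATCH_PER_1000 * blocks


def realtime_cost(predictions):
    """Assumption: the realtime total is rounded up to the next cent."""
    raw = REALTIME_PER_PREDICTION * predictions
    return raw.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


# Example: 2 hours of model building, 1,001 batch and 150 realtime predictions.
total = compute_cost(2) + batch_cost(1001) + realtime_cost(150)
print(f"Estimated charge: ${total}")
```

Decimal arithmetic is used instead of floats so that cent-level rounding behaves predictably.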
THANK YOU FOR COMING
Links:
http://docs.aws.amazon.com/machine-learning/latest/dg/what-is-amazon-machine-learning.html
https://www.kaggle.com/
Contact Info:
LinkedIn: Naveen VK
Email: [email protected] (work)
[email protected] (personal)
Github: https://github.com/navnoon23/