Analytics and Data Science

Understanding the field & setting expectations
BACKGROUND

Personal: International
Academic: UNT alum (Mathematics); Economics & Mathematics
Professional: Academic Research, Hilton, Ansira, Sabre
ANALYTICS & DATA SCIENCE DEFINED

Analytics: the discovery and communication of meaningful patterns in data.
Data Science: the novel application of algorithms and statistical techniques to solve business problems.
Reality: the terms mean different things at different companies.
It is a relatively new field.
The culture of the company determines the nature of the work you do.
Most companies are still in the process of defining their analytics strategy.
Titles common to the field: Data Scientist, Analytics Consultant, Statistical Modeler, Risk Analyst, Statistician.
TYPES OF PROBLEMS TYPICALLY ENCOUNTERED

Forecasting
“Predictive Analytics”: Classification (Logistic Regression, SVM, Random Forest, Gradient Boosting), e.g., fraud, customer acquisition
Customer Retention / Churn Modeling: who is likely to leave for a competitor
Recommendation Engines: e.g., the Netflix Challenge
Customer Choice Modeling: what will people buy (Multinomial Logit Model)
Optimization
Market Mix Modeling
Clustering / Market Basket Analysis
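
For the customer choice modeling item, the heart of a multinomial logit model is the formula that turns each alternative's utility V_i into a choice probability: P(i) = exp(V_i) / sum_j exp(V_j). The minimal sketch below, in plain Python, assumes utilities that are linear in product attributes (V_i = beta . x_i); the products, the (price, quality) attributes, and the coefficient values are hypothetical and chosen only for illustration, not estimated from data. With exactly two alternatives the formula reduces to the logistic (sigmoid) function of the utility difference, the same link function used by binary logistic regression for churn or fraud classification.

```python
import math
from typing import Dict, List, Sequence


def utility(beta: Sequence[float], x: Sequence[float]) -> float:
    """Linear utility V = beta . x (an assumed functional form)."""
    return sum(b * xi for b, xi in zip(beta, x))


def mnl_probabilities(utilities: Sequence[float]) -> List[float]:
    """Multinomial logit choice probabilities: exp(V_i) / sum_j exp(V_j).

    Subtracting the largest utility first keeps exp() from overflowing and
    does not change the result, because the common factor cancels.
    """
    m = max(utilities)
    weights = [math.exp(v - m) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def sigmoid(z: float) -> float:
    """Logistic function used by binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical example: three products described by (price, quality score).
beta = [-0.8, 1.5]  # illustrative coefficients, not estimated from data
products: Dict[str, List[float]] = {
    "A": [2.0, 1.0],
    "B": [3.0, 2.0],
    "C": [1.5, 0.5],
}
names = list(products)
probs = mnl_probabilities([utility(beta, products[n]) for n in names])
for n, p in zip(names, probs):
    print(f"P(choose {n}) = {p:.3f}")

# With two alternatives, MNL equals the logistic function of V_1 - V_0.
v0, v1 = 0.3, 1.1
print(mnl_probabilities([v0, v1])[1], sigmoid(v1 - v0))  # agree up to rounding
```

In practice the coefficients would be estimated by maximum likelihood from observed choices; this sketch only shows how fitted utilities become probabilities.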
DATABASES & BIG DATA

Relational databases: most companies house their data in them (Oracle, Teradata, IBM DB2, Microsoft SQL Server).
SQL queries are used to retrieve the data; SQL is a basic entry-level requirement for working in this field.
Most tasks require significant time and energy spent combining tables and data.
Hadoop: an open-source, Java-based distributed framework for storing and processing large amounts of data (petabytes), built around MapReduce.
Related tools: Pig, Hive (SQL syntax, developed at Facebook), Impala (SQL syntax), Spark. UTD offers a Spark course.
Other formats to know: HTML, JSON
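
To make the point about SQL and combining tables concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for an enterprise database such as Oracle or Teradata. The customers and orders tables, their columns, and the sample rows are hypothetical. The JOIN and GROUP BY syntax is standard SQL and carries over to other databases with minor dialect differences.

```python
import sqlite3
from typing import List, Tuple


def orders_per_customer() -> List[Tuple[str, int, float]]:
    """Join a hypothetical customers table to an orders table and summarize."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        cur = conn.cursor()
        cur.executescript(
            """
            CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                                 customer_id INTEGER, amount REAL);
            """
        )
        cur.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "Ana"), (2, "Ben"), (3, "Chen")],
        )
        cur.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(101, 1, 20.0), (102, 1, 35.0), (103, 2, 15.0)],
        )
        # LEFT JOIN keeps customers with no orders (Chen); COUNT(o.order_id)
        # skips NULLs, and COALESCE replaces the NULL sum with 0.
        query = """
            SELECT c.name,
                   COUNT(o.order_id) AS n_orders,
                   COALESCE(SUM(o.amount), 0) AS total_amount
            FROM customers AS c
            LEFT JOIN orders AS o ON o.customer_id = c.customer_id
            GROUP BY c.customer_id, c.name
            ORDER BY c.customer_id
        """
        return cur.execute(query).fetchall()
    finally:
        conn.close()


rows = orders_per_customer()
for name, n_orders, total in rows:
    print(name, n_orders, total)
```

An INNER JOIN here would silently drop Chen, the customer with no orders; choosing the right join type is exactly the kind of detail that eats up time when combining tables.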
PROGRAMMING LANGUAGES

Statistical Programming Languages
R – open source, easy to learn, an unparalleled number of packages and functionality, but memory limitations.
SAS – very common in business and handles large data sets well, but expensive and losing market share to R.
Python – versatile, with a reasonable number of packages; R’s biggest competitor.
MATLAB – more common in engineering.
General Programming Languages
Java – not knowing Java has cost me at least four jobs.
C/C++ – for writing faster R programs.
Scala – Spark’s native language; more common among people at the forefront of development.
INTERNATIONAL STUDENTS

Search for positions you are overqualified for: those employers are more likely to sponsor you.
State your status as soon as possible: some companies have policies against hiring international students.
myvisajobs.com and other sites: see which companies are sponsoring, and see salaries for negotiation purposes.
THINGS YOU MUST HAVE UNDER YOUR BELT

SQL
Experience with large data sets: get exposure; 10k records is not large.
Java
Specialize in something: be very strong in at least one area (optimization, forecasting, classification).
Linux experience
Take courses: free courses at UNT.
SAS/R
Fundamental requirement. Learn it.
Multiple projects (at least 3): implement a research paper in code, apply a technique to company data, participate in Kaggle, do an internship.
RECRUITING

Universities: UTD (School of Management / Operations Research), OSU (Oklahoma; Analytics and Data Mining programs), UNT (Economics), SMU (Statistics)
Majors: Economics, Mathematics, Statistics, Operations Research, Computer Science, Engineering
Companies: AT&T, Sabre, Epsilon, Amazon
Recruiting firms: AnalyticRecruiting.com (lots of phone interviews), Kforce.com (very promising; takes care of visa issues)
MISCELLANEOUS

Kaggle.com: “The Home of Data Science.” Companies recruit through it and pay winners; many Kaggle winners manage analytics teams. Compete! Get recognized.
Internships are extremely important: AT&T, Sabre, Epsilon, Amazon, Santander, Capital One in Plano.
Companies prefer to hire mathematicians.
Never accept the first offer.
Jumping around vs. staying at one company.
They always divide by 2.
Network: Dallas R User Group, Meetup.com, INFORMS local chapter.
BOOKS

The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani, Friedman)
The Art of R Programming (Matloff)
The Theory and Practice of Revenue Management (Talluri, van Ryzin)
THANK YOU!