Nalluri_Radha_Presentationx


CPSC8985 FA 2015
Team C3
DATA MIGRATION FROM
RDBMS TO HADOOP
By
Naga Sruthi Tiyyagura
Monika Rallabandi
Radhakrishna Nalluri
Introduction
 Data, data, and more data: several petabytes (PB) of data are transferred
every day. Oracle, IBM, Microsoft and Teradata own a large portion
of the information on the planet.
 Moving large volumes of data from Oracle to DB2 or another system is a
challenging task for the business.
 IT teams are burdened with ever-growing requests for data.
 Decision makers become frustrated because it takes hours or days
to get answers to questions, if at all.
 Traditional architectures and infrastructures are not up to the
challenge.
Abstract
 Current data is available in RDBMS databases like Oracle, SQL
Server, MySQL and Teradata.
 We are planning to migrate RDBMS data to a big data platform that
supports NoSQL databases and holds a variety of data from the
existing system; it takes huge resources and time to migrate
petabytes of data.
 Time and resources may be constraints for the current migration
process.
 The Apache Hadoop software library is a framework that allows for
the distributed processing of large data sets across clusters of
computers using simple programming models.
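As a quick illustration of that programming model, the sketch below runs the MapReduce word-count example that ships with Hadoop; the jar path, the sample file /root/sample.txt, and the /tmp directories are assumptions and vary by installation.
hadoop fs -mkdir -p /tmp/wordcount/input
hadoop fs -put /root/sample.txt /tmp/wordcount/input
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount /tmp/wordcount/input /tmp/wordcount/output
hadoop fs -cat /tmp/wordcount/output/part-r-00000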
Proposed System
 Using Sqoop, we will import data from a relational database
system into HDFS.
 Sqoop reads the table row by row into HDFS. The output of this
import process is a set of files containing a copy of the imported
table.
 Because the import runs in parallel, the output is spread across multiple
files. These files may be delimited text files or binary files.
 After manipulating the imported records with Hive, we will have a
result data set which can then be exported back to the relational
database, as sketched below.
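A minimal sketch of that export step, assuming a target MySQL table named pagelinks_out and an export directory /apps/hive/warehouse/pagelinks_out (both placeholders); the '~' delimiter matches the Hive table definition in Step 3 below.
sqoop export --connect jdbc:mysql://localhost/gsuproj --username sruthi --password sruthi --table pagelinks_out --export-dir /apps/hive/warehouse/pagelinks_out --input-fields-terminated-by '~'
Sqoop export turns each input file back into rows and inserts them into the target table, so the table must already exist in MySQL with a matching column layout.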
Structure
[Architecture diagram showing web servers, script writers, data in a MySQL database, files, a real-time Hadoop cluster, and Hadoop Hive.]
FLOW
Step 1: Convert the data into files by using Sqoop
sqoop import --connect jdbc:mysql://localhost/gsuproj --username sruthi --password sruthi --table pagelinks --target-dir sqoop-data
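The Hive table created in Step 3 expects fields terminated by '~', so a variant of this import that sets the output delimiter (and, optionally, the number of parallel mappers) may be closer to what the later steps assume; the flag values below are illustrative.
sqoop import --connect jdbc:mysql://localhost/gsuproj --username sruthi --password sruthi --table pagelinks --target-dir sqoop-data --fields-terminated-by '~' --num-mappers 4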
Step 2: Store file into Hadoop cluster
hadoop fs -copyFromLocal /root/pagelinks hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/pagelinks
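To confirm that the copy landed in the directory the Hive table will read from, the contents can be listed with a standard HDFS command:
hadoop fs -ls hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/pagelinks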
Step 3: Read Data from HIVE
CREATE EXTERNAL TABLE pagelinks (
  pl_from string,
  pl_namespace string,
  pl_title string,
  pl_from_namespace string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '~'
LOCATION 'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse';
LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse'
INTO TABLE pagelinks;
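Once the table is defined and loaded, a quick aggregation confirms that Hive can parse the '~'-delimited rows; the query below is only an illustrative example.
SELECT pl_namespace, COUNT(*) AS link_count
FROM pagelinks
GROUP BY pl_namespace
LIMIT 10;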
Advantages
 Scalable: it can store and distribute very large data sets across hundreds
of inexpensive servers that operate in parallel.
 Flexible: it can access different types of data (structured and
unstructured).
 Resilient to failure: data sent to an individual node is also
replicated to other nodes in the cluster, so another copy is available
for use.
 Fast analysis: its unique storage method is based on a distributed file
system; it can efficiently process terabytes (TB) of data in just minutes and petabytes (PB) in hours.
 Cost effective