Chapter 7. Importing and Exporting Data from Various DBs

Download Report

Transcript Chapter 7. Importing and Exporting Data from Various DBs

Chapter 7.
Importing and Exporting Data
from Various DBs
인공지능연구실
G201549028
조찬연
Artificial Intelligence Laboratory
Contents
 Understanding different types of files
Learning about data files as database
MySQL
Excel
MongoDB
SQLite
PostgreSQL
Hive
HBase
Artificial Intelligence Laboratory
Learning about data files as database
R studio supports following database system by realated R packages
Database system name
Useful R packages/ function
 Text files
 Text data files
 MySQL
 RMySQL
 Excel
 Xlsx
 Mongo
 Rmongo
 SQLite
 RSQLite
 PostgreSQL
 RPostgreSQL
 HDFS
 RHDFS
 Hive
 Rhive
 HBase
 RHbase
Big Data Analytics with R and Hadoop, 2013, p 81
Learning about data files as database
Purpose of this chapter
While dealing with the data analytics activities,
we need to do data importing, loading, or exporting.
Sometimes the same operations need to be
iterated with R programming language. So, we can
use the available R function for performing the
same data activities.
Artificial Intelligence Laboratory
Understanding different types of files
Importing the data into R - CSV, txt, .RDATA, .rda
 CSV
(Comma Separated Values)
 txt
(with Tab Separated Values)
 .RDATA
(R’s native data format)
 .rda
(R’s native data format)
Artificial Intelligence Laboratory
Understanding different types of files
CSV
 CSV
(Comma Separated Values)
A comma-separated values (CSV) (also sometimes called characterseparated values) file stores tabular data (numbers and text) in plain-text
form. Plain text means that the file is a sequence of characters, with no
data that has to be interpreted as binary numbers. A CSV file consists of
any number of records, separated by line breaks of some kind; each record
consists of fields, separated by some other character or string, most
commonly a literal comma or tab. Usually, all records have an identical
sequence of fields.
https://en.wikipedia.org/wiki/Comma-separated_values
Understanding different types of files
CSV - Import
Artificial Intelligence Laboratory
Understanding different types of files
CSV - Export
Artificial Intelligence Laboratory
Understanding different types of files
.RDATA
 .RDATA
(R’s native data format)
RDATA: Here, the .RDATA format is used by R for storing the workspace
data for a particular time period. It is considered as image file. This will
store/retrieve all of the data available in the workspace.
example
Artificial Intelligence Laboratory
Understanding different types of files
.RDATA - Export
Artificial Intelligence Laboratory
Understanding different types of files
.rda
 .rda
(R’s native data format)
.rda: This is also R's native data format, which stores the specific data
variable as per requirement.
example
Artificial Intelligence Laboratory
Understanding different types of files
.rda - Export
Artificial Intelligence Laboratory
Understanding MySQL
MySQL is the world’s second most widely used
relational database management system(RDBMS)
and most widely used open-source RDBMS.
Since both R and MySQL both are open source,
they can be used for building the interactive web
analytic applications.
https://en.wikipedia.org/wiki/MySQL
Understanding MySQL
Installing MySQL on Linux
Download suitable TAR file from www.mysql.com
shell> groupadd mysql
shell> useradd -r -g mysql mysql
shell> cd /usr/local
shell> tar zxvf /path/to/mysql-VERSION-OS.tar.gz
shell> ln -s full-path-to-mysql-VERSION-OS mysql
shell> cd mysql
shell> chown -R mysql mysql
shell> scripts/mysql_install_db --user=mysql
shell> chown -R root
shell> chown -R mysql data
shell> bin/mysqld_safe --user=mysql
shell> cp support-files/mysql.server /etc/init.d/mysql.server
http://dev.mysql.com/doc/refman/5.6/en/binary-installation.html
Understanding MySQL
Installing RMySQL
install.package(“RMySQL”)
Artificial Intelligence Laboratory
Understanding MySQL
Learning to list the tables and their structure
Artificial Intelligence Laboratory
Understanding MySQL
Importing the data into R
Artificial Intelligence Laboratory
Understanding MySQL
Understanding data manipulation
Artificial Intelligence Laboratory
Understanding Excel
Data manipulation, Exporting the data to excel
Artificial Intelligence Laboratory
Understanding MongoDB
MongoDB is one of many cross-platform document-orineted
databases. Classified as a NoSQL database, MongoDB
eschews the traditional table-base relational database
structure in favor of JSON-like documents with dynamic
schemas, making the integration of data in certain types of
applications easier and faster.
Supported with C, C++, C#, Erlang, Haskell, Java, JavaScript,
Perl, PHP, Python, Ruby, and Scala.
http://dev.mysql.com/doc/refman/5.6/en/binary-installation.html
Understanding MongoDB
Installing MongoDB on Linux
Download suitable TAR file from www.mongodb.org
shell> tar zxvf /path/to/mongoDB-VERSION-OS.tar.gz
shell> ln -s full-path-to-mongoDB-VERSION-OS mongoDB
shell> cd mongoDB
shell> mkdir /data/db
shell> ./mongod
# start a Mongo console
shell> ./mongo
Artificial Intelligence Laboratory
Understanding MongoDB
Mapping SQL to MongoDB
SQL
MongoDB
 Database
 Database
 Table
 Collection
 Index
 Index
 Row
 Document
 Column
 Field
 Joining
 Embedding & Linking
Artificial Intelligence Laboratory
Understanding MongoDB
Understanding data manipulation
Artificial Intelligence Laboratory
Understanding MongoDB
Understanding data manipulation
Artificial Intelligence Laboratory
Understanding MongoDB
Understanding data manipulation
Artificial Intelligence Laboratory
Understanding SQLite
SQLite is a relational database management system contained in a C programming
library. In contrast to other database management systems, SQLite is not a clientserver database engine. Rather, it is embedded into the end program.
http://en.wikipedia.org/wiki/SQLite
Understanding SQLite
Installing SQLite
Download suitable TAR file from www.SQLite.org
shell> tar zxvf /path/to/SQLite-VERSION-OS.tar.gz
shell> ln -s full-path-to-mongoDB-VERSION-OS SQLite
shell> cd SQLite
# start a SQLite console
shell> ./SQLite3
Artificial Intelligence Laboratory
Understanding SQLite
Understanding data manipulation
Artificial Intelligence Laboratory
Understanding SQLite
Understanding data manipulation
Artificial Intelligence Laboratory
Understanding PostgreSQL
PostgreSQL, often simply “Postgres”, is an object-relational database management
system(ORDBMS) with an emphasis on extensibility and standards-compliance. As a
database server, its primary function is to store data securely, supporting best
practices, and to allow for retrieval at the request of other software applications. It
can handle workloads ranging from small single-machine applications to large
Internet-facing applications with may concurrent users. Recent versions also provide
replication of the database itself for availability and scalability.
http://en.wikipedia.org/wiki/PostgreSQL
Understanding PostgreSQL
Installing PostgreSQL
Create the file and add a line to /etc/apt/sources.list.d/pgdg.list
deb http://apt.postgresql.org/pub/repos/apt trusty-pgdg main
shell> wget -quiet -0 - http://www.postgresql.org/media/keys/ACCC4CF8.asc |
\ sudo apt-key add sudo apt-get update
shell> sudo apt-get install postgresql-9.4
# add user and start a postgresql console
shell> su - postgres createuser
shell> sudo -u postgres psql
Artificial Intelligence Laboratory
Understanding PostgreSQL
Installing RPostgreSQL and importing the data
shell> sudo apt-get install libpq-dev
Artificial Intelligence Laboratory
Understanding PostgreSQL
Installing RPostgreSQL and importing the data
Artificial Intelligence Laboratory
Understanding PostgreSQL
Installing RPostgreSQL and importing the data
Artificial Intelligence Laboratory
Understanding PostgreSQL
Installing RPostgreSQL and importing the data
Artificial Intelligence Laboratory
Understanding Apache Hive
Apache Hive is a data warehouse infrastructure built on top of Hadoop for
providing data summarization, query, analysis. While initially developed by
Facebook, Apache Hive is now used and developed by other companies such as
Netflix. Amazon maintains a software fork of Apache Hive that is included in
Amazon Elastic MapReduce on Amazon Web Services
http://en.wikipedia.org/wiki/PostgreSQL
Understanding Apache Hive
Installing Apache Hive on Linux
Download suitable TAR file from http://hive.apache.org
shell> tar zxvf /path/to/apache-hive-VERSION-OS.tar.gz
shell> ln -s full-path-to-apache-hive-VERSION-OS apacheHive
shell> cd apacheHive
shell> ./hiveserver2
Artificial Intelligence Laboratory
Understanding Apache Hive
Understanding data manipulation
Artificial Intelligence Laboratory
Understanding Apache HBase
HBase is an open source, non-relational, distributes database modeled after
Google’s BigTable and written in java. It is developed as part of Apache Software
Foundation’s Apache Hadoop project and runs on top of HDFS, providing BitTablelike capabilities for Hadoop.
http://en.wikipedia.org/wiki/PostgreSQL
Understanding Apache HBase
Installing Apache Hbase, thrift on Linux
Download suitable TAR file from http://hbase.apache.org
shell> tar zxvf /path/to/hbase-VERSION-OS.tar.gz
shell> ln -s full-path-to-hbase-VERSION-OS hbase
shell> cd hbase
shell> ./start-hbase.sh
Download suitable TAR file from http://thrift.apache.org
shell> tar zxvf /path/to/thrift-VERSION.tar.gz
shell> cd full-path-to-thrift-VERSION-OS
shell> ./configure
shell> make
shell> make install
Artificial Intelligence Laboratory
Understanding Apache HBase
Installing Apache Hbase, thrift on Linux
shell> sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev
libboost-system-dev libboost-filesystem-dev libevent-dev automake libtool flex bison
pkg-config g++ libssl-dev
shell> wget http://ftp.debian.org/debian/pool/main/a/automake1.14/automake_1.14.
1-3_all.deb
sudo dpkg -i automake_1.14.1-3_all.deb
Download suitable TAR file from https://github.com/RevolutionAnalytics
Artificial Intelligence Laboratory
Understanding Apache HBase
Importing data in Apache HBase
Artificial Intelligence Laboratory