Tutorial 3 - cse, hkust

Download Report

Transcript Tutorial 3 - cse, hkust

MySQL
Building a Web Crawler in Python
CSE, HKUST
Feb 20
Review: Some useful tips…
Remember that YOU DON’T NEED TO INSTALL PYTHON
if you are a MAC/Linux/Unix user…
• Check whether you have Python installed in your computer first
• Type ‘python’ in your Terminal
Review: Some useful tips…
Some people are using Python 3 and may have problem using urllib2
• Reason: The urllib2 module has been split across several modules in
Python 3 named urllib.request and urllib.error
• Change your code to:
import urllib.request
wp = urllib.request.urlopen("http://xxx.com")
pw = wp.read()
print(pw)
• Useful link: http://stackoverflow.com/questions/2792650/python3error-import-error-no-module-name-urllib2
Review: Some useful tips…
Some people have problems when installing pip or other packages
• There is “permission denied” problem, such as:
OSError: [Errno 13] Permission denied:
'/lib/python2.7/site-packages/requests'
• Use this command in your Terminal:
sudo chown -R (your_user_name) /absolute/path/to/directory
e.g. sudo chown -R $User Library/python2.7/xxxx
sudo chown -R $USER /Library/Python/2.7/
• Useful link: https://github.com/pypa/pip/issues/3761
Review: Some useful tips…
How to use BeautifulSoup?
• BeautifulSoup is just a tool for you to maintain the html in a convenient way
• You also need to be familiar with String operation in Python (mentioned in Tutorial 1)
• Official documents of BeautifulSoup:
• https://www.crummy.com/software/BeautifulSoup/bs4/doc/#
• Official documents of String in Python:
• https://docs.python.org/2/library/string.html
• (An easier tutorial) https://www.tutorialspoint.com/python/python_strings.htm
Review: Some useful tips…
A Good website for learning and practising Python
• https://www.codecademy.com/learn/python
Database Introduction
Useful links:
https://www.tutorialspoint.com/mysql/mysql-introduction.htm
https://www.w3schools.com/sql/default.asp
Database Introduction
Database: A database is a collection of tables, with related data.
Table: A table is a matrix with data. A table in a database looks like a simple spreadsheet.
Column: One column (data element) contains data of one and the same kind, e.g. the column ContactName.
Row: A row (= tuple, entry or record) is a group of related data, e.g. all the data of one customer.
Primary Key: A primary key is unique. A key value can not occur twice in one table. With a key, you can find
at most one row.
Primary Key
Column
Row
MySQL Installation
• All downloads for MySQL are located at: https://dev.mysql.com/downloads/mysql/
• Pick the version number for MySQL Community Server you want and, as exactly as
possible, the platform you want.
• During the installation, there will be a message from the installer. Don’t miss it!!!
e.g.
This is your originsl password!!!!
1euUtGy.+U(k
MySQL Installation
• For Mac user, you need to start the mysql services in System Preference:
MySQL Installation
• First find your installation path!
• (Mine is /usr/local/mysql/bin/mysql)
• Then type these two commands in your Terminal:
• alias mysql=/usr/local/mysql/bin/mysql
• alias mysqladmin=/usr/local/mysql/bin/mysqladmin
• (So that you don’t need to type a lot of things when you use mysql every time…)
Try to Use MySQL!!
• Log in to MySQL using the original password:
• The command in Terminal is:
• mysqladmin -u root -p
• (Then type your original password…)
• Successfully log in!
Try to Use MySQL!!
• If you want to reset your password…
• The command in Terminal is:
• mysqladmin -u root -p password
How to use it?
• The command in Terminal is:
• mysql -u root -p
• (Then Enter your password)
Show databases
• After login to mysql, first check the databases you have:
• Command:
• show databases;
• So by Default there are already 4 databases…
Create database
• You can create a new database
• Command:
• create database (your_database_name);
Use database
•
•
•
•
•
You need to go into a database, then create table inside this database
Command:
use (your_database_name);
You can check the tables inside this database using:
Show tables;
Create table
• You can create table using the command:
• CREATE TABLE table_name (column_name column_type);
• E.g.
CREATE TABLE tutorials_tbl(
tutorial_id INT NOT NULL AUTO_INCREMENT,
tutorial_title VARCHAR(100) NOT NULL,
tutorial_author VARCHAR(40) NOT NULL,
submission_date DATE,
PRIMARY KEY ( tutorial_id ) );
Create table
• E.g.
CREATE TABLE tutorials_tbl(
tutorial_id INT NOT NULL AUTO_INCREMENT,
tutorial_title VARCHAR(100) NOT NULL,
tutorial_author VARCHAR(40) NOT NULL,
submission_date DATE,
PRIMARY KEY ( tutorial_id ) );
• Here few items need explanation:
• Field Attribute NOT NULL is being used because we do not want this field to be NULL.
So if user will try to create a record with NULL value, then MySQL will raise an error.
• Field Attribute AUTO_INCREMENT tells MySQL to go ahead and add the next
available number to the id field.
• Keyword PRIMARY KEY is used to define a column as primary key. You can use
multiple columns separated by comma to define a primary key.
Create table
• E.g.
Describe table
• Command:
Describe (table_name)
Insert Query
• Command:
INSERT INTO table_name ( field1, field2,...fieldN )
VALUES ( value1, value2,...valueN );
• E.g.
• INSERT INTO tutorials_tbl (tutorial_title, tutorial_author,
submission_date)
VALUES ("Learn PHP", "John Poul", NOW());
• INSERT INTO tutorials_tbl (tutorial_title, tutorial_author,
submission_date)
VALUES ("Learn MySQL", "Abdul S", NOW());
Select Query
• Command:
SELECT field1, field2,...fieldN table_name1, table_name2...
[WHERE Clause] [OFFSET M ][LIMIT N]
• You can use one or more tables separated by comma to include various conditions
using a WHERE clause, but WHERE clause is an optional part of SELECT command.
• You can fetch one or more fields in a single SELECT command.
• You can specify star (*) in place of fields. In this case, SELECT will return all the fields.
• You can specify any condition using WHERE clause.
• You can specify an offset using OFFSET from where SELECT will start returning
records. By default offset is zero.
• You can limit the number of returns using LIMIT attribute.
Select Query
Select Query
Other Useful Queries…
https://www.tutorialspoint.com/mysql/index.htm
So How can We Use MySQL in Python?
Just simply install and import the MySQL package!!
• Method 1: Use pip in Ternimal (if pip is already installed). Recommend! Easiest way to install!
$ pip install MySQL-python
• Method 2: Download the package from the website and install it More difficult…
• https://sourceforge.net/projects/mysql-python/?source=typ_redirect
• Unzip the file and install it in your Terminal
$ tar zxvf MySQL-python-1.2.2.tar.gz (This is a command used for unzip the
file. If you have upzip manually then you can skip this one)
$ cd MySQL-python-1.2.2
$ python setup.py build
$ python setup.py install
So How can We Use MySQL in Python?
For other operating systems…
• If you are using Ubuntu 14, Ubuntu 16, Debian 8.6 (jessie):
$ sudo apt-get install python-pip python-dev libmysqlclient-dev
• Fedora 24:
$ sudo dnf install python python-devel mysql-devel redhat-rpm-config gcc
•
Starting from a fresh Ubuntu 14.04.2 system, these two commands were needed:
$ apt-get install python-dev libmysqlclient-dev
$ pip install MySQL-python
Test if your MySQLdb has been installed
• No error!
A simple example for using MySQLdb in Python
• Write your Python code to create tables in MySQL!
import MySQLdb
db =
MySQLdb.connect(host='localhost',user='root',passwd=’(your_password)',
port=3306,db=’(your_database_name)',charset="utf8mb4")
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS "+table_name)
sql= """CREATE TABLE """+table_name+""" (
tutorial_id INT NOT NULL AUTO_INCREMENT,
tutorial_title VARCHAR(100) NOT NULL,
tutorial_author VARCHAR(40) NOT NULL,
submission_date DATE,
PRIMARY KEY ( tutorial_id ) );"""
print sql
cursor.execute(sql)
print 'create finish!'
db.close()
A simple example for using MySQLdb in Python
• Write your Python code for insert query in MySQL!
import MySQLdb
db =
MySQLdb.connect(host='localhost',user='root',passwd=’(your_password)',
port=3306,db=’(your_database_name)',charset="utf8mb4")
cursor = db.cursor()
sql1= """INSERT INTO """+table_name+"""
(tutorial_title, tutorial_author, submission_date)
VALUES ("Learn PHP", "John Poul", '2017-02-23');"""
print sql
try:
cursor.execute(sql)
db.commit()
print 'insert success!'
except:
# Rollback in case there is any error
db.rollback()
print 'insert fail!'
db.close()
A simple example for using MySQLdb in Python
• Write your Python code for select query in MySQL!
import MySQLdb
db =
MySQLdb.connect(host='localhost',user='root',passwd=’(your_password)',
port=3306,db=’(your_database_name)',charset="utf8mb4")
cursor = db.cursor()
cursor.execute("SELECT * from tutorials_tbl;")
for record in cursor.fetchall():
print str(record[0]) + ', '+ str(record[1]) + ', '+ str(record[2]) + ', '+ str(record[3])
print 'finish'
db.close()
A simple example for using MySQLdb in Python
• Write your Python code for select query in MySQL!
import MySQLdb
db =
MySQLdb.connect(host='localhost',user='root',passwd=’(your_password)',
port=3306,db=’(your_database_name)',charset="utf8mb4")
cursor = db.cursor()
cursor.execute("SELECT * from tutorials_tbl;")
for record in cursor.fetchall():
print str(record[0]) + ', '+ str(record[1]) + ', '+ str(record[2]) + ', '+ str(record[3])
print 'finish'
db.close()
Demo files
• mysqlTest.py
• It demonstrates how to use Python to create/insert/select from tables in MySQL
• Remember to fill in your password and db name in the code before you run it!!!
Demo files
• stockCrawler.py
html
Website
1. Get the
website
Useful
Info 3. Store in
2. Parsing
Using Regular
expression
(regux)
TXT
txt
db
4. Read
from TXT
and Store
in MySQL
Demo files
• stockCrawler_withoutTxtBackup.py
html
Website
1. Get the
website
Useful
Info
2. Parsing
Using Regular
expression
(regux)
db
3. Store in
MySQL
Demo files
• stockCrawler_bs.py
html
Website
1. Get the
website
Useful
Info 3. Store in
2. Parsing
Using
BeautifulSoup
TXT
txt
db
4. Read
from TXT
and Store
in MySQL
Demo files
• stockCrawler_bs_withoutTxtBackup.py
html
Website
1. Get the
website
Useful
Info
2. Parsing
Using
BeautifulSoup
db
3. Store in
MySQL