Deep Learning with Python

What is Python?
• An interpreted language released by Guido van Rossum in 1991
• One of Google's three main development languages (C/C++, Java, Python)
Features of Python
• Readability
• Rich libraries
• Works well as a "glue" language
• Free
• Unicode support
Python Implementations
• CPython
  • The reference interpreter, written in C
• Jython
  • An interpreter for the Java Virtual Machine
• IronPython
  • An interpreter for .NET and Mono, implemented in C#
• PyPy
  • A Python interpreter written in Python
Installation and Development Environment
• http://www.python.org/downloads
Python Command Line
Python IDLE
Data Types - Numeric
Data Types - Strings
Data Types - List, Set, Tuple, Dictionary
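A minimal interactive sketch of the collection types these slides cover:
>>> nums = [1, 2, 3]                    # list: ordered, mutable
>>> unique = {1, 2, 2, 3}               # set: unordered, duplicates removed
>>> unique
{1, 2, 3}
>>> point = (3, 4)                      # tuple: ordered, immutable
>>> ages = {'alice': 30, 'bob': 25}     # dictionary: key -> value mapping
>>> ages['alice']
30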
Shallow/Deep Copy
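A minimal sketch of the difference, using the standard-library copy module:
>>> import copy
>>> original = [[1, 2], [3, 4]]
>>> shallow = copy.copy(original)       # new outer list, but the inner lists are shared
>>> deep = copy.deepcopy(original)      # fully independent copy
>>> original[0][0] = 99
>>> shallow[0][0], deep[0][0]           # the shallow copy sees the change, the deep copy does not
(99, 1)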
Function
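A minimal sketch of defining and calling a function with a default argument:
>>> def area(width, height=1.0):
...     """Return the area of a rectangle; height defaults to 1.0."""
...     return width * height
...
>>> area(3, 4)
12
>>> area(width=3)
3.0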
Modules and Packages
• Python modules "package program code and data for reuse." (Lutz)
  – Similar to library in C, package in Java.
• Python packages are hierarchical modules (i.e., modules that contain other modules).
• Three commands for accessing modules:
  1. import
  2. from…import
  3. reload
Modules and Packages: import
• The import command loads a module:
# Load the regular expression module
>>> import re
• To access the contents of a module, use dotted names:
# Use the search method from the re module on a string s
>>> s = 'an example string'
>>> re.search(r'\w+', s)
• To list the contents of a module, use dir:
>>> dir(re)
['DOTALL', 'I', 'IGNORECASE', …]
Modules and Packages: from…import
• The from…import command loads individual functions and
objects from a module:
# Load the search function from the re module
>>> from re import search
• Once an individual function or object is loaded with
from…import, it can be used directly:
# Use the search function directly, without the module prefix
>>> search(r'\w+', s)
Import vs. from…import
import
• Keeps module functions separate from user functions.
• Requires the use of dotted names.
• Works with reload.
from…import
• Puts module functions and user functions together.
• More convenient names.
• Does not work with reload.
Modules and Packages: reload
• If you edit a module, you must use the reload command
before the changes become visible in Python:
>>> import mymodule
...
>>> reload(mymodule)
• The reload command only affects modules that have been
loaded with import; it does not update individual functions
and objects loaded with from...import.
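Note that in Python 3 reload is no longer a builtin; the same workflow uses importlib (a minimal sketch, assuming a module file mymodule.py):
>>> import importlib
>>> import mymodule
...
>>> importlib.reload(mymodule)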
NumPy
• Fundamental package for scientific computing with Python
• It contains among other things:
  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities
NumPy: Functions & Attributes
NumPy: N-dimensional array object
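A brief interactive sketch of the N-dimensional array object and a few common attributes and functions:
>>> import numpy as np
>>> a = np.arange(6).reshape(2, 3)      # a 2x3 N-dimensional array
>>> a.shape, a.ndim
((2, 3), 2)
>>> a * 2                               # broadcasting: the scalar is applied elementwise
array([[ 0,  2,  4],
       [ 6,  8, 10]])
>>> a.sum(axis=0)                       # column sums
array([3, 5, 7])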
Python and Natural Language Processing
• Python is a great language for NLP:
• Simple
• Easy to debug:
• Exceptions
• Interpreted language
• Easy to structure
• Modules
• Object oriented programming
• Powerful string manipulation
Introduction to NLTK
• The Natural Language Toolkit (NLTK) provides:
• Basic classes for representing data relevant to natural language
processing.
• Standard interfaces for performing tasks, such as tokenization,
tagging, and parsing.
• Standard implementations of each task, which can be combined to
solve complex problems.
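A short illustrative session with NLTK's top-level helpers for two of these tasks (tokenization and tagging); the outputs assume the relevant tokenizer and tagger data have been downloaded and are shown only as examples:
>>> import nltk
>>> tokens = nltk.word_tokenize("NLTK provides tokenization, tagging, and parsing.")
>>> tokens[:4]
['NLTK', 'provides', 'tokenization', ',']
>>> nltk.pos_tag(tokens)[:3]
[('NLTK', 'NNP'), ('provides', 'VBZ'), ('tokenization', 'NN')]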
NLTK: Example Modules
• nltk.token: processing individual elements of text, such as
words or sentences.
• nltk.probability: modeling frequency distributions and
probabilistic systems.
• nltk.tagger: tagging tokens with supplemental information,
such as parts of speech or wordnet sense tags.
• nltk.parser: high-level interface for parsing texts.
• nltk.chartparser: a chart-based implementation of the parser
interface.
• nltk.chunkparser: a regular-expression based surface parser.
NLTK: Top-Level Organization
• NLTK is organized as a flat hierarchy of packages and
modules.
• Each module provides the tools necessary to address a
specific task.
• Modules contain two types of classes:
• Data-oriented classes are used to represent information relevant to
natural language processing.
• Task-oriented classes encapsulate the resources and methods
needed to perform a specific task.
Installing NLTK
• 32-bit binary installation
• Install Python:
• http://www.python.org/download/releases/3.4.1/ (avoid the 64-bit versions)
• Install Numpy (optional):
• http://sourceforge.net/projects/numpy/files/NumPy/1.8.1/numpy-1.8.1-win32-superpack-python3.4.exe
• Install NLTK:
• http://pypi.python.org/pypi/nltk
• Test installation:
• Start>Python34, then type import nltk
Installing NLTK Data
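A minimal way to fetch the corpora and models used in the following slides (the first call opens the interactive downloader; the second fetches a single package by name, e.g. the Punkt tokenizer models):
>>> import nltk
>>> nltk.download()
>>> nltk.download('punkt')
True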
NLTK Corpora
NLTK Book
Simple Statistics
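For example, a frequency distribution over tokens can be built with nltk.FreqDist (a minimal sketch; the sample text and output are illustrative):
>>> from nltk import FreqDist
>>> fd = FreqDist('the quick brown fox jumps over the lazy dog the'.split())
>>> fd['the']
3
>>> fd.most_common(2)
[('the', 3), ('quick', 1)]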
NLP Pipeline
Using a Tagger
Supervised Classification
Gender Identification
Gender Identification(cont.)
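A sketch of the gender-identification classifier along the lines of the NLTK book example (the feature choice, names, and printed outputs are illustrative; accuracy with this single feature is typically around 0.75):
>>> import random, nltk
>>> from nltk.corpus import names
>>> def gender_features(word):
...     return {'last_letter': word[-1]}     # one deliberately simple feature
...
>>> labeled = ([(n, 'male') for n in names.words('male.txt')] +
...            [(n, 'female') for n in names.words('female.txt')])
>>> random.shuffle(labeled)
>>> featuresets = [(gender_features(n), g) for (n, g) in labeled]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> classifier.classify(gender_features('Neo'))
'male'
>>> nltk.classify.accuracy(classifier, test_set)
0.758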
WordNet
Semantic Similarity
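For example, WordNet path-based similarity between two synsets can be computed directly from the corpus reader (the scores shown are those typically reported for WordNet 3.0 and are illustrative):
>>> from nltk.corpus import wordnet as wn
>>> dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
>>> dog.path_similarity(cat)        # shortest path in the hypernym hierarchy
0.2
>>> dog.wup_similarity(cat)         # Wu-Palmer similarity
0.8571428571428571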
Theano
• Python library that allows you to define, optimize, and
evaluate mathematical expressions involving multidimensional arrays efficiently
• Easy parallelization: CPU or GPU
• Speed optimization
• Deep learning tutorial codes
• Good maintenance
• A great user group
Requirements
• Linux, Mac OS X or Windows operating system
• Python >= 2.6
• NumPy >= 1.6.2
• SciPy >= 0.11
• A BLAS installation (with Level 3 functionality)
Easy Installation of an Optimized Theano on Current Ubuntu
• For Ubuntu 11.10 through 14.04:
• sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
• sudo pip install Theano
• For Ubuntu 11.04:
• sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ git libatlas3gf-base libatlas-dev
• sudo pip install Theano
Test the newly installed packages
• NumPy (~30s): python -c "import numpy; numpy.test()"
• SciPy (~1m): python -c "import scipy; scipy.test()"
• Theano (~30m): python -c "import theano; theano.test()"
Adding two Scalars
>>> import theano.tensor as T
>>> x = T.dscalar('x')
>>> y = T.dscalar('y')
>>> z = x + y
>>> z.eval({x : 16.3, y : 12.1})
array(28.4)
Adding two Matrices
>>> x = T.dmatrix('x')
>>> y = T.dmatrix('y')
>>> z = x + y
>>> from theano import function
>>> f = function([x, y], z)
>>> f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
array([[ 11.,  22.],
       [ 33.,  44.]])
Logistic Function
>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> logistic = function([x], s)
>>> logistic([[0, 1], [-1, -2]])
array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
Restricted Boltzmann Machine
RBM in Theano
class RBM(object):
    """Restricted Boltzmann Machine (RBM)"""
    def __init__(
        self,
        input=None,        # symbolic variable for the minibatch of visible data
        n_visible=784,     # number of visible units (e.g. 28x28 image pixels)
        n_hidden=500,      # number of hidden units
        W=None,            # weight matrix; None means initialize randomly
        hbias=None,        # hidden-unit biases
        vbias=None,        # visible-unit biases
        numpy_rng=None,    # NumPy random generator used for initialization
        theano_rng=None    # Theano random stream used for sampling
    ):
Generative Training
Contrastive Divergence
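A minimal NumPy sketch of a single CD-1 update for a binary RBM (illustrative only; this is not the Theano tutorial's implementation, and the variable names are assumptions):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, hbias, vbias, lr=0.1, rng=np.random):
    # Positive phase: hidden probabilities and samples given the data v0
    ph0 = sigmoid(v0 @ W + hbias)
    h0 = (rng.uniform(size=ph0.shape) < ph0).astype(v0.dtype)
    # Negative phase: one Gibbs step down to the visible layer and back up
    pv1 = sigmoid(h0 @ W.T + vbias)
    ph1 = sigmoid(pv1 @ W + hbias)
    # Contrastive divergence gradient estimates (difference of correlations)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / v0.shape[0]
    hbias += lr * (ph0 - ph1).mean(axis=0)
    vbias += lr * (v0 - pv1).mean(axis=0)
    return W, hbias, vbias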
Bayesian Network (= Belief Network)
• A probabilistic graphical model
• Represents a set of random variables and their conditional dependencies via a directed acyclic graph
Layer Stacking
Deep Learning for WSD
• WSD (Word Sense Disambiguation)
• The task of deciding which sense a word with two or more senses takes on in a given context
  차에서 내리자 내리는 눈 때문에 모두 건물 안으로 뛰어갔다.
  (As we got out of the car, everyone ran into the building because of the falling snow. Here the Korean word 눈 could mean "snow" or "eye", and 내리다 "to get out" or "to fall".)
• Strongly affects the performance of NLP applications such as machine translation and information retrieval
Sejong Sense-Tagged Corpus
Naïve Bayes for WSD
• Bayes’ Rule
sk: a sense of the ambiguous word s
c: the context words surrounding the ambiguous word s
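For reference, the resulting decision rule is the standard naive Bayes formulation (stated here for completeness; it assumes the context words are conditionally independent given the sense):
\hat{s} = \arg\max_{s_k} P(s_k) \prod_{c_j \in c} P(c_j \mid s_k)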
SVM for WSD
Word2Vec
Word2Vec Install
• Download the code: svn checkout http://word2vec.googlecode.com/svn/trunk/
• Run 'make' to compile the word2vec tool
• Run the demo scripts: ./demo-word.sh and ./demo-phrases.sh
• For questions about the toolkit, see http://groups.google.com/group/word2vec-toolkit
Vector Representation in Word2Vec
>>> from gensim.models import Word2Vec   # assuming the gensim implementation
>>> model = Word2Vec(sentences, size=200)   # default vector size is 100
>>> model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
[('queen', 0.50882536)]
>>> model.doesnt_match("breakfast cereal dinner lunch".split())
'cereal'
>>> model.similarity('woman', 'man')
0.73723527
>>> model['computer'] # raw NumPy vector of a word
array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)