print - hkust cse

Introduction to Python
Building a Web Crawler in Python
Feb 13
Why Python?
• Easy to learn, yet powerful
• Emphasizes readability
• Great as both a scripting/glue language and for full-blown
application development
• "Scales with the ability of the programmer"
• …
Introduction to Python
• For Mac OS and most Linux and Unix users:
• Python has already been installed
• For Windows users:
• You can follow the instructions here:
• Download your python here:
Running Python
There are many ways Python can be used:
• Interactively
• Run the python program with no arguments and end up at something like this:
• Type “python” at your command line, then ENTER
• Write your code (e.g., print "hello world")
• Type “exit()” to exit
• Useful for succinct tests, debugging, and for demonstrations
Running Python
There are many ways Python can be used:
• Non-interactively
• Write a script (a text file) and run it at the command line with python,
maybe adding arguments and other options.
• Example:
• Write a script and name it
• Run it at command line (under the directory where you put the
using the command “python”
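A minimal sketch of such a script. The file name hello.py and the greet function are hypothetical, since the slide's own file name did not survive. The sketch assumes Python 3, where print is a function rather than the print statement used elsewhere in these slides:

```python
# hello.py (hypothetical name): a tiny script that greets its arguments.
import sys


def greet(names):
    """Return one greeting line per name; greet the world if none are given."""
    if not names:
        names = ["world"]
    return ["hello " + name for name in names]


if __name__ == "__main__":
    # sys.argv[0] is the script name; the rest are command-line arguments.
    for line in greet(sys.argv[1:]):
        print(line)
```

Running "python hello.py Alice Bob" from the directory that holds the file should print one greeting per argument.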
Running Python
There are many ways Python can be used:
• Using an IDE (Integrated Development Environment)
• A hybrid of the above — save work in script files, but maintain an interactive
session for running them and debugging
• Python comes with its own IDE, named IDLE.
• Other IDEs:
• Pydev with Eclipse
• PyCharm
• Wing IDE
• Komodo IDE
• Sublime Text
• …
Using Python Interactively
• First some explanation of terminology and notation:
• Text written in the Python language (or any language) is generically referred to
as source code.
• >>> means the interpreter is waiting for input (more code).
• So does ... — specifically, as a continuation of the previous code.
• IDLE's interactive interpreter actually just indents instead of using the ... notation.
• Indentation, even TABs vs. SPACEs, matters!!! (More on that later.)
Comments in Python
Python code can be sprinkled with comments that are ignored by the interpreter.
The Single Line Comments in Python...
• begin with the # character,
• can be on lines by themselves or follow on the same lines as code,
• take effect until the end of the line.
The Multiple Line Comments in Python...
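• begin with ''' and end with ''' (strictly, these are triple-quoted string literals, which have no effect when they are not assigned to anything).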
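Both comment styles can be sketched as follows. The variable names are illustrative, and the code assumes Python 3 (print as a function):

```python
# A single-line comment on its own line.
radius = 2  # A comment may also follow code on the same line.

'''
A triple-quoted block like this one is really a string literal.
Because it is not assigned to anything, it has no effect,
which is why it is commonly used as a multi-line comment.
'''

area = 3.14159 * radius ** 2  # Only the code is executed.
print(area)
```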
The interpreter prints whatever your code evaluates to.
Since python has all sorts of numeric types, it can be used as a simple calculator.
• The operators +, -, * and / work as in most other languages.
• Parentheses can be used for grouping.
>>> 2+2
4
>>> (50-5*6)/4
5
>>> 7/3 # integer division returns the floor:
2
>>> 2**3 # exponentiation
8
Python has a float type for floating point numbers:
>>> 3 * 3.75 / 1.5
>>> 42 * 1.234e-2
Operators with mixed type operands convert the integer operand to floating point:
>>> 7.0 / 2
>>> 7 / 2.0
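The sessions above show Python 2 behaviour, where / between two integers floors the result. As a point of comparison, here is a short sketch assuming Python 3, where / always performs true division and // is the floor-division operator (variable names are illustrative):

```python
# Python 3 semantics (an assumption; the slides show Python 2 sessions).
floor_result = 7 // 3      # floor division: 2
true_result = 7 / 2        # true division always gives a float: 3.5
mixed_result = 7.0 // 2    # floor division with a float operand: 3.0
power = 2 ** 3             # exponentiation: 8

print(floor_result, true_result, mixed_result, power)
```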
• Python also has a long type for arbitrary-length integers; conversion is usually
automatic if necessary:
>>> 2**70
• Python also supports octal and hexadecimal bases, complex numbers, etc.
• documentation on numeric types is at:
Variable Assignment
• The equal sign (=) is used to assign a value to a variable (it is often more appropriate to
read this as "gets" rather than "equals"):
>>> width = 20
>>> height = 5*9
>>> width * height
• Python developers often like to use assignment shorthand:
>>> x = y = 0
>>> x
>>> y
>>> x, y = 4, 2
>>> x
>>> y
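The assignment shorthands above can be collected into one short sketch. The swap on the third line is an extra illustration not shown on the slide, and the code assumes Python 3 (print as a function):

```python
x = y = 0          # chained assignment: both names refer to 0
x, y = 4, 2        # tuple unpacking: x gets 4, y gets 2
x, y = y, x        # the right-hand side is evaluated first, so this swaps
print(x, y)        # prints: 2 4
```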
Variables and Types
• Note that we never declared the type of the variable — Python uses dynamic
typing and dynamic binding:
>>> x = 1
>>> type(x) # this returns the type of x rather than its value
<type 'int'>
>>> x = 1.2
>>> type(x)
<type 'float'>
• Variables aren't defined until you give them a value:
>>> n
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'n' is not defined
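A small sketch of dynamic typing and of the NameError raised by an undefined variable. It assumes Python 3, where type(x) displays as <class 'int'> rather than <type 'int'>; the name undefined_name is purely illustrative:

```python
x = 1
first_type = type(x)       # int
x = 1.2                    # the same name can be rebound to another type
second_type = type(x)      # float

try:
    undefined_name         # never assigned anywhere in this sketch
except NameError as error:
    message = str(error)

print(first_type, second_type, message)
```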
• Python strings hold text data:
• Strings can be enclosed in single- or double-quotes (unlike bash and perl, there is no
significant difference).
• The \ escape character can be used to embed quotes within strings.
• The print statement nicely prints strings to the screen.
>>> print 'spam eggs'
spam eggs
>>> print 'doesn\'t'
>>> print "doesn't"
>>> print '"Yes," he said.'
"Yes," he said.
>>> print "\"Yes,\" he said."
"Yes," he said.
>>> print '"Isn\'t," she said.'
"Isn't," she said.
The \ is also used to write the newline (\n), tab (\t), and other special characters:
>>> print 'line one\nline two\nline three'
line one
line two
line three
>>> print 'topic\n\tsub1\n\tsub2' #(actual tabs would work here, too)
Prefix a string literal with r (for raw) to not have it treat any characters as special (this will
come in very handy later with regular expressions):
>>> print r'topic\n\tsub1\n\tsub2'
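To make the difference between escaped and raw strings concrete, here is a brief sketch assuming Python 3. The sample text passed to re.findall is made up for illustration:

```python
import re

escaped = 'topic\n\tsub1'     # \n and \t become a newline and a tab
raw = r'topic\n\tsub1'        # backslashes are kept literally

print(len(escaped), len(raw))  # the raw string is two characters longer

# Raw strings keep regular-expression backslashes readable:
# r'\d+' matches one or more digits.
numbers = re.findall(r'\d+', 'rooms 101 and 2020')
print(numbers)
```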
String Operations
The + operator works on strings, too (called concatenation), as does *:
>>> word = 'Help' + 'A'
>>> word
>>> '<' + word*5 + '>'
Note that, unlike some other languages, Python has no separate type for a single character; all
text is a string:
>>> type('c')
<type 'str'>
String Operations
Strings can be subscripted (indexed), and sliced.
Python indexing starts at 0.
>>> word[4]
>>> word[0:2]
>>> word[:2] # The first index defaults to zero
>>> word[2:4]
>>> word[2:] # The last index defaults to the end of the string
And negative indices count backwards from the end:
>>> word[-1] # The last character
>>> word[:-2] # Everything except the last two characters
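The indexing and slicing rules above, applied to the same word, can be sketched as follows (Python 3 syntax is assumed for print):

```python
word = 'Help' + 'A'           # concatenation gives 'HelpA'

print(word[4])                # 'A' (index 4 is the fifth character)
print(word[0:2], word[:2])    # 'He' both times; the start defaults to 0
print(word[2:4], word[2:])    # 'lp' and 'lpA'; the end defaults to len(word)
print(word[-1], word[:-2])    # 'A' and 'Hel'; negative indices count from the end
```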
String Methods
Strings are objects with methods. We'll define this later, but here are some examples:
>>> print 'a' + ' foo '.strip() + 'z'
>>> first, last = 'George Washington'.split()
>>> first
>>> last
>>> 'abcdefghijklmnopqrstuvwxyz'.find('m')
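For reference, a sketch of what these method calls evaluate to, assuming Python 3. The variable names and the final find('z') call are additions for illustration:

```python
padded = 'a' + ' foo '.strip() + 'z'           # strip() removes surrounding whitespace
first, last = 'George Washington'.split()       # split() breaks the string on whitespace
position = 'abcdefghijklmnopqrstuvwxyz'.find('m')   # index of the first match
missing = 'abc'.find('z')                       # find() returns -1 when nothing matches

print(padded, first, last, position, missing)
```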
See these for everything you want to know about Python strings.
Control Structures – if/else
A plain sequence of value manipulations is rarely useful on its own; we need control structures to build logic
into a program. Perhaps the best-known control structure is the if statement. Here is Python's:
>>> x = 5
>>> if x < 0:
...     print 'negative'
... elif x == 0:
...     print 'zero'
... else:
...     print 'positive'
...
positive
This introduces several concepts, including comparison operators (<, >, <=, >=, == (equal to; don't confuse
it with assignment), and != (not equal to)) and Boolean values:
>>> x < 0
False
>>> x > 0
True
and indentation.
Unlike in most other programming languages, indentation matters: it is Python's way of grouping statements
(in contrast to the curly braces used in, for example, C).
• The body of a control structure must be uniformly indented (IDLE and many other syntax-aware editors
indent automatically).
• When a compound statement is entered interactively, it must be followed by a blank line to indicate
completion (since the parser cannot guess when you have typed the last line).
• TABs vs. SPACEs matter!
>>> if True:
	print 'x' #leading space is a TAB
    print 'y' #leading space is four SPACEs
  File "<stdin>", line 3
    print 'y' #leading space is four SPACEs
    ^
IndentationError: unindent does not match any outer indentation level
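As a sketch of consistent indentation in practice, the if/elif/else chain can be wrapped in a function. The function name sign and the use of def are additions not covered in these slides, and Python 3 syntax is assumed. Every block is indented with four spaces:

```python
def sign(x):
    """Classify a number as 'negative', 'zero', or 'positive'."""
    if x < 0:
        return 'negative'
    elif x == 0:
        return 'zero'
    else:
        return 'positive'


for value in (-3, 0, 5):
    print(value, sign(value))
```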
Control Structures – while
Here's another control structure, this time a loop:
>>> # Fibonacci series:
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
...     print b
...     a, b = b, a+b
It re-executes its body until its condition is False (if b were not updated, it would result in an infinite loop).
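The same loop can be sketched as a reusable function that collects the values instead of printing them. The name fibonacci_below is hypothetical, and Python 3 syntax is assumed:

```python
def fibonacci_below(limit):
    """Collect the Fibonacci numbers smaller than limit."""
    values = []
    a, b = 0, 1
    while b < limit:
        values.append(b)
        a, b = b, a + b     # without this update the loop would never end
    return values


print(fibonacci_below(10))  # prints: [1, 1, 2, 3, 5, 8]
```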
Sequence - Lists
In addition to numbers and strings, Python has several compound data types, used to group together
other values. The most versatile is the list: a sequence of comma-separated values (items) between square
brackets. List items need not all have the same type:
>>> a = ['spam', 'eggs', 100, 1234]
>>> a
['spam', 'eggs', 100, 1234]
Lists are a type of sequence object; strings are sequences, too, and like strings, they can be indexed, sliced,
concatenated, etc.:
>>> a[0]
>>> a[1:3]
['eggs', 100]
>>> a[:2] + ['bacon', 2*2]
['spam', 'eggs', 'bacon', 4]
Sequence - Lists
Lists are objects with methods, too (more on that later):
>>> a.append(9.87)
>>> a
['spam', 'eggs', 100, 1234, 9.87]
>>> a.pop()
>>> a
['spam', 'eggs', 100, 1234]
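The list operations shown so far can be gathered into one sketch (variable names other than a are illustrative; Python 3 syntax is assumed):

```python
a = ['spam', 'eggs', 100, 1234]

first = a[0]                        # 'spam'
middle = a[1:3]                     # ['eggs', 100]
combined = a[:2] + ['bacon', 2*2]   # a new list; a itself is unchanged

a.append(9.87)                      # adds to the end, in place
removed = a.pop()                   # removes and returns the last item
print(first, middle, combined, removed, a)
```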
Sequence - Tuples
It is possible to change individual elements of a list:
>>> a
['spam', 'eggs', 100, 1234]
>>> a[2] = a[2] + 99
>>> a
['spam', 'eggs', 199, 1234]
We say that lists are mutable (strings, by contrast, are immutable: to change one you must build a new string). Python has a
close cousin to the list called the tuple. Tuples are like lists, but the syntax uses parentheses instead of
square brackets, and they are not mutable:
>>> b = ('spam', 'eggs', 100, 1234)
>>> b[2] = b[2] + 99
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
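A short sketch contrasting the two, assuming Python 3. Building a new tuple from slices, as on the last lines, is one common workaround and is not shown on the slide:

```python
a = ['spam', 'eggs', 100, 1234]
a[2] = a[2] + 99                 # lists are mutable: item assignment works

b = ('spam', 'eggs', 100, 1234)
try:
    b[2] = b[2] + 99             # tuples are immutable: this raises TypeError
except TypeError as error:
    tuple_error = str(error)

# To "change" a tuple, build a new one from slices of the old.
c = b[:2] + (b[2] + 99,) + b[3:]
print(a, c, tuple_error)
```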
Sequence - Tuples
Lists, tuples, and strings, like all sequences, have a length that can be determined using a built-in function:
>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
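For illustration, len applied to each of the three sequence types (Python 3 syntax assumed; the list and tuple values are reused from the earlier examples):

```python
s = 'supercalifragilisticexpialidocious'
print(len(s))                               # 34 characters
print(len(['spam', 'eggs', 100, 1234]))     # 4 items
print(len(('spam', 'eggs')))                # 2 items
```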
Control Structures - misc
There are several other keywords used to control the flow of a program:
• for: loops over anything iterable
• break: breaks out of the smallest enclosing for or while loop
• continue: skips ahead to the next iteration of the loop
• pass: does nothing (used as a placeholder)
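The four keywords can be seen together in a small sketch. The function names first_even_squares and not_written_yet are hypothetical, and Python 3 syntax is assumed:

```python
def first_even_squares(numbers, limit):
    """Square the even numbers in order, stopping once a square exceeds limit."""
    squares = []
    for n in numbers:
        if n % 2 != 0:
            continue          # skip odd numbers and move to the next iteration
        square = n * n
        if square > limit:
            break             # leave the loop entirely
        squares.append(square)
    return squares


def not_written_yet():
    pass                      # placeholder body that does nothing


print(first_even_squares([1, 2, 3, 4, 5, 6, 8], 40))  # prints: [4, 16, 36]
```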
Other Materials for Python Beginners
• Useful introduction material for Python beginners:
Harvard Python Workshop
The Python Tutorial (official)
Python for Beginners (official)