Transcript Chapter 5
Computer Science 121
Scientific Computing
Winter 2016
Chapter 5
Files and Scripts
File (non-technical): a (Word) document, image, audio
recording, video, etc.
File (technical): a named sequence of bytes on
disk.
ASCII vs. Binary
“ASCII file” means “file that can be viewed as text by a
program (Notepad) that interprets each byte as an ASCII
code”.
A binary file is anything that cannot be viewed that way.
“JPEG file” means “file that can be viewed as an image by using a
program (Photoshop) that interprets the bytes as a JPEG-encoded
image”.
“MP3 file” means “file that can be heard as an audio recording by
using a program that interprets the bytes as an MP3-encoded audio
stream”.
“Foo file” means “file whose contents can be experienced by using
a program that interprets the bytes as a Foo encoding”.
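The idea that a file is just bytes, and that its "type" depends on how a program interprets them, can be seen directly in Python. A minimal sketch: the sample strings are made up for illustration; the only outside fact used is that JPEG files begin with the bytes FF D8 FF.

```python
# Text becomes bytes: each character is stored as its ASCII code.
data = "cs121".encode("ascii")
codes = list(data)          # the byte values of the five characters
print(codes)                # prints: [99, 115, 49, 50, 49]

# Going the other way, a text viewer interprets each byte as an ASCII code.
print(bytes([72, 105, 33]).decode("ascii"))   # prints: Hi!

# Bytes above 127 are not ASCII codes, so they cannot be viewed as text.
# FF D8 FF is how JPEG files begin.
jpeg_start = bytes([0xFF, 0xD8, 0xFF])
try:
    jpeg_start.decode("ascii")
    viewable_as_text = True
except UnicodeDecodeError:
    viewable_as_text = False
print(viewable_as_text)     # prints: False
```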
XML (eXtensible Markup Language) is an attempt to
compromise between binary and ASCII: store structured
data, but keep all of it human-readable text.
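To see what "human-readable structured data" looks like in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element and attribute names (grades, student, name, score) and the values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document: structured data stored as plain text
doc = """<grades>
  <student name="Ada" score="95"/>
  <student name="Alan" score="88"/>
</grades>"""

root = ET.fromstring(doc)   # parse the text into a tree of elements
# Each <student> element carries its data as attributes
scores = {s.get("name"): float(s.get("score")) for s in root.findall("student")}
print(scores)   # prints: {'Ada': 95.0, 'Alan': 88.0}
```

The same text can be opened in Notepad and read by a person, which is exactly the compromise XML aims for.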
5.1 Filenames
General format: name . extension
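For historical reasons (MS-DOS allowed at most three
characters), the extension is usually three characters,
although longer ones such as .html and .xlsx are common today.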
The extension tells the OS which program to use to open
the file (MS Word, Excel, ImageView, ...)
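Python's os.path.splitext splits a filename at its last dot into name and extension. A minimal sketch using filenames from these notes plus one made-up archive name; the small extension-to-program table is a hypothetical stand-in for the association table the OS actually keeps.

```python
import os.path

# Split filenames into (name, extension); only the LAST dot counts
for fname in ["sort.m", "hamlet.doc", "OMFG.jpg", "archive.tar.gz"]:
    name, ext = os.path.splitext(fname)
    print(name, ext)          # e.g. "archive.tar .gz" for the last one

# Hypothetical stand-in for the OS's extension-to-program table
programs = {".doc": "MS Word", ".xlsx": "Excel", ".jpg": "ImageView"}
ext = os.path.splitext("OMFG.jpg")[1]
print(programs.get(ext, "unknown"))   # prints: ImageView
```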
Aside: File Deletion
Q.: What happens when you “delete” a file?
[Figure: the directory lists sort.m, foo.m, OMFG.jpg, and hamlet.doc; the disk holds their contents: 011010, 110101, 000100, 111011]
(Drag OMFG.jpg to trash and empty trash…)
A.: What appears to happen...
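[Figure: the directory now lists only foo.m, sort.m, and hamlet.doc, and the disk shows only 011010, 110101, 111011; OMFG.jpg and its bytes (000100) seem to be gone]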
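A.: What actually happens: only the directory entry for OMFG.jpg is
removed. Its bytes typically stay on the disk until new data
overwrites them.
[Figure: the directory lists foo.m, sort.m, and hamlet.doc, but the disk still holds all four blocks, including 000100]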
Then an undelete tool (e.g., WinUnDelete) can get back
OMFG.jpg, as long as its bytes have not been overwritten.
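The deletion story can be mimicked with a toy model: a list standing in for disk blocks and a dictionary standing in for the directory. This is a hypothetical, greatly simplified sketch; real file systems and tools like WinUnDelete are far more complex, and assigning the figure's bit patterns to files in order is an assumption.

```python
# Toy model (hypothetical): a "disk" of byte blocks, and a directory
# that maps each filename to the block holding its contents.
disk = ["011010", "110101", "000100", "111011"]
directory = {"sort.m": 0, "foo.m": 1, "OMFG.jpg": 2, "hamlet.doc": 3}

# "Deleting" a file removes only its directory entry; the block is untouched.
del directory["OMFG.jpg"]
print("OMFG.jpg" in directory)   # prints: False
print(disk[2])                   # prints: 000100 (the bytes are still there)

# An undelete tool looks for blocks that no directory entry points to...
used = set(directory.values())
orphans = [i for i in range(len(disk)) if i not in used]
print(orphans)                   # prints: [2]

# ...and, if the block has not been reused, gives it a name again.
directory["OMFG.jpg"] = orphans[0]
print(disk[directory["OMFG.jpg"]])   # prints: 000100
```

If some new file had been written into block 2 before the undelete, the recovered "OMFG.jpg" would contain the new bytes instead, which is why recovery only works until the space is reused.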
Directory Structure
Directories (folders) are organized hierarchically
(one inside another)
So we are forced to choose a single organization
method (like a library whose card catalog is indexed
only by author)
But we can use links (shortcuts) to add additional
organization, without copying files.
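On Linux and Mac OS X, the closest analogue to a shortcut is a symbolic link, which Python's os.symlink creates. A minimal sketch, assuming a Unix-like system (on Windows, creating symbolic links may require administrator rights or Developer Mode, and Windows .lnk shortcuts are a different mechanism). The directory and file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Build a small hierarchy: top/cs121/myproj/notes.txt
    proj = os.path.join(top, "cs121", "myproj")
    os.makedirs(proj)
    target = os.path.join(proj, "notes.txt")
    with open(target, "w") as f:
        f.write("hello\n")

    # Add a link at the top level that points at the file deep inside
    link = os.path.join(top, "notes-link.txt")
    os.symlink(target, link)

    is_link = os.path.islink(link)   # True: it is a link, not a copy
    with open(link) as f:            # opening the link opens the target
        content = f.read()

print(is_link, repr(content))        # prints: True 'hello\n'
```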
Pathnames
A pathname is the “full name” of a directory, written in
linear form
– e.g., C:\MyDocuments\cs121\myproj\new\
A complete filename includes the path,
e.g.,
C:\MyDocuments\cs121\myproj\new\myprog.py
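Python's pathlib can take a complete filename apart. A minimal sketch using PureWindowsPath, which parses Windows-style paths on any operating system without touching the disk. Note the r"..." (raw string) prefix: without it, the \n in \new would be read as a newline character.

```python
from pathlib import PureWindowsPath

# r"..." keeps backslashes literal (otherwise "\n" in "\new" is a newline)
p = PureWindowsPath(r"C:\MyDocuments\cs121\myproj\new\myprog.py")

print(p.drive)    # prints: C:
print(p.parent)   # prints: C:\MyDocuments\cs121\myproj\new
print(p.name)     # prints: myprog.py
print(p.suffix)   # prints: .py
print(p.parts)    # the drive, each directory, and the filename, in order
```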
This becomes important because of the ...
Working Directory
>>> import os
>>> print(os.getcwd())
/home/levys/csci121
Apart from installed packages (numpy, matplotlib) and the standard
library, we can only access / import code that resides in our working
directory
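The working directory also decides where a bare filename like "data.txt" points. A minimal sketch that changes the working directory with os.chdir to a throwaway temporary directory, creates a file by its bare name, and then changes back; the file name data.txt is hypothetical.

```python
import os
import tempfile

start = os.getcwd()                      # remember where we began
with tempfile.TemporaryDirectory() as d:
    os.chdir(d)                          # change the working directory
    print(os.getcwd())                   # now the temporary directory
    # A relative filename is resolved against the working directory
    with open("data.txt", "w") as f:
        f.write("42\n")
    found = os.path.exists(os.path.join(d, "data.txt"))
    os.chdir(start)                      # go back before d is deleted
print(found)                             # prints: True
```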
IDLE makes it easy to start in the right working directory
– Linux: Launch a terminal from that directory (folder), then run
IDLE
– Windows: Launch IDLE, do File/Open, find your code, and hit
F5 (also works in Linux and Mac OS X)
Occasionally, you may need to adjust the PYTHONPATH environment
variable so Python can find packages stored in other directories
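When you import a module, Python searches the directories listed in sys.path; directories named in PYTHONPATH are added to that list at startup. As a sketch of the same idea from inside a session, the code below writes a tiny module, here called mytools (a hypothetical name), into a temporary directory and adds that directory to sys.path at run time.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as libdir:
    # Write a one-function module into a directory Python doesn't know about
    with open(os.path.join(libdir, "mytools.py"), "w") as f:
        f.write("def double(x):\n    return 2 * x\n")

    sys.path.insert(0, libdir)     # similar in effect to listing libdir in PYTHONPATH
    importlib.invalidate_caches()  # make sure the import system sees the new file
    import mytools
    result = mytools.double(21)
    sys.path.remove(libdir)        # undo the change

print(result)   # prints: 42
```

Changing sys.path this way lasts only for the current session, whereas PYTHONPATH applies every time Python starts.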
5.2 Importing and Exporting Data
Many kinds of data files, with many Python
packages to work with them
We'll focus here on two popular types of ASCII
file:
– Numerical spreadsheets in CSV (Comma-Separated Values) format exported from Excel
– Non-numerical text files from repositories
(e.g., Project Gutenberg)
Spreadsheet data should have all cells filled (“flat
format”), with numerical values:
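[Figure: two example spreadsheets, one labeled YES (flat format, every cell filled with a number) and one labeled NO]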
Importing CSV files with NumPy's loadtxt
If the file has no column headers (names):
>>> from numpy import loadtxt
>>> a = loadtxt("sunspots-no-header.csv", delimiter=",")
If the file does have column headers:
>>> a = loadtxt("sunspots-with-header.csv", delimiter=",",
...             skiprows=1)
Open the file in Excel (or a text editor) to see how many rows to skip
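A round trip makes the header/skiprows relationship concrete. A minimal sketch: it uses NumPy's savetxt (the "exporting" half of this section) to write a small CSV file with one header row, then reads it back with loadtxt. The numbers and the file name are made up for illustration.

```python
import os
import tempfile
import numpy as np

# Made-up data: two columns (year, count), three rows
data = np.array([[2000, 10.0],
                 [2001, 12.5],
                 [2002, 9.0]])

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example-with-header.csv")   # hypothetical file name
    # header= writes one line of column names; comments="" stops NumPy
    # from prefixing that line with "# "
    np.savetxt(path, data, delimiter=",", fmt="%g",
               header="year,count", comments="")
    with open(path) as f:
        print(f.read())        # the header line, then one line per row
    # One header row, so skip one row when reading
    a = np.loadtxt(path, delimiter=",", skiprows=1)

print(a.shape)   # prints: (3, 2)
```

Forgetting skiprows=1 here would make loadtxt try to convert the text "year" to a number and fail.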
Non-numerical ASCII Files
Almost always, we want a list of individual words in
the file (not characters)
Remember: open / read / split
>>> f = open("hamlet.txt")
>>> s = f.read()
>>> w = s.split()
>>> f.close()
>>> # or, equivalently, all in one line:
>>> w = open("hamlet.txt").read().split()
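A slightly more careful version of open / read / split uses a with block, which closes the file as soon as the block ends. So that it runs anywhere, the sketch below first writes a line from Hamlet to a temporary file standing in for hamlet.txt (the file name sample.txt is hypothetical). Notice that split() keeps punctuation attached to the words.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sample.txt")       # stand-in for hamlet.txt
    with open(path, "w") as f:
        f.write("To be, or not to be:\nthat is the question.\n")

    # open / read / split, with the file closed automatically afterward
    with open(path) as f:
        w = f.read().split()

print(w)
# ['To', 'be,', 'or', 'not', 'to', 'be:', 'that', 'is', 'the', 'question.']
```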
As with loadtxt's delimiter, we can pass a separator to split,
but it splits only on that one exact string; removing punctuation
from the words takes an extra step.
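Because split takes a single separator string, passing "," splits on commas but leaves every other mark in place. A common alternative, sketched below with standard-library calls only, is to split on whitespace and then strip the characters in string.punctuation from both ends of each word.

```python
import string

text = "To be, or not to be: that is the question."

# A separator argument splits only on that exact string
print(text.split(","))
# ['To be', ' or not to be: that is the question.']

# Split on whitespace, then strip punctuation from the ends of each word
words = [w.strip(string.punctuation) for w in text.split()]
print(words)
# ['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
```

Since strip only touches the ends of a word, an internal apostrophe (as in "don't") survives, which is usually what you want when counting words.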
5.3 Scripts
A script is a file, with extension .py, that allows
you to execute a sequence of Python commands
as if you were typing them in by hand.
From IDLE: File / Open … (find the script), then
hit the F5 key to run it
Development cycle: Run, fix errors, run again, fix
more errors, … $#@! … victory dance!!!
IDLE will tell you where the errors are (line #)
Scripts + Comments = Reproducible Results!
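A minimal sketch of a commented script; the file name average.py and the input values are hypothetical. Saved as a .py file and run with F5, it executes top to bottom exactly as if each line were typed at the >>> prompt, and the comments record what each step does so the result can be reproduced later.

```python
# average.py (hypothetical example script)
# Purpose: compute the mean of a short list of measurements.

values = [2.0, 4.0, 6.0, 8.0]   # made-up input data
total = sum(values)              # add up all the values
mean = total / len(values)       # divide by how many there are
print("Mean:", mean)             # prints: Mean: 5.0
```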