Transcript Chapter 5

Computer Science 121
Scientific Computing
Winter 2016
Chapter 5
Files and Scripts


File (non-technical): (Word) document, image,
recording, video, etc.
File (technical): a named sequence of bytes on
disk.

ASCII vs. Binary


“ASCII file” means “file that can be viewed as text by a
program (Notepad) that interprets each byte as an ASCII
code”.
A binary file is anything that cannot be viewed that way.




“JPEG file” means “file that can be viewed as an image by using a
program (Photoshop) that interprets the bytes as a JPEG-encoded
image”.
“MP3 file” means “file that can be heard as an audio
recording by using a program that interprets the bytes as an MP3-encoded audio stream”.
“Foo File” means “file whose contents can be experienced by using
a program that interprets the bytes as a Foo encoding”.
XML (eXtensible Markup Language) is an attempt to
compromise between binary and ASCII: make all data
human-readable
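
The ASCII vs. binary distinction can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the byte values are chosen arbitrarily, and the variable names are made up for the example.

```python
# The same idea in code: a file is just bytes; "ASCII" is one way to read them.
data = bytes([72, 101, 108, 108, 111])   # five bytes, as they might sit on disk

# Interpreting each byte as an ASCII code gives readable text.
text = data.decode("ascii")
print(text)

# Byte values above 127 are not ASCII codes, so this "file" is binary.
blob = bytes([255, 216, 255, 224])       # arbitrary values above 127
try:
    blob.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False
print(is_ascii)
```

The first print shows the word Hello; the second shows False, because the bytes in blob cannot be interpreted as ASCII text.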
5.1 Filenames



General format: name . extension
For historical reasons, extension is usually three
characters.
Extension tells OS what program to use to open
file (MS Word, Excel, ImageView, ...)
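
As a small sketch of the name . extension format, Python's standard os.path.splitext splits a filename at its last dot. The filenames below are the ones used elsewhere in these slides; the variable names are illustrative only.

```python
import os.path

filenames = ["sort.m", "OMFG.jpg", "hamlet.doc"]

# splitext returns a (name, extension) pair; the extension keeps its dot.
parts = [os.path.splitext(f) for f in filenames]
for name, extension in parts:
    print(name, extension)
```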
Aside: File Deletion

Q.: What happens when you “delete” a file?
Directory: sort.m, foo.m, OMFG.jpg, hamlet.doc
Disk: 011010 110101 000100 111011
(Drag OMFG.jpg to trash and empty trash…)

A.: What appears to happen...
Directory: foo.m, sort.m, hamlet.doc
Disk: 011010 110101 111011

A.: What actually happens ...
Directory: foo.m, sort.m, hamlet.doc
Disk: 011010 110101 000100 111011
Then use WinUnDelete (e.g.) to get back
OMFG.jpg
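
The deletion story can be mimicked with a toy model. This is only a sketch: real file systems are far more elaborate, and the pairing of each filename with a particular block below is an assumption made for illustration.

```python
# Toy model: a "disk" of data blocks and a "directory" of names.
disk = ["011010", "110101", "000100", "111011"]

# name -> index of its block on the toy disk (pairing assumed for illustration)
directory = {"sort.m": 0, "foo.m": 1, "OMFG.jpg": 2, "hamlet.doc": 3}

# "Deleting" the file removes only its directory entry.
del directory["OMFG.jpg"]

print(sorted(directory))   # the name is gone from the listing
print(disk[2])             # but the bytes are still on the disk
```

An undelete tool works because the bytes stay in place until something else overwrites them.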
Directory Structure



Directories (folders) are organized hierarchically
(one inside another)
So we are forced to choose a single organization
method (like library with card catalog indexed
only by author)
But we can use links (shortcuts) to add additional
organization, without copying files.
Pathnames

Pathname is “full name” of directory in a linear
form
– e.g., C:\MyDocuments\cs121\myproj\new\

Complete filename includes path
– e.g., C:\MyDocuments\cs121\myproj\new\myprog.py
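
A complete filename can be taken apart into its path and its name with the standard pathlib module. The sketch below uses PureWindowsPath so that the Windows-style example above behaves the same on any operating system; the variable name full is illustrative.

```python
from pathlib import PureWindowsPath

full = PureWindowsPath(r"C:\MyDocuments\cs121\myproj\new\myprog.py")

print(full.parent)   # the directory part (the pathname)
print(full.name)     # the filename within that directory
print(full.suffix)   # the extension, including the dot
```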

This becomes important because of the ...
Working Directory
>>> import os
>>> print(os.getcwd())
/home/levys/csci121


Without installed packages (numpy, matplotlib), we could only
access / import code that resides in our working directory
IDLE makes it easy to start in the right working directory
– Linux: Launch a terminal from that directory (folder), then run
IDLE
– Windows: Launch IDLE, do File/Open, find your code, and hit
F5 (also works in Linux and Mac OS X)

Occasionally, you may need to adjust the PYTHONPATH environment
variable to get access to Python packages
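
To see where Python looks for code to import, one can inspect sys.path, which at startup includes any directories named in PYTHONPATH. The sketch below also adds a directory for the current session only; the folder name my_modules is hypothetical.

```python
import os
import sys

# The working directory: where relative filenames are looked up.
cwd = os.getcwd()
print(cwd)

# The directories searched by import, in order.
for entry in sys.path:
    print(entry)

# Hypothetical folder of extra modules, added for this session only.
extra = os.path.join(cwd, "my_modules")
if extra not in sys.path:
    sys.path.append(extra)
```

Appending to sys.path lasts only until Python exits; setting PYTHONPATH makes the change apply to every session.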
5.2 Importing and Exporting Data


Many kinds of data files, with many Python
packages to work with them
We'll focus here on two popular types of ASCII
file:
– Numerical spreadsheets in CSV (Comma-Separated Values) format exported from Excel
– Non-numerical text files from repositories
(e.g., Project Gutenberg)
Spreadsheet data should have all cells filled (“flat
format”), with numerical values:
[Figure: two example spreadsheets, one labeled YES and one labeled NO]
Importing CSV files with NumPy's loadtxt

If file has no column headers (names):
>>> from numpy import loadtxt
>>> a = loadtxt("sunspots-no-header.csv", delimiter=",")

If file does have column headers:
>>> a = loadtxt("sunspots-with-header.csv", delimiter=",", skiprows=1)

Open file in Excel to see how many rows to skip
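
The effect of skiprows can be tried without any file on disk, since loadtxt also accepts a file-like object. The sketch below is self-contained; the column names and numbers are made up for illustration and are not real sunspot data.

```python
import io
import numpy as np

# A tiny CSV with one header row (values invented for this example).
csv_text = "year,count\n2001,10.5\n2002,12.0\n2003,9.25\n"

# skiprows=1 skips the header so that only numbers are parsed.
a = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

print(a.shape)    # rows, columns
print(a[:, 1])    # the second column
```

Without skiprows=1, loadtxt would try to convert the header text to numbers and raise an error.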
Non-numerical ASCII Files

Almost always, we want a list of individual words in
the file (not characters)

Remember: open / read / split
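>>> f = open("hamlet.txt")    # open the file
>>> s = f.read()              # read its contents into one string
>>> w = s.split()             # split on whitespace into a list of words
Or, all in one line:
>>> w = open("hamlet.txt").read().split()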

As with loadtxt, we can pass a delimiter to
split; splitting on a punctuation character removes that character from the resulting words.
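
Because split takes only one delimiter at a time, a common alternative for cleaning up words is to split on whitespace and then strip punctuation from each word. The sketch below uses that swapped-in approach on a single line from Hamlet rather than the whole file; it is one possible way, not the only one.

```python
import string

s = "To be, or not to be, that is the question:"

# Split on whitespace, then strip punctuation from both ends of each word.
w = [word.strip(string.punctuation).lower() for word in s.split()]
print(w)
```

Here "be," becomes "be" and "question:" becomes "question". Punctuation inside a word, such as an apostrophe, is left alone.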
5.3 Scripts




A script is a file, with extension .py, that allows
you to execute a sequence of Python commands
as if you were typing them in by hand.
From IDLE: File / Open … (find the script), then
hit F5 key to run
Development cycle: Run, fix errors, run again, fix
more errors, … $#@! … victory dance!!!
IDLE will tell you where the errors are (line #)
Scripts + Comments = Reproducible Results!
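
A commented script might look like the sketch below. The filename wordcount.py and the sample text are hypothetical; the point is that the comments record what each step does, so the result can be reproduced later.

```python
# wordcount.py (hypothetical name)
# Purpose: count how often each word occurs in a short text.
# Run from IDLE with F5.

text = "the quick brown fox jumps over the lazy dog the end"

# Step 1: split the text into a list of words.
words = text.split()

# Step 2: tally how many times each word appears.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Step 3: report the results.
print(len(words), "words,", len(counts), "distinct")
print("the:", counts["the"])
```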