Transcript Lecture 1

GTECH 731 Programming for Geographic Applications
Tuesdays 5:35 p.m. - 9:15 p.m. Room 1090B-HN
Professor Sean Ahearn
[email protected]
212-772-5327
1023 Hunter North CARSI Lab
Teaching Assistant
Tony Ierulli
[email protected]
914-471-1526
Texts
Required: Learning C# 3.0 (Paperback) by Jesse Liberty and Brian MacDonald.
Optional: Java Programming for Spatial Sciences by Jo Wood. Java is very similar to C#, but not
similar enough for us to use this as the primary text. However, it may be useful to read in parallel
with the Liberty text.
Other readings may be given out in the form of handouts.
Attendance and Exercises
Assignments: There will be short assignments almost weekly. It is very important to stay up-to-date
with these, because each assignment will build directly on the last one. These will account for most
of the grade.
Absences: Especially in the first half of the semester, any missed material will be problematic since
each topic depends on the preceding topics.
Plagiarism: It is important to do your own work and work through the problems yourself.
Lab policies: Always delete your working files from your local machine and keep all your files on the
network drives. Don’t install any software, and otherwise abide by the lab policies:
http://www.geography.hunter.cuny.edu/~tbw/spars/rules.html
1. Programming Background
• Computer basics
• Programs
• Languages
• Role of the operating system
• C# program elements
Computer Basics
The basic concept of a computer was first envisioned by Turing in 1936, when he described an
abstract model of the modern computer:
Details
From Wood, 2002.
Computer Basics
Turing then devised the “Universal Turing Machine”, a related thought-experiment
where the machine can run any other defined Turing machine.
This corresponds most closely to an actual computer, where any algorithm can be
run on a single machine.
For an interesting and more in-depth discussion of this topic, see:
Martin Davis. Engines of Logic: Mathematicians and the Origin of
the Computer. Chapter 7, "Turing Conceives of the All-Purpose
Computer." Norton, 2001. ISBN 0393322297.
Computer Basics
Rather than using arbitrary symbols, computers represent everything as zeros and ones (bits), usually
grouped into units of 8 bits (a byte).
Most PCs now have a 32- or 64-bit architecture, which means that data is most often treated in units of four
or eight bytes, for example:
0001 1000
1010 0111
0111 1111
0000 0000
What exactly these numbers mean depends on the context. They can represent:
An instruction in a program:
ADD two numbers
MOVE this information from this location in memory to another
An integer:
123,456
-12
A floating-point number:
1.1234
123,456.789
A location in the computer’s memory:
The place where the text of the constitution is stored
Letters:
ABCD
Etc., etc.
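The point that the same bits mean different things in different contexts can be sketched in C#. This is only an illustration: the class name BitsInContext and the helper ToBinary are invented for this sketch, and the integer and floating-point values it prints depend on the byte order of the machine it runs on.

```csharp
using System;

public static class BitsInContext
{
    // The four bytes shown above, written in hexadecimal:
    // 0001 1000 = 0x18, 1010 0111 = 0xA7, 0111 1111 = 0x7F, 0000 0000 = 0x00.
    public static readonly byte[] Bytes = { 0x18, 0xA7, 0x7F, 0x00 };

    // Render one byte as eight binary digits, e.g. 0x18 -> "00011000".
    public static string ToBinary(byte b)
    {
        return Convert.ToString(b, 2).PadLeft(8, '0');
    }

    public static void Main()
    {
        // The raw bits, one byte per line.
        foreach (byte b in Bytes)
        {
            Console.WriteLine(ToBinary(b));
        }

        // The same four bytes read as a 32-bit integer, then as a 32-bit
        // floating-point number. The values depend on the machine's byte order.
        Console.WriteLine(BitConverter.ToInt32(Bytes, 0));
        Console.WriteLine(BitConverter.ToSingle(Bytes, 0));

        // A single byte read as a letter: 0100 0001 (0x41) is the code for 'A'.
        Console.WriteLine((char)0x41);
    }
}
```

Nothing in the bytes themselves says which reading is correct; the program decides by choosing how to interpret them.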
Computer Basics
System diagram
Program memory is the memory that stores the program itself, which is normally loaded
from disk. Data memory is where the information manipulated by the program is
stored.
The processor treats these two kinds of information differently. Programs are a sequence of
instructions executed by the processor; data is information altered and stored by the
program.
Diagram from Hordeski, 1990.
Computer Basics
Processor diagram
The processor contains registers which
contain the information it is currently operating
on, the current program location, and other
critical information.
Registers are located in the heart of the CPU
(central processing unit) and represent the
fastest-working part of the system.
Information is then transferred from a cache,
or short-term memory on the main chip, which
in turn transfers information to and from main
memory (RAM).
Information in RAM may then be passed on to
disk (e.g., in file/save), or a network, printer,
etc.
Diagram from Hordeski, 1990.
More information on registers...
Computer Basics
Memory hierarchy
This results in a hierarchy of storage areas:
Fastest
Registers
Cache
Memory (nanoseconds)
Disk (milliseconds)
Network drives (e.g., file servers) (seconds)
Slowest
Registers and Cache are not usually managed by the programmer directly.
Programs
A program is a series of instructions that operate on data. The central processing unit reads instructions
in sequence from memory, and executes them one by one, in a kind of loop:
Diagram from http://en.wikipedia.org/wiki/Image:CPU_block_diagram.svg
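The loop described above can be imitated in a few lines of C#. This is a minimal sketch of an invented machine, not a model of any real processor: the opcodes Halt, Load, and Add, the class name ToyCpu, and the layout of the program as (opcode, operand) pairs are all assumptions made for illustration.

```csharp
using System;

public static class ToyCpu
{
    // Hypothetical opcodes for an invented machine; not a real instruction set.
    public const int Halt = 0;  // stop and report the accumulator
    public const int Load = 1;  // accumulator = operand
    public const int Add = 2;   // accumulator = accumulator + operand

    // Runs a program stored in memory as (opcode, operand) pairs
    // and returns the final value of the accumulator.
    public static int Run(int[] memory)
    {
        int programCounter = 0; // location of the next instruction
        int accumulator = 0;    // the single working register

        while (true)
        {
            // Fetch the instruction and its operand from memory.
            int opcode = memory[programCounter];
            int operand = memory[programCounter + 1];
            programCounter += 2;

            // Decode and execute it.
            switch (opcode)
            {
                case Load:
                    accumulator = operand;
                    break;
                case Add:
                    accumulator += operand;
                    break;
                case Halt:
                    return accumulator;
                default:
                    throw new InvalidOperationException("Unknown opcode " + opcode);
            }
        }
    }

    public static void Main()
    {
        int[] program = { Load, 2, Add, 3, Halt, 0 };
        Console.WriteLine(Run(program)); // prints 5
    }
}
```

Note that the program is just an array of numbers; only the loop that reads it treats some numbers as instructions and others as data.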
Programs
Each instruction, or group of bits understood as a command by the processor, is loaded from memory by
the processor, and results in a particular action being taken.
Rather than using the long binary number or machine code, a programmer can represent the instruction
with a mnemonic.
Example from http://www.compilers.net/paedia/assembly_language/index.htm.
Each kind of processor has its own Instruction Set, which means that these instructions are different for
different chip makers. This is a large part of why, for example, code written for Motorola and Intel chips had
such a hard time cooperating.
Programs
These mnemonics are the basis of assembly language. In assembly language, you have to explicitly deal
with low-level details like registers and locations in memory, which allows you to write very efficient code.
However, it is extremely time-consuming and impractical for most applications.
Usually programming is done in a higher-level language, which is automatically
translated into machine code. In a high-level
language, very few words represent many
lines of assembly code.
Example from http://www.pcmag.com/encyclopedia_term/0,2542,t=compiler&i=40105,00.asp
Languages
These high-level languages are all basically variations on replacement algorithms, or grammars, wherein
rules are implemented which govern what system of replacements generates the final program.
There are many ways a language syntax can be described. The following is a sample of the kind of
grammar diagram that can succinctly describe a statement. Each valid language statement can be replaced
either by other language statements, or by entities that can be ultimately distilled to machine code.
Language grammars are a very large topic, but not really necessary when actually programming.
See the relevant Wikipedia entry for more details.
Languages
Lineages
Languages follow lineages, where each language shares characteristics of its predecessors. In
this course we will be using C#, which is closely related to Java and descends from C and C++.
C is a very low-level, systems-oriented procedural language that made it easy for
programmers to write code as economical as assembly language. It also made it easy to
make mistakes and write buggy software.
C++ added some more advanced features, making it possible to write more high-level code,
but it left all the original problems of C in place. In some ways this made things worse by
making them more complicated.
Java rectified most of these problems, but the Java system is geared toward cross-platform
development, and is awkward when using system-specific features (like Windows user
interfaces).
C# has the benefits of Java, but is closely linked to Microsoft’s .NET framework, which allows
you to write fully functional Windows programs that use all of the features of the OS.
Languages
Lineages
For a more up-to-date diagram, go to http://www.levenez.com/lang/history.html#05
Languages
High-level versus low-level
There are a variety of ways of classifying languages. High-level and low-level is sometimes a useful
distinction, although it can be misleading because you can, for example, write high-level functions using a
low-level language, and some high-level languages fully support low-level functions.
From Wood, 2002.
Languages
Compiled versus interpreted
In interpreted languages, the code is not compiled to machine code, but, when the
program is run, the instructions are translated into system commands by a separate
program (the interpreter) on the fly.
Most scripting languages, like JavaScript, are interpreted. They are usually slower
than compiled languages, but not always. The speed of a program depends on
many factors, and whether it is interpreted may or may not be determinative.
Diagram from http://web.cs.wpi.edu/~gpollice/cs544-f05/CourseNotes/maps/Class1/Compilervs.Interpreter.html
Languages
Compiled versus interpreted
Languages can be compiled or interpreted. In compiled languages, the code is
compiled to machine code, and the operating system manages running and
terminating the program:
Diagram from http://web.cs.wpi.edu/~gpollice/cs544-f05/CourseNotes/maps/Class1/Compilervs.Interpreter.html
Languages
Hybrid languages
Java, C#, and Visual Basic are examples of
hybrid languages. In these cases the compiler
generates an intermediary code which depends
on a separate software infrastructure to run.
This allows for more flexibility because the
intermediate code is not resolved to machine
code, so is more independent of the particular
platform it runs on. At the same time, it can be
highly optimized because it is compiled.
In C# and .NET, the Intermediate Code is MSIL,
or Microsoft Intermediate Language.
The Interpreter is the Just-In-Time Compiler, or
JIT, which creates executable code from the
MSIL on the fly.
Diagram from http://www.codeproject.com/KB/dotnet/clr.aspx?df=100&forumid=3272&exp=0&select=412238.
Role of the Operating System
Running a program
Once a program is compiled, it is run by the
operating system. The operating system is
responsible for allocating memory for the
program, loading its first instructions into
the processor, managing the process as it
runs, and cleaning up after it terminates.
It also provides the interfaces by which it
communicates with devices which would
otherwise require more specialized code.
Role of the Operating System
In modern operating systems, the OS is responsible for many functions that would otherwise require
programmers to rewrite basic operations like drawing a letter on the screen.
One key role of the operating system is to launch and manage programs, which, when running, become
processes that the operating system juggles.
Other functions include interacting with the user (managing the mouse, keyboard, and display), managing
communication with peripheral devices (disk drives, networks, etc.), displaying graphics, managing
windows, and many other functions that once required specialized programs.
That means most of what most programs do is interact with the operating system.
The Microsoft .NET architecture provides a convenient way of accessing OS and network resources. Since
we will be working in that environment, a large part of the code we write will involve interacting with .NET.
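As a rough illustration of what interacting with the operating system through .NET looks like, the sketch below asks the OS for some basic information and has it write, read, and delete a small file. The class name OsServices, the helper RoundTrip, and the file name gtech731_demo.txt are invented for this example; the Environment, Path, and File classes are part of the .NET Framework.

```csharp
using System;
using System.IO;

public static class OsServices
{
    // Writes text to a temporary file, reads it back, and deletes the file.
    // Every step is carried out by the operating system on the program's behalf.
    public static string RoundTrip(string text)
    {
        // Hypothetical file name, placed in the system's temporary folder.
        string path = Path.Combine(Path.GetTempPath(), "gtech731_demo.txt");
        File.WriteAllText(path, text);
        string result = File.ReadAllText(path);
        File.Delete(path);
        return result;
    }

    public static void Main()
    {
        // Information that only the operating system can supply.
        Console.WriteLine(Environment.OSVersion);
        Console.WriteLine(Environment.CurrentDirectory);

        Console.WriteLine(RoundTrip("Written and read back through the OS"));
    }
}
```

The program never touches the disk hardware directly; it only makes requests, and the operating system does the work.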
Sample C# program
Here is a very simple C# program. It consists of components that we will get into in more detail in later
classes. You will see all of these elements in nearly every C# program.
using System;
namespace HelloNameSpace
{
    public class HelloWorld
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
        }
    }
}
Sample C# program
The “USING” statement defines what part of the .NET Framework (or other external components) will be incorporated
into this program.
using System;
“NAMESPACE” says that any names (more on this later) created here are part of the given unit, not another (e.g.,
System).
namespace HelloNameSpace
Curly braces define the beginning and end of any block of code. A block of code means different things in different
contexts. Here, the HelloNameSpace consists of anything within the block. For clarity, blocks typically share the
same level of indentation.
{
Sample C# program
PUBLIC CLASS HELLOWORLD says this code unit, or class (much more on classes later), that we are calling
HelloWorld, is available to any external code to use.
public class HelloWorld
{
STATIC VOID MAIN. More on static and void later, but MAIN is a special function name that declares this as the
starting point of the program. STRING[] ARGS is required for the starting point of the program, and contains any
command-line parameters (for example, if we typed in HelloWorld “Banana” at the command line, args would
contain “Banana”).
static void Main(string[] args)
{
SYSTEM.CONSOLE.WRITELINE(“HELLO WORLD!”); is telling the Console object within the System namespace to
write the line “Hello World!”.
System.Console.WriteLine("Hello World!");
}
}
}
Main() and WriteLine() are functions, or ways of invoking code that take some action. HelloWorld and
System.Console are objects, or units of code that contain functions.
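To see what args holds in practice, the program can be varied slightly so that it greets whatever is typed on the command line. This is a sketch only: the class name HelloArgs and the function Greeting are invented for illustration and are not part of the sample program above.

```csharp
using System;

namespace HelloNameSpace
{
    public class HelloArgs
    {
        // Builds the greeting from the first command-line parameter,
        // falling back to "World" when none was given.
        public static string Greeting(string[] args)
        {
            string name = args.Length > 0 ? args[0] : "World";
            return "Hello " + name + "!";
        }

        static void Main(string[] args)
        {
            Console.WriteLine(Greeting(args));
        }
    }
}
```

If the compiled program were named HelloArgs, typing HelloArgs Banana at the command line would write "Hello Banana!", and typing HelloArgs alone would write "Hello World!".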