Bioinformatics Programming - National Cheng Kung University


Bioinformatics Programming
EE, NCKU
Tien-Hao Chang (Darby Chang)
1
Background
Preparation for this class
2
We talk about
Terminology
3
http://farm3.static.flickr.com/2109/2178878189_56a2d16d39.jpg
Synchronization
4
Linux
Difference from UNIX
5
UNIX



To put it very generically, Linux is an operating
system kernel, and UNIX is a certification for
operating systems.
The UNIX standard evolved from the original Unix
system developed at Bell Labs (1969). After Unix
System V, it ceased to be developed as a single
operating system, and was instead developed by
various competing companies into products such as
Solaris (from Sun Microsystems), AIX (from IBM), HP-UX
(from Hewlett-Packard), and IRIX (from Silicon Graphics).
UNIX is a specification for baseline interoperability
between these systems, even though there are many
major architectural differences between them.
6


Linux was born out of the desire to
create a free software alternative to the
commercial UNIX environments. Its
history dates back to 1991, or further
back to 1983, when the GNU project,
whose original aim was to provide a
free alternative to UNIX, was
introduced.
Linux has never been certified as being
a version of UNIX, so it is described as
being “Unix-like.”
7
UNIX
History






1960s: Multics project (MIT, GE, AT&T)
1970s: AT&T Bell Labs
1970s/80s: UC Berkeley
1980s: DOS imitated many Unix ideas; commercial Unix fragmentation; GNU Project
1990s: Linux
now: Unix is widespread and available from many sources, both free and commercial
8
http://upload.wikimedia.org/wikipedia/commons/5/51/Unix_history.svg
9
UNIX
Flavors



Sun's Solaris, Hewlett-Packard's HP-UX, and IBM's
AIX® are all flavors of UNIX that have their own
unique elements and foundations.
Windows has two main lines. The older flavors are
referred to as "Win9x" and consist of Windows 95, 98,
98SE and Me. The newer flavors are referred to as
"NT class" and consist of Windows NT, 2000, XP,
Vista, and 7. Microsoft no longer supports Windows
NT or any of the 9x versions.
The flavors of Linux are referred to as distributions (or
"distros").
10
Linux
Distributions

All the Linux distributions released around the same
time frame use essentially the same kernel. They differ in their
– add-on software
– GUI
– install process
– price
– documentation
– technical support
All the flavors of Windows come from Microsoft; the
various distributions of Linux come from different
companies/vendors such as Linspire, Red Hat, SuSE,
Ubuntu, Xandros, Knoppix, Slackware, Lycoris, and so
on.
11
UNIX
Philosophy

Multiuser / Multitasking

Flexibility / Freedom

Everything is a file


File system has places, processes
have life
Designed by programmers for
programmers
12
UNIX
Structure
Programs
Kernel
Hardware
13
UNIX
The File System
http://www.comsci.us/fs/notes/images/unixfs.gif
14
UNIX
Programs

Shell is the command line interpreter

Shell is just another program

A program or command
– interacts with the kernel
– may be any of:
• built-in shell command
• interpreted script
• compiled object code file
15
Any Questions?
16
Vs. Windows
Which is better?
Of course, this is an open question.
17
Terminology
Operating System
18
Vs. Windows
To you, are Linux and Windows the same
thing? Or is Linux a platform only for
specific uses?
19
Terminology
Terminal
20
http://www.linuxmail.info/images/windows-xp/putty-terminal-vncserver.png
What is inside the terminal?
21
http://linux.vbird.org/linux_server/0310telnetssh/Xserver_client.png
22
http://rohansplace.com/TSWeb/Remote_desktop_connection_icon.png
Yes, Remote Desktop is a terminal, too
23
http://images.ptt.cc/connect.gif
Conceptually, it is similar to any program you use to access a BBS
24
Getting Started
25
You’re welcome to
interrupt me anytime!
26
Getting Started
Logging In

Login and password prompt to log in
– login is user’s unique name
– password is changeable; known only to
user, not to system staff

Unix is case sensitive
– issued login and password (usually in
lower case)
27
Getting Started
Passwords

Do:
– make sure nobody is looking over your
shoulder when you are entering your
password
– change your password often
– choose a password you can remember
– use at least eight characters (more on some
systems)
– use a mixture of character types – include
punctuation and other symbols
28
Getting Started
Passwords

Don’t:
– use a word (or words) in any language
– use a proper name
– use information in your wallet
– use information commonly known about you
– use control characters
– write your password anywhere
– EVER give your password to anybody
Your password is your account security:
– To change your password, use the passwd command
– Change your initial password immediately
29
Getting Started
Unix Command Line Structure

A command is a program that tells the Unix
system to do something. It has the form:
command options arguments
– “Whitespace” separates parts of the command
line
– An argument indicates on what the command
is to perform its action
– An option modifies the command, usually
starts with “-”
30
Getting Started
Getting Help



Not all Unix commands will follow the
same standards
Options and syntax for a command are
listed in the “man page” for the
command
man: On-line manual
– $ man command
– $ man -k keyword
31
Getting Started
Directory Navigation

pwd
print working directory

cd
change working directory
(“go to” directory)

mkdir make a directory

rmdir remove directory

ls
list directory contents
32
Getting Started
Permissions

Each line (when using the -l option of ls) includes the
following:
– type field (first character)
– access permissions (characters 2–10):
  • first 3: user/owner
  • second 3: assigned unix group
  • last 3: others
Permissions are designated:
– r   read permission
– w   write permission
– x   execute permission
– -   no permission
33
Getting Started
File Maintenance Commands

chmod   change the file or directory access permission
chgrp   change the group of the file
chown   change the owner of a file
rm      remove (delete) a file
cp      copy file
mv      move (or rename) file

chmod [options] file
– Using + and - with a single letter:
  • u   user owning file
  • g   those in assigned group
  • o   others
– $ chmod u+w file # gives the user (owner) write permission
– $ chmod g+r file # gives the group read permission
– $ chmod o-x file # removes execute permission for others
34

chmod [options] file
– using numeric representations for
permissions:
  • r = 4
  • w = 2
  • x = 1
– $ chmod 777 file
• gives user, group, and others r, w, x
permissions
– $ chmod 750 file
• gives the user read, write, execute
• gives group members read, execute
• gives others no permissions
35
Getting Started
Display Commands

echo
echo the text string to stdout

cat
concatenate (list)

head
display the first n lines of a file (-n option)

tail
display the last n lines of a file (-n option)

Useful in pipes
36
Any Questions?
37
Getting Started
System Resource Commands









df        report file system disk space usage
du        estimate file space usage
ps        show status of processes (options vary from
          system to system; see the man pages)
kill      terminate a process
whereis   report program locations
which     report which file would be run for a command
          (the first match along PATH)
hostname  report the name of the machine the user is
          logged into
uname     print information about the system; additional
          options report hardware and software details
date      print or set the system date and time
38
Getting Started
More Fun with Files

ln — link to another file
– symbolic link (soft link)
• $ ln -s source target
• A symbolic link is used to create a new path to
another file or directory. Useful when the target file
has versions: the link keeps a fixed name and can be
pointed at whichever version is current.
– hard link
• $ ln source target
• A hard link creates a new directory entry pointing to
the same inode as the original file. The file will not be
deleted until all the hard links to it are removed.
– The two behave very differently when the original file is
deleted: a hard link still reaches the data, while a symbolic
link is left dangling.
39




sort — sort file contents
uniq — remove duplicate lines
file — file type
tr — translate characters
– $ tr 'a-z' 'A-Z' < file

find — find files
– $ find . -name ay
– $ find . -newer empty
– $ find . -type d -print

gzip — compression
– often use .gz extension

tar — archive files
– use .tar extension
– use .tgz extension when combining gzip

wc — word count
40
Any Questions?
41
Shells
42
Shells

The shell sits between you and the
operating system
– acts as a command interpreter
– reads input
– translates commands into actions to be
taken by the system

To see what your current login shell is:
– $ echo $SHELL
43
Shells
Basic Shells

Bourne Shell (sh)
– good features for I/O control — often used for scripts
– other shells based on Bourne may be suited for
interactive users
– default prompt is $

C Shell (csh)
– uses C-like syntax for scripting
– I/O more awkward than Bourne shell
– job control
– history
– default prompt is %
– uses ~ symbol to indicate a home directory (user’s or
others’)
44
Shells
Other Shells

Based on the Bourne Shell:
– Korn (ksh)
– Bourne-Again Shell (bash)
• job control
• history
• uses ~ symbol to indicate a home directory
(user’s or others’)
– Z Shell (zsh)

Based on the C Shell:
– T-C shell (tcsh)
45
Shells
Built-in Shell Commands

The shells have a number of built-in
commands:
– executed directly by the shell
– don’t have to call another program to be
run
– different for the different shells
– cd, echo, exit, for, if, pwd, …
46
Shells
Environment Variables


Environment variables are used to provide
information to the programs you use.
Global environment variables are set by your login
shell, and new programs and shells inherit the
environment of their parent shell.
– GROUP   your login group, e.g. staff
– HOME    path to your home directory, e.g. /home/frank
– HOST    the hostname of your system, e.g. nyssa
– PATH    paths to be searched for commands, e.g.
          /usr/bin:/usr/ucb:/usr/local/bin
– SHELL   the login shell you’re using, e.g. /usr/bin/csh
– USER    your username, e.g. frank
47
Any Questions?
48
http://en.wikipedia.org/wiki/File:Tux.svg
Now, we are more familiar with this penguin
49
http://blog.sherweb.com/wp-content/uploads/linux-vs-windows.jpg
50
Linux Vs. Windows
Interface
• Kernel/GUI-Based
• Target Users
Support
Business
• Developers
• Drivers/Games/Virus
• Pirate Copy
• Open Source
Popularity
• Users
• Habits
51
Linux Vs. Windows
History


Linux was originally built by Linus Torvalds
at the University of Helsinki in 1991. Linux
is a Unix-like, kernel-based, fully memory-protected,
multitasking operating system. It
runs on a wide range of hardware, from PCs
to Macs.
The first version of Windows, Windows 1.0, was
released by Microsoft in 1985; Windows 3.1
followed in 1992. Windows is a
GUI-based operating system. It has
powerful networking capabilities, is
multitasking, and is extremely user friendly.
52
Linux Vs. Windows
Functionalities




Linux seems to be more reliable, flexible
and generous.
Ironically, even though Linux is open source, it falls
short in the number of different applications
available for it.
Windows seemed less mature (at first)
by most measures used to evaluate a good OS.
However, it proves that appearance can be
more important than anything else. Cruel but
real.
53
http://www.nudonation.com/archivos/bill-gates.jpg
Of course, this guy is probably the
most successful salesman ever
54
http://msnbcmedia2.msn.com/j/msnbc/Components/Photos/060615/060615_gatesFoundation_hmed_5p.hmedium.jpg
He has supported a great deal of biomedical research
55
http://i5.tinypic.com/4yqudc7.jpg
As time goes by
56
http://4.bp.blogspot.com/_5irnbDcN0to/SwG_4mVCUlI/AAAAAAAAAfY/YRLLzWZE_po/S740/LinuxDistributions.jpg
Linux has many partners
57
Linux Vs. Windows
Things Changed

Linux has a much-improved UI
– To me, the installation procedure of some distributions seems
easier than that of Windows

Windows keeps strengthening its ability to be a good
OS, whatever the reason
– For example, Microsoft improved IE to eliminate Netscape (it
succeeded with IE3). Now Microsoft wants to do the same against
Firefox. Both IE7 and IE8 have failed so far. But who knows?

Although the functionality difference is decreasing, the
popularity difference is increasing.
– Habit (this is critical even in the search engine war)
– Support (the hateful Windows Update)
– Is the flexibility of Linux an advantage?
58
http://static-p4.fotolia.com/jpg/00/11/93/51/400_F_11935145_JyxCv7ufq6qk48jfPraVyKoxrDs4obfy.jpg
Which distribution? (this question has probably scared off many beginners)
59
http://www.iconfinder.net/ajax/download/png/?id=33647&s=128
Ubuntu
60
http://art4linux.org/system/files/ubuntu-girls-mini.jpg
61
Ubuntu





Ubuntu is based on the Debian distribution (good package
management). It is named after the Southern African ethical
ideology Ubuntu (“humanity towards others”).
Ubuntu provides an up-to-date, stable operating system for the
average user, with a strong focus on usability and ease of
installation.
Web statistics from late 2009 suggest that Ubuntu's share of
Linux desktop usage is between 40 and 50%.
Ubuntu is sponsored by the UK-based company Canonical Ltd.,
owned by South African entrepreneur Mark Shuttleworth.
By keeping Ubuntu free and open source, Canonical is able to
utilize the talents of community developers in Ubuntu's
constituent components. Instead of selling Ubuntu for profit,
Canonical generates revenue by selling technical support and by
creating several services tied to Ubuntu.
62
http://upload.wikimedia.org/wikipedia/commons/7/78/Mark_Shuttleworth_by_Martin_Schmitt.jpg
63
Mark Shuttleworth




Born on 18 September 1973
Founded Thawte in 1995, a company specialising in
digital certificates and Internet security, and
sold it in December 1999 for about
USD 575 million.
In September 2000, Shuttleworth formed HBD
Venture Capital, a business incubator and
venture capital provider.
In March 2004 he formed Canonical Ltd., for
the promotion and commercial support of free
software projects.
64
http://www.openfoundry.org/
Some of the talks there are really valuable;
do some homework
65
To Sum Up
Ubuntu is as friendly as any version of
Windows. Everyone can start to use it
without any introduction.
66
http://poietes.files.wordpress.com/2009/04/yoda-1.jpg
However, if you choose a dual-boot system,
you will never become a master
67
Shell Scripts
68
Shell Scripts

Similar to DOS batch files
Quick and simple programming
Text file interpreted by shell, effectively new
command
List of shell commands to be run sequentially
Needs execute permission; no special extension necessary

Magic first line
– #!
– Include full path to interpreter (shell)
  • #!/bin/sh
69
Shell Scripts
Interacting

Special variables for processing arguments
– $#        number of arguments on command line
– $0        name that script was called as
– $1 – $9   command line arguments
– $@        all arguments (separately quoted)
– $*        all arguments
– $?        numeric result code of previous command
– $$        process ID of this running script
Interacting With User
– Talk to user (or ask questions) first, then get input from
user, put it in variable
  • echo prompt
    read variable
70
Shell Scripts
Control Structure




if [ … ]; then
    …
fi
for variable in … ; do
    …
done
Check the sh man page for details; also look at the examples.
#!/bin/sh
if [ $# -ge 2 ]; then
    echo "$2"
elif [ $# -eq 1 ]; then
    echo "$1"
else
    echo No input
fi
71
Any Questions?
72
Can you
use a shell script to change filenames from
lower- to uppercase? Remember that the
wild card symbol * can help you get all
files.
73

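#!/bin/sh
for file in *; do
    echo "processing $file"
    new=`echo "$file" | tr '[a-z]' '[A-Z]'`
    # skip names that are already uppercase (mv would complain)
    [ "$file" != "$new" ] && mv "$file" "$new"
done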

How would you do it in Windows?

BTW, why Perl? Because it can be done in one line
– $ ls | perl -nle 'my $o=$_; tr/a-z/A-Z/; rename $o, $_'

How would you do it in C?
74
Any Questions?
75
Code Size Calculator
In: a file
Out: code size
Requirements
- input from the command line
- do not count space characters
- do not count comments (C style)
- must be completed in Unix
- if you don’t have a Unix environment, contact me ASAP
- using C would be the best
Bonus
- write a shell script version
76
Deadline
2010/3/9 23:59
Zip your code, a step-by-step README of
how to execute the code, and anything
worthy of extra credit. Email to
[email protected].
77
gcc
78
gcc


gcc is the GNU C Compiler, and g++ is
the GNU C++ compiler, while cc and CC
are the Sun C and C++ compilers also
available on Sun workstations.
Note that C++ differs from C to a
certain extent. It is safest to regard
them as two different languages with
very similar basic structures.
79
gcc
Compiling a Simple Program

Consider the following example: let “hello.c” be a file that
contains the following C code
– #include <stdio.h>
  int main(void) {
      printf("Hello\n");
      return 0;
  }

The standard way to compile this program is with the
command
– $ gcc hello.c -o hello

This command compiles hello.c into an executable program
named “hello”. It does nothing more than print the word
“Hello” on the screen. gcc normally marks its output as
executable already, so the chmod step is usually optional.
– $ chmod 755 hello
– $ ./hello
80

Alternatively, the above program could be
compiled using the following two commands
– $ gcc -c hello.c
– $ gcc hello.o -o hello


The end result is the same, but this two-step
method first compiles hello.c into a machine
code file named “hello.o” and then links hello.o
with some system libraries to produce the final
program “hello”.
In fact, the first method also does this two-stage process of compiling and linking, but the
stages are done transparently, and the
intermediate file “hello.o” is deleted in the
process.
81
gcc
Frequently Used Options


The examples below demonstrate how to use many of the more
commonly used options. Some options can be combined, although
it is generally not useful to use “debugging” and “optimization”
options together.
Makes the resulting executable contain symbolic information for
the gdb debugger
– $ gcc -g myprog.c -o myprog

Have the compiler generate many warnings about syntactically
correct but questionable looking code. It is good practice to
always use this option with gcc and g++
– $ gcc -Wall myprog.c -o myprog

Generate optimized code. The -O is a capital o and not the
number 0!
– $ gcc -O myprog.c -o myprog

Compile a C program that uses math functions such as “sqrt”
– $ gcc myprog.c -o myprog -lm
82
gcc
Multiple Source Files

If there are multiple source files
– $ gcc file1.c file2.c -o myprog

Or
– $ gcc -c file1.c
$ gcc -c file2.c
$ gcc file1.o file2.o -o myprog

The second one compiles source files separately. If only
file1.c was modified
– $ gcc -c file1.c
$ gcc file1.o file2.o -o myprog

Notice that file2.c does not need to be recompiled.
– significant time savings when there are numerous source files

This process, though somewhat complicated, is generally
handled automatically by a makefile.
83