Applied Bioinformatics
Course Overview & Introduction to Linux
Bing Zhang
Department of Biomedical Informatics
Vanderbilt University
[email protected]
What is bioinformatics
[Figure: diagram combining "Bio" and "informatics" into "Bioinformatics". Bio labels: hypotheses, questions, samples, experiments. Informatics labels: storage/retrieval, visualization, computational methods, statistical methods. Data labels: DNA, RNA, protein, metabolite, phenotype; sequence, expression, structure, interaction.]
Why now?
[Figure: diagram with "Bio" and "informatics" around "Data". Bio labels: hypotheses, questions, samples, experiments. Informatics labels: storage/retrieval, visualization, computational methods, statistical methods. Data labels: DNA, RNA, protein, metabolite, phenotype; sequence, expression, structure, interaction.]
Roles for different investigators in bioinformatics
[Figure: roles (algorithm developer, tool developer, data provider/consumer) and the investigators who fill them (statisticians, mathematicians, computer scientists, bioinformaticians, biologists). Graph courtesy of http://www.incogen.com/]
Comprehensive resource list
http://bioinformatics.ca/links_directory/
March 2015: 174 resources, 623 databases, 1,548 tools
Sequence and structure databases
Genbank: http://www.ncbi.nlm.nih.gov/genbank/
Annotated collection of all publicly available DNA sequences
126,551,501,141 bases in 135,440,924 sequences as of April 2011
UniProt: http://www.uniprot.org/
Comprehensive resource for protein sequences and functional information
534,242 reviewed entries as of January 2012
PDB: http://www.rcsb.org/
3D structures of large biological molecules, including proteins and nucleic acids
79,180 structures as of February 2012
Pfam: http://pfam.sanger.ac.uk/
Collection of protein families, each represented by multiple sequence alignments
and hidden Markov models (HMMs)
13,672 families as of November 2011
Genome browsers
UCSC genome browser: http://genome.ucsc.edu/cgi-bin/hgGateway
Ensembl genome browser: http://www.ensembl.org/index.html
Gene-centric databases
Entrez Gene
http://www.ncbi.nlm.nih.gov/gene
NCBI/NIH
All completely sequenced genomes
One gene per page
Ensembl BioMart
http://www.ensembl.org/biomart/martview
EMBL-EBI and Sanger Institute
Vertebrates and other selected eukaryotic species
Batch information retrieval
Gene expression data
Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
Pathway and network resources
Gene Ontology (GO): http://www.geneontology.org/
Pathway databases
KEGG: http://www.genome.jp/kegg/pathway.html
Reactome: http://www.reactome.org/
WikiPathways: http://www.wikipathways.org/
Protein-protein interaction databases
DIP: http://dip.doe-mbi.ucla.edu/
MINT: http://mint.bio.uniroma2.it/mint/
BioGRID: http://www.thebiogrid.org/
HPRD: http://www.hprd.org
Protein-DNA interaction database
Transfac: http://www.gene-regulation.com
Qi Liu, Ph.D. ([email protected]; Department of Biomedical Informatics; 2525 West End Ave,
Suite 800; Phone: 322-6618)
Course content and grades
Date   Subject                                                  Project
3/23   Introduction to Linux and R                              Project 1. Microarray data analysis
3/25   Introduction to Linux and R                              Report due 4/13 (50 points)
3/27   Introduction to Linux and R
3/30   Supervised and unsupervised data analysis: lecture
4/1    Supervised and unsupervised data analysis: lab
4/3    Pathway and network analysis: lecture
4/6    Pathway and network analysis: lab
4/8    Pathway and network analysis: lab
4/10   RNA-Seq for differential expression analysis: lecture    Project 2. RNA-Seq data analysis
4/13   RNA-Seq for differential expression analysis: lecture    Report due 4/22 (25 points)
4/15   RNA-Seq for differential expression analysis: lab
4/17   RNA-Seq for differential expression analysis: lab
4/20   Exome sequencing data analysis: lecture                  Project 3. Exome sequencing data analysis
4/22   Exome sequencing data analysis: lecture                  Report due 5/1 (25 points)
4/24   Exome sequencing data analysis: lab
4/27   Exome sequencing data analysis: lab
Final Grade = sum of the three project scores (100 pts in total).
A: 85-100; B: 75-84; C: 65-74; D: 55-64; F: 0-54
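The grading rule on this slide (final grade = sum of the three project scores, mapped to a letter by the listed cutoffs) can be sketched as a small shell function. This is only an illustration: letter_grade is a hypothetical helper, the example scores are made up, and whole-number scores are assumed.

```shell
# Sketch: sum three project scores and map the total to a letter grade
# using the cutoffs from the slide (A: 85-100, B: 75-84, C: 65-74, D: 55-64, F: 0-54).
# letter_grade is a hypothetical helper, not part of any course tooling.
letter_grade() {
    total=$(( $1 + $2 + $3 ))
    if   [ "$total" -ge 85 ]; then echo "A"
    elif [ "$total" -ge 75 ]; then echo "B"
    elif [ "$total" -ge 65 ]; then echo "C"
    elif [ "$total" -ge 55 ]; then echo "D"
    else                           echo "F"
    fi
}

# Made-up scores: Project 1 (out of 50), Project 2 (out of 25), Project 3 (out of 25).
letter_grade 42 20 21    # total is 83, so this prints B
```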
Course materials and report submission
Lecture slides are available at
https://sites.google.com/site/vanderbiltigp2014/bioregulation-ii/minimester3/applied-bioinformatics
Project reports are due at 5pm on the due date (4/13, 4/22, 5/1). There will be a
10% per day deduction for late reports. Report 1 should be sent to Dr. Zhang,
Reports 2 and 3 should be sent to Dr. Liu (see email addresses below).
Instructor contact information
Dr. Bing Zhang: [email protected]
Dr. Qi Liu: [email protected]
ACCRE
Advanced Computing Center for Research & Education
http://www.accre.vanderbilt.edu/
The compute cluster currently consists of more than 500 Linux systems
with quad or hex core processors
Linux system
An operating system (OS), like Windows or Mac OS
Portable, multi-tasking, multi-user OS
High performance and free, making it ideal for high-performance
computing clusters
Proper use of ACCRE
Information in the ACCRE cluster group igp300b_ab may not
contain data, information, technology, images, or software that is
controlled under Federal Export Administration Regulations (EAR),
International Traffic in Arms Regulations (ITAR), Patient Health
Information (PHI), or Research Health Information (RHI) nor is it
considered proprietary.
Get an ACCRE account
http://www.accre.vanderbilt.edu/?page_id=617
Registration form
Name, VUNetID, Department (VU), School (VU), Email, Phone, Position
Group: IGP300b_ab (igp300b_ab)
Primary research area: bioinformatics
Primary application: Existing Application
Primary application name: R
Primary application type: Serial
Expected typical number of processors: NA
Expected typical number of concurrent running jobs: 1
Linux experience:
Expected compilers/languages: C, C++, R, perl, python
Expected external libraries: NA
BlueArc User: No
Other useful information: NA
Logging onto the cluster and changing your password
Windows
Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)
Two steps: edit profile -> save profile
Host: vmplogin.accre.vanderbilt.edu
Username: your_user_name
Mac
Use Spotlight to find the application: Terminal
Command: ssh [email protected]
Change password
rsh auth
passwd
Exit
exit
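As a rough sketch of how the Host and Username fields above fit together on the command line, the snippet below assembles the login command in the usual user@host form that ssh accepts. The user name is the placeholder from the slide, and the command is only printed, not run, since connecting requires a real ACCRE account.

```shell
# Placeholder from the slide; replace with your own user name.
user="your_user_name"
host="vmplogin.accre.vanderbilt.edu"   # login host named on the slide

# Build and print the command instead of running it.
login_cmd="ssh ${user}@${host}"
echo "$login_cmd"

# Once logged in, the slide's remaining steps are typed at the remote prompt:
#   rsh auth   (remote shell to the host named auth)
#   passwd     (change your password)
#   exit       (leave the shell)
```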
Logging onto the cluster and changing your password (using Bitvise SSH in Windows)
Logging onto the cluster and changing your password (using Terminal in Mac)
You won’t see any response while typing your password; this is expected.
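Hierarchical file system
[Figure: directory tree rooted at /. Top-level directories: bin, etc, usr, tmp, home, scratch. User directories under /home: annie, igptest, cody. Subdirectories of /home/igptest: bin, docs, src. Example commands shown in the tree: chmod, cp, date, grep, mv, diff, rm, find, vi, gcc, id, make, perl, ssh. Example libraries: libc.so, libgpfs.so, libjpeg.so, libstdc++.so. Example user files: myprog.sh, dothis.pl, dothat.py, prog1.c, prog2.f77, prog3.cpp. Example full path: /home/igptest/src/prog3.cpp]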
Working with directories
pwd (print your present working directory)
ls (list directory contents)
mkdir (make a directory)
cd (change directory)
  .. (parent directory)
  . (current directory)
  ~ or no parameter (home directory)
rmdir (remove an empty directory)
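A short session that exercises these directory commands might look like the following. It runs inside a throwaway directory created with mktemp, which is not covered in the slides and is used here only so that nothing in your home directory is touched.

```shell
# Scratch directory for the demo (mktemp is an addition, not from the slides).
scratch=$(mktemp -d)
cd "$scratch"

pwd           # print the present working directory
mkdir test    # make a directory named test
ls            # list directory contents; shows: test
cd test       # change into test
cd ..         # go back up to the parent directory
cd .          # . is the current directory, so this stays put
rmdir test    # remove the (empty) test directory
ls            # prints nothing, the scratch directory is empty again
```

Running cd with no argument (or cd ~) at any point would take you back to your home directory.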
Absolute and relative paths
Absolute path
A file or directory location relative to the root of the file system; it always
begins with a /
Relative path
A file or directory location relative to where you currently are in the file
system; it never begins with a /
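To make the distinction concrete, here is a minimal sketch that reaches the same file through an absolute and a relative path. The scratch directory and the file contents are assumptions for illustration; the file name prog3.cpp is borrowed from the file system slide.

```shell
# Scratch area for the demo (mktemp is an addition, not from the slides).
scratch=$(mktemp -d)
mkdir -p "$scratch/src"
echo 'int main() { return 0; }' > "$scratch/src/prog3.cpp"

cd "$scratch"
abs="$scratch/src/prog3.cpp"   # absolute path: starts at the root, begins with /
rel="src/prog3.cpp"            # relative path: resolved from the current directory

ls "$abs"
ls "$rel"

# From inside $scratch, both names refer to the same file.
cmp -s "$abs" "$rel" && echo "same file"

# After changing directory the relative path no longer resolves,
# while the absolute path still does.
cd src
ls "$rel" 2>/dev/null || echo "$rel not found from $(pwd)"
ls "$abs"
```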
Working with files
more (display the contents of a file)
space bar to show next page
q to exit
cp (copy a file)
mv (rename/move a file)
rm (remove a file)
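The file commands above can be tried safely in a scratch directory, as in this sketch. The file names are made up, and cat (not covered in the slides) stands in for more because the file is only two lines long.

```shell
# Scratch directory for the demo (mktemp is an addition, not from the slides).
scratch=$(mktemp -d)
cd "$scratch"
printf 'line 1\nline 2\n' > notes.txt   # create a small file to work with

cp notes.txt copy.txt      # copy: notes.txt and copy.txt now both exist
mv copy.txt renamed.txt    # rename: copy.txt becomes renamed.txt
cat renamed.txt            # print the file; use more for files longer than a screen
rm notes.txt               # remove: there is no recycle bin, the file is gone
ls                         # shows: renamed.txt
```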
Getting help
man (display manual pages for a command)
man ls (display manual for the ls command)
space bar to show next page
q to exit
Variants of ls
ls -a (do not ignore entries starting with .)
ls -l (use a long listing format)
ls -al (use a long listing format and do not ignore entries starting with .)
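The effect of the -a option is easiest to see in a directory that contains one hidden and one visible file. The sketch below sets that up in a scratch directory; the file names are made up for illustration.

```shell
# Scratch directory for the demo (mktemp is an addition, not from the slides).
scratch=$(mktemp -d)
cd "$scratch"
touch visible.txt .hidden   # names starting with . are hidden by default

ls        # shows only visible.txt
ls -a     # also shows . (current), .. (parent) and .hidden
ls -l     # long format: permissions, owner, size, date, name
ls -al    # long format, hidden entries included

# Count the entries each form reports.
plain=$(( $(ls | wc -l) ))
all=$(( $(ls -a | wc -l) ))
echo "ls: $plain entry, ls -a: $all entries"
```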
Editing files with nano
cd ~ (change to home directory)
nano .bashrc (use nano to edit the file .bashrc, which contains commands that are
executed each time a new shell starts).
Add “setpkgs -a R” to the end of the file (this lets you use the R
environment for statistical computing, which is installed on the ACCRE
system).
A quick tutorial: http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html
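For readers who prefer not to open an editor, the same line can also be appended from the command line. This is an alternative to the nano workflow on the slide, not the method the course prescribes, and the sketch below deliberately works on a temporary stand-in file (rcfile, a hypothetical name) rather than on your real .bashrc. The line is written with a plain hyphen, as the shell expects.

```shell
# Stand-in for ~/.bashrc so this sketch never touches your real file.
rcfile=$(mktemp)
line='setpkgs -a R'

# Append the line only if it is not already present
# (grep -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string).
grep -qxF "$line" "$rcfile" || echo "$line" >> "$rcfile"
grep -qxF "$line" "$rcfile" || echo "$line" >> "$rcfile"   # second run changes nothing

tail -n 1 "$rcfile"    # prints: setpkgs -a R
```

Checking with grep first keeps the file from accumulating duplicate lines if the command is run more than once.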
Copying files to/from a local computer
Windows
Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)
Mac
Application: Cyberduck (https://it.vanderbilt.edu/software/downloads.php)
Connect to: vmplogin.accre.vanderbilt.edu
Username: your_user_name
Don’t change other items
Copying files to/from a local computer (using Bitvise SFTP in Windows)
Copying files to/from a local computer (using Fugu in Mac)
Summary
Command                   Meaning
rsh <hostname>            Remote shell
passwd                    Modify a user’s password
exit                      Exit the shell
pwd                       Display the path of the current directory
ls                        List files and directories
ls -a                     List all files and directories
ls -al                    List all files and directories in a long listing format
mkdir <directory name>    Make a directory
cd <directory name>       Change to named directory
cd                        Change to home directory
cd ~                      Change to home directory
cd ..                     Change to parent directory
rmdir <directory name>    Remove a directory
more                      View the contents of a file
cp <file1> <file2>        Copy file1 and name the copied file file2
mv <file1> <file2>        Move or rename file1 to file2
rm <file name>            Remove a file
man <command>             Display manual pages for a command
nano <file name>          Use the nano text editor to view and edit a file
Exercise
Create a test directory with the name “test” under your home directory
Copy the file sample_file.txt under directory /home/igptest to your
test directory
Make a copy of the file, named sample_file_1.txt
View and modify the file sample_file_1.txt using nano to correct the
typo (Warld -> World)
Copy the file to your desktop
Copy a file from your desktop to your test directory
Add “setpkgs -a R” to the end of your .bashrc file
Go through the required sections of the following tutorial before next
class: http://ryanstutorials.net/linuxtutorial/
Required sections: 1, 2, 3, 4, 5, 9, 11
Optional sections: 8, 12
Advanced sections: 6, 7, 10, 13