ls - Vanderbilt University

Applied Bioinformatics
Course Overview & Introduction to Linux
Bing Zhang
Department of Biomedical Informatics
Vanderbilt University
[email protected]
What is bioinformatics
[Diagram: "Bio" (hypotheses, questions, samples, experiments) and "informatics" (storage/retrieval, visualization, computational methods, statistical methods) joined as "Bioinformatics" around data. Data categories shown: DNA, RNA, protein, metabolite, phenotype; sequence, expression, structure, interaction.]
Why now?
[Diagram: "Bio" (hypotheses, questions, samples, experiments) and "informatics" (storage/retrieval, visualization, computational methods, statistical methods) around data. Data categories shown: DNA, RNA, protein, metabolite, phenotype; sequence, expression, structure, interaction.]
Roles for different investigators in bioinformatics
[Diagram: roles shown are algorithm developer, tool developer, and data provider/consumer; investigators shown are statisticians, mathematicians, computer scientists, bioinformaticians, and biologists.]
Graph courtesy of http://www.incogen.com/
Comprehensive resource list
http://bioinformatics.ca/links_directory/

As of March 2015: 174 resources, 623 databases, 1548 tools
Sequence and structure databases

GenBank: http://www.ncbi.nlm.nih.gov/genbank/
  Annotated collection of all publicly available DNA sequences
  126,551,501,141 bases in 135,440,924 sequences as of April 2011

UniProt: http://www.uniprot.org/
  Comprehensive resource for protein sequences and functional information
  534,242 reviewed entries as of January 2012

PDB: http://www.rcsb.org/
  3D structures of large biological molecules, including proteins and nucleic acids
  79,180 structures as of February 2012

Pfam: http://pfam.sanger.ac.uk/
  Collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
  13,672 families as of November 2011
Genome browsers

UCSC genome browser: http://genome.ucsc.edu/cgi-bin/hgGateway

Ensembl genome browser: http://www.ensembl.org/index.html
Gene-centric databases

Entrez Gene
  http://www.ncbi.nlm.nih.gov/gene
  NCBI/NIH
  All completely sequenced genomes
  One gene per page

Ensembl BioMart
  http://www.ensembl.org/biomart/martview
  EMBL-EBI and Sanger Institute
  Vertebrates and other selected eukaryotic species
  Batch information retrieval
Gene expression data

Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/

ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
Pathway and network resources

Gene Ontology (GO): http://www.geneontology.org/

Pathway databases
  KEGG: http://www.genome.jp/kegg/pathway.html
  Reactome: http://www.reactome.org/
  WikiPathways: http://www.wikipathways.org/

Protein-protein interaction databases
  DIP: http://dip.doe-mbi.ucla.edu/
  MINT: http://mint.bio.uniroma2.it/mint/
  BioGRID: http://www.thebiogrid.org/
  HPRD: http://www.hprd.org

Protein-DNA interaction database
  Transfac: http://www.gene-regulation.com
Qi Liu, Ph.D. ([email protected]; Department of Biomedical Informatics; 2525 West End Ave,
Suite 800; Phone: 322-6618)
Course content and grades

Date  Subject                                                Project
3/23  Introduction to Linux and R                            Project 1. Microarray data analysis
3/25  Introduction to Linux and R                            Report due 4/13 (50 points)
3/27  Introduction to Linux and R
3/30  Supervised and unsupervised data analysis: lecture
4/1   Supervised and unsupervised data analysis: lab
4/3   Pathway and network analysis: lecture
4/6   Pathway and network analysis: lab
4/8   Pathway and network analysis: lab
4/10  RNA-Seq for differential expression analysis: lecture  Project 2. RNA-Seq data analysis
4/13  RNA-Seq for differential expression analysis: lecture  Report due 4/22 (25 points)
4/15  RNA-Seq for differential expression analysis: lab
4/17  RNA-Seq for differential expression analysis: lab
4/20  Exome sequencing data analysis: lecture                Project 3. Exome sequencing data analysis
4/22  Exome sequencing data analysis: lecture                Report due 5/1 (25 points)
4/24  Exome sequencing data analysis: lab
4/27  Exome sequencing data analysis: lab

Final Grade = sum of the three project scores (100 pts in total).
A: 85-100; B: 75-84; C: 65-74; D: 55-64; F: 0-54
Course materials and report submission

Lecture slides are available at
https://sites.google.com/site/vanderbiltigp2014/bioregulation-ii/minimester3/applied-bioinformatics

Project reports are due at 5pm on the due date (4/13, 4/22, 5/1). There will be a
10% per day deduction for late reports. Report 1 should be sent to Dr. Zhang,
Reports 2 and 3 should be sent to Dr. Liu (see email addresses below).

Instructor contact information

Dr. Bing Zhang: [email protected]

Dr. Qi Liu: [email protected]
ACCRE


Advanced Computing Center for Research & Education

http://www.accre.vanderbilt.edu/

The compute cluster currently consists of more than 500 Linux systems
with quad or hex core processors
Linux system

An operating system (OS) like Windows or Mac

Portable, multi-tasking, multi-user OS

High performance and free, making it ideal for high-performance
computing clusters
Proper use of ACCRE

Information in the ACCRE cluster group igp300b_ab may not
contain data, information, technology, images, or software that is
controlled under Federal Export Administration Regulations (EAR),
International Traffic in Arms Regulations (ITAR), Patient Health
Information (PHI), or Research Health Information (RHI) nor is it
considered proprietary.
Get an ACCRE account

http://www.accre.vanderbilt.edu/?page_id=617

Registration form

Name, VUNetID, Department (VU), School (VU), Email, Phone, Position

Group: IGP300b_ab (igp300b_ab)

Primary research area: bioinformatics

Primary application: Existing Application

Primary application name: R

Primary application type: Serial

Expected typical number of processors: NA

Expected typical number of concurrent running jobs: 1

Linux experience:

Expected compilers/languages: C, C++, R, perl, python

Expected external libraries: NA

BlueArc User: No

Other useful information: NA
Logging onto the cluster and changing your password

Windows
  Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)
  Two steps: edit profile -> save profile
  Host: vmplogin.accre.vanderbilt.edu
  Username: your_user_name

Mac
  Spotlight to find the application: Terminal
  Command: ssh [email protected]

Change password
  rsh auth
  passwd

Exit
  exit
Logging onto the cluster and changing your password (using Bitvise SSH in Windows)
[Screenshot]
Logging onto the cluster and changing your password (using Terminal in Mac)
[Screenshot]
You won’t see any response while typing your password, which is fine.
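Hierarchical file system
[Diagram: a directory tree rooted at /. Directories under / include bin, etc, usr, tmp, home, and scratch. /home holds the user directories annie, igptest, and cody, and /home/igptest holds bin, docs, and src. The tree also shows further bin and lib directories, example commands (chmod, cp, date, grep, mv, diff, rm, find, vi, gcc, id, make, perl, ssh), libraries (libc.so, libgpfs.so, libjpeg.so, libstdc++.so), and user files (myprog.sh, prog1.c, dothis.pl, prog2.f77, dothat.py, prog3.cpp). Labeled paths: /, /home, /home/igptest, /home/igptest/src/prog3.cpp]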
Working with directories

pwd (print your present working directory)

ls (list directory contents)

mkdir (make a directory)

cd (change directory)
  .. (parent directory)
  . (current directory)
  ~ or no parameter (home directory)

rmdir (remove an empty directory)
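
The directory commands above can be tried safely in a throwaway location. The following is a minimal sketch, not part of the original slides; the sandbox variable and the mktemp call are assumptions added so that nothing in your real home directory is touched.

```shell
# Work in a throwaway directory (an assumption for this sketch).
sandbox=$(mktemp -d)
cd "$sandbox"

pwd            # print the present working directory
mkdir test     # make a directory named "test"
ls             # list directory contents: shows "test"
cd test        # change into the new directory
cd ..          # go back up to the parent directory
cd .           # "." is the current directory, so nothing changes
rmdir test     # remove the (now empty) directory
cd ~           # return to your home directory (plain "cd" does the same)
```

Note that rmdir refuses to remove a directory that still contains files, which is why the sketch leaves test empty.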
Absolute and relative paths

Absolute path
  A file or directory location in relation to the root of the file system; it always begins with a /

Relative path
  A file or directory location in relation to where you currently are in the file system; it does not begin with a /

[Screenshots: an absolute path example and a relative path example]
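
As a small illustration of the two kinds of path, the sketch below reaches the same directory once by an absolute path and once by a relative path. The directory names docs and notes and the sandbox location are hypothetical, chosen only for this example.

```shell
sandbox=$(mktemp -d)              # hypothetical scratch directory
mkdir -p "$sandbox/docs/notes"    # -p also creates missing parent directories

cd "$sandbox/docs/notes"          # absolute path: starts with /
abs_result=$(pwd)

cd "$sandbox"                     # start again from the top of the sandbox
cd docs/notes                     # relative path: resolved from where you are now
rel_result=$(pwd)

echo "$abs_result"
echo "$rel_result"                # same directory both times
```

The relative form only works from the right starting point, whereas the absolute form works from anywhere in the file system.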
Working with files

more (display the contents of a file)
  space bar to show next page
  q to exit

cp (copy a file)

mv (rename/move a file)

rm (remove a file)
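
The file commands above can be combined into a short session. This is a sketch under stated assumptions: the file names are illustrative, the sandbox directory is hypothetical, and the file is created with echo and output redirection, which the slides do not cover.

```shell
sandbox=$(mktemp -d)                   # hypothetical scratch directory
cd "$sandbox"

echo "Hello World" > sample_file.txt   # create a small file to work with
cp sample_file.txt sample_file_1.txt   # copy: both files now exist
mv sample_file_1.txt renamed.txt       # rename the copy
rm sample_file.txt                     # remove the original
ls                                     # only renamed.txt is left
```

To page through the result interactively, run more renamed.txt and press q to leave. Keep in mind that rm does not ask for confirmation and there is no recycle bin on the cluster.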
Getting help

man (display manual pages for a command)

man ls (display manual for the ls command)
  space bar to show next page
  q to exit

Options for ls

ls -a (do not ignore entries starting with .)

ls -l (use a long listing format)

ls -al (use a long listing format and do not ignore entries starting with .)
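
The effect of these ls options is easiest to see with one visible and one hidden file. The sketch below is illustrative only: the file names are hypothetical, and touch (which creates empty files) is not covered in the slides.

```shell
sandbox=$(mktemp -d)             # hypothetical scratch directory
cd "$sandbox"
touch visible.txt .hidden_file   # create two empty files

ls        # lists only visible.txt
ls -a     # also lists ., .., and .hidden_file
ls -l     # long format: permissions, owner, size, modification time
ls -al    # long format including the hidden entries
```

Files such as .bashrc start with a dot, so they only show up when -a is given.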
Editing files with nano

cd ~ (change to home directory)

nano .bashrc (use nano to edit the file .bashrc, which contains commands that are executed when a new interactive bash shell starts)

Add “setpkgs -a R” to the end of the file (this will allow you to use the R environment for statistical computing, which has been installed on the ACCRE system)

A quick tutorial: http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html
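
For readers who prefer not to open an editor, the same line can be appended from the command line. This is a hedged sketch rather than the course's method: it uses grep and output redirection, which the slides do not cover, and it runs against a scratch file (rc_file, a hypothetical stand-in) so that it can be tried without changing your real .bashrc.

```shell
# Scratch stand-in for ~/.bashrc; point rc_file at ~/.bashrc to apply it for real.
rc_file=$(mktemp)
line='setpkgs -a R'     # the line from the slide, typed with a plain ASCII hyphen

# Append the line only if an identical line is not already present.
grep -qxF "$line" "$rc_file" || echo "$line" >> "$rc_file"
grep -qxF "$line" "$rc_file" || echo "$line" >> "$rc_file"   # second run adds nothing

grep -cxF "$line" "$rc_file"   # number of matching lines; prints 1
```

The check before appending keeps the line from piling up if the command is run more than once. Typing the hyphen by hand matters, because slides and word processors often turn it into a longer dash that the shell does not treat as an option.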
Copying files to/from a local computer

Windows
  Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)

Mac
  Application: Cyberduck (https://it.vanderbilt.edu/software/downloads.php)
  Connect to: vmplogin.accre.vanderbilt.edu
  Username: your_user_name
  Don’t change other items

Copying files to/from a local computer (using Bitvise SFTP in Windows)
[Screenshot]
Copying files to/from a local computer (using Fugu in Mac)
[Screenshot]
Summary

Command                 Meaning
rsh <hostname>          Remote shell
passwd                  Modify a user’s password
exit                    Exit the shell
pwd                     Display the path of the current directory
ls                      List files and directories
ls -a                   List all files and directories
ls -al                  List all files and directories in a long listing format
mkdir <directory name>  Make a directory
cd <directory name>     Change to named directory
cd                      Change to home directory
cd ~                    Change to home directory
cd ..                   Change to parent directory
rmdir <directory name>  Remove an empty directory
more <file name>        View the contents of a file
cp <file1> <file2>      Copy file1 and name the copied file file2
mv <file1> <file2>      Move or rename file1 to file2
rm <file name>          Remove a file
man <command>           Display manual pages for a command
nano <file name>        Use the nano text editor to view and edit a file
Exercise

Create a test directory with the name “test” under your home directory

Copy the file sample_file.txt under directory /home/igptest to your test directory

Make a copy of the file, sample_file_1.txt

View and modify the file sample_file_1.txt using nano, correct the typo (Warld -> World)

Copy the file to your desktop

Copy a file from your desktop to your test directory

Add “setpkgs -a R” to the end of your .bashrc file

Go through the required sections of the following tutorial before next class: http://ryanstutorials.net/linuxtutorial/
  Required sections: 1, 2, 3, 4, 5, 9, 11
  Optional sections: 8, 12
  Advanced sections: 6, 7, 10, 13