Project of CZ5225
Zhang Jingxian: [email protected]
Identifying biomarkers of drug
response for cancer patients
Aims:
To develop predictors of response to drugs
To learn how to get public microarray data
To learn how to preprocess raw microarray
data
To annotate the genes of interest
Requirements
Each group investigates:
ONE kind of drug response in cancer patients
Needs TWO datasets from different studies
Download the raw data
Use Bioconductor in R to preprocess the raw data
Identify a certain number of genes
Annotate those identified genes in your report
Each group needs only ONE report
Requirements
Any Affymetrix expression dataset
related to drug response in cancer
patients may be used
Each dataset needs to contain at least 20
samples
Each dataset needs two comparable outcome
groups: response vs. non-response;
resistance vs. non-resistance, etc.
Bioconductor & R
http://www.bioconductor.org
Advantages
Cross platform: Linux, Windows and macOS
Comprehensive and centralized: analyzes both Affymetrix and two-color spotted
microarrays, and covers various stages of data analysis in a single environment
Cutting-edge analysis methods: new methods/functions can easily be incorporated
and implemented
Quality check of data analysis methods: algorithms and methods have undergone
evaluation by statisticians and computer scientists before launch, and in many
cases there are also literature references
Good documentation: comprehensive manuals, documentation, course materials,
course notes and a discussion group are available
A good chance to learn statistics and programming
Installing R & Bioconductor
Install R from: http://cran.stat.nus.edu.sg/
Open R, then execute:
>source("http://bioconductor.org/biocLite.R")
>biocLite()
Check the installed libraries by executing: >library()
Case study
Dataset source (GSE19697):
http://www.ncbi.nlm.nih.gov/geo
Extract the raw data into: D://gse19697
Create title.txt :
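One way to create title.txt from within R is sketched below. It assumes that read.affy expects a tab-delimited table whose first column holds the CEL file names and whose remaining columns (here a single column named title) hold the sample covariates. The CEL file names are hypothetical placeholders, and the file is written to a temporary directory only to keep the sketch free of side effects.

```r
# Hypothetical CEL file names and response labels; replace with the real ones.
cel.files <- c("sample1.CEL", "sample2.CEL", "sample3.CEL", "sample4.CEL")
pheno <- data.frame(title = c("pCR", "pCR", "RD", "RD"),
                    row.names = cel.files)

# Written to a temporary directory here; in practice, save it in the
# folder that contains the CEL files.
out.file <- file.path(tempdir(), "title.txt")

# col.names = NA writes an empty header cell above the file-name column.
write.table(pheno, file = out.file, sep = "\t", quote = FALSE, col.names = NA)

# Read the file back to confirm the layout: file names become row names.
check <- read.table(out.file, header = TRUE, row.names = 1, sep = "\t")
print(check)
```

The title column is what the later comparison step refers to when it names the two groups "pCR" and "RD".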
Open R
Set the working directory by executing:
>setwd('d://gse19697')
Load the simpleaffy package by executing:
>library(simpleaffy)
Load data by:
>eset <- read.affy('title.txt')
Calculate expression by:
>eset.rma <- call.exprs(eset,'rma')
Compare two groups by:
>pc.result <- pairwise.comparison(eset.rma,
"title", c("pCR", "RD"), eset)
Filter significantly changed markers
between the two groups by:
>significant <- pairwise.filter(pc.result,fc=log2(1.5),
tt=0.001)
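To make the two cutoffs concrete, the following base R sketch applies a fold-change threshold and a t-test threshold to simulated log2 expression values. It is only an illustration of the idea: the data, the probe names and the spiked-in difference are made up, and the exact test that simpleaffy applies internally may differ from the Welch t-test used here.

```r
# Simulated log2 expression matrix: 200 probes x 20 samples (made-up data).
set.seed(1)
n.probes <- 200
group <- factor(rep(c("pCR", "RD"), each = 10))
expr <- matrix(rnorm(n.probes * 20, mean = 8, sd = 0.3), nrow = n.probes,
               dimnames = list(paste0("probe", seq_len(n.probes)), NULL))

# Spike in a true difference for the first 5 probes (RD higher by 2 log2 units).
expr[1:5, group == "RD"] <- expr[1:5, group == "RD"] + 2

# Log2 fold change = difference of the group means on the log2 scale.
log.fc <- rowMeans(expr[, group == "RD"]) - rowMeans(expr[, group == "pCR"])

# Per-probe two-sample t-test p-value.
p.val <- apply(expr, 1, function(x) {
  t.test(x[group == "RD"], x[group == "pCR"])$p.value
})

# Keep probes that pass both thresholds (same cutoffs as in the command above).
keep <- abs(log.fc) >= log2(1.5) & p.val < 0.001
selected <- rownames(expr)[keep]
print(selected)
```

Requiring both conditions removes probes with a statistically significant but very small change, as well as probes with a large but noisy change.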
Plot significantly changed markers:
>plot(significant)
Annotate selected markers:
>significant
Heatmap:
> significant <- pairwise.filter(pc.result,fc=log2(1),
tt=0.001)
> pid<-rownames(significant@means)
>eset.hm<-eset.rma[pid,]
> install.packages("RColorBrewer")
> library(RColorBrewer)
> hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
> spcol <- ifelse(eset.hm$title == "pCR", "goldenrod",
"skyblue")
> heatmap(exprs(eset.hm), col = hmcol, ColSideColors =
spcol)
Assignment 2
Genetics of gene expression (eQTL)
Aim: to identify potential genetic variants
that cause differential expression
Report deadline: two weeks before the final
examination
Genetics of gene expression
expression Quantitative Trait
Locus (eQTL) mapping
tries to find genomic variation (such as SNPs)
that explains expression traits.
One difference between eQTL mapping and
traditional QTL mapping is that a traditional
mapping study focuses on one or a few traits,
while most eQTL studies analyze thousands of
expression traits and declare thousands
of QTLs.
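The core of a single eQTL test can be sketched in base R as a regression of one expression trait on one SNP genotype. This is a simplified illustration under stated assumptions: the genotypes and expression values are simulated, the genotype is coded additively as 0, 1 or 2 copies of the minor allele, and sex is included as a covariate. It is not the procedure that GGtools itself uses.

```r
set.seed(42)
n <- 90  # same sample count as the CEU panel; all values below are simulated

# Additive genotype coding: number of minor alleles carried by each sample.
genotype <- sample(0:2, n, replace = TRUE, prob = c(0.25, 0.5, 0.25))
male <- rbinom(n, 1, 0.5)

# Simulated expression trait with a true effect of 0.5 per allele.
expression <- 7 + 0.5 * genotype + rnorm(n, sd = 0.4)

# Test the SNP effect while adjusting for sex.
fit <- lm(expression ~ genotype + male)
snp.effect <- coef(fit)["genotype"]
snp.p <- summary(fit)$coefficients["genotype", "Pr(>|t|)"]
print(c(effect = unname(snp.effect), p.value = snp.p))
```

An eQTL study repeats a test of this kind for every SNP and every expression trait, which is why thousands of associations can be declared.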
GGdata: all 90 HapMap CEU samples, 47K
expression probes, 4 million SNPs
Chromosome 17
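> biocLite("GGtools")
> biocLite("GGdata")
> library(GGtools)
> library(GGdata)
> # load expression and SNP data for chromosome 17
> c17 = getSS("GGdata", "17")
> # optional: look up the probe ID for the gene symbol CSDA
> # get("CSDA", revmap(illuminaHumanv1SYMBOL))
> # test all chromosome 17 SNPs against CSDA expression, adjusting for sex
> t1 = gwSnpTests(genesym("CSDA") ~ male, c17, chrnum("17"))
> # alternative: select the trait by probe ID instead of gene symbol
> # t1 = gwSnpTests(probeId("GI_21359983-S") ~ male, c17, chrnum("17"))
> topSnps(t1)
> # plot expression against genotype for one SNP
> plot_EvG(genesym("CSDA"), rsid("rs7212116"), c17)
> # optional: load all autosomes instead of a single chromosome
> # c_full = getSS("GGdata", as.character(1:22))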
Requirements for assignment 2
Identify the genetic cause (eQTL) of the
genes selected in assignment 1
Get SNPs with significant association
(p < 10e-4) from each chromosome
Paste the plot image for each association
Annotate the SNPs in dbSNP
Submit one report per group