Project of CZ5225

Zhang Jingxian: [email protected]
Identifying biomarkers of drug
response for cancer patients

Aims:

To develop predictors of response to drugs
To learn how to get public microarray data
To learn how to preprocess raw microarray data
To annotate the genes of interest

Requirements

Each group investigates:

ONE kind of cancer patient drug response
Two datasets from different studies are needed
Download the raw data
Use Bioconductor in R to preprocess the raw data
Identify a certain number of genes
Annotate the identified genes in your report
Each group needs only ONE report

Requirements (continued)

Any Affymetrix expression dataset related to drug response in cancer patients may be used
Each dataset needs to contain at least 20 samples
Each dataset needs two comparable outcome groups: response vs. non-response, resistance vs. non-resistance, etc.

Bioconductor & R

http://www.bioconductor.org

Advantages

Cross platform
  Linux, Windows and MacOS
Comprehensive and centralized
  Analyzes both Affymetrix and two color spotted microarrays, and covers various stages of data analysis in a single environment
Cutting edge analysis methods
  New methods/functions can easily be incorporated and implemented
Quality check of data analysis methods
  Algorithms and methods have undergone evaluation by statisticians and computer scientists before launch, and in many cases there are also literature references
Good documentation
  Comprehensive manuals, documentation, course materials, course notes and a discussion group are available
A good chance to learn statistics and programming

Installing R & Bioconductor

Install R from: http://cran.stat.nus.edu.sg/
Open R, then execute:
> source("http://bioconductor.org/biocLite.R")
> biocLite()
(In Bioconductor 3.8 and later, biocLite() has been replaced by BiocManager::install().)

Check the installed libraries by executing:
> library()

Case study

Dataset source (GSE19697): http://www.ncbi.nlm.nih.gov/geo
Extract the raw data into: D://gse19697
Create title.txt in that folder: it lists each CEL file with its outcome group (pCR or RD) in a column named title
Open R
Set the working directory by executing:

> setwd('d://gse19697')

Load the simpleaffy package by executing:

> library(simpleaffy)
Load data by:

>eset <- read.affy('title.txt')
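
The layout of title.txt is not reproduced in these slides. The following is a minimal sketch, assuming the usual whitespace-delimited sample description layout (CEL file names in the first column, a header naming the remaining columns). The CEL file names are hypothetical placeholders, and the file is written to a temporary folder rather than D://gse19697. The title column and its values pCR and RD come from the pairwise.comparison call in this case study.

```r
# Hypothetical CEL file names; replace with the files extracted from GEO.
cel_files <- c("sample_01.CEL", "sample_02.CEL", "sample_03.CEL", "sample_04.CEL")

# One row per array; the "title" column holds the outcome group.
pheno <- data.frame(title = c("pCR", "pCR", "RD", "RD"),
                    row.names = cel_files,
                    stringsAsFactors = FALSE)

# Write a tab-delimited file. The header starts with an empty field, so a
# whitespace-based reader treats the first column as row names.
out_file <- file.path(tempdir(), "title.txt")
write.table(pheno, file = out_file, sep = "\t", quote = FALSE, col.names = NA)

# Read the file back to confirm the layout.
check <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
print(check)
```

For a real dataset, every CEL file in the working directory needs one row, with the group label taken from the GEO sample annotation.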

Calculate expression by:


>eset.rma <- call.exprs(eset,'rma')

Compare two groups by:

> pc.result <- pairwise.comparison(eset.rma, "title", c("pCR", "RD"), eset)

Filter significantly changed markers between the two groups by:

> significant <- pairwise.filter(pc.result, fc=log2(1.5), tt=0.001)
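
To make the two thresholds concrete, here is a base R sketch of the same idea on simulated data: a log2 fold-change cutoff combined with a t-test p-value cutoff. It is an illustration under stated assumptions, not the simpleaffy implementation. The expression matrix is random, the planted effect in the first five genes is invented, and pairwise.filter may apply further criteria (for example a minimum expression level) that are not modelled here.

```r
set.seed(1)

n_genes <- 200
groups  <- rep(c("pCR", "RD"), each = 10)

# Simulated log2 expression matrix (genes x samples); not real data.
expr <- matrix(rnorm(n_genes * length(groups), mean = 8, sd = 0.3),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Plant a strong group difference in the first five genes.
expr[1:5, groups == "RD"] <- expr[1:5, groups == "RD"] + 3

is_pcr <- groups == "pCR"

# Log2 fold change: difference of the group means on the log2 scale.
log_fc <- rowMeans(expr[, is_pcr]) - rowMeans(expr[, !is_pcr])

# Per-gene two-sample t-test p-value.
p_val <- apply(expr, 1, function(x) t.test(x[is_pcr], x[!is_pcr])$p.value)

# Keep genes that change more than 1.5-fold AND have p < 0.001.
keep <- abs(log_fc) > log2(1.5) & p_val < 0.001
print(names(which(keep)))
```

Because the data are on the log2 scale, a cutoff of log2(1.5) corresponds to a 1.5-fold change in either direction.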

Plot significantly changed markers:

> plot(significant)

Annotate selected markers:

> significant

Heatmap:

> # fc=log2(1) is 0, so only the t-test threshold filters here
> significant <- pairwise.filter(pc.result, fc=log2(1), tt=0.001)
> pid <- rownames(significant@means)
> eset.hm <- eset.rma[pid,]
> install.packages("RColorBrewer")
> library(RColorBrewer)
> hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
> # colour the columns by outcome group
> spcol <- ifelse(eset.hm$title == "pCR", "goldenrod", "skyblue")
> heatmap(exprs(eset.hm), col = hmcol, ColSideColors = spcol)

Assignment 2

Genetics of gene expression (eQTL)
Aim: to identify potential genetic variants that cause differential expression
Report deadline: two weeks before the final examination

Genetics of gene expression

expression Quantitative Trait Locus (eQTL) mapping tries to find genomic variation (such as SNPs) that explains expression traits.
One difference between eQTL mapping and traditional QTL mapping is that traditional mapping studies focus on one or a few traits, while most eQTL studies analyze thousands of expression traits and declare thousands of QTLs.
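
For a single SNP and a single expression trait, the idea can be sketched in base R as a regression of expression on allele count. This is plain linear regression on simulated data, shown only to illustrate the concept. It is not the test that GGtools runs, and the sample size, allele frequency, and effect size below are invented.

```r
set.seed(42)

n_samples <- 90

# Simulated genotype: number of minor alleles (0, 1 or 2) per sample.
genotype <- rbinom(n_samples, size = 2, prob = 0.3)

# Simulated expression trait that rises by 1 unit per minor allele, plus noise.
expr_level <- 8 + 1 * genotype + rnorm(n_samples, sd = 0.5)

# Additive model: expression as a linear function of allele count.
fit   <- lm(expr_level ~ genotype)
coefs <- summary(fit)$coefficients

slope   <- coefs["genotype", "Estimate"]   # estimated effect per allele
p_value <- coefs["genotype", "Pr(>|t|)"]   # evidence for association

cat("effect per allele:", round(slope, 2), "\n")
cat("p-value below 1e-4:", p_value < 1e-4, "\n")
```

An eQTL study repeats a test of this kind for many SNPs and many expression traits, which is why the number of declared QTLs can run into the thousands.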

GGdata: all 90 HapMap CEU samples, 47K expression probes, 4 million SNPs

Chromosome 17

> biocLite("GGtools")
> biocLite("GGdata")
> library(GGtools)
> library(GGdata)
> c17 = getSS("GGdata", "17")
> # get("CSDA", revmap(illuminaHumanv1SYMBOL))
> t1 = gwSnpTests(genesym("CSDA") ~ male, c17, chrnum("17"))
> # t1 = gwSnpTests(probeId("GI_21359983-S") ~ male, c17, chrnum("17"))
> topSnps(t1)
> plot_EvG(genesym("CSDA"), rsid("rs7212116"), c17)
> # c_full = getSS("GGdata", as.character(1:22))

Requirements for assignment 2

Identify the genetic cause (eQTL) of the genes selected in assignment 1
Get SNPs with significant association (p < 10e-4) from each chromosome
Paste the plot image for each association
Annotate the SNPs in dbSNP
Submit one report for each group
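
Selecting SNPs by a p-value threshold can be sketched in base R as a scan over many SNPs for one expression trait. This uses simulated genotypes and ordinary linear regression as a stand-in, not the GGtools workflow. The SNP names are invented and the planted association at snp10 is hypothetical. The threshold of 1e-4 is one assumed reading of the cutoff stated in the requirements.

```r
set.seed(7)

n_samples <- 90
n_snps    <- 500

# Simulated genotype matrix (samples x SNPs) of minor-allele counts.
geno <- matrix(rbinom(n_samples * n_snps, size = 2, prob = 0.3),
               nrow = n_samples,
               dimnames = list(NULL, paste0("snp", seq_len(n_snps))))

# Simulated expression trait driven by one SNP (snp10) plus noise.
expr_level <- 8 + 1.2 * geno[, "snp10"] + rnorm(n_samples, sd = 0.5)

# Test each SNP with a linear model; skip SNPs with no genotype variation.
p_values <- apply(geno, 2, function(g) {
  if (length(unique(g)) < 2) return(NA_real_)
  summary(lm(expr_level ~ g))$coefficients["g", "Pr(>|t|)"]
})

# Assumed significance threshold; adjust to the value required.
threshold <- 1e-4
hits <- sort(p_values[!is.na(p_values) & p_values < threshold])
print(names(hits))
```

Sorting the hits puts the strongest association first, which is a convenient order for pasting plots and looking up each SNP in dbSNP.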