Public Microarray Databases


Using Internet Databases
to Collect Information on
Bladder Cancer
Riham Soliman
Research assistant
Bioinformatics group
Objectives of talk
1. Outline the importance of web-based public databases in the medical field
2. Necessity of having a biomedical research portal containing information collected from experiments on Egyptian samples
3. Outline our research goals
4. Explain our research and its benefits to the Egyptian community
Introduction
The basics
DNA double-helix
Cytoplasm
Nucleus
From: iGenetics CD-ROM (Animation Chapter 1: Genetics: An Introduction)
Molecular genetics
3 billion nucleotides

Nucleotides are molecules
constituting the DNA double helix

All our traits are encoded in
DNA

Genes are specific
sequences of nucleotides
that characterize our traits
passed on from parents
Modified from: iGenetics CD-ROM (Animation Chapter 2: DNA as
Genetic Material: The Hershey-Chase Experiment)
Complementary base pairs: C-G and A-T
Gene expression

How is DNA transformed into functional output for the survival of the cell,
and consequently of the organism?
Central dogma
DNA → RNA → protein
• Transcription: DNA → RNA
• Translation: RNA → protein
Gene expression analysis can be performed by studying
• RNA level: transcriptome
• Protein level: proteome
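The two steps of the central dogma can be sketched in a few lines of Python. This is only an illustration, assuming the input is the DNA coding strand and using the standard genetic code; the 9-base sequence and the function names are hypothetical and do not come from the talk.

```python
# Standard genetic code, indexed by codon with bases in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Transcription: return the mRNA matching a DNA coding strand (T -> U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons in order until a stop codon ('*') is reached."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence: start codon, one codon, stop codon.
mrna = transcribe("ATGGCCTAA")
protein = translate(mrna)
print(mrna, protein)
```

The transcriptome corresponds to the output of `transcribe` measured across all genes, and the proteome to the output of `translate`.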
Genetic mutations

Changes in the genetic
sequence

Required for genetic diversity
among individuals

Disease-causing mutations



Deletions
Insertions
Duplications
http://www.genome.gov//Pages/Hyperion/DIR/VIP/Glossary/Illustration/mutation.cfm
What is cancer?

Normally, cells grow and divide until the organism has
completed development

Some cells retain the ability to grow and divide long after
development has ended → carcinogenesis

Uncontrolled cell division arises

The cell keeps making more copies of itself
instead of dividing in a regulated way
Cancer-causing mutations

Tumor suppressor genes (TSG)
• Mutations might cause under-expression of TSG

Proto-oncogenes
• Mutations cause them to become over-expressed
• They become oncogenic (cancer-causing)

Carcinogenesis is a multi-step process
• A single mutation is not enough
• Accumulation of several mutations is necessary
Mutagenesis: multi-step
http://www.cancervic.org.au/about-cancer/what_is_cancer
Bioinformatics: a history

An interdisciplinary field combining medicine, biology, computer science and
mathematics.
• Serves the biological and medical community
• Based on computational power

Dates back to the 1960s
• Built on the discovery of the DNA double helix (1953)
• Built on the discovery of genes, which contain the information guiding the building of all cellular
components

Human Genome Project
• Completed in 2003
• Sequencing of the entire human genome

Today
• Challenge of amalgamating large amounts of data from biomedical research
• Genetic research
• Molecular research
Databases and information
stored within them
Why are databases necessary?

Data provided is tailored to scientist’s requirement

Offers a variety of information on genes, RNA, proteins,
diagrams, images, etc.

Databases foster collaborations between scientists




Improved research
Data sharing
Interoperability
Ease-of-access to stored data
Considers the fact that molecular scientists might not be computer proficient
Information provided on databases

Literature
• NCBI (National Center for Biotechnology Information)
• General databases: Google search → Google Scholar
• Academic databases: EBSCOhost

Sequence data

Protein
• Sequence
• 3D structure

Level of expression
• Different experimental conditions comparable to the physiological environment
• Time-course experimentation

Protein-protein and protein-DNA interactions
KEGG
Kyoto Encyclopedia of Genes and Genomes
Cytoplasm
Nucleus
Nuclear membrane
KEGG: bladder cancer
MAPK pathway from KEGG
WikiPathways
MAPK pathway on WikiPathways:
downloaded using GenMAPP
GenMAPP is an open-source bioinformatics application to
visualize metabolic pathways
BioCarta: MAPK pathway
Data extraction from NCBI

National Center for Biotechnology Information.

Run and maintained by collaborative efforts of computer
scientists, molecular biologists, biochemists, research
physicians and structural biologists.

Provides information on diseases, genes, gene sequences,
gene transcripts, proteins, protein interactions and function,
plus additional resources.
Types of services offered by NCBI

PubMed
• Literature search service of the National Library of Medicine
• Access to over 16 million citations linked to participating online journals
• Fast, efficient and easy to use

BLAST (Basic Local Alignment Search Tool)
• Most famous tool on NCBI
• Used for pair-wise sequence comparison
• Identification of novel sequences and/or determination of their properties

Entrez
• One of the most popular search engines in NCBI
• Search query can be the name of a gene or protein (if different), or the accession number of the gene, RNA or protein
• Returns a plethora of relevant information

OMIM (Online Mendelian Inheritance in Man)
• Used mostly by physicians and medical investigators interested in genetic disorders
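An Entrez search can also be issued programmatically through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not contact NCBI; the helper name `build_esearch_url` and the example query are assumptions made for illustration, not part of the talk.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Return an esearch URL for a query term in the chosen Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical query: a disease name combined with a gene symbol.
url = build_esearch_url("bladder cancer AND KRT16", db="gene")
print(url)
```

Changing `db` to `pubmed` or `protein` would direct the same query to the literature or protein records instead.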
Cancer-specific databases
caBIG

Cancer Biomedical Informatics Grid

An information network connecting the cancer research community

Provided by the National Cancer Institute (NCI) in the USA

Integrative cancer research extending from bench to bedside and
back again

Accelerates the discovery of new detection, diagnostic and treatment
techniques to improve outcomes

Shares information on clinical research, imaging, pathology and
molecular biology
caBIG services and resources

Domain workspaces constitute areas of interest to the cancer-researching
and medical community
1. Integrative cancer research (ICR) workspaces
2. Clinical trial management systems
3. In vivo imaging workspace
4. Tissue banks and pathology tools workspace
caBIG Tools
1. Bioconductor: established open-source collection of software packages for high-throughput genome analysis
2. caArray: open-source, web and programmatically accessible array data management system
3. caIMAGE: database of cancer images
4. caMATCH: system that identifies patients who are potentially eligible for clinical trials
Profiling of bladder
cancer data from public
databases
Objectives of research
1. Collecting information on genes involved in bladder cancer.
2. Assembling an interaction network for these genes.
3. Identifying biomarkers.
4. Collecting expression level data, e.g., microarray data.
5. Automatic management, processing and visualization of this data.
Figure 1.3: Age-standardised (World) incidence rates for bladder
cancer, by sex, world regions, 2002 estimates
[Bar chart: males and females by world region, listed from Egypt to Melanesia; axis: rate per 100,000 population, 0 to 40]
Source: http://info.cancerresearchuk.org/cancerstats/types/bladder/incidence/
Bladder cancer stages
From: Oxford, G. and Theodorescu, D. Review Article: The Role of Ras Superfamily Proteins in
Bladder Cancer Progression, The Journal of Urology, 170: 1987-1993, 2003.
Bladder cancer types
Squamous cell carcinoma
Carcinoma in situ
Transitional cell carcinoma
Superficial (low grade)
Metastatic transitional cell carcinoma
Invasive (high grade)
From: http://cornellurology.com/bladder/gi/types.shtml
Aetiology of bladder cancer in Egypt

Cigarette smoking (3-7 fold risk) (Samanic et al. 2006)

Aromatic amines
• Occupational hazard

Schistosomiasis (Michaud, 2007)
• Bathing in infested waters
• Working in fields
• Squamous cell carcinoma (SCC) was more common than transitional cell carcinoma (TCC) during times of high
schistosomiasis prevalence.
Genes involved in bladder cancer

To identify genes involved in bladder carcinogenesis and
progression, internet research was performed to gather
information about these genes.

Sources

Publicly available databases, e.g.
• NCBI www.ncbi.nlm.nih.gov/
• KEGG http://www.genome.jp/
• BioGRID http://www.thebiogrid.org/
• GeneOntology http://amigo.geneontology.org/
• Ensembl www.ensembl.org/

Literature search using PubMed (NCBI) and Google.
Data collection



Genes were collected using Boolean queries, e.g., “Bladder
cancer, name of gene”.
We identified 261 genes related to bladder cancer.
Data were summarized in a list containing gene information
and interacting genes:
• Gene name, NCBI accession number, URLs
• Chromosome locus
• Protein-protein interactions
• Function in normal cell
• Function in bladder cancer cell
• Diagnostic/prognostic potential or use
• Literature
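The summary list above maps naturally onto one record per gene, and the recorded interactions can then be assembled into a network. The following is a minimal sketch under stated assumptions: the class name `GeneRecord`, its field names, and the placeholder gene names are hypothetical and are not the structure actually used in the study.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """One row of the gene summary list (field names are illustrative)."""
    name: str
    accession: str = ""
    urls: list[str] = field(default_factory=list)
    chromosome_locus: str = ""
    interactions: list[str] = field(default_factory=list)
    normal_function: str = ""
    cancer_function: str = ""
    marker_potential: str = ""
    literature: list[str] = field(default_factory=list)


def build_network(records: list[GeneRecord]) -> dict[str, set[str]]:
    """Return an undirected adjacency map built from recorded interactions."""
    network: dict[str, set[str]] = {}
    for rec in records:
        network.setdefault(rec.name, set())
        for partner in rec.interactions:
            network[rec.name].add(partner)
            network.setdefault(partner, set()).add(rec.name)
    return network


# Placeholder gene names, not real annotations.
records = [
    GeneRecord(name="GENE_A", interactions=["GENE_B", "GENE_C"]),
    GeneRecord(name="GENE_B", interactions=["GENE_C"]),
]
network = build_network(records)
print(network)
```

Storing interactions as a symmetric adjacency map means a partner that has no record of its own still appears as a node in the network.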
Data annotation
Biomarker identification

The main goal in cancer research is to predict tumor behavior.
• Early diagnosis
• Prevent delayed treatment situations

We need to distinguish harmless early lesions from those that will
progress into cancer.

This depends on good tests and tools.

Current diagnosis of bladder cancer: cystoscopy.

The research community is developing good biomarkers for this
purpose.

Biomarkers are molecules that could be targeted in therapy.
Biomarkers in use
Marker | Sensitivity % | Specificity % | Method of detection | Manufacturer
NMP22 | 47-87 | 58-91 | Enzyme immunoanalysis | Matritech
BTA STAT | 57-82 | 61-82 | Antigen-antibody colorimetric | Bard Diagnostics
BTA TRAK | 55-80 | 38-98 | Enzyme immunoanalysis | Bard Diagnostics
FDP | 41-93 | 77-94 | Antigen-antibody colorimetric | Intracel Corp
Telomerase | 53-91 | 46-99 | Polymerase chain reaction | Oncor
Immunocyt | 86-95 | 79-90 | Immunofluorescence immunoassay/cytology | Diagnocure
Quanticyt | 45-59 | 70-93 | Morphometry | Gentian Scientific Software
UBC | 59-79 | 84-96 | Enzyme immunoanalysis | IDL
CYFRA 21-1 | 74-99 | 57-78 | Electrochemiluminescence assay | Roche Diagnostics
BLCA4 | 85-96 | 85-100 | Enzyme immunoanalysis | Eichrom Technologies
Hyaluronic acid/hyaluronidase | 82-92 | 83-96 | Enzyme immunoanalysis |
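Sensitivity and specificity, as reported for each marker, are computed from the counts of correct and incorrect test results. The sketch below shows the two formulas; the patient counts are invented for illustration and do not correspond to any marker listed here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of patients with the disease that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of disease-free individuals that the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical study: 50 cancer patients and 50 controls.
sens = sensitivity(true_pos=40, false_neg=10)
spec = specificity(true_neg=45, false_pos=5)
print(f"Sensitivity: {sens:.0%}, specificity: {spec:.0%}")
```

A marker with high sensitivity but low specificity misses few cancers but sends many healthy individuals on to cystoscopy, which is why both values are reported together.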
Markers inserted onto KEGG’s bladder
cancer network
GPSM2
NMP22
Hyaluronidase
Hyaluronic acid
CD44
Microarray technology
Measuring gene expression
Gene expression analysis:
Transcriptomics

Microarray technology: the study of mRNA levels in cells


Transcriptome
Looks at the abundance of the transcript for thousands of genes

High throughput
http://en.wikipedia.org/wiki/DNA_microarray
cDNA microarray: custom-made
Oligonucleotide microarray: ready-made, revolutionized by the Affymetrix
company
Affymetrix array: cancer vs. control
Differential expression: up-regulation or down-regulation
From : http://www.fastol.com/~renkwitz/microarray_chips.htm
Output of microarray

Raw image is usually a 16-bit TIFF file.

Microarray image processor converts color intensities into
raw quantitative data (probe-level data)

No immediate observations can be made concerning gene
expression from raw data

Statistical analysis applications are used to interrogate
the data for information on gene expression patterns
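One common first step in interrogating probe-level data is the log2 fold change between cancer and control samples. The sketch below is a simplified illustration: the intensities, the gene names, and the cut-off of one log2 unit are assumptions, and a real analysis would add normalization and statistical testing.

```python
import math
from statistics import mean


def log2_fold_change(cancer: list[float], control: list[float]) -> float:
    """Log2 ratio of mean cancer intensity to mean control intensity."""
    return math.log2(mean(cancer) / mean(control))


def classify(lfc: float, threshold: float = 1.0) -> str:
    """Label a gene by the direction of its expression change."""
    if lfc >= threshold:
        return "up-regulated"
    if lfc <= -threshold:
        return "down-regulated"
    return "unchanged"


# Hypothetical probe intensities: (cancer replicates, control replicates).
intensities = {
    "GENE_A": ([800, 820, 780], [200, 190, 210]),
    "GENE_B": ([100, 110, 90], [400, 420, 380]),
    "GENE_C": ([300, 310, 290], [300, 290, 310]),
}
calls = {
    gene: classify(log2_fold_change(cancer, control))
    for gene, (cancer, control) in intensities.items()
}
print(calls)
```

Working on the log2 scale makes up- and down-regulation symmetric: a four-fold increase gives +2 and a four-fold decrease gives -2.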
Raw data storage
Modes of data storage
As files
• Data is stored directly on the institution’s or lab’s computer
• Does not require special software
• Difficult to track and query the data if larger experiments are performed
In local databases
• Commercial or academic
• Allows local storage of data
• Good tracking and management of experimental data and integration with public microarray databases
• Requires purchase, installation and maintenance of complex software
Public and commercial microarray
databases
PUBLIC
• GEO (Gene Expression Omnibus), NCBI
• ArrayExpress (EBI-EMBL)
• caBIG
• SMD (Stanford Microarray Database)
• Yale Microarray Database
• RED (Rice Expression Database)
• Oncomine
COMMERCIAL
• Oncomine
• Array Informatics
• Limas
• GeNet (Russian website)
OTHER
• CleanEx (SIB)
• GenMAPP
Our bladder cancer microarray data
collection

Queried “Bladder cancer” in all the public databases
identified

Collected 14 data sets on bladder cancer from
• ArrayExpress
• GEO
• Oncomine

Based on the literature, there are also unpublished data sets
Gender
Disease state
Disease staging
Precomputational analyses

Some databases provide information from preliminary
analysis on data.

Make data exploration much easier and quicker for the
user.

Oncomine
ONCOMINE™ RESEARCH
ONCOMINE performs pre-computations on data to
make data exploration much easier and quicker
Oncomine is made up of 3 layers
• Data input
• Data analysis
• Data visualization
Single and multiple experiment analyses
Single-experiment analysis
Outlier
Largest value
Upper quartile
Median
Lower quartile
Smallest value
Outlier
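The box plot labels above correspond to a five-number summary plus outliers. The sketch below computes them with the Python standard library; the 1.5 × IQR rule for outliers is a common convention assumed here, not necessarily the rule Oncomine applies, and the expression values are made up.

```python
from statistics import median, quantiles


def box_plot_summary(values: list[float]) -> dict:
    """Five-number summary with outliers flagged by the 1.5 x IQR rule."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "smallest": inside[0],        # lowest value that is not an outlier
        "lower_quartile": q1,
        "median": median(data),
        "upper_quartile": q3,
        "largest": inside[-1],        # highest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical expression values for one gene across ten samples.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```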
Multiple-experiment analysis
SIB (Swiss Institute of Bioinformatics)

Research groups based in different European
countries.

The main goal is to provide a bioinformatics platform
that brings together and analyzes different data sets

CleanEx → microarray database
• Data is analyzed and integrated
into their portal for easier access and interpretation
CleanEx
• Provided through the Swiss Institute of Bioinformatics (SIB)
• Service similar to ONCOMINE but gathers data sets only from GEO
• Does not allow profile visualization
Collecting information on
bladder cancer in Egypt
specifically
Published article on bladder cancer in
Egypt



Ewis et al. (2007) studied bilharzia-associated SCC
(squamous cell carcinoma)
Analysis performed using microarrays
17 patients diagnosed at the Egyptian National Cancer
Institute.
RESULT
• Showed differential expression in 82 genes
• 38 genes up-regulated
• 44 genes down-regulated
Our own data analysis on Ewis et al.
data
1. Annotated information gathered on each of the 82 genes
2. Compared the expression pattern of each gene with other data sets from free public databases
3. Identified 7 genes from the Ewis study whose expression opposed all other data sets collected
4. Identified 3 genes from the Ewis study whose expression correlated with other studies from the databases
5. Gathered more detailed information on the 7 genes
• Where do they lie in our KEGG pathway network?
• How vital are they to cell function?
• Do the Ewis data make sense (based on the known function)?

Discrepancies found in results → Keratin 16
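The comparison in steps 2 to 4 amounts to checking, gene by gene, whether the direction of change in one study matches the direction reported elsewhere. The sketch below illustrates that check; the function name, the labels, and the placeholder genes are assumptions, and the directions shown are not the study's results.

```python
def compare_direction(reference: str, others: list[str]) -> str:
    """Compare one study's direction ('up' or 'down') with other data sets."""
    if not others:
        return "no comparison data"
    if all(direction == reference for direction in others):
        return "agrees with all"
    if all(direction != reference for direction in others):
        return "opposes all"
    return "mixed"


# Placeholder genes: (direction in reference study, directions elsewhere).
observations = {
    "GENE_A": ("up", ["down", "down", "down"]),
    "GENE_B": ("down", ["down", "down"]),
    "GENE_C": ("up", ["up", "down"]),
    "GENE_D": ("up", []),
}
results = {
    gene: compare_direction(reference, others)
    for gene, (reference, others) in observations.items()
}
print(results)
```

Genes labelled "opposes all" are the ones that would warrant the closer follow-up described in step 5.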
KEGG BC pathway with all significant
markers for research
Markers shown: KRT16, KRT7, TGFβ, TGFBR, ACVR1B, SMAD2/3, SMAD4, JNK
Not much data provided on the remaining proteins
WE NEED TO UNDERSTAND THEIR FUNCTION
Modified from the KEGG database
CONCLUSION
Follow up of Ewis et al. study
PROS
• Offers good preliminary information on bilharzia-associated bladder cancer in the Egyptian population
CONS
• Several mistakes detected in annotation
• Pooled samples
• Only SCC studied
• Does not explain the present discrepancies in the results, e.g. Keratin 16
FOLLOW UP STUDY IS NECESSARY TO UNDERSTAND
DISCREPANCIES AND GENETIC DIFFERENCES
BETWEEN WESTERN AND EGYPTIAN PATIENTS
Problems with data collection
1. Information in databases is expanding as more research is carried out.
2. No single public database has a complete representation of all molecules.
• Time-consuming to look through several databases.
3. There is no bladder cancer-specific database.
4. Automated methods are needed to update the data.
Long-term objectives of our study
1. Determine the genetic and molecular profile of
Egyptian bladder cancer patients
• Based on histology
• Based on the bilharzial status
2. Identify biomarkers to use as drug targets in a clinical
setting
3. Improve treatment modalities
• Tailored to the Egyptian profile
Thank you