FB - TIGP Bioinformatics Program

Advanced Bioinformatics Core (ABC)
Advanced Bioinformatics Core Facility
Chang, Chuan-Hsiung (張傳雄)
Chen, Chen-Hsin (陳珍信)
Hsu, Chun-nan (許鈞南)
Li, Kuo-Bin (李國彬)
Yang, Ueng-Cheng (楊永正)
1
Our niches
Interdisciplinary collaboration
• Genome-related information
• Information Technology (IT): information science and technology
• Comparative Bioinformatics (CB)
• Functional Bioinformatics (FB)
• Genomic Statistics (GS): statistics for genomic research
Vision-based R & D
• Data collection
• Analysis tools
• Databases
• Workflow
• Interpretation
Preventive medicine
Personalized medicine
R&D is the basis for service and collaboration
2
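One-stop handling of service requests
Flowchart labels:
• User
• One-stop service desk
• Task-force group meeting (CB, FB, GS, IT)
• Register the request
• Understand the problem
• Preliminary planning
• Explain the analysis method to the user
• Platform analysis, customized service, collaborative service
• Quality control (pass / fail)
• Integrate the analysis results
• Register; put the analysis results online
• Progress report
• User
• Completed
• Follow-up consultation
• Decision branches: yes / no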
3
Online service
http://abc.binfo.org.tw/
• Comparative bioinformatics
– Bacterial Genome Annotation System (bGAS)
– Genome Comparison Tools (including CAGO, CAMP, CICP)
• Gene variation related
– A functional analysis and selection tool for SNP in large scale association study (FastSNP)
• Alternative splicing related
– Putative Alternative Splicing database (PALSdb)
– Integrated splicing variants database (ISVdb)
• Disease candidate gene databases
– Spinocerebellar ataxia candidate gene database (SCAdb)
– STR-related disease database (STRRDdb)
– Disease associated gene database (DAGdb)
– Encyclopedia of Hepatocellular Carcinoma genes Online (EHCO)
• Gene expression related
– Bacterial gene expression database (BGEdb)
– Microarray Annotation and Profile (MAP)
– Cross-Hybridization Analysis Network of Gene Expression (CHANGE)
• Phenotype related
– Bacteria: Bacterial phenotype database (BPdb)
– Cellular level: Integrated RNAi database
– Organismal level: Genotype to Phenotype (G2P)
• Pathway related
– Pathway Knowledge Management System (PKMS)
• Utilities
– Gene Name Service (GNS)
• Consultation service
– http://consult.binfo.org.tw/
4
Cancers
• Breast cancer
• Liver cancer
• Lung cancer
Infectious disease
Highly heritable disease
The same strategy may be applied to all types of cancers
5
Value-added information and tools
New method: top-down
Diagram labels: Genome, Gene variation, Risk factor, Genotype, Pathway analysis, Literature mining, Disease
Gene variation
• Functional Analysis and Selection Tool for SNP (FastSNP) in large scale association study
Alternative splicing
• Putative Alternative Splicing (PALS) db
• Integrated splicing variant (ISV) db
Pathway analysis
• Pathway Knowledge Management System (PKMS)
Phenotypes
• Disease Associated Gene (DAG) db
• Genotype to Phenotype (G2P) db
• Integrated RNAi db
6
Two ways to collect information: web wrapper agent and text mining
FastSNP diagram labels:
• Query options: Gene Symbol (Candidate Gene Approach), SNP rsID (Single SNP, batch), Chromosome (SNP Search), Novel SNP
• Gene name service (text mining)
• Agent Starter
• Data sources: ESEfinder, RESCUE-ESE, Ensembl, dbSNP, TFSEARCH, PolyPhen, Swiss-Prot, NCBI GenBank
• Output: Function, Report, Prioritization
7
World’s most accurate automatic gene name
identification from biomedical literature
BioCreAtIvE - Critical Assessment for
Information Extraction in Biology
http://biocreative.sourceforge.net/
8
Common strategy to discover the disease mechanism
• Raw data: control and experiment (genotyping or gene expression)
• Patterns: differences between control and experiment
• Mechanisms: distinguish cause & effect, look for the major factor
• Form hypothesis
• Design therapeutic intervention
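The "differences" step between control and experiment can be illustrated with a minimal sketch. It assumes gene expression values are already available as plain numbers per gene and uses a log2 fold change as the measure of difference. The function names, the pseudocount, and the sample values are hypothetical and are not taken from ABC's tools.

```python
import math
from statistics import mean


def log2_fold_changes(control, experiment, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes present in both inputs.

    `control` and `experiment` map a gene name to a list of expression
    values. The pseudocount (an assumption of this sketch) avoids
    division by zero for genes with no signal.
    """
    changes = {}
    for gene in control.keys() & experiment.keys():
        c = mean(control[gene]) + pseudocount
        e = mean(experiment[gene]) + pseudocount
        changes[gene] = math.log2(e / c)
    return changes


def top_differences(changes, n=3):
    """Return the n genes with the largest absolute log2 fold change."""
    ranked = sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


# Hypothetical expression values, for illustration only.
control = {"geneA": [100, 110, 90], "geneB": [50, 55, 45], "geneC": [200, 210, 190]}
experiment = {"geneA": [400, 420, 380], "geneB": [52, 50, 48], "geneC": [50, 45, 55]}

for gene, change in top_differences(log2_fold_changes(control, experiment)):
    print(f"{gene}: {change:+.2f}")
```

A ranking like this only produces candidate patterns. Distinguishing cause from effect still needs the later steps on the slide.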
9
More than 400 gene expression microarrays for cervical, lung, breast, and other cancers were analyzed with ABC’s tools
• Design
– MIAME checklist
– GESDAS (Gene Expression Study Design and Analysis Suite)
• Analysis
– SMD (Stanford Microarray Database)
– GESDAS
– MAP (Microarray Annotation and Profile)
– IPIR (Integrated Protein Interaction Resource)
– CHANGE (Cross-Hybridization Analysis Network of Gene Expression)
– SpliceGear and ChangeGear
• Interpretation
– PKMS (Pathway Knowledge Management System)
– Integrated RNAi database
– DAG db (Disease Associated Gene database)
– G2P (Genotype to Phenotype)
• Six cancer-related publications in 2006
10
Microarray study design
http://gears.stat.sinica.edu.tw/MIAME/MIAME.php
11
Genomic Statistics Unit for Complex Diseases in the NRPGM Advanced Bioinformatics Core
Enhancing the web platform:
• cDNA (new): image plots
• Affymetrix (new): MM larger than PM
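On Affymetrix arrays each probe pair consists of a perfect match (PM) probe and a mismatch (MM) probe. The "MM larger than PM" check can be sketched as the fraction of probe pairs whose MM intensity exceeds the PM intensity. How the platform actually computes or displays this is not stated on the slide, so the function name and the sample intensities below are assumptions for illustration.

```python
def fraction_mm_exceeds_pm(probe_pairs):
    """Return the fraction of (pm, mm) probe pairs where mm > pm.

    `probe_pairs` is a list of (pm_intensity, mm_intensity) tuples.
    A high fraction may point to non-specific hybridization or a
    low-quality array and is worth inspecting before analysis.
    """
    if not probe_pairs:
        raise ValueError("probe_pairs must not be empty")
    flagged = sum(1 for pm, mm in probe_pairs if mm > pm)
    return flagged / len(probe_pairs)


# Hypothetical intensities, for illustration only.
pairs = [(1200, 300), (150, 400), (800, 790), (90, 95)]
print(f"MM > PM in {fraction_mm_exceeds_pm(pairs):.0%} of probe pairs")
```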
12
Expanding GESDAS to a more comprehensive platform
“Gene-Environment Analysis Refining System” (GEARS)
for general biomarkers (not open yet)
13
Integrated Protein Interaction Resource (IPIR) =>
Microarray Annotation and Profile (MAP) =>
pathway knowledge management system (PKMS)
Red: ER+
Green: ER-
Yellow: ER+ and ER-
No PPI expansion
With PPI expansion
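The difference between the two panels can be sketched as a one-step expansion of a gene list through protein-protein interactions (PPI): every direct interaction partner of a seed gene is added to the set. This is a minimal sketch under that assumption. The slide does not say how IPIR, MAP, or PKMS perform the expansion, and the gene names and interaction list below are placeholders.

```python
def expand_with_ppi(seed_genes, interactions):
    """Return the seed genes plus their direct interaction partners.

    `interactions` is an iterable of (protein_a, protein_b) pairs and
    is treated as undirected. Only first neighbours are added.
    """
    seeds = set(seed_genes)
    expanded = set(seeds)
    for a, b in interactions:
        if a in seeds:
            expanded.add(b)
        if b in seeds:
            expanded.add(a)
    return expanded


# Placeholder names, for illustration only (not taken from IPIR).
seeds = {"geneA", "geneB"}
ppi = [("geneA", "geneX"), ("geneY", "geneZ"), ("geneW", "geneB")]
print(sorted(expand_with_ppi(seeds, ppi)))
```

Expansion adds genes that were not differentially expressed themselves, which can connect otherwise isolated genes when the list is mapped onto pathways.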
14
World’s Most Accurate Protein Subcellular Localization Image
Classifier (July 2006 – Present)
Previous best result: 83%
Our preliminary result: 93%
Publications
• Y.-S. Lin et al. Boosting Multi-Class Learning with Repeating Codes. In TAAI 2006 Conference on Artificial Intelligence and Applications. December 2006.
• C.-C. Lin et al. Boosting Multiclass Learning with Repeating Codes for Protein Subcellular Localization. Submitted, 2007.
15
Cancers
Infectious diseases
Highly heritable diseases
Infectious diseases
Taiwan Pathogenic Microorganism Gene
Database (TPMGD) for CDC, Taiwan
16
Integrate sequence with
epidemiology information
Dec. 2005 – Dec. 2006
17
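The same system is open to the academic community under the name EpiNet
User identity switched successfully
Publishing announcements and managing news
Managing search and display fields
Data administrator permissions
Adding and managing database content
Data query and browsing
Personal workspace operations and sequence analysis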
18
The next generation bioinformatics tool for biomedical
scientists: Web service & workflow tool
19
Comparative bioinformatics tools
• bGAS (bacterial Genome Annotation System)
• Integrated Comparative Analysis Platform (iCAP) for Genomic Data
Vibrio vulnificus strain-specific plasmid genomes.
20
bGAS (bacterial Genome Annotation System)
21
22
23
Research method
Cancers
Infectious disease
Highly heritable diseases
Diagram labels: Genome, Disease gene, Genes, Candidate genes, Candidate region, Chromosome, Linkage analysis, Genotyping, Disease, Schizophrenia
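The step from a candidate region to candidate genes can be sketched as an interval overlap query: any gene whose coordinates overlap the region reported by linkage analysis becomes a candidate. The `Gene` record, the function name, and all coordinates below are hypothetical. A real analysis would take gene positions from an annotation source such as Ensembl.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    symbol: str
    chromosome: str
    start: int  # inclusive, base pairs
    end: int    # inclusive, base pairs


def genes_in_region(genes, chromosome, region_start, region_end):
    """Return symbols of genes that overlap the given chromosomal interval."""
    return [
        g.symbol
        for g in genes
        if g.chromosome == chromosome
        and g.start <= region_end
        and g.end >= region_start
    ]


# Hypothetical annotation and linkage region, for illustration only.
annotation = [
    Gene("gene1", "chr1", 1_000, 5_000),
    Gene("gene2", "chr1", 20_000, 30_000),
    Gene("gene3", "chr2", 2_000, 4_000),
    Gene("gene4", "chr1", 4_500, 9_000),
]
print(genes_in_region(annotation, "chr1", 4_000, 10_000))
```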
24
Example of providing integrated service: Searching for Disease-Associated Gene Variations
• Collect information (IT, FB)
• Sequencing Core
• Look for gene variation (CB)
• Integrate information & primer design (FB)
• Priority setting (FB)
• Integrate information, perform quality control (FB)
• Candidate gene variation & disease phenotype (GS)
25
Gene variation detection and gene-gene interaction
• Design
– FastSNP
– ISV db
– PALS db
– PipMaker pipeline
– Primer3
• Analysis
– PolyPhred pipeline
– GAP (Generalized Association Plots) analysis
60 primer pairs were designed
18,000 sequences were compared
103 variation sites were found
* 68 were not reported before
* 20 variation sites may be related to phenotype (more samples are needed)
26
Synergy is emerging from collaboration
• Help a single project to integrate different types of information
• Make new observations by integrating data from different users
Diagram labels: ABC, Gene expression, Gene related information, Sequencing, Genotyping, Proteomics, Phenotype related information, PET gene probe, RNAi, Mouse mutagenesis
27