Transcript slide
The Development and Applications of Knowledge Discovery
I-Jen Chiang (蔣以仁)
I Know That I Don't Know Anything.
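Knowledge
I Ching (易經)
Observe the patterns of the heavens to discern the changes of the seasons; observe the patterns of human culture to transform and perfect the world.
Three Character Classic (三字經)
Know certain numbers; recognize certain characters.
Laozi, "On Knowledge"
Those who know do not speak; those who speak do not know.
One who knows others is wise; one who knows oneself is enlightened.
Zhuangzi
To understand the principle of not knowing is to reach the highest state.
Socrates (bust)
I Know That I Don't Know Anything
Francis Bacon
Nam et ipsa scientia potestas est (knowledge itself is power)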
Why is knowledge management so urgent?
"The chief economic priority for developed countries is to raise the productivity of knowledge . . . The country that does this first will dominate the twenty-first century economically."
Peter F. Drucker
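The arrival of the knowledge economy era
Bill Gates of Microsoft (Gates, 1999), in his book on the digital nervous system (《數位神經網路》), states plainly that the enterprises of the future will be built on knowledge and networks.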
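Core ideas of the knowledge economy
The knowledge economy has the following ten core ideas (Kao Hsi-chun 高希均, 2000):
(1) Knowledge takes the lead
(2) Management drives change
(3) Change brings openness
(4) Technology leads innovation
(5) Innovation overturns tradition
(6) Speed decides success or failure
(7) Entrepreneurship turns the impossible into the possible
(8) The Internet transcends the limits of time and space
(9) Globalization creates both opportunity and risk
(10) Competitiveness determines long-term rise and decline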
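Definition of the knowledge economy
A knowledge economy is one that emphasizes the creation, dissemination, and use of knowledge. In other words, its real meaning is a complete system that encourages the creation of knowledge, spreads that knowledge effectively, and lets it be applied widely to economic development.
Morris Chang (張忠謀, 2002)
"The point of the knowledge economy is not knowledge itself but turning knowledge into profit, so 'using' technological knowledge matters more than 'owning' it."
Former US President Bill Clinton
The knowledge economy is a new mode of economic operation fueled by technology and driven by entrepreneurship and innovation.
The two most important things in the knowledge economy era are knowledge management and innovation.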
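The value of retaining knowledge
Retaining and transferring corporate knowledge: investment in knowledge assets; downsizing and retirement; staff turnover
Productivity and capability: duplicated effort; too many meetings; communication problems
Organizational goals: issuing decisions; feasibility; speed; informality
The value of knowledge management in US enterprises
[Figure: chart with the labels "managing rapid change", "retaining key competitive strategies", "old and new technologies", and "organizing and using valuable corporate data"; percentages shown: 52%, 33%, 12%, 3%]
Increase: productivity and quality; transfer of corporate knowledge; fast and effective decisions; training courses; innovation; teamwork; etc.
Reduce: cycle time; response time; duplicated investment; operating costs; meeting time; outside consultants; etc.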
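The 6C concept of knowledge management
Collect: accumulate and gather individuals' knowledge and professional skills
Clarify: identify and filter the knowledge content to be captured
Classify: make retrieval and search easier
Communicate: build a virtual communication environment
Comprehend: improve understanding between the organization and individuals
Create (innovate/share): create new knowledge and raise the organization's overall capability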
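Knowledge framework
Data (text) mining
Knowledge hierarchy
Common sense
Knowledge
Information
Data
Messages
Resource distribution
The difference between the levels
Scenario: travelling from Taipei to the airport to fly to Kaohsiung for a meeting.
Data: 100, 12, 34, 15.
Information: A typhoon is over the sea 100 km southeast of Taipei, moving toward the northwest (34 degrees) at 12 km/h, with maximum gusts of force 15.
Knowledge: The weather will cause delays or force you to cancel your attendance.
Common sense: You may need to rebook onto the next train to Kaohsiung to make the meeting.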
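Knowledge framework
Drivers: leadership; culture and structure
Network: connections and links
Virtual: meetings; communication
Functions (integrated): shared resources; applications
Information repository (virtual archive)
Brainstorming, problem solving, project control
Requirements analysis, design, testing, analysis, etc.
Evidence-based and case-based inference
Case-based reasoning
Evidence-based reasoning
Medical case diagnosis
Case studies
Evidence-based and case-based judgment
Literature
Supporting evidence
Similar cases
Concept queries
Experience analysis
Individual case analysis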
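Application areas of data mining
Retail: mine sales data for customers' buying habits, use transaction records to find the product combinations customers prefer, and identify the traits of lost customers and the right time to launch new products.
Direct marketing: its emphasis on segmentation and database marketing becomes more powerful once mining techniques are introduced. For example, data mining can analyze customer groups' purchasing behavior and transaction records, combine them with basic profile data, and segment customers by brand-value tier to achieve differentiated marketing.
Manufacturing: data mining is mostly applied to quality control, finding the factors in the manufacturing process that most affect product quality in order to improve process efficiency.
Finance: data mining is used to analyze market trends and to forecast an individual company's operations or the direction of stock prices and interest rates.
Healthcare: used to predict the effectiveness of surgery, medication, diagnosis, or process control.
Fraud detection: telephone companies, credit card companies, insurers, stock brokers, and government agencies all suffer considerable losses from fraud every year. Data mining can find shared characteristics in the records of customers with poor credit and predict likely fraudulent transactions, reducing losses.
Other: finding players' strengths and weaknesses in NBA game data; classifying celestial objects; searching spacecraft imagery for volcanoes on planets.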
Data-to-knowledge formation process
[Figure: knowledge discovery process. Raw Data → Integration → Data Warehouse → Selection/cleansing → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Pattern → Interpretation/Evaluation → Knowledge (Understanding)]
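The stages named in the process figure can be sketched as a chain of small functions. This is a minimal illustration, not the author's system: the shopping-basket records, the function names, and the `min_support` threshold are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical raw records, including one dirty record with no item list.
raw_records = [
    {"customer": "c1", "items": ["Milk", "bread "]},
    {"customer": "c2", "items": ["milk", "Bread", "eggs"]},
    {"customer": "c3", "items": None},
    {"customer": "c4", "items": ["milk", "bread"]},
]


def select_and_clean(records):
    """Selection/cleansing: keep only records that have an item list."""
    return [r for r in records if r.get("items")]


def preprocess(records):
    """Preprocessing: normalise item names (trim whitespace, lowercase)."""
    return [[item.strip().lower() for item in r["items"]] for r in records]


def transform(baskets):
    """Transformation: turn each basket into a set of ordered item pairs."""
    return [
        {(a, b) for a in set(basket) for b in set(basket) if a < b}
        for basket in baskets
    ]


def mine(pair_sets, min_support=2):
    """Data mining: count co-occurring pairs and keep the frequent ones."""
    counts = Counter(pair for pairs in pair_sets for pair in pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


patterns = mine(transform(preprocess(select_and_clean(raw_records))))
print(patterns)
```

Interpretation and evaluation, the last stage in the figure, is left to the reader here: someone still has to judge whether a frequent pair is useful knowledge.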
BI architecture
[Figure: three-tier BI architecture. Information Sources (operational DBs, semistructured sources) → extract, transform, load, refresh, etc. → Data Warehouse Server (Tier 1: data warehouse, data marts) → serve → OLAP Servers (Tier 2: e.g., MOLAP, ROLAP) → serve → Clients (Tier 3: OLAP, Query/Reporting, Data Mining)]
Gaining market intelligence from news feeds
Sreekumar Sukumaran and Ashish Sureka
Integrated BI Systems
[Figure: structured data (DBMS, file system, XML) passes through ETL; unstructured data (EA, legacy systems, CMS, scanned documents, email) passes through a text tagger and annotator and then, as intermediate data (XML), through ETL; both feed a complete data warehouse (RDBMS)]
Sreekumar Sukumaran and Ashish Sureka
Sources and value of knowledge
[Figure: chart on a 0 to 100 scale comparing data volume and market value for unstructured data (web messages, news reports, patents, email, documents, ...) and structured data]
"On average, professional users spend 11 hours per week looking for information. Seventy-one percent said they could not find what they were looking for."
"Information Management Software," Lazard Freres & Co. LLC, February 2001
"The volume of digitized information will double every year from 2000 to 2005 (an increase to 30 times today's volume)."
"Knowledge Management vs. Information Management," Gartner Group, September 2000
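The literature problem
Publication statistics: 8 TB (books), 25 TB (newspapers), 20 TB (magazines), 2 TB (journals)
Scientific knowledge grows by an average of 2,000 pages per minute
Reading the new material would take 5 years (at 24 hrs/day)
How Can I Keep Up With the Literature?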
Find the Evidence
Problems using MEDLINE:
No articles retrieved
"Answers" definitively answered years ago
Manifestations of renal TB
Viral/bacterial bronchitis: duration of symptoms
Legionella: prevalence of relative bradycardia
Acute allergic episodes: ? thrombocytopenia
MEDLINE indexed using a system obtuse to most clinicians
Too many articles retrieved
Evolution
"To study history one must know in advance that one is attempting something fundamentally impossible, yet necessary and highly important."
Father Jacobus (Hesse's Magister Ludi)
Das Glasperlenspiel (The Glass Bead Game)
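Explosive growth of information
In 2006 the volume of digital information reached 161 billion GB (equivalent to 161 exabytes).
IDC estimates that information will grow about sixfold between 2006 and 2010.
By 2010, nearly 70% of the information in the digital universe will be created by individual users, while organizations will be responsible for the security, privacy, reliability, and regulatory compliance of at least 85% of it.
The Expanding Digital Universe,
http://www.emc.com/leadership/digitaluniverse/expanding-digital-universe.htm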
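[Figure: the data volume versus market value chart again, on a 0 to 100 scale, for unstructured data (web messages, news reports, patents, email, documents, ...) and structured data, with an "Oracle" label]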
Web search engines
Web pages are fetched offline and stored in an internal data structure called an (inverted) index.
Retrieval is performed online.
Monika Henzinger, Search Technologies for the Internet, Science, Vol. 317, no. 5837, 468-471, 27 July 2007
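The offline/online split described for search engines can be illustrated with a toy inverted index. This is only a sketch under simple assumptions (whitespace tokenization, exact-match AND queries); the pages and the function names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Offline step: map each term to the sorted ids of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Online step: return ids of pages that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical crawled pages, for illustration only.
pages = {
    1: "jaguar is a large cat",
    2: "jaguar is a car maker",
    3: "the cat sat on the mat",
}
index = build_inverted_index(pages)
print(search(index, "jaguar cat"))
```

The query "jaguar" alone matches both the animal page and the car page, which is the ambiguity that ranking and clustering try to address.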
Search Engine Problems
Index Comprehensiveness
Relevance
Deterministic Search
Search Query
Jaguar (Animal)
Jaguar (Automobile)
Problem: Scalability
J. Beall, The Weaknesses of Full-Text Searching. The Journal of Academic Librarianship, 34(5):438-444, 2008.
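Evolution of search engines
First generation (1995-1997: AV, Excite, Lycos, etc.): uses only on-page text data
Term frequency, language
Second generation (from 1998; made popular by Google but everyone now): uses off-page, web-specific data
Link analysis
Click-through data (What results people click on)
Anchor text (Hyperlinks, How people refer to this page)
Third generation (still experimental): answers "the need behind the query"
Semantic analysis -- what is this about?
Focus on what the user needs, not just the query
Inferring key information
Assisting the user
Integrating search and document analysis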
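Link analysis, the second-generation signal listed above, can be illustrated with a PageRank-style power iteration. The slides do not name a specific algorithm, so PageRank is swapped in here as one well-known example; the link graph, the damping factor of 0.85, and the iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (random jump).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three pages point at "c", and "c" points at "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(links)
print(max(rank, key=rank.get))
```

Page "c" ranks highest because most pages link to it, and "a" outranks "b" and "d" because it is endorsed by the highly ranked "c", which is the off-page evidence that on-page term frequency cannot supply.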
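Web search problems
Problems
Queries are too short and imprecise
Synonyms and similar terms make query matching hard to predict
Page authors arrange content in misleading ways, so search results are unsatisfactory
Users need extra features, such as filters
Solutions
Improve understanding
Rank the results
Example: "Trailblazer" can mean a car or a basketball team
Monika Henzinger, Search Technologies for the Internet, Science, Vol. 317, no. 5837, 468-471, 27 July 2007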
Expand
Clustered retrieval
1. Walter Warnick, Problems of Searching in Web Databases. Science, Vol. 316, no. 5829, 1284, June 2007.
2. I-Jen Chiang, Discover the Semantic Topology in High-Dimensional Data, Expert Systems with Applications, 33 (1), September, 2007.
Gartner 2005 Hype Cycle for
Emerging Technologies
http://www.gartner.com/resources/130100/130115/gartners_hype_c.pdf
Gartner 2006 Hype Cycle for
Emerging Technologies
Mashup can quickly meet tactical needs with reduced
development costs and improved user satisfaction.
Applications Architecture
Enables new ways of performing vertical applications that
will result in significantly increased revenue or cost savings
for an enterprise.
Enables new ways of doing business across industries
that will result in major shifts in industry dynamics
Real World Web
http://www.gartner.com/it/page.jsp?id=495475
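Knowledge generation
[Figure: text-processing pipeline. Raw text → Stemming & stop words → Tokenized text → Term weighting → term-document matrix. Other labels in the figure: term similarity, document similarity, clustering, sentence selection, summarization, vector centroid, classification, META-DATA/ANNOTATION]
Term-document matrix (rows are documents d1 … dm, columns are terms t1 … tn, and wij is the weight of term tj in document di):
      t1    t2    …   tn
d1    w11   w12   …   w1n
d2    w21   w22   …   w2n
…     …     …     …   …
dm    wm1   wm2   …   wmn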
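The term-document matrix and the similarity and centroid steps in the knowledge generation figure can be sketched in a few lines. The slides do not say which weighting scheme is used, so TF-IDF is an assumption here, as are the tiny stop-word list, the omission of stemming, and the three sample documents.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is", "in"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return (terms, rows); rows[i][j] is the weight w_ij of term j in doc i."""
    tokenized = [tokenize(d) for d in docs]
    terms = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] * math.log(n_docs / doc_freq[t]) for t in terms])
    return terms, rows


def cosine(u, v):
    """Cosine similarity between two weight vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(rows):
    """Mean vector of a group of documents, usable as a class prototype."""
    return [sum(col) / len(rows) for col in zip(*rows)]


docs = [
    "the sun and the beach",
    "sun surf beach",
    "the stock market is in decline",
]
terms, W = tfidf_matrix(docs)
print(cosine(W[0], W[1]), cosine(W[0], W[2]))
```

Document similarity drives clustering, and comparing a new document with each class centroid is one simple way to classify it.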
Text ETL to Mining
Mining target: individual text
Mining unit:
>texts
>category labeled items extracted from text using NLP
IBM TAKMI (Nasukawa, Nagano, 1999)
Original Data
Structured Data
Call Taker: James
Date: Aug. 30, 2002
Duration: 10 min.
CustomerID: ADC00123
Unstructured Data
Q: cust sys has stopped working.
A: checked cust bios and it need updated. …
Meta Data (Category: Item)
[Call Taker] James
[Date] 2002/08/30
[Duration] 10 min.
[CustomerID] ADC00123
[Noun] Customer
[Software] BIOS
[Subj...Verb] customer system..stop
[SW..Problem] BIOS..need
[Figure: Linguistic Analysis (Tagging, Dependency Analysis, Named Entity Extraction, Intention Analysis), supported by a Category Dictionary and a Synonym Dictionary, turns the original data into category-labeled items for Mining and for Visualization & Interactive Mining]
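Text is Tough
Text expresses abstract concepts that are extremely hard to represent (AI-complete)
It is an endless combination of abstract and complex relationships among many concepts
One term can stand for many different concepts
CELL, IV
Similar concepts can be expressed in many different ways (aliases)
space ship, flying saucer, UFO, figment of imagination
Concepts are hard to visualize
High dimensionality
The analysis may involve hundreds or thousands of dimensions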
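Text Mining is Easy
Text is highly redundant
A few simple algorithms can give good results even from quite crude processing
Finding important phrases
Finding meaningfully related words
Building summaries from articles
Main problems:
Evaluating the results
Goals and objectives must be defined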
Luhn's ideas (1958)
It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance. It is further proposed that the relative position within a sentence of words having given values of significance furnish a useful measurement for determining the significance of sentences. The significance factor of a sentence will therefore be based on a combination of these two measurements.
Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2, 159-165.
van Rijsbergen 79
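Luhn's two measurements, word frequency and the position of significant words within a sentence, can be combined in a short extractive summarizer. This is a simplified sketch rather than Luhn's exact procedure: it scores the single span from the first to the last significant word in each sentence, and the stop-word list, the `min_count` cutoff, and the sample text are assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on"}


def significant_words(text, min_count=2):
    """Non-stop words occurring at least min_count times in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}


def sentence_score(sentence, significant):
    """Simplified significance factor: (significant words)^2 / span length,
    where the span runs from the first to the last significant word."""
    words = re.findall(r"[a-z]+", sentence.lower())
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    span = positions[-1] - positions[0] + 1
    return len(positions) ** 2 / span


def summarize(text, n_sentences=1):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    significant = significant_words(text)
    ranked = sorted(sentences, key=lambda s: sentence_score(s, significant), reverse=True)
    top = set(ranked[:n_sentences])
    return [s for s in sentences if s in top]


text = (
    "Text mining finds patterns in text. "
    "The weather was pleasant on Monday. "
    "Mining text at scale needs simple algorithms."
)
print(summarize(text))
```

Sentences whose significant words sit close together score higher than sentences where the same words are spread out, which is the positional idea in the quoted passage.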
Information extraction
foodscience.com-Job2
JobTitle: Ice Cream Guru
Employer: foodscience.com
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001
Source: www.foodscience.com/jobs_midwest.html
OtherCompanyJobs: foodscience.com-Job1
Internet
Collaborative Environment
[Figure: sources (library catalogs, locally held data, public repositories, commercial data sources, agency data sources, dynamic content, custom content) are reached through spiders and per-source search engines, combined by a metasearch tool with automated categorization, and delivered through a taxonomy-driven web portal with security control that offers personalized access, virtual reference, email alerts, online collaboration, data/text mining, and visualization]
Text Analysis Spectrum
From "What is this document about?" to "Who did what to whom, when, where, etc."
Techniques shown: Clustering, Classification, Concept Identification, Entity Extraction, Targeted Facts and Events
Why is getting dimensional data so hard?
Hank bought plastic explosives from Henry in Tucson yesterday.
Named Entity Extraction
[Figure: an NER engine, with FrameNet, maps the sentence onto dimensions. Dimension labels: People, Weapons, Vehicles, Dates. Extracted values: Hank, Henry, plastic explosives, 11/01/07, Tucson]
Name Extraction via HMMs
[Figure: training sentences and answers go to a Training Program that produces NE Models; text, optionally produced by Speech Recognition, goes to an Extractor that uses the models to output Entities (Locations, Persons, Organizations)]
Example sentence: The delegation, which included the commander of the U.N. troops in Bosnia, Lt. Gen. Sir Michael Rose, went to the Serb stronghold of Pale, near Sarajevo, for talks with Bosnian Serb leader Radovan Karadzic.
An easy but successful HMM application:
•Prior to 1997 - no learning approach competitive with hand-built rule systems
•Since 1997 - Statistical approaches (BBN (Bikel et al. 1997), NYU, MITRE, CMU/JustSystems) achieve state-of-the-art performance
NER
42
Annotation and Tagging
On November 16, 2005, IBM announced it had acquired Collation, a privately held company based in Redwood City, California for undisclosed amount.
Annotation labels: Date, Acquiring Organization, Acquisition Event, Acquired Organization, Place, Amount
Text Annotator: output to RDBMS
Date: Nov. 16
Organization: IBM
Place: Redwood City, CA
Amount: Undisclosed
XML output
On <Date>November 16, 2005</Date>, <ACQUIRING ORG>IBM</ACQUIRING ORG> announced it had <ACQUISITION EVENT>acquired</ACQUISITION EVENT> <ACQUIRED ORG>Collation</ACQUIRED ORG>, a privately held company based in <PLACE>Redwood City, California</PLACE> for <AMOUNT>undisclosed</AMOUNT> amount.
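The two outputs of the text annotator, inline tags and a flat row for an RDBMS, can be imitated with a dictionary lookup. This is a sketch only: a real annotator would rely on trained models or richer rules, the `GAZETTEER` entries are hypothetical, and the tag names use underscores instead of spaces so that they are legal XML element names.

```python
import re

# Hypothetical gazetteer: surface string -> tag name (illustration only).
GAZETTEER = {
    "November 16, 2005": "DATE",
    "IBM": "ACQUIRING_ORG",
    "acquired": "ACQUISITION_EVENT",
    "Collation": "ACQUIRED_ORG",
    "Redwood City, California": "PLACE",
    "undisclosed": "AMOUNT",
}


def annotate(text, gazetteer):
    """Return (tagged_text, record): inline tags plus a flat row for a database.

    The input is assumed to contain no XML special characters.
    """
    # Try longer entries first so that a long phrase wins over a substring.
    keys = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    record = {}

    def tag(match):
        name = gazetteer[match.group(0)]
        record.setdefault(name, match.group(0))
        return f"<{name}>{match.group(0)}</{name}>"

    return pattern.sub(tag, text), record


sentence = (
    "On November 16, 2005, IBM announced it had acquired Collation, "
    "a privately held company based in Redwood City, California "
    "for undisclosed amount."
)
tagged, record = annotate(sentence, GAZETTEER)
print(tagged)
print(record)
```

The `record` dictionary plays the role of the table row on the slide, while `tagged` corresponds to the XML output.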
What the medical literature tells me
Source of medical literature: Medline
Causal associations between diseases, symptoms, and drugs or compounds can be discovered
1. Swanson DR. Searching natural language text by computer. Machine indexing and text searching offer an approach to the basic problems of library automation. Science. 132:1099–1104, 21 Oct. 1960.
2. Swanson DR. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect Biol Med. 30(1):7–18, 1986.
3. Swanson, D.R., Complementary structures in disjoint science literatures. In A. Bookstein, et al (Eds.), SIGIR91: Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, Chicago, Oct 13-16, 280-289, 1991.
Migraine?
Stress is associated with migraines
Stress can lead to loss of magnesium
Calcium channel blockers prevent some migraines
Magnesium is a natural calcium channel blocker
Spreading cortical depression (SCD) is implicated in some migraines
High levels of magnesium inhibit SCD
Migraine patients have high platelet aggregability
Magnesium can suppress platelet aggregability
Smalheiser, N.R. & Swanson, D.R. Assessing a gap in the biomedical literature: Magnesium deficiency and neurologic disease. Neuroscience Research Communications, 15, 1-9, 1994.
Evidence from the literature
[Figure: two disjoint literatures. All migraine research (migraine) and all nutrition research (magnesium) are connected through the intermediate concepts CCB, PA, SCD, and stress]
Finding new leads
Raynaud's phenomenon
Hypothesis generation
Raynaud's and fish oils
Intermediate concepts: vasoconstriction, platelet aggregation, blood viscosity
Swanson, D.R. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect Biol Med. Autumn;30(1):7-18, 1986.
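Swanson's approach links two literatures that never cite each other through shared intermediate concepts. The sketch below applies that idea to the migraine and magnesium statements listed earlier; the concept pairs are hand-coded from those statements, the "vitamin c" pair is a made-up distractor, and the function names are hypothetical.

```python
from collections import defaultdict

# Concept pairs hand-coded from the migraine/magnesium statements.
migraine_literature = [
    ("migraine", "stress"),
    ("migraine", "calcium channel blockers"),
    ("migraine", "spreading cortical depression"),
    ("migraine", "platelet aggregability"),
]
nutrition_literature = [
    ("magnesium", "stress"),
    ("magnesium", "calcium channel blockers"),
    ("magnesium", "spreading cortical depression"),
    ("magnesium", "platelet aggregability"),
    ("vitamin c", "scurvy"),  # made-up distractor with no shared concept
]


def neighbours(pairs):
    """Undirected co-occurrence graph built from concept pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def abc_hypotheses(lit_a, lit_c, start):
    """Rank concepts from lit_c by how many intermediate concepts they
    share with `start`, skipping concepts already linked directly to it."""
    graph_a, graph_c = neighbours(lit_a), neighbours(lit_c)
    direct = graph_a.get(start, set())
    scores = {}
    for b in direct:
        for c in graph_c.get(b, ()):
            if c != start and c not in direct:
                scores.setdefault(c, set()).add(b)
    return sorted(scores.items(), key=lambda kv: (-len(kv[1]), kv[0]))


hypotheses = abc_hypotheses(migraine_literature, nutrition_literature, "migraine")
print(hypotheses[0][0], sorted(hypotheses[0][1]))
```

Magnesium surfaces as the top candidate because it shares four intermediate concepts with migraine even though no statement connects the two directly.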
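Literature processing
[Figure: clinical question answering pipeline. MEDLINE citations are annotated with MetaMap and NER using UMLS and an EBP domain model, giving annotated citations. A question passes through PICO query formulation to produce query terms and a question frame; document retrieval uses E-Utilities and Essie. Knowledge extraction and semantic processing, with clinical task classification and strength of evidence classification, produce document frames; semantic matching and answer generation return the answer]
Dina Demner-Fushman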
Semantic processing example
Semantic processor: Problem Extractor, Population Extractor, Intervention Extractor, Outcome Extractor, Task Classifier, Strength of Evidence Classifier
Input abstract:
Amiodarone versus diltiazem for rate control in critically ill patients with atrial tachyarrhythmias.
… Patients with atrial fibrillation (n = 57), … were randomly assigned to one of three intravenous treatment regimens. Group 1 received diltiazem … group 2 received amiodarone ….
Sufficient rate control can be achieved in critically ill patients with atrial tachyarrhythmias using either diltiazem or amiodarone …
Output:
Task: Therapy
Strength of Evidence: A (RCT)
Dina Demner-Fushman
Outcome extractor
Related extractors: Problem Extractor, Population Extractor, Intervention Extractor
Base classifiers: Cue-terms, Heuristic, N-gram, Naïve Bayes, Position, Length
Metaclassifier: Multiple Linear Regression
Score: 0.99
Sufficient rate control can be achieved in critically ill patients with atrial tachyarrhythmias using either diltiazem or amiodarone.
Score: 0.75
Although diltiazem allowed for significantly better 24-hr heart rate control, this effect was offset by a significantly higher incidence of hypotension requiring discontinuation of the drug.
Training: 275 manually annotated abstracts
Dina Demner-Fushman
Concept clustering
[Figure: documents in a document collection grouped into clusters C1 to C5 according to the frequent term sets they contain]
Frequent term sets: {sun}, {beach}, {surf}, {fun}, {sun, beach}
Clustering: {C1, C2, C4, C5}.
Clustering Description: {surf, sun, beach, fun}.
Anopheles
Mooter
Scientific American, Chinese edition (科學人雜誌), March issue
Document clustering
1. Walter Warnick, Problems of Searching in Web Databases. Science, Vol. 316, no. 5829, 1284, June 2007.
2. I-Jen Chiang, Discover the Semantic Topology in High-Dimensional Data, Expert Systems with Applications, 33 (1), September, 2007.
Extracting Information From Text
[Figure: Text, Minimal recursion semantics representations, Ontology, Database]
[Deep Thought EU project]
Structuring knowledge from text
tagging, compounds, grammatical analysis, ontological interpretation, regular expressions, pattern recognition
Patterns Construction
Taipei
Tokyo
Repository
Tagging &
annotation
CDW
Patterns
New York
Knowledge Repository
Or structured data
57
Knowledge Construction
Manual labor
Ontology
Domain
doc.
coll.
Statistical &
linguistic
analyses
[Brasethvik & Gulla, DKE, 38/1, 2001]
Want to extract prominent
concepts/relations from text
tagging, compounds, NP recognition, term
frequencies,
stopwords, language identification
58
Patterns
Explorer
Web Browser
Installed from http://...
Hard disk
Windows XP
crashes
is a
Desktop computer
Hard disk size 40 GB
Operating System
Products
Laptop
computers
Linux
Macintosh
59
Evolution
Local data
FTP
Gopher
HTML
More structure
Indexing
Search
Relevance Ranking
Latent Semantic Topology
Crawling
WebSQL
Social
Network
of
Hyperlinks
WebL
XML
Clustering
Collaborative
Filtering
ScatterGather
Topic Directories
Semi-supervised
Automatic
Learning
Classification
Web
Communities
Topic
Distillation
Focused
Crawling
Web Servers
Monitor
Mine
Modify
User
Profiling
Web Browsers
Metadata for people, events, time, places, and objects
Properties
refer to / refine
Conceptual Objects
People
Physical Entities
participate in
affect or / refer to
location
Temporal Entities
Time
at
Place
Resource index
Ontology expansion
CIDOC CRM or DC
People
Background knowledge / Authorities
Events
Objects
Thesauri
extent
CRM entities
Derived knowledge data (e.g. RDF)
Sources and metadata (XML/RDF)
Explicit Events, Object Identity,
Symmetry
E39 Actor
E52 Time-Span
E53 Place
7012124
February 1945
P82 at some time
within
E7 Activity
E39 Actor
“Crimea Conference”
E38 Image
P86 falls within
E65 Creation Event
E39 Actor
P81 ongoing
throughout
*
E31 Document
“Yalta Agreement”
E52 Time-Span
11-2-1945
63
Rules Extraction
The formal concept C4 yields the following rules:
R1: $t_3 \rightarrow t_1 \wedge t_6$
R2: $t_5 \rightarrow t_1 \wedge t_6$
R3: $t_3 \leftrightarrow t_5$
Interpretation of R1 and R2: the use of term t3 or t5 is always associated with the use of terms t1 and t6.
Rule R3 expresses the mutual equivalence of the terms {t3, t5}: all documents that contain the term t3 also contain the term t5, and vice versa.
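Rules of this kind can be checked mechanically against a document-term context: an implication holds if every document containing the premise terms also contains the conclusion terms. The context below is hypothetical, chosen so that R1 to R3 hold, and the helper `holds` is an illustrative name rather than part of any formal concept analysis library.

```python
def holds(context, premise, conclusion):
    """True if every document containing all premise terms
    also contains all conclusion terms."""
    premise, conclusion = set(premise), set(conclusion)
    return all(
        conclusion <= terms
        for terms in context.values()
        if premise <= terms
    )


# Hypothetical document-term context (document id -> set of terms).
context = {
    "d1": {"t1", "t3", "t5", "t6"},
    "d2": {"t1", "t2", "t6"},
    "d3": {"t1", "t3", "t4", "t5", "t6"},
    "d4": {"t2", "t4"},
}

r1 = holds(context, {"t3"}, {"t1", "t6"})
r2 = holds(context, {"t5"}, {"t1", "t6"})
# R3 is an equivalence, so it must hold in both directions.
r3 = holds(context, {"t3"}, {"t5"}) and holds(context, {"t5"}, {"t3"})
print(r1, r2, r3)
```

The converse of R1 fails in this context, since d2 contains t1 and t6 without t3, which shows why the rules are directional.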
Integrating medical records
[Figure: overlapping, de-identified Royal Marsden NHS Trust documents for one patient with carcinoma of the breast, dated between 1993 and 1998: patient case notes from the General Surgical clinic (15 Dec 1993), the Chemotherapy Clinic (24 Jan 1997), and the Follow Up Staging Clinic (27 Aug 1998), and a diagnostic radiology CT report (liver, thorax, abdomen, pelvis) with the conclusion "No CT evidence of disease progression"]
What was done… What happened… And why
[Figure: chronicle graph of one patient's history. Nodes: Human:1382, Pain:5735, Ulcer:1945, Breast:1492, Clinic:4096, Clinic:1024, Clinic:2010, Biopsy:1066, Radio:1812, Chemo:6502, Mass:1666, Cancer:1914. Edge labels: locus, attends, reason, finding, plans, target, treats, time]
Draft Schema for Chronicle
[Figure: schema diagram. Entity types: INVESTIGATION, LOCUS, PATIENT, INTERVENTION, REGIME, DRUG, TIME, CONSULT, PROBLEM, LOCATION, PATHOLOGY. Attributes: Name, Status, Laterality, Other Feature, Goal, Age, Sex, Race, Occupation, Doctor, Dose, Form, Route, Type, Size, Clinical Course, Diagnostic Status, Family History, Evidence for, Presence / absence. Relations: compare, target, partOf, subpart, after, about, locatedAt, causes]
Chinese NER: Example 2
黑色當道 少了尖叫 女星太規矩 城城活跳跳 金馬獎星光大道不若前晚金鐘獎
「峰芒」畢露,女星們規矩平穩的服裝,讓星光大道上少了一些特色,並未出現
讓人眼睛一亮的驚喜。其中,在金鐘獎上讓人血脈僨張的蕭淑慎,在金馬獎上可
以看出服裝「規矩」了些。總體來說,今年的星光大道造型略顯平庸。 秋冬主
流黑色更在金馬星光大道上大量出現,凱渥模特兒公司老闆、也是專業資深時尚
人洪偉明說:「可以發現他們選擇合適的服裝,規矩、正式的選擇,可避免遭受
批評,今年確實少了些特色,但重要的國際場合,平穩的黑色服裝,也是出席正
式場合的安全造型。」 洪偉明表示:「楊千嬅的服裝和她的人很搭,黑色蕾絲
讓她不至於顏色過重,正式中又帶點活潑,感覺很棒。」台中市長胡志強女兒胡
婷婷桃紅色的緞面禮服,也讓洪偉明很欣賞,他說:「整體感覺落落大方,亮色
服裝和她的人也很適合,她的自信和星光大道主持人蔣怡的乾淨大方一樣,讓人
感覺舒服,也是不錯的造型。」 舒淇鵝黃色的禮服,洪偉明笑說:「羅曼蒂克
的感覺和她的笑容很搭配,讓氣色宛如戀愛中的女人一樣美好。」梁詠琪的黑色
短禮服,雖然露出她的修長美腿,但洪偉明也建議:「她至少可以搭雙絲襪,整
體感覺會更好。她在演唱會上展現性感,其實星光大道上也可以大膽改變。」
至於男星們的服裝,今年則是絲絨的天下,洪偉明笑說:「男星們服裝不易做出
變化,敢大膽嘗試不同造型的人也不多,其中郭富城神采奕奕的精神,十分突出,
張震的服裝則顯得穩重而規矩。」
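Proper nouns
Term            Part of speech      Count
舒淇            [Nb] proper name    2
張震            [Nb] proper name    1
高達            [Nb] proper name    1
賴雅妍          [Nb] proper name    1
白              [Nb] proper name    1
米蘭            [Nb] proper name    1
竹幼婷戴榮賢    [Nb] proper name    1
林熙蕾          [Nb] proper name    2
郭富城          [Nb] proper name    1
楊貴媚          [Nb] proper name    1
范文芳          [Nb] proper name    1
林志玲          [Nb] proper name    1
金馬獎          [Nb] proper name    3
楊采妮          [Nb] proper name    1
舒淇鵝          [Nb] proper name    1
藍正龍          [Nb] proper name    1
金城武          [Nb] proper name    2
侯佩岑          [Nb] proper name    3
蕭淑慎          [Nb] proper name    4
梁詠琪          [Nb] proper name    2
黃志瑋          [Nb] proper name    1
黃子佼          [Nb] proper name    1
天心            [Nb] proper name    1
楊千嬅          [Nb] proper name    1
洪偉明          [Nb] proper name    2
胡婷婷          [Nb] proper name    2
師李            [Nb] proper name    1
戴起            [Nb] proper name    1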
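Place words
Term    Part of speech     Count
背後    [Nc] place word    1
中途    [Nc] place word    1
世界    [Nc] place word    1
天下    [Nc] place word    1
原地    [Nc] place word    1
Time
Term    Part of speech    Count
昨天    [Nd] time word    4
新春    [Nd] time word    1
昨晚    [Nd] time word    1
早春    [Nd] time word    1
前晚    [Nd] time word    2
先後    [Nd] time word    1
今年    [Nd] time word    6
週末    [Nd] time word    1
Person-name class
Term                              Part of speech       Count
露美腿                            [LN] person name     2
台中市長胡志強女兒胡婷婷桃紅色    [LN] person name     1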
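Classified content-label queries
Community asset context
Knowledge community
Effects of electric power
"Load" network
Community asset context
"Load" data
"Load" and "power plant"
"Cogeneration," "load," and "power plant"
Time-series information summary
Event analysis
Technical analysis chart of oil prices
Experts and decisions
Literature
Knowledge groups
Knowledge presentation
Real-time clustering
Real-time Index
Metadata of Searching Results
Official-document data
Post-disaster reconstruction fund
Causal diagram: children who lost their caregivers
Status of these children in each county and city
Intervention by county and city governments, social affairs bureaus, etc.
Establishment of county and city welfare and trust funds
Subsidies for low- and middle-income households
Post-disaster reconstruction subsidies for single-parent families and the related use of funds
Home reconstruction project
Loans
Temporary Statute for Earthquake Disaster Reconstruction
Home reconstruction project
Affected households
Financial institutions
Interest
Housing
Damage
Disaster-affected households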
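Rule Generation
Conditional probabilities given A or ¬A:
        B                      ¬B
A       $P(B \mid A)$          $P(\neg B \mid A)$
¬A      $P(B \mid \neg A)$     $P(\neg B \mid \neg A)$
Conditional probabilities given B or ¬B:
        A                      ¬A
B       $P(A \mid B)$          $P(\neg A \mid B)$
¬B      $P(A \mid \neg B)$     $P(\neg A \mid \neg B)$
Let S be a document set.
$A \rightarrow B$:
$P(B \mid A) \gg P(A \mid B)$ of S
$P(\neg B \mid A) \gg P(\neg A \mid B)$ of S
$\chi^2(B \mid A) > \chi^2(A \mid B)$ of S
$B \rightarrow A$:
$P(A \mid B) \gg P(B \mid A)$ of S
$P(\neg A \mid B) \gg P(\neg B \mid A)$ of S
$\chi^2(A \mid B) > \chi^2(B \mid A)$ of S
The more distinctive the document set, the clearer the rules.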
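The first condition for choosing a rule direction, comparing P(B|A) with P(A|B) over a document set S, can be sketched as follows. Only that one condition is implemented; the negation and chi-square conditions are omitted. The `ratio` threshold standing in for "much greater than", the term names, and the sample document set are all assumptions.

```python
def conditional(docs, target, given):
    """Estimate P(target | given) as a fraction of the documents in S."""
    with_given = [d for d in docs if given in d]
    if not with_given:
        return 0.0
    return sum(1 for d in with_given if target in d) / len(with_given)


def rule_direction(docs, a, b, ratio=2.0):
    """Return 'A->B', 'B->A', or None by comparing P(B|A) with P(A|B).

    `ratio` is an assumed threshold for "much greater than".
    """
    p_b_given_a = conditional(docs, b, a)
    p_a_given_b = conditional(docs, a, b)
    if p_b_given_a > 0 and p_b_given_a >= ratio * p_a_given_b:
        return "A->B"
    if p_a_given_b > 0 and p_a_given_b >= ratio * p_b_given_a:
        return "B->A"
    return None


# Hypothetical document set S: each document is a set of terms.
S = [
    {"disaster_household", "loan"},
    {"disaster_household", "loan"},
    {"loan"},
    {"loan"},
    {"loan", "interest"},
    {"loan"},
]
print(rule_direction(S, "disaster_household", "loan"))
```

Every document that mentions disaster households also mentions loans, but only a third of the loan documents mention disaster households, so the rule points from the household term to the loan term.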
Rule Structure
[Figure: Object1 (Noun), with attributes Attribute11 … Attribute1n (Noun), is linked by a Relationship to Object2 (Noun), with attributes Attribute21 … Attribute2n (Noun)]
1. Object: concrete noun; Relationship: abstract noun; Attribute: concrete or abstract noun
2. A concrete noun indicates an equivalence or attribute relation; an abstract noun indicates an action or method
3. Relationship = null: an attribute relation between Object1 and Object2 (is-a, part-of), Object -- Attribute
4. Object1 (Relationship) Object2: Object1 and Object2 stand in the relation Relationship
5. Object1 Attribute (Relationship) Object2: the Attribute of Object1 stands in the relation Relationship with Object2 (Attribute Object)
6. Attribute Object1 (Relationship) Object2: the Attribute of Object1 stands in the relation Relationship with Object2 (Attribute Object)
7. Object1 (Relationship) Object2 Attribute: Object1 stands in the relation Relationship with the Attribute of Object2 (Attribute Object)
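Recursive Rule Construction
Properties (objects)
object
Methods or Utilities
Specialized terms such as statutes and legal articles
Attributes or conditions
Object1 (Noun)
Method or utility
Attribute1 (Noun) … Attributen (Noun)
Object2 (Noun)
Attribute1 (Noun), Attribute1 (Noun)
Carries a sense of action
Generalize
Object: attribute
Loans
Object: Attribute (condition)
Temporary Statute for Earthquake Disaster Reconstruction
Disaster-affected households
method
Home reconstruction project
object
Affected households
Object: attribute
Financial institutions
Interest
Object: attribute
Object: attribute
Housing
Object: attribute
Damage
Object: condition
Specify
Rules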
Clustering
95
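Example
Well suited to machine washing
Pleasant fragrance
Strong stain removal
Makes washing less effort
Fresh, light scent
Removes 99 kinds of stains
Washes especially clean
Pleasant fragrance
Gets white socks the cleanest
Smells very fragrant
Does not hurt the hands
Removes stains very well
Clothes do not fade easily
Washing takes little effort
Removes 99 kinds of stains
Small amount needed
Washes clean
Little skin irritation
Cleans all kinds of stains well
Washes clean
Reasonable price
Better washing results
Nice smell
Have always used this brand
Washed clothes are whiter
Pleasant smell
Memorable advertising
Washes clean
Rinses out easily
Not very harsh on the hands
Washes clean
Small amount needed
Washes clean
Needs less than other brands
Heavy advertising
Washes clean
Small amount needed
Good quality
Small amount needed
Washes clean
Good packaging
Lots of appealing advertising
Pleasant fragrance
Washes clean and white
Good promotion, entertaining ads
Many people say it is good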
Tasks in News Detection
News Feeds
Segmentation
Detection: Retro, On-Line
Tracking
Might be Relevant
World Trade Center
The Pentagon
September 11, 2001
USS Cole
October 12, 2000
Location: Aden, Yemen
Date: October 12, 2000, 11:18 am (UTC+3)
Attack type: suicide bombing
Deaths: 19 (including the 2 perpetrators)
Injured: 39
Perpetrator(s): al-Qaeda, carried out by Ibrahim al-Thawr and Abdullah al-Misawa
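The 9/11 attacks
Preventable
FBI agent in Minnesota
Zacarias Moussaoui's personal computer
FBI Phoenix memo (George Will)
Dr. Bhandari (Virtual Gold, Inc)
Data mining could have prevented the 9/11 tragedy
Terrorists
9/11 terrorist network
9/11 terrorist network
The Red Army Faction (RAF) threat
Horst Herold (head of the German federal police)
Built a data-mining information network
Germany's Bundeskriminalamt, 1972
Data sources
Housing sales, energy companies, ...
Result
Rolf Heissler (RAF member)
Outcome
Herold retired after reports that he had violated human rights
Criminal statutes amended in 1986
Three of the 9/11 pilots came from Hamburg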
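Epidemic alert and response system
The World Health Organization established its Epidemic Alert and Response system many years ago.
Because some countries may play down reports of outbreaks out of concern for the economic impact, the WHO system includes software that collects relevant material from the websites of media in each country, and twenty experts assess the signals in that material.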
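Information and knowledge: Amazon digital camera sales
News events: The Washington Times
Disaster-affected households (financial assistance policy)
Loans (disaster-affected households, temporary housing)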
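Generative Discriminative
Generalize
Object: attribute
Loans
Object: Attribute (condition)
Temporary Statute for Earthquake Disaster Reconstruction
Disaster-affected households
method
Home reconstruction project
object
Affected households
Object: attribute
Financial institutions
Interest
Object: attribute
Object: attribute
Housing
Object: attribute
Damage
Object: condition
Specify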
The future (NASA)
Sharing Knowledge (2003)
• Adaptive knowledge infrastructure is in place
• Knowledge resources identified and shared appropriately
• Timely knowledge gets to the right person to make decisions
• Intelligent tools for authoring through archiving
• Cohesive knowledge development between JPL, its partners, and customers
Enables sharing of essential knowledge to complete Agency tasks
• MarsNet
• Europa Orbiter
• Space Interferometry Mission
Integrating Distributed Knowledge (2007)
• Instrument design is semi-automatic based on knowledge repositories
• Mission software auto-instantiates based on unique mission parameters
• KM principles are part of Lab culture and supported by layered COTS products
• Remote data management allows spacecraft to self-command
Enables seamless integration of systems throughout the world and with robotic spacecraft
• Mars robotic outposts
• Comet Nucleus Sample Return
• Saturn Ring Observer
• Terrestrial Planet Finder
Capturing Knowledge (2010)
• Knowledge gathered anyplace from hand-held devices using standard formats on interplanetary Internet
• Expert systems on spacecraft analyze and upload data
• Autonomous agents operate across existing sensor and telemetry products
• Industry and academia supply spacecraft parts based on collaborative designs derived from JPL's knowledge system
Enables capture of knowledge at the point of origin, human or robotic, without invasive technology
• Europa Lander/Submersible
• Titan Organics: Lander/Aerobot
• Neptune Orbiter/Triton Observer
Modeling Expert Knowledge (2025)
• Systems model experts' patterns and behaviors to gather knowledge implicitly
• Seamless knowledge exchange with robotic explorers
• Planetary explorers contribute to their successor's design from experience and synthesis
• Knowledge systems collaborate with experts for new research
Enables real-time capture of tacit knowledge from experts on Earth and in permanent outposts
• Interstellar missions
• Permanent colonies
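Roadmap: development recommendations
Distributed knowledge grid (2005)
• Use the wide-area grid system of the high-speed computing center so that public agencies can upload files for knowledge analysis
• Define operational targets and procedures for distributed, high-speed consolidation of archival knowledge
• Build a distributed high-speed environment to support creating and sharing knowledge, which governments at every level can use as a reference for decisions
• Automatically link decision issues of different types to accumulate similar …
• Automated systems analyze, extract, and link issues of all kinds and integrate them
Decision guidance (2006)
• Use the linked archival knowledge to build an operational system that helps experts of all kinds build domain ontologies
• Automatically generate linking indexes between documents and ontologies to modularize decision knowledge
• Integrate case-based reasoning, evidence, and decision guidelines into a complete automated decision-support reasoning mechanism
• The system automatically links domain knowledge according to the ontology and the experience recorded in documents to help build decision guidelines
• Build a decision evaluation mechanism so that professional decisions can be refined
Decision simulation (2007)
• Build a simulation mechanism, define scenarios of various types, and run scenario simulations
• Each agency can rehearse measures over the distributed grid
• Generate probabilistic inference models for decision execution from the ontology and all cases
• Estimate possible decisions and their risks from inference rules and probabilities
• Define accidents and uncertain conditions, estimate under uncertainty, and perform negative inference to understand relative risk
• Derive procedures and actions through inference
Policy planning research (2008)
• Support comprehensive policy planning: with or without prior experience, planning can be optimized using planning (OR, Planning) methods
• Compute operating procedures and paths, find the critical path, and select the best option
• Use the ontology and the distributed grid for high-speed optimal operations research planning
• Revise the planned policy according to actual conditions
• Analyze deviations quickly and offer optimized recommendations in real time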
Comments and questions are welcome
Q&A