UT DALLAS
Erik Jonsson School of Engineering & Computer Science
Incentive-Compatible Assured Data Sharing & Mining
Murat Kantarcioglu
FEARLESS engineering
Incentives and Trust in Assured Information Sharing
• Combining intelligence through a loose alliance
– Bridges gaps due to sovereign boundaries
– Maximizes yield of resources
– Discovery of new information through correlation and analysis of the 'big picture'
– Information exchanged privately between two participants
• Drawbacks to sharing
– Misinformation
– Freeloading
• Goal: create a means of encouraging desirable behavior in an environment that lacks, or cannot support, a central governing agent
Possible Scenarios
• You may verify the shared data and issue fines if the data is wrong
– This is the easy case
• You may verify the shared data but cannot issue fines
– A bit harder
• You may only verify some aggregate result
– The hardest case
Game Matrix
Notation: E = (δmin + δmax)/2 is the expected value of the shared information, Cv is the cost of verification, ti and tj are the agents' types (trust values), and Pi(tj) is Agent i's verification probability given Agent j's trust value (never below a minimal verification probability). Each cell gives (payoff to Agent i, payoff to Agent j) for Agent i's play versus Agent j's play:

(Truth, Truth): ( E − Cv·Pi(tj) , E − Cv·Pj(ti) )
(Truth, Lie): ( −Cv·Pi(tj) , E − Cv·Pj(ti) )
(Lie, Truth): ( E − Cv·Pi(tj) , −Cv·Pj(ti) )
(Lie, Lie): ( −Cv·Pi(tj) , −Cv·Pj(ti) )
Any cell where either agent plays Do Not Play: ( 0 , 0 )
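The payoff structure above can be sketched in code. This is an illustrative sketch only: the linear mapping from trust value to verification probability (with floor p_min) is an assumption made here for demonstration, not the exact rule from the talk.

```python
# Illustrative payoff computation for the two-agent sharing game.
# ASSUMPTION: verification probability falls linearly with trust,
# floored at a minimal verification probability p_min.

def verification_prob(trust, p_min=0.1):
    # Less-trusted partners are verified more often, never below p_min.
    return p_min + (1.0 - trust) * (1.0 - p_min)

def payoff(action_i, action_j, t_i, t_j, d_min=3.0, d_max=7.0, c_v=2.0):
    """Return (payoff to agent i, payoff to agent j) for one round."""
    if "no_play" in (action_i, action_j):
        return (0.0, 0.0)
    ev = (d_min + d_max) / 2.0             # expected value of information
    cost_i = c_v * verification_prob(t_j)  # i verifies j's data
    cost_j = c_v * verification_prob(t_i)  # j verifies i's data
    gain_i = ev if action_j == "truth" else 0.0  # a lie is worth nothing
    gain_j = ev if action_i == "truth" else 0.0
    return (gain_i - cost_i, gain_j - cost_j)
```

With δmin = 3, δmax = 7, Cv = 2 (the simulation settings used later), mutual truth-telling is profitable for both agents, while facing a liar costs a player its expected verification fee.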
Behaviors Analyzed in Data Sharing Simulations

Name         Strategy    Verification?  Punishment?  Comments
Honest       Truth       No             No           Optimistic, maximizes returns
Dishonest    Lie         No             No           Takes advantage of other players; trumps Honest in 1-on-1
Random       Truth, Lie  No             No           Chaotic; chooses either with equal probability
Tit-for-Tat  Truth, Lie  Always         Special      Mirrors other players' actions; starts by selecting Truth
LivingAgent  Truth       Trust-based    No trading   Verifies activity according to trust ratings; will cease activity for a number of rounds with a player who is caught lying
Liar         Truth, Lie  Trust-based    No trading   Identical to LivingAgent but lies with small probability
SubtleLie    Truth, Lie  Trust-based    No trading   Identical to Liar, except lies whenever information value reaches a certain threshold
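Two of the simpler behaviors in the table can be sketched as a toy repeated game. The class interfaces and round count here are illustrative stand-ins, not the simulator's actual code.

```python
# Toy sketch of two behaviors from the table above. Interfaces and
# parameters are illustrative assumptions, not the real simulator.

class Honest:
    # Optimistic: always shares truthfully.
    def act(self, partner_last_move):
        return "truth"

class TitForTat:
    # Mirrors the partner's previous move; opens with truth.
    def act(self, partner_last_move):
        return partner_last_move or "truth"

def play(agent_a, agent_b, rounds=10):
    """Run a repeated sharing game and return the move history."""
    last_a = last_b = None
    history = []
    for _ in range(rounds):
        move_a = agent_a.act(last_b)
        move_b = agent_b.act(last_a)
        history.append((move_a, move_b))
        last_a, last_b = move_a, move_b
    return history

# Honest and Tit-for-Tat settle into sustained mutual truth-telling.
history = play(Honest(), TitForTat())
```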
Simulation Results
We set δmin = 3, δmax = 7, Cv = 2; the lie threshold is set to 6.9.
Honest behavior wins 97% of the time when all behaviors are present.
Experiments show that without the LivingAgent behavior, Honest behavior cannot flourish.
Please see the following paper for more details:
“Incentive and Trust Issues in Assured Information Sharing”
Ryan Layfield, Murat Kantarcioglu, and Bhavani Thuraisingham
International Conference on Collaborative Computing 2008
Verifying the Final Result: Our Model
• Players P1, ..., Pn:
– Each has some data (x1, ..., xn)
– Goal: compute a data mining function D(x1, ..., xn) that maximizes the sum of the participants' valuation functions
• Player Pt: a mediator between the parties; computes the function securely and holds test data xt
• Players value privacy, correctness, and exclusivity
• Problem: how do we ensure that players share data truthfully?
Assumption
• The model that maximizes the sum of the valuation functions is the model built from the submitted input data.
• Formally, given the submitted valuation functions and submitted data:
– D(x) = argmax_{m ∈ M} Σ_k v_k(m) for any set of players
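The assumption amounts to a one-line selection rule over candidate models. The candidate models and valuation functions below are toy stand-ins for illustration.

```python
# Sketch of D(x) = argmax over m in M of sum_k v_k(m).
# Models and valuation functions here are toy stand-ins.

def choose_model(models, valuations):
    """Return the model m maximizing the summed valuations."""
    return max(models, key=lambda m: sum(v(m) for v in valuations))

# Toy example: each of three players values a model by its accuracy,
# so the higher-accuracy candidate wins.
models = [{"name": "m1", "acc": 0.80}, {"name": "m2", "acc": 0.91}]
valuations = [lambda m: m["acc"]] * 3
best = choose_model(models, valuations)
```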
Mechanism
• Reservation utility normalized to 0
• ui(m) = vi(m) − pi(vi, v−i)
• [u = utility] [v = valuation] [p = payment]
• pi(vi, v−i) = max_{m′ ∈ M} (Σ_{k≠i} vk(m′)) − Σ_{k≠i} vk(m)
• vi(m) = max{0, acc(m) − acc(D(xi))} − c(D)
– c is the cost of computation, acc is accuracy
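The payment and utility rules above can be sketched as follows. The valuations here are toy lookup tables; in the actual mechanism they come from accuracies measured on Pt's independent test data.

```python
# Sketch of the VCG-style payment p_i and utility u_i defined above.
# Valuations are toy lookup tables, not accuracies on real test data.

def others_value(model, valuations, i):
    # Sum of valuations over all players except i (all players if i is None).
    return sum(v(model) for k, v in enumerate(valuations) if k != i)

def chosen_model(models, valuations):
    # D(x): the model maximizing the sum of everyone's valuations.
    return max(models, key=lambda m: others_value(m, valuations, None))

def payment(i, models, valuations):
    # p_i = max_{m'} sum_{k != i} v_k(m')  -  sum_{k != i} v_k(D(x))
    best_without_i = max(others_value(m, valuations, i) for m in models)
    return best_without_i - others_value(chosen_model(models, valuations), valuations, i)

def utility(i, models, valuations):
    # u_i = v_i(D(x)) - p_i
    m = chosen_model(models, valuations)
    return valuations[i](m) - payment(i, models, valuations)

# Toy example with two players and two candidate models.
models = ["A", "B"]
valuations = [lambda m: {"A": 3, "B": 1}[m],   # player 0
              lambda m: {"A": 0, "B": 4}[m]]   # player 1
```

In the toy example the mechanism picks model "B" (total value 5), charges player 1 the externality it imposes on player 0, and charges player 0 nothing, since dropping player 0 does not change what the others could achieve.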
Mechanism
• We compute pi using the independent test set held by Pt
• Intuitively, the mechanism rewards players based on their contribution to the overall model
• This is a VCG mechanism, which is provably incentive compatible under our assumption
Experiments
• Does this assumption hold for typical real-world data?
• Methodology
– 4 data sets from the UCI Repository
– 3-party vertical partitioning, naïve-Bayes classifiers
– Determine accuracy and payouts
– Payouts estimated as acc(classifier) − acc(classifier without player i's data) − constant cost
– Run once with all players truthful
– Run once for each player and each amount of perturbation (1%, 2%, 4%, 8%, 16%, 32%, 64%, 100%)
– 50 runs for each setting
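The perturbation step and the payout estimate can be sketched as follows. Modeling lying as flipping a fraction of binary attribute values is an assumption made here for illustration; the real experiments perturb UCI data and retrain naïve-Bayes classifiers.

```python
import random

# Sketch of the experimental loop above. ASSUMPTIONS: binary attributes,
# and lying modeled as flipping a fraction of one player's values.

def perturb(values, fraction, rng):
    # Flip `fraction` of the binary attribute values at random positions.
    out = list(values)
    for idx in rng.sample(range(len(out)), int(fraction * len(out))):
        out[idx] = 1 - out[idx]
    return out

def payout(acc_full, acc_without_i, cost=0.05):
    # payout_i = acc(classifier) - acc(classifier without player i) - cost
    return acc_full - acc_without_i - cost

# E.g., if dropping player i's attributes lowers accuracy from 0.84 to
# 0.79, player i's estimated payout is roughly zero after the constant
# cost (the accuracy figures are placeholders, not UCI results).
```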
[Figure: Census-Income (Adult), overall accuracy (0–1) vs. perturbation level (T, L(1%)–L(100%)); one series each for Player 1, Player 2, and Player 3 lying]
[Figure: Census-Income (Adult), payouts based on overall accuracy (0 to −0.6) vs. perturbation level (T, L(1%)–L(100%)); one series each for Player 1, Player 2, and Player 3 lying]
[Figure: Census-Income (Adult), payouts based on overall accuracy with Player 1 lying (−0.6 to 0.6) vs. perturbation level (T, L(1%)–L(100%)); one series per player]
[Figure: Census-Income (Adult), payouts based on overall accuracy with Player 2 lying (−0.6 to 0.4) vs. perturbation level (T, L(1%)–L(100%)); one series per player]
[Figure: Census-Income (Adult), payouts based on overall accuracy with Player 3 lying (−0.6 to 0.4) vs. perturbation level (T, L(1%)–L(100%)); one series per player]
[Figure: Breast-Cancer-Wisconsin, overall accuracy (0.91–0.97) vs. perturbation level (T, L(1%)–L(100%)); one series each for Player 1, Player 2, and Player 3 lying]
Conclusions
• Does the assumption hold?
– Not always, but it is very close and works as a practical assumption
• If a better model is found through lying, does this hurt or help?
– Consideration: change the goal from preventing lying to building the most accurate classifier
– Finding the "right" lie may take too much computation to be profitable