Paper or Project Title ALL AUTHOR NAMES
Download
Report
Transcript Paper or Project Title ALL AUTHOR NAMES
Data Mining Bitcoin,
a P2P Crypto-Currency
Philip Koshy, Diana Koshy, and Patrick McDaniel
Bitcoin is a cryptographic currency that has been in continuous operation over the last 3 years. It currently enjoys an exchange rate
of $4.80 (as of April 1st, 2012), with a market capitalization of roughly $40 million. Payments in Bitcoin are made pseudoanonymously, making Bitcoin an attractive alternative to other forms of electronic payment. Academics have shown interest in
Bitcoin, particularly in de-anonymizing the user base, understanding the incentive structure, and extending the consensus
mechanism. The anonymous sale of drugs and weapons has also caused Bitcoin to be scrutinized by law enforcement and members
of Congress. Our goal is to study this network in an effort to judge the health of the system and disaggregate anomalous behavior.
Understanding Bitcoin
Bitcoin is a decentralized P2P currency that uses a gossip
protocol to transmit messages among peers in an overlay
network.
Transactions (Fig. 1) and blocks (Fig. 2) are the two main
data structures in the protocol. Coins are transferred among
users within transactions, which are then grouped into
blocks that must be accepted by the network.
Coin generation is tied to block creation. Creating a block is
computationally expensive since it requires solving a
cryptographic proof-of-work puzzle. Anytime a node
generates a block which goes on to be accepted by the
network, it is currently awarded 50 Bitcoins. Not all
published blocks will be accepted network-wide.
New blocks are linked to older blocks, forming a block chain
that is constantly being extended. Because of Bitcoin’s
decentralized and distributed nature, multiple participants
may generate blocks at the same time. This leads to the
distributed consensus problem. We can represent the block
chain as a tree structure, with the longest path representing
the accepted chain (Fig. 3). A participant choosing to
extend an existing path in the block chain indicates a vote
towards consensus on that path. The longer the path, the
more computation was expended building it.
Fig. 1: A coin owner
transfers coins by
digitally signing a
hash of the previous
transaction and the
public key of the next
owner. This signature
is then appended to
the end of the coin.
Fig. 2: Transactions
are placed in blocks,
which are linked by
hashes.
Preliminary Results
We have begun studying the data and have noted anomalous
behavior.
• Misconfiguration Errors
We have observed malformed messages being relayed across the
network. Some messages are incomplete, while others fail to
conform to the protocol specification. We have received
numerous transactions with malformed scripts. Scripts in the
Bitcoin protocol are used to verify identities, specify recipients of
coins, as well as indicate conditions under which coins may be
claimed. Scripts use a stack-based language similar to assembly.
• Potential Misuse
Fig. 3: The block chain is represented as a tree structure.
In this way, Bitcoin offers a unique solution to the consensus
problem in distributed systems since voting power is directly
proportional to computing power.
Data Collection
We are currently capturing real-time traffic from upwards of
5,000 nodes using a custom-built Bitcoin client leveraging
the Linux epoll API. With our client, we are able to request
historical data and record real-time broadcasts at scale. Our
data-collection software runs efficiently on consumer-grade
hardware; we are currently running the software on Ubuntu
11.10 Server Edition in a single thread, maintaining an inmemory footprint of only 400MB. Data is stored in a MySQL
5.1 database.
Blocks that completely ignore valid transactions are being
created, relayed, and accepted, often winning consensus over
competing blocks. By ignoring valid transactions, the block
creator is more quickly able to publish their block. This behavior
financially benefits the block creator at the expense of other
users. Future work will attempt to group colluding users to
determine if there is malicious intent.
• Consensus Without Contention
The branching factor of the block chain is close to 1, which
indicates that there is very little contention on the network about
which chain is longest. In other words, computing power is not
often wasted on chains that will eventually become orphaned.
• Double-Spending
A digital coin is theoretically easy to copy since it is decoupled
from physical reality. Despite built-in probabilistic safeguards
against double-spending, we have observed a number of
attempts. This may be the result of a poorly designed custom
client, or that of a malicious participant.