S13-blockchainx

Download Report

Transcript S13-blockchainx

Bitcoin
• Bitcoin is a cryptocurrency.
• The platform that hosts Bitcoin is a p2p system.
• Bitcoin can be abstracted as a digital file that records the
account balance of each user (i.e. a ledger).
• The “file” is stored on all the machines in the Bitcoin
network.
ledger
Alice
Bob
Carole
…
5
3
7
…
• Bitcoin is not issued by any authority.
• People use Bitcoin because they think it has value.
Bob
Alice
Transactions in the BTC network
• When a user transfer BTC to another user, the transfer is
broadcast to all the nodes in the BTC network.
• The transfer record is known to all the nodes in the
network.
• Each node of the BTC network knows all the transaction
records of the BTC network.
• Each node of the BTC network knows the balance of each
user of the BTC by looking through all the transaction
records.
Distributed vs centralised
• In the traditional banking system, the bank keeps the ledger of
all its customers.
• In the BTC network, each node has a copy of the ledger.
• For a bank, a customer only knows her own transactions
• In BTC, a transaction is known to everyone.
• For a bank, the customer can trust the bank.
• If something goes wrong, the customer can sue the bank.
• In the BTC network, the nodes do not trust each other.
• Cryptography is used to ensure that bad behaviour (e.g. stealing BTC) can
be prevented or cannot affect the functioning of the BTC network.
• BTC lost due to software bugs cannot be recovered.
SHA-256 hash function
• Produce a 256-bit digest (hash).
• Partition the message into 512-bit blocks.
• Pad the message if required
• The initialization vector is a constant.
• Each 512-bit message block is processed in turn
512 bits
512 bits
512 bits
hash
hash
…
hash
256 bits
256 bits
256 bits
256 bits
256 bits
hash
IV
padding
message
…
hash pointer
• A hash pointer indicates the location where some
information is stored.
• The value of a hash pointer is the hash of the information.
• Given a hash pointer, we can retrieve the information and
check whether the information has been modified.
Block chain
• A block chain is a linked list with hash pointers.
H( )
data
H( )
data
H( )
data
H( )
…
data
H( )
Merkle tree (hash tree)
• A Merkle tree is a tree in which the value of a non-leaf node is the
hash of the concatenation of the values of the node’s children
• Data can be stored in a hash tree to make retrieval more efficient
compared with a linked list.
H( ) H( )
H( )
H( )
data
H( )
data
H( )
H( )
data
H( )
H( )
data
H( )
data
H( )
data
H( )
H( )
data
H( )
data
public-key cryptography
• When encrypting information, we need to use one key to
encrypt the information and one key to decrypt information.
• In public-key cryptography, different keys are used for
encrypting and decrypting information.
• A pair of keys (public/private keys) are used.
• The information encrypted using one of the keys in the pair can
only be decrypted using the other key in the pair.
• A user generates a pair of public/private keys.
• The user keep the private key as a secret while making the
public key available to the public.
• Public-key cryptography is also used in authentication.
• The possession of the private key is used as a proof that
the entity is the subject associated with the public key that
corresponds to the private key.
• If Bob wants to make sure that he is talking to Alice, Bob
can ask Alice to encrypt a string with Alice’s private key.
• If Bob can decrypt the encrypted message correctly with
Alice’s public key, Bob regards the person that encrypted
message is Alice.
• You must keep your private key secret.
Digital Signature
• The digital signature of a data item is generated by
• compute the hash of the data item
• encrypt (i.e. sign) the hash with the private key of the owner of the data
item
• A digital signature can be verified by
• compute the hash of the data item, H(data)
• decrypt the digital signature with the public key of the owner of the data
item to obtain the hash calculated by the owner H’(data)
• check whether H(data) = H’(data)
Transactions in BTC
• Each user has at least one public/private key pair.
• The public key is used as the address for receiving BTC
payment.
• When Alice sends money to Bob, Alice creates a transaction
to record this transfer of money.
• The transaction has Bob’s public key to indicate that Bob
receives the payment.
• The transaction is signed by Alice using Alice’s private key
to ensure that Alice is the sender.
• Encrypt the hash of the transaction.
• Alice broadcasts the transaction to all the nodes in the BTC
network.
• The transaction have one or several inputs.
• Each input specifies the amount of BTC that Alice has gained
from other transactions (i.e. Alice actually has the BTC to
spend).
• When a BTC network node receives the transaction, the node
needs to first verify whether the inputs are valid (i.e. whether
Alice indeed has gained these BTCs in previous transactions).
• The node needs to verify the signature of Alice attached to the
transaction to ensure that the transaction is valid. How?
• This is to prevent the case that Eva uses the transaction that credit
Alice as the input of her transaction to give Alice’s money to Eva herself.
• The node also needs to make sure that Alice has not spent the
BTC that she gained in the input transactions (i.e. no double
spending).
Transaction
Eva  Alice
Transaction
input
Transaction
Allen  Alice
input
…
Alice  Bob
• It can be seen that you can trace the origin of the fund
involved in all the transactions in the BTC network.
Transaction
Transaction
Howard James
James  Alice
Transaction
Alice  Bob
Transaction
Bob Amy
Transaction
Transaction
Amy  Alice
Steve  Amy
Transaction
Transaction
Howard James
James  Alice
Transaction
Alice  Bob
Transaction
Bob Amy
Transaction
Transaction
Amy  Alice
Steve  Amy
Transaction
Transaction
Howard James
James  Alice
Transaction
Alice  Bob
Transaction
Bob Amy
Transaction
Transaction
Amy  Alice
Steve  Amy
• When a machine joins the BTC network as a “permanent”
node, the machine obtains a copy of all the transactions that
have happened since the BTC network came into existence.
• The machine needs to verify the validity of all the
transactions.
• In the BTC network, the machines do not trust each other.
• The validation can take several tens of hours. However, it
only needs to be done once by a node.
• A transaction has one or several outputs.
• Each output specifies the amount to be paid to an address
(i.e. the public key of the receiver).
• Alice’s address can appear in the output.
• For example, if Alice is buying a product that costs 1 BTC from Bob
and the input is 3 BTC, apart from paying 1 BTC to Bob, Alice pays
herself 2 BTC as the change.
• To figure out the balance of your BTC account, you need to
scan though all the transactions that pay you and you have
not used these transactions as the inputs of other
transactions.
• So, BTC is not a file recording the balances of all the users.
It is a log containing all the transactions in the BTC network.
Anonymity
• Your account in BTC network is represented as a public key.
• You can create any number of public keys to represent your
account.
• Users can generate public keys themselves.
• A public key is 520 bits. So, the chance that two users generate
the same public key is negligible.
• If you use Tor to access the BTC network, your identity can
be hidden.
• However, if you are not careful when making transactions,
people can link several public keys together. That is, the
public keys all represent the same user.
• You use those transactions that are paid to several public keys as
the inputs of one transaction.
Double spending
• Each transaction can only be used once as an input of
another transaction.
• Each node needs to check whether a transaction has been
used when the transaction is used as the input.
Transaction
Transaction
Steve  Alice
Alice  Bob
Transaction
Alice  Tom
Invalid transaction
• Transactions are propagated in the BTC network.
• As different transactions might be propagated along different
routes, BTC nodes might have different perceptions on which
transaction has been spent.
• T1: Eva uses a transaction to pay Bob.
• T2: Eva uses the same transaction to pay herself.
• If T1 and T2 are propagated along different routes in the BTC network,
different nodes in the BTC network would have different perception on
whether T1 happens before T2 or vice versa.
• As a result, some nodes would think T1 is invalid while others would think
that T2 is invalid.
• To prevent the above double spending scenario, BTC nodes need
to agree on the order of the events (i.e. whether T1 or T2
happens first).
• Paxos?
• BTC uses a cryptography-based technique to achieve consensus
among a set of nodes that do not trust each other
BTC mining
• Transactions are grouped together and placed in a block.
• The transactions in a block are regarded as happening at the
same time.
• The blocks are ordered to form a block chain.
• Transactions in a block at the back of the chain are regarded as
happen after the transactions in the blocks at the front of the
chain.
• The pointers between the blocks are hash pointers
Block
Block
Block
tx 123
tx 234
…
tx 345
tx 456
tx 567
…
tx 678
tx 789
tx 89A
…
tx ABC
Block
…
tx BCD
tx CDE
…
tx DEF
• Each BTC node constructs its own block of transactions (in
theory) and place the transactions that it has heard of in the
block.
• The BTC node needs to carry out various checks on the transactions
(as described earlier) before adding the transactions to it block.
• Due to the propagation delay of the transactions, BTC nodes
might come up with blocks that contain different (or partially
different) set of transactions.
• In order to add a block into the block chain, a node needs to
solve a cryptography puzzle.
• The puzzle is hard to solve in terms of the computation required to
solve the puzzle.
• The solution is easy to check.
• As the computation for solving the puzzle is expensive, the
BTC node needs to be given incentive for creating the block
and maintain the block chain (i.e. carrying out various checks
to confirms that the transactions to be included in the block
are valid).
• The nodes are rewarded with BTC for creating a block AND
solving the puzzle that allows the block to be added to the
block chain.
• The current reward is 25 BTC for one valid block.
Coinbase transaction
• A coinbase transaction is the transaction that specifies that
the node creating a block is being awarded 25 BTC.
• A coinbase transaction does not have any input and it does
not need to be signed.
• A coinbase transaction has a coinbase parameter that can
have any value.
• The coinbase parameter can be adjusted by a node when the
node tries to solve the cryptographic puzzle.
The data structure of a block
Pre: H( )
tx_tree root: H ( )
Nonce:
Pre: H( )
tx_tree root: H ( )
Nonce:
Hash:
Hash:
• Hash of a block = hash(hash
of the previous block | hash
of transaction tree | nonce)
• Nonce is a 32-bit value
H( )
H( )
Pay Alice 25 BTC
Coinbase: xxx…
H( )
H( )
H( )
H( )
tx
tx
tx
• The hash of a block is calculated using SHA-256 that
generated a 256-bit number.
• The puzzle is to find a nonce that makes the hash of a block
to less than a given value.
• The given value is chosen by the BTC nodes to control the
difficulty of the puzzle to ensure that on average a valid
block can be found (i.e. a node can solve the puzzle by
finding the right nonce) every 10 minutes.
• If the hash function is secure, the only way to succeed in
solving the puzzle is to try enough nonce.
Hash space
Target space
Bitcoin mining farm in China
https://i.ytimg.com/vi/nDiDHjLFmK8/maxresdefault.jpg
• Once a node finds a nonce that meets the requirement, the
node publishes the block.
• Other nodes can check the validity of the node by
computing the harsh of the block with the nonce in the
published block.
• A valid block is added to the chain.
• There is a little probability that more than one node find
the nonces that meets the requirement at almost the same
time.
• Each node adds the first block that it hears to the block
chain.
• As a result, there are might be several branches in the
block chain.
• The tie is broken when the next valid block is found as all
nodes would try to add their block to the longest chain in
the BTC network.
• As a result, the creator of the losing branch has to give up
its block (i.e. the creator would not get any reward) and try
to create another block.
• The transactions in the losing block will be regarded as
unconfirmed transactions, and they need to be included in
other valid blocks.
• The amount of computation required for solving a puzzle
makes it rare for nodes to find the solutions of the puzzle
at the same time. It is even rarer for this to happen several
times in a row.
• Therefore, the branches in the block chain will disappear
after a few blocks.
• That is, a consensus will be reached among the nodes on which
blocks should be in the block chain.
• As a block in a block chain might later be discarded and the
transactions in the discarded blocks become unconfirmed
transactions, transactions at the end of the block chain are
regarded as “risky”.
• Normally, people would wait for six more blocks to be added
to the back of a block before the transactions in the block
are regarded as confirmed.
• At the moment, it takes about 10 minutes for a block to be
added to the block chain.
• So, to confirm a transaction, it takes about one hour.
• This is much slower than a credit card transaction.
• As branches might occur in the block chain, an attacker
might explore this feature to launch a double spending
attack.
• Eva sends money to Bob.
• Then, Eva sends money to herself, computes a block that includes
the transaction that pays herself, and add the block to the block
chain before the Eva  Bob transaction is added to the block
chain.
• This would make Eva  Bob transaction an invalid transaction.
• This attack can only succeed if Eva can make sure that her
branch is the longest chain.
• Eva has to produce more than one valid block to add to her branch
in order to make her branch being the longest in the block chain.
• Eva can only do this with over 50% probability to succeed if she
possesses more than 50% of the computing power of the BTC
network.
Mining pools
• It is almost impossible for an individual miner to compute
the nonce for creating a valid block.
• Miners are pooling their computing resources together to
form mining pools to solve the puzzle together.
• The largest pool Ghash.io has over 50% computing power of
the BTC network at one stage.
• To prevent the pool’s resource is misused, the pool has voluntarily
restricted its own size.
Summary
• Protection against invalid transaction is based on
cryptographic functions.
• Protection against double-spending is by consensus.
• You are never 100% sure that a transaction is in the
consensus branch. Guarantee is probabilistic.
References
• Nakamoto, Satoshi (October 2008). "Bitcoin: A Peer-to-Peer
Electronic Cash System“, https://bitcoin.org/bitcoin.pdf
• https:// blockchain.info
• https://ghash.io/
Reviews
• How does BTC ensures that a user’s BTC is not spent by
other people?
• Why is it important for the nodes in the BTC network to
agree on the order in which transactions occur?
• What is the difference between a transaction chain and a
block chain?
• How do the nodes of the BTC network reach consensus on
the which blocks are in the block chain?
• How does the security of the block chain, the number of the
diversity of the miners and the value of the bitcoin
influence each other?
Reviews
• For an attacker with more than 50% of the computing power
of the BTC network, is it possible for the attacker to
• steal bitcoins from existing addresses
• supress transactions from the block chain or from the BTC’s p2p
network
• change block reward
• destroy confidence in bitcoin
• launch double spend attack
• How is a transaction being handled by the BTC network?
• How is the validity of a transaction being checked?
• Why is it hard to solve the puzzle for creating a valid block?