Bitcoin is not anonymous, but, rather, pseudo-anonymous. By now, most Bitcoin veterans know this. It’s less obvious to many, however, why Bitcoin is not really anonymous by default, and what can be done to de-anonymize Bitcoin users – and what Bitcoin users can do to reclaim their privacy.
Below is an advanced beginner's guide to the nuances of Bitcoin and anonymity.
How do Bitcoin transactions work?
To better understand Bitcoin’s anonymity, it's necessary to first understand how Bitcoin works on a basic level.
Most importantly, the Bitcoin protocol effectively consists of a series of transactions. These transactions are basically packages of different kinds of data, among which are transaction inputs and transaction outputs. Inputs refer to Bitcoin addresses used to send bitcoin from, and can only be spent using the private key associated with that address. Outputs effectively refer to addresses used to send bitcoin to. Each Bitcoin transaction transfers bitcoin from one or several inputs to one or several outputs (that is, from one or several addresses to one or several addresses).
It's possible for a transaction to have just one input and one output. But that is rare, as it would require the amount of bitcoin to be sent (the output) to exactly equal an amount received earlier (the input).
Instead, it's quite common for a transaction to combine multiple smaller inputs to fund one larger payment. If someone, for instance, controls three different inputs of one bitcoin each, and needs to send 2.5 bitcoin to an online store, the software will merge all three inputs into a single transaction.
And it's even more common for a transaction to have multiple outputs. This is because Bitcoin uses change addresses. Change addresses allow users to create a transaction that returns the excess amount of bitcoin from one or several inputs back to the original sender. So in the example above, the software will typically create two outputs. One output attributes 2.5 bitcoin to the address belonging to the online store, while the other attributes the remaining 0.5 bitcoin to a newly generated (change) address controlled by the sender.
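The online-store example can be sketched in a few lines of Python. This is only an illustration: the `Output` class, the `build_transaction` function and the address strings are made up for this article, amounts are counted in satoshis (one hundred-millionth of a bitcoin), and fees default to zero to match the example in the text. Real wallet software uses more elaborate coin-selection rules.

```python
from dataclasses import dataclass

SATOSHI = 100_000_000  # satoshis per bitcoin


@dataclass
class Output:
    address: str
    amount: int  # in satoshis


def build_transaction(spendable, pay_to, amount, change_address, fee=0):
    """Pick inputs until they cover amount + fee, then add a change output."""
    selected, total = [], 0
    for coin in spendable:
        if total >= amount + fee:
            break
        selected.append(coin)
        total += coin.amount
    if total < amount + fee:
        raise ValueError("insufficient funds")

    outputs = [Output(pay_to, amount)]
    change = total - amount - fee
    if change > 0:
        # The excess goes back to an address controlled by the sender.
        outputs.append(Output(change_address, change))
    return selected, outputs


# Three inputs of one bitcoin each, paying 2.5 bitcoin to the store.
coins = [Output("addr_a", SATOSHI), Output("addr_b", SATOSHI), Output("addr_c", SATOSHI)]
inputs, outputs = build_transaction(coins, "store_addr", 250_000_000, "change_addr")
for out in outputs:
    print(out.address, out.amount / SATOSHI)
```

All three inputs are needed to cover the payment, and the transaction ends up with two outputs: 2.5 bitcoin for the store and 0.5 bitcoin of change.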
What makes bitcoin 'anonymous'?
There are generally three reasons why bitcoin is sometimes regarded as anonymous.
First, unlike bank accounts and most other payment systems, Bitcoin addresses are not tied to the identity of users on a protocol level. Anyone can create a new and completely random Bitcoin address (and the associated private key) at any time, without the need to submit any personal information to anyone.
Second, transactions are not tied to the identity of users either. As such (and as long as a miner includes the transaction in a block), anyone can effectively transfer bitcoin from any address for which they control the private keys to any other address, with no need to reveal any personal information at all. As with physical cash, not even the receiver needs to know the identity of the sender.
And third, Bitcoin transaction data is transmitted and forwarded by nodes to a random set of nodes on the peer-to-peer network. While Bitcoin nodes do connect to each other using IP addresses, it's not necessarily clear to a node whether the transaction data it received was created by the node it is connected to, or if that node merely forwarded the data.
How is anonymity defeated?
There are basically three ways to de-anonymize Bitcoin users.
First of all, even though Bitcoin transactions are randomly transmitted over the peer-to-peer network, this system is not airtight. If an attacker, for instance, has the means to connect multiple nodes to the Bitcoin network, the combined data collected from these different nodes might be enough to determine where a transaction originated.
Second, Bitcoin addresses can be linked to real identities if these real identities are used in combination with the Bitcoin addresses in some way. This includes addresses used to deposit or withdraw money to or from a (regulated) exchange or wallet service, publicly exposed donation addresses, or addresses simply used to send bitcoin to someone (including the online store) when using a real identity.
But perhaps most importantly, all transactions over the Bitcoin network are completely transparent and traceable by anyone. It's typically this complete transparency that allows multiple Bitcoin addresses to be clustered together, and be tied to the same user. Therefore, if just one of these clustered addresses is linked to a real-world identity through one or several of the other de-anonymizing methods, all addresses in the cluster can be linked to that identity.
What is clustering?
Let’s take a closer look at clustering.
A very basic clustering method is the analysis of transaction networks. In its most basic form, this looks at the several inputs combined into a single transaction. While these inputs could have originated from different addresses, the fact that they were combined into a single transaction suggests that all these inputs – and therefore all related addresses – are controlled by the same user.
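As a rough sketch of this multi-input reasoning, the Python snippet below groups addresses that were ever spent together in one transaction. The function name and the input format (a list of transactions, each given as a list of input addresses) are assumptions made for illustration only; real analysis tools work on full blockchain data and combine several heuristics.

```python
def cluster_by_common_inputs(transactions):
    """Group addresses that appear together as inputs of one transaction.

    `transactions` is a list of lists of input addresses (hypothetical format).
    """
    parent = {}

    def find(address):
        # Follow parent links to the representative of the address's cluster.
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]  # shorten the path
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        find(first)  # register single-input transactions too
        for other in input_addresses[1:]:
            union(first, other)

    clusters = {}
    for address in list(parent):
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(cluster) for cluster in clusters.values())


txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
print(cluster_by_common_inputs(txs))
# [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Note that A and C never appear in the same transaction, yet they end up in one cluster because both were spent together with B.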
Similarly, there are various methods to identify change addresses as being change addresses, which links them to the sender of the transaction. This is fairly straightforward when receiving bitcoin; the output that is not attributed to you is typically (though not always) attributed to the change address controlled by the sender. In addition, some Bitcoin software reveals the change address to attentive onlookers, too. It does so, for instance, by always placing the change address as the last output of a transaction. The use of multisig addresses can be a giveaway as well.
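The receiver's point of view described above could be sketched as follows. The function name and the `(address, amount)` tuple format are assumptions for illustration, and the guess is only a heuristic: it gives up whenever more than one output goes to an address the receiver does not control.

```python
def guess_change_output(outputs, my_addresses):
    """Return the output most likely to be the sender's change, or None.

    `outputs` is a list of (address, amount) tuples; `my_addresses` is the
    set of addresses controlled by the receiver (hypothetical format).
    """
    foreign = [out for out in outputs if out[0] not in my_addresses]
    if len(foreign) == 1:
        return foreign[0]
    return None  # zero or several candidates: no confident guess


outputs = [("store_addr", 250_000_000), ("unknown_addr", 50_000_000)]
print(guess_change_output(outputs, {"store_addr"}))
# ('unknown_addr', 50000000)
```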
Another clustering method is taint analysis. Taint analysis is fairly straightforward, too, and is even offered by several freely accessible block explorers. Basically, taint analysis calculates what percentage of the bitcoin on a specific address originated from another specific address, whether the two addresses are separated by a single transaction or by several.
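One simple way to picture such a calculation is a proportional model, in which every output of a transaction inherits the average taint of that transaction's inputs. The sketch below follows that assumption; it is not the algorithm of any particular block explorer, and the function name and the `(inputs, outputs)` transaction format are invented for this example. Fees are ignored.

```python
from fractions import Fraction


def taint_fraction(transactions, source, target):
    """Share of the coins received by `target` that trace back to `source`.

    Each transaction is (inputs, outputs), both lists of (address, amount),
    given in chronological order (hypothetical format).
    """
    received = {}  # address -> total amount received so far
    tainted = {}   # address -> part of that amount traced back to `source`

    for inputs, outputs in transactions:
        total_in = sum(amount for _, amount in inputs)
        tainted_in = Fraction(0)
        for address, amount in inputs:
            if address == source:
                tainted_in += amount  # coins from the source are fully tainted
            elif received.get(address):
                tainted_in += amount * tainted[address] / received[address]
        ratio = tainted_in / total_in if total_in else Fraction(0)
        for address, amount in outputs:
            received[address] = received.get(address, 0) + amount
            tainted[address] = tainted.get(address, Fraction(0)) + amount * ratio

    if not received.get(target):
        return Fraction(0)
    return tainted[target] / received[target]


txs = [
    ([("S", 2)], [("X", 2)]),            # S pays X
    ([("X", 2), ("Y", 2)], [("T", 4)]),  # X and Y together pay T
]
print(taint_fraction(txs, "S", "T"))  # 1/2
```

Here T is two transactions away from S, and half of what it received can be traced back to S.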
And then there's amount analysis and timing analysis. Amount analysis, as the name suggests, doesn't track specific transactions, but rather specific amounts. Similarly, timing analysis tracks specific times. If, for example, one input is exactly 2.6539924 bitcoin, and a seemingly unrelated output is exactly 2.6539924 bitcoin (minus fee) one block later, it suggests that the sending and receiving addresses belong to someone using some kind of mixer (see below).
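A minimal sketch of how amount and timing analysis might be combined is shown below. The function name, the `(txid, amount, block_height)` tuples, and the thresholds for the maximum fee and the maximum number of blocks are all assumptions chosen for illustration.

```python
def match_by_amount_and_time(deposits, payouts, max_fee=10_000, max_blocks=6):
    """Pair deposits and payouts whose amounts and block heights nearly match.

    Each item is (txid, amount_in_satoshis, block_height); hypothetical format.
    """
    matches = []
    for d_txid, d_amount, d_height in deposits:
        for p_txid, p_amount, p_height in payouts:
            fee = d_amount - p_amount   # payout should be slightly smaller
            delay = p_height - d_height  # payout should come shortly after
            if 0 <= fee <= max_fee and 0 < delay <= max_blocks:
                matches.append((d_txid, p_txid))
    return matches


deposits = [("d1", 265_399_240, 400_000), ("d2", 100_000_000, 400_000)]
payouts = [("p1", 265_389_240, 400_001), ("p2", 73_000_000, 400_003)]
print(match_by_amount_and_time(deposits, payouts))
# [('d1', 'p1')]
```

The 2.6539924 bitcoin deposit is linked to the payout one block later, because the amounts differ only by a small fee.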
What can be done to reclaim privacy?
Bitcoin privacy is still very much an arms race. While progress is being made to improve Bitcoin anonymity on one hand, possible methods to de-anonymize users are often established on the other. And while it is beyond the scope of this article to explore all potential future possibilities to improve anonymity, there are some basic methods to increase privacy on the Bitcoin network available right now.
One such straightforward solution is to use Tor or other methods to hide IP addresses. If Bitcoin transactions are transmitted over Tor, there is no way to determine where they originated (provided that Tor itself does as promised, of course).
Another basic solution to increase privacy is creating a new address for each transaction. Creating a new address for each transaction makes it harder to link addresses to real identities, as it would at the very least require more clustering to do so. An increasing number of Bitcoin wallets do this automatically using hierarchical deterministic (HD) wallet software.
A slightly more advanced method to gain privacy is the use of mixers. Mixers come in many shapes and forms, but they basically let everyone using the mixer receive each other's bitcoin. If done well, mixing counters the analysis of transaction networks as well as taint analysis. And for improved results, mixing can be repeated.
One example of such a mixing strategy is CoinJoin, which merges inputs from and outputs to several users into one transaction – breaking the assumption that all inputs belong to the same user. CoinJoin does not, however, remove all taint from a Bitcoin address, since the inputs and outputs are still connected to some degree.
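The shape of such a joint transaction can be sketched as below. This only models how inputs and outputs from several users end up in one transaction; it leaves out everything that makes real CoinJoin implementations work in practice, such as signing and coordination between participants. The function name, dictionary keys and the fixed `denomination` are assumptions for illustration.

```python
import random


def build_coinjoin(participants, denomination, rng=random):
    """Merge several users' inputs and outputs into one transaction.

    `participants` is a list of dicts with hypothetical keys:
    'input' (address, amount), 'mix_address' and 'change_address'.
    """
    inputs, outputs = [], []
    for p in participants:
        address, amount = p["input"]
        if amount < denomination:
            raise ValueError(f"{address} holds less than the denomination")
        inputs.append((address, amount))
        # Every participant receives one output of the same size.
        outputs.append((p["mix_address"], denomination))
        if amount > denomination:
            outputs.append((p["change_address"], amount - denomination))
    # Shuffle so that ordering does not reveal who owns which output.
    rng.shuffle(inputs)
    rng.shuffle(outputs)
    return inputs, outputs


users = [
    {"input": ("alice_in", 150), "mix_address": "alice_mix", "change_address": "alice_change"},
    {"input": ("bob_in", 100), "mix_address": "bob_mix", "change_address": "bob_change"},
    {"input": ("carol_in", 220), "mix_address": "carol_mix", "change_address": "carol_change"},
]
inputs, outputs = build_coinjoin(users, denomination=100, rng=random.Random(1))
print(inputs)
print(outputs)
```

The three equal outputs cannot be told apart by amount, but the change outputs have distinctive sizes that can still be matched to specific inputs, which is one way inputs and outputs remain connected to some degree.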
Alternatively, some mixers can remove all taint, as they return unrelated bitcoin from completely different addresses belonging to the mixer. However, these mixers are typically centralized, and as such will know the sending and receiving Bitcoin addresses belonging to users.
Additionally, to counter amount analysis, mixers can require all users to submit the same amount into the mix. Alternatively, mixing services can charge a random fee, making it harder for an outsider to link the amount of bitcoin sent to the amount returned. Furthermore, it's possible to break up the amount mixed, further obfuscating the coins, as smaller amounts are more easily lost in “the crowd” of transactions.
To counter timing analysis, moreover, mixers can wait a random amount of time before they send coins back; the wider the range of possible delays, the harder it becomes to link transactions. Furthermore, extending the mixing time increases the likelihood that mixing transactions are lost among normal transactions.
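The two countermeasures just described, breaking up the amount and waiting a random time, can be combined in a small sketch. The function name, its parameters, and the use of block counts as the unit of delay are assumptions made for illustration; it does not describe how any specific mixing service works.

```python
import random


def plan_payouts(amount, parts, max_delay_blocks, rng=random):
    """Split `amount` into `parts` random chunks, each with a random delay.

    Returns a list of (chunk_amount, delay_in_blocks) tuples.
    """
    if parts < 1 or amount < parts:
        raise ValueError("amount must be at least one unit per part")
    # Pick parts-1 distinct cut points so every chunk is strictly positive.
    cuts = sorted(rng.sample(range(1, amount), parts - 1))
    bounds = [0] + cuts + [amount]
    chunks = [high - low for low, high in zip(bounds, bounds[1:])]
    return [(chunk, rng.randint(1, max_delay_blocks)) for chunk in chunks]


plan = plan_payouts(265_399_240, parts=4, max_delay_blocks=144, rng=random.Random(7))
for chunk, delay in plan:
    print(f"send {chunk} satoshis after {delay} blocks")
```

The chunks always add up to the original amount, but none of them matches it, and the payouts are spread over time rather than arriving one block after the deposit.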
But in the end, Bitcoin privacy is still a sliding scale – not a binary problem. Rather than being either completely anonymous or not at all, Bitcoin users enjoy a certain level of privacy, depending on how much of their identity they reveal, which of the anonymizing techniques they apply, how many, and how often.
N.b.: For specific examples of mixing techniques, see the research paper cited below.
The article is largely based on 'Research on Anonymization and De-anonymization in the Bitcoin System', an ATR Defense Science & Technology Lab. paper by QingChun ShenTu and JianPing Yu from Bitbank Research Labs, published by Shenzhen University. Additional thanks go to Bitsquare developer Manfred Karrer and Blocktrail co-founder Jop Hartog for providing feedback on an earlier draft of this article.