PrivateData is marketed as a data privacy solution in
Hyperledger Fabric. Unfortunately, it is just another serious
security hole that has somehow gone under the radar, and all
projects using this feature are at risk. It amazes me that
nobody has mentioned this before, so I guess I had better
point it out now before more damage is done.
The logic behind private data is simple: it puts the data in
a local embedded data store and puts a hash of that data on
the blockchain.
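To make the model concrete, here is a toy sketch of that idea. It is
not Fabric code: `private_store`, `ledger`, and `put_private` are
hypothetical names, and SHA-256 is assumed as the hash function.

```python
import hashlib

# Hypothetical stand-ins, not Fabric APIs.
private_store = {}   # the local, off-chain data store
ledger = []          # the shared ledger every participant can read

def put_private(key: str, value: bytes) -> str:
    """Keep the value locally and publish only its hash."""
    digest = hashlib.sha256(value).hexdigest()  # SHA-256 is an assumption
    private_store[key] = value
    ledger.append({"key": key, "value_hash": digest})
    return digest

digest = put_private("trade-1", b"Bank of Example")
```

The value itself never reaches `ledger`, only its hash does. The rest
of this post is about why that is weaker than it looks.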
The issue is that a cryptographic hash is not an encryption
mechanism: the same data hashed by anyone using the same
hashing algorithm will always produce the same hash! This is
exactly what hash functions are designed for, and that’s why
we use hashes in digital signatures to allow anyone to
validate signed data. However, this also means that anyone can
“decrypt” the data behind the hash by launching a dictionary
attack: hash candidate values until one of them matches.
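A minimal illustration of that determinism, assuming SHA-256 and a
made-up bank name as the private value:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# The data owner publishes this hash.
published = sha256_hex(b"Bank of Example")

# Anyone else who guesses the same value gets the same hash,
# which confirms the guess without ever seeing the data.
guess = sha256_hex(b"Bank of Example")
wrong_guess = sha256_hex(b"Bank of Sample")

guess_confirmed = guess == published
wrong_guess_confirmed = wrong_guess == published
```

No key is involved anywhere, so there is nothing an outsider is
missing except the right guess.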
Hashing is cheap. The cost of each hash on a normal laptop
CPU core is about 3 microseconds, so I can create roughly 1
billion candidate hashes within one hour on a single laptop
CPU core and check whether they match the hashes on the
Hyperledger Fabric DLT. And I am just talking about using a
single core on my laptop, not even 50% of its processing
power.
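The arithmetic is easy to check, and the per-hash cost is easy to
measure on your own machine. The sketch below assumes SHA-256 and a
short, made-up payload; the measured figure will vary with hardware
and Python overhead.

```python
import hashlib
import time

# Figure quoted above: about 3 microseconds per hash.
SECONDS_PER_HASH = 3e-6
hashes_per_hour = 3600 / SECONDS_PER_HASH
print(f"{hashes_per_hour:,.0f} hashes per hour")  # 1,200,000,000 hashes per hour

def measure_seconds_per_hash(n: int = 200_000) -> float:
    """Time n SHA-256 hashes of a short payload on one core."""
    payload = b"trade-0000001"  # hypothetical short value
    start = time.perf_counter()
    for _ in range(n):
        hashlib.sha256(payload).digest()
    return (time.perf_counter() - start) / n

measured = measure_seconds_per_hash()
print(f"measured: {measured * 1e6:.2f} microseconds per hash")
```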
Why is it dangerous?
Because if an attacker is connected to a blockchain system,
the attacker likely knows the range of the data being hashed
(for example, the hashed data could be a trade ID, item name,
bank name, address, or cell phone number), so the attacker can
easily mount a dictionary attack to recover the true data
behind the hash.
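Here is a rough sketch of such an attack against a hashed phone
number. Everything in it is illustrative: the helper names, the
"555-010-" prefix, and the use of SHA-256 are assumptions, and the
candidate space is kept to 10,000 values so it finishes instantly.

```python
import hashlib
from typing import Iterable, Optional

def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def dictionary_attack(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash every candidate and return the one that matches, if any."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None

def phone_candidates(prefix: str = "555-010-") -> Iterable[str]:
    """Enumerate every number in a known (hypothetical) range."""
    for n in range(10_000):
        yield f"{prefix}{n:04d}"

# The hash the attacker reads from the ledger.
on_chain_hash = sha256_hex("555-010-4242")

recovered = dictionary_attack(on_chain_hash, phone_candidates())
print(recovered)  # 555-010-4242
```

Even a full 10-digit number space is only 10^10 candidates, which at
3 microseconds per hash is about 8 hours on a single core.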
How about adding a salt to each piece of data to be hashed?
Well, that’s one thing Hyperledger Fabric didn’t do. In their
defense, Hyperledger didn’t implement salts because it is
difficult to pass salts to counterparties. You can’t use the
DLT to pass the salt value to counterparties because attackers
would see it, so you have to create another p2p connection
with each counterparty and send it over.
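The trade-off can be sketched as follows. This is an illustration
only, not something Fabric does for you: `salted_hash` is a
hypothetical helper, and prepending a random 32-byte salt before
SHA-256 is just one simple way to salt a value.

```python
import hashlib
import secrets

def salted_hash(value: bytes, salt: bytes) -> str:
    """Hash the value together with a random salt (illustrative scheme)."""
    return hashlib.sha256(salt + value).hexdigest()

salt = secrets.token_bytes(32)          # must be kept off the ledger
on_chain = salted_hash(b"Bank of Example", salt)

# An attacker who guesses the value but lacks the salt gets no match.
attacker_match = hashlib.sha256(b"Bank of Example").hexdigest() == on_chain

# A counterparty can verify the data only if the salt reached them
# through some other channel.
counterparty_match = salted_hash(b"Bank of Example", salt) == on_chain
```

The salt defeats the dictionary attack, but only because it travels
outside the ledger, which is exactly the distribution problem
described above.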
If you already have a p2p connection with all the
counterparties, what’s the point of using blockchain in the
first place? Just send your data over! It’s just scary that so
many people are using this security hole and putting their
data in de facto clear text.
Sure, if the hashed data is big and unpredictable enough, it
would be harder to perform a dictionary attack, but you had
better be very careful before using this feature, because any
misuse will result in a data leak. It is sad that so many
people actually believe this is a problem solver.