Re: Major security hole in Hyperledger Fabric - Private Data is not private #fabric-chaincode #ssl #fabric #fabric-questions #fabric-dstorage
If you have a chaincode whose endorsement requires more than one organization, the chaincode executions on all endorsing peers must produce the same results. The hashes of the private data therefore have to use the same salt, which means the source of randomness most likely has to come from the client / SDK.
The client can pass this entropy via the transient map mechanism; however, this wasn't implemented (as you noted).
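To illustrate why the salt has to come from the client, here is a minimal sketch. It assumes unsalted-vs-salted SHA-256 over `salt + value` and uses a plain Python dict as a stand-in for the transient map. The functions `client_build_transient`, `endorser_hash`, and `endorser_hash_with_local_salt` are hypothetical and are not Fabric SDK or chaincode APIs.

```python
import hashlib
import secrets


def client_build_transient(value: bytes) -> dict:
    # Hypothetical client step: generate one fresh 32-byte salt and send it
    # with the private value in the transient map (a dict stands in for it here).
    return {"value": value, "salt": secrets.token_bytes(32)}


def endorser_hash(transient: dict) -> str:
    # Every endorser derives the same hash because the randomness
    # was chosen once, by the client, not independently by each peer.
    return hashlib.sha256(transient["salt"] + transient["value"]).hexdigest()


def endorser_hash_with_local_salt(value: bytes) -> str:
    # Counter-example: each peer picks its own salt, so endorsements diverge.
    salt = secrets.token_bytes(32)
    return hashlib.sha256(salt + value).hexdigest()


transient = client_build_transient(b"trade-42")
org1_result = endorser_hash(transient)
org2_result = endorser_hash(transient)
print(org1_result == org2_result)  # True
```

With `endorser_hash_with_local_salt`, two endorsers hashing the same value would almost certainly produce different results, and the transaction could not gather matching endorsements.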
I wouldn't call this a "security hole", but you are correct that it needs to be documented so that people who aren't well versed in security don't shoot themselves in the foot.
Would you like to make a PR to add this to https://github.com/hyperledger/fabric/blob/master/docs/source/private-data/private-data.md?
From: "Ivan Ch" <acizlan@...>
Date: 10/21/2019 04:21 PM
Subject: [EXTERNAL] [Hyperledger Fabric] Major security hole in Hyperledger Fabric - Private Data is not private #fabric #fabric-questions #fabric-dstorage #database #dstorage #dstorage-fabric #fabric-chaincode #ssl
Sent by: fabric@...
Private Data is marketed as a data privacy solution in Hyperledger Fabric. Unfortunately, it is another serious security hole that has somehow gone under the radar, and every project using this feature is at risk.
It amazes me that nobody has mentioned this before, so I had better point it out now before more damage is done.
The logic behind Private Data is simple: it puts the data in a local embedded data store and puts a hash of that data on the blockchain.
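The model described above can be sketched as a toy class. This is a hypothetical illustration, not Fabric's implementation: it assumes the ledger stores an unsalted SHA-256 of the value and ignores details such as how keys are represented on chain.

```python
import hashlib


class PrivateDataSketch:
    """Toy model (hypothetical): authorized peers keep the plaintext in a
    local side store, while the shared ledger only receives a hash."""

    def __init__(self) -> None:
        self.side_store: dict[str, bytes] = {}  # only on authorized peers
        self.ledger: dict[str, str] = {}        # visible to every channel member

    def put(self, key: str, value: bytes) -> None:
        # Plaintext stays local; only the deterministic hash is shared.
        self.side_store[key] = value
        self.ledger[key] = hashlib.sha256(value).hexdigest()


db = PrivateDataSketch()
db.put("trade1", b"ACME Bank")
print(db.ledger["trade1"])  # a 64-character hex digest anyone on the channel can read
```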
The issue is that a cryptographic hash is not an encryption mechanism: the same data hashed by anyone with the same hashing algorithm always produces the same hash. This is exactly what hash functions are designed for, and it is why we use hashes in digital signatures to let anyone validate signed data. However, it also means that anyone can “decrypt” the data behind a hash by launching a dictionary attack.
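A short sketch of the determinism point, assuming SHA-256 as the hash: no key is involved, so anyone holding a guess can check it against the on-chain value. The name `sha256_hex` is a hypothetical helper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    # Same input, same algorithm -> same digest, for anyone, anywhere.
    return hashlib.sha256(data).hexdigest()


on_chain = sha256_hex(b"Bank of Example")  # what every peer can read

# An outsider tests a guess by hashing it and comparing; nothing secret is needed.
guess = b"Bank of Example"
print(sha256_hex(guess) == on_chain)  # True
```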
Hashing is cheap: each hash costs about 3 microseconds on a normal laptop CPU core, so I can compute about 1 billion candidate hashes within one hour on a single core and check whether they match the hashes on the Hyperledger Fabric DLT. And that is just one core of my laptop, not even 50% of its processing power.
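The throughput figure follows from simple arithmetic: 3600 s / 3 µs ≈ 1.2 × 10⁹ hashes per hour per core. The sketch below reproduces that estimate and adds a rough way to measure the per-hash cost on your own machine. Both helper names are hypothetical, and the 3 µs value is the figure quoted above, not a measurement.

```python
import hashlib
import time


def estimated_hashes_per_hour(seconds_per_hash: float) -> float:
    # Convert a per-hash cost into hourly throughput on one core.
    return 3600 / seconds_per_hash


def measure_seconds_per_hash(n: int = 200_000) -> float:
    # Rough single-core timing of SHA-256 over short inputs.
    start = time.perf_counter()
    for i in range(n):
        hashlib.sha256(str(i).encode()).digest()
    return (time.perf_counter() - start) / n


print(f"{estimated_hashes_per_hour(3e-6):.2e}")  # 1.20e+09 with the 3 µs figure
print(f"{estimated_hashes_per_hour(measure_seconds_per_hash()):.2e}")  # varies by machine
```

Optimized SHA-256 on short inputs is often faster than 3 µs per hash, so this estimate is more likely to understate an attacker's rate than overstate it.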
Why is this dangerous? Because an attacker connected to the blockchain network likely knows the range of the data being hashed (for example, trade IDs, item names, bank names, addresses, or cell phone numbers), so it is easy to build a dictionary attack that recovers the true data behind the hash.
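A minimal dictionary attack over a known range, assuming unsalted SHA-256 and a phone number whose prefix the attacker already knows. The helpers `dictionary_attack` and `phone_candidates` and the number format are hypothetical; the point is that a 10,000-candidate space is exhausted in a fraction of a second.

```python
import hashlib
from typing import Iterable, Iterator, Optional


def dictionary_attack(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    # Hash each plausible value and compare with the digest seen on the ledger.
    for candidate in candidates:
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None


def phone_candidates(prefix: str, digits: int) -> Iterator[str]:
    # Attacker knows the prefix and enumerates the remaining digits.
    for n in range(10 ** digits):
        yield f"{prefix}{n:0{digits}d}"


secret = "+1-555-010-4821"
on_chain = hashlib.sha256(secret.encode()).hexdigest()
recovered = dictionary_attack(on_chain, phone_candidates("+1-555-010-", 4))
print(recovered)  # +1-555-010-4821
```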
How about adding a salt to each value before hashing? Well, that’s one thing Hyperledger Fabric didn’t do. In its defense, Hyperledger didn’t implement salting because it is difficult to pass salts to counterparties. You can’t use the DLT to pass the salt value, because attackers would see it, so you have to create another p2p connection with the counterparty and send it over that.
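For contrast, a sketch of how a salt changes the picture, assuming a random 32-byte salt that has already been shared off-ledger with the counterparty (exactly the distribution problem described above). `salted_hash` is a hypothetical helper. Without the salt, the same candidate enumeration finds nothing; the counterparty holding the salt can still verify the value.

```python
import hashlib
import secrets


def salted_hash(salt: bytes, value: str) -> str:
    # Fixed-length salt prepended to the value before hashing.
    return hashlib.sha256(salt + value.encode()).hexdigest()


secret = "+1-555-010-4821"
salt = secrets.token_bytes(32)  # assumed to be exchanged off-ledger
on_chain = salted_hash(salt, secret)

# An attacker who does not know the salt hashes candidates directly: no match.
unsalted_match = any(
    hashlib.sha256(f"+1-555-010-{n:04d}".encode()).hexdigest() == on_chain
    for n in range(10_000)
)
print(unsalted_match)  # False

# The counterparty that received the salt can still verify the value.
print(salted_hash(salt, secret) == on_chain)  # True
```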
If you already have p2p connections with all the counterparties, what’s the point of using a blockchain in the first place? Just send your data over! It’s scary that so many people are using this insecure feature and putting their data in de facto clear text.
Sure, if the hashed data is large and unpredictable enough, a dictionary attack becomes harder, but you had better be very careful before using this feature, because any misuse will result in a data leak. It is sad that so many people actually believe this is a problem solver.
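What makes the data "big" enough is the number of plausible values an attacker must try, not its length in bytes. A rough single-core estimate, reusing the 3 µs-per-hash figure from the post as an assumption; `brute_force_hours` is a hypothetical helper.

```python
def brute_force_hours(candidate_count: int, seconds_per_hash: float = 3e-6) -> float:
    # Worst-case time to hash every candidate on one core.
    return candidate_count * seconds_per_hash / 3600


print(brute_force_hours(10 ** 10))  # every 10-digit phone number: about 8.3 hours
print(brute_force_hours(2 ** 128) / (24 * 365))  # a uniformly random 128-bit value: ~3e25 years
```

A long but predictable field (a trade ID with a fixed format, a bank name from a short list) falls in the first category, which is why size alone is not a safe guide.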