Major security hole in Hyperledger Fabric - Private Data is not private #fabric-chaincode #ssl #fabric #fabric-questions #fabric-dstorage


Ivan Ch <acizlan@...>
 

PrivateData is marketed as a data privacy solution in Hyperledger Fabric. Unfortunately, this is just another serious security hole that has somehow gone under the radar, and all projects using this feature are at risk.

It amazes me that nobody has mentioned this before, so I guess I had better point it out now before more damage is done.

The logic behind private data is simple: it puts the data in a local embedded data store and puts a hash of that data on the blockchain.

The issue is that a cryptographic hash is not an encryption mechanism: the same data hashed by anyone using the same hashing algorithm will always give the same hash! This is exactly what hash functions are designed for, and it is why we use hashes in digital signatures to allow anyone to validate signed data. However, it also means that anyone can “decrypt” the data behind the hash by launching a dictionary attack.

Hashing is cheap. Each hash costs about 3 microseconds on a normal laptop CPU core, so I can generate 1 billion candidate hashes within one hour on a single laptop core and check whether they match the hashes on the Hyperledger Fabric DLT. And I am just talking about a single core of my laptop, not even 50% of its processing power.

Why is it dangerous? Because if an attacker is connected to a blockchain system, the attacker likely knows the range of the data being hashed (for example, the hashed data could be a trade ID, item name, bank name, address, or cell phone number), so the attacker can easily mount a dictionary attack to get the true data behind the hash.
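
The attack described above can be sketched in a few lines. This is a minimal illustration, not Fabric code: it assumes the on-chain value is a plain SHA-256 hash of an unsalted phone number, and the `555-NNNN` format, the sample value, and the function names are all hypothetical. At the quoted 3 microseconds per hash, 1 billion candidates take about 3,000 seconds (50 minutes), so the 10,000 candidates tried here are effectively instant.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates):
    """Return the first candidate whose hash equals target_hash, else None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical on-chain hash of an unsalted private value (a phone number).
on_chain_hash = sha256_hex("555-0142")

# The attacker knows the value has the form 555-NNNN, so the whole
# search space is only 10,000 candidates.
candidates = (f"555-{n:04d}" for n in range(10_000))
recovered = dictionary_attack(on_chain_hash, candidates)
print(recovered)  # prints 555-0142
```

The attacker never inverts the hash; the small, guessable input space does all the work.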

How about adding a salt to each piece of data to be hashed? Well, that’s one thing Hyperledger Fabric didn’t do. In their defense, Hyperledger didn’t implement salting because it is difficult to pass salts to counterparties. You can’t use the DLT to pass the salt value to counterparties because attackers would see it, so you have to create another p2p connection with each counterparty and send it over.

If you already have p2p connections with all the counterparties, what’s the point of using blockchain in the first place? Just send your data over! It’s just scary that so many people are using this security hole and putting their data in de facto clear text.

Sure, if the hashed data is big enough then it would be harder to perform a dictionary attack, but you had better be very careful before using this feature because any misuse will result in a data leak. It is sad that so many people actually believe this is a problem solver.


Yacov
 

Hi Ivan.

If you have a chaincode that requires more than one organization to endorse the transaction, the chaincode executions on the different organizations' peers need to produce the same results, so the hashes of the private data have to use the same salt, which means their source of randomness most likely has to come from the client / SDK.

The client can pass this entropy via the transient map mechanism; however, this wasn't implemented (as you noted).

I wouldn't say that this is a "security hole", but you are correct that this needs to be documented so that people who aren't educated about security don't shoot themselves in the foot.

Would you like to make a PR to add this to https://github.com/hyperledger/fabric/blob/master/docs/source/private-data/private-data.md?


- Yacov.



From:        "Ivan Ch" <acizlan@...>
To:        fabric@...
Date:        10/21/2019 04:21 PM
Subject:        [EXTERNAL] [Hyperledger Fabric] Major security hole in Hyperledger Fabric - Private Data is not private #fabric #fabric-questions #fabric-dstorage #database #dstorage #dstorage-fabric #fabric-chaincode #ssl
Sent by:        fabric@...





Senthil Nathan
 

Hi Ivan,

    Thank you for bringing this up. We discussed including a salt in the private data design document --
https://docs.google.com/document/d/1ShrgrYPWLznZSZrl5cnvmFq9LtLJ3tYUxjv9GN6rxuI/edit?usp=sharing
(please refer to section 2.6 Additional Consideration -- Salt Consideration).
We do have a JIRA for the same as well -- https://jira.hyperledger.org/browse/FAB-5101 -- but didn't implement
it, as we decided to leave it to the user for now (also for simplicity & flexibility).

    The salt can always be added to the data by the client which submits the transaction proposal. For example,
in the following JSON content, there can be an additional field called salt, and the user can add any random data
to avoid a dictionary attack.
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  },
  "salt": "88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f031589"
}}

The same can be done for the keys too, not just values. As far as I know, many developers who use private data
follow this approach. I agree that a few might be unaware of this. As Yacov mentioned, we should add this approach
to our doc.
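
A minimal sketch of the client-side salting described above, assuming the private value is a JSON object that gets a fresh random `salt` field before submission. The helper names (`add_salt`, `value_hash`), the canonical serialization, and the use of SHA-256 are assumptions for illustration; nothing here calls a Fabric SDK.

```python
import hashlib
import json
import secrets


def add_salt(record: dict) -> dict:
    """Return a copy of record with a fresh 32-byte random salt (hex-encoded)."""
    salted = dict(record)
    salted["salt"] = secrets.token_hex(32)
    return salted


def value_hash(record: dict) -> str:
    """Hash a canonical JSON serialization of the record with SHA-256."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


record = {"id": "file", "value": "File"}
first = add_salt(record)
second = add_salt(record)

# Same underlying data, different salts: the hashes differ, so a
# dictionary of unsalted candidates no longer matches either of them.
print(value_hash(first) != value_hash(second))
print(value_hash(first) != value_hash(record))
```

The salt only helps if it is long, random, and kept with the private data rather than published alongside the hash.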

Regards,
Senthil




David Enyeart
 

Thanks for replying, Yacov and Senthil. You're right that since the introduction of private data, Fabric recommends that private data be salted to avoid dictionary attacks. As this thread makes clear, not everybody knows about the private data solution design considerations. I've opened Jira issue https://jira.hyperledger.org/browse/FAB-16885 to enhance the documentation with these considerations.


Dave Enyeart

"Senthil Nathan" ---10/21/2019 09:58:56 AM---Hi Ivan, Thank you for bringing this. We have discussed about including salt in

From: "Senthil Nathan" <cendhu@...>
To: Ivan Ch <acizlan@...>
Cc: fabric@...
Date: 10/21/2019 09:58 AM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Major security hole in Hyperledger Fabric - Private Data is not private #fabric #fabric-questions #fabric-dstorage #database #dstorage #dstorage-fabric #fabric-chaincode #ssl
Sent by: fabric@...







Ivan Ch <acizlan@...>
 

Thanks for the reply,

but I think you guys are downplaying the seriousness of this issue.

If you add a salt, then the salt must be passed to others so that they can validate.

To prevent others from launching a dictionary attack, you must (in your implementation) force peers to use private point-to-point connections to send the hash, otherwise you may create another security hole.

Plus, forcing p2p connections among participants would literally destroy the purpose of blockchain.

This functionality needs to change its name to something like "chain hash" to keep others from falsely believing that it is a data privacy functionality. I know there must be marketing concerns behind calling it "private data", but you guys need to be responsible.



Gari Singh <garis@...>
 

I think you might have missed one of the points on how you can actually pass in a salt value to all endorsing peers.
Proposal (endorsement) requests have a "transient" field which can be used. The value of this field can be extracted in chaincode and used to salt the data. It is never persisted in the actual ledger itself.
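
A rough simulation of that flow, assuming the client generates one salt and sends it to every endorser in the transient part of the proposal. In Go chaincode the transient map is read through the shim's `GetTransient`; the `endorse` function, the argument names, and the explicit SHA-256 step below are hypothetical stand-ins for what the chaincode and peer would do, not Fabric APIs.

```python
import hashlib
import secrets


def endorse(args: dict, transient: dict) -> dict:
    """Simulate one endorsing peer: salt the private value and return
    only what would become public (the key and the hash of the value)."""
    salt = transient["salt"]  # bytes supplied by the client, never written to the ledger
    value = args["value"].encode("utf-8")
    digest = hashlib.sha256(salt + value).hexdigest()
    return {"key": args["key"], "value_hash": digest}


# The client generates the salt once and sends the same transient data
# to every endorser.
transient = {"salt": secrets.token_bytes(32)}
args = {"key": "trade-42", "value": "555-0142"}

result_org1 = endorse(args, transient)
result_org2 = endorse(args, transient)

# Both endorsers compute identical results, and the salt itself does not
# appear in what they return.
print(result_org1 == result_org2)
```

Because the randomness comes from the client rather than from each peer, the endorsement results stay deterministic across organizations.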

-----------------------------------------
Gari Singh
Distinguished Engineer, CTO - IBM Blockchain
IBM Middleware
550 King St
Littleton, MA 01460
Cell: 978-846-7499
garis@...
-----------------------------------------

-----fabric@... wrote: -----
To: fabric@...
From: "Ivan Ch"
Sent by: fabric@...
Date: 10/22/2019 05:23AM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Major security hole in Hyperledger Fabric - Private Data is not private #fabric #fabric-questions #fabric-dstorage #database #dstorage #dstorage-fabric #fabric-chaincode #ssl



Yacov
 

Hey Ivan.

Private data is disseminated in a point-to-point manner among peers even now.
The peers that possess the private data send the hash pre-images to the peers that don't (but are eligible to receive them), and the receiving peers validate that the hash pre-images indeed correspond to the hashes on the public block.

I don't see any technical obstacle that prevents you from adding a salt per collection name for a given transaction, which will be concatenated into the computation of the hash of the key and the value for the said collection.
The salt can be part of the data element that is generated at the time of chaincode invocation, and will be passed along with the private data itself.

I don't agree that point-to-point connections defeat the purpose of the blockchain, as all this point-to-point data that is kept off-chain can be easily and efficiently verified if needed, since its value is bound to the public blocks.
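
The verification step a receiving peer performs can be sketched as follows. This is an illustration under assumptions, not the peer's actual code: the pre-image is taken to be salt followed by value, the hash is SHA-256, and `matches_public_hash` is a hypothetical helper name.

```python
import hashlib
import hmac
import secrets


def matches_public_hash(pre_image: bytes, public_hash_hex: str) -> bool:
    """Check that a pre-image received off-chain hashes to the public value."""
    computed = hashlib.sha256(pre_image).hexdigest()
    return hmac.compare_digest(computed, public_hash_hex)


# The sending side: salted private value and the hash that goes on the block.
salt = secrets.token_bytes(32)
pre_image = salt + b"trade-42:100 units"
public_hash = hashlib.sha256(pre_image).hexdigest()

# The receiving side: accept the genuine pre-image, reject a modified one.
print(matches_public_hash(pre_image, public_hash))
print(matches_public_hash(salt + b"trade-42:900 units", public_hash))
```

The check shows only that the received data is what was committed to the block; whether that data is true is a separate question for the application.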

- Yacov.





David Enyeart
 

Thanks again Ivan for pointing out the documentation hole - here's the doc update that describes how private data is secured:
https://hyperledger-fabric.readthedocs.io/en/latest/private-data-arch.html#protecting-private-data-content


Dave Enyeart

"David Enyeart" ---10/21/2019 11:03:49 AM---Thanks for replying Yacov and Senthil. You're right that since the introduction of private data, Fa

From: "David Enyeart" <enyeart@...>
To: "Senthil Nathan" <cendhu@...>
Cc: Ivan Ch <acizlan@...>, fabric@...
Date: 10/21/2019 11:03 AM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Major security hole in Hyperledger Fabric - Private Data is not private #fabric #fabric-questions #fabric-dstorage #database #dstorage #dstorage-fabric #fabric-chaincode #ssl
Sent by: fabric@...







Brian Behlendorf <bbehlendorf@...>
 

Lemons into lemonade. Thanks David and others who turned this from flame war kindling to a positive outcome.

Brian

On 10/22/19 8:28 AM, David Enyeart wrote:



-- 
Brian Behlendorf
Executive Director, Hyperledger
bbehlendorf@...
Twitter: @brianbehlendorf


Ivan Ch <acizlan@...>
 

Hi Yacov, 

Thanks for your reply. Let me clarify the jargon here so more people can understand.

pre-image: the data itself and its salt
"Private data is disseminated in a point to point manner among peers even now. The peers that possess the private data send the hash pre-images to the peers that don't (but are eligible to receive them), and the receiving peers validate that the hash pre-images indeed correspond to the hashes on the public block.

I don't see any technical obstacle that prevents you from adding a salt per collection name for a given transaction, which will be concatenated into the computation of the hash of the key and the value for the said collection.
The salt can be part of the data element that is generated at the time of chaincode invocation, and will be passed along with the private data itself."
First of all, I appreciate that you agree that another point-to-point connection must be established between orgs to pass the salt and the image itself; anything on chain can be used to launch a pre-image/dictionary attack.

Of course there is no technical obstacle to creating a salt, but the issue here is that it creates a false sense that the data is private and can be validated. Let me explain:

You try to argue that the salted hash on the public chain is proof that some data is "valid". This itself is a terrible argument, because a hash (unlike a digital signature or homomorphic encryption) is not something that others can verify at the time the data (hash) is put on the public chain.

Here is an example: my national ID is "1234567", but I am a bad guy and want others to believe that my national ID number is "7654321". So I put the false hash(salt, "7654321") on chain, and then send the pre-image (salt, "7654321") to whoever I want to convince. Since nobody can verify hash(salt, "7654321") when the hash is put on chain without prior knowledge of the data, an adversary can use the claims about the private data functionality to trick people into believing forged data.

My point is that the claims about private data mislead people into believing that this feature will help orgs either to protect data or to validate pre-existing data. Neither is true, and the feature can easily be used by an adversary to decode data (if there is no salt or the salt is known) or to trick people into believing wrong data, as in the example above.












Jay Guo
 

I don't think that's a valid example for private data. Private data can only prevent your actual ID from being read by unauthorized parties; whether that ID is valid or not is really up to your application to decide. If someone is simply allowed to put arbitrary data on chain without proving it, I'd say that's a problem with the application design rather than with Private Data in Fabric.

Hopefully this makes sense
- J


Yacov
 

Hi Ivan.

> you try to argue that the salted hash on the public chain is a proof that some data is "valid". this itself is a terrible argument because hashes (unlike digital signature, homomorphic encryption) is not something that others can verify when the data (hash) it put on public chain.

No, that's not what I am arguing.
I said: "the receiving peers validate the hash pre-images indeed correspond to the hashes on the public block",
which means that they do just that: ensure that the hash pre-image of the private data corresponds to the hash in the public block.

That's what private data is: a means for several organizations to send each other information without putting it on the blockchain, while still binding the data to the blockchain for non-repudiation of the fact that the data was put there (not of any other real-world or business facts, as in your example).
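As a rough illustration of the check described above, here is a minimal Python sketch of a receiving peer comparing a pre-image against an on-chain hash. The function names, the length-prefixed salt layout, and the use of SHA-256 are assumptions made for illustration; this is not Fabric's actual code or wire format.

```python
import hashlib
import hmac


def commitment(salt: bytes, value: bytes) -> bytes:
    """SHA-256 over a length-prefixed salt followed by the value (assumed layout)."""
    return hashlib.sha256(len(salt).to_bytes(4, "big") + salt + value).digest()


def matches_on_chain_hash(on_chain_hash: bytes, salt: bytes, value: bytes) -> bool:
    """True if the received pre-image (salt, value) hashes to the on-chain value."""
    return hmac.compare_digest(on_chain_hash, commitment(salt, value))


# Hypothetical example values, not taken from any real ledger.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")
on_chain_hash = commitment(salt, b"7654321")

# The pre-image that was actually committed passes the check.
ok = matches_on_chain_hash(on_chain_hash, salt, b"7654321")
# Any other value fails it.
tampered = matches_on_chain_hash(on_chain_hash, salt, b"1234567")
```

A passing check only shows that the received data is the data that was committed on chain. It says nothing about whether that data is true in the real world, which is the distinction drawn in this reply.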


> first of all, I appreciate you agree that another point 2 point connection must be established between orgs to pass the salt and the image itself, anything on chain can be used to launch pre-image/dictionary attack

Well, this is already what is done now. This is how private data works in Fabric:
  • You (as the user) can put hashes of salted data on the blockchain.
  • The data is disseminated over a secure point-to-point connection between peers that are eligible to receive it.


- Yacov





Alexandre Pauwels
 

Hey Ivan,
Correct me if I'm wrong, but it seems you think that private data as implemented is flawed, and that the requirement to salt the data to secure it defeats the purpose of having the blockchain in the middle; again, let me know if I have misread your thinking. However, the private-data store (which I'll call the pre-image store) and the chain of hashes (which I'll call the block store) exist for parallel but complementary reasons.

The block store cannot exist on its own, as it stores no useful data that can be acted upon. It is simply a list of updates to salted hashes.

The pre-image store cannot exist on its own because, when you receive new information, you have no idea whether the person giving it to you is giving you the same information that everyone else has. The purpose of the chain of hashes is to ensure that the plain-text information you have is the same copy of the plain-text information that everybody else has.

Ensuring that the data initially placed on the chain is accurate is NOT something determined by either data storage method; it is determined by the logic in your chaincode. E.g., in your example, you would be unable to send an update claiming your national ID is "7654321" in the first place, as the government that wrote the chaincode you are calling would not allow you to do so. A better example would be to say that you are a bad actor and would like to fool someone into thinking you are the individual with ID "7654321". You would give them your public cert and your claimed ID along with a salt, and they would be unable to verify it: when they query for the national ID by the cert and hash your claimed ID with the salt you gave, the hashes would not match.

Hope that makes sense,
Alex



Ivan Ch <acizlan@...>
 

Hi Alexandre, Yacov

Thanks for your reply, I appreciate the discussion. My hands are tied right now, so I will give my full response later today.

Yes, my point is that the private data design may be flawed in two ways. The first is fixable by adding a salt and then using a point-to-point connection to send the pre-image data to the intended recipient.

However, the second issue is more fundamental and may be difficult to solve. In short, the private data design only works if all participants are honest. Maybe I should have used something that is not fixed the way a national ID is, such as a "trade ID", in my earlier example. (I am still trying to avoid real-life examples here, as it may give bad guys a chance to look.)

cheers

Ivan


David Enyeart
 

Your second point is not specific to private data. Agreement on input data needs to be part of the application design, regardless of whether it is a private data scenario or not. For example the smart contract may require that each of the transactors submit their approval of a proposed data change on chain, before a final transaction verifies the approvals are in place and makes the change on chain.
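The approval flow described above could be sketched roughly as follows. This is plain Python that only models the logic; it is not chaincode and uses no Fabric API. The class name, field names, and org names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedChange:
    """A pending data change that needs approval from every listed org."""
    data_hash: str
    required_orgs: frozenset
    approvals: set = field(default_factory=set)

    def approve(self, org: str) -> None:
        # Only parties to the change may record an approval.
        if org not in self.required_orgs:
            raise PermissionError(f"{org} is not a party to this change")
        self.approvals.add(org)

    def can_commit(self) -> bool:
        # The final transaction succeeds only once every party has approved.
        return self.approvals >= self.required_orgs


change = ProposedChange(
    data_hash="placeholder-hash",  # hypothetical value
    required_orgs=frozenset({"Org1", "Org2"}),
)
change.approve("Org1")
ready_after_one = change.can_commit()
change.approve("Org2")
ready_after_all = change.can_commit()
```

In this model a single party cannot get its own version of the data accepted, because the change is applied only after all transactors have approved the same hash.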


Dave Enyeart

Ivan Ch <acizlan@...>
 

Dave, Yacov, and Alex

It seems that the general response to this scenario is “this is an application design problem and should be solved by chaincode”.

Here is the example again: my national ID is "1234567", but I am a bad guy and want others to believe that my national ID number is "7654321". So I put the false hash(salt, "7654321") on chain, and then send the pre-image (salt, "7654321") to whoever I want to convince. Since nobody can verify hash(salt, "7654321") when it is put on chain without prior knowledge of the data, an adversary can use the claims about private data functionality to trick people into believing forged data.

But my argument is that chaincode design can’t solve this problem, and I can assure you that a large number of DLT deployments are at risk because of this.

As I stated earlier, hashes cannot be verified by third parties the way a digital signature or a ZKP can. There is almost no way to stop adversaries from putting fake data on chain and then tricking others into believing the fake data is real.

Since chaincode can’t decode hashes, the only thing chaincode can do is limit the number of updates. In most financial use cases (e.g. trade transactions) this is irrelevant, since the pre-image data is not constant in the first place. Even for constant data such as the “national ID” in the aforementioned scenario, chaincode will most likely still allow at least a few updates to cover typos.

Leaving it to applications is easier said than done, since there are so few ways to get it right; this functionality simply opens the door for attackers and yet offers almost nothing.

This bug is neither an application design issue nor a Fabric implementation issue, but a problem with the methodology that the private data feature promotes. My humble recommendation is to deprecate this functionality, or at least put up warning signs for people who still plan to use it.


Senthil Nathan
 

Hi Ivan,

  

    As far as I know, a blockchain/DLT platform itself does not claim to detect fake data. However, one may build an application on top of a blockchain to detect fake data. A real-world example -- https://www.coindesk.com/new-york-times-confirms-its-using-blockchain-to-combat-fake-news


    Detecting fake data is a hard problem to solve. An overview of ongoing research can be found here -- https://www.dropbox.com/s/pwoqrlfcyhw13pc/CombatingFakeNews.pdf?dl=0


Regards,

Senthil



David Enyeart
 

You are essentially suggesting to add a warning that private data content can't be known by non-members of the collection. That is the whole point of private data, and anybody considering an implementation will already know this. The non-members only validate against a hash of the data. The members can later share the private data content with non-members if a need-to-know arises, and the non-member can then validate the pre-image content against the hash on chain, with an understanding that only the group of transactors may have come to agreement on the data. This is the fundamental design of private data. Like any feature, it will be fit for some use cases and not fit for others. I believe these considerations were already obvious, but hopefully this thread has provided some clarification. I am glad the thread has at least helped to improve the documentation around the importance of including a salt in your private data if it is predictable, to keep it secure.


Dave Enyeart

Ivan Ch <acizlan@...>
 

Oct 21   

PrivateData is marketed as a data privacy solution in Hyperledger Fabric. Unfortunately, this is just another serious security hole that somehow went under the radar, and all projects using this function are at risk.

It amazes me that nobody has mentioned this before, so I guess I had better point it out now before more damage is done.

The logic behind Private Data is simple: it puts the data in a local embedded data store and puts a hash of that data on the blockchain.

The issue is that a cryptographic hash is not an encryption mechanism: the same data hashed by anyone using the same hashing algorithm will always give the same hash! This is exactly what hash functions are designed for, and that’s why we use hashes in digital signatures to allow anyone to validate signed data. However, this also means that anyone can “decrypt” the data behind the hash by launching a dictionary attack.

Hashing is cheap. The cost of each hash on a normal laptop CPU core is about 3 microseconds, so I can create 1 billion candidate hashes within one hour on a single laptop CPU core and check whether they match the hashes on the Hyperledger Fabric DLT. And I am just talking about a single CPU core on my laptop, not even 50% of its processing power.

Why is it dangerous? Because if an attacker is connected to a blockchain system, the attacker likely knows the range of the data being hashed (for example, the hashed data could be a trade ID, item name, bank name, address, or cell phone number), so the attacker can easily mount a dictionary attack to get the true data behind the hash.
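To make the attack concrete, here is a minimal Python sketch of a dictionary attack against an unsalted hash. It assumes SHA-256 and a hypothetical 7-digit ID as the hidden value; the on-chain hash is computed locally for the demo rather than read from any ledger. It also works through the timing estimate above: 1 billion hashes at 3 microseconds each is 3000 seconds, or 50 minutes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def dictionary_attack(target_hash: str, candidates):
    """Return the first candidate whose hash equals target_hash, else None."""
    for candidate in candidates:
        if sha256_hex(candidate.encode()) == target_hash:
            return candidate
    return None


# Hypothetical on-chain hash of an unsalted 7-digit ID (computed here for the demo).
on_chain_hash = sha256_hex(b"1234567")

# The attacker knows the value is a 7-digit number, so there are only 10**7 candidates.
candidates = (f"{n:07d}" for n in range(10**7))
recovered = dictionary_attack(on_chain_hash, candidates)

# Back-of-the-envelope cost, using the 3 microseconds per hash quoted in the post.
hashes = 10**9
microseconds_per_hash = 3
minutes_for_billion = hashes * microseconds_per_hash / 10**6 / 60
```

With a random salt of sufficient length mixed into the hashed data, the candidate space is no longer enumerable and this loop stops being practical.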

How about adding a salt to each piece of data to be hashed? Well, that’s one thing Hyperledger Fabric didn’t do. In their defense, Hyperledger didn’t implement salting because it is difficult to pass salts to counterparties. You can’t use the DLT to pass the salt value to counterparties because attackers would see it, so you have to create another p2p connection with the counterparty and send it over.

If you already have a p2p connection with all the counterparties, what’s the point of using a blockchain in the first place? Just send your data over! It’s scary that so many people are using this security hole and putting their data in de facto clear text.

Sure, if the hashed data is big enough it would be harder to perform a dictionary attack, but you had better be very careful before using this feature, because any misuse will result in a data leak. It is sad that so many people actually believe this is a problem solver.

 

 
 Yacov
Oct 21   

Hi Ivan.

If you have a chaincode that requires more than one organization to endorse the transaction, the chaincode executions on all endorsing peers need to produce the same results, so the hashes of the private data have to use the same salt, which means the source of randomness most likely has to come from the client / SDK.

The client can pass this entropy via the transient map mechanism; however, this wasn't implemented (as you noted).
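The determinism requirement can be illustrated with a small Python simulation. The `transient` dictionary below is only a stand-in for the transient map, and both endorser functions are hypothetical; no Fabric SDK calls are involved, and SHA-256 with a length-prefixed salt is an assumed layout.

```python
import hashlib
import secrets


def salted_hash(salt: bytes, value: bytes) -> str:
    # Length-prefix the salt so salt and value cannot be confused when concatenated.
    return hashlib.sha256(len(salt).to_bytes(4, "big") + salt + value).hexdigest()


def endorse_with_client_salt(transient: dict) -> str:
    """Simulated endorser that uses the salt supplied by the client."""
    return salted_hash(transient["salt"], transient["value"])


def endorse_with_local_salt(transient: dict) -> str:
    """Simulated endorser that draws its own randomness (non-deterministic)."""
    return salted_hash(secrets.token_bytes(32), transient["value"])


# The client generates the salt once and sends it to every endorser.
transient = {"salt": secrets.token_bytes(32), "value": b"trade-42"}

org1 = endorse_with_client_salt(transient)
org2 = endorse_with_client_salt(transient)

local1 = endorse_with_local_salt(transient)
local2 = endorse_with_local_salt(transient)
```

Here `org1` and `org2` agree because both endorsers hash the same client-supplied salt, while `local1` and `local2` differ, which is why peer-generated randomness would break endorsement.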

I wouldn't say that this is a "security hole", but you are correct that this needs to be documented so people that aren't educated about security will not shoot themselves in the foot.

Would you like to make a PR to add this to https://github.com/hyperledger/fabric/blob/master/docs/source/private-data/private-data.md?


- Yacov.







 

 
 Senthil Nathan
Oct 21   

Hi Ivan,
 
    Thank you for bringing this up. We discussed including a salt in the private data design document --
https://docs.google.com/document/d/1ShrgrYPWLznZSZrl5cnvmFq9LtLJ3tYUxjv9GN6rxuI/edit?usp=sharing
(please refer to section 2.6 Additional Consideration -- Salt Consideration).
We also have a JIRA for it -- https://jira.hyperledger.org/browse/FAB-5101 but didn't implement
it, as we decided to leave it to the user for now (also for simplicity & flexibility).
 
    A salt can always be added to the data by the client that submits the transaction proposal. For example,
the following JSON content can carry an additional field called salt, in which the user puts random data
to avoid a dictionary attack.
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  },
  "salt": "88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f031589"
}}
 
The same can be done for the keys too, not just values. As far as I know, many developers who use private data
follow this approach. I agree that a few might be unaware of this. As Yacov mentioned, we should add this approach
to our doc.
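On the client side, adding such a field might look like the sketch below. The helper name `add_salt` and the sample record are hypothetical, and where the salt field sits in the JSON is an application choice; the only point is that the salt comes from a cryptographically secure random source and is fresh for every record.

```python
import json
import secrets


def add_salt(record: dict, nbytes: int = 32) -> dict:
    """Return a copy of record with a fresh random hex-encoded salt field."""
    salted = dict(record)
    salted["salt"] = secrets.token_hex(nbytes)  # 2 hex characters per byte
    return salted


# Hypothetical private data record.
record = {"menu": {"id": "file", "value": "File"}}

# Serialize with sorted keys so the same record and salt always give the same bytes.
payload = json.dumps(add_salt(record), sort_keys=True)
```

The original `record` is left unchanged, and each call to `add_salt` produces a different salt, so two records with identical content no longer hash to the same value.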
 
Regards,
Senthil
 
 David Enyeart
Oct 21   

Thanks for replying, Yacov and Senthil. You're right that since the introduction of private data, Fabric has recommended that private data be salted to avoid dictionary attacks. As this thread makes clear, not everybody knows about the private data solution design considerations. I've opened Jira issue https://jira.hyperledger.org/browse/FAB-16885 to enhance the documentation with these considerations.


Dave Enyeart

"Senthil Nathan" ---10/21/2019 09:58:56 AM---Hi Ivan, Thank you for bringing this. We have discussed about including salt in

From: "Senthil Nathan" <cendhu@...>
To: Ivan Ch <acizlan@...>
Cc: fabric@...
Date: 10/21/2019 09:58 AM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Major security hole in Hyperledger Fabric - Private Data is not private #fabric #fabric-questions #fabric-dstorage #database #dstorage #dstorage-fabric #fabric-chaincode #ssl
Sent by: fabric@...





Hi Ivan,

    Thank you for bringing this. We have discussed about including salt in the private data design document --
https://docs.google.com/document/d/1ShrgrYPWLznZSZrl5cnvmFq9LtLJ3tYUxjv9GN6rxuI/edit?usp=sharing
(please refer to section 2.6 Additional Consideration -- Salt Consideration).
We do have a JIRA for the same as well -- https://jira.hyperledger.org/browse/FAB-5101 but didn't implement
it as we have decided to leave it to the user for now (also for simplicity & flexibility).

    The salt to the data can always be added by the client which submits the transaction proposal. For example,
in the following JSON content, there can be an additional field called salt and the user can add any random data
to avoid a dictionary attack.
{"menu": {
 "id": "file",
 "value": "File",
 "popup": {
   "menuitem": [
     {"value": "New", "onclick": "CreateNewDoc()"},
     {"value": "Open", "onclick": "OpenDoc()"},
     {"value": "Close", "onclick": "CloseDoc()"}
   ]
 }

 "salt": 88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f031589

}}


The same can be done for the keys too, not just values. As far as I know, many developers who use private data
follow this approach. I agree that a few might be unaware of this. As Yacov mentioned, we should add this approach
to our doc.

Regards,
Senthil

 Ivan Ch
Oct 22   

Thanks for the reply,

but I think you guys are downplaying the seriousness of this issue.

If you add a salt, then the salt must be passed to others so that they can validate.

To keep others from launching a dictionary attack, you must (in your implementation) force peers to use private point-to-point connections to send the hash; otherwise you may create another security hole.

Plus, forcing p2p connections among participants would literally destroy the purpose of blockchain.

This functionality needs to change its name to something like "chain hash" to save others from falsely believing this is a data privacy functionality. I know there must be marketing concerns behind calling it "private data", but you guys need to be responsible.


 

 Gari Singh
Oct 22   

I think you might have missed one of the points on how you can actually pass in a salt value to all endorsing peers.
Proposal (endorsement) requests have a "transient" field which can be used. The value of this field can be extracted in chaincode and used to salt the data. It is never persisted in the actual ledger itself.
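
A rough Python simulation of that flow, under stated assumptions: `invoke_chaincode`, `public_ledger` and `private_store` are hypothetical stand-ins and not the Fabric shim API, and the on-chain hash is assumed to be SHA-256 over the salted value. The point it illustrates is that both the private value and the salt travel in the transient map, so only the salted hash is ever visible to non-members.

```python
import hashlib
import secrets

public_ledger = {}   # what every channel member can see: hashes only
private_store = {}   # what only collection members hold: the pre-images


def invoke_chaincode(args: list, transient: dict) -> None:
    """Hypothetical chaincode entry point, not the Fabric shim API.

    `args` would be recorded in the transaction; `transient` would not.
    """
    key = args[0]
    # Private value and salt both arrive through the transient map.
    preimage = transient["salt"] + transient["value"]
    private_store[key] = preimage
    public_ledger[key] = hashlib.sha256(preimage).hexdigest()


salt = secrets.token_bytes(32)
invoke_chaincode(["trade-1"], {"value": b"price=100", "salt": salt})

# An outsider who guesses the value but not the salt computes a different hash.
unsalted_guess = hashlib.sha256(b"price=100").hexdigest()
```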

-----------------------------------------
Gari Singh
Distinguished Engineer, CTO - IBM Blockchain
IBM Middleware
550 King St
Littleton, MA 01460
Cell: 978-846-7499
garis@...
-----------------------------------------

 Yacov
Oct 22   

Hey Ivan.

Private data is disseminated in a point-to-point manner among peers even now.
The peers that possess the private data send the hash pre-images to the peers that don't (but are eligible to receive it), and the receiving peers validate that the hash pre-images indeed correspond to the hashes in the public block.

I don't see any technical obstacle that prevents you from adding a salt per collection name for a given transaction, to be concatenated into the computation of the hash of the key and the value for that collection.
The salt can be part of the data element that is generated at the time of chaincode invocation, and will be passed along with the private data itself.
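
The validation step described above can be sketched in Python as follows. This is an illustration under assumptions, not Fabric's actual code: the construction SHA-256(salt || data) and the names `commit` and `validate_preimage` are made up for the example.

```python
import hashlib
import hmac
import secrets


def salted_hash(salt: bytes, data: bytes) -> bytes:
    """SHA-256 over salt || data (assumed construction, for illustration)."""
    return hashlib.sha256(salt + data).digest()


def commit(salt: bytes, key: bytes, value: bytes) -> dict:
    """What the public block would carry: hashes of the key and the value."""
    return {
        "key_hash": salted_hash(salt, key),
        "value_hash": salted_hash(salt, value),
    }


def validate_preimage(entry: dict, salt: bytes, key: bytes, value: bytes) -> bool:
    """Receiving peer recomputes both hashes and compares in constant time."""
    return (
        hmac.compare_digest(entry["key_hash"], salted_hash(salt, key))
        and hmac.compare_digest(entry["value_hash"], salted_hash(salt, value))
    )


salt = secrets.token_bytes(32)  # one salt per collection per transaction
entry = commit(salt, b"asset42", b"owner=alice")

# The pre-image received point to point either matches the block or it doesn't.
ok = validate_preimage(entry, salt, b"asset42", b"owner=alice")
tampered = validate_preimage(entry, salt, b"asset42", b"owner=mallory")
```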

I don't agree that point-to-point connections defeat the purpose of the blockchain, as all this point-to-point data that is kept off-chain can be easily and efficiently verified if needed, since its value is bound to the public blocks.

- Yacov.



 David Enyeart
Oct 22   

Thanks again Ivan for pointing out the documentation hole - here's the doc update that describes how private data is secured:
https://hyperledger-fabric.readthedocs.io/en/latest/private-data-arch.html#protecting-private-data-content


Dave Enyeart

From: "David Enyeart" <enyeart@...>
To: "Senthil Nathan" <cendhu@...>
Cc: Ivan Ch <acizlan@...>, fabric@...
Date: 10/21/2019 11:03 AM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Major security hole in Hyperledger Fabric - Private Data is not private #fabric #fabric-questions #fabric-dstorage #database #dstorage #dstorage-fabric #fabric-chaincode #ssl
Sent by: fabric@...





Thanks for replying, Yacov and Senthil. You're right that since the introduction of private data, Fabric recommends that private data be salted to avoid dictionary attacks. As this thread makes clear, not everybody knows about the private data solution design considerations. I've opened Jira issue https://jira.hyperledger.org/browse/FAB-16885 to enhance the documentation with these considerations.


Dave Enyeart

 Brian Behlendorf
Oct 23   

Lemons into lemonade. Thanks David and others who turned this from flame war kindling to a positive outcome.
 
Brian
 
-- 
Brian Behlendorf
Executive Director, Hyperledger
bbehlendorf@...
Twitter: @brianbehlendorf

 

 Ivan Ch
Oct 23   

Hi Yacov, 

Thanks for your reply. Let me clarify the jargon here so more people can understand.

pre-image: the data itself and its salt

"Private data is disseminated in a point to point manner among peers even now. The peers that possess the private data send the hash pre-images to the peers that don't (but are eligible to receive it), and the receiving peers validate that the hash pre-images indeed correspond to the hashes in the public block.

I don't see any technical obstacle that prevents you from adding a salt per collection name for a given transaction, to be concatenated into the computation of the hash of the key and the value for that collection.
The salt can be part of the data element that is generated at the time of chaincode invocation, and will be passed along with the private data itself."

First of all, I appreciate that you agree that another point-to-point connection must be established between orgs to pass the salt and the image itself; anything on chain can be used to launch a pre-image/dictionary attack.

Of course there is no technical obstacle to creating a salt, but the issue here is that it creates a false sense that data is private and can be validated. Let me explain:

You try to argue that the salted hash on the public chain is proof that some data is "valid". This is itself a terrible argument, because a hash (unlike a digital signature or homomorphic encryption) is not something that others can verify when the data (hash) is put on the public chain.

Here is an example: my national ID is "1234567", but I am a bad guy and want others to believe that my national ID number is "7654321". So I put the false hash(salt, "7654321") on chain, and then send the pre-image (salt, "7654321") to whoever I want to convince. Since nobody can verify hash(salt, "7654321") when it is put on chain without prior knowledge of the data, an adversary can use the claims about the private data functionality to trick people into believing forged data.
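
The scenario can be reproduced in a few lines of Python. The function names are illustrative, and SHA-256 over salt plus value is an assumed construction. The check confirms only that a pre-image matches what was committed, not that the committed value is true.

```python
import hashlib
import secrets


def commit(salt: bytes, value: str) -> str:
    """Hash that would be placed on chain for this salt and value."""
    return hashlib.sha256(salt + value.encode()).hexdigest()


def matches(on_chain_hash: str, salt: bytes, value: str) -> bool:
    """Recipient's check: does the disclosed pre-image match the chain?"""
    return commit(salt, value) == on_chain_hash


real_id, forged_id = "1234567", "7654321"
salt = secrets.token_bytes(16)

# The submitter commits to the forged value, not the real one.
on_chain = commit(salt, forged_id)

# The forged pre-image verifies; the real ID does not.
forged_checks_out = matches(on_chain, salt, forged_id)
real_checks_out = matches(on_chain, salt, real_id)
```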

My point is that the claims about private data mislead people into believing that this feature will help orgs either protect data or validate pre-existing data. Neither is true, and the feature can easily be used by an adversary to decode data (if there is no salt or the salt is known) or to trick people into believing wrong data, as in the example above.











 

 Jay Guo
Oct 23   

I don't think that's a valid example for private data. Private data can only prevent your actual ID from being read by unauthorized parties; whether that ID is valid or not is really up to your application to decide. If someone is simply allowed to put arbitrary data on chain without proving it, I'd say that's a problem with application design rather than with Private Data in Fabric.
 
Hopefully this makes sense
- J
 Yacov
Oct 23   

Hi Ivan.

> you try to argue that the salted hash on the public chain is a proof that some data is "valid". this itself is a terrible argument because hashes (unlike digital signature, homomorphic encryption) is not something that others can verify when the data (hash) it put on public chain.

No, that's not what I am arguing.
I said: "and the receiving peers validate the hash pre-images indeed correspond to the hashes on the public block",
which means that they do just that: ensure that the hash pre-image of the private data corresponds to the hash in the public block.

That's what private data is: a means for several organizations to send each other information without putting it on the blockchain, while still binding the data to the blockchain for non-repudiation of the fact that the data was put there (not of any other world / business facts as in your example).


> first of all, I appreciate you agree that another point 2 point connection must be established between orgs to pass the salt and the image itself, anything on chain can be used to launch pre-image/dictionary attack

Well, but this is already what is done now. This is how private data works in Fabric:
  • You (as the user) have the ability to put hashes of salted data on the blockchain.
  • The data is disseminated over a secure point-to-point connection between peers that are eligible to receive the data.


- Yacov




 Alexandre Pauwels
Oct 23   

Hey Ivan,
Correct me if I'm wrong, but it seems you are thinking that the private data as implemented is flawed, and that the requirement to salt the data to secure it defeats the purpose of having the blockchain in the middle; again, let me know if this is a bad assumption of your thinking. However, the private-data store (which I'll call the pre-image store) and the chain of hashes (which I'll call the block store) exist for parallel but complementary reasons.
 
The block store cannot exist on its own, as it stores no useful data that can be acted upon; it is simply a list of updates to salted hashes.
 
The pre-image store cannot exist on its own because, when you receive new information, you have no idea whether the person giving it to you is giving you the same information that everyone else has. The purpose of the chain of hashes is to ensure that the plain-text information you have is the same copy of the plain-text information that everybody else has.
 
The role of ensuring that the data initially placed on the chain is accurate is NOT something determined by either data storage method; it's determined by the logic in your chaincode. E.g., in your example, you would be unable to send an update claiming your national ID is "7654321" in the first place, as the government which wrote the chaincode that you are calling would not allow you to do so. A better example would be to say that you are a bad actor and you would like to fool someone into thinking you are the individual with ID "7654321". You would give them your public cert and your claimed ID along with a salt, and they would be unable to verify it, as when they query for the national ID by the cert and then hash it with the salt you gave, the hashes would not match.
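
One reading of that second example, as a small Python sketch. Everything here is hypothetical: the `registry` dictionary stands in for state that only the authority's chaincode can write, and SHA-256 over salt plus ID is an assumed construction.

```python
import hashlib
import secrets


def id_hash(salt: bytes, national_id: str) -> str:
    return hashlib.sha256(salt + national_id.encode()).hexdigest()


# Hypothetical on-chain registry, written only by the authority's chaincode:
# certificate identifier -> salted hash of the holder's national ID.
honest_salt = secrets.token_bytes(16)
registry = {"cert:ivan": id_hash(honest_salt, "1234567")}


def verify_claim(cert: str, claimed_id: str, salt: bytes) -> bool:
    """Verifier looks up the hash by cert and recomputes it from the claim."""
    expected = registry.get(cert)
    return expected is not None and id_hash(salt, claimed_id) == expected


# The genuine holder can prove the ID; an impostor's claim does not match.
honest = verify_claim("cert:ivan", "1234567", honest_salt)
impostor = verify_claim("cert:ivan", "7654321", secrets.token_bytes(16))
```

The protection comes from who is allowed to write the registry entry, which is chaincode logic, and not from the hash itself.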
 
Hope that makes sense,
Alex
 Ivan Ch
Oct 24   

Hi Alexandre, Yacov

Thanks for your reply and I appreciate the discussion. My hands are tied right now, so I will give my full response later today.

Yes, my point is that the private data design may be flawed in two ways. One is fixable by adding a salt and then using a point-to-point connection to send the pre-image data to the intended recipient.

However, the second issue is more fundamental and may be difficult to solve. In short, the private data design only works if all participants are honest. Maybe I should have used something that's not always fixed like a national ID, such as a "trade ID", in my earlier example. (I am still trying to avoid real-life examples here, as it may give bad guys a chance to look.)

cheers

Ivan

 

 David Enyeart
Oct 24   

Your second point is not specific to private data. Agreement on input data needs to be part of the application design, regardless of whether it is a private data scenario or not. For example the smart contract may require that each of the transactors submit their approval of a proposed data change on chain, before a final transaction verifies the approvals are in place and makes the change on chain.
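
A toy Python model of such an approval flow, to make the idea concrete. `ApprovalContract` is an invented stand-in and not Fabric chaincode; the party names and the hashed payload are placeholders.

```python
import hashlib


class ApprovalContract:
    """Toy stand-in for a smart contract; not Fabric chaincode."""

    def __init__(self, required_approvers: set):
        self.required = set(required_approvers)
        self.approvals = {}   # proposed hash -> set of approvers
        self.state = {}       # key -> committed hash

    def approve(self, approver: str, proposed_hash: str) -> None:
        if approver not in self.required:
            raise PermissionError(f"{approver} is not a transactor")
        self.approvals.setdefault(proposed_hash, set()).add(approver)

    def commit(self, key: str, proposed_hash: str) -> bool:
        """Apply the change only if every required party approved this hash."""
        if self.approvals.get(proposed_hash, set()) != self.required:
            return False
        self.state[key] = proposed_hash
        return True


contract = ApprovalContract({"buyer", "seller"})
h = hashlib.sha256(b"salt-bytes" + b"trade=42;price=100").hexdigest()

contract.approve("seller", h)
early = contract.commit("trade-42", h)   # buyer has not approved yet
contract.approve("buyer", h)
final = contract.commit("trade-42", h)   # both parties approved the same hash
```

Each party approves the hash only after checking the pre-image it received privately, so a value that one side never agreed to cannot be committed.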


Dave Enyeart

 Ivan Ch
Oct 24   

Dave, Yacov, and Alex

It seems that the general response to this scenario is “this is an application design problem and should be solved by chaincode”:

Here is an example: my national ID is "1234567", but I am a bad guy and want others to believe that my national ID number is "7654321". So I put the false hash(salt, "7654321") on chain, and then send the pre-image (salt, "7654321") to whoever I want to convince. Since nobody can verify hash(salt, "7654321") when it is put on chain without prior knowledge of the data, an adversary can use the claims about the private data functionality to trick people into believing forged data.

But my argument here is that chaincode design can’t solve this problem, and I can assure you that a large number of DLT deployments are at risk because of this.

As I stated earlier, hashes cannot be verified by third parties the way a digital signature or a ZKP algorithm can. There is almost no way to guard against adversaries putting fake data on chain and then tricking others into believing the fake data is real.

Since chaincode can’t decode hashes, the only thing chaincode can do is limit the number of updates. In most financial use cases (e.g. trade transactions) this is irrelevant, since pre-image data are not constants in the first place. Even for constant data such as the “national ID” in the aforementioned scenario, chaincode will most likely still allow at least a few updates to cover typos.

Leaving it to applications is easier said than done, since there are so few ways to get it right; this functionality simply opens the door for attackers and yet offers almost nothing.

This bug is neither an application design issue nor a Fabric implementation issue, but a methodology problem that the private data feature promotes. My humble recommendation is to deprecate this functionality, or at least to put up warning signs for people who still plan to use it.

 

 Senthil Nathan
Oct 24   

Hi Ivan,

  

    As far as I know, a Blockchain/DLT platform itself does not claim to detect fake data. However, one may build an application using blockchain to detect fake data. A real-world example -- https://www.coindesk.com/new-york-times-confirms-its-using-blockchain-to-combat-fake-news

 

    Detecting fake data is a hard problem to solve. Some overview of ongoing research can be found here -- https://www.dropbox.com/s/pwoqrlfcyhw13pc/CombatingFakeNews.pdf?dl=0

 

Regards,

Senthil

 David Enyeart
Oct 24   

You are essentially suggesting that we add a warning that private data content can't be known by non-members of the collection. That is the whole point of private data, and anybody considering an implementation will already know this. The non-members only validate against a hash of the data. The members can later share the private data content with non-members if a need-to-know arises, and the non-member can then validate the pre-image content against the hash on chain, with an understanding that only the group of transactors may have come to agreement on the data. This is the fundamental design of private data. Like any feature, it will fit some use cases and not others. I believe these considerations were already obvious, but hopefully this thread has provided some clarification. I am glad the thread has at least helped to improve the documentation around the importance of including a salt in your private data if it is predictable, to keep it secure.


Dave Enyeart

From: "Ivan Ch" <acizlan@...>
To: fabric@...
Date: 10/24/2019 06:02 AM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Major security hole in Hyperledger Fabric - Private Data is not private #fabric #fabric-questions #fabric-dstorage #database #dstorage #dstorage-fabric #fabric-chaincode #ssl
Sent by: fabric@...





Dave, Yacov, and Alex

It seems that the general response to this scenario is “this is an application design problem and should be solved by chaincode”.
       

Here is an example:
 My national ID is "1234567", but I am a bad guy and want others to believe that my national ID number is "7654321". So I put the false hash(salt, "7654321") on chain, and then send the pre-image (salt, "7654321") to whoever I want to convince. Since nobody can verify hash(salt, "7654321") at the time it is put on chain without prior knowledge of the data, an adversary can use the claims about private data functionality to trick people into believing forged data.

But my argument here is that chaincode design can’t solve this problem, and I can assure you that a large number of DLT deployments are at risk because of this.
 
As I stated earlier, unlike a digital signature or a ZKP, a hash cannot be verified by third parties. There is almost no way to stop adversaries from putting fake data on chain and then tricking others into believing the fake data is real.
 
Since chaincode can’t decode hashes, the only thing it can do is limit the number of updates. In most financial use cases (e.g. trade transactions) this is irrelevant, since the pre-image data are not constants in the first place. Even for constant data such as the “national ID” in the aforementioned scenario, chaincode will most likely still allow at least a few updates to cover typos.
 
Leaving it to applications is easier said than done: there are very few ways to get it right, and this functionality opens the door for attackers while offering almost nothing in return.
 
This bug is neither an application design issue nor a Fabric implementation issue, but a methodology problem that the private data feature promotes. My humble recommendation is to deprecate this functionality, or at least to add warnings for people who still plan to use it.




 

 
12:30pm   

You are essentially suggesting to add a warning that private data content can't be known by non-members of the collection. That is the whole point of private data and anybody considering an implementation will already know this. The non-members only validate against a hash of the data. The members can later share the private data content with non-members if a need-to-know arises, and the non-member can then validate the pre-image content against the hash on chain, with an understanding that only the group of transactors may have come to agreement on the data. This is the fundamental design of private data.
Hi Dave,

That is not true. Private data is known only to the party sending the data hash and to no one else (including members). That is where the security flaw comes from, because an adversary can use the on-chain hash to trick others into believing that the data is legit.

Unlike the "unsalted hash" issue with private data, which is fixable, this is more of a methodology problem. Many projects (including ones I am involved with) are required by customers to use it in the application design (because Fabric claims it protects data), and it becomes obvious that there are security gaps that are almost impossible to overcome unless all participants are honest (not a good assumption).

Since Fabric is by far the most influential DLT platform, it should promote best practices, not tools that can easily be used to create security flaws.
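
The forged national ID scenario discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 over `salt || value`, and hypothetical helper names (`commit`, `recipient_check`) that are not Fabric APIs. It shows what a recipient's hash check does and does not establish.

```python
import hashlib
import secrets


def commit(value: str, salt: bytes) -> str:
    """Hex SHA-256 digest of salt || value (this layout is an assumption)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def recipient_check(ledger_hash: str, value: str, salt: bytes) -> bool:
    """What a recipient of the pre-image can check: does it match the ledger?"""
    return commit(value, salt) == ledger_hash


true_id = "1234567"    # the submitter's real national ID
forged_id = "7654321"  # the value the submitter wants others to believe

salt = secrets.token_bytes(32)
ledger_hash = commit(forged_id, salt)  # the submitter commits to the forged value

# The forged pre-image passes the check, because it is what was committed.
print(recipient_check(ledger_hash, forged_id, salt))
# The real ID fails the check, because it was never committed.
print(recipient_check(ledger_hash, true_id, salt))
```

In this sketch a passing check shows only that the disclosed value matches what the submitter committed to; whether that value is true has to be established by some other means.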


Jay Guo
 

Hi Ivan,

There's a distinction between protecting data from being seen by others and proving the data is legit. Fabric Private Data is designed for the former, and the latter is an application design problem (i.e. you need to have multiple parties endorse the original data before putting it on chain, while keeping it private from others).

Basically, the semantics of your pre-image are not something Fabric could/should care about.
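
The idea of having multiple parties endorse the original data before it goes on chain could look roughly like the sketch below. Everything here is a hypothetical illustration and not Fabric's endorsement policy mechanism: the `Endorser` class, the `accepted` function, the use of HMAC tags as a stand-in for real signatures, and the assumption that each endorser holds its own trusted record of the value.

```python
import hashlib
import hmac
import secrets


def commit(value: str, salt: bytes) -> str:
    """Hex SHA-256 digest of salt || value (this layout is an assumption)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


class Endorser:
    """Hypothetical collection member with its own record of the true value."""

    def __init__(self, name: str, known_value: str):
        self.name = name
        self.known_value = known_value
        self._key = secrets.token_bytes(32)  # stand-in for a signing key

    def endorse(self, proposed_hash: str, value: str, salt: bytes):
        """Attest to the hash only if the pre-image matches our own record."""
        if value != self.known_value:
            return None
        if commit(value, salt) != proposed_hash:
            return None
        return hmac.new(self._key, proposed_hash.encode(), hashlib.sha256).digest()


def accepted(proposed_hash, value, salt, endorsers, required: int) -> bool:
    """Accept the hash only if enough endorsers attested to it."""
    tags = [e.endorse(proposed_hash, value, salt) for e in endorsers]
    return sum(tag is not None for tag in tags) >= required


endorsers = [Endorser("registry", "1234567"), Endorser("bank", "1234567")]
salt = secrets.token_bytes(32)

print(accepted(commit("1234567", salt), "1234567", salt, endorsers, required=2))
print(accepted(commit("7654321", salt), "7654321", salt, endorsers, required=2))
```

Only the endorsers see the pre-image in this sketch; everyone else sees just the hash, which is accepted only after the required number of independent checks.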

- J

On Fri, Oct 25, 2019 at 12:30 PM Ivan Ch <acizlan@...> wrote:
You are essentially suggesting to add a warning that private data content can't be known by non-members of the collection. That is the whole point of private data and anybody considering an implementation will already know this. The non-members only validate against a hash of the data. The members can later share the private data content with non-members if a need-to-know arises, and the non-member can then validate the pre-image content against the hash on chain, with an understanding that only the group of transactors may have come to agreement on the data. This is the fundamental design of private data.
Hi Dave,

That is not true. Private data is known only to the party sending the data hash and to no one else (including members). That is where the security flaw comes from, because an adversary can use the on-chain hash to trick others into believing that the data is legit.

This is a methodology problem. Many projects (including ones I am involved with) are required by customers to use it in the application design (because Fabric claims it protects data), and it becomes obvious that there are security gaps that are almost impossible to overcome unless all participants are honest (not a good assumption).

Since Fabric is by far the most influential DLT platform, it should promote best practices, not tools that can easily be used to create security flaws.