Private data : issues and problems #fabric #fabric-questions #fabric-dstorage


Ivan Ch <acizlan@...>
 

Hi Vipin,

I was on vacation for a few weeks and I am now starting a new topic regarding the private data design since we are no longer talking about the original security issue (unsalted hash).

Hello Ivan,
I have been following this thread for a while.
Thanks for raising some of these issues.
While it is important to question and to challenge the assumptions underlying Hyperledger Fabric, the best way to get attention, answers and influence the design may not be by using language like "Major Security hole...". This raises hackles and creates an atmosphere of defensiveness.
 
First: the issue you raised at first (the salted hash) may be just related to documentation, according to all who debated this, so let us drop that from the list.
So that leaves:
1) hashes on chain cannot be validated by any third party, so they can be used by adversaries to trick honest participants (open)
The design of private data collections sets up, in effect, "a covert channel" between the people who exchange that information. I use the term "covert channel" guardedly, before the cryptographers and crypto engineers among us object strenuously to that term. All those who need to know have access to methods to check the hash. Please re-examine this and re-read the private channel documentation. In terms of the veracity of the data (or the claim), this is a problem that has to be solved anyway in any blockchain, through attestation by the party who put the data on the chain (in other words, the issuers of the claim). There are many ways to share these "covert" claims: edge architectures with certain proofs on the chain and so forth, a la Aries supported by Indy etc.
Chain hashes just don't solve any problem. ZKP would be the solution to the problem; hashes are not. Sure, some people would argue that ZKP is slow and premature, but I have to disagree, since protocols such as Bulletproofs and many other customized ZKP protocols are fairly efficient. I understand that plenty of people like to use chain hashes because it is easy and comfortable for them. However, if we want to move ahead, we have to look for the best technology, not what is making people comfortable at the moment.


 
2) private data uses gossip to transfer data, which would require every participant to be connected to every other participant that is part of a chain. If there are 20 participants in a channel, each participant must open up its firewall to all other 19 participants of a single channel (open)
 
This may not be as it seems, as gossip protocols can transmit information using connections to a limited number of "near peers". Overlay this with the three types of nodes (i.e. endorsing peers, validating peers and orderers), with anchor peers being special types of peers that can serve as the "gateways" for endorsing and validating peers. As for the orderers, I am not aware of the exact network that they participate in (i.e. is it gossip driven?). Also, this interaction can be over TLS, which is a widely used method today to protect communications over the open internet. I believe Fabric has this feature.
The issue is not whether you can use a secure protocol such as TLS to securely transmit data; the problem is that you have to make pre-arrangements with all peers (open firewalls to each other), which is not possible in practice unless all nodes operate on the same cloud.

 
You have a point about firewalls: the disposition of the components in a regulated enterprise may need some design modifications to accommodate firewalls. Since firewalls, whether on prem or in the cloud, are not monolithic (they include multiple layers like the DMZ etc.), enterprises currently use reverse proxies (for incoming messages) and SOCKS-compliant protocols for outgoing ones. In Corda Enterprise, there is a component called the "Float" which functions as a reverse proxy. I was involved in conversations around the design of this component when I was working in a regulated financial institution. I do not know the status of the "Float" since that is available only in Enterprise Corda. There are also multiple architectural patterns written up on the provisioning of the components inside firewalls. We need that thought process in Fabric if it does not exist.
This problem actually gets bigger when we have to try to get all participants to do the same. Each enterprise seems to have its own little ghost setting behind the firewall. This is still doable, but a big hassle.

 
Another feature that is demanded by IT architecture and security teams in enterprises is the componentization of nodes. By that I mean the breaking up of (say) any endorsing or validating peer into data access and smart contract execution layers, with the possibility of scaling and housing them in various parts of the enterprise stack.
 
All this points to having community involvement in Architecture best practices for projects and the presence and participation in such exercises so that the Fabric team can take advantage of expertise such as yours that exist in the open source community.
 
We must collaborate, otherwise why be in an open source consortium?
 
Best,
Vipin
I've been trying hard to convince my client to avoid using the private data feature :) We are able to configure the orderers as a shared cluster group so that every org can just connect its peer nodes to the orderer service running on a cloud to bypass the firewall issue (each org would only need to open its firewall to the central orderer service), but then things get a lot more complicated when the private data feature kicks in. People somehow just assume that a feature is right just because it's in the Fabric documentation.


Ivan Ch <acizlan@...>
 

Apparently the Fabric maintainers have decided to turn a deaf ear to this question. However, the truth is I've been contacted privately by some current Fabric maintainers who agree with me and, for whatever reason, wouldn't speak out. Regardless, a problem is a problem, so I am reposting the summary of all problems related to private data here:

Security issues
1) hashes put on chain don't have salt added to them, which makes them vulnerable to dictionary attacks (solved)
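
To make the dictionary-attack concern concrete, here is a minimal sketch in Python. It is not Fabric code: SHA-256 as the hash function, the candidate list, and the 32-byte random salt are all assumptions chosen for illustration. It shows why an unsalted hash of a low-entropy value can be reversed by brute force, and why prepending a secret random salt defeats the same search.

```python
import hashlib
import os
from typing import Iterable, Optional


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest (assumed hash function for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def dictionary_attack(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate whose unsalted hash equals target_hash, if any."""
    for candidate in candidates:
        if sha256_hex(candidate.encode()) == target_hash:
            return candidate
    return None


# Hypothetical low-entropy domain, e.g. a price between 0 and 1000.
candidates = [str(price) for price in range(0, 1001)]

# Unsalted: anyone who sees the hash can enumerate the domain.
unsalted_hash = sha256_hex(b"250")
recovered_unsalted = dictionary_attack(unsalted_hash, candidates)

# Salted: the attacker does not know the random salt, so the search fails.
salt = os.urandom(32)
salted_hash = sha256_hex(salt + b"250")
recovered_salted = dictionary_attack(salted_hash, candidates)

print(recovered_unsalted, recovered_salted)
```

The salt has to be shared, together with the value, with whoever is supposed to verify the hash later.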

Methodology issues
1) hashes on chain cannot be validated by any third party, so they can be used by adversaries to trick honest participants (open)
2) private data uses gossip to transfer data, which would require every participant to be connected to every other participant that is part of a chain. If there are 20 participants in a channel, each participant must open up its firewall to all other 19 participants of a single channel (open)

Engineering issues:
1) when using k8s behind load balancers or proxies, users do not even get a chance to use a shared port (in the aforementioned example, each participant can't even open firewalls to the 19 other participants without extensive hacking, and I assume all participants would need to deploy this hacked code to make it work) (discussed)

patiently waiting for answers .....
 


Yacov
 

Regarding (2) in the methodology issues, you need to specify alternatives, otherwise it's not clear how you expect parties to interact with each other across the internet without direct communication.

Let's not forget that if we have 3 parties (A, B and C), and parties A and B don't want party C to know they have a business relationship, then having A and B engage in point-to-point communication is the only option.




From:        "Ivan Ch" <acizlan@...>
To:        fabric@...
Date:        12/11/2019 04:58 AM
Subject:        [EXTERNAL] Re: [Hyperledger Fabric] Private data : issues and problems #fabric #fabric-questions #fabric-dstorage
Sent by:        fabric@...




 




David Enyeart
 

I thought the open questions have been discussed... let me summarize and also share some new reference material...

(1) Data accuracy and agreement is the domain of the application, for example chaincode applications may require multiple parties to come to agreement on the data (regardless of channel data or private data), in addition to the technical endorsement and validation performed by peers. And private data can be shared and verified against the on-chain hashes as needed. Nobody is being tricked... each member is expected to inspect the chaincode application and understand any data/trust assumptions therein before joining the channel and transacting with the chaincode. As there have been misunderstandings about the various ways that private data can be used in applications (in this thread and others), the documentation has recently been extended to enumerate various usage patterns (some available in v1.4.x, some becoming available in upcoming v2.0):
https://hyperledger-fabric.readthedocs.io/en/latest/private-data/private-data.html#sharing-private-data
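
As an illustration of "shared and verified against the on-chain hashes", the following Python sketch shows the check a party could run after receiving private data out of band. It assumes the on-chain hash is a plain SHA-256 digest of the exact bytes that were stored; the function name and sample values are hypothetical, and how the on-chain hash is read from the ledger is left out.

```python
import hashlib
import hmac


def matches_on_chain_hash(shared_value: bytes, on_chain_hash_hex: str) -> bool:
    """Check that data received off-chain is the data whose hash was committed.

    Assumes SHA-256 over the exact stored bytes (including any salt that the
    application put inside the value).
    """
    computed = hashlib.sha256(shared_value).hexdigest()
    # Constant-time comparison; not strictly required for public hashes,
    # but a harmless habit.
    return hmac.compare_digest(computed, on_chain_hash_hex.lower())


# Hypothetical private value, with an application-level salt embedded in it.
private_value = b'{"price": 250, "salt": "9f2c1e"}'
on_chain_hash = hashlib.sha256(private_value).hexdigest()

genuine = matches_on_chain_hash(private_value, on_chain_hash)
tampered = matches_on_chain_hash(b'{"price": 999, "salt": "9f2c1e"}', on_chain_hash)
print(genuine, tampered)
```

This only proves that the shared bytes are the ones that were committed; whether the committed data is truthful remains a matter for the application and its endorsement policy, as discussed in the thread.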

(2) You don't need to open lines of communication with every peer; you only need to open lines of communication to the parties that you intend to transact with and share private data with, to meet the endorsement policy and private data requirements as specified for the chaincode application. The degree to which you rely on peer-to-peer dissemination of private data versus application dissemination of private data to peers of endorsing organizations is entirely up to you. Again, see the sharing patterns mentioned above for details.
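
A rough way to compare the two positions is to count connections. The Python sketch below is a simplification under stated assumptions: organization names and collection definitions are hypothetical, and it looks only at collection membership (endorsement policies may add further counterparties). It contrasts the full mesh from the 20-participant example with the set of organizations a given member actually shares a collection with.

```python
from typing import Dict, Set


def full_mesh_links(participants: int) -> int:
    """Number of pairwise links if every participant connects to every other."""
    return participants * (participants - 1) // 2


def private_data_partners(org: str, collections: Dict[str, Set[str]]) -> Set[str]:
    """Organizations that share at least one collection with `org`."""
    partners: Set[str] = set()
    for members in collections.values():
        if org in members:
            partners |= members - {org}
    return partners


# Hypothetical channel with 20 organizations and two small collections.
channel_orgs = {f"Org{i}" for i in range(1, 21)}
collections = {
    "collectionA": {"Org1", "Org2", "Org3"},
    "collectionB": {"Org1", "Org2", "Org5"},
}

mesh_links = full_mesh_links(len(channel_orgs))
org1_partners = private_data_partners("Org1", collections)
org9_partners = private_data_partners("Org9", collections)
print(mesh_links, sorted(org1_partners), sorted(org9_partners))
```

In this toy setup Org1 needs private-data connectivity to three organizations rather than nineteen, and Org9 to none.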

I expect there would be general agreement that setting up networks is non-trivial - this is precisely why various vendors have stood up offerings around Fabric.


Dave Enyeart





Gari Singh <garis@...>
 

I'm honestly a bit tired of your tone. It is unclear to me what you are looking for here. People responded to the salt / hash issue and the documentation has been updated. You offer no alternatives and do not even really specify what you want other than the fact that you believe requiring people to connect to other nodes in a distributed network is a design issue. My answer is simple: do not use blockchain. Move to a centralized database and call it a day.



1) hashes on chain cannot be validated by any third party, so they can be used by adversaries to trick honest participants

How does this trick honest participants? When you deploy chaincode, you specify an endorsement policy. Transactions which do not meet the endorsement policy will be marked as invalid. You need to take a look at the overall transaction flow in Fabric: https://hyperledger-fabric.readthedocs.io/en/release-1.4/txflow.html.
The validation rules for Fabric include:
- check to make sure the block is valid and that it was obtained from the correct orderer as specified in the channel definition
- check to make sure the submitter was allowed to actually submit the transaction (e.g. is allowed to write to the chaincode)
- check that the transaction meets the endorsement policy for the chaincode that was invoked
- perform MVCC check

In the case of private data, the steps are the same. If you are not a member of the collection, even though you do not have the actual data you still follow all of the rules above to validate the transaction (including the MVCC check). Not sure how someone can "trick" anyone if you have a strong endorsement policy (e.g. majority, etc).
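
The last two checks can be sketched in Python for a peer that is not a member of the collection and therefore sees only hashes. This is a simplified model, not Fabric's implementation: the `Tx` structure, the integer versions (Fabric's real versions are not plain counters), and the majority policy are assumptions, and the block and submitter checks are omitted.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Set


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Tx:
    endorsing_orgs: Set[str]       # orgs whose peers signed the proposal response
    hashed_reads: Dict[str, int]   # key hash -> version seen during simulation
    hashed_writes: Dict[str, str]  # key hash -> value hash


def meets_majority(endorsing_orgs: Set[str], channel_orgs: Set[str]) -> bool:
    """Hypothetical 'majority of channel orgs' endorsement policy."""
    return len(endorsing_orgs & channel_orgs) > len(channel_orgs) / 2


def mvcc_ok(tx: Tx, committed_versions: Dict[str, int]) -> bool:
    """Every key read must still be at the version seen at simulation time."""
    return all(committed_versions.get(k, 0) == v for k, v in tx.hashed_reads.items())


def validate_and_commit(tx: Tx, channel_orgs: Set[str],
                        committed_versions: Dict[str, int],
                        hashed_state: Dict[str, str]) -> bool:
    """Validate using hashes only; commit the hashed writes if valid."""
    if not meets_majority(tx.endorsing_orgs, channel_orgs):
        return False
    if not mvcc_ok(tx, committed_versions):
        return False
    for key_hash, value_hash in tx.hashed_writes.items():
        hashed_state[key_hash] = value_hash
        committed_versions[key_hash] = committed_versions.get(key_hash, 0) + 1
    return True


channel_orgs = {"Org1", "Org2", "Org3"}
versions: Dict[str, int] = {}
state: Dict[str, str] = {}
key = sha256_hex(b"asset1")

tx1 = Tx({"Org1", "Org2"}, {key: 0}, {key: sha256_hex(b"value-1")})
tx2 = Tx({"Org1", "Org2"}, {key: 0}, {key: sha256_hex(b"value-2")})  # stale read
tx3 = Tx({"Org1"}, {key: 1}, {key: sha256_hex(b"value-3")})          # too few endorsers

results = [validate_and_commit(t, channel_orgs, versions, state) for t in (tx1, tx2, tx3)]
print(results)
```

The non-member never learns the private values, yet it rejects the transaction with a stale read and the one with too few endorsements.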

This is no different than how private transactions work in Quorum, Besu or even Corda if you choose not to have the notary execute transactions.




2)Ports

Opening inbound ports - this is required even if you do not use private data. You need to obtain endorsements by contacting peers from different organizations based on the endorsement policy. The port used for endorsement transactions is the same port used for gossip communication. So why is this an issue for private data? This is also not something limited to Fabric ... this is how blockchains work ... Bitcoin, Ethereum, Corda, Besu, etc ... you are not simply going to connect to a single node when using any of those products / protocols.

Opening outbound ports - really the same thing as above. Your applications will need to communicate with peers from multiple organizations with or without private data.

So again I do not see what you are asking for here ... if you need your data to be private between a limited set of members, you cannot broadcast via the orderer. So how exactly do you propose that this data is distributed? Your app still needs to call peers from multiple organizations ... if you choose, you can avoid gossip altogether and rely on your application to send the private data to the peers from which it requests endorsement. You'll be guaranteed that the minimum number of peers/orgs required for endorsement will receive the private data, but other peers in those orgs or other orgs in the collection may not (which is fine from a processing perspective).

3) Kubernetes - Again, not specific to private data nor to Fabric.

Use non-terminating proxies and you'll be fine. There are several options out there for doing SNI-based routing which allow you to use the same port (Envoy, HAProxy, etc). Within Kubernetes, you can use the nginx ingress with ssl-passthrough enabled. If you are using OpenShift, you can use Routes with passthrough.
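
As a toy model of the routing decision such a passthrough proxy makes, consider the Python sketch below. It is not a proxy and not a configuration for any of the products named above; the hostnames, addresses, and port are hypothetical. The idea is that the proxy reads the server name the client sends in the clear at the start of the TLS handshake, picks a backend, and forwards the still-encrypted stream, so TLS terminates at the peer itself and many peers can share one external port.

```python
from typing import Dict, Optional, Tuple

Backend = Tuple[str, int]

# Hypothetical routing table: SNI hostname -> internal peer address.
ROUTES: Dict[str, Backend] = {
    "peer0.org1.example.com": ("10.0.1.10", 7051),
    "peer1.org1.example.com": ("10.0.1.11", 7051),
}


def pick_backend(sni_hostname: str, routes: Dict[str, Backend] = ROUTES) -> Optional[Backend]:
    """Choose the backend for a connection based only on the requested server name.

    Hostnames are case-insensitive and may carry a trailing dot, so normalize first.
    Returns None when no route matches (a real proxy would close the connection).
    """
    return routes.get(sni_hostname.lower().rstrip("."))


known = pick_backend("Peer0.Org1.example.com")
unknown = pick_backend("peer0.org2.example.com")
print(known, unknown)
```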



-----------------------------------------
Gari Singh
Distinguished Engineer, CTO - IBM Blockchain
IBM Middleware
550 King St
Littleton, MA 01460
Cell: 978-846-7499
garis@...
-----------------------------------------

-----fabric@... wrote: -----
To: "Ivan Ch" <acizlan@...>
From: "David Enyeart"
Sent by: fabric@...
Date: 12/11/2019 03:34AM
Cc: fabric@...
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Private data : issues and problems #fabric #fabric-questions #fabric-dstorage



Ivan Ch <acizlan@...>
 

I apologize for restarting this old topic, someone sent me a private message so I think the least I can do is to make the problem clear


On Wed, Dec 11, 2019 at 06:51 PM, Gari Singh wrote:
1) hashes on chain cannot be validated by any third party, so they can be used by adversaries to trick honest participants

How does this trick honest participants? When you deploy chaincode, you specify an endorsement policy. Transactions which do not meet the endorsement policy will be marked as invalid. You need to take a look at the overall transaction flow in Fabric: https://hyperledger-fabric.readthedocs.io/en/release-1.4/txflow.html.
The validation rules for Fabric include:
- check to make sure the block is valid and that it was obtained from the correct orderer as specified in the channel definition
- check to make sure the submitter was allowed to actually submit the transaction (e.g. is allowed to write to the chaincode)
- check that the transaction meets the endorsement policy for the chaincode that was invoked
- perform MVCC check

In the case of private data, the steps are the same. If you are not a member of the collection, even though you do not have the actual data you still follow all of the rules above to validate the transaction (including the MVCC check). Not sure how someone can "trick" anyone if you have a strong endorsement policy (e.g. majority, etc).

This is no different than how private transactions work in Quorum, Besu or even Corda if you choose not to have the notary execute transactions.

The problem is that all the checks you mentioned above are checking something non-verifiable and therefore cannot be used to validate any business logic at all, because the actual data is unknown to the verifiers/endorsers (who cares about the endorsement policy if the endorser doesn't know whatever the heck he is endorsing? The block, MVCC, and even transaction sender identity checks have nothing to do with the actual business logic).

Of course, if the data is in clear text then it would work, because then the endorsers would run the logic inside the chaincode to verify the business logic associated with the transactions being endorsed. Quorum has a host of problems, but at least they are exploring crypto/ZKP options like the PoC they did with JPMorgan (to be fair, that was only a PoC, but at least it was a good attempt of the kind Fabric has been lacking).

The bottom line is that the private data feature does not solve the data privacy problem (or any problem, to be honest), but it has been given a name that makes people believe it does. This is unfortunate, because Fabric's endorsement architecture is actually a much better platform than anything on Ethereum for running ZKP cryptography.



Yacov
 

> because the actual data is unknown to verifiers/endorsers

Well, but you can have the endorsement policy be satisfied by a set of members that are all in the collection policy.
Furthermore you can also have a custom endorsement policy for each collection.
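
A small Python sketch of that idea, under assumptions: the organization names are hypothetical, and the "at least N collection members" rule stands in for whatever collection-level endorsement policy an application would actually define. The check accepts an endorsement set only if every endorser belongs to the collection, which means every endorser executed the chaincode with the clear-text private data in hand.

```python
from typing import Iterable


def endorsers_hold_private_data(endorsing_orgs: Iterable[str],
                                collection_members: Iterable[str]) -> bool:
    """True if every endorsing org is a member of the collection."""
    return set(endorsing_orgs) <= set(collection_members)


def satisfies_collection_policy(endorsing_orgs: Iterable[str],
                                collection_members: Iterable[str],
                                required: int) -> bool:
    """Hypothetical policy: at least `required` endorsements, all from collection members."""
    endorsers = set(endorsing_orgs)
    return endorsers_hold_private_data(endorsers, collection_members) and len(endorsers) >= required


collection = {"Org1", "Org2", "Org3"}

ok = satisfies_collection_policy({"Org1", "Org2"}, collection, required=2)
blind_endorser = satisfies_collection_policy({"Org1", "Org7"}, collection, required=2)
too_few = satisfies_collection_policy({"Org1"}, collection, required=2)
print(ok, blind_endorser, too_few)
```

With such a policy, an endorsement from an organization outside the collection (one that never saw the data) does not count toward validity.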



From:        "Ivan Ch" <acizlan@...>
To:        fabric@...
Date:        02/03/2020 01:32 PM
Subject:        [EXTERNAL] Re: [Hyperledger Fabric] Private data : issues and problems #fabric #fabric-questions #fabric-dstorage
Sent by:        fabric@...



