Private data collections recommended usage


omer.glam@...
 

Hello, 
 
We are developing a solution based on Fabric. In our use cases, multiple organizations on a channel should be able to store private data (available only to their own organization) and also share some of that private data with a subset of the organizations on the channel in a private manner.
 
We are looking into private data collections to fulfill this use case. During our research and design phase, some ideas and patterns have emerged:
 
1. A single private collection per organization (implicit), where each organization stores its private data and shares it by allowing other organizations to query the collection through chaincode, with an ACL managed on the ledger to approve or reject reads from the collection.
 
2. A private collection between each pair of organizations in the channel, alongside a private (implicit) collection per organization. Each organization saves private information into its implicit collection, and when it wants to share that data it writes it into the collection it shares with the other organization, while recording in its own private collection where the information has been shared. This can happen within a single transaction to preserve the atomicity of the operation.
 
Of course, each pattern has advantages and disadvantages that are specific to the solution we are building, notably data duplication vs. data access patterns, which may have different acceptance criteria and tolerances depending on our overall solution.
 
My questions here are as follows:
 
a. From Fabric's perspective, what is the recommended pattern in terms of best practice? What is the overhead of each pattern (duplication of data vs. frequent chaincode queries to a single organization)?
 
b. Must a private data collection definition be approved by every organization in the channel in order to function?
 For example, given a channel with 5 organizations, where 2 of them want to set up a private collection between themselves that requires both organizations for endorsement, do those two organizations need the other 3 organizations on the channel to approve the collection definition, or is it enough that the two of them approve it themselves? The same question applies to modifying an existing collection definition.
 
c. Does using private data collections add performance overhead in Fabric?
 
 
Thank you,
Omer


David Enyeart
 

There is also a 3rd option, which is to use an implicit collection per org and have an organization share the private data with another organization's implicit private data collection on a need-to-know basis. The receiving organization can verify the private data against the on-chain hash of the sending organization. This pattern is one of multiple patterns mentioned here:
https://hyperledger-fabric.readthedocs.io/en/latest/private-data/private-data.html#sharing-private-data

a. I would suggest either pattern 1) or pattern 3). Pattern 2) will be challenging to manage if you have many pairs of organizations. Use pattern 1 for less duplication of data and more fine-grained access control. Use pattern 3 if you want clients to query only their own org's peer. Your choice.
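For pattern 1, the fine-grained access control could look roughly like the Go sketch below. It is a simplified in-memory model, not chaincode: the acl type, its grant and canRead methods, and the MSP IDs are all hypothetical names chosen for illustration. In a real deployment the ACL would be kept on the ledger and consulted by the chaincode before it reads from the owner's collection.

```go
package main

import "fmt"

// acl maps a private data key to the set of MSP IDs allowed to read it.
// Hypothetical structure: a real ACL would live in ledger state.
type acl map[string]map[string]bool

// grant records that mspID may read the value stored under key.
func (a acl) grant(key, mspID string) {
	if a[key] == nil {
		a[key] = map[string]bool{}
	}
	a[key][mspID] = true
}

// canRead reports whether requesterMSP may read key from ownerMSP's collection.
// The owner can always read its own data; others need an explicit grant.
func (a acl) canRead(ownerMSP, requesterMSP, key string) bool {
	if requesterMSP == ownerMSP {
		return true
	}
	return a[key][requesterMSP]
}

func main() {
	permissions := acl{}
	permissions.grant("asset1", "Org2MSP")

	fmt.Println("Org1MSP (owner):", permissions.canRead("Org1MSP", "Org1MSP", "asset1"))
	fmt.Println("Org2MSP (granted):", permissions.canRead("Org1MSP", "Org2MSP", "asset1"))
	fmt.Println("Org3MSP (not granted):", permissions.canRead("Org1MSP", "Org3MSP", "asset1"))
}
```

With this approach the data exists only in the owner's collection, which is why queries from other organizations have to go to the owner's peer.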

b. If using explicit collection definitions, the entire set of collections must be agreed to based on the channel's configurable LifecycleEndorsement policy. You could for example require a majority of organizations to approve an updated chaincode definition with new collection definitions. Or you could require any two orgs to approve, or a single org acting in a channel administrator role. Your choice.
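To make the 5-organization example concrete, here is a small Go sketch of a majority-style approval rule. It assumes the channel's LifecycleEndorsement policy is a strict majority (more than half of the organizations); the channel may be configured differently, and the function names are hypothetical.

```go
package main

import "fmt"

// majorityThreshold returns the smallest number of approvals that is
// strictly more than half of orgCount.
func majorityThreshold(orgCount int) int {
	return orgCount/2 + 1
}

// satisfiesMajority reports whether approvals reach a strict majority of orgCount.
func satisfiesMajority(approvals, orgCount int) bool {
	return approvals >= majorityThreshold(orgCount)
}

func main() {
	const orgs = 5
	fmt.Println("approvals needed:", majorityThreshold(orgs))
	fmt.Println("2 of 5 sufficient:", satisfiesMajority(2, orgs))
	fmt.Println("3 of 5 sufficient:", satisfiesMajority(3, orgs))
}
```

Under that assumption, the two organizations in the example could not introduce the collection on their own; they would need at least one more organization to approve the chaincode definition, unless the policy is relaxed to something like "any two orgs".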

c. There is not yet a performance benchmark for private data. I would expect some performance impact, but I expect it would be less than a 2x hit.

Note that pattern 3 requires duplicating private data in each authorized organization's implicit private data collection. There is another future proposal called 'local collections' that would eliminate this duplication. Essentially a single common hash would be stored on-chain, and a client can send the private data to any org peers they desire. The receiving org peers would verify the private data against the on-chain hash, and store the private data locally. Whoever is interested in this feature can add a 'vote' for it at https://jira.hyperledger.org/browse/FAB-7593.


Dave Enyeart


From: omer.glam@...
To: fabric@...
Date: 05/12/2020 10:14 AM
Subject: [EXTERNAL] [Hyperledger Fabric] Private data collections recommended usage
Sent by: fabric@...









omer.glam@...
 

Hi David, thank you for the prompt response.
 
> There is also a 3rd option which is to use an implicit collection per org, and have an organization share the private data with another organization's implicit private data collection on a need-to-know basis. The receiving organization can verify the private data against the on-chain hash of the sending organization. This pattern is one of multiple patterns mentioned here:
 
Interesting. Can we update multiple implicit collections belonging to different organizations in the same transaction (given that all of the collections' organizations endorse the transaction)?
The goal is to keep the data up to date across all collections where it is shared, preferably within a single transaction.
 
> I would suggest either pattern 1) or pattern 3). Pattern 2) will be challenging to manage if you have many pairs of organizations. Use pattern 1 for less duplication of data and more fine grained access control. Use pattern 3 if you want clients to query only their own org's peer. Your choice.
 
Can you elaborate on the challenges that many collections may impose?
From our perspective, the static nature of collection definitions might make operating a large number of collections complex, since each new or updated collection definition requires the other organizations' approval. For our use case, though, we can handle this as part of bootstrapping a new organization onto the channel (creating the per-organization collections between the new organization and the existing organizations).
Are there any other issues we are overlooking here?
 
Thank you,
Omer


David Enyeart
 

Yes, you can update multiple explicit and/or implicit private data collections in a single transaction. Each org-specific implicit private data collection has an endorsement policy of the associated organization. So a client from OrgA could submit an update for the OrgA collection and the OrgB collection, so long as they get an endorsement from an OrgA peer and an OrgB peer. The updates would be applied atomically, assuming the transaction is validated. The chaincode can have access control logic that either allows or disallows these cross-org updates. For example, if you want to restrict cross-org updates for certain transactions but allow them for others, the chaincode can check that the client MSPID matches the endorsing peer's MSPID.
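The endorsement requirement and the MSPID check described above can be sketched in Go like this. It is a plain model rather than chaincode: it assumes implicit collections follow the `_implicit_org_<MSPID>` naming convention, and the helper names (implicitCollection, requiredEndorsers, crossOrgWriteAllowed) and the MSP IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Assumed naming convention for org-specific implicit collections.
const implicitPrefix = "_implicit_org_"

// implicitCollection returns the implicit collection name for an org's MSP ID.
func implicitCollection(mspID string) string {
	return implicitPrefix + mspID
}

// requiredEndorsers returns the sorted, de-duplicated MSP IDs whose peers
// must endorse a transaction that writes to the given implicit collections.
// Names without the implicit prefix are ignored in this sketch.
func requiredEndorsers(collections []string) []string {
	seen := map[string]bool{}
	var orgs []string
	for _, c := range collections {
		if !strings.HasPrefix(c, implicitPrefix) {
			continue
		}
		org := strings.TrimPrefix(c, implicitPrefix)
		if !seen[org] {
			seen[org] = true
			orgs = append(orgs, org)
		}
	}
	sort.Strings(orgs)
	return orgs
}

// crossOrgWriteAllowed models the access control check: unless the
// transaction type permits cross-org updates, the client's MSP ID must
// match the MSP ID of the endorsing peer.
func crossOrgWriteAllowed(clientMSP, peerMSP string, allowCrossOrg bool) bool {
	return allowCrossOrg || clientMSP == peerMSP
}

func main() {
	writes := []string{implicitCollection("OrgAMSP"), implicitCollection("OrgBMSP")}
	fmt.Println("endorsements needed from:", requiredEndorsers(writes))

	// An OrgA client whose proposal is being endorsed on an OrgB peer.
	fmt.Println("restricted transaction:", crossOrgWriteAllowed("OrgAMSP", "OrgBMSP", false))
	fmt.Println("cross-org transaction:", crossOrgWriteAllowed("OrgAMSP", "OrgBMSP", true))
}
```

The point of the second helper is that the same chaincode can permit cross-org writes for some functions (such as a transfer) while rejecting them for others (such as an org updating only its own records).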

You may be interested in a new sample that we are working on to demonstrate these concepts: https://github.com/hyperledger/fabric-samples/pull/174.
In the sample, private data associated with a transferred asset is deleted from seller's private data collection and added to buyer's private data collection in a single transaction that gets endorsed by both organizations.

In terms of the complexities associated with explicitly defined static private data collections, I was referring to the same thing as you: managing an ever-growing number of collections that require approval from other organizations.

While it sounds like either existing approach would work in your scenario, it also sounds like the 'local collection' proposal (if implemented in the future) would simplify things for you, such that the private data hash is stored once in the common chaincode namespace, while the originating client chooses which organizations should receive the actual private data.


Dave Enyeart


From: omer.glam@...
To: fabric@...
Date: 05/13/2020 08:28 AM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Private data collections recommended usage
Sent by: fabric@...




