Re: Inconsistency of read result can be occurred when using StateCouchDB with cache #cache #couchdb


David Enyeart
 

Speaking more generally... the JSON spec does not guarantee any particular key order or formatting. What you put into a JSON database (whether CouchDB, MongoDB, etc.) may not match exactly what you get back; only the content is guaranteed to be the same. Sometimes you get 'lucky' and it is exactly the same, but that is not guaranteed, and even if you get lucky at first, certain data, formatting, or upgraded components will likely break you in the future. If you are doing reads/writes in chaincode, your chaincode must ensure deterministic processing so that writes match across endorsing peers. This means you need to marshal the JSON in chaincode, and you need to understand the marshaling behavior of your library. There exist 'canonical JSON' libraries in most languages that provide deterministic marshaling (Go's json.Marshal() itself is deterministic, since it sorts map keys).

In a nutshell, if you want to query JSON content, then use JSON data with CouchDB state database. If you want guarantees about your bytes staying the same, then use LevelDB state database or CouchDB with non-JSON data (Fabric will automatically save non-JSON data as a CouchDB binary attachment... although performance will not be as good as LevelDB).


Dave Enyeart


From: "Senthil Nathan" <cendhu@...>
To: Jeehoon Lim <jholim@...>
Cc: fabric@...
Date: 06/08/2020 11:17 PM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Inconsistency of read result can be occurred when using StateCouchDB with cache #couchdb #cache
Sent by: fabric@...





Hi Jeehoon Lim,

Thank you for the nice explanation. 
      And if there's an invocation that includes GetState, it would fail with the error 'Inconsistent Response Payload from Different peers.'

The read-write set constructed by the peer should be consistent across peers
      1. In the read-set, we include only the version not the actual data. 
      2. In the write-set, we store the value passed by the chaincode as-is. 
Only if the chaincode is non-deterministic, the read-write set would differ across peers.

I think that, in your scenario, the chaincode is including the read state as-is in the chaincode response (without unmarshaling it into the struct).

In general, the chaincode marshals the struct and passes the bytes to the peer. When marshaling is done on the struct, the key order matches the order of the fields in the struct. When the chaincode retrieves the stored value from the peer, it is not guaranteed to come back in the same order (irrespective of whether the cache is used).

Hence, it is necessary for the chaincode to unmarshal the received value into the struct before using it for any other purpose. As we use json.Marshal on a map within the ledger code, it sorts the value by map keys (this is just a side effect and is not done intentionally). We do this because we need to add a few more internal keys to the doc; adding those keys, not sorting, is the main reason for marshaling and unmarshaling via a map.

I am not sure that making the chaincode rely on the low-level peer implementation is a good idea. It would also be an unnecessary constraint on us not to change the low-level implementation details. In your case, I think you need to unmarshal the received bytes into the struct before using them, without making any assumption about the key order.

For example, assume that the user submits the bytes of the following doc in the invoke argument (without using a struct, just []byte(jsonString)):
{
  "index": {
    "fields": ["docType", "owner"]
  },
  "ddoc": "indexOwnerDoc",
  "name": "indexOwner",
  "type": "json"
}

There wouldn't be any inconsistent read-write set. However, when the value is read back, it won't be in the same order as passed by the user. If order matters to the receiver, it is recommended to use a struct to process the JSON.

Having said this, I am okay with explicitly making the peer always return JSON values sorted by key (as consistent behavior is recommended).

Regards,
Senthil

On Tue, Jun 9, 2020 at 4:42 AM Jeehoon Lim <jholim@...> wrote:
    Hi all.

    I reported a bug to Hyperledger Jira. ( https://jira.hyperledger.org/browse/FAB-17954 )

    Please check whether it could be a real problem or not, and which solution is better for it.

    ==========================================================

    1. Condition

  • HLF v2.0.0 or later
  • CouchDB as the state database, with the cache enabled
  • Chaincode uses a struct as data (the fields are not ordered alphabetically)

    2. Symptom

  • With a single peer: it can occur after calling GetState between two invocations of PutState on the same key.
    Before the 2nd PutState, the keys in the query result are ordered alphabetically; after the 2nd PutState, they are ordered as the fields in the struct.
    But this is not a big problem.
  • With multiple peers: it can occur when one peer calls GetState between two invocations of PutState on the same key and another peer does not.
    In this situation, the GetState results from the two peers differ.
    And if there is an invocation that includes GetState, it fails with the error 'Inconsistent Response Payload from Different peers.'


    3. Reason
    The statecouchdb cache loads data when the chaincode calls GetState, and updates the cache when processing a transaction (only if the key already exists in the cache).

    When loading, the data is marshaled with keys in alphabetical order (the keyValToCouchDoc func).
    When updating, the data was marshaled by the chaincode in struct field order (in the write-set).


    4. Solutions

    It can be worked around by documenting how values should be marshaled in the chaincode.
    But to remove the problem completely, the code should be changed.

    I propose two solutions for this symptom.
    Solution 1: Unmarshal and re-marshal the data before updating the cache
         . Good points - keeps the current architecture
         . Bad points - not very efficient

    Solution 2: Do not update the cache - just remove the entry when it is updated
         . Good points - very simple
                         no need to hold the new value in the committer
         . Bad points - one more cache miss can occur

    I have already tested both solutions, and they solve the problem.

     




