Re: CouchDB Double-Spending Problem


jeroiraz
 

Hi Ivan,

I've refreshed my memory of what we did when we extended Fabric to support the relational model, and I've reviewed the current Fabric source code.

Indeed, you're right that the QueryResult is not included in the read part. The digest I thought was being used for that in newer versions turns out to be related only to the private data handled by the endorsing peer.

So your statement is indeed true. The situation can even be hard to spot if it isn't taken into account beforehand, because whenever a write is made to a key (in the official version of Fabric it isn't possible to update several keys at once, i.e. update ... where ...), the current version of that key is included in the read part. So if the chaincode logic behaves differently depending on the query result and the writes differ accordingly, you may end up not updating some values during simulation and yet committing the transaction, due to the phantom read problem you described.
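For instance (an illustrative sketch only, not taken from any real chaincode; the selector and key names are made up): a chaincode that branches on a rich query and then writes some other key still passes validation, even though the decision was based on a result set that may have changed before commit:

import (
    "fmt"

    "github.com/hyperledger/fabric/core/chaincode/shim"
)

// awardBonus writes a bonus entry iff a rich query says the owner has at
// least 10 marbles. The rich query result is not re-checked at commit time,
// so a marble added or removed between simulation and commit can make the
// committed decision wrong while the transaction is still considered valid.
func awardBonus(stub shim.ChaincodeStubInterface, owner string) error {
    it, err := stub.GetQueryResult(fmt.Sprintf(`{"selector":{"owner":"%s"}}`, owner))
    if err != nil {
        return err
    }
    defer it.Close()

    count := 0
    for it.HasNext() {
        if _, err := it.Next(); err != nil {
            return err
        }
        count++
    }
    if count >= 10 {
        return stub.PutState("bonus_"+owner, []byte("awarded"))
    }
    return nil
}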

Regards,
Jero



On Sat, Apr 21, 2018 at 3:08 PM, Ivan Vankov <gatakka@...> wrote:

The GetQueryResult interface has this comment:

"The query is NOT re-executed during validation phase, phantom reads are
not detected. That is, other committed transactions may have added,
updated, or removed keys that impact the result set, and this would not
be detected at validation/commit time.  Applications susceptible to this
should therefore not use GetQueryResult as part of transactions that update
ledger, and should limit use to read-only chaincode operations."

https://github.com/hyperledger/fabric/blob/849e304dbfd372ee54d437bf176a514a9c9654c7/core/chaincode/shim/interfaces_stable.go#L143

If we use the previous example, this means that if GetQueryResult is used to fetch all marbles owned by Bob, then this result will not be validated during commit. So if someone has changed or updated some of Bob's marbles, we have possible double spending, split brain, data divergence ... call it what you wish. If GetState is used instead, then the result will be validated at commit time.
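A partial workaround (just a sketch, assuming the 1.0/1.1 shim API, not something taken from the documentation): after the rich query, re-read every returned key with GetState so at least those keys end up in the read set and get version-checked at commit. A brand new marble added to Bob is still invisible, because the query never returned its key:

import "github.com/hyperledger/fabric/core/chaincode/shim"

// readBackQueryResults re-reads every key returned by a rich query with
// GetState, so those keys and their versions are recorded in the read set
// and checked by MVCC validation at commit time. Keys the query never
// returned (phantoms) are still not detected.
func readBackQueryResults(stub shim.ChaincodeStubInterface, query string) (map[string][]byte, error) {
    it, err := stub.GetQueryResult(query)
    if err != nil {
        return nil, err
    }
    defer it.Close()

    results := map[string][]byte{}
    for it.HasNext() {
        kv, err := it.Next()
        if err != nil {
            return nil, err
        }
        value, err := stub.GetState(kv.Key) // adds kv.Key to the read set
        if err != nil {
            return nil, err
        }
        results[kv.Key] = value
    }
    return results, nil
}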

This is my understanding, this is what the official documentation and code comments explain, and this is what I observed in version 1.0 during tests.

If this has changed in 1.1, please update the documentation and comments, because it will be much easier for all of us to work with rich queries that validate the result set.


On 21.04.2018 20:25, jeronimo irazabal wrote:
Following the discussion

I think the statement that using CouchDB will make MVCC fail is not correct, because that validation is performed in the very same way regardless of the underlying state database being used (LevelDB or CouchDB). Whenever a chaincode makes use of the key-value API, the read-write set is prepared in memory at the endorsing peer by the transaction simulator. For rich queries, the digest of the result is included in the read part (that last solution was not present in earlier versions of Fabric).

When we added SQL capabilities at the chaincode level, we had to create a new transaction simulator, a different read-write set, and a new validation procedure because we didn't employ MVCC. But the validation performed when using LevelDB or CouchDB is the same.

However, I think Ivan had another motivation, namely to reduce the number of transactions that get invalidated due to MVCC, and his approach is to use different keys as much as possible. While this is a necessary workaround due to MVCC, I can't see why, when using the same keys, CouchDB would behave any differently from LevelDB.

Jerónimo



On Sat, Apr 21, 2018 at 1:44 PM, Kim Letkeman <kletkema@...> wrote:

Boiling the issue down to the MVCC check on key versions, which is how a conflict is detected -- i.e. transactions ran in parallel and read the same key versions, thus invalidating all parallel transactions after the first one that is committed (which may or may not be the first one that is endorsed).
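In rough pseudo-Go (a sketch of the idea only, not the committer's actual code), the check I mean amounts to:

// A transaction stays valid only if every key it read during simulation
// still has the same version at commit time.
type version struct{ BlockNum, TxNum uint64 }

func mvccValid(readSet map[string]version, committed func(key string) version) bool {
    for key, readVersion := range readSet {
        if committed(key) != readVersion {
            return false // another transaction committed this key first
        }
    }
    return true
}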

So my question is: are you saying that the MVCC check can fail when using CouchDB?

The reason I ask is that it is not clear that there was a key conflict in your scenario. If the new marble has a unique key that was never read when the first transaction occurred, there is no reason to invalidate any of the transactions you mentioned.

Kim


Kim Letkeman
Senior Technical Staff Member, IBM Watson IoT

IoT Blockchain


Phone: +1 (613) 762-0352
E-mail:
kletkema@...


"Ivan Vankov" ---04/21/2018 04:24:29 AM---Author of the article here. The problem with COuchDB is that phantom reads can occurs. Let me give

From: "Ivan Vankov" <gatakka@...>
To: fabric@...
Date: 04/21/2018 04:24 AM
Subject: Re: [Hyperledger Fabric] CouchDB Double-Spending Problem
Sent by: fabric@...



Author of the article here.

The problem with CouchDB is that phantom reads can occur. Let me give you an example: imagine that Bob has 10 marbles and decides to transfer all of them to Alice. The chaincode reads all marbles owned by Bob (during the simulation) and prepares the read/write set, then this set is sent to the orderer for commitment. When peers receive the block they will apply this RW set. It seems OK, but there is time between simulation and commitment: what happens if some other transaction added a new marble to Bob in the meantime? This new marble will not be transferred to Alice, and no error will be raised. This is when using CouchDB. If you are using LevelDB, then before commit an additional check will be made, this change in the data will be detected, the transaction will be invalidated, and the ledger will not be updated. There are many other "edge" cases when using CouchDB; this is a simple example.
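In chaincode terms the vulnerable pattern looks roughly like this (a simplified sketch, not the real marbles chaincode; the marble fields and the selector are only illustrative):

import (
    "encoding/json"
    "fmt"

    "github.com/hyperledger/fabric/core/chaincode/shim"
)

// transferAllMarbles reassigns every marble the rich query returns for
// `from`. A marble added to `from` between simulation and commit is not in
// the result set, so it is silently left behind and no error is raised.
func transferAllMarbles(stub shim.ChaincodeStubInterface, from, to string) error {
    it, err := stub.GetQueryResult(fmt.Sprintf(`{"selector":{"owner":"%s"}}`, from))
    if err != nil {
        return err
    }
    defer it.Close()

    for it.HasNext() {
        kv, err := it.Next()
        if err != nil {
            return err
        }
        var marble map[string]interface{}
        if err := json.Unmarshal(kv.Value, &marble); err != nil {
            return err
        }
        marble["owner"] = to
        updated, err := json.Marshal(marble)
        if err != nil {
            return err
        }
        if err := stub.PutState(kv.Key, updated); err != nil {
            return err
        }
    }
    return nil
}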

So how to solve this? In some cases this may be the desired behavior, or these phantom reads may not cause any data degradation. It all depends on the particular flow and data. But if you must protect yourself from this, then you have a couple of options:

1. Use LevelDB. You will lose rich queries, but if you start using composite keys as indexes this can work around most of the limitations (see the first sketch below).

2. The application layer must guarantee the stability of the set between simulation and commitment time. There are many ways to do this, and none of them is perfect. In general, you create a queue in the app layer and schedule transactions in such a way that while Bob transfers marbles to Alice, no other transaction adds new marbles to Bob (see the second sketch below).
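For option 1, the usual pattern (a sketch in the spirit of the official marbles example; the owner~name index name is just something I made up here) is to maintain a composite-key index per owner and walk it with GetStateByPartialCompositeKey instead of a rich query, so the read is one that gets validated at commit:

import "github.com/hyperledger/fabric/core/chaincode/shim"

// createMarble stores the marble under its own key and also writes an
// owner~name index entry so it can be found later without a rich query.
func createMarble(stub shim.ChaincodeStubInterface, name, owner string, data []byte) error {
    if err := stub.PutState(name, data); err != nil {
        return err
    }
    indexKey, err := stub.CreateCompositeKey("owner~name", []string{owner, name})
    if err != nil {
        return err
    }
    // the index entry only needs to exist, so store a single null byte
    return stub.PutState(indexKey, []byte{0x00})
}

// marbleNamesOwnedBy walks the owner~name index with a partial composite
// key read instead of a rich query.
func marbleNamesOwnedBy(stub shim.ChaincodeStubInterface, owner string) ([]string, error) {
    it, err := stub.GetStateByPartialCompositeKey("owner~name", []string{owner})
    if err != nil {
        return nil, err
    }
    defer it.Close()

    var names []string
    for it.HasNext() {
        kv, err := it.Next()
        if err != nil {
            return nil, err
        }
        _, parts, err := stub.SplitCompositeKey(kv.Key)
        if err != nil {
            return nil, err
        }
        names = append(names, parts[1]) // parts = [owner, name]
    }
    return names, nil
}

For option 2, the simplest version (a client-side sketch, nothing Fabric specific, and it only helps if every submission goes through this one process) is to serialize submissions per owner:

import "sync"

// ownerLocks serializes transaction submission per owner, so a "transfer
// all of Bob's marbles" and an "add a marble to Bob" never overlap between
// simulation and commit. This only works if all clients go through it.
var ownerLocks sync.Map // owner -> *sync.Mutex

func submitForOwner(owner string, submit func() error) error {
    lock, _ := ownerLocks.LoadOrStore(owner, &sync.Mutex{})
    mu := lock.(*sync.Mutex)
    mu.Lock()
    defer mu.Unlock()
    // submit should return only after the transaction has been committed
    return submit()
}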

From my experience, I found that these problems can be solved WHEN people stop modeling the data like in a relational database. Denormalizing the data in Fabric can help you reorganize it in such a way that no collisions can happen. I know that data normalization is "embedded" in our minds, but it is not effective here.

This is a new technology, and all of us are learning and testing now, but good practices are starting to appear. Do not be afraid to experiment!






