Re: Missed data/transactions while stress testing #hyperledger-fabric #fabric-questions #fabric-orderer #fabric #network
When you submit a transaction to ordering, it is not guaranteed to get ordered into a block. If an orderer encounters an issue (as it looks like yours has, due to the stress), the transaction may not get ordered. In a distributed system the time to commit cannot be reliably predicted, so the orderer returns success and then processes the submission asynchronously. Client applications need to listen for transaction events regardless, since a transaction may ultimately be invalidated even if it is ordered. Most client applications listen for transaction events and then resubmit upon timeout or invalidation.
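Since an example was requested, here is a minimal sketch of the "resubmit upon timeout or invalidation" pattern. It is not taken from the SDK: `submitWithRetry`, `submitOnce`, `maxAttempts` and `backoffMs` are hypothetical names, and the only assumption is that `submitOnce` resolves once the transaction is committed as valid and rejects on a commit timeout or an invalidation.

```javascript
'use strict';

// Hypothetical helper (not part of any SDK): calls `submitOnce` until it
// reports a successful commit or the attempt budget is used up.
// Assumption: `submitOnce` is an async function that resolves only when the
// transaction was committed as valid, and rejects on timeout or invalidation.
async function submitWithRetry(submitOnce, { maxAttempts = 3, backoffMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Success: return whatever the committed transaction produced.
      return await submitOnce(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Linear backoff so a stressed orderer is not hit again immediately.
        await new Promise((resolve) => setTimeout(resolve, backoffMs * attempt));
      }
    }
  }
  const error = new Error(
    `Transaction not committed after ${maxAttempts} attempts: ${lastError.message}`
  );
  error.cause = lastError; // keep the last underlying failure for logging
  throw error;
}
```

With the Node SDK, `submitOnce` could be something like `() => contract.submitTransaction('createRecord', payload)`, where `createRecord` and `payload` are placeholders. As far as I know, `submitTransaction` in fabric-network 2.2 waits for commit events by default and rejects on commit timeout or an invalid validation code, but check this against your gateway's event handler options. Keep in mind that a transaction that timed out may still commit later, so the chaincode should tolerate a duplicate submission.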
My team is running a couple of stress tests over HLF 2.3.0 in a network with 3 orderers and 2 peers (each peer belongs to a different org), sending an array of JSON data through an API developed with Node SDK 2.2 (transactions are approx. 50 KB each).
While running a load of some millions of transactions, we observed that a couple of documents were missing from the ledger. At the same time, the API didn't receive any error for these missing transactions.
During the test, we noticed some WARN messages in the orderer logs that might be a clue, but they are not returned as errors to the API, so we are not sure whether these messages are related:
2021-02-01 12:10:06.708 UTC [orderer.common.broadcast] ProcessMessage -> WARN 88af9a2 [channel: ch1] Rejecting broadcast of normal message from 18.104.22.168:54444 with SERVICE_UNAVAILABLE: rejected by Order: aborted
2021-01-26 14:16:16.530 UTC [orderer.common.cluster.step] sendMessage -> WARN 33245e Stream 7 to orderer1(orderer1:443) was forcibly terminated because timeout (7s) expired
2021-01-26 14:18:22.123 UTC [orderer.consensus.etcdraft] run -> WARN 19e666 WAL sync took 29.466370256 seconds and the network is configured to start elections after 5 seconds. Your disk is too slow and may cause loss of quorum and trigger leadership election. channel=ch1 node=2
So we have a couple of questions on which we would like some feedback from the community:
- Are there any internal errors (more related to the orderers) that might not be returned to the API and could cause missing transactions?
- If so, what would be the best way to ensure that this data gets written to the ledger? Shouldn't the orderer be "smart enough" to notice that an error occurred and replay the transaction itself after some time, since the transaction was already proposed and approved by a peer? Would a listener be able to catch failures like this, so that we can do a replay in the API? If so, could someone provide an example, please?
Thanks in advance.