Peers with different heights #fabric #database #consensus


Joao Antunes
 

Hi to all,

Currently, in my setup, I have 2 organizations with 2 peers each. I also have 2 orderers, one per organization, and a CA per organization.
The ordering service uses Kafka and ZooKeeper for consensus.
Running `peer channel getinfo -c mychannel` on all peers, I get the following:


Peer 1 org 1 - 

Blockchain info: {"height":4120,"currentBlockHash":"rmA39fxfCBU5AcGEOq6gErwtBILcucnhcAbnPQ7y2m0=","previousBlockHash":"toGGvdXZZwiCg2ncC7jcWkbUvfmuohEtT45YSUutZLA="}

Peer 2 org 1 - 

Blockchain info: {"height":2875,"currentBlockHash":"mz7qXXPLXNNMY5WMbOiuQdMebURa9NZL9FQsOu6Io3w=","previousBlockHash":"kfM/90uFTho48EXzphOX2ZFhIjgFKNzTjKK/z53hrhc="}

Peer 1 org 2 - 

Blockchain info: {"height":4120,"currentBlockHash":"rmA39fxfCBU5AcGEOq6gErwtBILcucnhcAbnPQ7y2m0=","previousBlockHash":"toGGvdXZZwiCg2ncC7jcWkbUvfmuohEtT45YSUutZLA="}

Peer 2 org 2 - 

Blockchain info: {"height":4120,"currentBlockHash":"rmA39fxfCBU5AcGEOq6gErwtBILcucnhcAbnPQ7y2m0=","previousBlockHash":"toGGvdXZZwiCg2ncC7jcWkbUvfmuohEtT45YSUutZLA="}




 

Peer 2 of org 1 has a different height. Is there something we can configure so that it catches up automatically? Is Kafka badly set up? Is something wrong in the peer configs?

The network is currently running Fabric 1.4.


David Enyeart
 

Oftentimes, issues like this are related to gossip misconfiguration.

Restart the two peers in org2 and then look at the peer logs. You should see some messages like this if everything is working well:

2019-11-06 19:30:35.997 EST [gossip.state] NewGossipStateProvider -> INFO 022 Updating metadata information, current ledger sequence is at = 7, next expected block is = 8
2019-11-06 19:30:38.002 EST [gossip.service] func1 -> INFO 032 Elected as a leader, starting delivery service for channel mychannel
2019-11-06 19:30:38.003 EST [deliveryClient] StartDeliverForChannel -> DEBU 033 This peer will pass blocks from orderer service to other peers for mychannel
2019-11-06 19:30:38.006 EST [deliveryClient] RequestBlocks -> DEBU 037 Starting deliver with block [8] for channel mychannel
2019-11-06 19:30:59.423 EST [gossip.channel] reportMembershipChanges -> INFO 047 Membership view has changed. peers went online: [[ 10.79.1.107:7053]] , current view: [[ 10.79.1.107:7053]]

In the above example, the peer is acting as the org leader and is disseminating blocks to other peers in the org. If the other peer is not in the 'membership view' (e.g. due to gossip misconfiguration or a network partition) then it won't be able to disseminate the blocks. You may see an error in peer logs explaining the reason.

If you are unsure about the gossip configuration, you could also force all peers to retrieve blocks from the orderer by using the following config:
CORE_PEER_GOSSIP_USELEADERELECTION = false
CORE_PEER_GOSSIP_ORGLEADER = true

Note, two of the messages above were debug messages, so you'd have to set the logging as follows to see them:
FABRIC_LOGGING_SPEC=info:deliveryClient=debug
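
For example, in a docker-compose based setup the overrides might look roughly like this (the service name and image tag are just placeholders for your own setup):

  peer1-org1:
    image: hyperledger/fabric-peer:1.4
    environment:
      # Disable leader election and make every peer fetch blocks directly from the ordering service
      - CORE_PEER_GOSSIP_USELEADERELECTION=false
      - CORE_PEER_GOSSIP_ORGLEADER=true
      # Surface the deliveryClient debug messages shown above
      - FABRIC_LOGGING_SPEC=info:deliveryClient=debug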

These messages should probably be promoted to Info messages so that it is clearer how peers are receiving blocks. I've pushed a change to do just that: https://gerrit.hyperledger.org/r/#/c/fabric/+/34275/.




David Enyeart
 

Meant to say "Restart the two peers in org1"...



Joao Antunes
 

Hi,

Just making a small update.
I received another answer that suggested doing a peer node reset:

  1. Took a backup of the peer's docker container
  2. Took a backup of the respective CouchDB data
  3. Stopped the chaincode container associated with this peer, if any
  4. Stopped the peer's CouchDB container
  5. Stopped the peer
  6. Since I was starting the peer with a docker-compose file, I updated the peer startup command from 'peer node start' to 'peer node reset'. This resets the peer's channel data back to the genesis block (a sketch of steps 6-7 in a compose file is shown after this list).
  7. Next, updated the peer startup command back from 'peer node reset' to 'peer node start'. This time, since the peer has no ledger data, it pulls the blocks from the other peers and rebuilds its CouchDB data.

(Thank you Mrudav Shukla)
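
For reference, steps 6 and 7 would look roughly like this in a compose file (service name and image tag are placeholders; 'peer node reset' only exists in the newer 1.4.x releases, which is why I couldn't use it):

  peer1-org1:
    image: hyperledger/fabric-peer:1.4.3
    # Step 6: run once with 'peer node reset' to roll all channels back to the genesis block
    command: peer node reset
    # Step 7: afterwards switch back to the normal startup command and restart the container
    # command: peer node start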

Unfortunately, I don't have Fabric 1.4.3, so I scrapped this solution.
I restarted the peer and its CouchDB. After the startup, the peer started fetching the missing blocks and synchronizing.

At the end of this, all the peers were in sync.


In another test that we did on the same setup, it is now peer1-org1 that is out of sync.

I checked docker-compose.yml and I have no CORE_PEER_GOSSIP_USELEADERELECTION or CORE_PEER_GOSSIP_ORGLEADER variables defined, so the peers are using the defaults. What is the default behaviour? (And thank you, David Enyeart, for the explanation.)
I can see some gossip in the logs, but peer1-org1 is still out of sync.

For example:

2019-11-07 12:29:08.498 UTC [gossip.privdata] reconcile -> DEBU 65a92e Reconciliation cycle finished successfully. no items to reconcile

(At this stage, I know it is still out of sync.)


One question that came up: if one peer is out of sync, and we have an OR endorsement policy that requires one member from Org1 or Org2 to endorse the transaction, why is no block created and sent to the orderer?

Thank you all.


Joao Antunes
 

Another note regarding this:

Currently, org2 is much more active in terms of gossip, while org1 is really "inactive".

Org1 logs:

2019-11-07 12:51:40.979 UTC [gossip.discovery] periodicalSendAlive -> DEBU 1ae61c Sleeping 5s
2019-11-07 12:51:41.581 UTC [gossip.election] waitForInterrupt -> DEBU 1ae61d 95c0c964c47844451090a1af28e3c7d4b500863398588989e20b33d80f9ac0c3 : Exiting
2019-11-07 12:51:41.581 UTC [gossip.election] IsLeader -> DEBU 1ae61e 95c0c964c47844451090a1af28e3c7d4b500863398588989e20b33d80f9ac0c3 : Returning true
2019-11-07 12:51:41.581 UTC [msp] GetDefaultSigningIdentity -> DEBU 1ae61f Obtaining default signing identity
2019-11-07 12:51:41.581 UTC [msp.identity] Sign -> DEBU 1ae620 Sign: plaintext: 1209666C6F7762696B65731804A20133...0D08F8DCBFD9EDA1A9EA1510B1691801
2019-11-07 12:51:41.581 UTC [msp.identity] Sign -> DEBU 1ae621 Sign: digest: 86AEA3E8696FBCD622DCAC0DF058D2DED739D83A130EACB331613CDC0603DA22
2019-11-07 12:51:41.581 UTC [gossip.election] waitForInterrupt -> DEBU 1ae622 95c0c964c47844451090a1af28e3c7d4b500863398588989e20b33d80f9ac0c3 : Entering
2019-11-07 12:51:42.869 UTC [msp] GetDefaultSigningIdentity -> DEBU 1ae623 Obtaining default signing identity
2019-11-07 12:51:42.869 UTC [msp.identity] Sign -> DEBU 1ae624 Sign: plaintext: 18012A340A221A2095C0C964C4784445...120E08DEA9878EEDA1A9EA15108FED01
2019-11-07 12:51:42.869 UTC [msp.identity] Sign -> DEBU 1ae625 Sign: digest: 1904663DAC194AC78E09C68FD30771E22BAB35EADAFF4FEFE12D84A9872A26A0
2019-11-07 12:51:42.869 UTC [msp] GetDefaultSigningIdentity -> DEBU 1ae626 Obtaining default signing identity
2019-11-07 12:51:42.869 UTC [msp.identity] Sign -> DEBU 1ae627 Sign: plaintext: 0A0F70656572322D6F7267313A37303531
2019-11-07 12:51:42.869 UTC [msp.identity] Sign -> DEBU 1ae628 Sign: digest: B555158106298CB4B91D0A101BF545F9F577BABE611C2B0FDBE06D2DA984B1B9
2019-11-07 12:51:43.759 UTC [gossip.discovery] periodicalReconnectToDead -> DEBU 1ae629 Sleeping 25s
2019-11-07 12:51:45.979 UTC [msp] GetDefaultSigningIdentity -> DEBU 1ae62a Obtaining default signing identity
2019-11-07 12:51:45.979 UTC [msp.identity] Sign -> DEBU 1ae62b Sign: plaintext: 18012A340A221A2095C0C964C4784445...120E08DEA9878EEDA1A9EA151090ED01
2019-11-07 12:51:45.979 UTC [msp.identity] Sign -> DEBU 1ae62c Sign: digest: BFD957895083A6ED5971A01E47BE41BC04F0BCA940347DEE76A643F118E772B1
2019-11-07 12:51:45.980 UTC [msp] GetDefaultSigningIdentity -> DEBU 1ae62d Obtaining default signing identity
2019-11-07 12:51:45.980 UTC [msp.identity] Sign -> DEBU 1ae62e Sign: plaintext: 0A0F70656572322D6F7267313A37303531
2019-11-07 12:51:45.980 UTC [msp.identity] Sign -> DEBU 1ae62f Sign: digest: B555158106298CB4B91D0A101BF545F9F577BABE611C2B0FDBE06D2DA984B1B9


David Enyeart
 

'peer node reset' should only be used if you suspect your peer's data is corrupted - it resets all channels to the genesis block so that the peer can re-pull/re-process blocks, but it wouldn't change block dissemination behavior on your network.

If the CORE_PEER_GOSSIP_USELEADERELECTION and CORE_PEER_GOSSIP_ORGLEADER environment variable overrides are not set, then the peer falls back to the core.yaml configuration that is baked into the peer image; the defaults can be found here:
https://github.com/hyperledger/fabric/blob/release-1.4/sampleconfig/core.yaml#L90-L100
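
For reference, the gossip section of that sample core.yaml looks roughly like this (paraphrased; see the link above for the authoritative values):

  peer:
    gossip:
      # Default: the peers in an org elect one leader that pulls blocks from the ordering service
      useLeaderElection: true
      # Default: no peer is statically designated as the org leader
      orgLeader: false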

Note that the private data reconciliation message is different... that is a background daemon thread that always runs to check whether there is any missing private data. It does not indicate a problem with block height or actual missing private data; it is just checking, and in your case it found nothing to reconcile.




Joao Antunes
 

Hi Dave,

You are right, and thank you for pointing out the lines with the default behaviour.


After some investigation, I think I'm missing CORE_PEER_GOSSIP_EXTERNALENDPOINT.

I have CORE_PEER_GOSSIP_BOOTSTRAP configured on peer2 of both org2 and org1, and they communicate with peer1-org2 and peer1-org1 respectively. But there is no gossip between orgs. I'm currently checking the effect of both variables in this setup.
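
Roughly what I'm testing in the compose file is this (host names and ports stand in for my actual values):

  peer2-org1:
    environment:
      # In-org bootstrap: on startup, peer2 contacts peer1 of its own org
      - CORE_PEER_GOSSIP_BOOTSTRAP=peer1-org1:7051
      # Endpoint advertised to peers of other orgs; if unset, the peer stays invisible cross-org
      - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer2-org1:7051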


David Enyeart
 

In addition to CORE_PEER_GOSSIP_EXTERNALENDPOINT for exposing the peer endpoints to other orgs, make sure you understand how anchor peers are configured to bootstrap the cross-org communication:
https://hyperledger-fabric.readthedocs.io/en/release-1.4/gossip.html#anchor-peers
https://hyperledger-fabric.readthedocs.io/en/release-1.4/build_network.html#create-a-channel-configuration-transaction
https://hyperledger-fabric.readthedocs.io/en/release-1.4/build_network.html#update-the-anchor-peers
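
Anchor peers are declared per organization in configtx.yaml and then pushed to the channel as a config update (the tutorial's configtxgen -outputAnchorPeersUpdate step). A rough sketch, using the tutorial's example names rather than your network's:

  # configtx.yaml (one entry per organization; unrelated fields omitted)
  - &Org1
      Name: Org1MSP
      ID: Org1MSP
      MSPDir: crypto-config/peerOrganizations/org1.example.com/msp
      AnchorPeers:
          # The peer endpoint that other orgs use to bootstrap cross-org gossip
          - Host: peer0.org1.example.com
            Port: 7051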

That being said, cross-org configuration is not strictly needed for block sync... Blocks should be synced correctly from the org leader to the org followers without cross-org communication configured. Or, like I said, you may want to force each peer to get blocks from the orderer:
CORE_PEER_GOSSIP_USELEADERELECTION = false
CORE_PEER_GOSSIP_ORGLEADER = true

The cross-org communication becomes necessary if you are using private data or service discovery. Good luck!

