Context deadline exceeded error #fabric-kubernetes #fabric-questions


anton.zaslavskiy@...
 

Hi Community,
 
Whenever we deploy hlf peers in k8s, especially if there are more than 2-3 of them, the "context deadline exceeded" error happens from time to time while calling peer cli commands (like `peer channel join`). Sometimes it works well, so the problem looks flaky.
 
Impact:
  •  To overcome it we have to retry multiple times while joining the channel.
 
  • I suspect that peers constantly go online and offline due to the same reason.
020-11-19 08:14:13.128 UTC [gossip.channel] reportMembershipChanges -> INFO 10406 [[defaultchannel] Membership view has changed. peers went offline:  [[nd-423-261-488.***.com:7051 ]] , current view:  [[nd-113-392-727.***.com:7051 ] [nd-652-480-342.***.com:7051 ] [nd-679-110-235.***.com:7051 ] [nd-603-674-180.***.com:7051 ] [nd-186-074-086.***.com:7051 ] [nd-766-765-673.***.com:7051 ] [nd-589-874-673.***.com:7051 ] [nd-097-937-249.***.com:7051 ] [nd-226-796-011.***.com:7051 ] [nd-840-816-472.***.com:7051 ] [nd-183-749-184.***.com:7051 ]]]
2020-11-19 08:14:21.324 UTC [comm.grpc.server] 1 -> INFO 10407 unary call completed grpc.service=discovery.Discovery grpc.method=Discover grpc.request_deadline=2020-11-19T08:14:31.293Z grpc.peer_address=172.31.45.237:41366 grpc.code=OK grpc.call_duration=2.347721ms
2020-11-19 08:14:21.332 UTC [comm.grpc.server] 1 -> INFO 10408 unary call completed grpc.service=discovery.Discovery grpc.method=Discover grpc.request_deadline=2020-11-19T08:14:31.303Z grpc.peer_address=172.31.45.237:41368 grpc.code=OK grpc.call_duration=2.33063ms
2020-11-19 08:14:21.419 UTC [endorser] callChaincode -> INFO 10409 finished chaincode: _lifecycle duration: 0ms channel= txID=dc42f3ed
2020-11-19 08:14:21.419 UTC [comm.grpc.server] 1 -> INFO 1040a unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=172.31.45.237:41370 grpc.code=OK grpc.call_duration=1.836976ms
2020-11-19 08:14:21.426 UTC [endorser] callChaincode -> INFO 1040b finished chaincode: _lifecycle duration: 2ms channel=defaultchannel txID=8458ebdf
2020-11-19 08:14:21.426 UTC [comm.grpc.server] 1 -> INFO 1040c unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=172.31.45.237:41372 grpc.code=OK grpc.call_duration=2.859235ms
2020-11-19 08:14:21.571 UTC [comm.grpc.server] 1 -> INFO 1040d unary call completed grpc.service=gossip.Gossip grpc.method=Ping grpc.request_deadline=2020-11-19T08:14:23.571Z grpc.peer_address=172.31.45.237:41382 grpc.peer_subject="CN=nd-423-261-488.***.com,L=Singapore,ST=Singapore,C=SG" grpc.code=OK grpc.call_duration=111.041µs
2020-11-19 08:14:21.572 UTC [lifecycle] CheckCommitReadiness -> INFO 1040e Successfully checked commit readiness of chaincode name 'freedomDividendContract' on channel 'defaultchannel' with definition {sequence: 1, endorsement info: (version: '1', plugin: 'escc', init required: false), validation info: (plugin: 'vscc', policy: '12202f4368616e6e656c2f4170706c69636174696f6e2f456e646f7273656d656e74'), collections: (<nil>)}
2020-11-19 08:14:21.572 UTC [endorser] callChaincode -> INFO 1040f finished chaincode: _lifecycle duration: 6ms channel=defaultchannel txID=c045923e
2020-11-19 08:14:21.573 UTC [comm.grpc.server] 1 -> INFO 10410 unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=172.31.45.237:41378 grpc.code=OK grpc.call_duration=7.370915ms
2020-11-19 08:14:21.592 UTC [comm.grpc.server] 1 -> INFO 10411 unary call completed grpc.service=gossip.Gossip grpc.method=Ping grpc.request_deadline=2020-11-19T08:14:23.592Z grpc.peer_address=172.31.45.237:41384 grpc.peer_subject="CN=nd-423-261-488.***.com,L=Singapore,ST=Singapore,C=SG" grpc.code=OK grpc.call_duration=99.86µs
2020-11-19 08:14:23.128 UTC [gossip.channel] reportMembershipChanges -> INFO 10412 [[defaultchannel] Membership view has changed. peers went online: [[nd-423-261-488.***.com:7051 ]] , current view: [[nd-113-392-727.***.com:7051 ] [nd-652-480-342.***.com:7051 ] [nd-679-110-235.***.com:7051 ] [nd-603-674-180.***.com:7051 ] [nd-186-074-086.***.com:7051 ] [nd-766-765-673.***.com:7051 ] [nd-589-874-673.***.com:7051 ] [nd-423-261-488.***.com:7051 ] [nd-226-796-011.***.com:7051 ] [nd-840-816-472.***.com:7051 ] [nd-183-749-184.***.com:7051 ] [nd-097-937-249.***.com:7051 ]]]
2020-11-19 08:14:24.423 UTC [endorser] callChaincode -> INFO 10413 finished chaincode: qscc duration: 1ms channel=defaultchannel txID=75e53418
2020-11-19 08:14:24.423 UTC [comm.grpc.server] 1 -> INFO 10414 unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=172.31.45.237:32906 grpc.peer_subject="CN=fabric-common" grpc.code=OK grpc.call_duration=2.863925ms
2020-11-19 08:14:24.990 UTC [endorser] callChaincode -> INFO 10415 finished chaincode: cscc duration: 0ms channel=defaultchannel txID=937e54a4
2020-11-19 08:14:24.990 UTC [comm.grpc.server] 1 -> INFO 10416 unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=172.31.45.237:32906 grpc.peer_subject="CN=fabric-common" grpc.code=OK grpc.call_duration=1.011878ms
2020-11-19 08:14:24.997 UTC [endorser] callChaincode -> INFO 10417 finished chaincode: qscc duration: 1ms channel=defaultchannel txID=324f411c
2020-11-19 08:14:24.997 UTC [comm.grpc.server] 1 -> INFO 10418 unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=172.31.45.237:32906 grpc.peer_subject="CN=fabric-common" grpc.code=OK grpc.call_duration=2.499983ms
2020-11-19 08:14:25.524 UTC [comm.grpc.server] 1 -> INFO 10419 unary call completed grpc.service=discovery.Discovery grpc.method=Discover grpc.peer_address=172.31.45.237:41388 grpc.peer_subject="CN=fabric-common" grpc.code=OK grpc.call_duration=965.549µs
2020-11-19 08:14:25.617 UTC [endorser] callChaincode -> INFO 1041a finished chaincode: lscc duration: 2ms channel=defaultchannel txID=061c3bb8
2020-11-19 08:14:25.617 UTC [comm.grpc.server] 1 -> INFO 1041b unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=172.31.45.237:32906 grpc.peer_subject="CN=fabric-common" grpc.code=OK grpc.call_duration=3.285879ms
2020-11-19 08:14:25.623 UTC [endorser] callChaincode -> INFO 1041c finished chaincode: _lifecycle duration: 0ms channel=defaultchannel txID=c11b3d5b
2020-11-19 08:14:25.624 UTC [comm.grpc.server] 1 -> INFO 1041d unary call completed grpc.service=protos.Endorser grpc.method=ProcessProposal grpc.peer_address=172.31.45.237:32906 grpc.peer_subject="CN=fabric-common" grpc.code=OK grpc.call_duration=2.003268ms
2020-11-19 08:14:26.338 UTC [comm.grpc.server] 1 -> INFO 1041e unary call completed grpc.service=gossip.Gossip grpc.method=Ping grpc.request_deadline=2020-11-19T08:14:28.338Z grpc.peer_address=172.31.45.237:41390 grpc.peer_subject="CN=nd-097-465-953.***.com,L=Portland,ST=Oregon,C=US" grpc.code=OK grpc.call_duration=129.671µs
2020-11-19 08:14:27.067 UTC [comm.grpc.server] 1 -> INFO 1041f unary call completed grpc.service=gossip.Gossip grpc.method=Ping grpc.request_deadline=2020-11-19T08:14:29.067Z grpc.peer_address=172.31.45.237:41392 grpc.peer_subject="CN=nd-097-465-953.***.com,L=Portland,ST=Oregon,C=US" grpc.code=OK grpc.call_duration=127.27µs
2020-11-19 08:14:28.128 UTC [gossip.channel] reportMembershipChanges -> INFO 10420 [[defaultchannel] Membership view has changed. peers went online: [[nd-097-465-953.***.com:7051 ]] , current view: [[nd-186-074-086.***.com:7051 ] [nd-766-765-673.***.com:7051 ] [nd-589-874-673.***.com:7051 ] [nd-097-465-953.***.com:7051 ] [nd-113-392-727.***.com:7051 ] [nd-652-480-342.***.com:7051 ] [nd-679-110-235.***.com:7051 ] [nd-603-674-180.***.com:7051 ] [nd-183-749-184.***.com:7051 ] [nd-097-937-249.***.com:7051 ] [nd-423-261-488.***.com:7051 ] [nd-226-796-011.***.com:7051 ] [nd-840-816-472.***.com:7051 ]]]
2020-11-19 08:14:31.325 UTC [comm.grpc.server] 1 -> INFO 10421 unary call completed grpc.service=discovery.Discovery grpc.method=Discover grpc.request_deadline=2020-11-19T08:14:41.289Z grpc.peer_address=172.31.45.237:41394 grpc.code=OK grpc.call_duration=1.644114ms
  •  Additionally, sometimes peers just stuck and go offline, but their /healthz endpoint works and responds with 200. Restart of the peer helps though.
2020-11-26 04:40:42.992 UTC [comm.grpc.server] 1 -> INFO 1f04 streaming call completed grpc.service=protos.Deliver grpc.method=DeliverFiltered grpc.peer_address=172.31.37.66:62660 grpc.peer_subject="CN=fabric-common" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=14h50m10.591964512s
2020-11-26 04:40:46.771 UTC [gossip.comm] sendToEndpoint -> WARN 1f05 Failed obtaining connection for 172.31.36.151:7051, PKIid:4c59fb2758a919f5857acb1607d557d110d3c2f9a995500a649d75e75307bdcf reason: context deadline exceeded
2020-11-26 04:40:46.772 UTC [gossip.discovery] expireDeadMembers -> WARN 1f06 Entering [4c59fb2758a919f5857acb1607d557d110d3c2f9a995500a649d75e75307bdcf]
2020-11-26 04:40:46.772 UTC [gossip.discovery] expireDeadMembers -> WARN 1f07 Closing connection to Endpoint: nd-136-026-294.***.com:7051, InternalEndpoint: 172.31.36.151:7051, PKI-ID: 4c59fb2758a919f5857acb1607d557d110d3c2f9a995500a649d75e75307bdcf, Metadata:
2020-11-26 04:40:46.772 UTC [gossip.discovery] expireDeadMembers -> WARN 1f08 Exiting
2020-11-26 04:40:49.775 UTC [gossip.comm] sendToEndpoint -> WARN 1f09 Failed obtaining connection for 172.31.36.151:7051, PKIid:4c59fb2758a919f5857acb1607d557d110d3c2f9a995500a649d75e75307bdcf reason: context deadline exceeded
2020-11-26 04:40:51.299 UTC [gossip.channel] reportMembershipChanges -> INFO 1f0a [[defaultchannel] Membership view has changed. peers went offline: [[nd-136-026-294.***.com:7051 172.31.36.151:7051]] , current view: []]
2020-11-26 04:40:52.777 UTC [gossip.comm] sendToEndpoint -> WARN 1f0b Failed obtaining connection for 172.31.36.151:7051, PKIid:4c59fb2758a919f5857acb1607d557d110d3c2f9a995500a649d75e75307bdcf reason: context deadline exceeded
2020-11-26 04:40:55.779 UTC [gossip.comm] sendToEndpoint -> WARN 1f0c Failed obtaining connection for 172.31.36.151:7051, PKIid:4c59fb2758a919f5857acb1607d557d110d3c2f9a995500a649d75e75307bdcf reason: context deadline exceeded
2020-11-26 04:40:56.073 UTC [gossip.election] beLeader -> INFO 1f0d f6b42b2b2aca8b81935e5bb3e1e8ffcec568bcb64f3b522737cb3bfd8342c846 : Becoming a leader
2020-11-26 04:40:56.074 UTC [gossip.service] func1 -> INFO 1f0e Elected as a leader, starting delivery service for channel defaultchannel
2020-11-26 04:40:56.074 UTC [deliveryClient] StartDeliverForChannel -> INFO 1f0f This peer will retrieve blocks from ordering service and disseminate to other peers in the organization for channel defaultchannel
2020-11-26 04:40:58.781 UTC [gossip.comm] sendToEndpoint -> WARN 1f10 Failed obtaining connection for 172.31.36.151:7051, PKIid:4c59fb2758a919f5857acb1607d557d110d3c2f9a995500a649d75e75307bdcf reason: context deadline exceeded
2020-11-26 04:41:05.424 UTC [core.comm] ServerHandshake -> ERRO 1f11 TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:33424
2020-11-26 04:41:06.426 UTC [core.comm] ServerHandshake -> ERRO 1f12 TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:33426
2020-11-26 04:41:08.233 UTC [core.comm] ServerHandshake -> ERRO 1f13 TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:33456
2020-11-26 04:41:33.427 UTC [core.comm] ServerHandshake -> ERRO 1f14 TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:33564
2020-11-26 04:41:34.430 UTC [core.comm] ServerHandshake -> ERRO 1f15 TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:33570
2020-11-26 04:41:36.121 UTC [core.comm] ServerHandshake -> ERRO 1f16 TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:33574
2020-11-26 04:42:01.430 UTC [core.comm] ServerHandshake -> ERRO 1f17 TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:33688
...
...
2020-11-26 08:51:15.311 UTC [core.comm] ServerHandshake -> ERRO 255b TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:39532
2020-11-26 08:51:16.841 UTC [core.comm] ServerHandshake -> ERRO 255c TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.31.36.151:39538
Some details about the network:
  • network has 1 application channel
  • multiple organizations are added to this channel dynamically
  • each organization has it's own anchor peer + each organization may have several "secondary" peers
  • there is only 1 orderer in the network
  • examples of the configurations can't be found below
Orderer:
ADMIN_MSP_PATH: /var/hyperledger/admin_msp
FABRIC_CA_CLIENT_HOME: /var/hyperledger/fabric-ca-client
FABRIC_LOGGING_SPEC: warning
GODEBUG: netdns=go
ORDERER_CFG_PATH: /var/hyperledger/config
ORDERER_FILELEDGER_LOCATION: /var/hyperledger/ledger
ORDERER_GENERAL_BATCHSIZE_MAXMESSAGECOUNT: "10"
ORDERER_GENERAL_BATCHTIMEOUT: 1s
ORDERER_GENERAL_GENESISFILE: /hl_config/genesis/genesis.block
ORDERER_GENERAL_GENESISMETHOD: file
ORDERER_GENERAL_GENESISPROFILE: initial
ORDERER_GENERAL_LEDGERTYPE: file
ORDERER_GENERAL_LISTENADDRESS: 0.0.0.0
ORDERER_GENERAL_LISTENPORT: "7050"
ORDERER_GENERAL_LOCALMSPDIR: /var/hyperledger/msp
ORDERER_GENERAL_LOCALMSPID: RG-488-419-ORD-MSP
ORDERER_GENERAL_LOGLEVEL: debug
ORDERER_GENERAL_MAXWINDOWSIZE: "1000"
ORDERER_GENERAL_ORDERERTYPE: solo
ORDERER_GENERAL_TLS_CERTIFICATE: /var/hyperledger/tls/server/pair/tls.crt
ORDERER_GENERAL_TLS_CLIENTAUTHREQUIRED: "false"
ORDERER_GENERAL_TLS_CLIENTROOTCAS: "/var/hyperledger/tls/client/cert/*"
ORDERER_GENERAL_TLS_ENABLED: "true"
ORDERER_GENERAL_TLS_PRIVATEKEY: /var/hyperledger/tls/server/pair/tls.key
ORDERER_GENERAL_TLS_ROOTCAS: /var/hyperledger/tls/server/cert/cacert.pem
ORDERER_METRICS_PROVIDER: prometheus
ORDERER_OPERATIONS_LISTENADDRESS: 0.0.0.0:8443
Anchor peer of each org
ADMIN_MSP_PATH: /var/hyperledger/admin_msp
CORE_LEDGER_STATE_STATEDATABASE: goleveldb
CORE_LOGGING_CAUTHDSL: warning
CORE_LOGGING_GOSSIP: info
CORE_LOGGING_GRPC: error
CORE_LOGGING_LEDGER: info
CORE_LOGGING_LEVEL: info
CORE_LOGGING_MSP: warning
CORE_LOGGING_PEER: info
CORE_LOGGING_POLICIES: warning
CORE_METRICS_PROVIDER: prometheus
CORE_OPERATIONS_LISTENADDRESS: 0.0.0.0:9443
CORE_PEER_ADDRESS: nd-980-845-068.***.com:7051
CORE_PEER_ADDRESSAUTODETECT: "true"
CORE_PEER_COMMITTER_ENABLED: "true"
CORE_PEER_DISCOVERY_PERIOD: 60s
CORE_PEER_DISCOVERY_TOUCHPERIOD: 60s
CORE_PEER_GOSSIP_BOOTSTRAP: ""
CORE_PEER_GOSSIP_ENDPOINT: ""
CORE_PEER_GOSSIP_EXTERNALENDPOINT: nd-980-845-068.***.com:7051
CORE_PEER_GOSSIP_ORGLEADER: "false"
CORE_PEER_GOSSIP_USELEADERELECTION: "true"
CORE_PEER_ID: nw-767-370-1-nd-980-845-068
CORE_PEER_LISTENADDRESS: 0.0.0.0:7051
CORE_PEER_LOCALMSPID: RG-488-419-MSP
CORE_PEER_MSPCONFIGPATH: /var/hyperledger/msp
CORE_PEER_NETWORKID: nid1
CORE_PEER_PROFILE_ENABLED: "true"
CORE_PEER_TLS_CERT_FILE: /var/hyperledger/tls/server/pair/tls.crt
CORE_PEER_TLS_CLIENTAUTHREQUIRED: "false"
CORE_PEER_TLS_ENABLED: "true"
CORE_PEER_TLS_KEY_FILE: /var/hyperledger/tls/server/pair/tls.key
CORE_PEER_TLS_ROOTCERT_FILE: /var/hyperledger/tls/server/cert/cacert.pem
CORE_VM_ENDPOINT: tcp://localhost:2375
FABRIC_CFG_PATH: /var/hyperledger/fabric_cfg
GODEBUG: netdns=go
ORD_TLS_PATH: /var/hyperledger/tls/ord/cert
PEER_CFG_PATH: /var/hyperledger/config
Secondary peer:
ADMIN_MSP_PATH: /var/hyperledger/admin_msp
CORE_LEDGER_STATE_STATEDATABASE: goleveldb
CORE_LOGGING_CAUTHDSL: warning
CORE_LOGGING_GOSSIP: info
CORE_LOGGING_GRPC: error
CORE_LOGGING_LEDGER: info
CORE_LOGGING_LEVEL: info
CORE_LOGGING_MSP: warning
CORE_LOGGING_PEER: info
CORE_LOGGING_POLICIES: warning
CORE_METRICS_PROVIDER: prometheus
CORE_OPERATIONS_LISTENADDRESS: 0.0.0.0:9443
CORE_PEER_ADDRESS: nd-136-026-294.***.com:7051
CORE_PEER_ADDRESSAUTODETECT: "true"
CORE_PEER_COMMITTER_ENABLED: "true"
CORE_PEER_DISCOVERY_PERIOD: 60s
CORE_PEER_DISCOVERY_TOUCHPERIOD: 60s
CORE_PEER_GOSSIP_BOOTSTRAP: nd-980-845-068.***.com:7051
CORE_PEER_GOSSIP_ENDPOINT: ""
CORE_PEER_GOSSIP_EXTERNALENDPOINT: nd-136-026-294.***.com:7051
CORE_PEER_GOSSIP_ORGLEADER: "false"
CORE_PEER_GOSSIP_USELEADERELECTION: "true"
CORE_PEER_ID: nw-767-370-1-nd-136-026-294
CORE_PEER_LISTENADDRESS: 0.0.0.0:7051
CORE_PEER_LOCALMSPID: RG-488-419-MSP
CORE_PEER_MSPCONFIGPATH: /var/hyperledger/msp
CORE_PEER_NETWORKID: nid1
CORE_PEER_PROFILE_ENABLED: "true"
CORE_PEER_TLS_CERT_FILE: /var/hyperledger/tls/server/pair/tls.crt
CORE_PEER_TLS_CLIENTAUTHREQUIRED: "false"
CORE_PEER_TLS_ENABLED: "true"
CORE_PEER_TLS_KEY_FILE: /var/hyperledger/tls/server/pair/tls.key
CORE_PEER_TLS_ROOTCERT_FILE: /var/hyperledger/tls/server/cert/cacert.pem
CORE_VM_ENDPOINT: tcp://localhost:2375
FABRIC_CFG_PATH: /var/hyperledger/fabric_cfg
GODEBUG: netdns=go
ORD_TLS_PATH: /var/hyperledger/tls/ord/cert
PEER_CFG_PATH: /var/hyperledger/config
 
P.S.
 
I believe these unanswered topics both relate to the same issue:
 
https://lists.hyperledger.org/g/fabric/message/6343
https://lists.hyperledger.org/g/fabric/message/5726?p=,,,20,0,0,0::relevance,,posterid%3A1369019,20,2,0,30700654