RAFT based orderer crash #fabric #raft #fabric-orderer


mariya.k@...
 

Hi,

I successfully deployed an orderer and 3 Raft nodes as part of a network (Kubernetes-based cluster) and did not see any errors. However, I get the error below after restarting just the orderer node (the nodes are orderer + raft0, raft1, raft2, raft3).

I could delete all the nodes and deploy again, but if the orderer node is restarted for any reason in production (Kubernetes deployment), I don't see any way to recover without shutting everything down and starting over.

channel=e2e-orderer-syschan node=1 panic: tocommit(6) is out of range [lastIndex(5)]. Was the raft log corrupted, truncated, or lost? goroutine 130 [running]: 

Mariya

------------------------------
Setup:

 Version: 2.0.1

 Commit SHA: 1cfa5da

 Go version: go1.13.4

 OS/Arch: linux/amd64

 

Error:
-------------------------------
2020-03-22 07:21:06.745 UTC [orderer.common.server] Main -> INFO 3de Starting orderer: Version: 2.0.1 Commit SHA: 1cfa5da Go version: go1.13.4 OS/Arch: linux/amd64
2020-03-22 07:21:06.746 UTC [orderer.common.server] Main -> INFO 3df Beginning to serve requests
2020-03-22 07:21:06.760 UTC [grpc] HandleSubConnStateChange -> DEBU 3e0 pickfirstBalancer: HandleSubConnStateChange: 0xc0000aaab0, READY
2020-03-22 07:21:06.760 UTC [grpc] HandleSubConnStateChange -> DEBU 3e1 pickfirstBalancer: HandleSubConnStateChange: 0xc0000aad40, READY
2020-03-22 07:21:06.762 UTC [grpc] HandleSubConnStateChange -> DEBU 3e2 pickfirstBalancer: HandleSubConnStateChange: 0xc0003e0f90, READY
2020-03-22 07:21:08.952 UTC [orderer.common.cluster] Step -> DEBU 3e3 Connection from raft0.fabric(10.1.0.205:38250)
2020-03-22 07:21:08.955 UTC [orderer.common.cluster.step] handleMessage -> DEBU 3e4 Received message from raft0.fabric(10.1.0.205:38250): ConsensusRequest for channel e2e-orderer-syschan with payload of size 28
2020-03-22 07:21:08.956 UTC [orderer.consensus.etcdraft] Step -> INFO 3e5 1 [term: 1] received a MsgHeartbeat message with higher term from 2 [term: 2] channel=e2e-orderer-syschan node=1
2020-03-22 07:21:08.956 UTC [orderer.consensus.etcdraft] becomeFollower -> INFO 3e6 1 became follower at term 2 channel=e2e-orderer-syschan node=1
2020-03-22 07:21:08.956 UTC [orderer.consensus.etcdraft] commitTo -> PANI 3e7 tocommit(6) is out of range [lastIndex(5)]. Was the raft log corrupted, truncated, or lost? channel=e2e-orderer-syschan node=1
panic: tocommit(6) is out of range [lastIndex(5)]. Was the raft log corrupted, truncated, or lost?

goroutine 44 [running]:
github.com/hyperledger/fabric/vendor/go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0000de6e0, 0x0, 0x0, 0x0)
    /go/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/zapcore/entry.go:229 +0x546
github.com/hyperledger/fabric/vendor/go.uber.org/zap.(*SugaredLogger).log(0xc000012300, 0x4, 0x1085297, 0x5d, 0xc000aa9440, 0x2, 0x2, 0x0, 0x0, 0x0)
    /go/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/sugar.go:234 +0x100
github.com/hyperledger/fabric/vendor/go.uber.org/zap.(*SugaredLogger).Panicf(...)
    /go/src/github.com/hyperledger/fabric/vendor/go.uber.org/zap/sugar.go:159
github.com/hyperledger/fabric/common/flogging.(*FabricLogger).Panicf(0xc000012308, 0x1085297, 0x5d, 0xc000aa9440, 0x2, 0x2)
    /go/src/github.com/hyperledger/fabric/common/flogging/zap.go:74 +0x7c
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.(*raftLog).commitTo(0xc00014b650, 0x6)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/log.go:203 +0x131
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.(*raft).handleHeartbeat(0xc000a35e00, 0x8, 0x1, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/raft.go:1324 +0x54
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.stepFollower(0xc000a35e00, 0x8, 0x1, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/raft.go:1269 +0x459
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.(*raft).Step(0xc000a35e00, 0x8, 0x1, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/raft.go:971 +0x1398
github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.(*node).run(0xc0009aa060, 0xc000a35e00)
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/node.go:357 +0x10d0
created by github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft.StartNode
    /go/src/github.com/hyperledger/fabric/vendor/go.etcd.io/etcd/raft/node.go:233 +0x407
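
For readers following the trace: the panic is raised in raftLog.commitTo in the vendored etcd raft library, reached from handleHeartbeat. The heartbeat from the leader carries a commit index of 6, but this node's local log ends at index 5. Below is a minimal Go sketch of that range check, not the library code itself. The raftLog type here is a simplified stand-in with made-up fields, and it returns an error where the real code calls Panicf.

```go
package main

import "fmt"

// raftLog is a simplified, illustrative stand-in for the raft log.
// It is NOT the real etcd type; the fields are assumptions for this sketch.
type raftLog struct {
	entries   []uint64 // indexes of the entries held on local disk
	committed uint64   // highest index known to be committed
}

// lastIndex returns the index of the last locally stored entry (0 if empty).
func (l *raftLog) lastIndex() uint64 {
	if len(l.entries) == 0 {
		return 0
	}
	return l.entries[len(l.entries)-1]
}

// commitTo advances the commit index. A follower cannot commit an entry
// it does not hold, so a target beyond lastIndex is rejected. The real
// library panics at this point; the sketch returns an error instead.
func (l *raftLog) commitTo(tocommit uint64) error {
	if l.committed < tocommit {
		if l.lastIndex() < tocommit {
			return fmt.Errorf("tocommit(%d) is out of range [lastIndex(%d)]",
				tocommit, l.lastIndex())
		}
		l.committed = tocommit
	}
	return nil
}

func main() {
	// Log intact after restart: the heartbeat's commit index is accepted.
	intact := &raftLog{entries: []uint64{1, 2, 3, 4, 5, 6}}
	fmt.Println(intact.commitTo(6)) // prints "<nil>"

	// Log lost or truncated across the restart: entry 6 is missing locally.
	truncated := &raftLog{entries: []uint64{1, 2, 3, 4, 5}}
	fmt.Println(truncated.commitTo(6)) // prints "tocommit(6) is out of range [lastIndex(5)]"
}
```

In other words, the leader still believes this node holds entry 6 because the node acknowledged it before the restart, which is why the message asks whether the log was "corrupted, truncated, or lost".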

 


Jean-Gaël Dominé <jgdomine@...>
 

Hi,

We had the same issue in our network.
To solve it, we persisted the following folder on each orderer: /var/hyperledger/production/orderer/etcdraft
If that folder is not restored after a restart, the orderer cannot recover.

https://lists.hyperledger.org/g/fabric/topic/raft_orderer_issue/68228360?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,68228360
https://lists.hyperledger.org/g/fabric/topic/32652092

JG