Weird "panic: runtime error" after multiple transactions
Tomás Peixinho
Good afternoon, fellow colleagues,
I'm developing an application using Hyperledger Fabric and I've found a weird error that I'm not sure how to fix.
I'm running everything on a single machine, with two orgs, one peer on each, and one orderer. My application receives a file with multiple lines (right now, around 5,000), each of which corresponds to one transaction (technically, two transactions each, because I have to do a query by partial key and then a separate transaction for the actual update, as per a recommendation from Dave Enyeart on a previous question of mine). So: 5,000 lines, around 10,000 transactions. They are handled concurrently using threads, and the peers are alternated so that one doesn't take all the load while the other does nothing.
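For reference, the fan-out looks roughly like the following minimal Go sketch. It is illustrative only: submitTx is a hypothetical stand-in for the real SDK calls (the partial-key query plus the update), and the file name and peer names are placeholders.

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "sync"
)

// submitTx is a hypothetical placeholder for the actual SDK invocation
// (the partial-key query followed by the update transaction).
func submitTx(peer, line string) error {
    return nil
}

func main() {
    // Placeholder peer addresses; one peer per org, alternated below.
    peers := []string{"peer0.org1.example.com", "peer0.org2.example.com"}

    file, err := os.Open("input.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    var wg sync.WaitGroup
    scanner := bufio.NewScanner(file)
    for i := 0; scanner.Scan(); i++ {
        line := scanner.Text()
        peer := peers[i%len(peers)] // alternate peers so one doesn't take all the load
        wg.Add(1)
        go func(peer, line string) {
            defer wg.Done()
            if err := submitTx(peer, line); err != nil {
                fmt.Printf("tx on %s failed: %v\n", peer, err)
            }
        }(peer, line)
    }
    wg.Wait()
}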
My problem now is that a run usually ends with one of the peers randomly crashing with Exit Code 2, and the logs from that peer saying "panic: runtime error: invalid memory address or nil pointer dereference". I say "randomly" because it can happen after 1,000 lines (2,000 transactions) or after 4,000 lines (8,000 transactions), which is what I find weird!
I've read that it could be because of problems with certificates or with misconfigured paths for them, but I feel that if that were the case, it would happen much sooner, or actually right at the start.
My question (and I know that I haven't provided much information or any logs, and that it's difficult to find the problem like this) is: is there any obvious situation in which this might happen, or a few known cases that might lead to this error? Something that might point me in the right direction of what to look at to see if I have any configuration problems. Or could it simply be a problem with my physical machine? Could it be just a memory problem that crashes the peer if the traffic gets too heavy?
These errors are really not obvious sometimes, and I was just wondering if someone might have some insight as to what the problem might be, based on previous experiences or something.
Thanks in advance
Tomás
David Enyeart
You would need to provide the full stack trace from the panic message in order to troubleshoot.
Tomás Peixinho
I was reading about similar issues in Fabric version 1.4.2 (the images I'm using in my containers are 1.4.1, so I'm not sure if it's related at all), where the problem was tied to a certain timeout when one of the containers stops (from a Jira issue from August 2019: https://jira.hyperledger.org/browse/FAB-16292). I don't know if that's what is happening here. On the last execution, it stopped after approximately 15 minutes, but I've seen it crash both sooner and later.
The full error message is as follows:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1c pc=0xb4e221]
goroutine 1265 [running]:
github.com/hyperledger/fabric/vendor/github.com/syndtr/goleveldb/leveldb.(*dbIter).Release(0xc0021fc510)
/opt/gopath/src/github.com/hyperledger/fabric/vendor/github.com/syndtr/goleveldb/leveldb/db_iter.go:343 +0xe1
github.com/hyperledger/fabric/core/ledger/kvledger/txmgmt/statedb/stateleveldb.(*kvScanner).Close(0xc003ce1fb0)
/opt/gopath/src/github.com/hyperledger/fabric/core/ledger/kvledger/txmgmt/statedb/stateleveldb/stateleveldb.go:265 +0x34
github.com/hyperledger/fabric/core/ledger/kvledger/txmgmt/txmgr/lockbasedtxmgr.(*resultsItr).Close(0xc007fefe00)
/opt/gopath/src/github.com/hyperledger/fabric/core/ledger/kvledger/txmgmt/txmgr/lockbasedtxmgr/helper.go:422 +0x34
github.com/hyperledger/fabric/core/chaincode.(*TransactionContext).CloseQueryIterators(0xc0050ee090)
/opt/gopath/src/github.com/hyperledger/fabric/core/chaincode/transaction_context.go:113 +0xd7
github.com/hyperledger/fabric/core/chaincode.(*Handler).Notify(0xc000494120, 0xc005926400)
/opt/gopath/src/github.com/hyperledger/fabric/core/chaincode/handler.go:538 +0x1db
github.com/hyperledger/fabric/core/chaincode.(*Handler).handleMessageReadyState(0xc000494120, 0xc005926400, 0x42, 0xc003ac1950)
/opt/gopath/src/github.com/hyperledger/fabric/core/chaincode/handler.go:204 +0x233
github.com/hyperledger/fabric/core/chaincode.(*Handler).handleMessage(0xc000494120, 0xc005926400, 0x3, 0x0)
/opt/gopath/src/github.com/hyperledger/fabric/core/chaincode/handler.go:185 +0x187
github.com/hyperledger/fabric/core/chaincode.(*Handler).ProcessStream(0xc000494120, 0x7f6f0247ba30, 0xc0031cc170, 0x0, 0x0)
/opt/gopath/src/github.com/hyperledger/fabric/core/chaincode/handler.go:435 +0x40e
github.com/hyperledger/fabric/core/chaincode.(*ChaincodeSupport).HandleChaincodeStream(0xc00054c360, 0x7f6f0247ba30, 0xc0031cc170, 0x7f6f0247ba30, 0xc0031cc170)
/opt/gopath/src/github.com/hyperledger/fabric/core/chaincode/chaincode_support.go:191 +0x362
github.com/hyperledger/fabric/core/chaincode.(*ChaincodeSupport).Register(0xc00054c360, 0x13bd860, 0xc0031cc170, 0x139c300, 0xc00054c360)
/opt/gopath/src/github.com/hyperledger/fabric/core/chaincode/chaincode_support.go:196 +0x63
github.com/hyperledger/fabric/protos/peer._ChaincodeSupport_Register_Handler(0x118c540, 0xc00054c360, 0x13bbe80, 0xc0005769a0, 0x1fc5e60, 0xc0032ef230)
/opt/gopath/src/github.com/hyperledger/fabric/protos/peer/chaincode_shim.pb.go:1066 +0xad
github.com/hyperledger/fabric/vendor/google.golang.org/grpc.(*Server).processStreamingRPC(0xc00031c000, 0x13bf540, 0xc002258a80, 0xc0003f3000, 0xc0003487b0, 0x1ea8a40, 0xc0032ef200, 0x0, 0x0)
/opt/gopath/src/github.com/hyperledger/fabric/vendor/google.golang.org/grpc/server.go:1124 +0x8bd
github.com/hyperledger/fabric/vendor/google.golang.org/grpc.(*Server).handleStream(0xc00031c000, 0x13bf540, 0xc002258a80, 0xc0003f3000, 0xc0032ef200)
/opt/gopath/src/github.com/hyperledger/fabric/vendor/google.golang.org/grpc/server.go:1212 +0x12aa
github.com/hyperledger/fabric/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0039e3ba0, 0xc00031c000, 0x13bf540, 0xc002258a80, 0xc0003f3000)
/opt/gopath/src/github.com/hyperledger/fabric/vendor/google.golang.org/grpc/server.go:686 +0x9f
created by github.com/hyperledger/fabric/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
/opt/gopath/src/github.com/hyperledger/fabric/vendor/google.golang.org/grpc/server.go:684 +0xa1
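The top frames show the peer releasing a LevelDB iterator while closing the transaction's query iterators (unclosed iterators are cleaned up later by the peer's TransactionContext, as the Notify frame suggests). For context, the usual chaincode-side pattern with the Fabric 1.4 shim is to close partial-key iterators explicitly; a minimal, generic sketch of that pattern (not the exact chaincode in question):

package main

import (
    "github.com/hyperledger/fabric/core/chaincode/shim"
    pb "github.com/hyperledger/fabric/protos/peer"
)

type demoCC struct{}

func (c *demoCC) Init(stub shim.ChaincodeStubInterface) pb.Response {
    return shim.Success(nil)
}

func (c *demoCC) Invoke(stub shim.ChaincodeStubInterface) pb.Response {
    // Hypothetical: this sketch always runs the partial-key query.
    return queryByPartialKey(stub, "objType", []string{"part1"})
}

func queryByPartialKey(stub shim.ChaincodeStubInterface, objectType string, keys []string) pb.Response {
    iter, err := stub.GetStateByPartialCompositeKey(objectType, keys)
    if err != nil {
        return shim.Error(err.Error())
    }
    // Always release the iterator, even on early returns, rather than
    // leaving it for the peer to close at transaction completion.
    defer iter.Close()

    for iter.HasNext() {
        kv, err := iter.Next()
        if err != nil {
            return shim.Error(err.Error())
        }
        _ = kv // process kv.Key / kv.Value here
    }
    return shim.Success(nil)
}

func main() {
    if err := shim.Start(new(demoCC)); err != nil {
        panic(err)
    }
}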
Manish
Hi Tomás,
I had seen something similar in a test flake in the past, and the issue was a race condition between the closing of the LevelDB and acquiring an iterator. I am not sure if that's applicable here. Still, to clear any doubts, can you check your LevelDB logs (located at {peer.fileSystemPath in core.yaml}/ledgersData/stateLeveldb/LOG) near the time of this panic? Do you see something like "db@close"? If not, sharing the log may help further.
Thanks,
Manish
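The contract in goleveldb is that live iterators are released before the database is closed; releasing after (or concurrently with) the close is the racy pattern suspected here. A tiny standalone sketch of the safe ordering (illustrative only; the path is a throwaway example):

package main

import (
    "log"

    "github.com/syndtr/goleveldb/leveldb"
)

func main() {
    // Throwaway example path, not the peer's real state database.
    db, err := leveldb.OpenFile("/tmp/demo-leveldb", nil)
    if err != nil {
        log.Fatal(err)
    }

    iter := db.NewIterator(nil, nil)
    for iter.Next() {
        _ = iter.Key()
    }
    iter.Release() // release every live iterator first...
    db.Close()     // ...and only then close the database
}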
Tomás Peixinho
I can't seem to find that file. I can't check from inside the peer, because the peer exited: if I try to run "docker-compose run" with the peer id, it says "service not found" (which seems expected), and I can't exec into it because it is down. And from the outside, I can't find any logs in "/var/lib/docker/volumes/network_peer1.org1.example.com/_data", which I think is the right path. That last folder should have a ledgersData subfolder in it, but it's completely empty. Am I looking in the wrong place?
As for the logs, the ones from the peer, apart from the final stack trace which I have already shared, don't really show me anything; it's just blocks being received and transactions being committed. From the orderer, there's nothing out of the ordinary, just gRPC calls from what I assume are the blocks being sent to the peers.
Also, I can't seem to find the core.yaml file. Any idea where it might be?
David Enyeart
For finding the peer files, I find it easier to change the peer's volume in docker-compose-test-net.yaml (if using the test network) to a local directory, like this (last entry):
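Something along these lines, where the service name and host path are examples, not taken from the original file:

  peer0.org1.example.com:
    # ... rest of the service definition unchanged ...
    volumes:
      # last entry: bind-mount the peer's data directory to a local
      # folder instead of a named volume, so ledgersData (including
      # the state LevelDB LOG) outlives the container and is easy to read
      - ./peer0.org1.example.com:/var/hyperledger/production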
Tomás Peixinho
After checking the logs (with some help from Dave Enyeart), I don't see any "db@close". I do see the "db@open", and the timestamps seem right.
18:49:21.592582 log@legend F·NumFile S·FileSize N·Entry C·BadEntry B·BadBlock Ke·KeyError D·DroppedEntry L·Level Q·SeqNum T·TimeElapsed
18:49:21.696795 db@open opening
18:49:21.698575 version@stat F·[] S·0B[] Sc·[]
18:49:21.702405 db@janitor F·2 G·0
18:49:21.702453 db@open done T·5.62227ms
Manish
I would have expected more in the LevelDB logs after opening, given that you get this crash after a while. Anyway, looking at the goleveldb code, the other possibility is that in the Fabric code, under some circumstances, we end up invoking this Release function from parallel goroutines. I'd suggest opening a Jira for this and attaching the Fabric logs and LevelDB logs.
Thanks,
Manish
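A double release from parallel goroutines is the kind of thing a sync.Once guard rules out; a minimal sketch of that defensive pattern (illustrative only, not the actual Fabric code):

package main

import (
    "fmt"
    "sync"
)

// guardedCloser wraps a release function so that concurrent callers can
// invoke Close safely; only the first call runs the underlying release.
type guardedCloser struct {
    once    sync.Once
    release func()
}

func (g *guardedCloser) Close() {
    g.once.Do(g.release)
}

func main() {
    it := &guardedCloser{release: func() { fmt.Println("released exactly once") }}

    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            it.Close() // safe even when called from parallel goroutines
        }()
    }
    wg.Wait()
}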