Re: Proposal : Hyperledger Fabric block archiving


Gari Singh <garis@...>
 

Hi Atsushi -

Thanks for sharing your efforts to date.

Overall, I like the idea of providing a utility for this, as we generally tell people that they can archive old blocks but don't provide any tools for doing so.

I do, however, have concerns about integrating any part of this functionality into the actual peer binary itself. I don't actually think you need to do that.
I think running a separate "archive client" without modifying the peer is the way to go. It keeps this functionality clean and separate from the peer and allows it to progress on its own.

It seems the only thing you really wanted to use the peer for was to propagate information to other peers within the same organization. My take is that you can more easily have the archive client write its status to a file in the "archiver repository". Other archiver clients within the same organization can then simply poll this file periodically for status. You can also use the same file repository to maintain some type of process lock file, so that only one archiver client actively performs the archival at a time.

-- G

-----------------------------------------
Gari Singh
Distinguished Engineer, CTO - IBM Blockchain
IBM Middleware
550 King St
Littleton, MA 01460
Cell: 978-846-7499
garis@...
-----------------------------------------

-----fabric@... wrote: -----
To: "Manish" <manish.sethi@...>
From: "Yacov"
Sent by: fabric@...
Date: 11/25/2019 04:21PM
Cc: nekia <atsushin@...>, "fabric@..." <fabric@...>
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Proposal : Hyperledger Fabric block archiving

Hey Atsushi,

One thing I noticed while skimming the code is that, while you send the ArchivedBlockFileMsg via gossip, you are not ensuring that it is eventually propagated to the peers successfully.

This means that if a peer didn't get the message, it won't archive your file.

I suggest that you think of a more robust mechanism, like periodically comparing digests of ranges.

The code in https://github.com/hyperledger-labs/fabric-block-archiving/blob/master/gossip/gossip/pull/pullstore.go is a generic pull mechanism based on digests. You might want to give it a look.
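To illustrate the idea of periodically comparing digests of ranges, here is a simplified anti-entropy sketch in Go. It is not the `pullstore.go` mechanism; `rangeDigests`, `diffRanges` and the fixed `rangeSize` are hypothetical. Each peer hashes the block numbers it has marked as archived, grouped into fixed-size ranges, and peers exchange only these digests. A peer that missed an ArchivedBlockFileMsg shows up as a mismatching range, which can then be pulled explicitly.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// rangeSize is a hypothetical granularity: one digest per 100 block numbers.
const rangeSize = 100

// rangeDigests groups the archived block numbers known to a peer into
// fixed-size ranges and hashes each range. Two peers that agree on the
// contents of a range produce the same digest for it.
func rangeDigests(archived []uint64) map[uint64][32]byte {
	sorted := append([]uint64(nil), archived...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	buckets := make(map[uint64][]byte)
	for i, n := range sorted {
		if i > 0 && n == sorted[i-1] {
			continue // ignore duplicates so they do not change the digest
		}
		var buf [8]byte
		binary.BigEndian.PutUint64(buf[:], n)
		start := n / rangeSize * rangeSize
		buckets[start] = append(buckets[start], buf[:]...)
	}
	out := make(map[uint64][32]byte, len(buckets))
	for start, b := range buckets {
		out[start] = sha256.Sum256(b)
	}
	return out
}

// diffRanges returns the start of every range whose digest differs or exists
// on only one side; only those ranges need to be pulled and reconciled.
func diffRanges(local, remote map[uint64][32]byte) []uint64 {
	var diff []uint64
	for start, d := range local {
		if rd, ok := remote[start]; !ok || rd != d {
			diff = append(diff, start)
		}
	}
	for start := range remote {
		if _, ok := local[start]; !ok {
			diff = append(diff, start)
		}
	}
	sort.Slice(diff, func(i, j int) bool { return diff[i] < diff[j] })
	return diff
}

func main() {
	peerA := []uint64{0, 1, 2, 150, 151}
	peerB := []uint64{0, 1, 2, 150} // missed the notification for block 151
	fmt.Println(diffRanges(rangeDigests(peerA), rangeDigests(peerB)))
}
```

Running this exchange periodically makes delivery eventually consistent: a lost message is repaired on a later round instead of being lost for good.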


- Yacov.



From: "Manish" <manish.sethi@...>
To: nekia <atsushin@...>
Cc: "fabric@..." <fabric@...>
Date: 11/25/2019 10:50 PM
Subject: [EXTERNAL] Re: [Hyperledger Fabric] Proposal : Hyperledger Fabric block archiving
Sent by: fabric@...



Hi Atsushi,

Thanks for your proposal. At a high level the objective makes sense to me, and below are my high-level observations that you may want to consider.

First, the fundamental assumption you make, that all the block files are the same across peers, is incorrect. The block files are not guaranteed to contain the same number of blocks across peers, because a block file is bounded by file size and not by the number of blocks. Further, the size of a given block may vary slightly from peer to peer. Although the header and data sections of a block are the same size across peers, the overall size can differ because of the metadata section, which contains the consenter signatures. In addition, on some peers the metadata may also include the block commit hash as additional data. So either you have to operate at the level of block numbers (i.e., when purging, an archiver client on a peer may have to deal with a file that should be only partially purged, depending on where in the file the target block is located), or, if you want to operate at the file level, the archiver client could simply consider only the files before the one that contains the target block.
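Because the file-to-block mapping is local to each peer, a purge decision has to be computed from each peer's own files. A minimal Go sketch, assuming a hypothetical `BlockFile` description (file index plus first and last block number) and an `archivedThrough` watermark meaning "every block up to this number is safely in the repository":

```go
package main

import "fmt"

// BlockFile is a hypothetical description of one local block file: the range
// of block numbers it holds. Files are bounded by size, so this layout is
// peer-specific and must be read from each peer's own files.
type BlockFile struct {
	Index      int
	FirstBlock uint64
	LastBlock  uint64
}

// purgePlan splits local files into those that can be deleted whole and, at
// most, one file that straddles archivedThrough and would need partial purging.
func purgePlan(files []BlockFile, archivedThrough uint64) (whole []int, partial int, hasPartial bool) {
	for _, f := range files {
		switch {
		case f.LastBlock <= archivedThrough:
			whole = append(whole, f.Index) // every block in the file is archived
		case f.FirstBlock <= archivedThrough:
			partial, hasPartial = f.Index, true // file straddles the watermark
		}
	}
	return whole, partial, hasPartial
}

func main() {
	// Two peers holding the same blocks 0..299 but with different file boundaries.
	peer1 := []BlockFile{{0, 0, 99}, {1, 100, 199}, {2, 200, 299}}
	peer2 := []BlockFile{{0, 0, 120}, {1, 121, 250}, {2, 251, 299}}
	for _, files := range [][]BlockFile{peer1, peer2} {
		whole, partial, ok := purgePlan(files, 199)
		fmt.Println(whole, partial, ok)
	}
}
```

With the watermark at block 199, `peer1` can delete files 0 and 1 outright, while `peer2` can delete only file 0: its file 1 straddles the watermark and must either be purged partially or kept until a later round.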

Second, there are certain kinds of queries for which a peer assumes that the block files are present in their current form. These primarily include history queries, block-related queries, and txid-related queries. These queries may start failing, or lead to crashes or unexpected results, if you simply delete the files. I did not see any details in your design on how you plan to handle these. Potential solutions range from simply denying these kinds of queries to more sophisticated approaches, such as serving the queries by involving the archiver repository. In either case, however, the challenge is knowing that the desired block/transaction has been purged from the local peer (e.g., consider blockByHash or transactionByTxid kinds of queries).

Third, and somewhat similar to the second point above, the peer has a feature that rebuilds the statedb and historydb if they are dropped and the peer is simply restarted. This feature also relies on the presence of the block files.

Fourth, I am not sure gossip is the right mechanism to employ for this communication. An archiver client could perhaps simply poll (or register for updates with) the archiver repository.

Finally, I would like to understand in more detail the benefits of having a separate repository. Why not simply leave the files on the anchor peer and purge them from the other peers? If the answer is compression, then ideally we should explore the option of writing the data in the block files in a compressed format.


Hope this helps.

Thanks,
Manish

On Thu, Nov 14, 2019 at 10:26 PM nekia <atsushin@...> wrote:
Hello everybody,

I’d like to propose a new feature, ‘block archiving’, for Hyperledger Fabric. We are working on this block archiving project, which is listed in the Hyperledger Labs repository. Our main current effort is focused on improving reliability. Feedback on the proposed feature from members involved in the Hyperledger Fabric implementation would be very useful for further improving the UX.

- Hyperledger Fabric Block Archiving
https://github.com/hyperledger-labs/fabric-block-archiving

This enhancement for Hyperledger Fabric aims to:

- Reduce the total amount of storage space an organisation requires to operate a Hyperledger Fabric network by archiving block data into a repository.
- Allow organisations to operate a Hyperledger Fabric network with low-resourced nodes, such as IoT edge devices.

- Our proposal
https://github.com/hyperledger-labs/hyperledger-labs.github.io/blob/master/labs/fabric-block-archiving.md

- Technical overview
https://github.com/nekia/fabric-block-archiving/blob/techoverview/BlockVault%20-%20Technical%20Overview.pdf


Kind regards,
Atsushi Neki
RocketChat: nekia

Atsushi Neki
Senior Software Development Engineer

Fujitsu Australia Software Technology Pty Ltd
14 Rodborough Road, Frenchs Forest NSW 2086, Australia
T +61 2 9452 9036 M +61 428 223 387
AtsushiN@...
fastware.com.au


