Re: [Hyperledger Project TSC] What's a Project? Code + Community


Vipin Bharathan
 

Hi all,
Notwithstanding Virgil, this can be done by tweaking the proposal, as Chris has already done by adding dependency. I would suggest a few changes to this text:
Other projects on which this project depends, if any, must be listed, and the majority of each such project's maintainers must sign off on the proposal before it is considered by the TSC.
Simplicity should be the aim; I have not found a better way to express the first phrase.
Regards
Vipin  

On Apr 4, 2017 8:56 AM, "Christopher Ferris via hyperledger-tsc" <hyperledger-tsc@lists.hyperledger.org> wrote:

Thanks, Dan. I tend to agree. Further, if there are disputes between projects that cannot be reconciled, they can present their case to the TSC. 

Chris

On Mon, Apr 3, 2017 at 11:32 PM, Middleton, Dan <dan.middleton@...> wrote:

Attempting to distill the thread here…

 

I read Brian and Chris as preferring a flat hierarchy albeit with some open discussion points (tags, etc.).

 

Dan O’Prey affirmed the distinctions they raised between technical oversight and marketing.

 

Hart raises a risk about a small project harming a large project.

That’s a valid risk, but a large project might be more capable of squelching a smaller project. This is, in fact, a risk that I am more concerned about.

 

Despite having initiated some of this hierarchy discussion, I find myself inclined to keep a flat project structure. Or at least I don’t think setting infrastructure projects as gatekeepers is likely to be beneficial.

 

Thanks,

Dan

 

From: hyperledger-tsc-bounces@lists.hyperledger.org [mailto:hyperledger-tsc-bounces@...] On Behalf Of Hart Montgomery via hyperledger-tsc
Sent: Monday, April 03, 2017 17:08
To: Dan O'Prey <dan@...>; Christopher Ferris <chris.ferris@...>; Brian Behlendorf (bbehlendorf@...rg) <bbehlendorf@...rg>
Cc: hyperledger-tsc@...ger.org


Subject: Re: [Hyperledger Project TSC] What's a Project? Code + Community

 

Hi Everyone,

 

Thanks for all the discussion on this.  I’ll wade back into the discussion with some of my thoughts on what people have proposed so far.

 

The block of text that seems to be causing the most controversy so far is the following:

 

"We probably would then want to require (in the proposal) approval from some subset/majority/<whatever> of the maintainers of the projects that the proposed project depends on, and we could additionally require some periodic communication between projects and their dependencies, which could be done without TSC involvement if the higher-level projects wanted. There can also be a mechanism with which maintainers of a higher-level project can deal with a sub-project that has gotten way out of line."

 

I included this because I worry that, if projects continue to proliferate, eventually someone will create a project that causes the wider community a lot of grief.  I’ll give a hypothetical example.  Suppose I were to create a Fortran SDK for Fabric.  It’s not really needed, and the Fabric team is not too keen on it, but I get it approved as a project anyways because it seems harmless enough and a way to get more people involved in Hyperledger.  Now additionally suppose that I’ve added some cool feature that maybe other SDKs don’t have that seems particularly appealing to a subset of developers.  I’ve also created a really slick website that makes my SDK seem to be the greatest thing in distributed systems since bitcoin.  Because of this my SDK becomes extremely popular.  However, I’ve been really sloppy and rushed in my coding, and there are numerous bugs and security issues that cause users of my Fortran SDK huge amounts of problems.  Developers that are discovering Hyperledger and Fabric for the first time try out my SDK with Fabric and have all kinds of problems, but don’t know enough about the system to pinpoint my code as the culprit.  Instead, it looks like Fabric is having issues, and the core Fabric team is absolutely deluged dealing with problems that are all my fault.

 

Next, of course the core Fabric team tries to reach out to me, but I have my head stuck… somewhere… and tell them it’s their problem, not mine, and to just make Fabric accommodate my sloppy code.  To make me go away or fix my code, they have to elevate this to the TSC, at which point, in order to moderate this dispute, the TSC members who aren’t Fabric experts will have to figure out what is going on for themselves, which will take a lot of time and effort that they probably don’t want to spend.  So, essentially a huge amount of community resources are wasted because of my rogue actions in this hypothetical instance, when it would be best for everyone if the Fabric core devs could put their feet down and force me in line or even stop the project in the first place.

 

I know this isn’t the most realistic example (and some disputes might have to go to the TSC regardless of implementing some sort of project hierarchy), but I do worry that something like this will happen on a smaller scale if we treat all projects equally at this point in time.  It seems to me that, for instance, it should be the Fabric core devs setting the SDK spec (as is happening currently as Chris mentioned in his response) rather than the SDK teams getting to decide (although they could certainly provide suggestions).  The dependency graph would fully indicate this.  Additionally, to continue the example, I think it’d be a good thing if SDK teams checked in with core projects every so often without having to go through the TSC.  This is obviously currently the state of things due to community overlap, and the policy here could be determined by the individual projects, but we may want to consider stating this formally somewhere.

 

To sum up my rambling thus far, I guess I am not convinced that the flat Apache project structure makes sense here.  When smaller dependent projects can potentially ruin a user’s experience with a distributed ledger, and a neophyte user might not be able to figure out who is to blame, I’d like to give the core distributed ledgers a built-in edge in any disputes with SDKs or other dependent projects so that these sorts of problems can be resolved quickly and efficiently.  We could do this with a graph instead of rigid tiers for project flexibility.   But many people around here have been around open source software for far longer than I have, and I might be much too paranoid about something like this happening, so I am open to being persuaded that I’m wrong about this.

 

In another direction, as for tags and a dependency graph:  I really think that these will evolve into almost the same thing.  One of the reasons I thought that the dependency graph would be nice is that it would be a convenient way to show modularity—if a higher-level project worked with multiple ledgers, for instance, like Composer might in the future, then this dependency graph would be a very nice reference for putting “modules” together to form a complete blockchain package.  We could use tags in this way, and I think if we implemented tags now we’d almost immediately want to use tags like [Fabric] or [STL] to indicate which higher-level projects worked with which distributed ledgers.  In the long run, I’d imagine the tag system would eventually just look like edge lists from the dependency graph, with perhaps some extra identifying tags.  So as long as people are willing to use tags for modularity features, then I think the tags/dependency graph debate essentially comes down to graph representation, on which I don’t have a strong opinion.  Whether we use these for governance issues or not is another matter (which I hopefully addressed adequately above).
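To illustrate the point about tags converging on edge lists, here is a minimal Python sketch. The project names come from this thread, but the specific edges (including Composer working with STL) and the helper names `tags_from_edges` and `edges_from_tags` are hypothetical, chosen only for illustration; this is not an official Hyperledger dependency list or tool.

```python
from collections import defaultdict

# Hypothetical dependency edges: (dependent project, project it depends on).
# The edges are illustrative assumptions, not an official dependency list.
DEPENDENCY_EDGES = [
    ("Fabric Java SDK", "Fabric"),
    ("Fabric Go SDK", "Fabric"),
    ("Fabric Python SDK", "Fabric"),
    ("Composer", "Fabric"),
    ("Composer", "STL"),  # assumed future multi-ledger support
]


def tags_from_edges(edges):
    """Derive a sorted tag list for each dependent project from the edge list."""
    tags = defaultdict(set)
    for dependent, dependency in edges:
        tags[dependent].add(dependency)
    return {project: sorted(deps) for project, deps in tags.items()}


def edges_from_tags(tags):
    """Recover a sorted edge list from per-project tags."""
    return sorted((project, dep) for project, deps in tags.items() for dep in deps)


if __name__ == "__main__":
    # Print each dependent project followed by its ledger tags, e.g. [Fabric].
    for project, deps in sorted(tags_from_edges(DEPENDENCY_EDGES).items()):
        print(project, " ".join(f"[{d}]" for d in deps))
```

Because each representation can be rebuilt from the other, choosing between tags and a graph in this sketch is only a choice of representation, which is the argument made above.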

 

This probably reads like “Arma virumque cano, Troiae qui primus ab oris” to most people making it this far, so sorry for the long email and thanks for reading it.  Feel free to let me know if you think I’ve made a mistake in my reasoning, or if you have questions, comments, or requests for clarifications.

 

Thanks,

Hart

                                                             

From: hyperledger-tsc-bounces@lists.hyperledger.org [mailto:hyperledger-tsc-bounces@...] On Behalf Of Dan O'Prey via hyperledger-tsc
Sent: Monday, April 03, 2017 6:48 AM
To: Christopher Ferris <chris.ferris@...>
Cc: hyperledger-tsc <hyperledger-tsc@...dger.org>
Subject: Re: [Hyperledger Project TSC] What's a Project? Code + Community

 

Quick marketing POV:

 

Agree with Chris that how the projects are organized and governed is a separate concern from how we promote them, as long as there is reasonable linkage.

 

I think Brian's tagging proposal helps solve the marketing issue as we need to ensure people landing on our site have a way to quickly filter different project types to get what they need. It also reinforces the Apache-style umbrella organization. So agree with the concept but also agree it'll be hard to have sufficient granularity in the taxonomy to be effective. If these can be refined over time I'm strongly supportive.



Dan O'Prey

Chief Marketing Officer
c: +1 646 468 0213
e: dan@...

Digital Asset Holdings, LLC


96 Spring Street, 8th Floor
New York, NY 10012
digitalasset.com

 

On Mon, Apr 3, 2017 at 9:23 AM, Christopher Ferris via hyperledger-tsc <hyperledger-tsc@...dger.org> wrote:

Comments inlined with <cbf></cbf>

 

On Mon, Apr 3, 2017 at 3:18 AM, Brian Behlendorf via hyperledger-tsc <hyperledger-tsc@...dger.org> wrote:

Chris wrote:

All, I've updated the wiki for the Project Proposal Template to add 'Dependent Projects' heading and description. I think that this covers Hart's proposal, that we all seemed to like. I also think that what we need is to add those dependency relationships to the list of projects, which I have updated as well.

He's referencing https://wiki.hyperledger.org/community/proposal-template-for-a-hyperledger-improvement-project-hip

But Hart's proposal went a fair bit deeper than just this change, and went along with others proposing a tiered project system, with sub-projects reporting into other projects, which is a pretty big governance change that I don't think we've fully wrestled with.  If I understood Hart's proposition right, we would be asking the Fabric maintainers to perform a governance/oversight role over the SDK projects, rather than the board, which runs the risk of detachment.

<cbf>Fair enough, but the TSC isn't currently directly involved in projects other than to move projects along the project lifecycle, mediate disputes between projects, and to provide an overall governance framework within which they operate. Quoting from Hart's proposal:

"We probably would then want to require (in the proposal) approval from some subset/majority/<whatever> of the maintainers of the projects that the proposed project depends on, and we could additionally require some periodic communication between projects and their dependencies, which could be done without TSC involvement if the higher-level projects wanted. There can also be a mechanism with which maintainers of a higher-level project can deal with a sub-project that has gotten way out of line."

This is the essence upon which I based my revisions.</cbf> 

I am starting this new thread because I want us to get to a deeper consensus first about what a "project" really is at Hyperledger.  I have a bias coming from Apache; Apache has no such concept as a "sub-project".  Each project has equal standing when it comes to the way they're

<cbf>Right, and it seemed to me that Hart was making a point that even a project that isn't a "sub-project" may have (or place) certain dependencies on another(s) and that we wanted to capture that and provide that such dependencies, by being explicit, would naturally mean that those projects needed to respect those interdependencies and collaborate as appropriate. He even said that on review of a project proposal with dependency, the depended project could place some required constraints, but that that was something between those projects, not a function of the TSC.</cbf>  

organized, supported and managed. There was even a formative experience with two "umbrella projects" within the ASF - Jakarta and XML - where it was deemed that having intermediate PMCs (Project Management Committees, like our maintainers per project) between these smaller teams and the Apache Board (who performs the TSC-like role) resulted in not enough oversight and

<cbf>I made a very cursory pass of the ASF projects and immediately stumbled on the Ant Committee https://projects.apache.org/committee.html?ant which is described as a "Top Level" project, and each of the "sub-projects" is managed by that committee, yet each is listed on the projects page.</cbf> 

guidance. Not every Apache project gets equal marketing or drives equal excitement, simply because some are more popular and prolific than others.  But when they existed, sub-projects felt like second-class citizens of the ecosystem, less recognized or respected and less expected to follow the policies of the community.  [The one place this is different is that Apache's Incubator is set up as a "project", with PMCs for each podling, for bootstrapping.]

The issues we have with a flat hierarchy are two:

* that it could lead to a messy "projects" web presence if we listed tens or hundreds of separate projects, many of which have duplicated names.

* that the bandwidth through the TSC is limited, as is its ability to evaluate whether a smaller "sub-"project has merit and to properly provide oversight of those projects.

<cbf>This gets to the heart of Hart's proposal - that depended projects review the proposal and come to consensus as to its merit before the proposal is reviewed by the TSC, and this is reflected in my edit.</cbf> 

I agree with both of these concerns.  Let me lay out a way to think about this, and then let's return to those concerns at the end.

I'd like to propose the following definition:

A "project" at Hyperledger is a collection of software, documentation, and software development assets (email archives, bug database slice, chat channels, etc), paired with a specific team of named developer-maintainers working together to constantly improve and maintain that collection.  These maintainers are jointly responsible for every line of code within the project, but their work is open to view and collaboration with anyone.  These maintainers define their own release schedule and roadmap independent of other projects, but are strongly encouraged to look for opportunities to reduce duplication of effort with other projects. Maintainers are responsible for growing their community (including adding new maintainers over time), responding to security notices, making regular releases, updating their project-specific landing pages on the Hyperledger web site and wiki, and reporting monthly on project activity to the TSC.

This definition should help us think about the Fabric SDK situation - are the SDK teams clearly separate projects from the ones they are dependent upon?  If they have independent release streams, independent maintainers (even if lots of overlap), and accept the responsibility to report upstream to the TSC once a month on their own, then it makes sense.  It probably also makes

<cbf>We've currently got three independent Fabric SDK projects: Java, Go, and Python. Each has its own set of maintainers, and each manages its own release schedule. There is coordination between the projects (e.g. there is strong desire to have the Java SDK delivered coincident with the Fabric 1.0 release, and the two are working towards that desired outcome). There is a specification that all of the SDK projects, including the Node.js one that is considered part of Fabric, co-developed (excepting the Go project which didn't exist at the time) and by which each has agreed to conform. This status quo seems like it's working.</cbf> 

sense to keep them separate from each other.  It might be nice for them to do releases that closely sync with Fabric releases, but the freedom not to be required to do that might be important too.  But to deal with the question of granularity, let's get more specific about sizing and add this to the definition:

Scope and size: The declared scope and desired functionality of a project should be aimed at a development effort that requires more than 2 named maintainers for the first few years.  But it should not be so big as to make it difficult for that project's maintainers to be collectively responsible for every line of code in the project.

<cbf>This starts to get into organizational dynamics such as Docker, K8s and others have, where there are sub-projects within a project for which a specific set of maintainers are responsible. Are these independent projects? No. It is merely a means of organizing maintainers such that the sphere of responsibility is reasonable for a human being to be capable of handling. There may also be aspects of the project that require different expertise, and it may make sense to apportion maintainers along those lines. In the end, this is likely something that larger projects will sort out for themselves as the pressures on the maintainers grow. I don't think that there's a one-size-fits-all solution to this that we should be imposing.</cbf> 

We need to ensure diverse code coverage - no "oh, that's code only Bob understands, don't touch it", because Bob may move on without warning, or not be available when a security notice comes in regarding his code.  A project with only 1 or 2 maintainers may struggle to avoid that fate, and if it did then the TSC would have to either recruit more people to a project it doesn't understand well, or close the project, both of which are difficult.  If an SDK is really just a tiny shim that is an afternoon's worth of work, it doesn't deserve to be its own project.  This means the Fabric Go SDK community should be large enough to be interesting to more than a few developers, to merit the organizational overhead that being a separate project brings. 

<cbf>The SDKs are far from an afternoon's work. I am sure that Troy and team put lots of effort, and there were four maintainers assigned, initially.</cbf> 

At the other extreme, a project that just combined all the SDKs into a single Fabric SDK project might also flounder, because there would be too much variety within the project for the maintainers to collectively take ownership of every line of code.  Sometimes a project will grow in size and developer community such that spinning out a new independent project makes sense.

<cbf>Each SDK is written in a different language, so requires a different skill set. Certainly there are some who are polyglot, but requiring maintainers to be adept in and responsible for SDKs written in languages in which they are not adept is asking for trouble.</cbf> 

But because we don't want hundreds of confusingly named projects in a big long list, one way to

<cbf>https://projects.apache.org/projects.html lists ~344 and at least a few share commonality in name. There is structure, but it isn't obvious.</cbf> 

solve that is with a taxonomy.  Right now we use "Frameworks" and "Modules" on the web site, but even that feels insufficient.  So let's add this to the definition:

Taxonomy: Projects are granted one or more of the following tags by the TSC at the time of the project creation: "distributed ledger", "smart contract engine", "client library", "graphical interface", "utility library", "sample applications" or "other".  These tags can be adjusted over time by the TSC.

<cbf>This seems a premature optimization. What does this give us? Is a project that is a "distributed ledger" but also has smart contract capabilities tagged with both? Where would Burrow, STL, Iroha or Fabric fall in this taxonomy?</cbf> 

Let's go with that list for now; we can obviously set it later and adjust it over time.  The specific tags are less important than the concept.

This tagging would then drive a rendering on the main Projects landing page on the web site, on the Wiki, and everywhere else that we present a list of projects.  The Fabric Go SDK and the other SDKs would be "utility libraries", promoted somewhere below most of the other categories, where a longer list of a lot of repeated words ("Fabric Go SDK" "Fabric Java SDK" etc...) is less problematic than up top or on a global alphabetical "Projects" page.  Some projects could map to multiple tags.  We would highlight most prominently, at least for now, those projects that are "distributed ledger" tagged, because they are foundational - you can use a DLT without a smart contract engine, but likely can't do the opposite.  But at some point we would shift into presenting projects based on other data, like developer and release activity.  This might also steer us towards a common architectural view of the different efforts underway at Hyperledger - for example, could "Chaincode" be a separate project, one that doesn't have to be a "sub" to any other as it could be used elsewhere, but one which was incorporated into "Fabric" releases and disk images?  Maybe this gets us further down the Global Sync Log vision that Tamas had shared.

With this, do we then need "dependent projects" and a tiered hierarchy?  Let's return to the concerns:

<cbf>Again, I don't recall us concluding that we were having a tiered hierarchy, but rather a graph.</cbf> 

* Too many projects listed; hopefully with a structured Projects main page, we can manage this complexity more effectively.  I will add that Hyperledger staff recognize the need to significantly upgrade our current landing and marketing pages around each project, and will be coming to you for help with this.

* Bandwidth on the TSC.  I don't think we're going to see dozens of project submissions each week.  Right now the pace of 1 every 2 weeks or so seems likely to continue for a while, though.  I think this works if the TSC gives projects more of the benefit of the doubt on the technical review, especially if the submission comes with the endorsement of another project (e.g., the Fabric developers endorsing the Fabric Go SDK project) but holds firm on the need to see a real maintainer community, one that will make good on its responsibilities as per the above.

The second place where bandwidth will be an issue is reviewing updates from each project to the TSC, especially if we wanted to adopt Apache's practice of voting to accept the reports during our calls.  This cannot be avoided at some level; ultimately the TSC is responsible for everything built at Hyperledger, so rate limiting the project creation step is to be expected.  One way Apache (whose board has to approve approximately 30 project reports on its monthly calls) handles this is for the projects to submit their updates ahead of time; then the TSC members can discuss via email if anything seems amiss, and during the calls only bring up exceptions that require deeper discussion.

This email is already War and Peace so I'll end it here.  The basic proposition is that with a clear definition for what we mean by "project", including a sense of appropriate size/scope and a tagging taxonomy for rendering the list in a structured way, we can avoid the issues cropping up as we see more and more projects under the umbrella.  If you agree, I'll try and work this into a

 

<cbf>I'm not sure that I do, yet. Again, I think there are multiple problems you're addressing here. One is: "how does Hyperledger market and rationalize its collection of projects" and the other is: "how do we (the TSC and the various projects) manage interdependencies". I think that the changes I made to the governance docs address the latter. I am unclear that there is anything else that needs to be changed to accommodate Hart's proposal, but I am open to discussion if others disagree.</cbf> 

 

set of wiki pages that reflect this view, and then we'll get started on making changes to the web pages appropriately.

Brian

 

-- 
Brian Behlendorf
Executive Director, Hyperledger
bbehlendorf@...g
Twitter: @brianbehlendorf


_______________________________________________
hyperledger-tsc mailing list
hyperledger-tsc@...ger.org
https://lists.hyperledger.org/mailman/listinfo/hyperledger-tsc

 

