Re: [Hyperledger Project TSC] What's a Project? Code + Community


Dan O'Prey <dan@...>
 

Quick marketing POV:

Agree with Chris that how the projects are organized and governed is a separate concern from how we promote them, as long as there is reasonable linkage.

I think Brian's tagging proposal helps solve the marketing issue as we need to ensure people landing on our site have a way to quickly filter different project types to get what they need. It also reinforces the Apache-style umbrella organization. So agree with the concept but also agree it'll be hard to have sufficient granularity in the taxonomy to be effective. If these can be refined over time I'm strongly supportive.


Dan O'Prey

Chief Marketing Officer
c: +1 646 468 0213
e: dan@... 
Digital Asset Holdings, LLC
96 Spring Street, 8th Floor
New York, NY 10012
digitalasset.com

On Mon, Apr 3, 2017 at 9:23 AM, Christopher Ferris via hyperledger-tsc <hyperledger-tsc@...> wrote:
Comments inlined with <cbf></cbf>

On Mon, Apr 3, 2017 at 3:18 AM, Brian Behlendorf via hyperledger-tsc <hyperledger-tsc@lists.hyperledger.org> wrote:

Chris wrote:

All, I've updated the wiki for the Project Proposal Template to add 'Dependent Projects' heading and description. I think that this covers Hart's proposal, that we all seemed to like. I also think that what we need is to add those dependency relationships to the list of projects, which I have updated as well.

He's referencing https://wiki.hyperledger.org/community/proposal-template-for-a-hyperledger-improvement-project-hip

But Hart's proposal went a fair bit deeper than just this change, and went along with others proposing a tiered project system, with sub-projects reporting into other projects, which is a pretty big governance change that I don't think we've fully wrestled with.  If I understood Hart's proposition right, we would be asking the Fabric maintainers to perform a governance/oversight role over the SDK projects, rather than the board, which runs the risk of detachment.

<cbf>Fair enough, but the TSC isn't currently directly involved in projects other than to move projects along the project lifecycle, mediate disputes between projects, and to provide an overall governance framework within which they operate. Quoting from Hart's proposal:
"We probably would then want to require (in the proposal) approval from some subset/majority/<whatever> of the maintainers of the projects that the proposed project depends on, and we could additionally require some periodic communication between projects and their dependencies, which could be done without TSC involvement if the higher-level projects wanted. There can also be a mechanism with which maintainers of a higher-level project can deal with a sub-project that has gotten way out of line."
This is the essence upon which I based my revisions.</cbf> 

I am starting this new thread because I want us to get to a deeper consensus first about what a "project" really is at Hyperledger.  I have a bias coming from Apache: Apache has no such concept as a "sub-project".  Each project has equal standing when it comes to the way they're

<cbf>Right, and it seemed to me that Hart was making a point that even a project that isn't a "sub-project" may have (or place) certain dependencies on another(s) and that we wanted to capture that and provide that such dependencies, by being explicit, would naturally mean that those projects needed to respect those interdependencies and collaborate as appropriate. He even said that on review of a project proposal with dependency, the depended project could place some required constraints, but that that was something between those projects, not a function of the TSC.</cbf>  

organized, supported and managed. There was even a formative experience with two "umbrella projects" within the ASF - Jakarta and XML - where it was deemed that having intermediate PMCs (Project Management Committees, like our maintainers per project) between these smaller teams and the Apache Board (who performs the TSC-like role) resulted in not enough oversight and

<cbf>I made a very cursory pass of the ASF projects and immediately stumbled on the Ant Committee https://projects.apache.org/committee.html?ant which is described as a "Top Level" project, and each of the "sub-projects" is managed by that committee, yet each is listed on the projects page.</cbf>

guidance. Not every Apache project gets equal marketing or drives equal excitement, simply because some are more popular and prolific than others.  But when they existed, sub-projects felt like second-class citizens of the ecosystem, less recognized or respected and less expected to follow the policies of the community.  [The one place this is different is that Apache's Incubator is set up as a "project", with PMCs for each podling, for bootstrapping.]

The issues we have with a flat hierarchy are two:

* that it could lead to a messy "projects" web presence if we listed tens or hundreds of separate projects, many of which have duplicated names.

* that the bandwidth through the TSC is limited, as is its ability to evaluate whether a smaller "sub-"project has merit and to properly provide oversight of those projects.

<cbf>This gets to the heart of Hart's proposal - that depended projects review the proposal and come to consensus as to its merit before the proposal is reviewed by the TSC, and this is reflected in my edit.</cbf> 

I agree with both of these concerns.  Let me lay out a way to think about this, and then let's return to those concerns at the end.

I'd like to propose the following definition:

A "project" at Hyperledger is a collection of software, documentation, and software development assets (email archives, bug database slice, chat channels, etc), paired with a specific team of named developer-maintainers working together to constantly improve and maintain that collection.  The maintainers are jointly responsible for every line of code within the project, but their work is open to view and collaboration with anyone.  These maintainers define their own release schedule and roadmap independent of other projects, but are strongly encouraged to look for opportunities to reduce duplication of effort with other projects. Maintainers are responsible for growing their community (including adding new maintainers over time), responding to security notices, making regular releases, updating their project-specific landing pages on the Hyperledger web site and wiki, and reporting monthly on project activity to the TSC.

This definition should help us think about the Fabric SDK situation - are the SDK teams clearly separate projects from the ones they are dependent upon?  If they have independent release streams, independent maintainers (even if lots of overlap), and accept the responsibility to report upstream to the TSC once a month on their own, then it makes sense.  It probably also makes

<cbf>We've currently got three independent Fabric SDK projects: Java, Go, and Python. Each has its own set of maintainers, and each manages its own release schedule. There is coordination between the projects (e.g. there is a strong desire to have the Java SDK delivered coincident with the Fabric 1.0 release, and the two are working towards that desired outcome). There is a specification that all of the SDK projects, including the Node.js one that is considered part of Fabric, co-developed (excepting the Go project which didn't exist at the time) and to which each has agreed to conform. This status quo seems like it's working.</cbf>

sense to keep them separate from each other.  It might be nice for them to do releases that closely sync with Fabric releases, but the freedom not to be required to do that might be important too.  But to deal with the question of granularity, let's get more specific about sizing and add this to the definition:

Scope and size: The declared scope and desired functionality of a project should be aimed at a development effort that requires more than 2 named maintainers for the first few years.  But it should not be so big as to make it difficult for that project's maintainers to be collectively responsible for every line of code in the project.

<cbf>This starts to get into organizational dynamics such as Docker, K8s and others have, where there are sub-projects within a project for which a specific set of maintainers are responsible. Are these independent projects? No. It is merely a means of organizing maintainers such that the sphere of responsibility is reasonable for a human being to be capable of handling. There may also be aspects of the project that require different expertise, and it may make sense to apportion maintainers along those lines. In the end, this is likely something that larger projects will sort out for themselves as the pressures on the maintainers grow. I don't think that there's a one-size-fits-all solution to this that we should be imposing.</cbf>

We need to ensure diverse code coverage - no "oh, that's code only Bob understands, don't touch it", because Bob may move on without warning, or not be available when a security notice comes in regarding his code.  A project with only 1 or 2 maintainers may struggle to avoid that fate, and if it did then the TSC would have to either recruit more people to a project it doesn't understand well, or close the project, both of which are difficult.  If an SDK is really just a tiny shim that is an afternoon's worth of work, it doesn't deserve to be its own project.  This means the Fabric Go SDK community should be large enough to be interesting to more than a few developers, to merit the organizational overhead that being a separate project brings. 

<cbf>The SDKs are far from an afternoon's work. I am sure that Troy and team put lots of effort, and there were four maintainers assigned, initially.</cbf> 

At the other extreme, a project that just combined all the SDKs into a single Fabric SDK project might also flounder, because there would be too much variety within the project for the maintainers to collectively take ownership of every line of code.  Sometimes a project will grow in size and developer community such that spinning out a new independent project makes sense.

<cbf>Each SDK is a different language, so requires a different skill set. Certainly there are some who are polyglot, but requiring maintainers to be adept in and responsible for SDKs written in languages in which they are not adept is asking for trouble.</cbf>

But because we don't want hundreds of confusingly named projects in a big long list, one way to

<cbf>https://projects.apache.org/projects.html lists ~344 and at least a few share commonality in name. There is structure, but it isn't obvious.</cbf> 

solve that is with a taxonomy.  Right now we use "Frameworks" and "Modules" on the web site, but even that feels insufficient.  So let's add this to the definition:

Taxonomy: Projects are granted one or more of the following tags by the TSC at the time of the project creation: "distributed ledger", "smart contract engine", "client library", "graphical interface", "utility library", "sample applications" or "other".  These tags can be adjusted over time by the TSC.

<cbf>This seems a premature optimization. What does this give us? Is a project that is a "distributed ledger" but also has smart contract capabilities tagged with both? Where would Burrow, STL, Iroha or Fabric fall in this taxonomy?</cbf> 

Let's go with that list for now; we can obviously set it later and adjust it over time.  The specific tags are less important than the concept.

This tagging would then drive a rendering on the main Projects landing page on the web site, on the Wiki, and everywhere else that we present a list of projects.  The Fabric Go SDK and the other SDKs would be "utility libraries", promoted somewhere below most of the other categories, where a longer list of a lot of repeated words ("Fabric Go SDK" "Fabric Java SDK" etc...) is less problematic than up top or on a global alphabetical "Projects" page.  Some projects could map to multiple tags.  We would highlight most prominently, at least for now, those projects that are "distributed ledger" tagged, because they are foundational - you can use a DLT without a smart contract engine, but likely can't do the opposite.  But at some point we would shift into presenting projects based on other data, like developer and release activity.  This might also steer us towards a common architectural view of the different efforts underway at Hyperledger - for example, could "Chaincode" be a separate project, one that doesn't have to be a "sub" to any other as it could be used elsewhere, but one which was incorporated into "Fabric" releases and disk images?  Maybe this gets us further down the Global Sync Log vision that Tamas had shared.
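
As a rough illustration of the tag-driven rendering described above (a minimal sketch, not an agreed design): the tag list is the one from the proposed taxonomy, while the specific tag assignments, the TAG_ORDER and PROJECT_TAGS names, and the group_by_tag helper are assumptions made up for this example.

```python
from collections import defaultdict

# Display order for the landing page; "distributed ledger" comes first,
# as suggested in the proposal. The ordering of the rest is an assumption.
TAG_ORDER = [
    "distributed ledger",
    "smart contract engine",
    "client library",
    "graphical interface",
    "utility library",
    "sample applications",
    "other",
]

# Hypothetical tag assignments, for illustration only.
# In the proposal, the TSC would grant and adjust the real ones.
PROJECT_TAGS = {
    "Fabric": ["distributed ledger", "smart contract engine"],
    "Fabric Go SDK": ["utility library"],
    "Fabric Java SDK": ["utility library"],
}


def group_by_tag(project_tags):
    """Return (tag, sorted project names) pairs in TAG_ORDER, skipping unused tags."""
    groups = defaultdict(list)
    for name, tags in project_tags.items():
        for tag in tags:
            if tag not in TAG_ORDER:
                raise ValueError(f"unknown tag {tag!r} on project {name!r}")
            groups[tag].append(name)
    # A project with several tags is listed once under each of them.
    return [(tag, sorted(groups[tag])) for tag in TAG_ORDER if tag in groups]


for tag, names in group_by_tag(PROJECT_TAGS):
    print(f"{tag}: {', '.join(names)}")
```

With a structure like this, the repeated "Fabric ... SDK" names end up together in one lower section rather than scattered through a global alphabetical list.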

With this, do we then need "dependent projects" and a tiered hierarchy?  Let's return to the concerns:

<cbf>Again, I don't recall us concluding that we were having a tiered hierarchy, but rather a graph.</cbf> 

* Too many projects listed; hopefully with a structured Projects main page, we can manage this complexity more effectively.  I will add that Hyperledger staff recognize the need to significantly upgrade our current landing and marketing pages around each project, and will be coming to you for help with this.

* Bandwidth on the TSC.  I don't think we're going to see dozens of project submissions each week.  Right now the pace of 1 every 2 weeks or so seems likely to continue for a while, though.  I think this works if the TSC gives projects more of the benefit of the doubt on the technical review, especially if the submission comes with the endorsement of another project (e.g., the Fabric developers endorsing the Fabric Go SDK project), while holding firm on the need to see a real maintainer community, one that will make good on its responsibilities as per the above.

The second place where bandwidth will be an issue is reviewing updates from each project to the TSC, especially if we wanted to adopt Apache's practice of voting to accept the reports during our calls.  This cannot be avoided at some level; ultimately the TSC is responsible for everything built at Hyperledger, so rate limiting the project creation step is to be expected.  One way Apache (whose board has to approve approximately 30 project reports on its monthly calls) handles this is for the projects to submit their updates ahead of time, then the TSC members can discuss via email if anything seems amiss, and during the calls only bring up exceptions that require deeper discussion.

This email is already War and Peace so I'll end it here.  The basic proposition is that with a clear definition for what we mean by "project", including a sense of appropriate size/scope and a tagging taxonomy for rendering the list in a structured way, we can avoid the issues cropping up as we see more and more projects under the umbrella.  If you agree, I'll try and work this into a

<cbf>I'm not sure that I do, yet. Again, I think there are multiple problems you're addressing here. One is: "how does Hyperledger market and rationalize its collection of projects" and the other is: "how do we (the TSC and the various projects) manage interdependencies". I think that the changes I made to the governance docs address the latter. I am unclear that there is anything else that needs to be changed to accommodate Hart's proposal, but I am open to discussion if others disagree.</cbf>

set of wiki pages that reflect this view, and then we'll get started on making changes to the web pages appropriately.

Brian


-- 
Brian Behlendorf
Executive Director, Hyperledger
bbehlendorf@...g
Twitter: @brianbehlendorf

_______________________________________________
hyperledger-tsc mailing list
hyperledger-tsc@...ger.org
https://lists.hyperledger.org/mailman/listinfo/hyperledger-tsc





