Comments (5)

dankohn commented on July 26, 2024

Thanks for the detailed comments, though I unsurprisingly disagree. The velocity metrics are attempting to measure project velocity; that is, how fast a project is moving. They are not trying to measure goodness, or importance, or conciseness of code, or any one of a million other factors you might want to measure. But many developers strongly prefer to use projects that many other people are using and contributing to, and velocity is a decent measure of that.

Now, it's fine to consider whether a project like left-pad might score highly in some metric of importance because it is referenced by so many other libraries. But this effort never claimed to measure importance, just velocity, and we should be able to agree that left-pad has a very small number of issues, commits, and authors.
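For concreteness, here is a minimal, hypothetical sketch (Python, invented numbers) of a purely count-based velocity score of the kind described above. It is not the velocity project's actual scoring code; it just illustrates why a small, finished library like left-pad necessarily scores near the bottom on such a metric.

```python
# Hypothetical sketch, not the velocity project's real formula:
# a naive "velocity" score built only from activity counts.
from dataclasses import dataclass

@dataclass
class ProjectActivity:
    name: str
    commits: int      # commits over the measurement window
    authors: int      # distinct contributors
    issues_prs: int   # issues + pull requests opened

def naive_velocity(p: ProjectActivity) -> int:
    # Purely a count-based measure of "how fast the project is moving";
    # it says nothing about quality, importance, or adoption.
    return p.commits + p.authors + p.issues_prs

# Illustrative numbers only, not real data:
kubernetes = ProjectActivity("kubernetes", commits=30000, authors=2000, issues_prs=25000)
left_pad = ProjectActivity("left-pad", commits=30, authors=5, issues_prs=20)

for p in (kubernetes, left_pad):
    print(p.name, naive_velocity(p))
# A small, "finished" library scores near zero by construction,
# regardless of how widely it is used.
```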

Note that although Asay cites me as his source, I had nothing to do with his article and do not endorse it. However, I do very much stand behind the article I wrote.

joepie91 commented on July 26, 2024

If the article were just about 'velocity' (insofar as that is a quantifiable metric), then this wouldn't be a problem. The problem is caused by statements like this:

What differentiates the most successful open source projects? One commonality is that most of them are backed by either one company or a group of companies collaborating together

So, tracking the projects with the highest developer velocity can help illuminate promising areas in which to get involved, and what are likely to be the successful platforms over the next several years.

So, what’s the takeaway? Software development is hard. Running a large open source project is even harder. So, it is often helpful to have backing from an individual company or a consortium of them working through a software foundation.

The structure and funding from a foundation or corporate sponsor provides more confidence that the project will remain active and stable over the long term.

All of these are nonsensical conclusions, based on the data you have. "Success" is not reasonably measured in contributions; it is measured in stability, adoption, level of support, and many other things that do not directly correlate to the absolute number of contributors or issues/PRs (as I've already explained above).

The claim that "it's helpful to have backing from [a company to run an open-source project]" is also completely unfounded; in no way does the data prove that statement to be the case. The only thing you've proven with your data is that large monolithic projects with lots of commits/authors/issues are often corporately-backed; which, aside from being obvious for the reasons I've already described, simply doesn't translate to any of the other claims you're making.

Stability is also something that doesn't come from a high amount of contributions; quite the opposite, it's something that comes from feature-completeness and relatively few contributions, i.e. the exact opposite of what monolithic projects turn into. The ideal project doesn't need to remain 'active' for more than a few months because it is done.

This is like saying "the sky is blue, you can just look up to see that, therefore blue is the most important color" -- all you've proven is that the sky is blue (or at least appears so), but the second part of the statement ("blue is the most important color") is just thrown in there without any backing, even if it seems superficially related. It's the same problem here; your data does not support your conclusions.

And to address one particular point separately:

But many developers strongly prefer to use projects that many other people are using and contributing to, and velocity is a decent measure of that.

Not only is this potentially wrong (How many is "many"? Is it statistically significant?), it's also often a misguided approach on the part of the developers who do prefer monolithic "high-velocity" projects; often they have given no consideration to how well particular features are supported, for example, and end up replacing tools or dependencies down the line.

It also has no relevance to real-world concerns like operational/development cost, ease of development, maintainability, security, and so on.

In short: you need to either fix the methodology to support the conclusions you're drawing, or fix the conclusions you're drawing to reflect the data from your current methodology. But as it stands, the conclusions here are nonsensical, and this sounds more like a marketing fluff piece than like legitimate research.

EDIT:

Thanks for the detailed comments, though I unsurprisingly disagree.

That's a very worrying comment. In a research context, one should always be willing to have their research critically assessed; your statement, however, implies that you've already made up your mind and that no amount of criticism is going to change it. That's not how good research is produced.

namliz commented on July 26, 2024

@joepie91 let me congratulate you on the counterexample you provided in your original post. This is called Simpson's paradox, and you're entirely not wrong in pointing it out.
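To make the aggregation effect concrete, here is a toy numeric illustration (invented numbers, Python) of how a per-project total can reverse a per-feature comparison:

```python
# Toy illustration (made-up numbers) of the aggregation effect discussed above:
# compared per feature, the modular ecosystem attracts more contributors,
# but compared per *project*, the monolithic repository dominates.

monolith = {"features": 10, "contributors_per_feature": 5}
modular_packages = [{"features": 1, "contributors_per_feature": 8} for _ in range(10)]

monolith_total = monolith["features"] * monolith["contributors_per_feature"]            # 50
package_totals = [p["features"] * p["contributors_per_feature"] for p in modular_packages]  # ten 8s

print("per-feature:", monolith["contributors_per_feature"], "vs", modular_packages[0]["contributors_per_feature"])  # 5 vs 8
print("per-project:", monolith_total, "vs", max(package_totals))                        # 50 vs 8
# Ranking projects by raw totals therefore favours the monolith even though,
# feature for feature, the modular packages see more contribution.
```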

I'll further agree with you that the number of commits, authors, comments, pull requests, stars, etc. - these are in fact all biased metrics in a sense. The bias is towards large and popular projects.

the wrong unit of measure ("project") is used; a more accurate measurement would have been by feature.

At this point things start to make less sense, because that's just, like, your opinion, man.
Allow me to make a simple analogy, behold: Box Office Mojo's top movies of 2017.

Sadly the list is dominated by big-budget, CGI-laden, corporate-backed drivel, and the box office returns do not strictly correlate to quality - although let me remind you, quality is subjective, and some of us enjoyed The Emoji Movie and Angular2: Electric Boogaloo quite a bit.

There's no accounting for taste, I agree with you, these people are simply wrong and Lego Batman and React are much better.

Incidentally, do you have any specific and actionable suggestions on how the methodology could be changed to account for this?

Measuring how many contributions, maintainers, and users there are on a feature basis is a grand idea. I haven't got the foggiest how you'd collect such data across the entire open source ecosystem. Would be neat.

Furthermore, I think a code quality metric based on stability and elegance of algorithms and impact and whatnot - kind of like a Rotten Tomatoes for repositories, with subjective ratings by expert reviewers - is a fantastic idea.

I might actually have to go build that (damn you), but it currently doesn't exist and would measure entirely different things.

To get back to the subject at hand:

In other words: the way you're measuring favours corporate open-source projects, and has therefore already decided the outcome of the research before even starting on it.

it's also often a misguided approach on the part of the developers who do prefer monolithic "high-velocity" projects; often they have given no consideration to how well particular features are supported, for example, and end up replacing tools or dependencies down the line.

Well no, look here. You don't always get to pick a technology on technical merits alone unless you develop in a vacuum.

The robustness of an ecosystem, and dare I say for lack of a better term, velocity, surrounding a project is a useful feature of sorts to base your decision on. Not the sole feature, but a useful and interesting one. For better and worse.

If I had to pick a JS framework for my next project - they are all absolutely dreadful in my opinion - I'd definitely want to get some sense of adoption.

It isn't a coincidence that there are corporate projects at the top of the list (Angular2, React, Polymer, that other one eBay made that wise asses like to throw into the mix occasionally to mess with your sanity) - definitely all kinds of agendas at play there.

I could run away from Angular2 because I've seen this pattern of adoption/hype before with the corporate train wreck that is Angular1... or towards it if I wanted to pick up essentially a front-end COBOL that will ensure me lucrative employment for, shudder, decades to come.

Or, behold, I could point to the one-man show of Vue.js giving the big boys a run for their money and convince the stakeholders in my company that they'd totally be able to hire Vue.js developers in the future because it has a robust community growing at a high velocity.

Ain't that grand?

People are free to draw their own conclusions from the underlying data, the charts produced by this project are obviously interesting and relevant (even if it doesn't cover your specific curiosities) and are based on the best metrics available at hand.

I'd be very happy to see a better chart from you instead of complaining that popular things are in general popular for the wrong reasons; that's not useful.

joepie91 commented on July 26, 2024

Incidentally, do you have any specific and actionable suggestions on how the methodology could be changed to account for this?
Measuring how many contributions, maintainers, and users there are on a feature basis is a grand idea. I haven't got the foggiest how you'd collect such data across the entire open source ecosystem.

To be clear - I'm not saying that measuring on a per-feature basis is the end-all-be-all of accurate research in this area, and there may very well be other issues with it that I haven't considered; but it would address this particular issue with the current methodology.
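As a rough, hypothetical illustration of what a per-feature measurement could look like in practice (my own sketch, not anything the velocity project does): one crude proxy is to treat top-level directories as "features" and count distinct commit authors per directory.

```python
# Hypothetical sketch: approximate "features" by top-level directories and
# count distinct authors per directory from `git log`. A directory is not a
# feature, and filenames starting with "@" would confuse this parser, so this
# is only a crude proxy for the idea discussed above.
import subprocess
from collections import defaultdict

def authors_per_toplevel_dir(repo_path: str) -> dict[str, set[str]]:
    # %ae = author email; --name-only lists the files touched by each commit.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:@%ae", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout

    authors: dict[str, set[str]] = defaultdict(set)
    current_author = None
    for line in log.splitlines():
        if line.startswith("@"):
            current_author = line[1:]        # new commit: remember its author
        elif line and current_author:
            top = line.split("/", 1)[0]      # top-level directory of the touched file
            authors[top].add(current_author)
    return authors

# Example usage:
# for directory, people in authors_per_toplevel_dir(".").items():
#     print(directory, len(people))
```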

It is indeed a much, much more difficult metric to obtain; but there exists no law that research must be simple to carry out. The influence of corporate funding on open-source is simply a really difficult thing to quantify (or even research!), and a lot of work will be needed to paint a comprehensive picture of it. Certainly more work than a few scripts scraping GitHub for contributor counts.

The fact that it's really difficult to obtain accurate results doesn't in any way justify the publication of inaccurate (but easy-to-obtain) results. It simply means that the gathered data did not lead to a useful conclusion, and that it should be either discarded or, ideally, published with a very clear description of the issues it has, such that it can still be used in other research.

That's fundamentally my issue here; conclusions are drawn from this data that the data doesn't support, and they are presented as conclusive and accurate, when they really aren't in the slightest. That leads to articles like that of TechRepublic, which in turn leads to misguided ideas among the general public.

I might actually have to go build that (damn you), but it currently doesn't exist and would measure entirely different things.

I'd be very happy to see something like this exist :) However, it too is no small task to undertake, and there are currently easier wins to be had in other areas; for example, teaching developers how to recognize quality and support issues with dependencies early on by themselves.

The robustness of an ecosystem, and dare I say for lack of better term, velocity, surrounding a project is a useful feature of sorts to base your decision on. Not sole feature, but useful and interesting one. For better and worse.

The problem here is that you're not taking into account that this is a relative metric. For a large, monolithic thing that does many things and needs constant maintenance, you need a stable support network behind it, and you need to have a pool of competent developers to work with it. Training a new developer to work with it is expensive.

This is not true for small, modular dependencies that do one thing; they're "just a bit of other code in the language I know" that any developer can work with after looking at it for ten minutes, because there's no large proprietary ecosystem built around it with decades of habits and oddities that could never be ironed out for backwards compatibility reasons, or components that didn't quite work together because the project maintainers hadn't anticipated your use case.

Yes, the support base for small modular dependencies is often a lot smaller; but at the same time, the support requirements are also smaller, and typically by a far larger margin than the support base.

When you look at it on a scale of "does the support base meet my support requirements", the answer is going to be "yes" far more often for a modular dependency than for a monolithic one, even when the modular dependencies are one-man projects and the monolithic dependencies are corporate-backed. For many modular dependencies, the support requirements are zero.
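A toy bit of arithmetic (entirely invented numbers) for this "support base vs. support requirements" framing; the point is that the ratio matters, not the absolute size of the community:

```python
# Invented numbers, for illustration only: what matters is whether the support
# base covers the support requirements, not how large the community is.
deps = [
    # name,                    maintainers, est. support hours needed per year
    ("monolithic-framework",   200,         5000),
    ("small-modular-lib",      1,           2),
    ("finished-one-liner",     1,           0),
]

for name, maintainers, hours_needed in deps:
    hours_available = maintainers * 10  # assume ~10 volunteer hours per maintainer per year
    covered = hours_available >= hours_needed
    print(f"{name}: needs {hours_needed}h, has ~{hours_available}h -> covered: {covered}")
# The monolith has a vastly larger support base yet still falls short of its own
# requirements, while the tiny dependencies with near-zero requirements are
# trivially covered.
```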

(I can't speak for Vue in particular; I don't use it and have no experience with it, and I don't know how modular it really is. This is about modular dependencies in general.)

People are free to draw their own conclusions from the underlying data, the charts produced by this project are obviously interesting and relevant (even if it doesn't sadly cover your specific curiosities) and are based on the best metrics available at hand.

Again: my problem is with the conclusions that are being drawn in the article. If this were a raw data dump, or if the conclusions being drawn were accurate and supported by the data, there'd be no problem. But as it stands, the conclusions in the article are wrong; and "the best metrics available at hand" simply do not meet the minimum bar required to support those conclusions, and therefore should not be used for them. Sometimes the answer is to not publish at all, rather than to stubbornly push through just to have something.

I'd be very happy to see a better chart from you instead of complaining that popular things are in general popular for the wrong reasons; that's not useful.

A better chart of what? I'm pointing out that the conclusions in the article do not match the data, and the remark about developers picking tools for the wrong reasons was simply an example of an unsupported leap of logic being made here.

There's absolutely no obligation to present alternative data or conclusions when reviewing and criticizing somebody else's research. The criticism stands on its own. I'd be happy to have a discussion about what useful conclusions can be drawn from the data collected here, but that is not what this issue is about, and it's a separate discussion to have.

lukaszgryglicki commented on July 26, 2024

Just a comment.
This is not GitHub-only data.
Data also comes from CloudFoundry, Jira, OpenStack, SVN, LKMA ...
But yes, the main source of data is GitHub.
