
Comments (91)

StefanKarpinski commented on May 20, 2024

I will make the wrong choice so that we can argue about it.

from juleps.

tkelman commented on May 20, 2024

We should move this aspect of discussion to its own issue, but I think it's totally reasonable today to require that Julia packages must have a git repo (or git mirror of something else) as the development source of record. What we should try to keep feasible is allowing the flexibility of downloading release tags at install time to users' systems in a form other than a full git clone though.

tbreloff commented on May 20, 2024

I really hope that package management and compatibility can be managed
outside of the actual codebase as much as possible. In fact I wish that we
didn't use git tags at all. Forcing package authors to add new commits (and
tag them) just to fix a dependency resolution is ridiculous. Please let's
put all requirements outside of the actual package repo. Let a core group
of people manage those dependencies for the curated metadata, with advice
from authors. Private metadatas will be easier to manage as well.

On Wednesday, November 16, 2016, Tony Kelman [email protected]
wrote:

Splitting a discussion without posting to that effect in the discussion
itself isn't terribly effective.

Compatibility constraints are either correct, too tight, or too loose with
respect to the time and set of available dependency versions when you state
them. As new versions become available, a previously correct set of
constraints can become too tight if it doesn't include working versions,
too loose if it does not indicate new breakage, or remain correct.
Compatibility claims that were too tight or too loose when they were first
made may need to be amended after the fact.

If making personal registries is simple, then I don't think it's worth
worrying about how to amend compatibility for unregistered packages. Source
releases should be immutable, compatibility often needs to be amended, so
compatibility should be tracked outside of the source. If you need to amend
compatibility for an unregistered package, then create a personal registry
to track it.



StefanKarpinski commented on May 20, 2024

The notion that you can build a functioning ecosystem of reusable software without authors thinking about versioning at all strikes me as incredibly implausible, not to mention totally unscalable. Who's going to be spending all of their time figuring out how to version every single registered package? Your answer here seems to be "I dunno, but not me." If you want to develop software that way, that's cool – then don't register your packages. What I'm proposing will support unregistered packages much better, but it won't change the fact that following along with whatever happens to be on master on a set of packages will not be a good way to build systems that don't break all the time.

tbreloff commented on May 20, 2024

I think @tkelman is on the right track. One thing that I think would be very valuable is if tags didn't have to apply only to a single package, but could actually refer to a set of commits for related packages. This would make it easier for authors (like me) to say "here's the new version of this ecosystem". It would also greatly improve the dependency resolution problem, as groups of packages would be seen as one unit when doing resolution. All of this depends on tagging being separate from the repo contents.

StefanKarpinski commented on May 20, 2024

I have to say I think this is not the right direction at all. Under this "external versioning" proposal, the source repo for a project would have no information about what other packages or libraries it depends on. Instead, you need to check out the project and then find its registry – if there is one – in order to be able to even know what other packages it needs.

Tagging multiple packages with a single version adds yet another layer of dependency and complexity to an already complex system. If packages are all being versioned together in lockstep, why are they separate packages in the first place? This can already be accomplished by just giving all the packages the same version number – then tell people to use the 2.3.1 version of all of the packages.

tkelman commented on May 20, 2024

If you require a commit to exist in the package repo for a compatibility update and refer to that commit as the source of that version that gets downloaded, then you have to wait for a possibly non-responsive package author to act to fix compatibility issues with their package. Or redirect to a fork any time this happens.

We want more automation for testing and auditing in the public registry, but we also want it to be easy and not require too much infrastructure to maintain a separate one.

simonbyrne commented on May 20, 2024

Request registration, which can be rejected or accepted. Once a version is registered, then do the tagging – this is one place where the registry having commit access to the repo would be handy. Unfortunately, there are no pull requests for tags. This could be part of future PkgDev functionality.

The only realistic way I could see this working is that the registry itself maintain a fork of all the repositories, and point to those instead: releases could then be git tags which are signed by the registry.

This may also address Tony's concerns, in that the registry maintainers can then push updates to REQUIRE files in the fork, without any input required by the package author. It would also address the problem of package authors deleting their repos in a fit of spite, à la NPM's left-pad problem.

tbreloff commented on May 20, 2024

That sounds pretty reasonable @simonbyrne. And my point above was that "Package author requests new release" could just as easily be "community requests new release" without any hiccups (with the social understanding that we should default to the author's wishes whenever feasible).

tbreloff commented on May 20, 2024

And don't forget #3: user api. Make it dirt-simple for everyone involved to
follow best practices... So then they might.

I agree these can be designed separately.

On Tue, Nov 22, 2016 at 6:27 PM Stefan Karpinski [email protected]
wrote:

Maybe we should separate the two jobs of a registry:

  1. Validation: checking that a proposed version makes sense – that it
    satisfies certain requirements and checks.
  2. Collection: keeping package and version metadata in a centralized
    location.

The former is the part that requires intelligence and automation while the
latter is dead simple.



simonbyrne commented on May 20, 2024

Archiving past versions is a good idea, but doing so by having every registry also maintain git forks of all packages is making our "github as cdn" abuse worse.

As I understand it, GitHub is fairly intelligent about not unnecessarily replicating data across forks (thanks to git's immutable objects), so I don't think this is really an issue.

StefanKarpinski commented on May 20, 2024

If we allow compatibility of versions to be mutated after the fact (as we do now in METADATA), one major issue is that it will be impossible, when compatibility has been modified later, to know what the state of compatibility constraints on versions actually were when versions were resolved. This could hide resolution bugs and generally makes understanding the system harder.

One possible solution is for each modification of compatibility constraints to increment a build number of a version or something like that, so 1.2.3 is the version with its original compatibility, while 1.2.3+1 would be a version with potentially modified compatibility or other metadata changes, which would get its own metadata in the registry, but share the same source tree.

At that point, however, I have to question why 1.2.3+1 wouldn't simply be called 1.2.4. The main objection seems to be that it's annoying / hard to create patches and package maintainers often aren't as responsive as we'd like. Which makes me think that we should just make it easier to make this kind of patch update and make it possible without the package maintainer's involvement.
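To make the build-number idea concrete, here's a minimal Python sketch of how 1.2.3+1-style revisions could be parsed and ordered. It is purely illustrative: the names are made up, and strict SemVer actually ignores build metadata when ordering versions, so the scheme discussed here would be giving it new meaning.

```python
# Hypothetical sketch: ordering "1.2.3+1"-style versions where the build
# number marks a metadata-only revision of the same source tree.

def parse(v):
    """Split 'major.minor.patch[+build]' into a sortable tuple."""
    core, _, build = v.partition("+")
    major, minor, patch = (int(x) for x in core.split("."))
    return (major, minor, patch, int(build) if build else 0)

def latest_revision(versions):
    """Among revisions of the same x.y.z, the highest build number wins."""
    return max(versions, key=parse)

# 1.2.3 and 1.2.3+1 would share one source tree; only registry metadata differs.
assert parse("1.2.3") < parse("1.2.3+1") < parse("1.2.4")
assert latest_revision(["1.2.3", "1.2.3+1"]) == "1.2.3+1"
```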

StefanKarpinski commented on May 20, 2024

In particular, patches don't need to be made on the main repository of a project, they can be made on a fork as long as they are eventually upstreamed back to the main repo.

JeffreySarnoff commented on May 20, 2024

+1 UUIDs

tkelman commented on May 20, 2024

The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact when there have already been later patch releases.

The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.

StefanKarpinski commented on May 20, 2024

The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact when there have already been later patch releases.

If the latest patch release always supersedes previous ones in the same major-minor series, then you can always just make a new patch. The only way needing 1.2.3+1 rather than 1.2.19 makes sense is if you want a version with compatibility fixes but without any bugfixes. That seems like a somewhat implausible situation. How would this be necessary? If such a situation did occur, we could always allow publishing 1.2.3+1 with updated compatibility but without bug fixes.

The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.

That means we'd have to record the state of all registries in the environment, which ties the meaning of an environment to the history of registries in a way that we are (or at least I am) trying to avoid. If version compatibility is immutable (in either 1.2.3+1 or 1.2.4 form), then you can always tell just by looking at the compatibility info for those versions whether they are correct. You can't tell if they were optimal at the time, but you can verify correctness.

tkelman commented on May 20, 2024

If the latest patch release always supersedes previous ones in the same major-minor series

This is not a good idea, as I've said before - there's not a lot of precedent for allowing code changes to completely supersede old versions. If there's going to be a second class of dependency resolution for complete replacement, then it should not be allowing code changes. People break their api in bugfix releases even if we tell them not to, and downstream packages are going to need to be able to use api's that only existed in early patch releases. And this situation might not be noticed immediately, so there could be enough later patch and minor releases that there isn't room to fix the situation by making a new set of renumbered releases.

StefanKarpinski commented on May 20, 2024

So are you ok with the idea of version metadata – especially compatibility – being immutable, but having 1.2.3+1 supersede 1.2.3 with no source code changes, only metadata changes?

tkelman commented on May 20, 2024

Yes, that seems like a mostly equivalent way of accomplishing the same thing as modifying compatibility in metadata. It records more history permanently (not just in git history), maybe that could be useful though.

tkelman commented on May 20, 2024

I do think we should keep a log of version history used by local registry copies over time, so you could feasibly implement an "undo" of a global update operation. That's a separate issue though.

StefanKarpinski commented on May 20, 2024

Or are you entirely against the idea that version metadata be immutable?

martinholters commented on May 20, 2024

Creating such a metadata-only update would be simplified if the metadata was only part of the registry, not the package itself, i.e. 1.2.3+1 could have the same hashes stored as 1.2.3. Actually, it would have to, to enforce the "no source code changes" policy. This would a) allow easy automatic verification of this policy and b) simplify metadata-only updates by non-package-maintainers.

Would that be an option? (Or is that already the idea and I misread the proposal?)
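A rough sketch of how such a registry-level check might look (Python; the data layout is hypothetical and the `tree` hashes are placeholders, not a real registry format): a registry that stores the content hash per version can mechanically enforce that a metadata-only revision points at the exact same source tree.

```python
# Illustrative only: a registry keyed by version, where each entry records
# the source tree hash and the dependency metadata separately.

registry = {
    "Example": {
        "1.2.3":   {"tree": "8f2a-placeholder", "deps": {"Dep": "1.2"}},
        "1.2.3+1": {"tree": "8f2a-placeholder", "deps": {"Dep": "1.3"}},  # compat fix only
    },
}

def is_valid_metadata_revision(pkg, base, revision):
    """Enforce the 'no source code changes' policy: a +N revision must
    reuse the base version's tree hash; only metadata may differ."""
    versions = registry[pkg]
    return versions[revision]["tree"] == versions[base]["tree"]

assert is_valid_metadata_revision("Example", "1.2.3", "1.2.3+1")
```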

simonbyrne commented on May 20, 2024

The example I gave in the other thread illustrates why patches are insufficient:

  1. Pkg B v2.0.0 depends on v1.2 of Pkg A
  2. Pkg C v3.0.0 depends on v1.2 of Pkg A
  3. Pkg A v1.3.0 is tagged with new features
  4. Pkg B v2.1.0 is tagged using features of Pkg A v1.3.0, but forgets to update the version requirement
  5. Pkg B v2.1.1 is tagged fixing this.

Now user installs Pkg B and Pkg C: the end result would be:

  • Pkg A v1.2.x (as this is the latest version compatible with Pkg C)
  • Pkg B v2.1.0 (as this is the latest version compatible with Pkg A v1.2)
  • Pkg C v3.0.0

which would be broken.
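This broken outcome falls straight out of a naive "pick the newest compatible version" strategy. A toy Python sketch reproduces it; this is not Pkg's actual resolver, and the package names and compat bounds simply mirror the scenario above.

```python
# Toy resolver for the example: pick the newest A compatible with C, then
# the newest B whose declared bound admits that A.

A_versions = ["1.2.0", "1.2.5", "1.3.0"]
B_compat = {"2.0.0": "1.2", "2.1.0": "1.2", "2.1.1": "1.3"}  # 2.1.0's bound is wrong
C_compat = {"3.0.0": "1.2"}

def minor(v):          # "1.2.5" -> "1.2"
    return v.rsplit(".", 1)[0]

def newest(vs):
    return max(vs, key=lambda v: tuple(map(int, v.split("."))))

# C pins A to the 1.2 series, so the newest eligible A is a 1.2.x.
a = newest(v for v in A_versions if minor(v) == C_compat["3.0.0"])

# B 2.1.1 declares 1.3, so it is excluded, and the resolver happily
# lands on the broken B 2.1.0.
b = newest(bv for bv, bound in B_compat.items() if bound == minor(a))

print(a, b)  # 1.2.5 2.1.0 -- but B 2.1.0 actually needs A 1.3: broken install
```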

StefanKarpinski commented on May 20, 2024

@martinholters: Yes, having compatibility info not live in the package repo is definitely a possibility, but it would make it harder for unregistered packages to participate in version resolution. Since making unregistered packages easier to work with was one of the major requests for Pkg3, that's a bit of a problem. Also, if we move compatibility info out of the package itself, where does the developer edit it? The obvious answer is in the registry but I feel like that's not tremendously obvious or developer-friendly.

@simonbyrne: This wouldn't be the result under what I've proposed since the existence of Pkg B v2.1.1 would prevent resolution from ever choosing Pkg B v2.1.0 – that's what "strongly favor the latest patch release" is meant to convey. Instead you would get A v1.2.x, B v2.0.0 and C v3.0.0. In the other approach being discussed here, B v2.1.0+1 would fix B v2.1.0's dependencies and would similarly hide B v2.1.0 from consideration when resolving new versions.

StefanKarpinski commented on May 20, 2024

The core of @tkelman's objection (assuming he's not against the idea of immutable version metadata entirely, which would be good to get an answer on), seems to be that updating version metadata via new patches allows metadata fixes to be mixed with bug fixes – well, technically arbitrary source code changes, since people may not just fix bugs in patch versions. But if people stick with bug fixes in patches, this won't be a problem: why would you want a buggier version? Yes, people will screw up bug fixes, but then the appropriate action is to make another patch that fixes the fix.

Fixing version metadata for 1.2.3 by releasing 1.2.4 is less flexible than adding another level of metadata-changes-only versioning like 1.2.3+1. So why not just add another layer and semantically separate metadata changes from code changes of any kind? One reason is that semantic versioning already has three layers of versioning, which is already a lot to deal with and reason about, and adding another one seems complicated and unnecessary. At the level of practical development, people only use branches corresponding to major/minor versions: patches occur on branches with names like release-1.2 – if you want to make a new 1.2.x release, you tag the tip of release-1.2. How would this workflow change with metadata-only changes like 1.2.3+1? You need a branch for each patch release now: you'd make metadata-only fixes on release-1.2.3 and you'd need a branch like that for every single release. That just seems ridiculous. If you make metadata fixes via new patch releases, mixed in with other bug fixes, then the current workflow doesn't change at all – just fix version metadata on the release-1.2 branch and tag a new patch.

My perspective is that we want to design the package manager so that making patch versions that do anything besides fixing bugs is problematic. This will actively encourage package developers to only fix bugs in patches. Two features of the proposed design that encourage this are:

  1. Have newer patches fully supercede older ones with the same major/minor version.
  2. Not allowing version dependencies to specify versions at patch granularity.

Both of these design choices assume that patches with the same major/minor version are equivalent aside from metadata updates and bug fixes. If a package maintainer violates this assumption by adding or removing functionality in a patch, it will cause problems. Problems lead to complaints, which will provide feedback to the maintainer and help them learn that this is bad practice and not do it in the future. This is not based on some sort of groundless optimism that people will do things correctly on their own, it's based on the principle that people respond to feedback and that we can design a system that actively causes people to receive corrective feedback. Is this limiting the ways that package developers can version their packages and have things work smoothly? Yes, but I think that's a good thing.
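A minimal sketch of those two rules in Python (illustrative only, not a real implementation; names are made up):

```python
# Rule 1: within a major.minor series, only the newest patch is ever
# offered to the resolver. Rule 2: compat bounds can only name a
# major.minor series, never an individual patch.

from collections import defaultdict

def resolvable(versions):
    """Collapse each major.minor series to its newest patch."""
    series = defaultdict(list)
    for v in versions:
        major, minor, patch = map(int, v.split("."))
        series[(major, minor)].append(patch)
    return {f"{ma}.{mi}.{max(ps)}" for (ma, mi), ps in series.items()}

def satisfies(version, bound):
    """A bound like '1.2' matches any 1.2.x patch."""
    return version.rsplit(".", 1)[0] == bound

assert resolvable(["1.2.0", "1.2.1", "1.3.0"]) == {"1.2.1", "1.3.0"}
assert satisfies("1.2.9", "1.2")
```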

tkelman commented on May 20, 2024

If a compatibility-only change can be done only at the registry level without needing the source to change at all, then there's no need for a branch for a compatibility revision.

Designing the system to be intentionally rigid and inherently flawed in the face of a behavior that people will commonly do (a recent example: changing the type of a single parameter of a single function, which breaks the api but seems like a minor change), in a way that cannot be easily fixed once newer versions have been published, is why I think this goal is a bad idea.

The core job of a package manager is this: if source has been published as a release version, it should be possible to depend on it. Demoting the patch level of versioning from this is unnecessary, adds friction to the system, and doesn't gain us anything. Downstream users are the ones who face problems from versioning mistakes, and are incapable of fixing them or working around them without cooperation from the upstream author, or forking the package and re-releasing a new series of different version numbers. We don't gain enough for this to be worth it.

tkelman commented on May 20, 2024

What qualifies as a bugfix is not always clear cut either. In fixing one bug, you can often accidentally (or intentionally!) break something else that downstream users were depending on. And these issues don't get identified immediately. By the time some of these issues are found, the upstream author may have moved on to a newer release series, that the downstream users don't have time to upgrade to right away (especially if there was a past release that worked fine for them). What option does downstream have to get their code working again? They could publish a fork without any of the more recent releases, but why have we made them go to that trouble when a patch level upper bound would serve the exact same purpose?

StefanKarpinski commented on May 20, 2024

The problem with having registry-only compatibility changes is that it:

  1. makes compatibility confusing since there are multiple conflicting – and changing – sources of what a version's compatibility actually is, and it
  2. makes registered and unregistered packages work completely differently – registered packages have a mechanism for amending compatibility while unregistered ones don't.

The process I'm proposing is straightforward and the same for registered or unregistered packages: keep definitive compatibility info in Config.toml; when compatibility needs to be adjusted, just edit Config.toml on the appropriate release branch, commit the changes and publish the tip of the release branch as a new patch.
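For illustration, a compatibility section in such a Config.toml might look like the following. The field names and layout here are guesses at what the format could be, not a finalized schema:

```toml
# Hypothetical sketch of package-local compatibility info in Config.toml.
name = "ExamplePkg"

[compat]
DepA = "1.2"   # a minor-series bound: any 1.2.x patch is acceptable
DepB = "2.0"
```

Under the workflow described above, amending compatibility would just mean editing this section on the release branch and tagging a new patch.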

Preferring the latest patch for version resolution doesn't make it impossible to use older patches, nor does it force users to upgrade to the latest patch – if what they're using works, no problem:

  • if you're already using v2.1.0 and it works, no problem
  • if an environment records v2.1.0 and you run it, you get v2.1.0
  • if you install or upgrade, then yes, you’ll always get v2.1.1 instead of v2.1.0
  • but you can still explicitly ask for v2.1.0, e.g. with pkg> add A = 2.1.0

The example you allude to (where was this?) with a changed type parameter is a simple broken patch. The correct fix in such a situation, if you depend on the package, is to exclude that specific broken patch, which solves the problem; if you're the package maintainer, the fix is to revert the part of the change that broke compatibility for someone and make a new patch release. Neither is a big problem.

I would love an actual problematic case that can't be handled with what I'm proposing instead of general arguments about what package managers should or shouldn't do. If there's some problem scenario, I want to know about it. The kind of example @simonbyrne presented is exactly what I'm talking about (hopefully my answer to that is convincing to him). The Compat example in #3 is also exactly what I'm talking about: the fact that minor updates to packages with many dependents (Compat being the most extreme example) would force patching of all dependents is a devastating problem with my original proposal, hence #15 (comment).

tkelman commented on May 20, 2024

The problem is the "broken patch" is broken from the perspective of downstream users who were using the old api, but intended as a new api by the upstream author. Upstream isn't going to revert it. Downstream then needs to indicate that all future patches are broken. That's not possible in this proposal, every new upstream release would break the downstream until downstream gets a chance to add another broken patch to their list.

It's not possible for compatibility to be set in stone and never change - compatibility depends on the entire set of possible interacting versions of dependencies, it always changes as new versions get released.

tkelman commented on May 20, 2024

You are proposing making it impossible to declare version compatibility bounds at patch granularity. That's necessary in the case above, where

  1. package B depends on package A, which is at, say, v1.3.3 when package B gets written (and it relies on a feature that was new in 1.3.0)
  2. package A breaks api between versions 1.3.5 and 1.3.6
  3. package A makes many more 1.3.x releases, several 1.4.y, and has started on 2.0.0
  4. package B gets a report that it doesn't work any more with package A v1.4.3

Assuming the author of package B can remember or recover from environment info what version of package A did work, there's no way in this proposal to reflect its requirements, since it can't express an upper bound excluding the A v1.3.6 that caused the problem. It could say every patch from 1.3.6 on is broken, but if those have to be listed individually then the list becomes incorrect as soon as an additional 1.3.17 backport gets released. The most practical solution to immediately get a working version of its dependency is to republish a fork of the old version of package A.

What problem is solved by disallowing requirements at patch granularity, and disallowing expressing requirements as ranges?
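The bind can be shown in a couple of lines. With bounds only at major.minor granularity (a Python sketch of the hypothetical matching rule, not real Pkg behavior), any constraint that admits the working 1.3.5 necessarily admits the breaking 1.3.6:

```python
# Minor-level matching: a bound like "1.3" admits every 1.3.x patch.
def satisfies(version, bound):
    return version.rsplit(".", 1)[0] == bound

good, bad = "1.3.5", "1.3.6"          # the api broke between these two
assert satisfies(good, "1.3") and satisfies(bad, "1.3")
# Any bound that admits 1.3.5 also admits 1.3.6: this granularity cannot
# express "1.3.0 <= A < 1.3.6", which is exactly what B needs here.
```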

StefanKarpinski commented on May 20, 2024

The subject of this issue is immutability of compatibility, which is orthogonal to patch granularity. I was trying to unmuddy the discussion by splitting #3 into this issue and #15, which would be a better place to discuss patch granularity, although that's explicitly about the opposite complaint: that the granularity is too fine, which I already conceded.

tkelman commented on May 20, 2024

Splitting a discussion without posting to that effect in the discussion itself isn't terribly effective.

Compatibility constraints are either correct, too tight, or too loose with respect to the time and set of available dependency versions when you state them. As new versions become available, a previously correct set of constraints can become too tight if it doesn't include working versions, too loose if it does not indicate new breakage, or remain correct. Compatibility claims that were too tight or too loose when they were first made may need to be amended after the fact.

If making personal registries is simple, then I don't think it's worth worrying about how to amend compatibility for unregistered packages. Source releases should be immutable, compatibility often needs to be amended, so compatibility should be tracked outside of the source. If you need to amend compatibility for an unregistered package, then create a personal registry to track it.

JeffreySarnoff commented on May 20, 2024

+1.618 for allowing me to become unconcerned with anything git related

tkelman commented on May 20, 2024

@tbreloff package authors need to be responsible for dependency versioning. What features are you using, when things break how do you fix or work around them, etc. That comes with the territory of having dependencies. If you get any help you're lucky, but you can't expect other people to do this for you.

An outside-of-the-source copy of the dependency information may need to take priority here though, as in the existing system where metadata is used for registered packages, the package's copy of REQUIRE isn't actually used except at tag time to populate the initial content.

A compatibility-only revision release could be a mechanism for this, but it needs to be possible to do that for any published release, not just the latest within a minor series. Compatibility is about the rest of the world with respect to a fixed version of a package - we shouldn't be mixing the release numbering or resolution mechanism for outside-world compatibility within the same system (and constraints) that we use for a package's own source.

tbreloff commented on May 20, 2024

So then maybe what I'd like is a little more subtle. It would be nice if
the larger community had a mechanism to tag and fix dependencies in place of
authors that don't have the time or knowledge to keep up with the process.
How many times a day do you have to tell people exactly what they need to
do and how to do it in order to properly register or tag? Wouldn't it be
easier for everyone involved if you just did it yourself? You're the one
with commit access to metadata, so why go through the silly and pointless
steps that make it seem like the author has anything valuable to add? I'd
be happy with v1.2+ and v1.2.3+ if it means problems are immediately solved
by the people who understand the right way to solve them.

tl;dr Manage as much as possible from within metadata(s) without
necessarily requiring the author

On Thursday, November 17, 2016, Tony Kelman [email protected]
wrote:

@tbreloff https://github.com/tbreloff package authors need to be
responsible for dependency versioning. What features are you using, when
things break how do you fix or work around them, etc. That comes with the
territory of having dependencies. If you get any help you're lucky, but you
can't expect other people to do this for you.

An outside-of-the-source copy of the dependency information may need to
take priority here though, as in the existing system where metadata is used
for registered packages, the package's copy of REQUIRE isn't actually used
except at tag time to populate the initial content.

A compatibility-only revision release could be a mechanism for this, but
it needs to be possible to do that for any published release, not just the
latest within a minor series. Compatibility is about the rest of the world
with respect to a fixed version of a package - we shouldn't be mixing the
release numbering or resolution mechanism for outside-world compatibility
within the same system (and constraints) that we use for a package's own
source.



tbreloff commented on May 20, 2024

without authors thinking about versioning at all

Of course there's a middle ground. Authors think about the high level versioning, but not necessarily the gritty details (that frequently are due to other packages out of their control). Those details should either be handled by automation or by expert guidance, depending on the situation.

Your answer here seems to be "I dunno, but not me."

When it comes to curated metadata repos, if I'm not a curator then the final responsibility is not mine. Package authors can guide versioning (and should be encouraged to do as much as possible themselves), but this mentality that curators should never make changes to the thing they're curating, and should instead apply social pressure on package authors until they make the exact change that the curator could have made in the first place... it's just stupid. I want to see the curation as disjoint from the code.

following along with whatever happens to be on master on a set of packages will not be a good way to build systems that don't break all the time.

I couldn't agree more, which is why I care so much about making it dirt-simple to "do the right thing".

JeffreySarnoff commented on May 20, 2024

@StefanKarpinski @tbreloff Each of you is right, in important measure.

I have seen that the need for handholding in the less well-traveled regions of the deep end of the pool increases superlinearly. @tkelman The work you do helping us deal with tags and git when it goes on a bender is probably more informative than predictive.

This Summer and next Fall I expect for Julia a flood of new and very active involvement. Something is going to feel the extra weight. 🚶‍♂️ (mmph, 😢) "I do not want to play with git" (😢, mmph)

Between update and upgrade: ?uplift

simonbyrne commented on May 20, 2024

Perhaps it would be useful to gather some data.

  • Do we have examples where versions were tagged incorrectly, or other "broken" version resolution cases? How were these resolved?
  • How do other package managers handle these problems?

JeffreySarnoff commented on May 20, 2024

@tkelman Do you recall any of my chained missteps?

from juleps.

JeffreySarnoff avatar JeffreySarnoff commented on May 20, 2024

@simonbyrne I can share some subjective sense of what went wrong on a few occasions; I don't know how to go about finding the events and extracting the file changes. By far the worst experience with git was not about tags: I tried to prepare Julia's source for deprecating symbol in favor of Symbol. I had put in the time, all the alterations were ready, and, as I recall, they passed testing. Before the changes landed, someone suggested one other change to include. It was a legit request, but it was not one of the changes I had already made work. All I remember after that is frustration building, many attempts to get what had been ok back to ok, and just as many failures. Then someone else took on the task.

With tags, more than once a delay to adjust something minor has been enough to drift away from METADATA prime and get things out of sync -- I have found the additional steps that entails, to get everything back in sync with no residual issue/renaming/omission, unintuitive and different from the normal workflow. That does not work for me, and I no longer try to make it work. Instead I erase all relevant forks, detag one or more tags, and try again.

I have twice gotten the local tags and the GitHub tags to be incongruous, without much idea of how. At one point, despite frequent pulling or pushing as appropriate, the remote had tags through 0.1.2 and the local through 0.1.8. I had to unmake and remake them, and I am not confident it really is all fixed.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

Actually, I've considered requiring that registered packages give admins of the registry in question commit access so that they can fix things as necessary, but that's not really a package manager design choice. Alternatively, since the Pkg3 design makes it possible to have multiple sources for a package, curators can have forks of any packages and tag versions on their forks. I'm not sure what else you've got in mind, @tbreloff? Are you advocating for taking compatibility information out of the repo entirely? That basically makes using unregistered packages impossible, which to me is the wrong direction.

from juleps.

JeffreySarnoff avatar JeffreySarnoff commented on May 20, 2024

+1 "give registry admins commit access"

from juleps.

tkelman avatar tkelman commented on May 20, 2024

If the goal is to treat registered and unregistered packages the same (the logic in Pkg2 is made more complicated and error prone because of the different treatment), then I have an idea that doesn't require changing the way we do version resolution, and doesn't try to make compatibility info immutable.

What if there is no such thing as an unregistered package at all, but a package can contain its own registry info as part of the same repository? It would just need its own information, so basically the same data as Config.toml but in an append-only registry history format instead of as git revisions. Registries are living history records that are designed to live on master (or the tip of a registry-vN branch). As long as the files are disjoint between what a registry stores (just a single file for a single package, presumably) and the package source code, couldn't they come from the same repo? If the redundancy of having two clones at different SHAs, used for different purposes, bothers anyone, they can split the registry file out to a different repo.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

So what is associated with the version? Some subtree of the repo excluding the version metadata?

from juleps.

tkelman avatar tkelman commented on May 20, 2024

I need to reread whether the registry format is fully specced out here yet. I'm imagining something like a Registry.toml or JuliaRegistry.toml at top level that contains a list of package names and paths to individual detail files (to support directory sharding for large registries). For a "self-registered" package that contains its own registry info, there would only be one package name and version details file.

The release commit of such a package would have to have its own compatibility info in Config.toml, but it can't save its own hash to its registry history until after the release tag; since registries don't usually operate from tags, I don't think that's a problem.
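A self-registering repo under this proposal might carry files along these lines (names, fields, and values are all hypothetical, sketching the description above rather than any settled format):

```toml
# Registry.toml at the top level: index of packages this registry knows about
name = "ExampleSelfRegistry"
uuid = "00000000-0000-0000-0000-000000000001"   # hypothetical registry UUID

[packages]
Example = "registry/Example.toml"   # path to the per-package detail file

# registry/Example.toml would then be the append-only version history
# for this one package, e.g.:
# [versions."0.1.0"]
# git-tree-sha1 = "..."
# [compat."0.1.0"]
# julia = "0.5"
```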

from juleps.

JeffreySarnoff avatar JeffreySarnoff commented on May 20, 2024

I think this thread is on the right track. @tbreloff envisions a new tag power¹. The capability to tag with designated scope requires that tags be scope-associated. Let's ensure that multiscopic tags play well with tags now in use: the absence of an associated scope is not the absence of scope, it is package scope.

This would give Julia a comparative advantage. I recommend defining tag scope to be yet more pliable: leverage the internals of Julia's type system to support scope as scoping over MaybeScope (simple scope | superscope | set of scopes | nothing), and ensure there is a ground state. Use URIs.

¹ (Do you prefer yesterday's tags? Have you heard? All the new tags come cold brewed; you'll love their refined taste!)

from juleps.

JeffreySarnoff avatar JeffreySarnoff commented on May 20, 2024

@StefanKarpinski I did not read that the emphasis was on "external" -- my comment was in response to a [possibly imagined] proposal to allow tags to tag other tagged things in what seemed to be an elegant and easily expressed way. 👎 Complete decoupling of a thing from its constitutive self.

from juleps.

tbreloff avatar tbreloff commented on May 20, 2024

@StefanKarpinski too bad... I was hoping to be able to trash MetaPkg when Pkg3 was released, but it seems like it will still be needed. I suppose tooling to jointly test/tag/publish as well as auto-adding version dependency limits is all we need... it doesn't necessarily need to be supported at the Pkg3 level.

from juleps.

JeffreySarnoff avatar JeffreySarnoff commented on May 20, 2024

Sam Boyer thought about this; his essay "So you want to write a package manager" is of interest.

from juleps.

tkelman avatar tkelman commented on May 20, 2024

Versioning info would be present in the repo, but that copy wouldn't be the definitive source - only the best effort using information available at the time of tagging. Compatibility info needs to be possible to amend in order to fix any mistakes, or update past source releases for new information that was not available when they were tagged. I don't think we should force a new source release to update outside-world compatibility information - that makes it difficult to depend long-term on anything more than a single release per minor series. Compatibility is stored separately from the source in registries anyway, and amending it should be possible without touching a source release.

Multiple packages that need each other to be at specific version numbers should express that through compatibility bounds.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

A few observations:

  1. It’s confusing for the checked out source version of a package to say one thing about compatibility while the registry for the package says something else.
  2. When a package’s source is checked out somewhere, we should apply any registry updates to its config file.
  3. It will be annoying if this is done in a git repo and the file is being tracked by git since that will make it dirty – the changed version should be committed.
  4. Since this commit exists in our heads, it should also exist in the actual source repo.
  5. Even if there isn’t an actual commit for it, when you update the compatibility of a version, a new commit is implied – whether we actually make it or not. If we canonicalize config files, this commit has a completely predictable tree hash. It seems generally less confusing for the commit to actually exist rather than for it to only exist in our heads.
  6. Updated version of compatibility information for a package should be upstreamed back into the package source anyway, so that when future versions are tagged, they also include those changes. In other words, having this commit in the git repo isn’t just less confusing, it’s also useful/necessary for package development.

All of this points to creating and upstreaming source repo commits that correspond to modifications of each version’s compatibility information, instead of just making the changes in a registry. Moreover, the new commit with updated config info is what you should check out as the source of the package.

What about evolving the compatibility metadata of a version in-place in its registry without changing what we call that version? We do this now – so what’s the problem?

  1. Pkg3 will support multiple registries.
  2. Different registries can provide additional versions of packages as long as they agree on the ones they have in common. This allows a private registry to make a tentative patch version (e.g. “v1.2.3+hotfix”) and use it before an official fix has been upstreamed to the main registry.
  3. If we just modify version metadata in-place in registries without changing version names, how is this supposed to work? What happens when two different registries have different metadata associated with a particular version? Which one do we use? How do we know which one is newer?
  4. When the package manager decides to use a particular version of a package and records that in an environment, we want to know which version of its metadata was in effect at the time, so that we can at least tell, after the fact, whether the choice was valid according to metadata of all of the versions chosen at the time. (There may be valid reasons to ignore compatibility constraints and use a version anyway.)
  5. There are various ways to distinguish, in an environment, which version of a version we’re using, but they are all equivalent to giving it a new name.
  6. I’ve proposed calling the version revision of source version v1.2.3 with updated compatibility, v1.2.3+1. Let’s go with that for the sake of argument.
  7. Remember that commit that’s implied by any update to a version’s compatibility information? Yeah, that one. The one that should probably exist in the source repo. If we’re calling that “v1.2.3+1” in the package manager and in environment files, and there’s a corresponding commit, then we should probably propagate that tag back to the source repo so that git also calls it “v1.2.3+1”.

Taken with the above points, this all leads us to one thing: immutable versions, with immutable compatibility, but with compatibility updates expressed as version revisions, e.g. v1.2.3+1, and these updates are upstreamed to the original source repositories as appropriately tagged commits, merged back into the relevant release branches.
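As a sketch of how such shadowing might behave (Python here purely for illustration, and note this deliberately departs from strict SemVer, where build metadata is ignored for precedence):

```python
import re

def parse(v):
    """Split 'v1.2.3+1' into ((1, 2, 3), 1).
    A missing build string counts as compatibility revision 0."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(?:\+(\d+))?", v)
    if not m:
        raise ValueError(f"not a version with integer build metadata: {v}")
    major, minor, patch, rev = m.groups()
    return (int(major), int(minor), int(patch)), int(rev or 0)

def effective_versions(tags):
    """For each base version, keep only the highest compatibility revision;
    older revisions are shadowed, not deleted."""
    best = {}
    for tag in tags:
        base, rev = parse(tag)
        if rev >= best.get(base, (-1, None))[0]:
            best[base] = (rev, tag)
    return sorted(tag for _, tag in best.values())
```

So if a registry holds `v1.2.3`, `v1.2.3+1`, and `v1.2.4`, resolution would consider only `v1.2.3+1` and `v1.2.4`, while an environment already pinned to `v1.2.3` keeps working.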

Aside on tagging. Yes, version tagging in Pkg2 is a nightmare. It was a design mistake to tag versions before they are accepted into registries. We won’t repeat that design mistake in Pkg3 – version tagging will flow from the registry to the source repo, not the other way around. Any arguments about the annoyingness of tagging versions stems from this, not some fundamental problem with the concept of having git tags that correspond to versions, which is handy once they’re correct since it means that git knows what we call these things.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

There also seems to be some semantic confusion in this thread that I'd like to address:

Compatibility info needs to be possible to amend in order to fix any mistakes, or update past source releases for new information that was not available when they were tagged

No one is arguing that we will declare version compatibility once and for all and it will be correct and perfect forever. That's totally impractical – there's a reason that I made metadata mutable in Pkg1/2. What I believe should be immutable is the association between a particular package version and the claims it made about compatibility at the time. Even if this is immutable, compatibility can still be updated, just not by rewriting history, but instead by adding new information that supersedes old information.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

@tbreloff: regarding snapshots of entire sets of related packages, in the Pkg3 design you can just configure your repos so that commits include sufficient environment information like specific package versions and/or source tree hashes. That way people following along can just check out those exact versions instead of depending on normal version resolution. As long as you make sure that tests pass and you've committed your latest environment, people should be able to easily reproduce the exact same working set of package versions. This approach will work much better for rapidly moving collections of closely related packages like your Plots stuff or the Kenoverse.
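An environment record of that kind might look roughly like this (file name, keys, UUIDs, and hashes are all made up for illustration):

```toml
# Hypothetical environment snapshot committed alongside the project
[deps.Plots]
uuid = "91a5bcdd-0000-0000-0000-000000000000"               # made-up UUID
version = "0.10.3"
git-tree-sha1 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"  # exact source tree

[deps.RecipesBase]
uuid = "3cdcf5f2-0000-0000-0000-000000000000"
version = "0.1.0"
git-tree-sha1 = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
```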

from juleps.

tkelman avatar tkelman commented on May 20, 2024

I don't think your observations 1-5 are all that major. 6 is often the case but not guaranteed - sometimes a compatibility adjustment for a past source version isn't relevant to newer source versions.

It may be less confusing to give these compatibility-only modified releases new names, but anything that replaces a past version in dependency resolution should be enforced as having the same source, otherwise this mechanism is a shortcut around immutable source releases. If we're going to have this path for republishing replacement versions as entirely new entities, we'd need to enforce somehow that these versions are only originated from registry compatibility adjustments and only modify compatibility, rather than potentially allowing arbitrary source changes. If you allow arbitrary source changes in an update that replaces a past version in dependency resolution, then that's essentially equivalent to unpublishing the past release tag which isn't good for reproducibility.

You're right that the state of the registry information plays into reproducibility of the compatibility state, so maybe that should also be recorded rather than trying to find ways to avoid having to think about it. But I think it would be more predictable and invite fewer opportunities for subverting the system if compatibility versions were specified as purely virtual registry-generated entities, guaranteed to be derived from an existing base release. Corresponding modifications and upstreaming of the package config.toml in place can be optional, and it's not always necessary or appropriate for a package developer to incorporate such changes in all future versions. The content of a compatibility revision shouldn't be allowed to be any arbitrary thing submitted by a package developer, it should be constrained. Tracking the information separately is a way of accomplishing that by design, and I don't see it as all that confusing or problematic. After all, other package names and versions in any statement of compatibility are already implicitly with respect to some registry.

from juleps.

tkelman avatar tkelman commented on May 20, 2024

Since other package names and versions are meaningless without taking into account the registry that tells you what those package names and versions correspond to, it's a bit "wrong" to store a package's compatibility info within its source. It's implicitly a representation of what the future registry entry is going to say about that package, and gets ignored for most other purposes. Maybe we can think a bit about whether this system really makes sense. Right now REQUIRE is used to save state and remove the need to type out its entire content every time you make a tag, but it's sort of being stored and tracked in the wrong place. Depending on how development for packages is supposed to work in Pkg3, maybe we could change where we keep this information for under-development versions of a package.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

@tkelman: I don't think your observations 1-5 are all that major.

This sort of response is not constructive. This is your opinion backed by zero argument or discussion. Your feedback on this particular issue so far largely amounts to "I want it to continue to work the way it does now." When I make a carefully broken down argument for why we can't / shouldn't do that, it's not for my sake, it's so that we can have a constructive debate and zero in on specific points where there are concrete problems to be avoided.

Enforcing compatibility-only updates isn't exactly rocket science: don't allow version v1.2.3+1 to be registered if it differs from v1.2.3 in terms of anything but compatibility. Your stated preference for purely virtual versions is unconvincing – I made a thorough, multi-point argument for non-virtual versions being less confusing, more usable and more practical for package development, and you did not refute any of it, just dismissed all the points as "not all that major". Let's play this one again... Whether they exist in version control or not, compatibility updates correspond to predictable commits that could be materialized. So should they be materialized or not?

  • Do we want to actually materialize them on disk? Yes.
  • Do we want to materialize them in git repositories sometimes? Yes.
  • Do we want them to be ancestors of later commits sometimes? Yes.

Why wouldn't we materialize them? The only reason is that they could change source – which we can easily verify that they don't upon registration. All of this points to compatibility updates being actual, not virtual.
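The registration-time check described above could be as simple as diffing the two source trees and allowing at most the compatibility file to change; a minimal sketch (the tree representation and file name are assumptions):

```python
def is_compat_only_update(old_tree, new_tree, config_file="Config.toml"):
    """Return True if two source trees (path -> blob hash mappings)
    differ in nothing except the compatibility file.
    Added, removed, or modified source files all disqualify the update."""
    paths = set(old_tree) | set(new_tree)
    changed = {p for p in paths if old_tree.get(p) != new_tree.get(p)}
    return changed <= {config_file}
```

In practice the mappings would come from something like `git ls-tree -r` on the two tagged trees.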

Tracking the information separately is a way of accomplishing that by design, and I don't see it as all that confusing or problematic. After all, other package names and versions in any statement of compatibility are already implicitly with respect to some registry.

This is no longer true in Pkg3. By introducing multiple registries, some of which are private, the registration system necessarily becomes distributed, federated and not globally visible. Package UUIDs give packages identity independent from registries – even unregistered ones. Packages can and will move between registries (e.g. from uncurated to curated or from private to public), and it is possible to depend on packages in other registries. The registry cannot be the determiner of package or version identity anymore. Having each package's version history in its repository may be a good idea, although that information would be redundant with git tags, so maybe not.

from juleps.

simonbyrne avatar simonbyrne commented on May 20, 2024

version tagging will flow from the registry to the source repo, not the other way around

How do you envision that this would work?

from juleps.

tbreloff avatar tbreloff commented on May 20, 2024

This is no longer true in Pkg3. By introducing multiple registries, some of which are private, the registration system necessarily becomes distributed, federated and not globally visible

For me, this statement is the key reason that dependency info should be independent in concept from the source repo. Dependency resolution will be determined by one or more (possibly conflicting) registries. In the end, I don't think it's ever a robust solution to use the deps from the package repo... it should be completely determined by the registries. Now, keeping a copy of dependency info for initialization of a registry file is a mere convenience, and is orthogonal to whether repo code and dependency info are independent concepts.

from juleps.

tkelman avatar tkelman commented on May 20, 2024
  2. When a package’s source is checked out somewhere, we should apply any registry updates to its config file.

Why? You haven't given any reason for doing this or problem that it solves, and 3-5 flow from this fairly weak motivation.

  1. It’s confusing for the checked out source version of a package to say one thing about compatibility while the registry for the package says something else.

I disagree that this is all that confusing. They are different information by way of past package versions being immutable, and registries not.

If config.toml is only the claim, at tagging time, about compatibility, then sure it can be an immutable part of a source release's content. But you shouldn't use the claim at tagging time indefinitely as the source of this information. That would necessitate making a new source release for any change to its content, and if that process allows replacement of anything other than the compatibility info in config.toml, then it makes the primary job of the package manager, ensuring previously published releases can be depended on indefinitely, unreliable all around. This is my core objection to this, unconstrained replacement of previously published versions should not be a designed-in feature that's up to manual review in a registry to enforce.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

version tagging will flow from the registry to the source repo, not the other way around

How do you envision that this would work?

Request registration, which can be rejected or accepted. Once a version is registered, then do the tagging – this is one place where the registry having commit access to the repo would be handy. Unfortunately, there are no pull requests for tags. This could be part of future PkgDev functionality.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

This is no longer true in Pkg3. By introducing multiple registries, some of which are private, the registration system necessarily becomes distributed, federated and not globally visible

For me, this statement is the key reason that dependency info should be independent in concept from the source repo. Dependency resolution will be determined by one or more (possibly conflicting) registries. In the end, I don't think it's ever a robust solution to use the deps from the package repo... it should be completely determined by the registries. Now, keeping a copy of dependency info for initialization of a registry file is a mere convenience, and is orthogonal to whether repo code and dependency info are independent concepts.

So different registries could have completely different notions of what the version numbers and associated commits of a package are? What do you do when registries disagree? How do you reconcile this? Completely external versioning from arbitrarily many federated authorities would be total chaos. There has to be an authoritative source for each package. The obvious place for that is in the package repo itself.

from juleps.

tbreloff avatar tbreloff commented on May 20, 2024

Completely external versioning from arbitrarily many federated authorities would be total chaos

What about the very realistic scenario that an organization wants to define specific versions/deps which are not public, some of which overlap with a JuliaLang registry. They could: 1) fork and fix all repos that disagree with their preferred dependency resolution, or 2) use an alternative package manager. Why not try to support 3) Override the dependencies.

I'm not saying "no dependency info allowed in the package repo"... I'm saying that the package repo should not be the definitive source... registries should take precedence. And since registries can take precedence, it's not required to keep deps info in the package repo.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024
  1. It’s confusing for the checked out source version of a package to say one thing about compatibility while the registry for the package says something else.

I disagree that this is all that confusing. They are different information by way of past package versions being immutable, and registries not.

  1. Innumerable conversations with people who are confused about this.
  2. The fact that there are complex rules about whether the METADATA requires file or the source REQUIRE file applies. I wrote them and I don't remember what they are. Quick – don't look at the Pkg source code and tell me what the rules are.
  3. It's fairly obvious that having two possible sources for a fact is more complicated and confusing than only having a single possible source for it.
  2. When a package’s source is checked out somewhere, we should apply any registry updates to its config file.

Why? You haven't given any reason for doing this or problem that it solves, and 3-5 flow from this fairly weak motivation.

  1. Because of the above confusion.
  2. So that we don't need complex logic in the package manager to decide which applies.
  3. Because one generally wants to include those changes in new versions of the package.
  4. If one doesn't want to include those changes that fact is of interest – why doesn't the change to compatibility apply to downstream versions?

If config.toml is only the claim, at tagging time, about compatibility, then sure it can be an immutable part of a source release's content. But you shouldn't use the claim at tagging time indefinitely as the source of this information.

It's not indefinite – you can make a new compatibility release to update it. We can even allow making those updates in a registry without having to make a source version first since the source version can then be automatically made in the package repo. The point is that the tagged commit should exist at some point.

That would necessitate making a new source release for any change to its content, and if that process allows replacement of anything other than the compatibility info in config.toml, then it makes the primary job of the package manager, ensuring previously published releases can be depended on indefinitely, unreliable all around. This is my core objection to this, unconstrained replacement of previously published versions should not be a designed-in feature that's up to manual review in a registry to enforce.

I've said multiple times that the replacement need not be unconstrained. It's trivial to verify automatically that compatibility updates only make changes to Config.toml. What's the problem with that? How does this make anything unreliable? Older releases don't go away. If you're already using them, they aren't automatically changed or deleted. They are simply shadowed when looking for new versions to use. If v1.2.3+1 exists with updated compatibility claims, the package manager will consider that instead of v1.2.3 which has older, no-longer valid claims. I'm not sure what you're imagining here, but it doesn't reflect what I've said.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

What about the very realistic scenario that an organization wants to define specific versions/deps which are not public, some of which overlap with a JuliaLang registry. They could: 1) fork and fix all repos that disagree with their preferred dependency resolution, or 2) use an alternative package manager. Why not try to support 3) Override the dependencies.

Yes, this is definitely a consideration. I'm considering a naming convention for private versions that will ensure that they don't conflict with public versions. That's what I was getting at with the v1.2.3+hotfix version above. If we disallow build strings aside from bare integers (for compatibility-only updates) in public repos, then this would be guaranteed not to clash with any public version names.
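That naming convention could be checked mechanically; here is a sketch of the classification (the regexes and category names are mine, not a spec):

```python
import re

def classify_build(version):
    """Classify a version by its build string under the floated convention:
    bare-integer builds (v1.2.3+1) are public compatibility revisions;
    any other build string (v1.2.3+hotfix) is reserved for private
    registries, so private names can never clash with public ones."""
    m = re.fullmatch(r"v?\d+\.\d+\.\d+(?:\+(.+))?", version)
    if not m:
        raise ValueError(f"unrecognized version: {version}")
    build = m.group(1)
    if build is None:
        return "release"
    if re.fullmatch(r"\d+", build):
        return "public compat revision"
    return "private version"
```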

I'm not saying "no dependency info allowed in the package repo"... I'm saying that the package repo should not be the definitive source... registries should take precedence. And since registries can take precedence, it's not required to keep deps info in the package repo.

I see what you're getting at, I think. That sometimes – e.g. in case like the above hotfix scenario – you want to make a registry update to dependencies and then let that flow back to the source repo. But the source repo is itself distributed, being a git repository. In that scenario, the company has their own copy of the package where the forked dependency info lives. Their private registry is just so that they can communicate that version internally. There needs to be support for alternate repository sources for situations like that so that you can find the relevant fork.

from juleps.

tkelman avatar tkelman commented on May 20, 2024

I believe the current rule is that metadata is used when the local copy of a registered package is at a release tag, and the package's REQUIRE is used for unregistered packages or when a package is checked out to a branch or has local modifications. The latter scenario is ruled out by the immutability of installed packages design here. I've made a proposal to get rid of the unregistered special case.
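That rule, as stated, boils down to something like the following (field names are hypothetical):

```python
def applicable_requirements(pkg):
    """Sketch of the stated Pkg2 rule: use the registry's (METADATA)
    requires for a registered package sitting cleanly at a release tag;
    otherwise fall back to the package's own REQUIRE file."""
    if pkg["registered"] and pkg["at_release_tag"] and not pkg["dirty"]:
        return pkg["metadata_requires"]
    return pkg["require_file"]
```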

The point is that the tagged commit should exist at some point

It would be useful some of the time, but I don't think it should be required - a compatibility update doesn't absolutely need to have its own independent identity, it is derived from an existing release.

Automated registry level enforcement could work, but that imposes a downstream cost on anyone who wants to maintain their own registry, they'd need to reproduce all that automation to ensure correctness.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

Automated registry level enforcement could work, but that imposes a downstream cost on anyone who wants to maintain their own registry, they'd need to reproduce all that automation to ensure correctness.

I think this needs to happen no matter what. Having you personally check every registration request does not scale and we already have a lot of things we need to check when a new version is registered. There will only be more things to check in the future.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

Having multiple sources for packages means that we can have official public forks and organizations can have internal private forks that are checked for commits all the time – no redirection needed. I don't think that comparing two source trees in git is particularly onerous for automation.

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

@simonbyrne: yes, this is probably a good idea. Registry-signed tags make sense too.

from juleps.

tbreloff avatar tbreloff commented on May 20, 2024

registry itself maintain a fork of all the repositories

The other benefit is that the community could decide to tag/release without requiring the package author. There have been many times that people would have stepped up and tagged something while the author is on vacation (or whatever).

from juleps.

simonbyrne avatar simonbyrne commented on May 20, 2024

So as I understand it, a typical release process might look something like:

  1. Package author requests new release via some registry API
  2. Registry performs checks. If it fails we notify the author somehow
  3. Registry pulls data into its fork, tags and signs the tag.
  4. Registry contents are updated.
  5. All dependent packages are also checked for compatibility with the new package: their Config.toml files are updated to reflect the outcomes of this check.

Is that what you had in mind?

(these points are intentionally a bit vague, in particular point 5, but that is probably best discussed in a different issue)
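The flow in those steps might be orchestrated roughly like this (every name below is hypothetical; the registry object stands in for whatever service would implement it):

```python
def register_release(registry, repo, version):
    """Sketch of the outlined release pipeline: pull the candidate into
    the registry's fork, run checks, then tag, sign, and publish.
    Returns True on success, False if the checks reject the release."""
    registry.pull_fork(repo)                       # mirror the candidate commit
    problems = registry.run_checks(repo, version)  # registry-side checks
    if problems:
        registry.notify_author(repo, problems)     # "notify the author somehow"
        return False
    registry.sign_and_tag(repo, version)           # tag and sign on the fork
    registry.publish_entry(repo, version)          # update registry contents
    return True
```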

from juleps.

StefanKarpinski avatar StefanKarpinski commented on May 20, 2024

Yes, roughly, although I might order it like this instead:

  1. Package author requests new release via some registry API
  2. Registry pulls git data into its fork
  3. Registry performs checks. If it fails we notify the author somehow
  4. Registry tags and signs the tag
  5. Registry contents are updated

One issue with tagging is that IIRC, tags are only transmitted via push/pull, not via pull request, so it's still unclear how to get the tag into the origin repo. For GitHub repos, we could use the tag create API but that doesn't address non-GitHub repos. For those, I suppose we could either have platform-specific APIs or ask the repository owners to pull tags from the registry fork.
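For GitHub repos, the mechanics would be simple because a lightweight tag is just a ref under refs/tags. A hypothetical dry-run sketch (OWNER, REPO, and the sha are placeholders; the request is printed rather than actually sent):

```shell
# Hypothetical sketch of creating a tag ref via the GitHub REST API.
# A lightweight tag is just a ref, so POST /repos/OWNER/REPO/git/refs suffices;
# an annotated/signed tag would need POST /git/tags first to create the tag object.
payload='{"ref": "refs/tags/v1.2.3", "sha": "0123456789abcdef0123456789abcdef01234567"}'
echo "curl -X POST -H 'Authorization: token \$GITHUB_TOKEN'" \
     "-d '$payload' https://api.github.com/repos/OWNER/REPO/git/refs"
```

Non-GitHub hosts would each need their own equivalent of this call, which is exactly the platform-specific-API problem mentioned above.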

I'm also not sure where the best point for checking compatibility is. It could be part of the checks step – if it's a patch release, it shouldn't break any packages that depend on it. We could verify that before accepting a version.

StefanKarpinski commented on May 20, 2024

Also, note that git tags are usually for commits not trees, so if we use tree tags (which is possible), it will be a bit unusual. We may want to tag a commit for convenience but associate the version with a tree rather than a commit.
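The distinction is visible directly in git: a lightweight tag is just a ref, so it can point at a tree object as easily as at a commit. A throwaway-repo sketch:

```shell
# Sketch: a tag can point at a tree object, not just a commit.
set -e
dir=$(mktemp -d) && cd "$dir" && git init -q
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "v0.1.0"
commit=$(git rev-parse HEAD)            # the commit hash
tree=$(git rev-parse 'HEAD^{tree}')     # the tree that commit points at
git tag v0.1.0-commit "$commit"         # the usual kind of tag
git tag v0.1.0-tree "$tree"             # unusual, but legal: tag the tree itself
git cat-file -t "$(git rev-parse v0.1.0-tree)"
```

The final command reports the tagged object's type, showing that the tree-pointing tag resolves to a tree rather than a commit.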

tkelman commented on May 20, 2024

If the checks fail, you'd need to back out pulling into the registry fork and redo it after the author addresses the issues.

This is getting to be a lot of machinery to expect small organizations to maintain their own instances of.

StefanKarpinski commented on May 20, 2024

Why would you need to back anything out? Git commits are immutable.

tkelman commented on May 20, 2024

Not everyone has enabled branch protection - people do occasionally force push to master of packages. They shouldn't be doing that, but if they do we wouldn't want it to mess up the registry's fork.

StefanKarpinski commented on May 20, 2024

Force pushing a branch doesn't destroy commits, it just changes the commit that a branch points at.
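This is easy to demonstrate locally: resetting a branch (the local analogue of what a force push does on the remote) leaves the abandoned commit in the object store, still reachable by hash:

```shell
# Sketch: moving a branch pointer does not delete the commit it used to point at.
set -e
dir=$(mktemp -d) && cd "$dir" && git init -q
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "first"
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)
git reset -q --hard HEAD~1   # branch now points at "first", like after a force push
git cat-file -e "$second"    # succeeds: "second" still exists in the object store
echo "commit $second is still present"
```

(git gc does eventually prune unreachable objects, which is part of why a registry would want its own copy of the commits behind registered versions.)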

tkelman commented on May 20, 2024

Depends exactly what "pulls git data into its fork" means then, and where the checks happen. If checks happen in a completely from-scratch clone wherever it's running and don't push anything back to the github copy of the fork unless the checks pass, then it's fine. Pulling into an existing clone's master after a force push is where things can go wrong.

simonbyrne commented on May 20, 2024

Pulling into an existing clone's master after a force push is where things can go wrong.

I think "pull" may be the wrong word here: for the metadata fork, I envision the process as something like the following:

git fetch upstream
git checkout HASH
# run tests
# if tests pass
git tag -s -m "..." vX.Y.Z
git push registry vX.Y.Z

(here upstream and registry are the respective remotes). In other words, no branches are involved. This doesn't solve the problem of getting the tags back to upstream, but I don't know if that is such a big deal as the user won't be pulling from it.

I'm not sure about the commit vs tree hash issue, but my experience has been that trees are often harder to work with as they're not really a "user facing" feature of git.

Also, I'm not really sure how we would handle non-git sources either.

simonbyrne commented on May 20, 2024

One other thing to think about: who "owns" the version numbers. In what I outlined above, it would be the registry, not the package (as emphasised by the fact that it is the registry signing the tag).

I'm not sure how this would work in the case of a package being in multiple registries (who decides whether or not it is a valid version?)

tkelman commented on May 20, 2024

I will make the wrong choice so that we can argue about it.

Was that really necessary? "This sort of response is not constructive" either.

It's fairly obvious that having two possible sources for a fact is more complicated and confusing than only having a single possible source for it.

We haven't actually solved this problem if everything is duplicated in both the registry and the package. One should take priority over the other. If we design this whole system to ensure they're equal in most normal usage, you still need to pick which to use in case of local divergence or development. Local development probably points to preferring the package's copy, but how local development is supposed to fit with the rest of Pkg3 has not yet been described here.

One of the copies of this information is a duplicate and somewhat redundant. It sounds like we're moving towards a very registry-driven design. In use cases other than local development, the package's copy (and upstreaming registry-driven compatibility changes back to it) is fairly vestigial. You want to be able to do dependency resolution without having to first download every version of every package. How would version resolution work on an unregistered package? Right now, unregistered packages have no versions - how would Pkg3 change that?

Archiving past versions is a good idea, but doing so by having every registry also maintain git forks of all its packages is making our "github as cdn" abuse worse.

StefanKarpinski commented on May 20, 2024

Yeah, tagging versions is complicated. We may need a "two phase commit" process.

StefanKarpinski commented on May 20, 2024

I will make the wrong choice so that we can argue about it.

Was that really necessary? "This sort of response is not constructive" either.

My point is that your attitude to this discussion has been fundamentally uncharitable and contentious. In this particular instance, there are two ways to do a thing, and instead of giving me the benefit of the doubt that I'm not a moron and will pick the one that works, you assume that I'll do the wrong thing and then argue with me based on that assumption. This attitude is frustrating, comes across as disrespectful, and mires us in unnecessary arguments instead of collaborative exploration of the solution space to find something that addresses everyone's concerns.

We haven't actually solved this problem if everything is duplicated in both the registry and the package. One should take priority over the other.

Replicating immutable data isn't a problem. That's the principle behind git and most other successful distributed data stores. Having multiple copies is only a problem if they are mutable.

It sounds like we're moving towards a very registry-driven design.

Quite the opposite. If anything, the package repository is primary and registries are just copies of immutable, append-only metadata about package versions, copied from the packages.

How would version resolution work on an unregistered package? Right now, unregistered packages have no versions - how would Pkg3 change that?

This is a good question. I was considering just using tags for versions in unregistered packages. But of course, you generally don't want to bother tagging versions if your package isn't registered, so I'm not sure what the point is. Instead, I think one would just use an environment file in the git repo to synchronize unregistered packages in lock-step (a la MetaPkg), but their dependencies on registered packages can be looser via compatibility constraints in the unregistered package repos.
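As a sketch of what such an environment file might look like (the file name, format, and field names here are all hypothetical, not a committed Pkg3 design): unregistered packages pinned in lock-step by exact commit, registered dependencies left to the resolver via looser constraints:

```toml
# Hypothetical environment file, e.g. Environment.toml at the repo root.
# Unregistered packages are synchronized in lock-step by exact commit...
[packages.MyUnregisteredDep]
url = "https://github.com/someuser/MyUnregisteredDep.jl"
commit = "0123456789abcdef0123456789abcdef01234567"

# ...while registered dependencies get ordinary compatibility constraints
# and are resolved against the registry.
[compat]
JSON = "0.5"
DataFrames = "0.8"
```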

Archiving past versions is a good idea, but doing so by having every registry also maintain git forks of all packages is making our "github as cdn" abuse worse.

How else would you do this? If you want to keep an archive of a package's git history you have to make a fork of it in case it goes away at some point. Using git for source delivery has problems, but that's an orthogonal issue.

StefanKarpinski commented on May 20, 2024

Maybe we should separate the two jobs of a registry:

  1. Validation: checking that a proposed version makes sense – that it satisfies various checks.
  2. Collection: keeping package and version metadata in a centralized location.

The former is the part that requires intelligence and automation while the latter is dead simple.

tkelman commented on May 20, 2024

There are many more than 2 ways to do something that is "intentionally a bit vague" and unclearly specified. I've been contentiously arguing against aspects of the design that I don't think will work. Several of which it looks like we've moved away from, but it took discussion. Take it at technical face value, please.

Dependency resolution can require global information, which is why registries contain compatibility information for all past versions. Getting the equivalent set of information if the package copy is the primary source would require either downloading all versions, or getting information out of git for many versions simultaneously in a way that we don't currently do anywhere to my knowledge. The latter would make the goal of allowing packages to not have to be git repositories less feasible.
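For reference, git can read a file at many tags without checking anything out, via `git show <tag>:<path>`; whether that scales to resolution across a whole registry is a separate question. A throwaway sketch:

```shell
# Sketch: reading a metadata file at every tagged version without any checkout.
set -e
dir=$(mktemp -d) && cd "$dir" && git init -q
printf 'version = "0.1.0"\n' > Config.toml
git add Config.toml
git -c user.email=a@b -c user.name=a commit -qm "release 0.1.0"
git tag v0.1.0
printf 'version = "0.2.0"\n' > Config.toml
git -c user.email=a@b -c user.name=a commit -qam "release 0.2.0"
git tag v0.2.0
for t in $(git tag); do
    echo "# $t:"
    git show "$t:Config.toml"   # reads the blob straight from the object store
done
```

This does require having the full git history locally, which is the crux of the objection: a registry's flat copy of per-version compatibility avoids that download entirely.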

If we're only archiving releases that get published to a registry, then why would the git history be needed? If packages are immutable after installation then they can just be source tarballs, and an archive can work like most conventional package managers, just a collection of source release snapshots.

StefanKarpinski commented on May 20, 2024

I was actually thinking of separating them entirely. I.e. first you submit a proposed version to various validation services: services that check things like that the proposed version metadata is well-formed, that its tests pass, that it works with various versions of its dependencies, that it doesn't break various versions of its dependents. Once you've got ok/error from a validation service or services, you can go to a registry and submit that and then the check at the registry is just that the sufficient set of validations have passed. I can even imagine private packages being submitted to cloud-hosted validations services and then registered privately. The set of validations that a version has passed can be attributes of the version; people can filter packages/versions based on validations that it has.

StefanKarpinski commented on May 20, 2024

If we're only archiving releases that get published to a registry, then why would the git history be needed? If packages are immutable after installation then they can just be source tarballs, and an archive can work like most conventional package managers, just a collection of source release snapshots.

If someone deletes their git repo, we want to be able to make another full git repo the new source of the package. We need a fork to do that. I'm not sure why you're arguing this point.

StefanKarpinski commented on May 20, 2024

I'm not sure what your point about global version information is.

tkelman commented on May 20, 2024

Don't we also want to make Pkg3 robust against the "package developer force pushed over master" scenario? So tags need not all be linear or have common descendants? We'd want it to be possible to restart development from a non-git copy of a deleted repo with a fresh git init from scratch, wouldn't we? (Or the "rebased to remove large old history" situation that has come up a few times.)

The scheme of propagating tags through forks sounds overly complex and unnecessary, and a lot to set up to run a registry. And now we have multiple mutable remotes for any given package - this could get confusing in terms of issue and PR management, if all the downloads are coming from a fork that users should actually ignore.

The point about global version information is that the head copy of a package's compatibility contains less information than the registry's copy. Except for the author at tag time, everyone else could delete the package's copy and not notice. "Package is primary" is the remaining item of dispute here, afaict.

StefanKarpinski commented on May 20, 2024

I agree that propagating tags through forks is complicated and maybe impractical. We'll have to see. The main thing we need is copies of the git history for the commits behind various tagged versions, but that could be a separate process from registration.

tkelman commented on May 20, 2024

If we have a reliable registry-controlled mechanism of obtaining a copy of the release snapshot source with a matching checksum, does it actually need a copy of the git history? Thanks to github it's oddly easier to get straightforward hosting of a full git repo (up to its size limits, anyway) than it is to host arbitrary non-git source snapshots, but I wonder whether we're letting that ease of use drive the design decisions.

martinholters commented on May 20, 2024

Wouldn't future support of non-git-based packages be problematic if releasing a version included cloning its git history? Ok, of course one could replace that with "cloning its version history in whatever VCS is being used", but that would make registries much more complicated, as they would have to accommodate every VCS used by packages they want to register.
