
Comments (9)

vsoch commented on June 12, 2024

As a maintainer of several projects, the rebuild is exactly that - to make sure it rebuilds and if it does not, to fix it. For the most part if you choose a registry that doesn't have a purge policy, your old containers will persist. And 100% - our resources are moving targets and part of our skill set is to move and be flexible to the changing landscape as needed, especially since most resources are provided to us for free.
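Registry churn like this is one reason to pin base images by content digest rather than by mutable tag: a tag can be repointed or purged, while a digest names exact bytes for as long as the registry keeps them. A minimal sketch (the digest below is a placeholder, not a real image):

```shell
# Hypothetical sketch: write a Dockerfile that pins its base image by
# digest, so a repointed or deleted tag cannot silently change what
# "FROM" resolves to. The all-zero digest is a placeholder.
cat > Dockerfile <<'EOF'
FROM ubuntu@sha256:0000000000000000000000000000000000000000000000000000000000000000
RUN echo "built from a digest-pinned base"
EOF

# Show the pinned base line
grep '^FROM' Dockerfile
```

A real digest comes from `docker inspect` or the registry API; the pin trades automatic updates for build stability.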

from ten-simple-rules-dockerfiles.

sdettmer commented on June 12, 2024

@vsoch You are doing very well, but I think this is not a property of reproducibility; it is about maintainability. You can never rely on external registries, and especially not on ones provided for free! Here, obviously, someone else pays, and there is probably no sufficient contract. Maybe the contract says "can be terminated at any time" and thus shows directly how reproducibility can break.
In the last few years there were quite a few such breaks, at least in web development; for example, packages were removed from the node.js registry, so it is everyday practice that registries are not reliable. In some years maybe Docker Inc. goes bankrupt and no one pays the bills for Docker Hub, because people have moved on to OCI-ng or whatever we will have by then.
Your habit helps a lot here: you notice such issues early and keep your systems maintainable, you benefit from new features, and you will follow reasonable migrations (who knows what advantages OCI-ng will have). No doubt this is great and correct, but it does not help with reproducing previous work. So it is a very good rule, just not a rule for reproducibility, I think.


vsoch commented on June 12, 2024

Sure, I can agree with that. I think I do prioritize maintainability. I don't believe reproducibility is actually achievable; rather, it's an academic concept that has exploded in trendiness and spawned discussions like this one. People can argue about details, but (in practice, or in my opinion) it doesn't accomplish much (at least for what I need to do, which is reliably develop and provide software).

And I definitely never touch node.js - there is something to say about most packaging ecosystems being unsafe or unstable, but that one in particular is really terrible.

And we've already hit the "Docker Hub is unreliable" point - they have flirted with questionable policies. I've moved all my builds to Quay, and then also to GitHub packages, and I'm ready to move again if needed. Also, Docker is an OCI registry, so it's still in the bucket of "OCI-ing."


sdettmer commented on June 12, 2024

(EDITED)

@vsoch Yes, I see. Let me repeat that I know you chose the best approaches for the given tasks, and of course I see the high priority of maintainability. I know that much science is not reproducible, especially in physiological science, but that is not my topic here.

In practice, there is a great reproducibility project, Reproducible Builds (initiated by people close to Debian Linux), and they have done a great deal, for everybody. Not every package yet, but much more than a bootable Linux system can be built reproducibly (and they had to fix a lot of software to get there).

Another great example is that it was possible to build open-source compilers step by step from old sources (reaching back to the 1980s, I think). In theory, a compiler could contain an automatic backdoor generator and silently add a backdoor to your software. Ah, but this could be caught by inspecting the source code of the compiler, right? But wait: if the parent compiler (the one you used to compile your compiler) had a backdoor that inserts the backdoor generator into your compiler, then your application could still have a backdoor! This is the famous "trusting trust" problem, and it is quite hard to solve. It made it hard to convince some people that open source has any meaning at all ("you can never trust the compiler, so nothing matters"; such people are found less in science and more in agencies, I guess).
But if you can also build that compiler from source, and its parent compiler, and its grandparent compiler, and so on, it becomes very unlikely that e.g. the NSA added backdoors to all of those ancestor compilers. Back then (in the 1980s or so) it would have been impossible to hide backdoors in compilers that propagate hidden backdoors to the compilers they create, down a chain of many generations; just look at how small compilers were in the 1980s, they cannot contain multiple levels of backdoors. Also, different paths through different compiler vendors were used (you can now get the source code of formerly closed-source compilers from the 1980s and 1990s). It was shown that it does not matter which compilers you use in which order: the same compiler binaries always fall out. And these produce binary software with the same hashes as, e.g., the gcc from the Azure Debian image.

Because of this, everybody in the world can now trust the evaluated gcc compiler version, and every source-code-inspected version that can be built directly or indirectly from it. This may not sound important at first, but for security it is a very important piece to be able to trust the compiler (and it may encourage agencies to invest in open source). This work involved a lot of reproducibility: reproducing old binaries to prove that they are OK.

Another practical example is reproducing the Signal app APK. When you install Signal from the Play Store, how do you know that the code published as open source was really used to produce the APK file? Maybe the NSA replaced it and you get a hidden backdoor! But you can reproduce the APK (in theory; when I tried it, it did not work, but others succeeded). If the APK you built is the same and the source code is OK (i.e., it does not change behavior based on file timestamps or signing keys), then you can trust the version you installed from the store. (This still does not solve the whole problem, because Android itself could have backdoors, but the same approach can be applied there, and I think has been.)
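The verification step in that workflow boils down to comparing hashes of the locally built artifact and the distributed one. A minimal sketch with stand-in files (a real check would point at the two APKs, and Signal's actual verification tooling differs):

```shell
# Hypothetical sketch: declare a build reproducible if the local and
# distributed artifacts hash identically. File names are placeholders.
verify_repro() {
  h1=$(sha256sum "$1" | cut -d' ' -f1)
  h2=$(sha256sum "$2" | cut -d' ' -f1)
  if [ "$h1" = "$h2" ]; then
    echo "reproducible: hashes match"
  else
    echo "MISMATCH: cannot verify the distributed binary"
  fi
}

# Demo with stand-in files (a real check would use the two APKs):
printf 'same-bytes' > built.apk
printf 'same-bytes' > official.apk
verify_repro built.apk official.apk   # prints "reproducible: hashes match"
```

A single differing byte flips the result, which is why signing blocks and timestamps usually have to be normalized first.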

This is done in practice for cryptographically signed software, too. Because of the signatures, you cannot build the software yourself, load it, and expect it to run, even if you have source escrow and everything. The network operator cannot trust you not to do bad things such as modifying the code, so only software signed by the operator can be loaded (and run). So you cannot use the binaries you built from the certified source code; but then how can you trust the operator's binaries?

If you can reproduce the same binaries, then the same source code was used for the binaries you loaded (you might need a tool to normalize the signature parts of the binaries, and often some timestamps, for example by setting them all to zero or cutting off a header). Then you know that the binaries match the source code you were given for verification (usually, of course, by contracting a specialized company for the verification).
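The normalization mentioned here can be as simple as cutting off the signature block before comparing. A toy sketch with stand-in files (the 8-byte "signature" and its offset are invented for illustration; real formats need format-aware tooling):

```shell
# Stand-in binaries: different (fake) 8-byte signatures, same payload.
{ printf 'SIG-AAAA'; printf 'certified-payload'; } > operator.bin
{ printf 'SIG-BBBB'; printf 'certified-payload'; } > mybuild.bin

# Drop the assumed 8-byte signature header, keep everything after it.
normalize() { tail -c +9 "$1"; }

normalize operator.bin > operator.norm
normalize mybuild.bin  > mybuild.norm
cmp -s operator.norm mybuild.norm && echo "payloads identical"
```

Comparing the raw files would fail on the signature bytes alone; comparing the normalized payloads shows whether the same source code was used.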


vsoch commented on June 12, 2024

Building from source isn't a guarantee of anything, I don't think. If you are interested in building from source, check out spack - it does that, although I don't use it as my package manager. I don't think there is any assumption that you can trust a compiler or any piece of software - you have to trust the supply chain, or the humans who curated it before providing it to users. Check out chainguard.dev, a company doing cool work around securing supply chains.

And I can't comment - I don't install most apps from any app store for the exact reason of security.

You make good points but I don't think a repository for a paper written years ago for introduction to writing Dockerfiles is the place for it. :)


sdettmer commented on June 12, 2024

@vsoch Yes, this is not a proposal to add something, but to move the rule to another place :)

My motivation is that good information about reproducibility is rare; for example, Docker claims to be a tool for exactly that, reproducibility, yet it is fully incapable of building anything reproducibly. Isn't that amazing?

So these few who care could share the few bits they have :)


vsoch commented on June 12, 2024

As I stated earlier, I think true reproducibility is nothing more than an academic topic of discussion, and it's unfair to say that containers have not done amazing things for the simple act of doing something once, and doing it again. Getting into this level of detail feels like nitpicking and loses sight of that credit. And I don't think anyone is saying containers are the ultimate solution for reproducibility, but rather a tool (like any other tool) and abstraction that can greatly help when other means are limited. The level of reproducibility you are describing, if it's possible, is not in scope for the user base this paper is geared toward, in my opinion. We don't need things to be perfect; we need to make our best efforts and do the best we can. To say containers are useless because they fall short of some impossible standard is wrong.

And the paper is published so I don't think we can "move" anything but you indeed could create another resource or means to write about the level of reproducibility that you are striving for. I'd request a section alongside each point about the feasibility of each idea for the average person without access to privileged resources.


sdettmer commented on June 12, 2024

@vsoch Yes, Docker did amazing things, but supporting SOURCE_DATE_EPOCH from day 1 would have been cheap and extremely powerful, I think. Containers are not useless; there are tools that build Docker-compatible containers in a reproducible way (in contrast to docker build). That docker cannot build reproducibly is clearly a bug, I think, that's all.
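For context, SOURCE_DATE_EPOCH is the reproducible-builds convention for pinning embedded timestamps: the build feeds it into each tool's timestamp option, e.g. GNU tar's `--mtime`. A sketch showing that two runs then produce bit-identical archives (requires GNU tar for `--sort` and `--mtime`):

```shell
# Pin all stored timestamps to a fixed epoch so the archive does not
# depend on when the build ran or when the files were created.
export SOURCE_DATE_EPOCH=0

mkdir -p demo
printf 'hello' > demo/file.txt

build() {
  tar --sort=name \
      --mtime="@${SOURCE_DATE_EPOCH}" \
      --owner=0 --group=0 --numeric-owner \
      -cf "$1" demo
}

build a.tar
build b.tar
cmp -s a.tar b.tar && echo "bit-identical builds"
```

`--sort=name` fixes the file order and the owner flags strip host-specific metadata; without those, archives of the same tree can still differ between machines.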
I don't want to do anything about the paper; I didn't even read the license of this work. I assumed everybody could fork the repo and create a newer version: regularly use and rebuild the paper, so to say.

Also, it's fine not to use all my input, and possibly to let others do so if they want.

Actually, I assumed this before, which is why I wrote a focused mail. It's not very efficient to keep my kids waiting while I explain experiences that are not wanted. Fine for me, in any case.

Please let me repeat that I don't want to criticize the paper, and especially not your work or that of any of the authors; these are very good practical rules. Maybe just the title suggests a bit too much, and it's all fine for me. I just tried to help :-)

Have a nice evening!


vsoch commented on June 12, 2024

Gotcha, thanks @sdettmer ! Indeed it will be interesting to have further discussion with more of my co-authors and other community members - I am likely the wrong one to respond because I am less of a researcher concerned with workflow reproducibility and more of a software engineer that assumes the ecosystem is fragile and I do my best to add duct tape :)

