
Comments (13)

keith-packard commented on June 16, 2024

As someone who maintains gcc-based toolchains for other projects (debian), and has been hacking autotools-based projects for well over 20 years, you're experiencing how people commonly used autotools 'back in the day'. You'd ship a generated configure script because that's what was expected. And that often meant that the generated script was checked into the VCS so that a bare check-out would exactly match the distributed tarballs. GCC is about as legacy a project as you will ever see, and they've stuck to this practice for a very long time.

Most other autotools-based projects changed to delivering an 'autogen.sh' script and expected users to run that to get the required configure script. Heck, there's even 'autoreconf' these days for this job.
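For most such projects, that regeneration step looks roughly like this (a sketch for a generic autotools tree, not GCC's pinned setup):

```shell
# Typical modern autotools flow: regenerate the build scripts from the
# source files (configure.ac, Makefile.am, *.m4), then configure and build.
autoreconf --install --force   # or ./autogen.sh, if the project ships one
./configure
make
```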

However, GCC has very strict requirements about which autotools version you can use to generate the scripts; older or newer versions often simply fail because autotools doesn't guarantee backwards compatibility. Because of this, GCC is usually many versions behind the default autotools versions provided on most systems. For someone simply building the compiler, it's far more reliable to use the provided scripts than attempt to generate them locally.

Yes, this places a huge burden on anyone hacking on the compiler; as @stephanosio says, you end up installing the precise autotools versions required for GCC so that the generated scripts match what's in the VCS. But, once you've got it set up, things are fairly straightforward, if a bit icky -- you hack the source code, re-build the generated scripts and commit both together. With luck, the diffs to the generated scripts are easy to manually verify. And, yes, there is a strong temptation for those doing a drive-by change to simply manually edit both the source scripts and the generated scripts. Which means that when you review patches to the autotools scripts, the best practice is to apply the source script patch and then verify that the generated script patch matches.

If you've ever looked at the autotools scripts that gcc uses, you'll probably understand why there hasn't been any serious attempt to replace them with cmake or meson. For every horribly ugly little kludge, there's someone who depends upon the existing behavior to get their work done.

from sdk-ng.

stephanosio commented on June 16, 2024

But why not simply write down a sequence of exact steps (i.e. commands) necessary to build one toolchain?

Sure, that could be a good starting point; though, keeping it up to date and making it actually work locally would be easier said than done. It should be quite doable targeting a very specific environment though, as you have mentioned.

Ideally, snippets could even be factored-out to external scripts that can be used by both CI and by users.

Actually, this used to be the case (there used to be a script that was used by CI and could also be used locally to invoke crosstool-ng and Yocto build processes).

That script was removed with the addition of macOS and Windows host support because the CI infrastructure and the build process were too closely coupled for this to be practical (and, at the time of writing the CI workflow for all three major host operating systems, I did not have a very good idea of what it would look like at the end).

Now that ci.yml is fairly well established and stable, we could consider refactoring its build steps out to external script(s) that can also be used locally. I have nothing against such an approach.

At this time, I do not have any spare bandwidth to take on such an endeavour; but, if someone is willing to put their effort looking into it, I would be more than glad to review and provide feedback.


cfriedt commented on June 16, 2024

This would be a good doc to link to
https://crosstool-ng.github.io/docs/build/

Might be good to include the part about ct-ng build RESTART=<step> STOP=<other_step> or ct-ng libc_headers etc.
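A minimal sketch of those invocations (the step names used for `RESTART`/`STOP` below are illustrative; `ct-ng list-steps` shows the ones valid for your config):

```shell
ct-ng list-steps                       # show the build steps for this config
ct-ng build RESTART=libc_headers STOP=libc   # re-run only part of the build
ct-ng libc_headers                     # or jump straight to one saved step
```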


cfriedt commented on June 16, 2024

Probably would be good to mention CT_DEBUG_CT_SAVE_STEPS=y is necessary to restart builds at a saved spot when they fail.
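i.e., something like this in the crosstool-ng `.config` (also reachable through `ct-ng menuconfig` under the debug options; the parent `CT_DEBUG_CT` symbol is assumed from crosstool-ng's option layout):

```shell
# .config fragment: save state after each step so a failed build can be
# restarted from the last good step instead of from scratch.
CT_DEBUG_CT=y
CT_DEBUG_CT_SAVE_STEPS=y
```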


stephanosio commented on June 16, 2024

maintainability: do not check in / edit generated code

We never edit generated code (configure scripts). Everything in the configure script is generated using autoconf from the relevant source files.

Currently, the version of gcc that is contained in the Zephyr SDK (https://github.com/zephyrproject-rtos/gcc) contains some generated code that is checked-in (e.g. ./configure scripts).

This is nothing particular to the Zephyr SDK. Upstream GCC does this, and so do all the other projects that use the GNU build system.

This requires an additional manual step of regenerating the ./configure script from configure.ac (and many other support files) via autoreconf that may or may not be easily reproducible (e.g. the default autoreconf in Ubuntu might not work, and it might be necessary to get the latest from GNU).

What I (and many others working with the GNU build system) do is to have the various common autoconf versions installed under their own prefix (e.g. /opt/autoconf-x.y) and add their bin directory to the PATH in the RC file for each build environment. This should be simple enough to do.
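A sketch of that setup (the version number is an example; match whatever the checked-in scripts were generated with):

```shell
# Build a pinned autoconf into its own prefix.
VER=2.69
wget "https://ftp.gnu.org/gnu/autoconf/autoconf-${VER}.tar.gz"
tar xf "autoconf-${VER}.tar.gz"
cd "autoconf-${VER}"
./configure --prefix="/opt/autoconf-${VER}"
make && sudo make install

# Then, in the RC file for that build environment, prepend it to PATH:
export PATH="/opt/autoconf-${VER}/bin:$PATH"
```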

The main issue is sustainability; rather than the build process being predictable and linear, it becomes unpredictable, non-linear, and not really sustainable.

What part of it becomes unpredictable? I understand that it can be hard to follow at first for the people who are not familiar with the GNU tooling; but, it is a fairly standard and very predictable process.

Without having specialized tools or domain specific knowledge, or a particular build machine or version, it makes it difficult for developers to make successful PRs to the SDK.

There is nothing particular to Zephyr SDK about how this works. This is just how the GNU build system works. It is not pretty, it is extremely outdated and far from ideal; but, I am afraid nobody has time to overhaul the entire GCC codebase to a different build system ...

It's generally bad to check generated code into version control

It certainly is not good to check in generated code into VCS in general; but, that is the standard process for the upstream GCC, and we are not going to deviate from that.

generally worse to require either manually patching the generated code or some specialized knowledge about how to do it.

Once again, we do not apply any manual patching to the generated code (configure scripts).

So I would like to just request that we do not check in generated code (in the form of ./configure scripts and so on), and instead insert (or populate) a dedicated step in the build process to simply regenerate those scripts.

Sorry, but we are not going to deviate from the upstream GCC process for this.


cfriedt commented on June 16, 2024

maintainability: do not check in / edit generated code

We never edit generated code (configure scripts). Everything in configure script is generated using autoconf from the relevant source files.

Patching via git is ~equivalent to manually editing generated code.

In particular, given that the output of autoconf can and does vary from one machine to another, not only based on the release of autoconf but also based on the presence of other tools that it uses, it's a bit of a slippery slope.

Currently, the version of gcc that is contained in the Zephyr SDK (https://github.com/zephyrproject-rtos/gcc) contains some generated code that is checked-in (e.g. ./configure scripts).

This is nothing particular to the Zephyr SDK. Upstream GCC does this, and so do all the other projects that use the GNU build system.

That's a fallacy.

Most projects that use the GNU build system only generate the configure script when release tarballs are generated.

https://stackoverflow.com/a/3291181

My guess as to why GNU started doing this for GCC / Binutils was that enough previous tarball users complained that it wasn't there after they had switched to RCS.

Generally, it's bad, but it has clearly snowballed well out of control.

What I (and many others working with the GNU build system) do is to have the various common autoconf versions installed under their own prefix (e.g. /opt/autoconf-x.y) and add their bin directory to the PATH in the RC file for each build environment. This should be simple enough to do.

^^ This should be documented somewhere. Actually, so should the entire process.

The main issue is sustainability; rather than the build process being predictable and linear, it becomes unpredictable, non-linear, and not really sustainable.

What part of it becomes unpredictable? I understand that it can be hard to follow at first for the people who are not familiar with the GNU tooling; but, it is a fairly standard and very predictable process.

The last time I had to fix the build, it was because I had no way of predicting what was in the (private) AWS caches that are used by Zephyr's SDK builder. This was after painstakingly trying to reproduce what was done in CI for some time. Like weeks of effort due to what should have been easy to reproduce following some simple steps.

If it's as predictable as you suggest, then please document the steps to manually reproduce builds.

There is nothing particular to Zephyr SDK about how this works. This is just how the GNU build system works. It is not pretty, it is extremely outdated and far from ideal; but, I am afraid nobody has time to overhaul the entire GCC codebase to a different build system ...

There is domain specific knowledge (see your paragraph above).

There is no need to overhaul anything. Autotools may not be pretty but they do work.

However, currently, the documented process to build it is to make a PR to the Zephyr project.

That effectively creates a black box (due to insufficient diagnostics / privileged access / private AWS caches).

Once again, we do not apply any manual patching to the generated code (configure scripts).

Submitting patches to the configure script via git is equivalent to manually editing the generated code. It's bad practice in any case, whether upstream is doing it or not.

It (at least) doubles the amount of work that needs to be done for changes to the SDK.

Likely far more than 2x though, e.g. it took me maybe a couple of hours to edit the necessary .ac / .m4 files, and now it's going on several days of debugging the build (in CI as a black box).

The last time I had to fix something that was broken in the SDK, it took me weeks. Eventually, I realized it was due to a deprecated release of zlib or something like that. The tarballs still existed in Zephyr's AWS cache though, so the build actually succeeded in unpredictable ways.

I've been building autotools packages for close to 20 years. If it isn't obvious to me how to build the SDK, then how do you expect it to be obvious to a newcomer?

Please document the manual build process, even if that is only for a single host / target.


stephanosio commented on June 16, 2024

Patching via git is ~equivalent to manually editing generated code.

It is not. The script is checked in as is without any modifications/patches into git.

In particular, given that the output of autoconf can and does vary from one machine to another, not only based on the release of autoconf but also based on the presence of other tools that it uses, it's a bit of a slippery slope.

That is not true. The configure script states which version of autoconf was used to generate it -- as long as you use the same version, the output should be exactly the same.
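You can check which version that is directly from the script itself:

```shell
# The generated script records the autoconf version that produced it,
# near the top of the file. Regenerate with exactly that version and
# a diff against the checked-in script should come out empty.
grep -m1 'Generated by GNU Autoconf' configure
```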

Most projects that use the GNU build system only generate the .configure script when release tarballs are generated.

That is arguable. Many projects still include the pre-generated configure script in tree for convenience as well as for "predictability" (because you do not want a bunch of bugs saying "build fails because every developer is using a different version of autoconf").

Submitting patches to the configure script via git is equivalent to manually editing the generated code. It's bad practice in any case, whether upstream is doing it or not.

I am having a very hard time understanding how that works; but, either way, this is not a decision made by me or anyone else working on the Zephyr SDK -- it is the decision made by upstream GCC and, as a downstream project using GCC, sdk-ng is not going to deviate from that.

If you have a problem with this, please email the GCC mailing list.

The last time I had to fix the build, it was because I had no way of predicting what was in the (private) AWS caches that are used by Zephyr's SDK builder. This was after painstakingly trying to reproduce what was done in CI for some time. Like weeks of effort due to what should have been easy to reproduce following some simple steps.

I do not understand why the AWS cache matters here. The source tarball cache is literally a directory with the tarballs downloaded by crosstool-ng (that is uploaded after a crosstool-ng run).

If it does not exist locally, crosstool-ng will download everything from the scratch (i.e. it will be 100% locally reproducible as long as you have a working internet connection and none of the mirrors are broken -- see below).

Likely far more than 2x though, e.g. it took me maybe a couple of hours to edit the necessary .ac / .m4 files , and now it's going on several days of debugging the build (in CI as a black box).

I think you are mixing up CI and crosstool-ng. The CI itself is pretty much just a wrapper around crosstool-ng (and Yocto for building host tools).

All the toolchain builds are done through crosstool-ng with the configs located inside the sdk-ng tree. Anyone familiar with crosstool-ng should be able to build the sdk-ng toolchains using the crosstool-ng toolchain config files (configs/*.config) in tree without much effort, as long as you have installed "the right set of packages," which is the the hard part because everyone has different working environment.
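In other words, something along these lines (the config file name below is illustrative; use an actual one from `configs/`):

```shell
# Build one sdk-ng toolchain locally with crosstool-ng.
mkdir build-arm && cd build-arm
cp ../configs/arm-zephyr-eabi.config .config  # pick a target config from the tree
ct-ng oldconfig                               # refresh against this ct-ng version
ct-ng build                                   # full toolchain build
```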

The last time I had to fix something that was broken in the SDK, it took me weeks. Eventually, I realized it was due to a deprecated release of zlib or something like that. The tarballs still existed in Zephyr's AWS cache though, so the build actually succeeded in unpredictable ways.

If you are talking about the local crosstool-ng run failing to download the source tarballs from broken mirrors, that happens. In fact, that was one of the reasons why the cache was introduced in the first place, aside from the download speed. I am afraid no amount of documentation is going to fix a broken third party mirror ...

Please document the manual build process, even if that is only for a single host / target.

I think the missing link here is crosstool-ng. You may be familiar with how autotools work; but, you do not seem to be very familiar with crosstool-ng, which sdk-ng uses to build toolchains. If you were, you would probably have looked at the crosstool-ng output logs and manually invoked the gcc configure script with the exact command line that was used by crosstool-ng (yes, it is there in the logs); in which case, you do not have to go through the whole ordeal of waiting for CI (or a local crosstool-ng run, for that matter) to re-build everything from scratch -- instead, you can just check out https://github.com/zephyrproject-rtos/gcc/ and directly build and debug GCC locally.
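A hedged sketch of that shortcut (the log file location depends on your crosstool-ng working directory):

```shell
# 1. Find the exact gcc configure invocation crosstool-ng used:
grep -n 'gcc/configure' build.log | head

# 2. Replay it against a local checkout to iterate without a full rebuild:
git clone https://github.com/zephyrproject-rtos/gcc/
mkdir gcc-build && cd gcc-build
# ...paste the configure command line from the log here, then:
make -j"$(nproc)"
```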

I can try to document hints like these in the FAQ for those who are not familiar with crosstool-ng. I suppose this should lessen the amount of frustration for newcomers who do not have much experience working with embedded toolchains -- though, crosstool-ng is a fairly standard tool for generating embedded cross compiler toolchains; so, many people contributing to sdk-ng tend to already have working knowledge of it, which I suppose is why we have not had many problems in the past with third-party PRs to sdk-ng from many people ...

As for documenting the whole process, I am afraid "take a look at what ci.yml does" is going to be the best answer unless someone is willing to dedicate their time to translating the YAML in ci.yml into English ...

As for things seemingly randomly breaking, I am afraid no amount of documentation is going to ease the pain with that. Even I, as a maintainer of sdk-ng, sometimes spend days troubleshooting weird CI, crosstool-ng, gcc build system, binutils build system, third party mirrors, or whatever-other-crap-in-the-whole-chain breakages.


cfriedt commented on June 16, 2024

I'm not sure if one or two lines in a FAQ is sufficient.

It would be nice to know exact steps to build a toolchain.

What is maybe obvious to you likely is not obvious to others.


cfriedt commented on June 16, 2024

For reference, the following patches were required when building the 0.15.2 SDK manually. The CI build only worked because of deprecated packages (some with security vulnerabilities) being in the AWS cache.

Not that I'm saying the documentation should include transient patches, but it would be nice if someone didn't need to extrapolate everything out to a bash script to make SDK builds easily reproducible.

https://github.com/cfriedt/zephyr-sdk-builder

0000-crosstool-ng-update-to-zlib-1.2.13.patch
0000-poky-fix-io-fwide-issue-in-cross-localedef-native-2.27.patch
0001-crosstool-ng-update-to-expat-2.5.0.patch


stephanosio commented on June 16, 2024

I'm not sure if one or two lines in a FAQ is sufficient.

The FAQ could be more comprehensive. Here we already have a few candidates from the above.

It would be nice to know exact steps to build a toolchain.

What is maybe obvious to you likely is not obvious to others.

The problem is that it is not obvious to me what is not obvious to others, and it is very difficult to decide where the documentation should begin and end (e.g. should the documentation cover crosstool-ng 101, working with GCC, or even fixing the problem with a mirror that replaced an existing source tarball with the same exact filename/version number?).

The only fundamental solution to that is to provide very detailed documentation on the whole process; which, as I said above, will require a significant amount of effort from a willing party -- I just do not have the bandwidth to write such detailed documentation (or a book).

At least, the (somewhat implicit) expectation for sdk-ng contributors up until now has been that they have some experience working with embedded toolchains (and hence likely with crosstool-ng) in one way or another; and, if they had any questions specific to sdk-ng, I have answered them in the issues/PRs or privately in chat.


cfriedt commented on June 16, 2024

As someone who maintains gcc-based toolchains for other projects (debian), and has been hacking autotools-based projects for well over 20 years, you're experiencing how people commonly used autotools 'back in the day'.

@keith-packard - as someone who has maintained gcc-based toolchains for other projects for the last 20 years (Gentoo based, Yocto based), I'm fairly confident in labeling my experiences.

Again, the point of this issue isn't trying to categorize the user. It's simply asking for better documentation and / or to improve the build process.

You'd ship a generated configure script because that's what was expected. And that often meant that the generated script was checked into the VCS so that a bare check-out would exactly match the distributed tarballs.

GCC is about as legacy a project as you will ever see, and they've stuck to this practice for a very long time.

Most other autotools-based projects changed to delivering an 'autogen.sh' script and expected users to run that to get the required configure script. Heck, there's even 'autoreconf' these days for this job.

The source-based distros that I use typically regenerate generated code as part of the build process (almost always). As a result, it is significantly easier to maintain the toolchain as the process is (again) linear - it does not really hide any skeletons, etc.

So whether or not a particular project checks in configure to revision control is mostly irrelevant to the people building it on a regular basis.

However, GCC has very strict requirements about which autotools version you can use to generate the scripts; older or newer versions

Yes, I've been told by both Stephanos and by our version of GCC that very specific autoconf versions need to be used.

There are 2 problems there:

  1. The suggested versions differ, and
  2. Neither version seems to work

If only there were a sequence of documented instructions .. 🤔

often simply fail because autotools doesn't guarantee backwards compatibility. Because of this, GCC is usually many versions behind the default autotools versions provided on most systems.

For someone simply building the compiler, it's far more reliable to use the provided scripts than attempt to generate them locally.

Yes, which is why GCC ships generated code / checks it into version control.

Most autotools projects only do this when creating a release tarball.

Yes, this places a huge burden on anyone hacking on the compiler;

Exactly - so why not lessen that burden?

  1. with some proper documentation, and
  2. by regenerating generated sources as part of the build process (using the approach source-based distros have used for decades)

The latter suggestion was where this issue started. While it would make everyone's lives significantly easier, that was deemed too much work by @stephanosio, so now we are left with door number 1.

-- you hack the source code, re-build the generated scripts and commit both together. With luck, the diffs to the generated scripts are easy to manually verify. And, yes, there is a strong temptation for those doing a drive-by change to simply manually edit both the source scripts and the generated scripts.

Again, there is this misconception that I haven't also been working with gcc fairly intimately for the last 20 years..

The only reason I've done the latter is because the suggested ways have not worked.

If you've ever looked at the autotools scripts that gcc uses, you'll probably understand why there hasn't been any serious attempt to replace them with cmake or meson.

I'm perfectly comfortable with autotools and the autotools scripts in gcc and (again) have been working with autotools projects and gcc for 20 years. I am far more familiar with autotools than CMake or meson.

For every horribly ugly little kludge, there's someone who depends upon the existing behavior to get their work done.

Sure...

I guess my argument here is that life can be made significantly easier with proper documentation.

Personally, when I contribute to a project, if the instructions are:

  1. Make a PR
  2. See if it works

I'm going to be skeptical about it.

Since it became significantly more complicated than that, and since I needed to manually set up a build environment to match what was in CI so that I could manually diagnose what the problem was, I thought it would be wise to ask for some documentation about how to manually set up a build environment to match what was in CI.

It was essentially the same gripe I had when I needed to build the SDK manually last time.

The correct resolution of this issue isn't about "maybe you've never contributed to an autotools project / gcc before", or "what reasons are there to not write proper documentation?"

It's more along the lines of, "yes, there is a conventional build flow, and here is a page that describes that".

With that, there is at least some starting point at an intuitive location for people, and a place to put knowledge that is otherwise maybe only in @stephanosio's head at the moment.


cfriedt commented on June 16, 2024

The problem is that it is not obvious to me what is not obvious to others, and it is very

Fair enough.

I would suggest starting from first principles with some assumptions. Try to solve a much smaller version of the bigger problem.

E.g. user has an Ubuntu Linux environment, e.g. build/host is x86_64, target is e.g. arm. Must install these .deb's, must manually build this version of that tool...
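For illustration only, such a doc might open with something like this (the package names below are guesses at typical crosstool-ng prerequisites, not taken from ci.yml):

```shell
# Hypothetical first cut of such a doc: x86_64 Ubuntu build/host, arm target.
sudo apt-get install build-essential autoconf automake libtool gawk \
    texinfo help2man flex bison unzip libncurses-dev
# Then: install crosstool-ng, copy the target config from the tree,
# and run `ct-ng build`.
```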

Even documenting an existing container image to run that has some of these things built already?

documentation cover crosstool-ng 101,

Crosstool-ng has decent documentation already, so a link could be sufficient.

working with GCC, or even fixing the problem

There are already links to GCC and they have docs already.

The only fundamental solution to that is to provide very detailed documentation on the whole process; which, as I said above, will require a significant amount of effort from a willing party -- I just do not have the bandwidth to write such detailed documentation (or a book).

Well, that's one option. Maybe a detailed doc like that would be good overall, conceptually, but it's probably more work than necessary.

But why not simply write down a sequence of exact steps (i.e. commands) necessary to build one toolchain?

Ideally, snippets could even be factored-out to external scripts that can be used by both CI and by users.

People can extrapolate from there. If someone wants to build the macos tools, some optional steps could be added later.

At least, the (somewhat implicit) expectation for sdk-ng contributors up until now has been that they have some experience working with embedded toolchains (and hence likely with crosstool-ng) in one way or another;

Please, feel free to continue making that assumption or not.

It should be mostly irrelevant though.


stephanosio commented on June 16, 2024

Yes, this places a huge burden on anyone hacking on the compiler;

Exactly - so why not lessen that burden?

  1. with some proper documentation, and
  2. by regenerating generated sources as part of the build process (using the approach source-based distros have used for decades)

The latter suggestion was where this issue started. While it would make everyone's lives significantly easier, that was deemed too much work by @stephanosio, so now we are left with door number 1.

I am not really sure where you got the idea that it was "deemed too much work" to regenerate generated sources as part of the build process.

All I said was "this is not a decision made by me or anyone else working on the Zephyr SDK -- it is the decision made by upstream GCC and, as a downstream project using GCC, sdk-ng is not going to deviate from that."

As a downstream project, it is good practice not to make arbitrary decisions that deviate from the way the upstream project does things. It really has nothing to do with how much work it would be to regenerate these generated sources as part of the build process.

Since it became significantly more complicated than that, and since I needed to manually set up a build environment to match what was in CI so that I could manually diagnose what the problem was, I thought it would be wise to ask for some documentation about how to manually set up a build environment to match what was in CI.

First of all, this issue was initially opened for "regenerating generated sources [in GCC] as part of the build process," and later changed to "properly documenting the build process" -- these two are completely different and independent topics; so, let us try not to mix these up.

Regarding "generating generated sources [in GCC] as part of the build process," this is a deviation from the upstream GCC development process and I have voiced negative opinions about it for the aforementioned reasons.

Regarding "properly documenting the build process," I have already clarified in #723 (comment) that there is room for improvement (e.g. providing an FAQ); but, for a detailed "full" documentation, a willing party will need to dedicate a significant amount of their time for it to happen.

With that, there is at least some starting point at an intuitive location for people, and a place to put knowledge that is otherwise maybe only in @stephanosio's head at the moment.

Which part of ci.yml looks like "knowledge that is otherwise maybe only in @stephanosio's head" to you?

