A lot of people ship free and nonfree code in addon yum repos for Fedora. They add the

Please support generation of AppStream metadata automatically (rpm-software-management/createrepo_c)

Comments (53)

Conan-Kudo commented on June 22, 2024 3

ostree+flatpak where AppStream metadata is baked right into the design

In the most generous case, where Flatpak takes over the world, it's still not covering everything, as there's still the host components needing AppStream metadata (like drivers, among other things).

In this regard, it still makes sense to be able to generate AppStream metadata for rpm repos. But at least if you do get around to doing it, I can review, test, and merge it. And naturally cut a new release with the feature shortly thereafter.

from createrepo_c.

Conan-Kudo commented on June 22, 2024 2

This missing feature is why the Fedora repositories don't work that way. If this feature was implemented, then Fedora would use it.

Tojaj commented on June 22, 2024 2

@praiskup although not involved in the project for several years now, I would say the file list is generated just from the RPM header, see: https://github.com/rpm-software-management/createrepo_c/blob/master/src/parsehdr.c#L227-L309

jsilhan commented on June 22, 2024 1

Hi Richard,
that probably makes sense to do automatically to eliminate the step of modifying the repo, but I am not sure whether it should be done by default. Adding something like an --appdata option and a weak dependency on libappstream-builder.so should not hurt, IMO. @Tojaj, @megaumi, what do you think?

Anyway, if COPR supports generating appdata, I personally don't see this as a big issue. COPR should be the preferred way of creating 3rd party repos (and the most convenient).

jsilhan commented on June 22, 2024 1

@hughsie I've talked with Tomas: generating appdata could be the default behavior and use the libappstream-builder library, but there should be a compile-time option to enable it. If you can implement it, we'll be happy to review the PR and merge it.

dralley commented on June 22, 2024 1

The larger problem is that @dralley and others working on Pulp are going to have a harder time too, because they want to work directly with createrepo_c for all metadata generation.

I'm not sure it makes a difference for us, the main issue will be that it requires the RPMs to be available at all times, which is slightly in conflict with one of the main features, which is to on-demand download them only when needed.

Plus we already have to deal with libmodulemd separately, and also we're not using the literal createrepo_c binary, where the support would primarily be added.

Our main requirements are just that it needs to be possible to hand the RPM file directly to the library one at a time to incrementally build up the xml instead of pointing it at a directory, but that's fundamentally an appstream requirement. It might already be possible, I haven't looked deeply at the APIs recently and can't look it up at the moment.

Conan-Kudo commented on June 22, 2024

@hughsie In my view, I think it's completely fine, as long as there's a compile-time option for adding the libappstream-builder dependency. In fact, this would make things even easier for us in Mageia for supporting AppStream metadata, as well as many others who are consumers of createrepo_c.

Conan-Kudo commented on June 22, 2024

I would suggest calling the switch --generate-appstream-md though, since that's what it would do. Putting this into createrepo_c also means you could do nice optimization tricks like only scan packages for AppStream data if they have appdata() or metainfo() Provides in the RPM header, which could cut down the processing and generation time considerably.
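
To illustrate the Provides-based shortcut described above, here is a minimal Python sketch. It assumes the Provides entries have already been read from each RPM header into plain strings; the package names and the `needs_appstream_scan` helper are hypothetical and not part of createrepo_c.

```python
import re

# Provides of the form appdata(...) or metainfo(...) advertise
# AppStream files shipped inside a package.
APPSTREAM_PROVIDE = re.compile(r"^(appdata|metainfo)\(.*\)$")


def needs_appstream_scan(provides):
    """Return True if any Provides entry advertises AppStream metadata."""
    return any(APPSTREAM_PROVIDE.match(p) for p in provides)


# Hypothetical header data: package name -> Provides list.
packages = {
    "gedit": ["gedit", "metainfo(org.gnome.gedit.metainfo.xml)"],
    "glibc": ["glibc", "libc.so.6()(64bit)"],
    "oldapp": ["oldapp", "appdata(oldapp.appdata.xml)"],
}

# Only these packages would need their payload unpacked.
to_scan = sorted(name for name, prov in packages.items()
                 if needs_appstream_scan(prov))
print(to_scan)
```

Packages without a matching Provides entry are skipped without touching their payload, which is where the time saving would come from.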

Conan-Kudo commented on June 22, 2024

@hughsie Are you still interested in doing this? If you are, I'm happy to review and test patches for integrating AppStream functionality into createrepo_c.

Conan-Kudo commented on June 22, 2024

@hughsie If you're still interested in this, I'm happy to review, test, and merge patches to incorporate this functionality into createrepo_c.

hellcp commented on June 22, 2024

I would really love to see this happen. Most methods I have seen are workarounds just to make AppStream work with the repository, which is less than ideal. This has the potential to work well with the existing solutions, without requiring much work from the different build systems to enable it.

hughsie commented on June 22, 2024

If you're still interested in this

I'm interested in seeing it done, but alas don't have the time or the permission to spend a week+ on writing the code. In all honesty, appstream-glib and shipping an appstream-data package is really just a stopgap until we can use ostree+flatpak where AppStream metadata is baked right into the design.

Conan-Kudo commented on June 22, 2024

Minor update here, if someone wants to work on this, please use @ximion's libappstream instead of @hughsie's libappstream-glib, as the latter is now deprecated.

dralley commented on June 22, 2024

@Conan-Kudo Thanks for the heads up.

It would be great if this clarification was applied to the repos and documentation, because there are no external indications that one of them is deprecated, or generally any indication of the differences between the two. @hughsie's repo is still getting commits and occasional fixes, there are no deprecation warnings, the READMEs don't clarify anything (and indeed @hughsie's README seems more complete), appstream-builder doesn't appear to have been moved to @ximion's repo and so one could get the impression that @hughsie's fork is newer, etc.

Conan-Kudo commented on June 22, 2024

@ximion is writing a new appstream-compose library to replace the appstream-builder library.

Note that deprecated does not mean dead, it just means that new stuff shouldn't be using it while things slowly move over. GNOME Software moved over in GNOME 40, for example.

ximion commented on June 22, 2024

@ximion is writing a new appstream-compose library to replace the appstream-builder library.

That thing is pretty much complete, appstream-generator uses it already. While its API isn't marked as "stable" yet, I don't actually expect many changes to happen. You can easily test this and look for bugs by running appstreamcli compose on a directory tree and see what output is produced (see man appstreamcli compose for some usage hints).

j-mracek commented on June 22, 2024

It looks like the issue has been open for 5 years, so it is a good time to revisit it. Maybe we can start with some clarification. Right now, AppStream metadata is generated by a library and then added to the repository by modifyrepo_c.

Is AppStream metadata used only by PackageKit, or is it widely used?

Why should createrepo_c generate this metadata directly? Modules and comps groups are also generated outside of createrepo_c.

And what about merging repositories and merging AppStream metadata? Is this functionality supported by a library?

hughsie commented on June 22, 2024

Is AppStream metadata used only by PackageKit, or is it widely used?

gnome-software, KDE apper and software center, cockpit, fwupd and others. In all honesty, tons of stuff :)

And what about merging repositories and merging AppStream metadata

I believe libappstream supports this already.

j-mracek commented on June 22, 2024

I tried to understand how AppStream metadata works, and I discovered that it is not defined in the repomd file (like rpms, modules, filelists, comps, advisory, ...) but stored in an RPM (appstream-data). If I understand correctly, this would mean that during repository creation createrepo_c is supposed to generate an additional RPM and may even override an existing RPM. The RPM will probably be unsigned, because I do not expect createrepo_c to always have access to the distribution signing key (private key). Can you please verify that I understand the request and workflow correctly?

hughsie commented on June 22, 2024

Does https://blogs.gnome.org/hughsie/2016/04/27/3rd-party-fedora-repositories-and-appstream/ help?

j-mracek commented on June 22, 2024

@hughsie Thank you very much for the information. I tried to find such data in repomd.xml in the Fedora repositories, but there is nothing like that. Can you please point me to it once again?
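
One way to check a repomd.xml for AppStream entries programmatically is sketched below. The sample document is made up, and the assumption that the entry would carry the type name "appstream" (derived from the injected file name) should be verified against a real repository that ships this data.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"


def repomd_data_types(repomd_xml):
    """Return the type attribute of every <data> record in repomd.xml."""
    root = ET.fromstring(repomd_xml)
    return [d.get("type") for d in root.findall(f"{REPO_NS}data")]


# Made-up, heavily trimmed repomd.xml for illustration only.
SAMPLE = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary"><location href="repodata/primary.xml.gz"/></data>
  <data type="filelists"><location href="repodata/filelists.xml.gz"/></data>
  <data type="appstream"><location href="repodata/appstream.xml.gz"/></data>
</repomd>"""

types = repomd_data_types(SAMPLE)
has_appstream = "appstream" in types  # assumed type name, see note above
print(types, has_appstream)
```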

hughsie commented on June 22, 2024

I... thought the blog post should explain everything... i.e. you generate the appstream metadata using appstream-builder and then use modifyrepo to include it if createrepo has already been called.
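
The two-step workflow can be sketched as follows. This only builds and prints the command lines; it does not run them. The flag names are recalled from the linked blog post and should be checked against `appstream-builder --help` and `modifyrepo_c --help`; the paths, the repo name, and the `appstream_commands` helper are made up for illustration.

```python
import shlex


def appstream_commands(packages_dir, output_dir, origin, repodata_dir):
    """Build argv lists for the generate-then-inject workflow (not executed)."""
    # Step 1: generate AppStream XML from the RPMs in packages_dir.
    build = [
        "appstream-builder",
        f"--origin={origin}",
        "--basename=appstream",
        f"--output-dir={output_dir}",
        f"--packages-dir={packages_dir}",
    ]
    # Step 2: register the generated file in the existing repodata.
    inject = ["modifyrepo_c", f"{output_dir}/appstream.xml.gz", repodata_dir]
    return build, inject


build, inject = appstream_commands(
    "x86_64/", "/tmp/asb-md", "myrepo", "x86_64/repodata/")
for argv in (build, inject):
    print(shlex.join(argv))
```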

dralley commented on June 22, 2024

I think the confusing part is that the Fedora repositories themselves don't work this way; they use this crazy approach of packing the metadata into the appstream-data RPM.

Conan-Kudo commented on June 22, 2024

Fedora COPR repositories work as @hughsie describes, though. @praiskup can provide details on the implementation.

ximion commented on June 22, 2024

I've always considered the appstream-data package method of delivering the data as a workaround. The ideal way to ship this data is as part of the regular repository metadata, as that ensures the data is always up-to-date when something in the repository changes, without having to upgrade a random package first.

In Debian, this is implemented by having our archive software add & sign the AppStream data with the rest of the metadata, and then having APT, our package management tool, download this data on the client. APT will then invoke appstreamcli which extracts the downloaded icon tarballs and moves all metadata to the right locations so AppStream clients can find it, and also updates some caches. This was originally done this way for political reasons (APT could have very well extracted the tarballs and moved the data to the right place on its own), but it turned out to be very reliable with little reason for change.
This system has been used for a very long time now, I think since 2015. If needed, appstreamcli could add similar support for RPM-based distributions.

The AppStream metadata itself is generated by appstream-generator on Debian/Ubuntu in a sandboxed environment (the software doing font rendering and image scaling from 3rd-party sources scared the security people). The generator tool is a pretty heavyweight solution that takes care of a lot of stuff specific to Linux distributions and Debian in particular, for example it has some very complicated logic to hunt down the right icon for an application across all packages without having to trace dependencies and extract half of the archive into a temporary location.

On the other hand, the appstreamcli compose tool also exists, which is part of AppStream and will generate the right metadata if you point it at one or more directory trees containing metadata. For simple solutions, and even repositories with few packages, this is actually a viable solution that doesn't need a more complex thing like appstream-generator.
Then, the library libappstream-compose also exists, which contains some very simple building blocks to write solutions similar to appstream-generator and allows integrating the metadata generation tightly with existing tools.

Which of these is right for this application I don't know, but I can definitely help in case any of these tools is missing anything you need. The generator even reads repomd files and can handle RPMs, but I don't think this feature is used much at the moment.

j-mracek commented on June 22, 2024

Thank you very much for the clarification and the additional information. Let me summarize what I've got from the discussion.

The support of generation of AppStream metadata

It makes sense to have such support in createrepo, especially when RPMs can be in multiple directories and the metadata is generated from RPMs.
It also makes sense to ship AppStream metadata as repository metadata rather than as an RPM.
It also looks like there are multiple ways to implement the support, and there is an already deployed solution in Copr. I would like to know the plans of the Fedora distribution, to understand why they use an RPM rather than metadata and to ensure that the new metadata in createrepo_c will be widely used. That will require some discussion, and therefore time. We need to share the solution rather than re-implement it.

Adding AppStream metadata as default behavior for createrepo

According to the blog post, generating AppStream metadata is a demanding process that significantly slows down repository generation, so I think it is not a good idea to implement it as the default behavior.

Delivery is the tricky part. There are still several open questions. I don't know whether the APIs of the mentioned libraries can be used by createrepo_c, which is written in C. Also, our team priority is the DNF5 project, with planned delivery to Fedora 38 and 39. What can really help here is a community contribution.

dralley commented on June 22, 2024

Am I correct in thinking that generating the metadata requires deep inspection of the RPM file, so it needs to be available?

Conan-Kudo commented on June 22, 2024

It does, yes. What makes appstream repodata generation slow for Fedora and COPR right now is that we have to generate the rpm repodata, and then scan through the RPMs again to pull the necessary contents out for appstream repodata generation. That's why Fedora doesn't do it now, because it's too slow. Once integrated into createrepo_c, it would be possible to assemble the appstream and rpm repodata in one step as each RPM is read and the data is pulled, which would make it faster.

praiskup commented on June 22, 2024

Isn't the difference that createrepo_c reads the RPM metadata (headers), while appstream-builder needs to analyze the archive content (all the files)?

dralley commented on June 22, 2024

Yeah, I'm curious how much time it would actually save (not that the feature isn't a good idea necessarily).

The bottleneck is almost certainly going to be unpacking the RPM archives rather than finding or opening the RPMs. createrepo_c only has to read the headers, so there's limited overlap in the work being performed.

praiskup commented on June 22, 2024

Actually, createrepo_c probably reads the contents ... at least to generate the filelists.xml. So Neal is right in that.

What would speed up the Copr use case a lot, though, is support for AppStream metadata together with --update --recycle-pkglist (the incremental metadata update). No matter how big the repository is, we currently almost always regenerate the metadata (a build is added, a build is removed). appstream-builder doesn't do incremental updates, so it always fails for large repositories (it times out, so as not to block every other build).
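
As a rough sketch of what an incremental update means here: compare the previous package list with the files currently present, and only do expensive work for the difference. The `plan_update` helper and the file names are hypothetical; this is not how createrepo_c implements --update.

```python
def plan_update(old_pkglist, current_files):
    """Split packages into reuse / scan / drop sets for an incremental run."""
    old, new = set(old_pkglist), set(current_files)
    return {
        "reuse": sorted(old & new),  # metadata can be recycled as-is
        "scan": sorted(new - old),   # new builds: must be read and analysed
        "drop": sorted(old - new),   # removed builds: delete their metadata
    }


# Hypothetical repository state: one build removed, one build added.
previous = ["bar-2.0-1.rpm", "foo-1.0-1.rpm", "old-0.9-1.rpm"]
present = ["bar-2.0-1.rpm", "foo-1.0-1.rpm", "new-1.1-1.rpm"]

plan = plan_update(previous, present)
print(plan)
```

With this split, the cost of a run depends on the number of changed builds rather than on the size of the repository.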

xsuchy commented on June 22, 2024

For the record, here is the appstream issue about the performance hughsie/appstream-glib#301

ximion commented on June 22, 2024

For new code, please don't use appstream-glib but appstream proper. The former does not support any of the newer AppStream metadata and is currently only lightly maintained.

j-mracek commented on June 22, 2024

Thank you very much for the discussion. It looks like enabling generation of AppStream metadata would be a performance killer for createrepo_c, therefore we cannot support it as a default behavior.

If I understand the performance problem correctly, the data required to generate AppStream metadata is not stored in the uncompressed RPM header but in the compressed, much larger payload, so there will always be some additional requirements. I don't know what approach would be best to resolve this, but maybe it would be helpful to contact the RPM team to let them know about this use case.

I will be happy to start a discussion about implementing AppStream metadata once the performance issue is resolved and the data is distributed in a unified (standardized) way across Fedora and Copr. I am really sorry, but right now I don't think it is possible to implement the requested feature.

hughsie commented on June 22, 2024

therefore we cannot support it as a default behavior

I think then my advice to Fedora would be to stop shipping applications in rpm packages, and we should speed up the transition to Flatpak and OSTree metadata.

Conan-Kudo commented on June 22, 2024

therefore we cannot support it as a default behavior

I think then my advice to Fedora would be to stop shipping applications in rpm packages, and we should speed up the transition to Flatpak and OSTree metadata.

Please don't make unproductive comments. This is not ever going to happen.

hughsie commented on June 22, 2024

This is not ever going to happen

What happens when I get bored of creating the appstream-data package updates? I think I'm the only person that's ever actually done it.

Conan-Kudo commented on June 22, 2024

I will be happy to start a discussion about implementing AppStream metadata once the performance issue is resolved and the data is distributed in a unified (standardized) way across Fedora and Copr. I am really sorry, but right now I don't think it is possible to implement the requested feature.

The RPM team (specifically @ffesti) is aware of this. Extending the base RPM format to incorporate AppStream data has been discussed before, but the result of those discussions is that it would make the RPM headers ridiculously big.

To wit: the problem is that AppStream data is extremely rich. The following components are generally part of AppStream metadata:

The XML metainfo file with name, summary, description, release notes
The INI desktop file with name, generic name, description, icon
Screenshots or screencast video

We can't embed this in the RPM header. The best we could do is include pointers to the payload regions for the data files so that we don't have to scan the whole RPM for them. But that's still a fair bit extra to pull off during RPM generation.

But the thing is, scanning the RPMs for these files is not particularly slow. The problem is that we have to load all the RPMs twice today, since we read them one time for createrepo_c, and then read them again for appstream-builder. Two separate processes.

If you decide not to integrate the two processes, and @hughsie decides to stop making appstream-data and I decide to force the issue, then Fedora and RHEL will be forced to run the same process that Mageia and openSUSE do, which is running appstream-builder or appstream-generator right after creating repos. The larger problem is that @dralley and others working on Pulp are going to have a harder time too, because they want to work directly with createrepo_c for all metadata generation.

ximion commented on June 22, 2024

The performance issues do not exist with appstream-generator - it works absolutely fine on massive Debian repositories, and Ubuntu and Arch also use it without reported issues. Still, it's expensive to run (needs quite a bit of memory), which is why we only do it every 6h, but that is absolutely sufficient (the archive doesn't get published any faster anyway). Its RPM backend exists, but I would guess it's the least-tested backend.

There's no reason why createrepo_c couldn't use appstream-generator or implement something that fits its needs better based on libappstream-compose (knowing which packages are new is a huge advantage).
(The "trick" for asgen is to simply cache resulting data aggressively, so it will never reprocess anything unless explicitly told so)

Also, dpkg/rpm have nothing to do with AppStream metadata, it is truly part of the repository metadata, so very much in the hands of DNF/Zypper/APT and the respective repository layouts that distributions use. So, while it's unified in the Debian-based world, I wonder whether it is reasonable at all to expect any unification in the RPM world with its different package management tools. Also, holding back features for one distribution while waiting on others is not a great plan (especially since openSUSE actually already has AppStream data as part of its repository metadata, AFAIK it's only Fedora that doesn't implement this yet).

ximion commented on June 22, 2024

the main issue will be that it requires the RPMs to be available at all times, which is slightly in conflict with one of the main features, which is to on-demand download them only when needed

On Debian, we pull the packages on-demand via a network mount. On Ubuntu, appstream-generator fetches them from a web location via links only if needed.

Our main requirements are just that it needs to be possible to hand the RPM file directly to the library one at a time to incrementally build up the xml instead of pointing it at a directory, but that's fundamentally an appstream requirement.

It's not really a requirement. On Debian we do have to scan multiple packages (simply because icons may be split out into a -data package and we do need to find those for AppStream), but we do later build the YAML data incrementally from cached data and don't need to re-scan stuff that was already scanned. Same applies to Arch & Co which do use the XML format.
It's a "scan this package once and cache the data" step, followed by "fetch all data of active packages from the cache and concatenate it in one single file to add to the repository".
So, I bet this can be implemented in a way that works with the current plans for this project, but it does sound like it will need a bit of extra engineering.
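
The "scan once and cache, then concatenate" idea could look roughly like this in Python. It is a toy sketch and not appstream-generator's actual code: the in-memory dict stands in for a persistent cache, `fake_extract` stands in for unpacking a package, and the package IDs and component IDs are invented.

```python
import xml.etree.ElementTree as ET

cache = {}   # package id -> list of component XML strings
scans = []   # records which packages were actually scanned


def scan_package(pkg_id, extract):
    """Scan a package only if its data is not cached yet."""
    if pkg_id not in cache:
        scans.append(pkg_id)
        cache[pkg_id] = extract(pkg_id)
    return cache[pkg_id]


def compose_catalog(active_pkg_ids, extract, origin):
    """Concatenate cached data of all active packages into one catalog."""
    root = ET.Element("components", version="1.0", origin=origin)
    for pkg_id in active_pkg_ids:
        for xml in scan_package(pkg_id, extract):
            root.append(ET.fromstring(xml))
    return ET.tostring(root, encoding="unicode")


def fake_extract(pkg_id):
    """Hypothetical extractor: pretend each package ships one component."""
    name = pkg_id.split("-")[0]
    return [f'<component type="desktop-application">'
            f'<id>org.example.{name}</id><pkgname>{name}</pkgname>'
            f'</component>']


first = compose_catalog(["foo-1.0-1", "bar-2.0-1"], fake_extract, "myrepo")
# bar was removed and baz was added; foo is served from the cache.
second = compose_catalog(["foo-1.0-1", "baz-0.1-1"], fake_extract, "myrepo")
print(scans)
```

On the second run only the new package is scanned, and the removed package simply no longer appears in the output.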

xsuchy commented on June 22, 2024

But the thing is, scanning the RPMs for these files is not particularly slow. The problem is that we have to load all the RPMs twice today since we read them one time for createrepo_c, and then read them again for appstream-builder. Two separate processes.

This. And in Copr, this is on a different level of magnitude. After each build, we call createrepo_c. I.e., every 2 minutes. And if appstream-builder runs 10 minutes...
But if createrepo_c uses the lib itself (and not executable) and just parses the new package (similar to what --recycle-pkglist does), then there will be no performance issues.

dralley commented on June 22, 2024

@xsuchy By the way, I would make sure COPR gets upgraded to createrepo_c 0.20.1 for the --update performance fix. And I'd love to hear back about the impact that has if it is monitored.

But the thing is, scanning the RPMs for these files is not particularly slow. The problem is that we have to load all the RPMs twice today since we read them one time for createrepo_c, and then read them again for appstream-builder. Two separate processes.

This. And in Copr, this is on a different level of magnitude. After each build, we call createrepo_c. I.e., every 2 minutes. And if appstream-builder runs 10 minutes...

Strictly speaking, couldn't appstream-builder be equally capable of determining which packages have changed, and only processing those RPMs? If the implementation was there, it might benefit everyone (if the metadata builder tool is generic across all package types).

You mentioned that building the appstream metadata is very expensive in terms of memory - well, so is --update, so if this is done at the same time it might be a concern.

dralley commented on June 22, 2024

Separate question: Does appstream metadata really need to be generated for all packages, or only the latest version of each package present? Because that's another potential optimization and one that createrepo_c isn't currently architected to handle very well since it doesn't take versions into account.

ximion commented on June 22, 2024

Separate question: Does appstream metadata really need to be generated for all packages, or only the latest version of each package present? Because that's another potential optimization.

Only the latest version, and appstream-generator does that already, and also only processes changed packages ;-)
Using appstream-builder for new projects is a bad idea because it is based on appstream-glib, which is not up to date with supporting the latest AppStream features at all.

So, for newer implementations, please use appstream-generator, build a custom tool based on libappstream-compose (which does everything except for reading RPM files and caching, as these are very use-case specific) or use appstreamcli compose (the last option is primarily useful for things like Flatpak, as it works on directory trees and makes some assumptions that may not be true for packages).
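
For the "only the latest version" point, selecting the newest build per package name might be sketched like this. The comparison is deliberately simplified and is not a full rpmvercmp implementation (it ignores `~` and `^`, for example); the package tuples and helper names are invented.

```python
import re


def version_key(version):
    """Simplified sort key: split into digit runs and letter runs.

    Digit runs compare numerically and sort above letter runs.
    This is an approximation, not the real RPM algorithm.
    """
    parts = re.findall(r"\d+|[A-Za-z]+", version)
    return [(1, int(p), "") if p.isdigit() else (0, 0, p) for p in parts]


def latest_per_name(packages):
    """Keep only the highest (epoch, version, release) for each name."""
    latest = {}
    for name, epoch, version, release in packages:
        key = (epoch, version_key(version), version_key(release))
        if name not in latest or key > latest[name][0]:
            latest[name] = (key, f"{name}-{version}-{release}")
    return sorted(nevr for _, nevr in latest.values())


# Hypothetical (name, epoch, version, release) tuples.
pkgs = [
    ("foo", 0, "1.2", "1"),
    ("foo", 0, "1.10", "1"),   # 1.10 is newer than 1.2
    ("bar", 0, "2.0", "3"),
    ("bar", 1, "1.0", "1"),    # a higher epoch wins over a higher version
]
selected = latest_per_name(pkgs)
print(selected)
```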

dralley commented on June 22, 2024

Right, that's what I meant, it's not helpful to have all these tools with similar names :)

So if that's the case, then maybe COPR should start with using a workflow based on appstream-generator instead of appstream-builder (assuming they didn't make the same error I just did) and see if that addresses the issue?

This. And in Copr, this is on a different level of magnitude. After each build, we call createrepo_c. I.e., every 2 minutes. And if appstream-builder runs 10 minutes...

ximion commented on June 22, 2024

@dralley That would make sense to at least test :-) I see two potential issues with asgen: 1) It's written in D, so compared to C has limited platform support (may not be a problem, arm64 and amd64 are well supported) and 2) It was originally written for Debian/Ubuntu, so it might have a bunch of workflow assumptions that do not jive well at all with createrepo_c's expected workflow.
If something simple can be changed to make your life easier, I could definitely help with that though, as well as with using libappstream-compose in case you do want a more tightly integrated solution at some point (since I know very little about createrepo_c and Fedora's workflow though, I unfortunately can't give you an instantly working solution ;-D)

xsuchy commented on June 22, 2024

So if that's the case, then maybe COPR should start with using a workflow based on appstream-generator instead of appstream-builder (assuming they didn't make the same error I just did) and see if that addresses the issue?

Spinning off to separate discussion ximion/appstream-generator#104

j-mracek commented on June 22, 2024

Probably I overlooked something, but is there any support of merging AppStream metadata in any of libraries?

praiskup commented on June 22, 2024

@xsuchy By the way, I would make sure COPR gets upgraded to createrepo_c
0.20.1 for the --update performance fix. And I'd love to hear back about the impact that has if it is monitored.

I've run some tests, but sadly there's no obvious difference. I replied directly into #323.

hughsie commented on June 22, 2024

but is there any support of merging AppStream metadata in any of libraries

I believe both libappstream and libappstream-glib support this.

ximion commented on June 22, 2024

Merging stuff is most painless on a per-component level (replace one component entirely with another). Both libraries do also support merging data between components, but that's an extremely messy thing (and leads to annoying issues later, when you want to figure out where broken metadata actually came from...). So, "replace component if the same name" is IMHO the better thing to do when merging :-)
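
A per-component "replace, don't blend" merge is simple to express. In this sketch components are plain dicts keyed by component ID, which is an assumption for illustration; neither library's real API is used here.

```python
def merge_catalogs(base, override):
    """Merge two catalogs; a component in override replaces the base one whole."""
    merged = dict(base)
    merged.update(override)  # same ID: the whole component is swapped out
    return merged


# Hypothetical catalogs: component ID -> component fields.
repo_a = {
    "org.example.foo": {"summary": "Old foo", "icon": "foo.png"},
    "org.example.bar": {"summary": "Bar"},
}
repo_b = {
    "org.example.foo": {"summary": "New foo"},
}

merged = merge_catalogs(repo_a, repo_b)
print(merged["org.example.foo"])
```

Note that the merged foo component has no icon: fields from the replaced component are not carried over, so it stays obvious which source a component came from.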

praiskup commented on June 22, 2024

I also don't think any sophisticated merging is needed, at least if I'm not missing something. On metadata --updates, createrepo_c either fully loads the metadata from the (new or updated) RPM (and the same can be done for AppStream metadata, even if that means extracting the cpio archive), or fully re-uses the old metadata (which would recycle even the AppStream metadata). IOW, merge is needed, but on per-component basis (as already done in createrepo_c).

Not sure about the library API, but if there's a way to point the library at RPM filename, and get the metadata, this sounds like a perfectly valid (opt-in, both on build and runtime) RFE.

ximion commented on June 22, 2024

LibAppStream can do the merging you want without any issues, however:

Not sure about the library API, but if there's a way to point the library at RPM filename, and get the metadata, this sounds like a perfectly valid (opt-in, both on build and runtime) RFE.

That isn't so simple, unfortunately. Because packagers like to split data across multiple packages, e.g. place icons in a -data package and the actual application in a different package, or using stock icons from shared icon packages (like KDE does), the AppStream metadata generator always has to look at all the packages in their entirety, which in case of appstream-generator means building a cache of things that it has already seen as well as new packages that still need to be processed.
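
To make the cross-package problem concrete, here is a small sketch of an icon index built over all packages, so that an application in one package can find its icon in another. The file lists, package names, and helpers are invented, and real generators apply far more elaborate lookup rules (themes, sizes, stock icons).

```python
def build_icon_index(packages):
    """Map icon name -> list of (package, path) across all packages."""
    index = {}
    for pkg, files in packages.items():
        for path in files:
            if "/icons/" in path or "/pixmaps/" in path:
                # Icon name is the file name without its extension.
                stem = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
                index.setdefault(stem, []).append((pkg, path))
    return index


# Hypothetical file lists: the app and its icon live in different packages.
packages = {
    "myapp": ["/usr/bin/myapp", "/usr/share/applications/myapp.desktop"],
    "myapp-data": ["/usr/share/icons/hicolor/64x64/apps/myapp.png"],
}

index = build_icon_index(packages)
# The desktop file in "myapp" asks for Icon=myapp; the index finds it elsewhere.
found = index.get("myapp", [])
print(found)
```

Looking at the "myapp" package alone would find no icon at all, which is why a single-RPM API is not sufficient on its own.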
