Comments (53)
ostree+flatpak where AppStream metadata is baked right into the design
In the most generous case, where Flatpak takes over the world, it's still not covering everything, as there's still the host components needing AppStream metadata (like drivers, among other things).
In this regard, it still makes sense to be able to generate AppStream metadata for rpm repos. But at least if you do get around to doing it, I can review, test, and merge it. And naturally cut a new release with the feature shortly thereafter.
from createrepo_c.
This missing feature is why the Fedora repositories don't work that way. If this feature was implemented, then Fedora would use it.
from createrepo_c.
@praiskup although not involved in the project for several years now, I would say the file list is generated just from the RPM header, see: https://github.com/rpm-software-management/createrepo_c/blob/master/src/parsehdr.c#L227-L309
from createrepo_c.
Hi Richard,
It probably makes sense to do this automatically, to eliminate the extra step of modifying the repo, but I am not sure whether it should be done by default. Adding an --appdata option and a weak dependency on libappstream-builder.so IMO should not hurt. @Tojaj, @megaumi what do you think?
Anyway, if COPR supports generating appdata, I personally don't see this as a big issue. COPR should be the preferred way of creating 3rd-party repos (and the most convenient).
from createrepo_c.
@hughsie so I've talked with Tomas: generating appdata could be on by default and use the libappstream-builder library, but there should be a compile-time option enabling it. If you can implement it, we'll be happy to review the PR and merge it.
from createrepo_c.
The larger problem is that @dralley and others working on Pulp are going to have a harder time too, because they want to work directly with createrepo_c for all metadata generation.
I'm not sure it makes a difference for us, the main issue will be that it requires the RPMs to be available at all times, which is slightly in conflict with one of the main features, which is to on-demand download them only when needed.
Plus we already have to deal with libmodulemd separately, and also we're not using the literal createrepo_c binary, where the support would primarily be added.
Our main requirements are just that it needs to be possible to hand the RPM file directly to the library one at a time to incrementally build up the xml instead of pointing it at a directory, but that's fundamentally an appstream requirement. It might already be possible, I haven't looked deeply at the APIs recently and can't look it up at the moment.
from createrepo_c.
@hughsie In my view, I think it's completely fine, as long as there's a compile-time option for adding the libappstream-builder dependency. In fact, this would make things even easier for us in Mageia for supporting AppStream metadata, as well as many others who are consumers of createrepo_c.
from createrepo_c.
I would suggest calling the switch --generate-appstream-md though, since that's what it would do. Putting this into createrepo_c also means you could do nice optimization tricks like only scan packages for AppStream data if they have appdata() or metainfo() Provides in the RPM header, which could cut down the processing and generation time considerably.
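A minimal sketch of that pre-filter, assuming the Provides of each package are already available as plain strings. The function name, the sample data, and the exact Provides spellings are hypothetical and only illustrate the idea; this is not createrepo_c code.

```python
import re

# Matches Provides names such as "metainfo(org.example.App.metainfo.xml)"
# or "appdata(foo.appdata.xml)". The exact spelling is an assumption.
_APPSTREAM_PROVIDE = re.compile(r"^(appdata|metainfo)\(.*\)$")


def needs_appstream_scan(provides):
    """Return True if any Provides entry looks like appdata(...) or metainfo(...)."""
    for entry in provides:
        entry = entry.strip()
        if not entry:
            continue
        # Drop a trailing version constraint such as " = 1.0".
        name = entry.split()[0]
        if _APPSTREAM_PROVIDE.match(name):
            return True
    return False


# Hypothetical header data: package name -> list of Provides strings.
packages = {
    "gnome-calculator": [
        "gnome-calculator = 42.0",
        "metainfo(org.gnome.Calculator.metainfo.xml)",
    ],
    "glibc": ["glibc = 2.35", "libc.so.6()(64bit)"],
}

# Only these packages would have their payload unpacked and scanned.
to_scan = [name for name, prov in packages.items() if needs_appstream_scan(prov)]
```

With a check like this, the expensive payload extraction would only run for packages that advertise AppStream data in their header.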
from createrepo_c.
@hughsie Are you still interested in doing this? If you are, I'm happy to review and test patches for integrating AppStream functionality into createrepo_c.
from createrepo_c.
@hughsie If you're still interested in this, I'm happy to review, test, and merge patches to incorporate this functionality into createrepo_c.
from createrepo_c.
I would really love to see this happen. Most methods I have seen are workarounds just to make AppStream work with the repository, which is less than ideal. This has the potential to work well with the existing solutions, without much work required for different build systems to enable it.
from createrepo_c.
If you're still interested in this
I'm interested in seeing it done, but alas don't have the time or the permission to spend a week+ on writing the code. In all honesty, appstream-glib and shipping an appstream-data package is really just a stopgap until we can use ostree+flatpak where AppStream metadata is baked right into the design.
from createrepo_c.
Minor update here, if someone wants to work on this, please use @ximion's libappstream instead of @hughsie's libappstream-glib, as the latter is now deprecated.
from createrepo_c.
@Conan-Kudo Thanks for the heads up.
It would be great if this clarification was applied to the repos and documentation, because there are no external indications that one of them is deprecated, or generally any indication of the differences between the two. @hughsie's repo is still getting commits and occasional fixes, there are no deprecation warnings, the READMEs don't clarify anything (and indeed @hughsie's README seems more complete), appstream-builder doesn't appear to have been moved to @ximion's repo and so one could get the impression that @hughsie's fork is newer, etc.
from createrepo_c.
@ximion is writing a new appstream-compose library to replace the appstream-builder library.
Note that deprecated does not mean dead, it just means that new stuff shouldn't be using it while things slowly move over. GNOME Software moved over in GNOME 40, for example.
from createrepo_c.
@ximion is writing a new appstream-compose library to replace the appstream-builder library.
That thing is pretty much complete, appstream-generator uses it already. While its API isn't marked as "stable" yet, I don't actually expect many changes to happen. You can easily test this and look for bugs by running appstreamcli compose on a directory tree and see what output is produced (see man appstreamcli compose for some usage hints).
from createrepo_c.
It looks like this issue has been open for 5 years, so it is a good time to revisit it. Maybe we can start with some clarification. Right now AppStream metadata is generated by a library and then added to the repository by modifyrepo_c.
Is AppStream metadata used only by PackageKit, or is it widely used?
Why should createrepo_c generate this metadata directly? Modules and comps groups are also generated outside of createrepo_c.
And what about merging repositories and merging AppStream metadata? Is this functionality supported by a library?
from createrepo_c.
Is AppStream metadata used only by PackageKit, or is it widely used?
gnome-software, KDE apper and software center, cockpit, fwupd and others. In all honesty, tons of stuff :)
And what about merging repositories and merging AppStream metadata
I believe libappstream supports this already.
from createrepo_c.
I tried to understand how AppStream metadata works, and I discovered that it is not defined in the repomd file (like rpms, modules, filelists, comps, advisory, ...) but is stored in an RPM (appstream-data). If I understand it correctly, this would mean that during creation of a repository createrepo_c is supposed to generate an additional RPM and may even override an existing RPM. That RPM will probably be unsigned, because I do not expect that createrepo_c will always have access to the distribution signing key (private key). Can you please verify that I understand the request and workflow correctly?
from createrepo_c.
Does https://blogs.gnome.org/hughsie/2016/04/27/3rd-party-fedora-repositories-and-appstream/ help?
from createrepo_c.
@hughsie Thank you very much for the information, but I tried to find such data in repomd.xml in the Fedora repositories, and there is nothing like that. Can you please point me to it once again?
from createrepo_c.
I... thought the blog post should explain everything... i.e. you generate the appstream metadata using appstream-builder and then use modifyrepo to include it if createrepo has already been called.
from createrepo_c.
I think the confusing part is that the Fedora repositories themselves don't work this way, they use this crazy approach with packing the metadata into the appstream-data RPM
from createrepo_c.
Fedora COPR repositories work as @hughsie describes, though. @praiskup can provide details on the implementation.
from createrepo_c.
I've always considered the appstream-data package method of delivering the data as a workaround. The ideal way to ship this data is as part of the regular repository metadata, as that ensures the data is always up-to-date when something in the repository changes, without having to upgrade a random package first.
In Debian, this is implemented by having our archive software add & sign the AppStream data with the rest of the metadata, and then having APT, our package management tool, download this data on the client. APT will then invoke appstreamcli which extracts the downloaded icon tarballs and moves all metadata to the right locations so AppStream clients can find it, and also updates some caches. This was originally done this way for political reasons (APT could have very well extracted the tarballs and moved the data to the right place on its own), but it turned out to be very reliable with little reason for change.
This system has been used for a very long time now, I think since 2015. If needed, appstreamcli could add similar support for RPM-based distributions.
The AppStream metadata itself is generated by appstream-generator on Debian/Ubuntu in a sandboxed environment (the software doing font rendering and image scaling from 3rd-party sources scared the security people). The generator tool is a pretty heavyweight solution that takes care of a lot of stuff specific to Linux distributions and Debian in particular, for example it has some very complicated logic to hunt down the right icon for an application across all packages without having to trace dependencies and extract half of the archive into a temporary location.
On the other hand, the appstreamcli compose tool also exists, which is part of AppStream and will generate the right metadata if you point it at one or more directory trees containing metadata. For simple solutions, and even repositories with few packages, this is actually a viable solution that doesn't need a more complex thing like appstream-generator.
Then, the library libappstream-compose also exists, which contains some very simple building blocks to write solutions similar to appstream-generator and allows integrating the metadata generation tightly with existing tools.
Which of these is right for this application I don't know, but I can definitely help in case any of these tools is missing anything you need. The generator even reads repomd files and can handle RPMs, but I don't think this feature is used much at the moment.
from createrepo_c.
Thank you very much for the clarification and the additional information. To summarize what I've got from the discussion:
- Support for generating AppStream metadata
  - It makes sense to have such support in createrepo, especially when RPMs can be in multiple directories and the metadata is generated from RPMs.
  - It also makes sense to include AppStream metadata as repository metadata and not as an RPM.
  - It also looks like there are multiple ways to implement the support, and there is already a deployed solution in Copr. I would like to know the plans of the Fedora distribution, to understand why they use an RPM rather than metadata, and to ensure that the new metadata in createrepo_c will be widely used. That will require some discussion => time. We need to share the solution rather than re-implement it.
- Adding AppStream metadata as default behavior for createrepo
  - According to the blog post, generating AppStream metadata is a demanding process that significantly slows down the generation of repositories, therefore I think it is not a good idea to implement it as a default behavior.
- Delivery - well, this is the tricky part. There are still several open questions. I don't know whether the API of the mentioned libraries can be used by createrepo_c, which is written in C. Also, we have the DNF5 project as a team priority with planned delivery to Fedora 38 and 39. What can really help here is a community contribution.
from createrepo_c.
Am I correct in thinking that generating the metadata requires deep inspection of the RPM file, so it needs to be available?
from createrepo_c.
It does, yes. What makes appstream repodata generation slow for Fedora and COPR right now is that we have to generate the rpm repodata, and then scan through the RPMs again to pull the necessary contents out for appstream repodata generation. That's why Fedora doesn't do it now, because it's too slow. Once integrated into createrepo_c, it would be possible to assemble the appstream and rpm repodata in one step as each RPM is read and the data is pulled, which would make it faster.
from createrepo_c.
Isn't the difference that createrepo_c reads the RPM metadata (headers), while appstream-builder needs to analyze the archive content (all the files)?
from createrepo_c.
Yeah, I'm curious how much time it would actually save (not that the feature isn't a good idea necessarily).
The bottleneck is almost certainly going to be unpacking the RPM archives rather than finding or opening the RPMs. createrepo_c only has to read the headers, so there's limited overlap in the work being performed.
from createrepo_c.
Actually, createrepo_c probably reads the contents ... at least to generate the filelists.xml. So Neal is right in that.
What would speed up the Copr use-case a lot, though, would be support for appstream metadata together with --update --recycle-pkglist (the incremental metadata update). Because no matter how big the repository is, currently we almost always regenerate the metadata (build is added, build is removed). Appstream-builder doesn't do incremental updates, so it always fails (timeouts, to not block every other build) for large repositories.
from createrepo_c.
For the record, here is the appstream issue about the performance hughsie/appstream-glib#301
from createrepo_c.
For new code, please don't use appstream-glib but appstream proper. The former does not support any of the newer AppStream metadata and is currently only lightly maintained.
from createrepo_c.
Thank you very much for the discussion. It looks like enabling generation of AppStream metadata would be a performance killer for createrepo_c, therefore we cannot support it as a default behavior.
If I understand the performance problem correctly, the data required to generate AppStream metadata is not stored in the uncompressed RPM header but in the compressed, much larger payload, so there will always be some additional requirements. I don't know what approach would be best to resolve it, but maybe it would help to contact the RPM team to let them know about this use case.
I will be happy to start a discussion about implementing AppStream metadata when the performance issue is resolved and when the data is distributed in a unified (standardized) way - Fedora, Copr. I am really sorry, but right now I don't think it is possible to implement the requested feature.
from createrepo_c.
therefore we cannot support it as a default behavior
I think then my advice to Fedora would be to stop shipping applications in rpm packages, and we should speed up the transition to Flatpak and OSTree metadata.
from createrepo_c.
therefore we cannot support it as a default behavior
I think then my advice to Fedora would be to stop shipping applications in rpm packages, and we should speed up the transition to Flatpak and OSTree metadata.
Please don't make unproductive comments. This is not ever going to happen.
from createrepo_c.
This is not ever going to happen
What happens when I get bored of creating the appstream-data package updates? I think I'm the only person that's ever actually done it.
from createrepo_c.
I will be happy to start a discussion about implementing AppStream metadata when the performance issue is resolved and when the data is distributed in a unified (standardized) way - Fedora, Copr. I am really sorry, but right now I don't think it is possible to implement the requested feature.
The RPM team (specifically @ffesti) is aware of this. Extending the base RPM format to incorporate AppStream data has been discussed before, but the result of those discussions is that it would make the RPM headers ridiculously big.
To wit: the problem is that AppStream data is extremely rich. The following components are generally part of AppStream metadata:
- The XML metainfo file with name, summary, description, release notes
- The INI desktop file with name, generic name, description, icon
- Screenshots or screencast video
We can't embed this in the RPM header. The best we could do is include pointers to the payload regions for the data files so that we don't have to scan the whole RPM for them. But that's still a fair bit extra to pull off during RPM generation.
But the thing is, scanning the RPMs for these files is not particularly slow. The problem is that we have to load all the RPMs twice today, since we read them one time for createrepo_c, and then read them again for appstream-builder. Two separate processes.
If you decide not to integrate the two processes, and @hughsie decides to stop making appstream-data and I decide to force the issue, then Fedora and RHEL will be forced to run the same process that Mageia and openSUSE do, which is running appstream-builder or appstream-generator right after creating repos. The larger problem is that @dralley and others working on Pulp are going to have a harder time too, because they want to work directly with createrepo_c for all metadata generation.
from createrepo_c.
The performance issues do not exist with appstream-generator - it works absolutely fine on massive Debian repositories, and Ubuntu and Arch also use it without reported issues. Still, it's expensive to run (needs quite a bit of memory), which is why we only do it every 6h, but that is absolutely sufficient (the archive doesn't get published any faster anyway). Its RPM backend exists, but I would guess it's the least-tested backend.
There's no reason why createrepo_c couldn't use appstream-generator or implement something that fits its needs better based on libappstream-compose (knowing which packages are new is a huge advantage).
(The "trick" for asgen is to simply cache resulting data aggressively, so it will never reprocess anything unless explicitly told so)
Also, dpkg/rpm have nothing to do with AppStream metadata, it is truly part of the repository metadata, so very much in the hands of DNF/Zypper/APT and the respective repository layouts that distributions use. So, while it's unified in the Debian-based world, I wonder whether it is reasonable at all to expect any unification in the RPM world with its different package management tools. Also, holding back features for one distribution while waiting on others is not a great plan (especially since openSUSE actually already has AppStream data as part of its repository metadata, AFAIK it's only Fedora that doesn't implement this yet).
from createrepo_c.
the main issue will be that it requires the RPMs to be available at all times, which is slightly in conflict with one of the main features, which is to on-demand download them only when needed
On Debian, we pull the packages on-demand via a network mount. On Ubuntu, appstream-generator fetches them from a web location via links only if needed.
Our main requirements are just that it needs to be possible to hand the RPM file directly to the library one at a time to incrementally build up the xml instead of pointing it at a directory, but that's fundamentally an appstream requirement.
It's not really a requirement. On Debian we do have to scan multiple packages (simply because icons may be split out into a -data package and we do need to find those for AppStream), but we do later build the YAML data incrementally from cached data and don't need to re-scan stuff that was already scanned. Same applies to Arch & Co which do use the XML format.
It's a "scan this package once and cache the data" step, followed by "fetch all data of active packages from the cache and concatenate it in one single file to add to the repository".
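A minimal sketch of that two-step flow, assuming each package is identified by some stable key (for example a checksum or name-version string) and that a scan function returns its component fragments. The class, the key format, and the fake scan function are hypothetical; this is not appstream-generator's actual API.

```python
class ScanCache:
    """Scan each package at most once, then assemble output from cached data."""

    def __init__(self, scan):
        self._scan = scan      # callable: pkg_id -> list of component fragments
        self._cache = {}       # pkg_id -> cached fragments
        self.scans = 0         # number of real scans performed

    def components_for(self, pkg_id):
        # Step 1: "scan this package once and cache the data".
        if pkg_id not in self._cache:
            self._cache[pkg_id] = self._scan(pkg_id)
            self.scans += 1
        return self._cache[pkg_id]

    def build_catalog(self, active_packages):
        # Step 2: concatenate cached data of all currently active packages.
        catalog = []
        for pkg_id in active_packages:
            catalog.extend(self.components_for(pkg_id))
        return catalog


def fake_scan(pkg_id):
    # Stand-in for the expensive step of unpacking and inspecting a package.
    return ["<component>%s</component>" % pkg_id]


cache = ScanCache(fake_scan)
first = cache.build_catalog(["a-1.0", "b-2.0"])
# A new package appears; only "c-3.0" is scanned on the second run.
second = cache.build_catalog(["a-1.0", "b-2.0", "c-3.0"])
```

A real cache would need to persist between runs and drop entries for packages that left the repository; both are left out here.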
So, I bet this can be implemented in a way that works with the current plans for this project, but it does sound like it will need a bit of extra engineering.
from createrepo_c.
But the thing is, scanning the RPMs for these files is not particularly slow. The problem is that we have to load all the RPMs twice today since we read them one time for createrepo_c, and then read them again for appstream-builder. Two separate processes.
This. And in Copr, this is on a different level of magnitude. After each build, we call createrepo_c. I.e., every 2 minutes. And if appstream-builder runs 10 minutes...
But if createrepo_c uses the lib itself (and not executable) and just parses the new package (similar to what --recycle-pkglist does), then there will be no performance issues.
from createrepo_c.
@xsuchy By the way, I would make sure COPR gets upgraded to createrepo_c 0.20.1 for the --update performance fix. And I'd love to hear back about the impact that has if it is monitored.
But the thing is, scanning the RPMs for these files is not particularly slow. The problem is that we have to load all the RPMs twice today since we read them one time for createrepo_c, and then read them again for appstream-builder. Two separate processes.
This. And in Copr, this is on a different level of magnitude. After each build, we call createrepo_c. I.e., every 2 minutes. And if appstream-builder runs 10 minutes...
Strictly speaking, couldn't appstream-builder be equally capable of determining which packages have changed, and only processing those RPMs? If the implementation was there, it might benefit everyone (if the metadata builder tool is generic across all package types).
You mentioned that building the appstream metadata is very expensive in terms of memory - well, so is --update, so if this is done at the same time it might be a concern.
from createrepo_c.
Separate question: Does appstream metadata really need to be generated for all packages, or only the latest version of each package present? Because that's another potential optimization and one that createrepo_c isn't currently architected to handle very well since it doesn't take versions into account.
from createrepo_c.
Separate question: Does appstream metadata really need to be generated for all packages, or only the latest version of each package present? Because that's another potential optimization.
Only the latest version, and appstream-generator does that already, and also only processes changed packages ;-)
Using appstream-builder for new projects is a bad idea because it is based on appstream-glib, which is not up to date with supporting the latest AppStream features at all.
So, for newer implementations, please use appstream-generator, build a custom tool based on libappstream-compose (which does everything except for reading RPM files and caching, as these are very use-case specific) or use appstreamcli compose (the last option is primarily useful for things like Flatpak, as it works on directory trees and makes some assumptions that may not be true for packages).
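A minimal sketch of the "only the latest version of each package" selection mentioned above. It assumes plain dotted numeric versions; real RPM epoch/version/release comparison is considerably more involved, so the comparison here is a deliberate simplification and the helper names are hypothetical.

```python
def version_key(version):
    # Simplification: handles only dotted numeric versions such as "1.10".
    # Real RPM version comparison follows different, more complex rules.
    return tuple(int(part) for part in version.split("."))


def latest_only(packages):
    """Map each package name to its highest version from (name, version) pairs."""
    latest = {}
    for name, version in packages:
        if name not in latest or version_key(version) > version_key(latest[name]):
            latest[name] = version
    return latest


# Hypothetical repository contents with two builds of "foo".
repo = [("foo", "1.2"), ("foo", "1.10"), ("bar", "0.9")]
selected = latest_only(repo)
```

Only the packages in the resulting mapping would then be handed to the metadata generator.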
from createrepo_c.
Right, that's what I meant, it's not helpful to have all these tools with similar names :)
So if that's the case, then maybe COPR should start with using a workflow based on appstream-generator instead of appstream-builder (assuming they didn't make the same error I just did) and see if that addresses the issue?
This. And in Copr, this is on a different level of magnitude. After each build, we call createrepo_c. I.e., every 2 minutes. And if appstream-builder runs 10 minutes...
from createrepo_c.
@dralley That would make sense to at least test :-) I see two potential issues with asgen: 1) It's written in D, so compared to C has limited platform support (may not be a problem, arm64 and amd64 are well supported) and 2) It was originally written for Debian/Ubuntu, so it might have a bunch of workflow assumptions that do not jibe well at all with createrepo_c's expected workflow.
If something simple can be changed to make your life easier, I could definitely help with that though, as well as with using libappstream-compose in case you do want a more tightly integrated solution at some point (since I know very little about createrepo_c and Fedora's workflow though, I unfortunately can't give you an instantly working solution ;-D)
from createrepo_c.
So if that's the case, then maybe COPR should start with using a workflow based on appstream-generator instead of appstream-builder (assuming they didn't make the same error I just did) and see if that addresses the issue?
Spinning off to separate discussion ximion/appstream-generator#104
from createrepo_c.
Probably I overlooked something, but is there any support of merging AppStream metadata in any of libraries?
from createrepo_c.
@xsuchy By the way, I would make sure COPR gets upgraded to createrepo_c 0.20.1 for the --update performance fix. And I'd love to hear back about the impact that has if it is monitored.
I've run some tests, but sadly there's no obvious difference. I replied directly into #323.
from createrepo_c.
but is there any support of merging AppStream metadata in any of libraries
I believe both libappstream and libappstream-glib support this.
from createrepo_c.
Merging stuff is most painless on a per-component level (replace one component entirely with another). Both libraries do also support merging data between components, but that's an extremely messy thing (and leads to annoying issues later, when you want to figure out where broken metadata actually came from...). So, "replace component if the same name" is IMHO the better thing to do when merging :-)
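A minimal sketch of per-component merging, assuming each component is a dict carrying a unique "id" key. The dict layout and function name are hypothetical and do not reflect the libappstream API; the point is only that a matching component is replaced whole, never merged field by field.

```python
def merge_components(base, override):
    """Merge two component lists, replacing whole components on an id match."""
    merged = {component["id"]: component for component in base}
    for component in override:
        # Whole-component replacement: no field-level merging, so the origin
        # of every component stays traceable to exactly one source.
        merged[component["id"]] = component
    return list(merged.values())


# Hypothetical data: the second repository ships a newer "org.example.App".
repo_a = [
    {"id": "org.example.App", "summary": "Old summary"},
    {"id": "org.example.Tool", "summary": "A tool"},
]
repo_b = [{"id": "org.example.App", "summary": "New summary"}]

result = merge_components(repo_a, repo_b)
```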
from createrepo_c.
I also don't think any sophisticated merging is needed, at least if I'm not missing something. On metadata --update runs, createrepo_c either fully loads the metadata from the (new or updated) RPM (and the same can be done for AppStream metadata, even if that means extracting the cpio archive), or fully re-uses the old metadata (which would recycle even the AppStream metadata). IOW, merge is needed, but on a per-component basis (as already done in createrepo_c).
Not sure about the library API, but if there's a way to point the library at RPM filename, and get the metadata, this sounds like a perfectly valid (opt-in, both on build and runtime) RFE.
from createrepo_c.
LibAppStream can do the merging you want without any issues, however:
Not sure about the library API, but if there's a way to point the library at RPM filename, and get the metadata, this sounds like a perfectly valid (opt-in, both on build and runtime) RFE.
That isn't so simple, unfortunately. Because packagers like to split data across multiple packages, e.g. place icons in a -data package and the actual application in a different package, or using stock icons from shared icon packages (like KDE does), the AppStream metadata generator always has to look at all the packages in their entirety, which in case of appstream-generator means building a cache of things that it has already seen as well as new packages that still need to be processed.
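A minimal sketch of why a repository-wide view is needed: an icon referenced by one package may live in another, so the lookup has to run against an index built from all packages. The function names, the path heuristic, and the sample file lists are hypothetical; appstream-generator's real icon lookup is far more elaborate.

```python
def build_icon_index(package_files):
    """Index icon files across all packages: icon name -> [(package, path)]."""
    index = {}
    for package, files in package_files.items():
        for path in files:
            if "/icons/" not in path:
                continue
            filename = path.rsplit("/", 1)[-1]
            icon_name = filename.rsplit(".", 1)[0]   # strip the extension
            index.setdefault(icon_name, []).append((package, path))
    return index


def find_icon(index, icon_name):
    """Return the first (package, path) providing the icon, or None."""
    hits = index.get(icon_name)
    return hits[0] if hits else None


# Hypothetical split: the application is in "foo", its icon in "foo-data".
package_files = {
    "foo": ["/usr/bin/foo", "/usr/share/applications/foo.desktop"],
    "foo-data": ["/usr/share/icons/hicolor/48x48/apps/foo.png"],
}
index = build_icon_index(package_files)
icon = find_icon(index, "foo")
```

Looking only at the "foo" package would find no icon at all, which is the problem described above.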
from createrepo_c.