
Comments (10)

asodja commented on July 3, 2024

@lamphamTL can you maybe create a reproducer and share it with us as a GitHub project or a zip file? That would help us a lot.
More information about your setup could also help: e.g. Operating system, filesystem, AGP version, task you run when it fails, project setup etc.

from gradle.

bigdaz commented on July 3, 2024

@gradle/bt-execution a thought: since the actual workspace directory appears to be internally consistent in the failing case, is it possible that we are incorrectly using a previously calculated value for the hash of transformed/android.jar, rather than recalculating the hash of the actual contents?

Looking at the code it appears the hash is being recalculated, but I'm not sure what caching might be in place.

lptr commented on July 3, 2024

Here's a theory that seems to fit what we see. Consider the following chain of events:

  1. A build is executed that triggers the execution of our transform in question, and output contents "version 1" is produced in the $GUH/caches/transforms-4/XXX workspace directory.
  2. Something causes transforms-4/XXX to be cleaned up (but not all of the Gradle user home (GUH) caches). This can be the Gradle GitHub Action aggressively deleting entries untouched during the build (say, after a build that does not trigger the transform), or the regular GUH cleanup triggering, or some sort of other cleanup routine.
  3. Another build is executed, which correctly recognizes that the transforms-4/XXX workspace directory is missing, and proceeds to recreate it. It's important that it produces byte-for-byte different output, such as a JAR with different timestamps.
  4. Yet another build is executed that once again triggers the transform. Gradle recognizes that transforms-4/XXX is available, and proceeds to reuse it, but then the validation of the content on disk fails.

The culprit could be the CachingFileHasher used for caching file hashes during and between builds in the GUH caches. This cache is persisted to disk, so it survives even daemon restarts, and the GitHub Action copies it (along with the rest of the global caches) to every executor. So even ephemeral GitHub builds get a warm file hash cache.

The assumption is that this file hash cache retains hashes for the "version 1" output:

a) when it's deleted from disk, and
b) then when it is subsequently recreated as "version 2", the old content is not recognized as stale.

If these hold true, then build 4) above would read the "version 2" of metadata.bin from disk with the hash for the "version 2" outputs, but when it snapshotted the outputs, it would get stale hashes from the file hash cache.

We know that a) is true because LeastRecentlyUsedCacheCleanup doesn't remove entries from the file hash cache when it deletes cache content.

As for b), with the GitHub Action we know that the action itself tampers with file timestamps in the global caches. It's a bit unclear whether in the non-ephemeral case we could also somehow end up with GradleUserHomeScopeFileTimeStampInspector not recognizing a file in the global caches as changed, and returning the old timestamp.
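The chain of events above can be mimicked with a toy model. This is only a sketch, not Gradle's `CachingFileHasher`: the `ToyFileHashCache` class, the `simulate` function, and the rule that a cached entry is trusted when file size and modification time both match are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


class ToyFileHashCache:
    """Toy stand-in for a persistent file hash cache (hypothetical, not Gradle code)."""

    def __init__(self):
        # path -> (size, mtime_ns, content hash)
        self._entries = {}

    def hash_of(self, path):
        stat = os.stat(path)
        cached = self._entries.get(path)
        # Assumption: an entry is trusted when size and mtime both match.
        if cached and cached[0] == stat.st_size and cached[1] == stat.st_mtime_ns:
            return cached[2]
        digest = real_hash(path)
        self._entries[path] = (stat.st_size, stat.st_mtime_ns, digest)
        return digest


def real_hash(path):
    """Hash the bytes that are actually on disk, bypassing any cache."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def simulate():
    cache = ToyFileHashCache()
    with tempfile.TemporaryDirectory() as workspace:
        jar = os.path.join(workspace, "android.jar")

        # Build 1: "version 1" is produced and its hash is cached.
        with open(jar, "wb") as f:
            f.write(b"version-1")
        v1_stat = os.stat(jar)
        v1_hash = cache.hash_of(jar)

        # Cleanup deletes the file, but the hash cache entry survives (a).
        os.remove(jar)

        # Build 3: "version 2" has the same size but different bytes.
        with open(jar, "wb") as f:
            f.write(b"version-2")
        # Something resets the timestamp to the old value (b).
        os.utime(jar, ns=(v1_stat.st_atime_ns, v1_stat.st_mtime_ns))

        # Build 4: the cache answers without rereading the file.
        return v1_hash, cache.hash_of(jar), real_hash(jar)
```

In this model, `simulate()` returns the stale "version 1" hash from the cache even though the bytes on disk hash differently, which is the kind of mismatch that would trip a consistency check.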

TODO

  1. Add a flag to disable the consistency check to unblock the Gradle 8.7 release
  2. Add some more diagnostics to confirm assumptions about stale content in the file hash cache being the root cause of our problems.
  3. Either stop the Gradle GitHub Action from tampering with file timestamps in the global caches, or exclude the file hash cache files from the caches it carries from executor to executor.
  4. Let LeastRecentlyUsedCacheCleanup remove deleted entries from the file hash cache. It should also invalidate any relevant parts of the VFS. (As a simplification we could just drop both caches whenever a cleanup happens.)
  5. Integrate the file hash cache and the VFS more closely; make sure the relevant entries are removed from the file hash cache when a part of the VFS is invalidated. We should do this with FileContentCache as well. There should be a single mechanism inside a daemon to invalidate everything we know about a location, and it should invalidate all caches.
  6. Implement a mechanism where cache cleanup can force other daemons to drop deleted content from both the file hash cache and the VFS. It would be ideal to make sure no other daemon actually runs while cache cleanup happens.
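Items 4 and 5 could look roughly like the sketch below. Everything here is hypothetical: `LocationInvalidator`, `cleanup_workspace`, and the use of plain dictionaries keyed by path as stand-ins for the file hash cache and the VFS are assumptions for illustration, not Gradle APIs.

```python
class LocationInvalidator:
    """Hypothetical single entry point that fans invalidation out to every cache."""

    def __init__(self):
        self._caches = []

    def register(self, cache):
        # Each cache is modelled as a dict keyed by file path.
        self._caches.append(cache)

    def invalidate(self, location):
        root = location.rstrip("/")
        prefix = root + "/"
        for cache in self._caches:
            # Drop the location itself and everything below it.
            stale = [p for p in cache if p == root or p.startswith(prefix)]
            for path in stale:
                del cache[path]


def cleanup_workspace(workspace_dir, invalidator, delete=lambda path: None):
    """Delete a workspace and forget everything cached about it."""
    delete(workspace_dir)  # e.g. shutil.rmtree in a real cleanup
    invalidator.invalidate(workspace_dir)
```

Matching on the path plus a trailing separator keeps a sibling such as `transforms-4/XXXY` from being invalidated together with `transforms-4/XXX`.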

lptr commented on July 3, 2024

Actually, for the non-ephemeral case, when we have the daemon surviving between builds, it's possible the VFS itself retains the stale hashes after the cleanup. FTR, the services wrap each other thusly:

graph TD
  AssignImmutableWorkspaceStep:::highlight --> FileCollectionSnapshotter
  FileCollectionSnapshotter --> OutputSnapshotter
  OutputSnapshotter --> FileSystemAccess
  FileSystemAccess --> VirtualFileSystem
  VirtualFileSystem:::highlight --> FileHasher:::highlight

  classDef highlight stroke-width:4px

lamphamTL commented on July 3, 2024

I consistently get this issue with the task DexMergingTask, with or without daemon, both on CI and locally.
I'm using Gradle 8.6

bigdaz commented on July 3, 2024

@lamphamTL when you say "consistently", does this mean that the issue reoccurs even after you remove the offending transforms-4/XXX directory? If not, please make a full copy of your ~/.gradle directory while your project is in a bad state.

Then, some experiments to try:

  1. Run the build with a clean Gradle User Home (i.e. ./gradlew -g HOME ...)
  2. Run the build with a fresh clone of your project repository

Knowing if these experiments can reproduce the issue will help. We are having trouble getting a reliable reproducer for this issue, so any evidence you can provide may be helpful.

lamphamTL commented on July 3, 2024

@bigdaz no, removing the GUH caches did make the first build work; however, it starts failing consistently from the second build onward.

lptr commented on July 3, 2024

I've made the handling of the problem more sophisticated in:

With this change when Gradle bumps into an inconsistent workspace, it will not fail the build, but instead discard the workspace and re-run the work. It will also produce a deprecation warning with some details:

> Transform lib1.jar (project :lib) with Duplicator
The contents of the immutable workspace '...' have been modified. This behavior has been deprecated. This will fail with an error in Gradle 9.0. These workspace directories are not supposed to be modified once they are created. The modification might have been caused by an external process, or could be the result of disk corruption. The inconsistent workspace has been moved to '...', and will be recreated.
outputDirectory:
 - transformed (Directory, 7cfc7dbc0901a39cf9a5f44f7ea491b8)
   - 0 (Directory, 78af062fd74d95b8d382f5adabb136f5)
     - lib1-green.jar (RegularFile, 1ba4170a71868b572bca2e747af7a707)
   - tamper-tamper.txt (RegularFile, c2a9b87422fcb39f4f55ff182939992b)

resultsFile:
 - results.bin (RegularFile, 44ec6391aaade3532e59b1d99d292b4d)

The more sophisticated handling of the problem will be included in 8.7-rc-4 (soon to be released).
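The "discard and re-run" behavior described above can be sketched as follows. This is not Gradle's implementation: `snapshot_hash`, `use_workspace`, the `run_work` callback, and the naming of the moved-aside directory are all hypothetical, and the consistency check here hashes only relative paths and file contents.

```python
import hashlib
import os
import uuid
import warnings


def snapshot_hash(directory):
    """Hash relative paths and file contents of a directory tree in sorted order."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            digest.update(os.path.relpath(path, directory).encode())
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()


def use_workspace(workspace, expected_hash, run_work):
    """Reuse a consistent workspace; otherwise move it aside and re-run the work.

    Returns (hash of the workspace contents, whether the work was executed).
    """
    if os.path.isdir(workspace):
        if snapshot_hash(workspace) == expected_hash:
            return expected_hash, False
        # Inconsistent: keep the evidence, warn, and fall through to recreate.
        moved_to = f"{workspace}-inconsistent-{uuid.uuid4().hex}"
        os.rename(workspace, moved_to)
        warnings.warn(
            f"The contents of the immutable workspace '{workspace}' have been "
            f"modified. It has been moved to '{moved_to}' and will be recreated."
        )
    os.makedirs(workspace)
    run_work(workspace)
    return snapshot_hash(workspace), True
```

Moving the inconsistent directory aside instead of deleting it keeps the tampered content available for later inspection, mirroring the "has been moved to '...'" wording in the warning.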

I moved this issue to 8.8-rc-1, as we still need to figure out what causes the false positives, and come up with a fix.

mustafa-antonio commented on July 3, 2024

@lptr I encountered this error today on my GitLab CI server after updating to Gradle 8.6. I'm using a remote Gradle cache shared between branches, so the Gradle cache directory changes every time.

It happens consistently for all branches that fetched the cache created by the main branch, so maybe that's a hint for reproducing it.

My local machine hasn't been affected so far.

Not sure if it's relevant, but I'm also using a custom GRADLE_ENCRYPTION_KEY on my CI.

lptr commented on July 3, 2024

Could you please try using 8.7-rc-4? It should have the improved behavior where we don't fail the build anymore.
