Code Monkey home page Code Monkey logo

Comments (4)

dbnicholson avatar dbnicholson commented on July 3, 2024

While I think it's nice to use the timestamp of the destination file as the source for the If-Modified-Since header, it loses the URL. It would be better to keep a map of URL to timestamp like is done with ETags. That way if the URL changes, the timestamp won't be found and no If-Modified-Since header will be added.

from gradle-download-task.

dbnicholson avatar dbnicholson commented on July 3, 2024

An extension of this problem is that if you set useETag, then it won't be used because the URL isn't in the cached JSON and then it's operating entirely on Last-Modified. So, even though the ETag of the new URL doesn't match the previous download, it's essentially being ignored.

from gradle-download-task.

michel-kraemer avatar michel-kraemer commented on July 3, 2024

Thanks for reporting this. I can see that this behaviour seems odd but everything works as it's supposed to.

I understand that saving the URL in a map like it's done with the ETag would solve your particular example, but I'm not sure if it would address the root of the problem. It is true that the plugin uses the file's timestamp as a fallback if it cannot find an ETag for a given URL. However, if it can find one, it also skips downloading.

Try this: Enable useETag and then download welcomeScreenVersion 6.44.0 first. Change the version to 6.46.0 and download this too. The plugin will now know both ETags. If you change the version back to 6.44.0, you will see the exact same behaviour you're seeing in your example: the task will be skipped.

The problem is even worse here, by the way. If you do it the other way around, the plugin will not be able to know which of the two ETags is newer and will not overwrite version 6.46.0 with 6.44.0 like it would do if useETag was disabled and it only compared the timestamps.

The only way to really solve the root cause would be to know if the task's inputs have changed since the last call, and if they had, skip any up-to-date checks and redownload the file. This might become possible in the future when Gradle's configuration cache becomes enabled by default, but it is certainly not possible now.

From my point of view, if the user changes the URL of the file to download, they should manually remove the destination file or call the clean task. Another (more preferable) possibility would be to include the version number in the destination file name. If you need the filename in a subsequent task, you can always get it through downloadWelcomeScreen.dest or downloadWelcomeScreen.outputs.

Let me know if this helps. I'm open to feedback and ideas!

P.S.: I'm currently out of office with limited Internet access. It might take me some days to reply.

from gradle-download-task.

dbnicholson avatar dbnicholson commented on July 3, 2024

I understand that saving the URL in a map like it's done with the ETag would solve your particular example, but I'm not sure if it would address the root of the problem. It is true that the plugin uses the file's timestamp as a fallback if it cannot find an ETag for a given URL. However, if it can find one, it also skips downloading.

This just seems like a bug in the cache implementation. It's only storing the URL but what matters is the ETag of the saved file. That's why you use the timestamp of the saved file for the Last-Modified handling, right? Without the saved file path, you can't make an accurate decision about when to invalidate the cache entries since you don't have enough information. You have the previous ETag of a URL, but you don't know if the file your using it to make decisions about was created from one of those URLs or not. Knowing the ETag of a URL at some point in time doesn't actually give you any useful information if it's not associated to a file you have locally.

I think the ETag cache should be keyed by the saved path:

{
  "path/to/downloaded/file": {
    "URL": "https://someurl",
    "ETag": "\"someetag\""
  }
}

This has 2 effects:

  • A strong connection is maintained between the saved file and the parameters used to form it.
  • The previous cache entry is automatically deleted when you overwrite that paths entry it after a new download. You can no longer accidentally get invalid cache data for the file at that path.

From my point of view, if the user changes the URL of the file to download, they should manually remove the destination file or call the clean task. Another (more preferable) possibility would be to include the version number in the destination file name. If you need the filename in a subsequent task, you can always get it through downloadWelcomeScreen.dest or downloadWelcomeScreen.outputs.

That's fine if the person making the changes and doing the builds is always the same person, but it breaks down otherwise. What if my CI system has the download cached? What if another user in my team has the download cached and they didn't see that the URL changed? The only reliable way to handle that is to not cache at all or to always clean, which are effectively the same thing.

I'll change my tasks so they include the version number in the saved filename. I was trying to avoid it before so that when the code references the asset it can always use the generic name, but I don't actually need to reference it from the code right now.

To circle back to one thing I said in the issue description - I think there should be an option to ignore Last-Modified if you're using an ETag and the server supplies one. That would put it in line with the HTTP behavior for If-Modified-Since:

When used in combination with If-None-Match, it is ignored, unless the server doesn't support If-None-Match.

Which makes sense. If I say "I have a file that has this identifier, tell me if the one you have matches", then I don't care if it's older or newer. I only care whether the content is the same. It's like what a file archiver would do with checksums.

from gradle-download-task.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.