Comments (11)

JeanFred avatar JeanFred commented on July 18, 2024

Sounds reasonable.

Currently the output file name is given by the downloading methods (which get it from make_thumbnail_name). This should be refactored so that the downloading methods only return the contents.

This way, we could check whether the file to download is already present, and skip it if so.
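
A minimal sketch of that idea, assuming the output path is known before the download starts. `save_if_missing` and `fetch_contents` are hypothetical names for illustration, not part of commonsdownloader:

```python
import os


def save_if_missing(output_path, fetch_contents):
    """Write fetch_contents() to output_path unless the file already exists.

    fetch_contents is a hypothetical callable standing in for a downloading
    method that only returns the file contents (bytes).
    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(output_path):
        # The file is already on disk: skip the download entirely.
        return False
    contents = fetch_contents()
    with open(output_path, "wb") as handle:
        handle.write(contents)
    return True
```

Keeping the naming and the writing outside the downloading method is what makes the existence check possible before any network request.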

from commonsdownloader.

JeanFred avatar JeanFred commented on July 18, 2024

@symac Do you think this behaviour should be the default (in which case I’ll implement a --force flag) or optional (which would call for a --skip flag)?

(Any suggestions on what to call the flag, too? :) )

symac avatar symac commented on July 18, 2024

@JeanFred It should be the default, I think. And why not call the flag --overwrite?

JeanFred avatar JeanFred commented on July 18, 2024

Thinking more about this, I can see two ways to go.

  1. look up whether the file to be saved already exists on disk
  2. implement a local cache of the file names, based on the list.

1/ would necessitate some refactoring (see comment above) but is fairly straightforward.
2/ would make execution faster, but would require more development, and cache-flushing mechanisms can be tricky.

Thoughts?

symac avatar symac commented on July 18, 2024

The first solution should be easier to implement, and for a downloading tool where every download takes one second or more, I doubt we would notice an extra check on the disk.

JeanFred avatar JeanFred commented on July 18, 2024

This is now resolved per adb81dd, which implements a local cache.

The problem with the first solution is that the name of the file on disk can only be determined after download, as the extension is retrieved from the HTTP response. Making this possible would need quite some refactoring.
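
To illustrate why the name is only known late, here is a rough sketch that derives the extension from a response's Content-Type value using the standard `mimetypes` module. This is an assumption about the general approach, not the actual commonsdownloader code, and `name_with_extension` is a hypothetical helper:

```python
import mimetypes


def name_with_extension(base_name, content_type):
    """Append an extension guessed from an HTTP Content-Type value.

    Hypothetical helper: the final name depends on a header that only
    exists once the server has answered.
    """
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    extension = mimetypes.guess_extension(mime_type) or ""
    return base_name + extension
```

Since `content_type` comes from the response, an "is it already on disk?" check cannot run before the request without knowing the extension some other way.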

On the other hand, local caching sounded like fun and proved not so difficult to implement.

(Still needs a CLI argument to bypass the local cache I guess)

JeanFred avatar JeanFred commented on July 18, 2024

@symac, can you confirm this works as expected?

symac avatar symac commented on July 18, 2024

@JeanFred Works fine. What about the argument to skip this?

JeanFred avatar JeanFred commented on July 18, 2024

@symac This is not as easy as it sounds.

(I renamed the system introduced in adb81dd to "manifest" for better clarity; see 9744785.)

The current behaviour is

  • CommonsDownloader loads the local manifest (if it exists)
  • CommonsDownloader considers a file name:
    • If the file name is in the manifest, it is skipped
    • If not, the file is downloaded and its name is appended at the end of the manifest
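
The behaviour described above could be sketched roughly as follows. The manifest file name, its one-name-per-line format, and the function names are assumptions for illustration, not the actual implementation from adb81dd:

```python
import os

MANIFEST_NAME = ".manifest"  # hypothetical file name


def load_manifest(output_dir):
    """Return the set of file names recorded in the local manifest, if any."""
    path = os.path.join(output_dir, MANIFEST_NAME)
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def download_all(file_names, output_dir, download):
    """Download every name not yet in the manifest, recording each success.

    download is a hypothetical callable taking a file name.
    Returns the list of names that were actually downloaded.
    """
    known = load_manifest(output_dir)
    downloaded = []
    path = os.path.join(output_dir, MANIFEST_NAME)
    with open(path, "a", encoding="utf-8") as manifest:
        for name in file_names:
            if name in known:
                continue  # already in the manifest: skip
            download(name)
            manifest.write(name + "\n")  # append at the end of the manifest
            known.add(name)
            downloaded.append(name)
    return downloaded
```

Running it twice on the same list downloads nothing the second time, because every name is found in the manifest.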

What should the force flag do?

  • Load the manifest but do not look up file names in it
    • This means that when already-downloaded files get downloaded again, the manifest will contain the file name twice
  • Clearing the local manifest
    • The same output folder can be used for several downloads, and hence the same manifest. Clearing the manifest would also delete the files
  • Selectively clearing the files to be downloaded from the manifest, not the other ones
    • The files to be downloaded are loaded progressively, much later than the manifest is read.

Ideas ? :-)

JeanFred avatar JeanFred commented on July 18, 2024

@symac Any thoughts on this?

symac avatar symac commented on July 18, 2024

@JeanFred finding this issue at the bottom of my unread mailbox :)

I would add a fourth option:

  • Load the manifest, do not look up names in it, and append a name to the manifest only if it was not already in it
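
A rough sketch of this fourth option, assuming a one-name-per-line manifest file. `process_names`, `download`, and the `force` parameter are hypothetical names, not existing commonsdownloader code:

```python
import os


def process_names(file_names, manifest_path, download, force=False):
    """Download names, honouring the manifest unless force is set.

    With force=True the manifest is not consulted for skipping, but a name
    is only appended when it is not already recorded, so no duplicates.
    Returns the list of names that were actually downloaded.
    """
    known = set()
    if os.path.exists(manifest_path):
        with open(manifest_path, encoding="utf-8") as handle:
            known = {line.strip() for line in handle if line.strip()}

    fetched = []
    with open(manifest_path, "a", encoding="utf-8") as manifest:
        for name in file_names:
            if name in known and not force:
                continue  # normal mode: skip what the manifest already lists
            download(name)
            fetched.append(name)
            if name not in known:
                manifest.write(name + "\n")
                known.add(name)
    return fetched
```

The manifest is still read once at start-up, so this does not depend on knowing the full list of files to download in advance.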

Does this sound feasible? [I have no need to use the file at the moment, just feeling that it is not right to leave questions in issues unanswered :)]
