Comments (11)
Sounds reasonable.
Currently the output file name is given by the downloading methods (which get it from make_thumbnail_name). This should be refactored so that the downloading methods only return the contents.
This way, we could test the presence of the file to download, and skip, if it is already there.
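A minimal sketch of that idea, assuming the output path is already known before the download starts. The names `save_unless_present` and `fetch_contents` are hypothetical and not part of CommonsDownloader:

```python
import os


def save_unless_present(output_path, fetch_contents):
    """Write the fetched contents to output_path unless the file exists.

    fetch_contents is a hypothetical callable that returns the file
    contents as bytes; it is only called when a download is needed.
    Returns True if the file was written, False if it was skipped.
    """
    if os.path.exists(output_path):
        # The file is already on disk, so skip the download entirely.
        return False
    contents = fetch_contents()
    with open(output_path, "wb") as handle:
        handle.write(contents)
    return True
```

Passing the download step as a callable keeps the existence check separate from the networking code, which is the separation the refactoring above is after.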
from commonsdownloader.
@symac Do you think this behaviour should be the default (in which case I'll implement a --force flag) or optional (which would call for a --skip flag)?
(Any suggestions on what to call the flag, too? :) )
@JeanFred It should be the default, I think, yes. And why not call the flag --overwrite?
Thinking more about this, I can see two ways to go:
1. Look up whether the file to be saved already exists on disk.
2. Implement a local cache of the file names, based on the list.
Option 1 would necessitate some refactoring (see comment above) but is fairly straightforward.
Option 2 would make execution faster, but would require more development, and cache-flushing mechanisms can be tricky.
Thoughts?
The first solution should be easier to implement, and for a downloading tool where every download takes a second or more, I doubt we would notice an extra check on the disk.
This is now resolved per adb81dd, which implements a local cache.
The problem with the first solution is that the name of the file on disk can only be determined after the download, as the extension is retrieved from the HTTP response. Making this possible would need quite some refactoring.
On the other hand, local caching sounded like fun and proved not so difficult to implement.
(Still needs a CLI argument to bypass the local cache I guess)
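To illustrate why the on-disk name is only known after the download, here is a rough sketch that derives the extension from a response header. It assumes the extension comes from the `Content-Type` header, which the thread does not confirm; `name_from_content_type` is a hypothetical helper, not CommonsDownloader code:

```python
import mimetypes


def name_from_content_type(base_name, content_type):
    """Build a file name from a base name and a Content-Type header value.

    Assumption: the extension is derived from the MIME type of the HTTP
    response, so it cannot be computed before the response arrives.
    """
    # Drop any parameters such as "; charset=..." before the lookup.
    mime_type = content_type.split(";")[0].strip().lower()
    # guess_extension returns None for unknown types; fall back to no extension.
    extension = mimetypes.guess_extension(mime_type) or ""
    return base_name + extension
```

Since the header is only available once the server has answered, a pre-download existence check would need another way to predict the extension.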
@symac, can you confirm this works as expected?
@JeanFred It works fine. What about the argument to skip this?
@symac This is not as easy as it sounds.
(I renamed the system introduced in adb81dd to « manifest » for better clarity; see 9744785.)
The current behaviour is:
- CommonsDownloader loads the local manifest (if it exists)
- CommonsDownloader considers a file name:
  - If the file name is in the manifest, it is skipped
  - If not, the file is downloaded and its name is appended at the end of the manifest
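The behaviour described above can be sketched as follows. This is an illustration under assumptions, not the code from adb81dd or 9744785: the manifest is assumed to be a plain text file with one file name per line, and `load_manifest`, `process` and the `download` callable are hypothetical names:

```python
import os


def load_manifest(manifest_path):
    """Return the set of file names recorded in the manifest, if it exists."""
    if not os.path.exists(manifest_path):
        return set()
    with open(manifest_path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def process(file_names, manifest_path, download):
    """Download each name that is not in the manifest, then record it."""
    known = load_manifest(manifest_path)
    downloaded = []
    with open(manifest_path, "a", encoding="utf-8") as manifest:
        for name in file_names:
            if name in known:
                # Already recorded in the manifest: skip it.
                continue
            download(name)  # hypothetical callable doing the actual download
            manifest.write(name + "\n")
            known.add(name)  # also guards against duplicates within one run
            downloaded.append(name)
    return downloaded
```

File names are consumed one at a time here, which matches the point that the manifest is read once up front while the names to download arrive progressively.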
What should the force flag do?
- Load the manifest but do not look up file names in it
  - This means that when already-downloaded files get downloaded again, the manifest will contain the file name twice
- Clear the local manifest
  - The same output folder can be used for several downloads, and hence the same manifest. Clearing the manifest would also delete the files
- Selectively clear the files to be downloaded from the manifest, not the other ones
  - The files to be downloaded are loaded progressively, way later than the manifest reading.
Ideas? :-)
@symac Any thoughts on this?
@JeanFred I just found this issue at the bottom of my unread mailbox :)
I would add a fourth option:
- Load the manifest, do not look up names in it, and append a name to the manifest only if it is not already in it
Does this sound feasible? [I have no need to use the tool at the moment, just feeling that it is not correct behaviour to leave questions in issues unanswered :)]
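A sketch of what this fourth option could look like, assuming a plain text manifest with one file name per line. The function `download_all`, its `force` parameter and the `download` callable are hypothetical and only illustrate the proposal:

```python
import os


def download_all(file_names, manifest_path, download, force=False):
    """Download files, optionally ignoring the manifest when deciding to skip.

    With force=True every file is downloaded again, but a name is only
    appended to the manifest if it is not already recorded there.
    """
    known = set()
    if os.path.exists(manifest_path):
        with open(manifest_path, encoding="utf-8") as handle:
            known = {line.strip() for line in handle if line.strip()}
    fetched = []
    with open(manifest_path, "a", encoding="utf-8") as manifest:
        for name in file_names:
            if name in known and not force:
                continue  # normal mode: skip files listed in the manifest
            download(name)  # hypothetical callable doing the actual download
            fetched.append(name)
            if name not in known:
                # Only new names are appended, so the manifest stays free
                # of duplicates even when downloads are forced.
                manifest.write(name + "\n")
                known.add(name)
    return fetched
```

The manifest is still loaded in forced mode, but only to decide what to append, so names are handled one by one and nothing has to be cleared up front.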