Comments (4)

hartator commented on August 22, 2024

It appears that the page wasn't modified between 20110827210108 and 20121007234751, then became a 404 at 20130110230837. The Wayback Machine returns the first capture of a page it encountered if the page wasn't modified afterwards.

Ref: http://web.archive.org/cdx/search/cdx?url=http://tohokinemakan.jp/index.html&gzip=false
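To make the "wasn't modified" point concrete, here is a minimal Python sketch of reading CDX output and collapsing consecutive captures that share a content digest. It assumes the default space-separated CDX layout (urlkey, timestamp, original, mimetype, statuscode, digest, length). The timestamps come from this thread; the digests and lengths in `SAMPLE` are made-up placeholders, and `parse_cdx` and `distinct_versions` are hypothetical helper names, not part of wayback_machine_downloader.

```python
from typing import List, NamedTuple


class Capture(NamedTuple):
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str


def parse_cdx(text: str) -> List[Capture]:
    """Parse space-separated CDX lines, skipping anything malformed."""
    captures = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7:
            captures.append(Capture(*fields))
    return captures


def distinct_versions(captures: List[Capture]) -> List[Capture]:
    """Keep only the first capture of each run of identical digests."""
    versions: List[Capture] = []
    # 14-digit timestamps sort chronologically as plain strings.
    for cap in sorted(captures, key=lambda c: c.timestamp):
        if not versions or versions[-1].digest != cap.digest:
            versions.append(cap)
    return versions


# Illustrative rows only: digests and lengths are placeholders, not real data.
SAMPLE = """\
jp,tohokinemakan)/index.html 20110827210108 http://tohokinemakan.jp:80/index.html text/html 200 DIGEST_A 1000
jp,tohokinemakan)/index.html 20121007234751 http://tohokinemakan.jp:80/index.html text/html 200 DIGEST_A 1000
jp,tohokinemakan)/index.html 20130110230837 http://tohokinemakan.jp:80/index.html text/html 404 DIGEST_B 500
"""

for cap in distinct_versions(parse_cdx(SAMPLE)):
    print(cap.timestamp, cap.statuscode)
```

With these placeholder rows, the two unchanged 200 captures collapse into one entry, followed by the 404 capture.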

We did have a bug with the 2013 404. I've pushed a fix in 1.0.0. If you update to 1.0.0, it should show up with the -a/--all flag.

➜  gem git:(master) ✗ bin/wayback_machine_downloader http://tohokinemakan.jp/index.html -l -a
[
{"file_url":"http://tohokinemakan.jp:80/index.html","timestamp":20130110230837,"file_id":"index.html"},
]

from wayback-machine-downloader.

wangqr commented on August 22, 2024

I have upgraded to version 1.0.0. It seems that I need the --all option for pages with a 200 response as well.
For example: http://web.archive.org/cdx/search/cdx?url=http://tohokinemakan.jp/information.html&gzip=false
This page has 2 different versions (with different content), and both have a 200 response code. But without --all, the older one is fetched.

$ wayback_machine_downloader http://tohokinemakan.jp/information.html -a -l
[
{"file_url":"http://tohokinemakan.jp:80/information.html","timestamp":20121008235908,"file_id":"information.html"},
]
$ wayback_machine_downloader http://tohokinemakan.jp/information.html -l
[
{"file_url":"http://tohokinemakan.jp:80/information.html","timestamp":20100815073403,"file_id":"information.html"},
]
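The behaviour I would expect can be sketched in Python as follows. This is only an illustration, not the gem's actual code: `latest_snapshot` is a hypothetical helper, and it assumes that --all simply means "also consider non-200 captures". The timestamps are the ones from this thread.

```python
from typing import Iterable, Optional, Tuple

Snapshot = Tuple[str, str]  # (14-digit timestamp, HTTP status code)


def latest_snapshot(captures: Iterable[Snapshot],
                    include_all: bool = False) -> Optional[Snapshot]:
    """Return the newest capture, ignoring non-200 ones unless include_all."""
    eligible = [c for c in captures if include_all or c[1] == "200"]
    # 14-digit timestamps compare chronologically as plain strings.
    return max(eligible, key=lambda c: c[0], default=None)


information = [("20100815073403", "200"), ("20121008235908", "200")]
index = [("20110827210108", "200"), ("20130110230837", "404")]

# Both captures are 200, so the newest should win with or without the flag.
print(latest_snapshot(information))
print(latest_snapshot(information, include_all=True))

# The 404 should only be selected when non-200 captures are included.
print(latest_snapshot(index))
print(latest_snapshot(index, include_all=True))
```

Under that assumption, information.html should resolve to 20121008235908 in both modes, while index.html should resolve to the 2013 capture only with the flag set.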

hartator commented on August 22, 2024

Try updating to 1.1.2; it should download the correct latest version of each file with or without the --all flag. It can also now parse snapshot pages more deeply if you have a large website.

wangqr commented on August 22, 2024

Fixed in 7eedc1a.
