Comments (11)

detrout avatar detrout commented on September 2, 2024

Running a mirror of the linked FTP site.

from datasets.

as-com avatar as-com commented on September 2, 2024

FileZilla says there is 5.1 TB of data in there

detrout avatar detrout commented on September 2, 2024

Thanks for the warning, @as-com. I relaunched on a server with 9 TB of scratch space, though I'll need to find a better home in a few weeks. I suppose worst case I could buy a hard disk or two.

detrout avatar detrout commented on September 2, 2024

Overnight it only transferred 13 GB. I'm trying to increase the number of parallel requests, though that's generating warnings.

Dear Scientists, please consider including bittorrent links for your public data.

detrout avatar detrout commented on September 2, 2024

Have ~690G so far

detrout avatar detrout commented on September 2, 2024

Is there a slack team? Could I join?

mxplusb avatar mxplusb commented on September 2, 2024

@nickrsan can you send @detrout an invite to slack? thanks!

ethanwhite avatar ethanwhite commented on September 2, 2024

From the README it looks like we probably don't need most of this 5 TB. In summary I think we need ghcnd-all.tar.gz and the metadata .txt files, which is only ~3 GB. I think most of the directories are multiple uncompressed copies of the data in ghcnd-all.tar.gz.

The grid directory is described as "Directory with the GHCN-Daily gridded dataset known as HadGHCND", so it couldn't hurt to grab it as well (~5 GB).

ethanwhite avatar ethanwhite commented on September 2, 2024

Actually, like the main directory, we just need ghcnd.grid.tar.gz and the README from the grid directory, so < 1 GB. I now have copies of everything I mentioned downloaded.
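For anyone scripting a smaller mirror, the selection described in these two comments could be sketched roughly as below. This is only an illustration: the function name `select_needed` and the `WANTED` table are hypothetical, the patterns are guesses based on the file names mentioned in this thread, and the input is assumed to be a plain list of paths relative to the FTP root.

```python
from fnmatch import fnmatchcase
import posixpath

# Hypothetical table: directory (relative to the FTP root) -> file name patterns.
# The patterns are assumptions taken from the names mentioned in this thread.
WANTED = {
    "": ["ghcnd-all.tar.gz", "*.txt"],          # main archive plus metadata .txt files
    "grid": ["ghcnd.grid.tar.gz", "README*"],   # gridded archive plus its README
}

def select_needed(paths):
    """Return only the paths whose directory and file name match WANTED."""
    keep = []
    for path in paths:
        directory, name = posixpath.split(path)
        patterns = WANTED.get(directory, [])
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            keep.append(path)
    return keep
```

Matching on the directory first keeps `*.txt` from also picking up text files inside the large uncompressed subdirectories.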

detrout avatar detrout commented on September 2, 2024

Well, I have all 4.7 TB, though I need to figure out how to update the daily portions of the mirror.

detrout avatar detrout commented on September 2, 2024

I've been occasionally rerunning mirror to update the archive.

I just took a zfs snapshot so I can easily see what's changed between mirrors.
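As a rough illustration of comparing two copies of the mirror (not the zfs-based approach used here), one could index each tree by file size and modification time and diff the two indexes. The function names `index_tree` and `diff_indexes` are hypothetical, and the sketch assumes both copies are reachable as ordinary directories.

```python
import os

def index_tree(root):
    """Map each file's path (relative to root) to (size, mtime in whole seconds)."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            index[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return index

def diff_indexes(old, new):
    """Return sorted lists of added, removed, and changed relative paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

Comparing size and mtime is cheap but can miss edits that preserve both; hashing file contents would be more thorough at the cost of reading several terabytes.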
