Comments (5)

steadyfish commented on September 18, 2024

Hi @masalmon,

Did you get this error while trying to download any other data?

This Indian Railways dataset seems quite big, and the way the data.gov.in API is set up, it allows fetching only 100 records per API call. The fetch_data() function was written to make multiple API calls in order to download the entire dataset. To limit this, I have added a max_obs parameter (default 500) to fetch_data(). It caps the number of API calls being made (500 / 100 = 5 calls in this case). This could perhaps resolve the error you are getting.
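The paging arithmetic above can be sketched roughly as follows. This is only an illustration, not the package's actual code: fetch_page() is a hypothetical stand-in for a single API call, the fake dataset size n_total is made up, and page_size = 100 mirrors the per-call limit mentioned above.

```r
# Hypothetical stand-in for one API call (NOT part of ogdindiar):
# returns up to `limit` rows starting after `offset`, from a pretend
# dataset that holds `n_total` rows.
fetch_page <- function(offset, limit, n_total = 1234) {
  if (offset >= n_total) return(data.frame(id = integer(0)))
  data.frame(id = seq(offset + 1, min(offset + limit, n_total)))
}

# Sketch of a paged download capped by max_obs (assumed logic).
fetch_all <- function(max_obs = 500, page_size = 100, n_total = 1234) {
  pages <- list()
  offset <- 0
  while (offset < max_obs) {
    # Never ask for more rows than the cap still allows.
    limit <- min(page_size, max_obs - offset)
    page <- fetch_page(offset, limit, n_total)
    if (nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    offset <- offset + nrow(page)
    # A short page means the dataset has no more rows.
    if (nrow(page) < limit) break
  }
  if (length(pages) == 0) return(data.frame(id = integer(0)))
  do.call(rbind, pages)
}

# Number of calls implied by the cap: 500 / 100 = 5.
n_calls <- ceiling(500 / 100)
res <- fetch_all(max_obs = 500)
```

With max_obs larger than the dataset, the loop in this sketch stops at the first short or empty page instead of making further calls.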

Could you try again after re-installing this package?

from ogdindiar.

maelle commented on September 18, 2024

Thanks for being so reactive! 👍

100 is a very small limit. 😞 With the API of the OpenAQ platform, for which I've written an R package, I've been luckier than you: the limit is 1000, they do paging, and you can get the total number of measurements, so you know how many calls you need to make. In your case, you have to do it "blindly" because the API doesn't return the number of lines in the original file, what a pity!

No, I didn't get the error with the other datasets I had tried. They were much smaller.

I have installed the new version and I got this error

lala <- fetch_data("b46200c1-ca9a-4bbe-92f8-b5039cc25a12", max_obs=70000)
Error in function (type, msg, asError = TRUE)  : 
  Unknown SSL protocol error in connection to data.gov.in:443

Then I did it a second time and got a new error

lala <- fetch_data("b46200c1-ca9a-4bbe-92f8-b5039cc25a12", max_obs=70000)
Error in function (type, msg, asError = TRUE)  : 
  SSL read: error:00000000:lib(0):func(0):reason(0), errno 10053

I tried with a limit closer to the number of lines in the timetable (69007)

lala <- fetch_data("b46200c1-ca9a-4bbe-92f8-b5039cc25a12", max_obs=69010)
Error in function (type, msg, asError = TRUE)  : 
  Failed to connect to data.gov.in port 443: Timed out

Is the data too big? It's quite a limitation of the API, ah! But I guess I could still use it if I used the filter argument and queried only the things that interest me (like all trains from Hyderabad). In this case, I really wanted the whole thing.

I have two suggestions (because some tables will be bigger than 100 or 500 lines without being as big as the train timetable 😄 ):

  • having a verbose option so when it is TRUE you can say "querying observations X to Y" (so the user doesn't get desperate and knows something is going on)
  • If the calls stop because max_obs is reached, but not because all data was retrieved (in my case if I set 500 as the max_obs I get no error and I cannot know whether there are more lines in the original table), then give a warning, e.g. "There were more observations than this, try again with a higher max_obs".
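To make the two suggestions concrete, here is a rough sketch of how they could look. It is only an assumption about one possible design, not the package's code: fetch_page() is a hypothetical stand-in for one API call on a made-up dataset of n_total rows, and the verbose argument does not exist in the package as far as this thread shows. Since the API does not report the total number of lines, the sketch probes for a single extra row once max_obs is reached, which costs one additional call.

```r
# Hypothetical stand-in for one API call (NOT part of ogdindiar).
fetch_page <- function(offset, limit, n_total = 1234) {
  if (offset >= n_total) return(data.frame(id = integer(0)))
  data.frame(id = seq(offset + 1, min(offset + limit, n_total)))
}

# Sketch: paged download with progress messages and a truncation warning.
fetch_all_verbose <- function(max_obs = 500, page_size = 100,
                              verbose = FALSE, n_total = 1234) {
  pages <- list()
  offset <- 0
  exhausted <- FALSE
  while (offset < max_obs) {
    limit <- min(page_size, max_obs - offset)
    if (verbose) {
      message("querying observations ", offset + 1, " to ", offset + limit)
    }
    page <- fetch_page(offset, limit, n_total)
    pages[[length(pages) + 1]] <- page
    offset <- offset + nrow(page)
    # A short page means all data was retrieved.
    if (nrow(page) < limit) {
      exhausted <- TRUE
      break
    }
  }
  # Stopped because of max_obs: probe one more row to see if data remains.
  if (!exhausted && nrow(fetch_page(offset, 1, n_total)) > 0) {
    warning("There were more observations than this, ",
            "try again with a higher max_obs")
  }
  do.call(rbind, pages)
}

partial <- suppressWarnings(fetch_all_verbose(max_obs = 500))
```

Calling fetch_all_verbose(max_obs = 500, verbose = TRUE) in this sketch prints one message per call and ends with the warning, because the pretend dataset holds more than 500 rows.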

Thanks again for your help and your package!

steadyfish commented on September 18, 2024

Sure @masalmon, I'll incorporate your suggestions. :)

maelle commented on September 18, 2024

Cool, thank you!

I was also thinking that your package needs use cases. The open data platform is a goldmine! Maybe in the coming weeks I'll do something with the trains (e.g. querying all trains from a city and making a map of them). I'm sure it would motivate people to use the package and the data. 😄 And then you could add cool pictures/GIFs made from the data to the README as a teaser. 😆

steadyfish commented on September 18, 2024

Sounds good, thanks! :)
