Code Monkey home page Code Monkey logo

Comments (7)

sckott avatar sckott commented on August 22, 2024

thanks for this @mbojan

Agree, would be good to allow writing to disk. Definitely interested in a PR

I'm not sure I'm sold on passing in any arbitrary function to do the writing though given that it's not likely to be clear to the user what data format they are passing to a write function, and so what function they'd need to use. Perhaps it makes more sense to have a few options for users? and document those so they are clear how to use them. E.g., looks like we have raw XML as text option, and parsed XML, we could have a set of functions as options for users

It would be nice to just write to disk in the GET call using e.g., httr::write_disk(), but we do need to grab the resumptiontoken from the result, so that seems to not be an option

from oai.

mbojan avatar mbojan commented on August 22, 2024

Great.

General options for dumping that come to my mind:

  1. Writing raw XML text to file or a connection (http://www.rdocumentation.org/packages/base/functions/connections) in general
  2. Writing to a DB
  3. If as != "raw" perhaps creating rds files for each page (?)

Options (1) and (3) seem rather straightforward.

For option (2) I am not sure whether it will be feasible because of plethora of possibilities how this could be done: different DB layouts, table and field names, writing semi-parsed results etc....

Perhaps the best option is to leave the possibility to inject any dumper function with a couple of predefined ones (e.g. like the ones above)?

from oai.

sckott avatar sckott commented on August 22, 2024

As a start, I think we should do 1 and 3, then try to work in 2 later. Sound okay? we can open a separate issue for writing to a DB

Perhaps the best option is to leave the possibility to inject any dumper function with a couple of predefined ones (e.g. like the ones above)?

That sounds good.

from oai.

mbojan avatar mbojan commented on August 22, 2024

Actually I need the DB functionality first. I will write a dumper which will be writing two fields: identifier and raw XML with the content. It is rather universal I guess and also an illustration how it could be done.

Any suggestions how to document it? while_oai is not exported right now and dumpers do not require modifying other functions as arguments can be passed through ....

  1. Should we export while_oai?
  2. Add dumper-related arguments to all list_records, list_identifiers and so on?
  3. Write dumpers in a separate file and explain there how they can be used?

I like (3).

What do you think?

from oai.

mbojan avatar mbojan commented on August 22, 2024

OK @sckott, I think I'm ready for a PR (branch dumpers in my fork). However, I made two more modifications that might be useful. At this moment all three are in separate branches rebased onto your master.

  1. Reading XMLs safely. I wrote a wrapper over xml2::read_xml that can catch some errors. In particular if there is an error about illegal characters in downloaded XML there is a restart that removes them. (I encountered this with working with my OAI-PMH service).
  2. See #10 . I slightly modified handle_errors (now as handle_errors2) so that it takes parsed XML as input, not raw text. XML is alread parsed anyway so there is no need to do it twice.

Ideally I would like to submit a PR of all three with dumpers rebased on top of (1) and (2). I need these anyway.

from oai.

sckott avatar sckott commented on August 22, 2024

sorry, I was on vacation away from computer for a few days.

Any suggestions how to document it?

A combination of 2 and 3. Each function that can use these parameters should have documentation for those parameters. Also, create a separate man file to go into more details about the dumpers, and can link to from each of the main functions (e.g., list_records())

from oai.

sckott avatar sckott commented on August 22, 2024

Ideally I would like to submit a PR of all three with dumpers rebased on top of (1) and (2). I need these anyway.

I prefer smaller PR's, so 1 and 2 separately or together, then the dumpers, would be ideal

from oai.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.