mbojan commented on August 22, 2024

I'm on it.

from oai.

mbojan commented on August 22, 2024

Yet another option for getting the `resumptionToken` is to use `xml2::read_html`. It seems to be much more tolerant of malformed files (including illegal characters, which it appears to drop).

It should not be used instead of `read_xml`, though, because, among other things, it converts all XML tag names to lower case. But it seems OK if we are only interested in getting the token.
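A minimal sketch of that token-only fallback, assuming the response body is already available as a string. `get_token()` is a hypothetical helper name, not part of the package. It tries the strict `read_xml` first and only falls back to `read_html` when that fails, matching the lower-cased tag name in the fallback branch.

```r
library(xml2)

# Hypothetical helper: extract the resumptionToken from a raw OAI-PMH
# response body, returning NULL when there is none (or it is empty).
get_token <- function(raw_text) {
  doc <- tryCatch(read_xml(raw_text), error = function(e) NULL)
  if (!is.null(doc)) {
    # Strict parse succeeded; local-name() sidesteps the OAI default namespace
    node <- xml_find_first(doc, "//*[local-name()='resumptionToken']")
  } else {
    # Malformed XML: the HTML parser recovers, but lower-cases tag names
    doc <- read_html(raw_text)
    node <- xml_find_first(doc, "//resumptiontoken")
  }
  if (inherits(node, "xml_missing")) return(NULL)
  token <- xml_text(node)
  if (nzchar(token)) token else NULL
}
```

Only the token is read from the HTML tree; the rest of that tree is not trusted, in line with the caveat above.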

mbojan commented on August 22, 2024

I am trying the following modification of `while_oai` so that it (tries to) proceed with harvesting even when the XML is faulty:

while is.character(token)
  GET()
  try to parse XML with `read_xml` (with optional removal of invalid characters)
  IF parsing is ok
    check for oai-pmh errors
    look for `resumptionToken`
    process the results as determined by `as` and `verb`
    collect the results in `out` and/or pass to dumper
  ELSE (i.e. read_xml fails)
    try to parse with `read_html`; if that fails, dump raw XML to a file and stop()
    check for oai-pmh errors
    look for `resumptionToken`
    IF `as="raw"`
      collect raw results in `out` and/or pass to dumper
    ELSE
      dump raw XML to a file
      warning("bad XML dumped to file")
  IF has `resumptionToken`
    token <- resumptionToken
  ELSE
    token <- 1

mbojan commented on August 22, 2024

The above assumes that the result of parsing with `read_html` is unreliable, so we write the raw XML to a file and try to proceed with the `resumptionToken`, if any.
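A rough, runnable sketch of the loop described above, under several assumptions: `harvest_tolerant()` is a hypothetical name, `fetch(token)` stands in for the `GET()` step and returns the raw body as a string (with `""` meaning the first request), and the verb-specific processing, invalid-character removal and dumper are left out. It is not the package's actual `while_oai`.

```r
library(xml2)

# Hypothetical sketch of the fault-tolerant harvesting loop.
# fetch(token): stand-in for GET(); returns the raw response body as a string.
harvest_tolerant <- function(fetch, as = c("parsed", "raw"), dump_dir = tempdir()) {
  as <- match.arg(as)
  out <- list()
  token <- ""  # "" marks the first request (no resumptionToken yet)
  page <- 0
  # Write the raw page to a numbered file and return the file path
  dump <- function(raw) {
    f <- file.path(dump_dir, sprintf("oai-page-%03d.xml", page))
    writeLines(raw, f)
    f
  }
  # read_xml keeps the OAI namespace; read_html lower-cases tag names
  xpath <- function(name, strict) {
    if (strict) sprintf("//*[local-name()='%s']", name) else paste0("//", tolower(name))
  }
  while (is.character(token)) {
    page <- page + 1
    raw <- fetch(token)
    doc <- tryCatch(read_xml(raw), error = function(e) NULL)
    strict <- !is.null(doc)
    if (!strict) {
      # Tolerant fallback, trusted only for errors and the resumptionToken
      doc <- tryCatch(read_html(raw), error = function(e) NULL)
      if (is.null(doc)) stop("unparseable response dumped to ", dump(raw))
    }
    err <- xml_find_first(doc, xpath("error", strict))
    if (!inherits(err, "xml_missing")) stop("OAI-PMH error: ", xml_attr(err, "code"))
    if (strict) {
      out[[length(out) + 1]] <- if (as == "raw") raw else doc
    } else if (as == "raw") {
      out[[length(out) + 1]] <- raw
    } else {
      warning("bad XML dumped to file ", dump(raw))
    }
    node <- xml_find_first(doc, xpath("resumptionToken", strict))
    tok <- if (inherits(node, "xml_missing")) "" else xml_text(node)
    token <- if (nzchar(tok)) tok else 1  # a non-character token ends the loop
  }
  out
}
```

Passing `fetch` in as a function keeps the control flow testable offline with canned pages, without contacting a live repository.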

sckott commented on August 22, 2024

@mbojan tests are now failing on the `handle_errors` function: the class returned is no longer `oai-pmh_error` but `Rcpp::exception`. Any thoughts?

run the test suite to see what happens

mbojan commented on August 22, 2024

Looks like the OAI-PMH service at pbn.nauka.gov.pl is malfunctioning (certificate problems). Only the tests that use it seem to fail.

sckott commented on August 22, 2024

Hmm, okay, anything we should do to fail better in those cases?

mbojan commented on August 22, 2024

I'll change the test URLs and see if the tests pass.

What was failing is actually `httr::GET`, not the OAI error handling. These tests are supposed to check that OAI-PMH errors are caught correctly, conditional on the assumption that the test URLs actually lead to these errors. So I don't think the tests need modifying, apart from finding URLs that correctly return OAI-PMH exceptions from a fully functional OAI-PMH server. What do you think?

That's a general problem with testing your system against some external system...

sckott commented on August 22, 2024

Okay, I'll have a look at the HTTP request error catching.

mbojan commented on August 22, 2024

Do you think they deserve a dedicated "net" of tests to catch?

One thing I might add is that the OAI-PMH error-handling tests should first check whether the request returns a proper result at all, before parsing it to learn what the OAI-PMH exception is.
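A minimal sketch of that two-step check, assuming `httr` and `xml2`; both function names are hypothetical, not the package's `handle_errors`. Transport problems (DNS, TLS certificate, HTTP status) are raised by `httr` before the body is parsed, so a broken server shows up as an HTTP error rather than as a missing or misclassified OAI-PMH exception.

```r
library(xml2)

# Hypothetical helper: OAI-PMH error codes found in a response body,
# or character(0) for a normal response. OAI-PMH allows several <error>s.
oai_error_codes <- function(body) {
  doc <- read_xml(body)
  xml_attr(xml_find_all(doc, "//*[local-name()='error']"), "code")
}

# Hypothetical wrapper: fail on transport/HTTP problems first, and only
# then look inside the body for OAI-PMH exceptions.
fetch_oai_error_codes <- function(url, query) {
  res <- httr::GET(url, query = query)
  httr::stop_for_status(res)
  oai_error_codes(httr::content(res, as = "text", encoding = "UTF-8"))
}
```

Keeping the parsing step separate from the request also lets it be tested offline against canned responses.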

sckott commented on August 22, 2024

> Do you think they deserve a dedicated "net" of tests to catch?

we'll see, I'll look into it

> first check whether the request returns a proper result at all before parsing it to learn what the OAI-PMH exception is.

makes sense

mbojan commented on August 22, 2024

Perhaps it makes sense for those tests that rely on contacting some OAI-PMH service to first check whether the service is available and then skip the tests if it is not available?

Inspired by the "Skipping a test" section here: http://r-pkgs.had.co.nz/tests.html

We would have to write something like oai_available(url) though.
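A minimal sketch of such a helper, assuming `httr` and `xml2`. `oai_available()` does not exist yet (it is only proposed above), and the timeout default is an arbitrary choice. It sends the standard OAI-PMH `Identify` request and treats any transport error (including certificate failures), a non-200 status, or a body without an `Identify` element as "unavailable".

```r
# Hypothetical helper proposed in this thread: TRUE only if the endpoint
# answers an Identify request with a well-formed OAI-PMH response.
oai_available <- function(url, timeout_sec = 10) {
  res <- tryCatch(
    httr::GET(url, query = list(verb = "Identify"), httr::timeout(timeout_sec)),
    error = function(e) NULL  # DNS, connection and TLS/certificate failures
  )
  if (is.null(res) || httr::status_code(res) != 200) return(FALSE)
  doc <- tryCatch(
    xml2::read_xml(httr::content(res, as = "text", encoding = "UTF-8")),
    error = function(e) NULL  # a 200 response whose body is not XML
  )
  !is.null(doc) &&
    !inherits(xml2::xml_find_first(doc, "//*[local-name()='Identify']"), "xml_missing")
}
```

In a test file, each `test_that()` block that contacts a live repository could then start with `skip_if_not(oai_available(url), "OAI-PMH service unavailable")`, so an outage like the pbn.nauka.gov.pl certificate problem produces skips instead of failures.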

sckott commented on August 22, 2024

yeah, sounds good

sckott commented on August 22, 2024

closing for now, we can open a new issue if there are still problems along these lines