
hepdata-converter-ws-client's Introduction


hepdata-converter-ws-client

Light client wrapper for interaction with hepdata-converter-ws (Web Services). It is recommended to use this wrapper instead of manually creating requests to hepdata-converter-ws.

This package exists because the hepdata-converter-ws API requires files to be compressed as tar.gz and Base64-encoded before they can be passed as an argument in the JSON request. Doing this manually for every call to the hepdata-converter-ws API was cumbersome, which is why this light wrapper was born.

Additionally, the library provides extra functionality for writing the output of the convert function: instead of receiving the raw tar.gz content, it is possible to extract it to a specified file path.
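To illustrate what the wrapper automates, here is a minimal sketch using only the standard library: compress a file into an in-memory tar.gz, Base64-encode it, and embed it in a JSON request body. The field names ('archive', 'options') are illustrative assumptions, not the service's actual wire format.

```python
import base64
import io
import json
import tarfile

def pack_input(filename, content):
    """Compress a single file into an in-memory tar.gz, Base64-encode it,
    and wrap it in a JSON request body (field names are illustrative)."""
    buf = io.BytesIO()
    with tarfile.open(mode='w:gz', fileobj=buf) as tar:
        info = tarfile.TarInfo(name=filename)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    encoded = base64.b64encode(buf.getvalue()).decode('ascii')
    return json.dumps({'archive': encoded,
                       'options': {'input_format': 'oldhepdata'}})

payload = pack_input('submission.oldhepdata', b'*dataset: ...')
```

The wrapper performs this packing (and the corresponding unpacking of the response) internally, so callers only deal with paths or file objects.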

Sample usage

The library exposes a single function, hepdata_converter_ws_client.convert, which is very similar to hepdata_converter.convert. It accepts an additional argument, url, and restricts input/output to str, unicode, and file objects (objects supporting read, write, seek and tell).

The options parameter should be the same as with the hepdata_converter library.

The timeout parameter can be used to set a timeout for requests (defaults to 600s).

The library defines the exception hepdata_converter_ws_client.Error which will be thrown on timeouts or other errors connecting to the server.

A function hepdata_converter_ws_client.get_data_size gets the size in bytes of the JSON data that would be sent to the converter. This could be useful in checking that a maximum payload size imposed by a web server is not exceeded.
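As a rough illustration of the kind of check get_data_size enables, the sketch below measures the size of a Base64-plus-JSON payload against a cap. It uses only the standard library; the 1 MB limit and the 'archive' field name are assumptions for the example, and it does not call the real function or reproduce the exact wire format.

```python
import base64
import gzip
import json

MAX_PAYLOAD = 1_000_000  # hypothetical server-side cap in bytes

raw = b'*dataset: ...' * 1000
encoded = base64.b64encode(gzip.compress(raw)).decode('ascii')
body = json.dumps({'archive': encoded})  # illustrative field name
payload_size = len(body.encode('utf-8'))

if payload_size > MAX_PAYLOAD:
    raise ValueError('payload too large for the server')
```

With the real library, the analogous check would compare hepdata_converter_ws_client.get_data_size(...) against the web server's configured limit before calling convert.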

Function description

The hepdata_converter_ws_client.convert function has a docstring describing its arguments and return values, as does hepdata_converter_ws_client.get_data_size.

Convert using file paths

Arguments passed as input and output can be file paths or file objects. Below is an example of how to utilise the convert function with file paths.

import hepdata_converter_ws_client

# using path to input file, and writing output directly to output_path
input_path = '/path/to/input.txt'
output_path = '/path/to/output/dir'
hepdata_converter_ws_client.convert('http://hepdata-converter-ws-addr:port', input_path, output_path,
                                    options={'input_format': 'oldhepdata'})

Convert using input path and output file object

Input can always be a file object (as long as the input Parser supports single files). Output can be a file object only if the keyword argument extract=False is passed. In that case the binary content of the returned tar.gz file is written to the output file object, and it is then the user's responsibility to decompress it.

import hepdata_converter_ws_client
from io import BytesIO
# using path to input file, writing to output stream
input_path = '/path/to/input.txt'
output = BytesIO()
hepdata_converter_ws_client.convert('http://hepdata-converter-ws-addr:port', input_path, output,
                                    options={'input_format': 'oldhepdata'}, extract=False)


hepdata-converter-ws-client's Issues

Correct call to tarfile.open in 'convert' function

One of the most frequent causes of failed submission jobs (see HEPData/hepdata#248, addressed by HEPData/hepdata#316) is a FileNotFoundError when converting from the oldhepdata format (see an example event in Flower and Sentry). The same exception is sometimes also raised by the convert_and_store task (see another example event in Flower and Sentry). In both cases the exception is raised by this line:

tarfile.open('r:gz', fileobj=BytesIO(r.content)).close()

This looks like incorrect usage. The first argument should be the keyword argument mode='r:gz'; otherwise it is interpreted as the pathname name, which explains the FileNotFoundError, since no file named r:gz exists. What I don't understand is why this works for some calls but not others. It might be that name is normally ignored when a fileobj argument is given, but that if fileobj cannot be opened, the code falls back to opening the pathname name instead. Another missing mode appears a few lines below:
with tarfile.open('r:gz', fileobj=BytesIO(r.content)) as tar:

The solution is probably just to replace 'r:gz' with mode='r:gz' on these two lines, but it would be good to understand the failure better.
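The proposed fix can be checked with a self-contained snippet (standard library only): build a small tar.gz archive in memory to stand in for the service response body (r.content in the issue), then open it with mode passed as a keyword, mirroring the corrected call.

```python
import io
import tarfile

# Build a small tar.gz archive in memory to stand in for the
# web service response body (r.content in the issue).
buf = io.BytesIO()
with tarfile.open(mode='w:gz', fileobj=buf) as tar:
    data = b'hello'
    info = tarfile.TarInfo(name='sample.txt')
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Corrected call: pass mode as a keyword argument so 'r:gz' is not
# mistaken for a pathname.
with tarfile.open(mode='r:gz', fileobj=io.BytesIO(buf.getvalue())) as tar:
    names = tar.getnames()
print(names)  # ['sample.txt']
```

Note that tarfile.open's signature is open(name=None, mode='r', fileobj=None, ...), so a positional 'r:gz' lands in the name parameter while the mode silently stays at its default.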
