
dctap-python's Introduction

dcmi

dctap-python's People

Contributors

dublincore, kcoyle, nishad, tombaker


dctap-python's Issues

Reverted to hand-entering version number in `docs/conf.py`

@nishad I noticed today that the last three RTD builds had failed - see https://readthedocs.org/projects/dctap-python/builds/14275110/ .

The builds started failing when we changed the hand-entered version number in conf.py to be dctap.__version__. At first it failed because this variable was unfindable until I added import dctap to conf.py. But I did not realize until today that import dctap was now causing the build to "fail", perhaps because the documentation online looked perfectly fine.

Now that I have reverted to hand-entering the version number (from dctap/__init__.py), the build is passing again.

I do not think that hand-entering the version number imposes an undue burden, but do you see a way to make it work with dctap.__version__?
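One possible approach, sketched here on the assumption that `dctap/__init__.py` contains a line of the form `__version__ = "x.y.z"`, is to read the version string from the file's text instead of importing the package. The helper name `read_version` and the relative path in the comment are hypothetical, not part of the project.

```python
import re


def read_version(init_text):
    """Return the value assigned to __version__ in the given source text."""
    match = re.search(
        r"""^__version__\s*=\s*["']([^"']+)["']""",
        init_text,
        re.MULTILINE,
    )
    if match is None:
        raise RuntimeError("No __version__ assignment found.")
    return match.group(1)


# In docs/conf.py one might then write (the path is an assumption):
# from pathlib import Path
# release = read_version(Path("../dctap/__init__.py").read_text(encoding="utf-8"))
```

Because nothing is imported, the documentation build would not depend on the package's own dependencies being installed in the build environment.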

CSV file is valid as CSV

See dcmi/dctap#28

The program should verify that the input file is valid CSV:

  • each row has the same number of columns

If errors are found:

  • stop program with message to user
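The check described above could be sketched with the standard `csv` module as follows. This is a minimal illustration, not the project's implementation; the function name `find_ragged_rows` is hypothetical, and it treats the first row as the header that defines the expected width.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (row_number, column_count) for rows whose width differs from the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    expected = len(rows[0])
    # Row numbers are 1-based and count parsed rows, not physical lines,
    # because a quoted field may contain line breaks.
    return [
        (number, len(row))
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != expected
    ]
```

A caller could stop the program when the returned list is non-empty, for example with `sys.exit("Row 3 has 1 column(s), expected 2.")`. Note that a completely blank line is parsed as a row with zero columns and would therefore be reported.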

Pylint says return statements are inconsistent

At https://github.com/dcmi/dctap-python/blob/main/dctap/utils.py#L22

def is_uri_or_prefixed_uri(uri):
    """True if string is URI or superficially looks like a prefixed URI."""
    if is_uri(uri):
        return True
    if re.match("[A-Za-z0-9_]*:[A-Za-z0-9_]*", uri):  # looks like prefixed URI
        return True

Pylint says: "R1710: Either all return statements in a function should return an expression, or none of them should. (inconsistent-return-statements)"

@nishad Can you advise?
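One common way to satisfy R1710 is to make every path return a value explicitly, for instance by returning the result of the final test as a boolean. The sketch below keeps the original regular expression unchanged; `is_uri` here is a simplified stand-in (an assumption), not the project's own function.

```python
import re
from urllib.parse import urlparse


def is_uri(uri):
    """Simplified stand-in for the project's is_uri (an assumption)."""
    parsed = urlparse(uri)
    return bool(parsed.scheme and parsed.netloc)


def is_uri_or_prefixed_uri(uri):
    """True if string is URI or superficially looks like a prefixed URI."""
    if is_uri(uri):
        return True
    # Return the match result as a bool so that no path falls through to None.
    return bool(re.match("[A-Za-z0-9_]*:[A-Za-z0-9_]*", uri))
```

With this shape the function returns `False` rather than `None` when neither test succeeds, which leaves truthiness checks in existing callers unaffected.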

Warning about shapeID

The program issues a warning for each use of a shapeID that is not in URI form:

  "warnings": {
    "Scholarly Resource": {
      "shapeID": [
        "Value 'Scholarly Resource' does not look like a URI.",
        "Value 'Scholarly Resource' does not look like a URI.",

Because we do not require the shapeID to be a URI, this warning could be confusing. I think it should be removed.

Carriage returns?

On Hugging Face, the default text output does not have line endings:

['Tabular Application Profile (TAP)', '    Shape', '        shapeID                  Scholarly Resource', '        Statement Template', '            propertyID           dct:abstract', '            propertyLabel        Abstract', '            valueDataType        xsd:string', '            note                 Free text', '        Statement Template', '            propertyID           dct:accessRights', '            propertyLabel        Access rights', '            valueDataType        xsd:anyURI', "            valueConstraint      ['http://vocabularies.coar-repositories.org/documentation/access_rights/']", '            valueConstraintType  iristem', '            note                 A term from COAR vocabulary (http://vocabularies.coar-repositories.org

In a text program it looks like:
[Screenshot, 2023-11-29]

I assume this is a question of line-ending (carriage return) types, since I think that Hugging Face does not modify the output.
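The output quoted above looks like the `repr` of a Python list of strings rather than text whose line endings were lost. If that reading is correct (an assumption, since the calling code is not shown), joining the list with newlines before displaying it would restore the layout. The variable names below are illustrative only.

```python
# First few items of the list shown in the report.
lines = [
    "Tabular Application Profile (TAP)",
    "    Shape",
    "        shapeID                  Scholarly Resource",
]

# Join the items into one string, with one line per list item.
text = "\n".join(lines)
print(text)
```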

python version required for dctap

pyproject.toml currently requires Python 3.9, but I'm not at all sure that the version needs to be so recent.

Now that dctap can be installed with pip (see the PyPI project), it would be good to lower the required version as far as possible.

@nishad Do you have a way to test whether it will work with earlier Python versions?

Installing from GitHub using pip

There is an alternative option to install from GitHub using pip.

pip install git+https://github.com/dcmi/dctap-python.git

Works with recent versions of pip.

Issues with .dctaprc as a config file name

.*rc files are a standard convention, but using “.dctaprc” raises some issues.

  • As a dotfile, it is hidden on *nix operating systems, which can make it hard for many users to find or modify, especially within the working directory. In the long run, this can cause unexpected side effects.
  • Keeping a yaml or yml file extension helps text editors apply syntax highlighting and validation when these files are edited.

DEFAULT_CONFIGFILE_NAME = ".dctaprc"

`yaml.safe_load` gets `PendingDeprecationWarning`

If ignore::PendingDeprecationWarning is commented out in pytest.ini one gets:

tests/test_config/test_config_get_config_dict.py::test_exit_if_configfile_has_bad_yaml
  /Users/tbaker/github/dcmi/dctap-python/dctap/config.py:73: PendingDeprecationWarning:
  safe_load will be removed, use

    yaml=YAML(typ='safe', pure=True)
    yaml.load(...)

  instead
    return yaml.safe_load(default_config_yaml)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

Element aliases causing error in dctap.yaml?

Running the latest version of dctap with a dctap.yml that sets an element alias causes the error:
Valid DCTAP CSV must have a 'propertyID' column.

To reproduce, use the following files.
dctap.yaml:

### dctap configuration file (in YAML format)
extra_statement_template_elements:
 - severity

element_aliases:
     "Mand": "mandatory"
     "Rep": "repeatable"

tap.csv:

shapeID,propertyID,propertyLabel,Mand,Rep,valueNodeType,valueDataType,valueConstraint,valueConstraintType,valueShape,note,severity
BookShape,dct:title,Title,TRUE,FALSE,Literal,rdf:langString,,,,,Violation
BookShape,dct:creator,Author,FALSE,TRUE,IRI BNODE,,,,AuthorShape,,Warning
BookShape,sdo:isbn,ISBN-13,FALSE,FALSE,Literal,xsd:string,^(\\d{13})?$,pattern,,"Just the 13 numbers, no spaces or separators.",Violation
BookShape,rdf:type,Type,TRUE,FALSE,IRI,,sdo:Book,,,,Warning
AuthorShape,rdf:type,Type,TRUE,TRUE,IRI,,foaf:Person,,,,Warning
AuthorShape,foaf:givenName,Given name,FALSE,TRUE,Literal,xsd:string,,,,,Warning
AuthorShape,foaf:familyName,Family name,FALSE,TRUE,Literal,xsd:string,,,,,Warning

Removing the element_aliases block from dctap.yaml makes the error message go away.
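For reference, the behavior one would expect from the configuration above can be sketched as a simple header mapping: aliased headers are replaced by their canonical element names and all other headers, including `propertyID`, pass through unchanged. The function `resolve_headers` is a hypothetical illustration and does not show how dctap actually processes aliases.

```python
def resolve_headers(headers, aliases):
    """Map each CSV header to its canonical element name if an alias is defined."""
    # Compare case-insensitively and ignore surrounding whitespace (an assumption).
    normalized = {key.strip().lower(): value for key, value in aliases.items()}
    return [normalized.get(h.strip().lower(), h.strip()) for h in headers]


aliases = {"Mand": "mandatory", "Rep": "repeatable"}
headers = ["shapeID", "propertyID", "propertyLabel", "Mand", "Rep"]
resolved = resolve_headers(headers, aliases)
```

Under this sketch `resolved` still contains `propertyID`, so a check for that column would pass.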

Spurious warning about invalid valueNodeType when added in extra_value_node_types

I'm using the csvreader() method in another program. The TAP I am reading has one entry of valueNodeType of IRI BNODE (line 8). The dctap yaml config I am using has

extra_value_node_types:
 - iri bnode

but still I get {'valueNodeType': ["'iri bnode' is not a valid node type."]} in the "warnings_dict"

(I understand the values should be case-insensitive, but I've also tried IRI BNODE in the YAML config and it made no difference.)

The file loads fine and I can use it, but I think this is a spurious warning.
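The expected behavior can be sketched as a case-insensitive membership test that takes the configured extras into account. The default node types listed below and the function names are assumptions for illustration; the sketch does not reproduce the library's actual check.

```python
DEFAULT_NODE_TYPES = ("iri", "bnode", "literal")  # assumed defaults


def normalize(value):
    """Lowercase the value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def is_valid_node_type(value, extra_node_types=()):
    """True if value matches a default or extra node type, ignoring case."""
    allowed = {normalize(t) for t in DEFAULT_NODE_TYPES + tuple(extra_node_types)}
    return normalize(value) in allowed
```

With `extra_node_types=["iri bnode"]`, the value `IRI BNODE` from the TAP would be accepted and no warning would be expected.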
