dcmi / dctap-python
DC Tabular Application Profile - Python library and utility
License: MIT License
@nishad I noticed today that the last three RTD builds had failed - see https://readthedocs.org/projects/dctap-python/builds/14275110/ .
The builds started failing when we changed the hand-entered version number in conf.py to be dctap.__version__. At first it failed because this variable was unfindable until I added import dctap to conf.py. But I did not realize until today that import dctap was now causing the build to "fail", perhaps because the documentation online looked perfectly fine.
Now that I have reverted to hand-entering the version number (from dctap/__init__.py), the build is passing again.
I do not think that hand-entering the version number imposes an undue burden, but do you see a way to make it work with dctap.__version__?
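One possible way to keep a single source for the version without importing the package during the docs build is to parse the assignment out of dctap/__init__.py as text. The following is a minimal sketch under that assumption; the helper name read_version and the relative path in the usage note are hypothetical, not part of dctap or Sphinx.

```python
import re
from pathlib import Path


def read_version(init_path):
    """Return the __version__ string assigned in a Python source file.

    The file is read as plain text, so the package itself is never imported.
    """
    text = Path(init_path).read_text(encoding="utf-8")
    # Match a line such as: __version__ = "0.2.1"
    match = re.search(
        r"""^__version__\s*=\s*["']([^"']+)["']""", text, re.MULTILINE
    )
    if match is None:
        raise RuntimeError(f"No __version__ found in {init_path}")
    return match.group(1)
```

In conf.py this might be used as release = read_version("../dctap/__init__.py"), with the path adjusted to wherever conf.py actually sits relative to the package.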
See dcmi/dctap#28
The program should verify that the input file is valid CSV:
If errors found:
At https://github.com/dcmi/dctap-python/blob/main/dctap/utils.py#L22
def is_uri_or_prefixed_uri(uri):
    """True if string is URI or superficially looks like a prefixed URI."""
    if is_uri(uri):
        return True
    if re.match("[A-Za-z0-9_]*:[A-Za-z0-9_]*", uri):  # looks like prefixed URI
        return True
Pylint says: "R1710: Either all return statements in a function should return an expression, or none of them should. (inconsistent-return-statements)"
@nishad Can you advise?
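Pylint raises R1710 here because the function falls off the end and implicitly returns None when neither test matches. One way to satisfy it is to make the last statement return the boolean result of the match directly, which also avoids the related R1703 hint. This is a sketch only: the is_uri function below is a stand-in written for illustration and is an assumption, not the helper defined in dctap/utils.py.

```python
import re
from urllib.parse import urlparse


def is_uri(value):
    """Stand-in helper (assumption): True if value has a scheme and a host."""
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)


def is_uri_or_prefixed_uri(uri):
    """True if string is URI or superficially looks like a prefixed URI."""
    if is_uri(uri):
        return True
    # Every code path now returns a bool instead of an implicit None.
    return bool(re.match("[A-Za-z0-9_]*:[A-Za-z0-9_]*", uri))
```

Callers that only test the result for truthiness see the same behavior as before; the only change is that the non-matching case returns False rather than None.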
The program issues a warning for each use of a shapeID that is not in URL form:
"warnings": {
    "Scholarly Resource": {
        "shapeID": [
            "Value 'Scholarly Resource' does not look like a URI.",
            "Value 'Scholarly Resource' does not look like a URI.",
Because we do not require the shapeID to be a URI, this warning could be confusing. I think it should be removed.
On Hugging Face, the default text output does not have line endings:
['Tabular Application Profile (TAP)', ' Shape', ' shapeID Scholarly Resource', ' Statement Template', ' propertyID dct:abstract', ' propertyLabel Abstract', ' valueDataType xsd:string', ' note Free text', ' Statement Template', ' propertyID dct:accessRights', ' propertyLabel Access rights', ' valueDataType xsd:anyURI', " valueConstraint ['http://vocabularies.coar-repositories.org/documentation/access_rights/']", ' valueConstraintType iristem', ' note A term from COAR vocabulary (http://vocabularies.coar-repositories.org
In a text program it looks like:
I assume this is a question of carriage return types, as I think that huggingface doesn't make any modifications to output.
pyproject.toml currently requires Python 3.9, but I'm not at all sure that the version needs to be so recent.
Since dctap can now be installed with pip (see PyPI project), it would be good to lower the required version to the lowest possible.
@nishad Do you have a way to test whether it will work with earlier Python versions?
There is an alternative option to install from GitHub using pip.
pip install git+https://github.com/dcmi/dctap-python.git
Works with recent versions of pip.
.*rc files are a standard convention, but using ".dctaprc" raises some issues.
A yaml or yml extension helps text editors syntax-highlight and validate these files while editing.
The CLI argument generate has changed to read, but example commands in the README Quick start have not been updated to reflect this.
Attaching 20231218_dctap-python_cli_copy_paste.txt for "No such command" error message and other details. Python 3.11.7 in Windows PowerShell.
Ran successfully with dctap read --warnings my_profile.csv.
If ignore::PendingDeprecationWarning is commented out in pytest.ini one gets:
tests/test_config/test_config_get_config_dict.py::test_exit_if_configfile_has_bad_yaml
/Users/tbaker/github/dcmi/dctap-python/dctap/config.py:73: PendingDeprecationWarning:
safe_load will be removed, use
yaml=YAML(typ='safe', pure=True)
yaml.load(...)
instead
return yaml.safe_load(default_config_yaml)
-- Docs: https://docs.pytest.org/en/stable/warnings.html
The function dctap._get_rows reads CSV from a file, but could there be an alternative function to read the CSV from stdin?
One easy way to do this could be to use click.Path, which allows the use of a single dash to read input from stdin.
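The single-dash convention can be sketched with the standard library alone, independent of how the CLI layer passes the argument along. The function name read_csv_rows and its stdin parameter are hypothetical and are not part of dctap; the sketch only illustrates treating "-" as "read from standard input".

```python
import csv
import sys


def read_csv_rows(source, stdin=None):
    """Return CSV rows as dicts; a source of "-" means read from stdin.

    The optional stdin argument exists so that a caller or a test can
    supply any text stream in place of sys.stdin.
    """
    if source == "-":
        stream = stdin if stdin is not None else sys.stdin
        return list(csv.DictReader(stream))
    # Otherwise treat source as a path to a CSV file on disk.
    with open(source, newline="", encoding="utf-8") as csvfile:
        return list(csv.DictReader(csvfile))
```

With a command-line wrapper around this, something like cat my_profile.csv | dctap read - could then work, assuming the CLI hands the "-" argument through unchanged.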
Running the latest version of dctap with a dctap.yml that sets an element alias causes the error
Valid DCTAP CSV must have a 'propertyID' column.
To reproduce, use:
dctap.yaml:
### dctap configuration file (in YAML format)
extra_statement_template_elements:
- severity
element_aliases:
    "Mand": "mandatory"
    "Rep": "repeatable"
tap.csv:
shapeID,propertyID,propertyLabel,Mand,Rep,valueNodeType,valueDataType,valueConstraint,valueConstraintType,valueShape,note,severity
BookShape,dct:title,Title,TRUE,FALSE,Literal,rdf:langString,,,,,Violation
BookShape,dct:creator,Author,FALSE,TRUE,IRI BNODE,,,,AuthorShape,,Warning
BookShape,sdo:isbn,ISBN-13,FALSE,FALSE,Literal,xsd:string,^(\\d{13})?$,pattern,,"Just the 13 numbers, no spaces or separators.",Violation
BookShape,rdf:type,Type,TRUE,FALSE,IRI,,sdo:Book,,,,Warning
AuthorShape,rdf:type,Type,TRUE,TRUE,IRI,,foaf:Person,,,,Warning
AuthorShape,foaf:givenName,Given name,FALSE,TRUE,Literal,xsd:string,,,,,Warning
AuthorShape,foaf:familyName,Family name,FALSE,TRUE,Literal,xsd:string,,,,,Warning
Removing the element_aliases block from dctap.yaml makes the error message go away.
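For reference, the behavior the report expects from element_aliases can be sketched as a plain header rename that leaves unaliased columns such as propertyID untouched. This is an illustration of the expected outcome only, not dctap's implementation; the function name rename_aliased_headers is hypothetical, and the case-insensitive lookup is an assumption.

```python
import csv
import io


def rename_aliased_headers(csv_text, element_aliases):
    """Return CSV rows as dicts, with aliased headers mapped to real names."""
    # Compare header names case-insensitively (an assumption of this sketch).
    lowered = {alias.lower(): name for alias, name in element_aliases.items()}
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Columns without an alias keep their original header.
        rows.append(
            {lowered.get(key.lower(), key): value for key, value in row.items()}
        )
    return rows
```

Applied to the tap.csv above with the aliases from dctap.yaml, Mand and Rep would become mandatory and repeatable while propertyID stays in place, so the "must have a 'propertyID' column" check would still find it.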
I'm using the csvreader() method in another program. The TAP I am reading has one entry of valueNodeType of IRI BNODE (line 8). The dctap yaml config I am using has
extra_value_node_types:
- iri bnode
but still I get {'valueNodeType': ["'iri bnode' is not a valid node type."]} in the "warnings_dict".
(I understand the values should be case-insensitive, but I've also tried IRI BNODE in the YAML config and it made no difference.)
The file loads fine and I can use it, but I think this is a spurious warning.
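The check the report expects can be sketched as a case-insensitive membership test against the built-in node types plus whatever extra_value_node_types adds. This is a hedged illustration rather than dctap's actual validation code: the function name is_valid_node_type and the default list below are assumptions made for the example.

```python
# Assumed defaults for illustration; the real list lives in dctap's config.
DEFAULT_NODE_TYPES = ["iri", "literal", "bnode"]


def is_valid_node_type(value, extra_value_node_types=()):
    """True if value matches a default or configured node type, ignoring case."""
    allowed = {node_type.strip().lower() for node_type in DEFAULT_NODE_TYPES}
    # Extra types from the config are normalized the same way as the value.
    allowed.update(
        node_type.strip().lower() for node_type in extra_value_node_types
    )
    return value.strip().lower() in allowed
```

Under this sketch, "IRI BNODE" in the CSV is accepted whether the config says iri bnode or IRI BNODE, which is the outcome described as expected above.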
@nishad The current version number is 0.2, but that seems low. Can you point me to guidelines we might follow?
if re.match("[A-Za-z0-9_]*:[A-Za-z0-9_]*", uri):  # looks like prefixed URI
    return True
Pylint suggests: R1703: The if statement can be replaced with 'return bool(test)' (simplifiable-if-statement)