Can these instead be encoded using the existing identifier property and PropertyValue, in which propertyID could perhaps be urn:oid:1.3.60 and value could be 053393007? Or could identifier be changed so that DefinedTerm can be used as an alternative to PropertyValue?
from schemaorg.
If it goes in the documentation of identifier, I can give up this oid proposal.
Let's do that. Can you make a PR?
from schemaorg.
Sadly, things are complicated.
- Australia has VAT, but Australian VAT numbers are not prefixed with AU.
- European VAT numbers are prefixed with the country code if they are intercommunity codes (i.e. used EU-wide); technically they don't need to be when used only within one country (although the trend is to always add the prefix).
- Greece is part of the EU and its country code is GR, but Greek VAT numbers are prefixed with EL.
- Northern Ireland has no ISO 3166-1 country code of its own, but its VAT numbers are prefixed with an extension code (XI).
- There are magic EU-wide VAT numbers prefixed with EU.
- Swiss VAT numbers are prefixed with CHE, which is the ISO 3-letter code.
- Many of these numbers have two formats: a machine-readable version (no spaces, dots or dashes) and one with a defined formatting. Is a number with the wrong formatting (dots in the wrong place) correct or not?
- The term taxID is very vague: most countries have a number issued by the government, which might be used in several administrative functions, taxation being one of them. Add to the confusion that VAT is also taxation (the T is a hint).
- There are often multiple identifiers at play; for instance, in France the following numbers are relevant:
  - The SIREN code identifies the company.
  - The SIRET code identifies a branch (or the main seat).
  - The RCS code is the SIREN code formatted differently, with the addition of the city the company is registered in.
  - The APE code (also called NAF code) determines the activity sector of the company (which has tax implications).
  - The VAT code, which is derived from the SIREN code, but with the FR prefix and an additional checksum.
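The SIREN-to-VAT derivation in the last bullet can be sketched in Python. This is a sketch assuming the commonly documented French rules: a SIREN is 9 digits protected by a Luhn check digit, and the two-digit VAT key is (12 + 3 × (SIREN mod 97)) mod 97. `luhn_valid` and `french_vat_from_siren` are hypothetical helper names.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def french_vat_from_siren(siren: str) -> str:
    """Derive the intra-community VAT number: FR + 2-digit key + SIREN."""
    siren = siren.replace(" ", "")
    if len(siren) != 9 or not siren.isdigit() or not luhn_valid(siren):
        raise ValueError(f"not a valid SIREN: {siren!r}")
    # Assumed French key formula (see lead-in).
    key = (12 + 3 * (int(siren) % 97)) % 97
    return f"FR{key:02d}{siren}"


print(french_vat_from_siren("404 833 048"))  # FR83404833048
```

The same company thus carries two different-looking identifiers, and a consumer only knows they are related if it knows this rule.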
The core problem is that users will just put whatever makes sense to them into the vatID or taxID field. They could structure the values as PropertyValues, but they probably won't: it's complicated and the set of propertyID values is not defined. Finding the list of ICDs is not trivial, and no list of propertyID values for vatID and taxID exists.
So a parser will have to assume that taxID and vatID are basically synonyms and represent a badly defined variant of the iso6523Code field: there are keys and values, except that the set of keys is not really defined and neither is the format of the values. Proper validation is only possible by cross-referencing other fields (the address, for the country), with magic fallbacks if this is missing. In turn, this means online validation will be difficult, so it is harder to report to users that their data does not parse.
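The cross-field guessing described above can be sketched in Python to show how much special-casing a vatID consumer ends up carrying. The prefix table only covers the cases listed earlier in this thread, and mapping XI to GB is an assumption; `guess_vat_country` is a hypothetical helper, not an existing library function.

```python
import re
from typing import Optional, Tuple

# Prefixes that differ from the ISO 3166-1 alpha-2 code, per the list above
# (illustrative only; a real table would be longer).
SPECIAL_PREFIXES = {
    "CHE": "CH",  # Switzerland uses its ISO alpha-3 code
    "EL": "GR",   # Greece uses EL, not GR
    "XI": "GB",   # Northern Ireland, mapped to the UK code (assumption)
    "EU": None,   # EU-wide numbers have no single country
}


def guess_vat_country(
    vat_id: str, address_country: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Guess (country, number) for a free-text vatID value."""
    # Machine-readable form: drop spaces, dots and dashes.
    compact = re.sub(r"[\s.\-]", "", vat_id).upper()
    # Check longer prefixes first so "CHE" wins over a two-letter match.
    for prefix in sorted(SPECIAL_PREFIXES, key=len, reverse=True):
        if compact.startswith(prefix):
            return SPECIAL_PREFIXES[prefix], compact[len(prefix):]
    if re.match(r"[A-Z]{2}", compact):
        return compact[:2], compact[2:]
    # No prefix (e.g. Australian numbers): fall back to the address country.
    return address_country, compact
```

Even with this fallback, the guess is wrong for a company whose number comes from a country other than its address, for example a Monaco company with a French SIREN.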
Basically, which one do you think I would rather parse and validate?
法人番号: 2220005004311
<meta itemprop="iso6523Code" content="0188:2220005004311" />
or
<div itemprop="taxID" itemscope itemtype="https://schema.org/PropertyValue">
<span itemprop="propertyID">法人番号</span>:
<span itemprop="value">2220005004311</span>
</div>
Yes, you could do
<div itemprop="taxID" itemscope itemtype="https://schema.org/PropertyValue">
<meta itemprop="propertyID" content="icd:0188" />法人番号:
<span itemprop="value">2220005004311</span>
</div>
Or
<div itemprop="iso6523" itemscope itemtype="https://schema.org/PropertyValue">
<meta itemprop="propertyID" content="icd:0188" />法人番号:
<span itemprop="value">2220005004311</span>
</div>
But this is more difficult to add to a web page and more work to parse…
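For comparison, here is roughly what consuming the compact iso6523Code form involves. A Python sketch, assuming ":" as the separator (the thread itself asks about this in #2915) and the check-digit rule for Japan's Corporate Number (法人番号, ICD 0188): 13 digits, where the first digit is 9 − (Σ weighted digits mod 9), with weights 1 and 2 alternating from the right. The validator registry and function names are hypothetical.

```python
from typing import Callable, Dict, Tuple


def japan_corporate_number_valid(number: str) -> bool:
    """Check the leading check digit of a 13-digit Japanese Corporate Number."""
    if len(number) != 13 or not number.isdigit():
        return False
    check, base = int(number[0]), number[1:]
    # Weights alternate 1, 2, 1, 2, ... starting from the rightmost base digit.
    total = sum(int(d) * (1 if i % 2 == 0 else 2)
                for i, d in enumerate(reversed(base)))
    return check == 9 - total % 9


# Hypothetical registry of per-ICD validators; only ICD 0188 is filled in.
VALIDATORS: Dict[str, Callable[[str], bool]] = {"0188": japan_corporate_number_valid}


def parse_iso6523(code: str) -> Tuple[str, str]:
    """Split "ICD:value" and validate the value when a validator is known."""
    icd, sep, value = code.partition(":")
    if not sep or len(icd) != 4 or not icd.isdigit():
        raise ValueError(f"not an ICD-prefixed code: {code!r}")
    validator = VALIDATORS.get(icd)
    if validator is not None and not validator(value):
        raise ValueError(f"invalid value for ICD {icd}: {value!r}")
    return icd, value


print(parse_iso6523("0188:2220005004311"))  # ('0188', '2220005004311')
```

Because the ICD says which rule applies, a value with a dropped digit such as 222000500431 is rejected without consulting any other field.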
from schemaorg.
What types would we attach this to?
Historically we've tried to minimize things at the Thing level, but it sounds like this would be at least Organization, Product, CreativeWork, Place, ... Is there anything that oid isn't used to identify?
How much "oid data" is out there?
from schemaorg.
I'm mainly worried about the oid:extension syntax:
- Risk of conflict if a future successor of IETF RFC 3061 "A URN Namespace of Object Identifiers" defines a syntax involving a colon.
- It is not clear who controls the format of the extension: the owner of the OID with which it is used, or always schema.org?
from schemaorg.
For identifiers, we should always be worried about bit rot.
I think identifier with PropertyValue is likely a good candidate.
The worry then would be consistency for consuming apps that parse and interpret the propertyID values. To avoid wild-west syntaxes, we might provide guidance here to promote the standard URN conventions (urn:nid:..., etc.; see https://en.wikipedia.org/wiki/Uniform_Resource_Name).
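If propertyID values followed the URN convention, a consumer could dispatch on the namespace identifier (NID) before looking at the rest. A simplified Python sketch, assuming only the basic RFC 8141 shape urn:<NID>:<NSS> (an NID of 2 to 32 letters, digits or hyphens, not starting or ending with a hyphen) and ignoring the optional r-, q- and f-components; `split_urn` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Basic RFC 8141 shape: "urn:" + NID (2-32 chars) + ":" + NSS.
URN = re.compile(r"urn:([a-z0-9][a-z0-9-]{0,30}[a-z0-9]):(.+)", re.IGNORECASE)


def split_urn(value: str) -> Optional[Tuple[str, str]]:
    """Return (nid, nss) with the case-insensitive NID lower-cased, or None."""
    m = URN.fullmatch(value)
    if m is None:
        return None
    nid, nss = m.groups()
    return nid.lower(), nss


print(split_urn("urn:oid:1.3.60"))  # ('oid', '1.3.60')
print(split_urn("icd:0188"))        # None
```

Note that this only checks the outer URN shape; whether the NSS is valid for its NID (e.g. the RFC 3061 rules for oid) is a separate, per-namespace check.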
from schemaorg.
@KalleOlaviNiemitalo I wouldn't worry about any of that if we promoted the URN convention within propertyID values. Its syntax has a long history (since 1997) and is already supported in tons of open-source software: https://datatracker.ietf.org/doc/html/rfc2141
from schemaorg.
@KalleOlaviNiemitalo OIDs are a continuous tree where the levels can be anything.
E.g. 1.3.6.1.4.1.343 (listed in my Turtle description) is Intel.
This resolves at http://oid-info.com/get/1.3.6.1.4.1.343: the site doesn't know that it is "Intel", but it knows about subdivisions that Intel uses: identifiers(1), products(2), experimental(3), information-technology(4), sysProducts(5), mib2ext(6), hw(7), wekiva(111).
(BTW, there's an interesting story about 1.3.6.1.4.1, which is iso(1), identified-organization(3), dod(6), internet(1), private(4), enterprise(1):
- IANA misappropriated this for their company registrant namespace.
- There is another namespace, 1.3.IANA, that they are supposed to use.
- But I guess for backward-compatibility reasons they stayed under dod ;-))
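The point that each arc only has meaning relative to its parent can be illustrated with a tiny lookup keyed by the full prefix. A Python sketch; the table holds only the arcs named in this comment (343 = Intel, 2 = products under it), and `describe_oid` is a hypothetical helper, not a resolver for the real OID registry.

```python
# Arc labels keyed by full OID prefix: the same number (e.g. 1) means
# different things at different depths, so the bare arc is not enough.
ARC_NAMES = {
    "1": "iso",
    "1.3": "identified-organization",
    "1.3.6": "dod",
    "1.3.6.1": "internet",
    "1.3.6.1.4": "private",
    "1.3.6.1.4.1": "enterprise",
    "1.3.6.1.4.1.343": "Intel",
    "1.3.6.1.4.1.343.2": "products",
}


def describe_oid(oid: str) -> str:
    """Render each arc as name(number) when its prefix is known."""
    arcs = oid.split(".")
    parts = []
    for i, arc in enumerate(arcs, start=1):
        name = ARC_NAMES.get(".".join(arcs[:i]))
        parts.append(f"{name}({arc})" if name else arc)
    return " ".join(parts)


print(describe_oid("1.3.6.1.4.1.343"))
# iso(1) identified-organization(3) dod(6) internet(1) private(4) enterprise(1) Intel(343)
```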
So in the global OID tree, it is unclear where to place the boundary between:
- metadata (properties, identifiers kinds) and
- data (identifiers of companies, IT devices, whatever)
http://oid-info.com/get/1.3.60.053393007 doesn't resolve because:
- Dun & Bradstreet will probably never publish their humongous company database as an OID tree.
- It probably isn't feasible to publish a humongous database within that tree.
- The leading zero in 053393007 may make it a syntactically invalid element, and 04KE8 is certainly an invalid element because of the letters, so 1.3.141.04KE8 would be an invalid OID.
I am a bit uneasy with my proposal to use an extension syntax like 1.3.141:04KE8: it's a hack.
I know about the split identifier [a PropertyValue; propertyID "foo"; value "bar"].
- But identifier itself allows using foo:bar,
- and iso6523Code mandates using foo:bar, e.g. 141:04KE8 for an NCAGE (I asked in #2915 whether : is the separator).
- So then, why should oid use the split construct?
I don't think there's any conflict with RFC 3061, because urn:oid:1.3.141:04KE8 is a fully valid URN.
Truth be told, because of this RFC there's very little loss if this proposal is rejected: I can just use
<https://kg.ontotext.com/resource/agent/ontotext>
s:identifier <urn:oid:1.3.141:04KE8>;
# or even
s:sameAs <urn:oid:1.3.141:04KE8>.
My main motivation is that I researched https://en.wikipedia.org/wiki/ISO/IEC_6523 (a collection of identifier schemes) and then figured out that OIDs are a superset of it: an infinitely extensible namespace with delegatable subspaces, dating from before internet domains were invented.
So what I'd really like is for schema.org to document this way of using globally identified properties: if it goes in the documentation of identifier, I can give up this oid proposal.
from schemaorg.
urn:oid:1.3.141:04KE8 is a fully valid URN.
It matches the namestring syntax in IETF RFC 8141, but its NSS portion 1.3.141:04KE8 violates IETF RFC 3061 and thus is not valid to use with the NID oid:
The NSS portion of the name is strictly limited to the digits 0-9 and the '.' character with no leading zeros.
If this remains invalid as a URN forever, and schema.org defines its own interpretation of the string "urn:oid:1.3.141:04KE8", then there is no conflict. But if IETF RFC 3061 is ever updated to make this syntax valid as a URN, with semantics different from what schema.org assigned, then that will be a conflict.
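The RFC 3061 rule quoted above is easy to check mechanically. A minimal Python sketch, assuming the NSS grammar is exactly dot-separated decimal arcs where each arc is either 0 or has no leading zero; `is_valid_oid_nss` is a hypothetical helper.

```python
import re

# RFC 3061 NSS: dot-separated decimal arcs; an arc is "0" or starts with 1-9.
OID_NSS = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


def is_valid_oid_nss(nss: str) -> bool:
    """Return True if nss is a syntactically valid urn:oid NSS."""
    return OID_NSS.fullmatch(nss) is not None


for nss in ["1.3.6.1.4.1.343", "1.3.60.053393007", "1.3.141.04KE8", "1.3.141:04KE8"]:
    print(nss, is_valid_oid_nss(nss))
```

Only the first value passes: the leading zero, the letters and the colon each make the other three invalid under this grammar.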
from schemaorg.
@KalleOlaviNiemitalo OID syntax is strictly dotted integers (\d+(\.\d+)*), and it has been like this for maybe 20 years. Therefore there is no possibility that RFC 3061 or its successor will appropriate : for use.
The valid concern is: are we OK with willingly violating RFC 3061 by tacking any namespace-specific identifier (which may be non-numeric) after :?
I personally am OK with this, because it seems useful to be able to mint URNs from any identifier:
- urn:oid:1.3.6.1.4.1.343: Intel as IANA registrant in the pure numeric OID tree.
- urn:oid:1.3.60:047897855: Intel as DUNS. The purely dotted form is not valid, because OID numbers cannot start with zero, and I don't think the keepers of the OID register will be willing to record hundreds of millions of DUNS identifiers.
- urn:oid:1.3.141:04KE8: Intel represented by one of its NCAGE codes. NATO and DoD will never rework NCAGE codes to be numeric, and I don't think the keepers of the OID register will be willing to record several million NATO supplier records.
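The minting scheme in the list above can be expressed as a small function. This sketch assumes, as the examples suggest (DUNS 0060 → 1.3.60, NCAGE 141 → 1.3.141), that an ICD n maps to the OID arc 1.3.n; its output deliberately uses the proposed :value extension, which is not valid under RFC 3061. `mint_oid_urn` is a hypothetical name.

```python
def mint_oid_urn(iso6523_code: str) -> str:
    """Map "ICD:value" to the proposed urn:oid:1.3.<ICD>:<value> form.

    Assumption: ICD n corresponds to OID arc 1.3.n. With a value present,
    the result is NOT a valid RFC 3061 URN; it mirrors the proposal only.
    """
    icd, sep, value = iso6523_code.partition(":")
    if len(icd) != 4 or not icd.isdigit():
        raise ValueError(f"bad ICD in {iso6523_code!r}")
    base = f"urn:oid:1.3.{int(icd)}"  # int() drops the ICD's leading zeros
    return f"{base}:{value}" if sep else base


print(mint_oid_urn("0060:047897855"))  # urn:oid:1.3.60:047897855
print(mint_oid_urn("0141:04KE8"))      # urn:oid:1.3.141:04KE8
```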
@danbri I'll make a PR, but first tell me what you think of the above.
And please answer my questions in #2915 (comment).
from schemaorg.
I would distinguish between two uses of ISO 6523.
- As a root for the OID tree.
- As a meta system for organization identifiers.
As mentioned above, the DUNS tree (0060) will probably never be exposed, and conversely I don't expect anyone to consume organization identifiers from the CERN root (0020). Also note that there are other identifier schemes with an OID whose identifiers are not purely numerical:
- IBANs (icd: 0021).
- Swiss UIDs (icd: 0183).
If you look at the list of ICDs, the registrars typically make it explicit when they want to use them for ISO 8348.
From my position, the iso6523Code field should be restricted to Organization (and maybe Person). I would really avoid using identifier, because its usage is really vague and abstract for most users.
from schemaorg.
The reference list of ICDs is missing the 9XXX codes which contain a lot of VAT number types.
I read up a bit more and found that the 9XXX codes in the EAS list are not in the ICD list, so I guess they are not ICD codes. Wouldn't it be valuable to be able to add things like VAT numbers as organization identifiers?
https://ec.europa.eu/digital-building-blocks/sites/display/DIGITAL/Code+lists
Most other identifiers in the eInvoicing standard can also be based on different identifiers schemes according to the International Code Designator (ICD) code list. When the same identifier scheme is allowed in both the ICD and the EAS code list, it shall use the same code value. Consequently, such codes must first be registered in the ICD code list and then requested to be added to the EAS code list. The EAS code list may also include other identifiers schemes that are specific for electronic addressing, such as emails, Uniform Resource Locator (URL) and other Uniform Resource Identifiers (URI). For details on this procedure of requesting changes to the EAS code list, please contact the DIGITAL Service Desk.
from schemaorg.
@Tiggerito vatID is already a property on Organization if that helps
vatID: The Value-added Tax ID of the organization or person.
from schemaorg.
@thadguidry Good point. I guess that combined with the organization's address/country would be enough.
from schemaorg.
You assume two things:
- That a country has only one tax ID number format.
- That a company's tax/VAT identifiers are from the country its address is in. For instance, companies in Monaco can have a French SIREN number.
Generally, taxID and vatID are fields which are difficult to parse: the syntax is not specified (do you use the common formatting for the country?), you need the value from another field (country) to even know what it is, and even with the country value, parsing still has to handle ambiguities. ISO 6523 is much more constrained, and therefore more robust for parsing.
The 9XXX codes are not official ICDs; they are part of the PEPPOL extension, which is a de facto standard.
from schemaorg.
@MatthiasWiesmann I didn't assume any of those. I simply stated that Schema.org provides a property to hold the values. How to format the values and how to attach additional metadata to them can certainly be handled by coordinating the vatID property with https://schema.org/PropertyValueSpecification, can it not? Do you see gaps here if the vatID context is that of PropertyValueSpecification, or when using multi-typing to provide some additional external context?
FYI, Schema.org typically doesn't get into the formatting weeds of values (there's often no need when we also have PropertyValueSpecification) unless absolutely necessary to make publisher/consumer lives easier.
If you need help with PropertyValueSpecification, we can help; we can move that discussion to our mailing list or to GitHub Discussions.
from schemaorg.
My main point is that ISO 6523 solves two problems: type identification and formatting.
If I understand PropertyValueSpecification correctly, we would need one or more per country, and it would not help with the identification issue: vatID and taxID are quite ambiguous and don't cover the space well, and neither do DUNS codes.
from schemaorg.
@MatthiasWiesmann Oops! So sorry, that should have been PropertyValue (https://schema.org/PropertyValue). PropertyValueSpecification comes from Hydra and is a way to specify a format (using its valuePattern). To give more detail about the multiple vatIDs an Organization might have, you could use PropertyValue, or even a more specific StructuredValue with its disambiguatingDescription (e.g. "registered in Ireland") as well as identifier, url, etc.
Wouldn't that be enough to know that a vatID value follows a particular structure pattern? My understanding has always been that once you parse the first two letters (the country code) of the vatID value (or of each vatID, if a publisher provides several), you can parse the rest of the value and know its format more easily, no? (https://en.wikipedia.org/wiki/VAT_identification_number) Or you could just give context (the ICD part), saying that it's an https://schema.org/iso6523Code, directly on the vatID property.
Sorry, but I just don't understand the format confusion here and why it would be hard to parse, given that you can provide a lot of metadata via all those properties I mentioned: what kind of value it is, who the authority is, where the record for this vatID registration with that authority lives (a URL), the date of registration, etc. Can you help me understand why StructuredValue or iso6523Code property values/types could not address your format-parsing concerns?
@danbri This is likely where we need to provide better docs and guidance on how best to use those. Hmm, and it seems we missed adding vatID as a subproperty of https://schema.org/identifier? And there's no back-link on https://schema.org/vatID telling folks that they can set the ICD portion from ISO 6523, which we do mention on https://schema.org/iso6523Code.
from schemaorg.
I much prefer property/value pairs in JSON-LD, and that is how I usually handle it. I have always found property/value pairs (semantics, which allow easier information exchange) easier to parse than values with separators that lack extra information. In fact, "parsing" really comes into play when separator characters are used to delimit information within a multi-part value. For JSON-LD, as a consumer of the data, the libraries handle the parsing for you, so all you have to do is make sense of the values and perhaps custom extensions and RDFa nodes. But that's me.
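To compare the two shapes side by side, here is a Python sketch that emits both representations for the same organization in JSON-LD and converts the split form back. The `icd:` propertyID prefix follows the examples earlier in this thread and is not a schema.org convention; "Example Corp" and the function names are placeholders.

```python
import json


def organization_jsonld(name: str, iso6523_code: str) -> dict:
    """Emit both the compact iso6523Code and an equivalent PropertyValue."""
    icd, _, value = iso6523_code.partition(":")
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "iso6523Code": iso6523_code,  # compact "ICD:value" string
        "identifier": {
            "@type": "PropertyValue",
            # "icd:" prefix as used in this thread; not standardized.
            "propertyID": f"icd:{icd}",
            "value": value,
        },
    }


def iso6523_from_property_value(pv: dict) -> str:
    """Rebuild the compact form from a PropertyValue with an "icd:" propertyID."""
    scheme, _, icd = pv["propertyID"].partition(":")
    if scheme != "icd":
        raise ValueError(f"unexpected propertyID: {pv['propertyID']!r}")
    return f"{icd}:{pv['value']}"


doc = organization_jsonld("Example Corp", "0188:2220005004311")
print(json.dumps(doc, indent=2))
```

The conversion is lossless in both directions, but a consumer of the split form must still know the `icd:` convention to rebuild the compact code.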
from schemaorg.
Australia is an interesting example. Let's see if I remember correctly.
We have two business identifiers which are used to pay tax:
ACN: Australian Company Number
ABN: Australian Business Number
Both are numbers; the ACN is the ABN without its first two digits. Only businesses registered as a company have an ACN.
We also have TFN (Tax File Number), which businesses and individuals have.
In Australia we pay GST not VAT. For businesses you track GST paid/charged via the ABN.
ICD has an entry for the ABN (0151) but not the other two. So we can identify a business in Australia via the ABN.
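Both Australian numbers carry their own check digits, so once you know which scheme a value belongs to, it can be validated without any country context. A Python sketch, assuming the checksum rules as documented by the Australian Business Register (ABN: subtract 1 from the first digit, apply weights 10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and the total must be divisible by 89) and by ASIC (ACN: weights 8 down to 1 on the first eight digits, check digit is the complement mod 10). The function names are hypothetical.

```python
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def abn_valid(abn: str) -> bool:
    """ABN check: subtract 1 from the first digit, weight, sum, mod 89."""
    compact = abn.replace(" ", "")
    if len(compact) != 11 or not compact.isdigit():
        return False
    digits = [int(c) for c in compact]
    digits[0] -= 1
    return sum(d * w for d, w in zip(digits, ABN_WEIGHTS)) % 89 == 0


def acn_valid(acn: str) -> bool:
    """ACN check: weights 8..1 on the first 8 digits; last digit is the complement mod 10."""
    compact = acn.replace(" ", "")
    if len(compact) != 9 or not compact.isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(compact, range(8, 0, -1)))
    return (10 - total % 10) % 10 == int(compact[8])


print(abn_valid("51 824 753 556"))  # True
```

For a company, the last nine digits of its ABN should themselves pass `acn_valid`, which is the ABN/ACN relationship described above.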
With the https://schema.org/PropertyValue idea, I guess we might be able to use the EAS codes.
<div itemprop="vatID" itemscope itemtype="https://schema.org/PropertyValue">
<meta itemprop="propertyID" content="peppol:9932" />United Kingdom VAT number
<span itemprop="value">GB1234567890</span>
</div>
I found what looks like a better EAS list that indicates what the source is:
from schemaorg.
That "better EAS list" is incomplete :-(
from schemaorg.
I much prefer property/value pairs in JSON-LD, and that is how I usually handle it. I have always found property/value pairs (semantics, which allow easier information exchange) easier to parse than values with separators that lack extra information. In fact, "parsing" really comes into play when separator characters are used to delimit information within a multi-part value. For JSON-LD, as a consumer of the data, the libraries handle the parsing for you, so all you have to do is make sense of the values and perhaps custom extensions and RDFa nodes. But that's me.
The data comes from systems which are probably not JSON-LD, and will be parsed into structures which are not JSON-LD.
Breaking up the information in transit does not bring much, and adds a risk of breakage. The whole point of standards like ISO 6523 is that values can be transported without any transformation, in the same way as country codes (ISO 3166), language codes (ISO 639) and date-times (ISO 8601).
from schemaorg.
This issue is being nudged due to inactivity.
from schemaorg.