
Comments (23)

KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on June 2, 2024 3

Can these instead be encoded using the existing identifier property and PropertyValue? In that case propertyID could perhaps be urn:oid:1.3.60 and value could be 053393007. Or change identifier so that DefinedTerm can be used as an alternative to PropertyValue.
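As a rough illustration of this suggestion, the sketch below builds JSON-LD for an Organization whose identifier is a PropertyValue. The organization name and the helper function are placeholders invented for the example; the propertyID and value are the ones floated above, not an agreed convention.

```python
import json


def identifier_as_property_value(property_id: str, value: str) -> dict:
    """Wrap an identifier scheme and its value in a schema.org PropertyValue."""
    return {"@type": "PropertyValue", "propertyID": property_id, "value": value}


organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Corp",  # placeholder name, not from the discussion
    "identifier": identifier_as_property_value("urn:oid:1.3.60", "053393007"),
}

print(json.dumps(organization, indent=2))
```

The value is kept as a string so that the leading zero survives serialization.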

from schemaorg.

danbri avatar danbri commented on June 2, 2024 1

if it goes in the documentation of identifier, I can give up this oid proposal.

Let's do that. Can you make a PR?


MatthiasWiesmann avatar MatthiasWiesmann commented on June 2, 2024 1

Sadly, things are complicated.

  • Australia has VAT, Australian VAT numbers are not prefixed with AU.
  • European VAT numbers are prefixed with the country code if they are intra-community codes (i.e. used EU-wide); they technically don't need to be when used only within one country (although the trend is to always include the prefix).
  • Greece is part of the EU, the country code is GR, but VAT numbers are prefixed with EL.
  • Northern Ireland is not a country, so its VAT numbers are prefixed with an ISO extension (XI).
  • There are magic EU wide VAT numbers prefixed with EU.
  • Swiss VAT numbers are prefixed with CHE, which is the ISO 3-letter code.
  • Many of these numbers have two formats, a machine readable version (no space, dots, dashes) and one with a defined formatting. Is a number with the wrong formatting (dots in the wrong place) correct or not?
  • the term taxID is very vague, most countries have a number given out by the government, which might be used in some administrative function, taxation being one of them. Add to the confusion that VAT is taxation (the T is a hint).
  • There are often multiple identifiers at play, for instance in France, the following numbers are relevant:
    • The SIREN code identifies the company.
    • The SIRET code identifies a branch (or the main seat).
    • The RCS code is the SIREN code formatted differently with the addition of the city the company is registered in.
    • The APE code (also called NAF code) determines the activity sector of the company (which has tax implications).
    • The VAT code, which is derived from the SIREN code, but with the FR prefix and an additional checksum.
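The last derivation in the list can be sketched in a few lines. This assumes the commonly documented key formula, key = (12 + 3 × (SIREN mod 97)) mod 97; the function name is made up for illustration, and the sketch checks only the length of the SIREN, not its own checksum.

```python
def french_vat_from_siren(siren: str) -> str:
    """Derive a French intra-community VAT number from a 9-digit SIREN.

    Assumption: the two-digit key is (12 + 3 * (SIREN mod 97)) mod 97.
    """
    digits = siren.replace(" ", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected a 9-digit SIREN, got {siren!r}")
    key = (12 + 3 * (int(digits) % 97)) % 97
    return f"FR{key:02d}{digits}"


print(french_vat_from_siren("404 833 048"))  # FR83404833048
```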

The core problem is that users will just put whatever makes sense for them into the vatID, respectively taxID field. They could structure the values as PropertyValues, but they probably won't: it's complicated and the set of propertyID is not defined. Finding the list of ICDs is not trivial, the list of propertyId for vatIDs and taxIDs does not exist.

So a parser will have to assume that taxID and vatID are basically synonyms and represent a badly defined variant of the iso6523 field: there are keys and values, except the set of keys is not really defined and neither is the format of the values. Proper validation is only possible by cross-referencing other fields (address, for the country), with magic fallbacks if this is missing. In turn this means online validation will be difficult, so reporting to the user their data does not parse is harder.

Basically, which one do you think I would rather parse and validate?

法人番号: 2220005004311
<meta itemprop="iso6523Code" content="0188:2220005004311" />

or

  <div itemprop="taxID" itemscope itemtype="https://schema.org/PropertyValue">
      <span itemprop="propertyID">法人番号</span>: 
      <span itemprop="value">2220005004311</span>
  </div>

Yes, you could do

  <div itemprop="taxID" itemscope itemtype="https://schema.org/PropertyValue">
      <meta itemprop="propertyID" content="icd:0188" />法人番号: 
      <span itemprop="value">2220005004311</span>
  </div>

Or

<div itemprop="iso6523" itemscope itemtype="https://schema.org/PropertyValue">
<meta itemprop="propertyID" content="icd:0188" />法人番号: 
<span itemprop="value">2220005004311</span>
</div>

But this is more difficult to add to a web page and more work to parse…
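To make the parsing side concrete, here is a minimal sketch of what consuming the compact iso6523Code form could look like. It assumes the ICD is exactly four digits and that ":" is the separator; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Iso6523Code(NamedTuple):
    icd: str    # International Code Designator, e.g. "0188"
    value: str  # identifier within that scheme


_ICD_RE = re.compile(r"[0-9]{4}")


def parse_iso6523_code(code: str) -> Iso6523Code:
    """Split an 'ICD:value' string at the first colon."""
    icd, sep, value = code.partition(":")
    if not sep or not _ICD_RE.fullmatch(icd) or not value:
        raise ValueError(f"not an ICD:value string: {code!r}")
    return Iso6523Code(icd, value)


print(parse_iso6523_code("0188:2220005004311"))
```

The key and the value arrive in one string, so no second property has to be looked up before the value can be interpreted.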


danbri avatar danbri commented on June 2, 2024

What types would we attach this to?

Historically we've tried to minimize things at the Thing level, but it sounds like this would be at least Organization, Product, CreativeWork, Place, ... Is there anything that oid isn't used to identify?

How much "oid data" is out there?


KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on June 2, 2024

I'm mainly worried about the oid:extension syntax:

  • Risk of conflict if a future successor of IETF RFC 3061 "A URN Namespace of Object Identifiers" defines a syntax involving a colon
  • Not clear who controls the format of extension -- the owner of the OID with which it is used, or always schema.org?


thadguidry avatar thadguidry commented on June 2, 2024

For identifiers, we should always be worried about bit rot.
I think identifier with PropertyValue is likely a good candidate.
The worry then would be some consistency for consuming apps to parse/understand the propertyID values (to avoid wild west syntaxes, we might provide guidance here to promote the standard conventions of URN. urn:nid:... etc. https://en.wikipedia.org/wiki/Uniform_Resource_Name )


thadguidry avatar thadguidry commented on June 2, 2024

@KalleOlaviNiemitalo I wouldn't worry about any of that if we promoted the URN convention within propertyID values? Its syntax has got a long history, 1997, and already supported in tons of open source. https://datatracker.ietf.org/doc/html/rfc2141


VladimirAlexiev avatar VladimirAlexiev commented on June 2, 2024

@KalleOlaviNiemitalo OIDs are a continuous tree where the levels can be anything.
Eg 1.3.6.1.4.1.343 (listed in my Turtle description) is Intel.
This resolves at http://oid-info.com/get/1.3.6.1.4.1.343: the site doesn't know that it is "Intel",
but it knows about the subdivisions that Intel uses: identifiers(1), products(2), experimental(3), information-technology(4), sysProducts(5), mib2ext(6), hw(7), wekiva(111)

(BTW there's an interesting story behind 1.3.6.1.4.1, which is iso(1), identified-organization(3), dod(6), internet(1), private(4), enterprise(1):

  • IANA misappropriated this for their company registrant namespace
  • There is another namespace 1.3.IANA that they are supposed to use
  • But I guess for backward compatibility reasons they stayed under dod ;-) )

So in the global OID tree, it is unclear where to place the boundary between:

  • metadata (properties, identifier kinds) and
  • data (identifiers of companies, IT devices, whatever)

http://oid-info.com/get/1.3.60.053393007 doesn't resolve because:

  • Dun & Bradstreet will probably never publish their humongous company database as an OID tree
  • It probably isn't feasible to publish a humongous database within that tree
  • The leading zero in 053393007 may make it a syntactically invalid element;
    and certainly 04KE8 is an invalid element because of the letters, so 1.3.141.04KE8 would be an invalid oid

I am a bit uneasy with my proposal to use an extension syntax like 1.3.141:04KE8: that's a hack.


I know about the split identifier [a PropertyValue; propertyID "foo"; value "bar"].

  • But identifier itself allows using foo:bar,
  • and iso6523Code mandates using foo:bar, eg 141:04KE8 for an NCAGE (I asked in #2915 whether : is the separator)
  • So then, why should oid use the split construct?

I don't think there's any conflict with RFC 3061 because urn:oid:1.3.141:04KE8 is a fully valid URN.

Truth be told, because of this RFC there's very little loss if this proposal is rejected: I can just use

<https://kg.ontotext.com/resource/agent/ontotext>
   s:identifier <urn:oid:1.3.141:04KE8>;
  # or even
   s:sameAs <urn:oid:1.3.141:04KE8>.

My main motivation is that I researched https://en.wikipedia.org/wiki/ISO/IEC_6523 (a collection of identifier schemes)
and then I figured out that OIDs are a superset of this:
it's an infinitely extensible namespace with delegatable subspaces, before internet domains were invented.
So what I'd really like is for schema.org to document this way of using globally identified properties:
if it goes in the documentation of identifier, I can give up this oid proposal.


KalleOlaviNiemitalo avatar KalleOlaviNiemitalo commented on June 2, 2024

urn:oid:1.3.141:04KE8 is a fully valid URN.

It matches the namestring syntax in IETF RFC 8141, but its NSS portion 1.3.141:04KE8 violates IETF RFC 3061 and thus is not valid to use with the NID oid:

The NSS portion of the name is strictly limited to the digits 0-9 and the '.' character with no leading zeros.

If this remains invalid as a URN forever, and schema.org defines its own interpretation of the string "urn:oid:1.3.141:04KE8", then there is no conflict. But if IETF RFC 3061 is ever updated to make this syntax valid as a URN, with semantics different from what schema.org assigned, then that will be a conflict.
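The RFC 3061 restriction quoted above (digits and '.' only, no leading zeros) can be checked mechanically. The following is a sketch of that rule as quoted, not a full URN validator; the function name is hypothetical.

```python
import re

# Each arc is either "0" or a number without a leading zero; arcs are joined by dots.
_OID_NSS_RE = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


def is_rfc3061_nss(nss: str) -> bool:
    """Return True if nss uses only dotted integers with no leading zeros."""
    return _OID_NSS_RE.fullmatch(nss) is not None


for candidate in ("1.3.6.1.4.1.343", "1.3.60.053393007", "1.3.141:04KE8"):
    print(candidate, is_rfc3061_nss(candidate))
```

Only the first candidate passes: the second has an arc with a leading zero, and the third contains a colon and letters.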


VladimirAlexiev avatar VladimirAlexiev commented on June 2, 2024

@KalleOlaviNiemitalo OID syntax is strictly dotted integers (\d+(\.\d+)*) and it has been like this for maybe 20 years. Therefore there is no possibility that RFC 3061 or its successor may appropriate : for use.
The valid concern is: are we ok to willingly violate RFC 3061 by tacking any namespace-specific identifier (which may be non-numeric) after : ?

I personally am ok with this because it seems useful to be able to mint URNs from any identifier:

  • urn:oid:1.3.6.1.4.1.343: Intel as IANA registrant in the pure numeric OID tree
  • urn:oid:1.3.60:047897855: Intel as DUNS. It's not valid to use . because OID numbers cannot start with zero, and I don't think the keepers of the OID register will be willing to record hundreds of millions of DUNS identifiers.
  • urn:oid:1.3.141:04KE8: Intel represented by one of its NCAGE codes. NATO and DoD will never rework NCAGE codes to be numeric, and I don't think the keepers of the OID register will be willing to record several million NATO supplier records.
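For illustration only, a consumer of the proposed (non-standard) extension syntax might split such strings as sketched below. The function name is hypothetical, and the input format is the one proposed in this thread, which, as noted above, is not valid under RFC 3061.

```python
from typing import Optional, Tuple


def split_oid_extension(urn: str) -> Tuple[str, Optional[str]]:
    """Split 'urn:oid:<oid>[:<extension>]' into its OID and optional extension."""
    prefix = "urn:oid:"
    if not urn.lower().startswith(prefix):
        raise ValueError(f"not a urn:oid string: {urn!r}")
    oid, sep, extension = urn[len(prefix):].partition(":")
    return oid, (extension if sep else None)


print(split_oid_extension("urn:oid:1.3.6.1.4.1.343"))
print(split_oid_extension("urn:oid:1.3.60:047897855"))
print(split_oid_extension("urn:oid:1.3.141:04KE8"))
```

The extension is returned as an opaque string, since its format would be up to whoever owns the OID it is attached to.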

@danbri I'll make a PR, but first tell me what you think of the above.
And please answer my questions in #2915 (comment) .


MatthiasWiesmann avatar MatthiasWiesmann commented on June 2, 2024

I would distinguish between two uses of ISO 6523.

  • As a root for the OID tree.
  • As a meta system for organization identifiers.

As mentioned above, the DUNS tree (0060) will probably never be exposed, and conversely I don't expect anyone to consume organization identifiers from the CERN root (0020). Also note that there are other identifier schemes which have an OID that are not purely numerical identifiers:

  • IBANs (icd: 0021).
  • Swiss UIDs (icd: 0183).

If you look at the list of ICDs, the registrar typically makes it explicit when they want to use them for ISO 8348.

From my position, the iso6523Code field should be restricted to Organization (and maybe Person). I would really avoid using identifier, because its usage is really vague and abstract for most users.


Tiggerito avatar Tiggerito commented on June 2, 2024

The reference list of ICDs is missing the 9XXX codes which contain a lot of VAT number types.

I read up a bit more and found the 9XXXs in the EAS list I found are not in the ICD list. So I guess they are not ICD codes. Wouldn't it be of value to be able to add things like VAT numbers for organization identifiers?

https://ec.europa.eu/digital-building-blocks/sites/display/DIGITAL/Code+lists
Most other identifiers in the eInvoicing standard can also be based on different identifiers schemes according to the International Code Designator (ICD) code list. When the same identifier scheme is allowed in both the ICD and the EAS code list, it shall use the same code value. Consequently, such codes must first be registered in the ICD code list and then requested to be added to the EAS code list. The EAS code list may also include other identifiers schemes that are specific for electronic addressing, such as emails, Uniform Resource Locator (URL) and other Uniform Resource Identifiers (URI). For details on this procedure of requesting changes to the EAS code list, please contact the DIGITAL Service Desk.


thadguidry avatar thadguidry commented on June 2, 2024

@Tiggerito vatID is already a property on Organization if that helps

vatID The Value-added Tax ID of the organization or person.


Tiggerito avatar Tiggerito commented on June 2, 2024

@thadguidry Good point. I guess that combined with the organization's address/country would be enough.


MatthiasWiesmann avatar MatthiasWiesmann commented on June 2, 2024

@thadguidry

You assume two things:

  • That a country only has one Tax-ID number format.
  • That a company's tax / vat identifiers are from the country their address is in. For instance companies in Monaco can have a French SIREN number.

Generally, taxID and vatID are fields which are difficult to parse, because the syntax is not specified (do you use the common formatting for the country?), you need the value from another field (country) to even know what it is, and even with the country value, parsing still has to handle ambiguities. ISO 6523 is much more constrained, and therefore more robust for parsing.
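A sketch of what a prefix-guessing parser ends up looking like shows where the ambiguity comes from. The special-case table below only covers the prefixes mentioned earlier in this thread and is not exhaustive; the function name and the returned region hints are assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Prefixes that are not plain ISO 3166 alpha-2 codes (illustrative, not exhaustive).
SPECIAL_PREFIXES = {
    "CHE": "CH",  # Switzerland uses its three-letter code
    "EL": "GR",   # Greece
    "XI": "XI",   # Northern Ireland, which is not a country code
    "EU": "EU",   # EU-wide numbers
}


def split_vat_prefix(vat_id: str) -> Tuple[Optional[str], str]:
    """Return (region hint, remainder); the hint is None if no letter prefix is found."""
    compact = re.sub(r"[\s.\-]", "", vat_id).upper()
    for prefix, region in SPECIAL_PREFIXES.items():
        if compact.startswith(prefix):
            return region, compact[len(prefix):]
    match = re.match(r"[A-Z]{2}", compact)
    if match:
        return match.group(0), compact[2:]
    return None, compact


print(split_vat_prefix("EL123456789"))
print(split_vat_prefix("CHE-123.456.789"))
print(split_vat_prefix("12 345 678 901"))
```

When the hint comes back as None, the parser has to fall back on other fields such as the address, which is the cross-referencing problem described above.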

@Tiggerito

The 9XXX are not official ICDs, these are part of the PEPPOL extension, which is a de facto standard.


thadguidry avatar thadguidry commented on June 2, 2024

@MatthiasWiesmann I didn't assume any of those. I simply stated that Schema.org provides a property to hold the values. How to format the values and attach additional metadata to them can all certainly be done when coordinating the vatID property with https://schema.org/PropertyValueSpecification, could it not? Do you see gaps here if the vatID context is that of PropertyValueSpecification, or even using multi-typing to provide some additional external context?

FYI, Schema.org typically doesn't get into the formatting weeds of values (since there's often not a need when we also have PropertyValueSpecification) unless absolutely necessary to make publisher/consumer lives easier.

If you need help with PropertyValueSpecification, we can help, and move that discussion to our mailing list, or just directly in our GitHub Discussions button above.


MatthiasWiesmann avatar MatthiasWiesmann commented on June 2, 2024

My main point is that ISO 6523 solves two problems: type identification and formatting.

If I understand PropertyValueSpecification correctly, we would need to have 1+ per country and it would not help with the identification issue: vatID and taxID are quite ambiguous and don't cover the space well, and DUNS codes, for example, are neither.


thadguidry avatar thadguidry commented on June 2, 2024

@MatthiasWiesmann Oops! So sorry, that should have been PropertyValue https://schema.org/PropertyValue . But the PropertyValueSpecification comes from Hydra and it's a way to specify a format (using its valuePattern). If using PropertyValue, to give more detail about the multiple vatID values an Organization might have, you could use PropertyValue or even provide a more specific StructuredValue using its disambiguatingDescription "registered in Ireland" as well as identifier, url, etc.

Wouldn't that be enough to know that a vatID value might be in a particular structure pattern? My understanding has always been that once you parse the first 2 letters (the country code) of the vatID value (or of multiple vatIDs if provided by a publisher) then you could parse the rest of the value and know its format more easily, no? https://en.wikipedia.org/wiki/VAT_identification_number

or just give context (the ICD part) that it's a https://schema.org/iso6523Code directly on the vatID property.
Sorry, but I just don't understand the format confusion here and why it would be hard to parse, given that you can provide a lot of metadata via all those properties I mentioned, to say what kind of value, who is the authority, where is the record for this vatID registration with that authority (a url), the date of registration, etc. etc. Can you help me understand more why StructuredValue, or iso6523Code property values/types could not help with your format parsing concerns?

@danbri This is likely where we need to provide better docs and guidance on how best to use those. Hmm, and seems we missed adding vatID as a subproperty of https://schema.org/identifier ? Hmm, and there's no back link reference on https://schema.org/vatID to know that folks can set the ICD portion from iso6523 which we do mention on https://schema.org/iso6523Code


thadguidry avatar thadguidry commented on June 2, 2024

I much prefer property/value pairs in JSON-LD, and how I usually handle it. I have always found property/value pairs (semantics, which allows easier information exchange) to be easier to parse than values with separators that lack extra information. In fact, the term "parsing" indeed comes into play when separator characters are used to delineate information in a multi-value value. For JSON-LD, as a consumer of the data, the libraries handle the parsing for you, so all you have to do is make sense of values and perhaps custom extensions and RDFa nodes. But that's me.


Tiggerito avatar Tiggerito commented on June 2, 2024

Australia is an interesting example. Let's see if I remember correctly.

We have two business identifiers which are used to pay tax:

ACN: Australian Company Number
ABN: Australian Business Number

Both are numbers where the ACN is the ABN without the first few numbers. Only businesses registered as a company have an ACN.

We also have TFN (Tax File Number), which businesses and individuals have.

In Australia we pay GST not VAT. For businesses you track GST paid/charged via the ABN.

ICD has an entry for the ABN (0151) but not the other two. So we can identify a business in Australia via the ABN.
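As a small sketch of that last point, an ABN could be turned into the compact ICD:value form as follows. The helper name is hypothetical, the ABN shown is a sample used purely for illustration, and stripping spaces, dots and dashes reflects an assumption that the machine-readable form is what belongs in the value.

```python
import re


def to_iso6523_code(icd: str, formatted_value: str) -> str:
    """Strip display formatting from an identifier and prefix it with its ICD."""
    compact = re.sub(r"[\s.\-]", "", formatted_value)
    return f"{icd}:{compact}"


print(to_iso6523_code("0151", "51 824 753 556"))  # 0151:51824753556
```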

With the https://schema.org/PropertyValue idea, I guess we might be able to use the EAS codes.

<div itemprop="vatID" itemscope itemtype="https://schema.org/PropertyValue">
      <meta itemprop="propertyID" content="peppol:9932" />United Kingdom VAT number
      <span itemprop="value">GB1234567890</span>
</div>

I found what looks like a better EAS list that indicates what the source is:

https://ec.europa.eu/digital-building-blocks/wikis/download/attachments/467108974/Electronic%20Address%20Scheme%20Code%20list%20-%20version%209%20-%20published%20March2022.xlsx?version=1&modificationDate=1646394201721&api=v2


Tiggerito avatar Tiggerito commented on June 2, 2024

That "better EAS list" is incomplete :-(


MatthiasWiesmann avatar MatthiasWiesmann commented on June 2, 2024

I much prefer property/value pairs in JSON-LD, and how I usually handle it. I have always found property/value pairs (semantics, which allows easier information exchange) to be easier to parse than values with separators that lack extra information. In fact, the term "parsing" indeed comes into play when separator characters are used to delineate information in a multi-value value. For JSON-LD, as a consumer of the data, the libraries handle the parsing for you, so all you have to do is make sense of values and perhaps custom extensions and RDFa nodes. But that's me.

The data comes from systems which are probably not JSON-LD, and will be parsed into structures which are not JSON-LD.

Breaking up the information in transit does not bring much, and adds risk of breakage. The whole point of standards like ISO 6523 is that values can be transported without any transformation, in the same way as country codes (ISO 3166), language codes (ISO 639), and date-times (ISO 8601).


github-actions avatar github-actions commented on June 2, 2024

This issue is being nudged due to inactivity.

