mhausenblas / schema-org-rdf

Schema.org in RDF


schema-org-rdf's Introduction

Schema.RDFS.org

This is a project to provide an RDF(S) version of Schema.org terms, including tools, examples and mappings to benefit from data that uses Schema.org terms. Currently, we have the following sub-projects (in descending order of maturity):

Generating variants
Schema.org gateway
Examples
Mappings

Sub-projects

Over time, a number of Schema.RDFS.org sub-projects have emerged; they are introduced below. Have a look at the respective directories for more details.

Generating variants

This Schema.RDFS.org sub-project deals with generating structured representations of Schema.org terms from the natural-language definitions found on Schema.org.

Schema.org gateway

This Schema.RDFS.org sub-project develops the Schema.org gateway, an anything-to-anything data format converter based on the Lingua Franca pattern.

Examples

This Schema.RDFS.org sub-project collects Schema.org examples in all kinds of markup and data formats, incl. RDFa, CSV, JSON etc.

Mappings

This Schema.RDFS.org sub-project collects mappings to Schema.org terms from widely deployed Linked Data vocabularies such as Dublin Core, FOAF, GoodRelations, SIOC, DBpedia ontology, etc.

Who is behind this?

Led by Michael and Richard of the Linked Data Research Centre, DERI, the Schema.RDFS.org project is officially endorsed and supported by the EC FP7 LOD-Around-The-Clock Support Action (LATC). Many people from the Linked Data and Web of Data domains and from other communities (SEO, libraries, archives, etc.) are contributing and have delivered valuable input.

If you have any questions, please do not hesitate to ask Michael, either by email at michael.hausenblas AT gmail.com or on Twitter (@mhausenblas), or drop by the #swig channel on Freenode/IRC.

License

The software and artefacts (such as examples, mappings, etc.) provided through the Schema.RDFS.org project are, if not otherwise stated, in the Public Domain.

Roadmap and Ideas

Community
- get communities involved and give them a sense of ownership (ML, Twitter, here, etc.)
- feedback on a Wiki, issue tracker, etc. (?)
Generating variants
- multi-lang labels/comments
Schema.org gateway
- http://bibutils.refbase.org/ extensions and collaboration
Examples
- collect from snippets in the wild
- create based on existing examples
Mappings
- multi-lang suggestions into a Google spreadsheet?
- ask vocab stake-holders to provide pointers to their mapping (Michael: FOAF, DC, GR)

schema-org-rdf's Issues

microdata2rdf gateway

Create a microdata2rdf gateway for Schema.org terms, that is, a script that takes a microdata-marked-up HTML page and turns it into RDF. Use case: SPARQLing a MD page, ETL for RDF stores, etc.

Aidan's comments re mapping

comments

http://schema.rdf.sorg/ -> http://schema.rdfs.org/

Encoding problems in comments: schema:ItemList, etc.

Generalise ranges from xsd:string to rdfs:Literal so people can add language tags to values if they want: schema:awards, etc.

Remove HTML from comments: schema:productID, etc.

soft comments

Not a fan of owl:unionOf classes for domains and ranges... they achieve nothing other than as formal "documentation", but I guess they add to completeness: schema:attendees, etc.

Possible to drop the last semi-colon in each term description?

rdfs:isDefinedBy <http://schema.org/...> ***;***
.

schema.org side

Properties do not have pages on schema.org: http://schema.org/audio (drop isDefinedBy there?)

schema:parents, schema:performers, etc. horrible/inconsistent naming

MensClothingStore but no WomensClothingStore?

improvements

Maybe in separate documents:

multi-lang labels/comments
mappings to well-known legacy terms
a sprinkling of OWL
get the community involved and give them a sense of ownership
- feedback on a Wiki?
- mappings/multi-lang suggestions into a Google spreadsheet?

Schema.rdfs.org files maintenance

On the http://schema.rdfs.org/ website I found files representing the schema.org terms in several formats. However, those files do not reflect the latest version of schema.org.

For instance, the property "hasPart" of the http://schema.org/CreativeWork type does not appear in the JSON file (nor in the other schema.rdfs.org file formats, such as RDF/XML and CSV).

I would like to know where I could find the latest version of schema.org in JSON file format.

In the scrapers sub-project (https://github.com/mhausenblas/schema-org-rdf/tree/master/scrapers) I found a Python script that generates a JSON file with the schema.org terms. However, I would like to have a hosted link (such as the link to the all.json file available at schema.rdfs.org: http://schema.rdfs.org/all.json).

Is that possible?

Thank you,

Martín Menes Rouco

Syntax error on http://schema.rdfs.org/all.ttl

After almost one year and several mails to Richard,
the lines around 4030 are still wrong:

schema:accessibilityAPI a rdf:Property;
    rdfs:label "Accessibility API"@en;
    rdfs:comment "Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lists possible values).
     "@en;

The comment spans two lines, so it must be delimited by triple quotes (""").

I understand that the source code here on GitHub is probably correct.
But what is the use of source code that has not been applied?

I am filing this issue so that the problem is known and can be fixed.

Happily, the N-Triples file at http://schema.rdfs.org/all.nt is correct :)

Map to BBC PO and MO

Just wondering whether we could map the resulting schema to PO and MO. A couple of mappings that come to mind:

BBC PO (http://www.bbc.co.uk/ontologies/programmes):

schema:TVEpisode owl:equivalentClass po:Episode ; rdfs:subClassOf po:Programme .
schema:TVSeason owl:equivalentClass po:Series ; rdfs:subClassOf po:Programme .
schema:TVSeries owl:equivalentClass po:Brand ; rdfs:subClassOf po:Programme .
schema:episodes shouldn't be using a list as a range - I suggest using po:episode there.
schema:episodeNumber owl:equivalentProperty po:position .
schema:seasonNumber owl:equivalentProperty po:position .
schema:seasons owl:equivalentProperty po:series .

MO (http://musicontology.com):
schema:AudioObject owl:equivalentClass mo:AudioFile.
schema:MusicAlbum owl:equivalentClass mo:Record.
schema:MusicRecording owl:equivalentClass mo:Track. (not sure about this one - the schema.org definition is ambiguous to say the least!)
schema:Event owl:equivalentClass event:Event .
schema:Festival owl:equivalentClass mo:Festival .
schema:MusicEvent owl:equivalentClass mo:Performance .

Another note - I am unsure how Duration is supposed to be used, as it seems to map straight to a datatype.

multiple inheritance not showing in JSON

Seems sane in the turtle, but in the JSON, e.g. LocalBusiness has ancestors ["Organisation", "Thing"] with no mention of Place
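
One possible cause, offered only as an assumption because the JSON generator is not shown here, is that ancestors are collected by following a single parent chain. A minimal Python sketch that follows every supertype instead is given below. The `all_ancestors` helper and the `SUPERTYPES` sample mapping are hypothetical and are not taken from the scrapers.

```python
def all_ancestors(type_id, supertypes):
    """Return every direct and indirect supertype of type_id, sorted by name."""
    seen = set()
    stack = list(supertypes.get(type_id, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            # Follow all parents of this type, not only the first one.
            stack.extend(supertypes.get(parent, []))
    return sorted(seen)


# Illustrative sample data: LocalBusiness has two direct supertypes.
SUPERTYPES = {
    "LocalBusiness": ["Organization", "Place"],
    "Organization": ["Thing"],
    "Place": ["Thing"],
    "Thing": [],
}

print(all_ancestors("LocalBusiness", SUPERTYPES))
```

With this traversal, Place shows up next to Organization and Thing. The result is sorted alphabetically, so it does not preserve the root-to-leaf order that the existing `ancestors` field appears to use.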

Link to TopBraid Schema.org OWL versions

From https://wrdrd.github.io/docs/consulting/knowledge-engineering.html#schema-org-rdf :

Schema.org RDF
++++++++++++++++
Schema.org is maintained as RDFa.

TopBraid maintains RDF/OWL transformations of schema.org:

* http://topbraid.org/schema/
* http://topbraid.org/schema/schema.rdf
* http://topbraid.org/schema/schema.ttl
* http://topbraid.org/schema/schema-single-range.ttl

http://topbraid.org/schema/

foaf page uri has a small typo

Minor typo: as of 100644, the URI assigned to foaf:page is "http://schema.rdf.sorg/". It should be "http://schema.rdfs.org/". Not a big deal, of course, but I thought I'd mention it. -Sebastian

Kingsley's comments

See: http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fschema.org%2FPlace .

isDefinedBy relation should associate a Class or Property with its Defining Ontology.
wdrs:describedby can be used to associate Object Name with Object Address.

Looking at your first cut, I would swap the current isDefinedBy relations with wdrs:describedby. Then add isDefinedBy relations between the Ontology Name and each of the Classes and Properties it defines.

The subtle difference between Description and Definition (a special kind of description) is highlighted by these suggested tweaks. Ditto the difference between Name and Address via wdrs:describedby.

indirection via 303
default representation HTML+RDFa
Note: I don't see said indirection or default representation at the current time. Evidence: http://uriburner.com/c/EYD4P4

http://schema.org/Thing rdfs:isDefinedBy http://schema.org/ .
http://schema.org/Thing wdrs:describedby http://schema.org/Thing .

indirection via # terminated URI based Object Name
implicit de-reference (indirection) via fragment identifier handling

http://schema.org/Thing#this rdfs:isDefinedBy http://schema.org/ .
Address-of relation that exposes actual Named Object Representation via a Resource URL

http://schema.org/Thing#this wdrs:describedby http://schema.org/Thing .

Sample page: http://uriburner.com/describe/?url=http%3A%2F%2Fschema.rdfs.org%2Fall&p=2&lp=4&first=&op=0&gp=2

Define URIs in schema.rdfs.org

(copied from message at LOD list)

Publishers behind http://schema.org URIs are unlikely to ever provide any RDF description, so why are those URIs declared as identifiers of RDFS classes in http://schema.rdfs.org/all.rdf? As far as I can see, http://schema.org/Person is the URI of an information resource, not of a class.
So I would rather have expected the schema.org URIs to be mirrored by schema.rdfs.org URIs, the latter being fully dereferenceable, proper RDFS classes that make the semantics of the former explicit, while keeping a reference to the source in some dcterms:source element.

e.g., http://schema.rdfs.org/Person dcterms:source http://schema.org/Person

etc
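
The mirroring proposed above could be sketched in Python roughly as follows. This is only an illustration: `mirror_triple` is a hypothetical helper, and using http://schema.rdfs.org/ as the mirror namespace is the proposal from this message, not something the project is known to have implemented.

```python
SOURCE_BASE = "http://schema.org/"
MIRROR_BASE = "http://schema.rdfs.org/"  # assumed mirror namespace (proposal only)
DCTERMS_SOURCE = "http://purl.org/dc/terms/source"


def mirror_triple(term):
    """Build one N-Triples line linking a mirrored term to its schema.org source."""
    return "<%s%s> <%s> <%s%s> ." % (
        MIRROR_BASE, term, DCTERMS_SOURCE, SOURCE_BASE, term
    )


for term in ("Person", "Place"):
    print(mirror_triple(term))
```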

Cannot scrape Schema.org

Scraping Schema.org classes and properties into csv files does not work at this time.

I got the following stack trace:
$> python scrape_csv.py classes.csv properties.csv
Traceback (most recent call last):
  File "scrape_csv.py", line 12, in <module>
    types = schema_scraper.get_all_types()
  File "/Users/emmanuel/Downloads/schema-org-rdf-master/scrapers/schema_scraper.py", line 20, in get_all_types
    types[id] = get_type_details(base_url + id)
  File "/Users/emmanuel/Downloads/schema-org-rdf-master/scrapers/schema_scraper.py", line 49, in get_type_details
    id = ancestor_links[-1].text_content()
IndexError: list index out of range
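
The traceback shows that `ancestor_links` was empty when it was indexed. One plausible explanation, stated here as an assumption, is that the schema.org page markup changed and the scraper's query no longer matches anything. A minimal Python sketch of a guard is shown below. `last_link_text` and `FakeLink` are hypothetical names used for illustration and are not part of schema_scraper.py.

```python
def last_link_text(links):
    """Return the text of the last matched link, or None if nothing matched."""
    if not links:
        # An empty match usually means the page structure has changed.
        return None
    return links[-1].text_content()


class FakeLink:
    """Stand-in for an lxml element, used only to exercise the helper."""

    def __init__(self, text):
        self._text = text

    def text_content(self):
        return self._text


print(last_link_text([]))
print(last_link_text([FakeLink("Thing"), FakeLink("Place")]))
```

A guard like this only avoids the crash. The scraper would still need its element queries updated before it produces useful output again.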

Strings with Newlines Need Triple-Quotes

In the 2012/11/30 publication of the all.ttl file, a number of rdfs:comment values contain multi-line strings but are delimited by single quotation marks (for example, schema:BusinessEntityType).

Triple quotes (""") should be used here, in keeping with the Turtle specification.
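
As a rough illustration of the rule, here is a Python sketch of a serializer that switches to the long-string form whenever the text contains a line break. `turtle_literal` is a hypothetical helper, not a function from the project's scrapers.

```python
def turtle_literal(text, lang=None):
    """Serialize text as a Turtle string literal, using triple quotes if needed."""
    # Escape backslashes first, then double quotes, so no escape is doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    if "\n" in text or "\r" in text:
        # Short strings may not contain raw line breaks; long strings may.
        literal = '"""' + escaped + '"""'
    else:
        literal = '"' + escaped + '"'
    if lang:
        literal += "@" + lang
    return literal


print(turtle_literal("Accessibility API", "en"))
print(turtle_literal("First line.\nSecond line.", "en"))
```

Escaping every double quote is stricter than the long-string form requires, but it keeps the sketch short and avoids an unescaped quote ending up next to the closing delimiter.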

Use Philip's JSON output rather than own?

see http://foolip.org/microdatajs/

Medical Condition markup tool.

Hi, I have created a MedicalCondition markup generator which could be included in the tools section.
http://www.clearhealthmedia.com/medical-condition-schema-org-generator/
Thanks.

xsd:dateTime rather than xsd:date in some ranges

I know they say that the Date type is "A date value in ISO 8601 date format.", but for things like startDate and endDate, they say things like "The start date and time of the event (in ISO 8601 date format).", and they put a datetime in the example at the bottom of the Event page.
An event listing with only the date, and not the time, is not considered useful.
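
To make the distinction concrete, here is a small Python sketch that picks a datatype from the shape of an ISO 8601 value. `xsd_datatype_for` is a hypothetical helper. It only handles values that Python's standard `fromisoformat` parsers accept, which is an assumption about the input and not a full ISO 8601 implementation.

```python
from datetime import date, datetime


def xsd_datatype_for(value):
    """Return "xsd:dateTime" for values with a time part, else "xsd:date"."""
    if "T" in value:
        datetime.fromisoformat(value)  # raises ValueError if malformed
        return "xsd:dateTime"
    date.fromisoformat(value)  # raises ValueError if malformed
    return "xsd:date"


print(xsd_datatype_for("2013-09-14"))
print(xsd_datatype_for("2013-09-14T21:30"))
```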

JSON-LD examples use wrong @context

The JSON-LD examples use a context definition such as the following:

{
   "@context": "http://schema.org/jsonld-profile",
   "@type": "Thing", 
   "@subject": "http://example.org/things#the-thing",
   "description": "The Thing is a fictional character in the Fantastic Four.",
   "image": "http://upload.wikimedia.org/wikipedia/en/a/a3/Thing_v2_1_coverart.jpg", 
   "name": "The Thing",
   "url": "http://en.wikipedia.org/wiki/Thing_(comics)"
}

Not only is there no such context, but the intention is that the context be just http://schema.org/. At some point @danbri may get around to providing this via content-negotiation, although many tools have a built in representation for this, for example http://linter.structured-data.org/.
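
A sketch of the same example with the context set to http://schema.org/, written as a Python dictionary and serialized with the standard `json` module. Replacing `"@subject"` with `"@id"` is an additional assumption based on the JSON-LD 1.0 keyword set and is not something requested in this issue. `thing_example` is a hypothetical name.

```python
import json


def thing_example():
    """Return the Thing example with the intended schema.org context."""
    return {
        "@context": "http://schema.org/",
        "@type": "Thing",
        # Assumption: "@id" is the JSON-LD 1.0 replacement for "@subject".
        "@id": "http://example.org/things#the-thing",
        "description": "The Thing is a fictional character in the Fantastic Four.",
        "image": "http://upload.wikimedia.org/wikipedia/en/a/a3/Thing_v2_1_coverart.jpg",
        "name": "The Thing",
        "url": "http://en.wikipedia.org/wiki/Thing_(comics)",
    }


print(json.dumps(thing_example(), indent=2))
```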

Village support?

I wasn't able to find support for village? Village is not a town, right?

There seems to be an error in schema-org-rdf/scrapers/scrape_rdf.py

Hi,
I have tried the scrape_rdf.py script to extract the schema.org terms in RDF format from http://schema.org/docs/full.html (or schema.rdfa).
The resulting output.ttl file seems to contain only the header (see [below]).
I would appreciate any advice, e.g., whether scrape_rdf.py no longer works with the current schema.org site.

[below][BEGIN]
@prefix schema: <http://schema.org/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct: <http://purl.org/dc/terms/>.

<http://schema.rdfs.org/all> a owl:Ontology;
    dct:title "The schema.org terms in RDFS+OWL"@en;
    dct:description "This is a conversion of the terms defined at schema.org to RDFS and OWL."@en;
    foaf:page <http://schema.rdfs.org/>;
    rdfs:seeAlso <http://schema.org/>;
    rdfs:seeAlso <http://github.com/mhausenblas/schema-org-rdf>;
    dct:hasFormat <http://schema.rdfs.org/all.ttl>;
    dct:hasFormat <http://schema.rdfs.org/all.rdf>;
    dct:hasFormat <http://schema.rdfs.org/all.nt>;
    dct:hasFormat <http://schema.rdfs.org/all.json>;
    dct:hasFormat [
        dct:hasPart <http://schema.rdfs.org/all-classes.csv>;
        dct:hasPart <http://schema.rdfs.org/all-properties.csv>;
    ];
    dct:source <http://schema.org/>;
    dct:license <http://schema.org/docs/terms.html>;
    dct:valid "2015-05-12"^^xsd:date;
    .
[END]

Schema.org docs say that you can use text in place of any other type

From the schema.org documentation:

We also expect that often, where we expect a property value of type Person, Place, Organization or some other subClassOf Thing, we will get a text string.

So, where we say that the range of some property is schema:Person, we should expect to get a simple string value some of the time. Possible answers:

- We document proper usage. While some improper usage is expected and consumers should deal with it, we don't have to document such usage.
- Well, who said that schema:Person is disjoint from strings? In fact, we could translate the quote above into RDFS like this:

xsd:string rdfs:subClassOf schema:Thing.

Provide CSV version of Schema.org terms

Add a CSV version of the Schema.org terms

Indicate last update of Schema.org terms exports

http://schema.rdfs.org/ offers the schema.org terms in several formats, but it doesn't say how up to date they are. A simple solution would be to indicate on http://schema.rdfs.org/ how often the cron job is run (if it applies), e.g. these files are updated every Sunday night.

Instance data examples needed

Nobody likes to read schemas; everyone prefers to work from concrete examples.

It would be good to have some example instance data, perhaps populated from Freebase or DBpedia.

These should show both Microdata and RDFa1.1 representations, and not be framed as an "RDFa is better than Microdata" thing, but as an example of how things look in both notations.

Consider also using the HTML::HTML5::Microdata::Parser Perl module rather than 'scraping', i.e., emphasise that Microdata can be treated as an RDF notation.

Yandex Testing Tool supports JSON-LD

http://webmaster.yandex.com/microtest.xml

just announced on public-vocabs mailing list: http://lists.w3.org/Archives/Public/public-vocabs/2013Dec/0005.html

"Our main goal is to provide webmasters with simple and fast tool for checking markup on the webpages. We've already supported all the popular syntaxes such as microdata, rdfa-lite and microformats. JSON-LD becomes more and more popular and is used in different products (e.g., Yandex.Islands, GMail Actions, etc). Together with all schema.org partners we understand our responsibility to help webmasters with learning new language."

comment and comment_plain field always empty for datatypes and types

I am using the JSON data provided by
http://schema.rdfs.org/all.json

but for all datatypes and types, the fields comment and comment_plain are empty:
"datatypes": {
  "Boolean": {
    "ancestors": [
      "DataType"
    ],
    "comment": "",
    "comment_plain": "",
    "id": "Boolean",
    "instances": [
      "False",
      "True"
    ],
    "label": "Boolean",
    "properties": [],
    "specific_properties": [],
    "subtypes": [],
    "supertypes": [
      "DataType"
    ],
    "url": "http://schema.org/Boolean"
  },

and

"WearAction": {
  "ancestors": [
    "Thing",
    "Action",
    "ConsumeAction",
    "UseAction"
  ],
  "comment": "",
  "comment_plain": "",

It would be nice if the comments could be filled in. For all properties, the comment fields are fine.

Cheers,

Uli
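
A quick way to list the affected entries is sketched below in Python. It assumes the file has top-level "datatypes" and "types" sections shaped like the excerpts above. The "types" key, the `entries_without_comment` helper and the `SAMPLE` data are assumptions made for illustration.

```python
def entries_without_comment(data, sections=("datatypes", "types")):
    """Return (section, id) pairs whose comment and comment_plain are both empty."""
    missing = []
    for section in sections:
        for term_id, entry in sorted(data.get(section, {}).items()):
            if not entry.get("comment") and not entry.get("comment_plain"):
                missing.append((section, term_id))
    return missing


# Illustrative sample shaped like the excerpts above.
SAMPLE = {
    "datatypes": {"Boolean": {"comment": "", "comment_plain": ""}},
    "types": {
        "WearAction": {"comment": "", "comment_plain": ""},
        "Thing": {"comment": "Some text.", "comment_plain": "Some text."},
    },
}

print(entries_without_comment(SAMPLE))
```

To run it against the real file, load all.json with `json.load` and pass the resulting dictionary in place of `SAMPLE`.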

Parsing all.ttl from http://schema.rdfs.org/ with rdf (ruby) generates a few errors

I downloaded all.ttl from http://schema.rdfs.org/ today on 26 May 2014.

I tried parsing it with a recent version of rdf (see versions below).

I got a few errors.

I can look deeper into it, if relevant.

The first error, for example, seems to be caused by an unexpected newline in the
middle of the string; the source ttl around line 4029 looks like:

...
schema:accessibilityAPI a rdf:Property;
    rdfs:label "Accessibility API"@en;
    rdfs:comment "Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lists possible values).
     "@en;
    rdfs:domain schema:CreativeWork;
...

The full error log is:

[16] pry(main)> uri = RDF::URI.new("schema_org.ttl")
=> #<RDF::URI:0x3fc8a19b1140 URI:schema_org.ttl>
[17] pry(main)> schema_graph = RDF::Graph.load(uri)
ERROR [line: 4029] With input '"Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lis': Invalid token "\"Indicates" (found "\"Indicates"), production = :_predicateObjectList_5
ERROR [line: 4029] With input 'WebSchemas wiki lists possible values).
     "@en;
    rdfs:domain schema:CreativeWork;
    rdfs:rang': Invalid token "WebSchemas" (found "WebSchemas"), production = :collection
ERROR [line: 4462] With input '"The target group associated with a given audience (e.g. veterans, car owners, musicians, etc.)
     ': Invalid token "\"The" (found "\"The"), production = :_predicateObjectList_5
ERROR [line: 4462] With input 'e.g. veterans, car owners, musicians, etc.)
      domain: Audience
      Range: Text
    "@en;
    rd': Invalid token "e.g." (found "e.g."), production = :collection
ERROR [line: 4462] undefined prefix "domain"
ERROR [line: 4462] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found "A"), production = :objectList
ERROR [line: 4462] undefined prefix "Range"
ERROR [line: 4462] With input 'Text
    "@en;
    rdfs:domain schema:Audience;
    rdfs:range xsd:string;
    rdfs:isDefinedBy <http': Invalid token "Text" (found "Text"), production = :_triples_1
ERROR [line: 4462] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
ERROR [line: 4463] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
ERROR [line: 4464] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
=> #<RDF::Graph:0x3fc8a19a0ffc(default)>
[18] pry(main)> schema_graph.count
=> 8717

➜  schema.org  gem list rdf

*** LOCAL GEMS ***

rdf (1.1.3)
rdf-aggregate-repo (1.1.0)
rdf-isomorphic (1.1.0)
rdf-json (1.1.0)
rdf-microdata (1.1.1.1)
rdf-n3 (1.1.0.1)
rdf-rdfa (1.1.3.1)
rdf-rdfxml (1.1.0.1)
rdf-trig (1.1.3.1)
rdf-trix (1.1.0)
rdf-turtle (1.1.3.1)
rdf-xsd (1.1.0)

lightweight json dataset

Lightweight applications consuming JSON might not need all of the data, such as domains, ranges, supertypes, and URLs, especially if the whole dataset (> 300 KB) is loaded client-side. For the Drupal schemaorg project, for example, I wrote a script to extract term IDs and comments and regenerate the JSON file (48 KB). Do you think other JS apps might be interested in a lightweight JSON file, which could perhaps be hosted on schema.rdfs.org?
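
The extraction described above might look roughly like the following Python sketch. It is not the Drupal script itself. The `lightweight` helper, the section names "types" and "properties", and the `SAMPLE` data are assumptions about the layout of all.json.

```python
import json


def lightweight(data, sections=("types", "properties")):
    """Keep only term ids and plain-text comments from each section."""
    slim = {}
    for section in sections:
        slim[section] = {
            term_id: entry.get("comment_plain", "")
            for term_id, entry in data.get(section, {}).items()
        }
    return slim


# Illustrative sample with fields that the lightweight file would drop.
SAMPLE = {
    "types": {
        "Place": {
            "comment_plain": "Some text.",
            "supertypes": ["Thing"],
            "url": "http://schema.org/Place",
        }
    },
    "properties": {
        "name": {"comment_plain": "Some text.", "domains": ["Thing"]}
    },
}

# Compact separators keep the serialized file as small as possible.
print(json.dumps(lightweight(SAMPLE), separators=(",", ":"), sort_keys=True))
```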

Official OWL unmaintained

http://schema.org/docs/schemaorg.owl currently states:
this file is not maintained, and probably not very useful.

http://schema.rdfs.org still links to it.

I think this should be clarified and explained on the homepage.

Which URIs to use?

Would it be reasonable to use http://schema.rdfs.org rather than http://schema.org in the URIs? Essentially mirror what one might hope for schema.org to become. Then if it does become that, link the two together?

via http://lists.w3.org/Archives/Public/public-lod/2011Jun/0096.html

xsd:string range excludes language-tagged strings

Where schema.org properties have the expected type “Text”, we express that as xsd:string. This is too narrow, because it excludes language-tagged strings, and surely a text tagged with a language is valid. The alternative, rdfs:Literal, is too broad because it includes many kinds of non-text literals.

We might want to wait until the RDF WG proposes a new datatype or class that includes both tagged and untagged strings, which is on their agenda.

(Originally pointed out by Aidan.)

Holger's comments

I suggest replacing the owl:unionOf ranges and domains with owl:allValuesFrom restrictions. Not only is this more extensible, but it also clarifies which range is to be used in which class. If you stick to owl:unionOf, please at least add the missing rdf:type owl:Class triple :)

It would also be great to make the single-valued properties owl:FunctionalProperty. I am not sure if this info can be scraped or whether heuristics (e.g. ending with 's' = plural) could help.
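
The plural heuristic mentioned above could be tried out with a Python sketch like this one. `guess_functional` and `as_turtle` are hypothetical helpers. The rule is deliberately naive, treating any name that does not end in "s" as single-valued, and it is shown only to make its limits visible.

```python
def guess_functional(property_names):
    """Return the names that do not end in 's', as candidate single-valued properties."""
    return sorted(name for name in property_names if not name.endswith("s"))


def as_turtle(names):
    """Emit one owl:FunctionalProperty statement per candidate name."""
    return "\n".join("schema:%s a owl:FunctionalProperty ." % n for n in names)


candidates = guess_functional(["attendees", "name", "address", "parents"])
print(as_turtle(candidates))
```

In this sample only `name` is selected. `address` is rejected although it is singular, which suggests the heuristic would need a manual review pass before any of its output is published.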

microdata/RDFa-Lite C library

I recently released a microdata/RDFa-Lite C library [1] and announced it on a W3C mailing list [2].
I would appreciate it if the library could be added to the tools page [3], so that I can get feedback from developers.

[1] https://github.com/shishimaru/libcsem
[2] http://lists.w3.org/Archives/Public/semantic-web/2013Apr/0019.html
[3] http://schema.rdfs.org/tools.html

Thanks,
Hitoshi