schema-org-rdf's Introduction

Schema.RDFS.org

This is a project to provide an RDF(S) version of Schema.org terms, including tools, examples and mappings to benefit from data that uses Schema.org terms. Currently, we have the following sub-projects (in descending order of maturity):

  • Generating variants
  • Schema.org gateway
  • Examples
  • Mappings

Sub-projects

Over time, several sub-projects of Schema.RDFS.org have emerged; they are introduced below. See the respective directories for more details.

Generating variants

This Schema.RDFS.org sub-project generates structured representations of Schema.org terms from the natural-language definitions found on Schema.org.

Schema.org gateway

This Schema.RDFS.org sub-project develops the Schema.org gateway, an anything-to-anything data format converter based on the Lingua Franca pattern.

Examples

This Schema.RDFS.org sub-project collects Schema.org examples in all kinds of markup and data formats, incl. RDFa, CSV, JSON etc.

Mappings

This Schema.RDFS.org sub-project collects mappings to Schema.org terms from widely deployed Linked Data vocabularies such as Dublin Core, FOAF, GoodRelations, SIOC, DBpedia ontology, etc.

Who is behind this?

Led by Michael and Richard of the Linked Data Research Centre, DERI, the Schema.RDFS.org project is officially endorsed and supported by the EC FP7 LOD-Around-The-Clock Support Action (LATC). Many people from the Linked Data and Web of Data communities, as well as others (SEO, libraries, archives, etc.), are contributing and have provided valuable input.

If you have any questions, please do not hesitate to ask Michael, either by email at michael.hausenblas AT gmail.com, on Twitter (@mhausenblas), or in the #swig channel on Freenode IRC.

License

The software and artefacts (such as examples, mappings, etc.) provided through the Schema.RDFS.org project are, if not otherwise stated, in the Public Domain.

Roadmap and Ideas

  • Community
    • get communities involved and give them a sense of ownership (ML, Twitter, here, etc.)
    • feedback on a Wiki, issue tracker, etc. (?)
  • Generating variants
    • multi-lang labels/comments
  • Schema.org gateway
  • Examples
    • collect from snippets in the wild
    • create based on existing examples
  • Mappings
    • multi-lang suggestions into a Google spreadsheet?
    • ask vocab stake-holders to provide pointers to their mapping (Michael: FOAF, DC, GR)

schema-org-rdf's People

Contributors

ali1k, apassant, cveres, cygri, dbs, gkellogg, indeyets, johnbreslin, joshsh, kjetilk, linclark, mhausenblas, mikej83, msporny, nichtich, paxa, scor, westurner, whizkid77

schema-org-rdf's Issues

microdata2rdf gateway

Create a microdata2rdf gateway for Schema.org terms, that is, a script that takes a microdata-marked-up HTML page and turns it into RDF. Use case: SPARQLing a MD page, ETL for RDF stores, etc.
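A minimal sketch of what such a gateway could look like, using only Python's standard `html.parser`. The class name `MicrodataToTriples` and the helper `to_ntriples` are hypothetical, not part of this repository. The sketch assumes well-formed HTML, treats bare `itemprop` names as schema.org properties, and ignores `itemref`, `itemid` and space-separated multiple `itemprop` values.

```python
from html.parser import HTMLParser

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Elements whose microdata value is a URL attribute rather than text content.
URL_ATTRS = {"a": "href", "link": "href", "img": "src", "audio": "src", "video": "src"}
# Void elements never receive an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class MicrodataToTriples(HTMLParser):
    """Simplified microdata -> RDF extractor (hypothetical, for illustration)."""

    def __init__(self):
        super().__init__()
        self.triples = []  # (subject, predicate, object, object_is_literal)
        self.stack = []    # frames for currently open non-void elements
        self.bnodes = 0

    def _current_item(self):
        # The nearest enclosing element that opened an itemscope.
        for frame in reversed(self.stack):
            if frame["item"]:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self._current_item()
        prop = a.get("itemprop")
        # Assumption: bare property names belong to the schema.org vocabulary.
        pred = None if not prop else (prop if ":" in prop else SCHEMA + prop)
        frame = {"tag": tag, "item": None, "pred": None, "subj": None, "text": None}
        if "itemscope" in a:
            self.bnodes += 1
            frame["item"] = "_:b%d" % self.bnodes
            if a.get("itemtype"):
                self.triples.append((frame["item"], RDF_TYPE, a["itemtype"], False))
            if pred and parent:
                self.triples.append((parent, pred, frame["item"], False))
        elif pred and parent:
            if tag == "meta":
                self.triples.append((parent, pred, a.get("content") or "", True))
            elif tag in URL_ATTRS:
                self.triples.append((parent, pred, a.get(URL_ATTRS[tag]) or "", False))
            else:
                # Value is the element's text content, collected until its end tag.
                frame.update(pred=pred, subj=parent, text=[])
        if tag not in VOID:
            self.stack.append(frame)

    def handle_data(self, data):
        # Text counts towards every enclosing element that is collecting a value.
        for frame in self.stack:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not any(f["tag"] == tag for f in self.stack):
            return  # stray end tag, or a void element written as "<meta ... />"
        while self.stack:
            frame = self.stack.pop()
            if frame["text"] is not None:
                value = " ".join("".join(frame["text"]).split())
                self.triples.append((frame["subj"], frame["pred"], value, True))
            if frame["tag"] == tag:
                break


def to_ntriples(triples):
    def term(t, is_literal):
        if is_literal:
            esc = (t.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
            return '"%s"' % esc
        return t if t.startswith("_:") else "<%s>" % t
    return "\n".join("%s <%s> %s ." % (term(s, False), p, term(o, lit))
                     for s, p, o, lit in triples)


if __name__ == "__main__":
    demo = MicrodataToTriples()
    demo.feed('<div itemscope itemtype="http://schema.org/Person">'
              '<span itemprop="name">Jane Doe</span></div>')
    demo.close()
    print(to_ntriples(demo.triples))
```

The demo prints an `rdf:type` triple and a `schema:name` literal for one blank node. A real gateway would more likely build on an existing microdata parser than on a hand-rolled one like this.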

Aidan's comments re mapping

comments

Typo in URI: http://schema.rdf.sorg/ -> http://schema.rdfs.org/

Encoding problems in comments: schema:ItemList, etc.

Generalise ranges from xsd:string to rdfs:Literal so people can add language tags to values if they want: schema:awards, etc.

Remove HTML from comments: schema:productID, etc.
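For the "Remove HTML from comments" item, a minimal sketch using Python's standard `html.parser`. It assumes each comment is an HTML fragment. `plain_comment` is a hypothetical helper name, not a function of the scrapers.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the character data of an HTML fragment."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp;, &lt; etc.
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_comment(html_comment):
    parser = _TextOnly()
    parser.feed(html_comment)
    parser.close()  # flushes any buffered trailing text
    # Collapse whitespace runs left behind by removed tags and line breaks.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(plain_comment('A <a href="x">link</a> and <b>bold</b> &amp; more'))
```

Escaped markup such as `&lt;meta&gt;` inside a `<code>` element survives as literal text (`<meta>`), which is probably what a plain-text comment should show.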

soft comments

Not a fan of owl:unionOf classes for domains and ranges... they achieve nothing other than as formal "documentation", but I guess they add to completeness: schema:attendees, etc.

Is it possible to drop the trailing semicolon in each term description?

rdfs:isDefinedBy <http://schema.org/...> ***;***
.

schema.org side

Properties do not have pages on schema.org: http://schema.org/audio (drop isDefinedBy there?)

schema:parents, schema:performers, etc. horrible/inconsistent naming

MensClothingStore but no WomensClothingStore?

improvements

Maybe in separate documents:

  • multi-lang labels/comments
  • mappings to well-known legacy terms
  • a sprinkling of OWL
  • get the community involved and give them a sense of ownership
    • feedback on a Wiki?
    • mappings/multi-lang suggestions into a Google spreadsheet?

Schema.rdfs.org files maintenance

On the http://schema.rdfs.org/ website I found files in several formats representing schema.org terms. However, those files do not reflect the latest version of schema.org.

For instance, the property "hasPart" of the http://schema.org/CreativeWork type does not appear in the JSON file (nor in the other schema.rdfs.org formats, such as RDF/XML, CSV, and so on).

I would like to know where I can find the latest version of schema.org in JSON format.

In the scrapers sub-project (https://github.com/mhausenblas/schema-org-rdf/tree/master/scrapers) I found a Python-based process that generates a JSON file of schema.org terms. However, I would prefer a direct download link, like the one for all.json on schema.rdfs.org (http://schema.rdfs.org/all.json).

Is that possible?

Thank you,

Martín Menes Rouco
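A quick way to check whether a given property made it into a downloaded all.json, sketched in Python. It assumes a top-level "types" object keyed by type id, with each entry holding a "properties" list of ids; that layout is inferred from the datatype excerpt quoted in another issue below, not confirmed. `type_has_property` is a hypothetical helper.

```python
import json


def type_has_property(all_json, type_id, prop_id):
    """Return True if prop_id is listed among the properties of type_id.

    Assumed layout: {"types": {"<TypeId>": {"properties": [...], ...}}}.
    """
    types = all_json.get("types", {})
    if type_id not in types:
        raise KeyError("unknown type: %s" % type_id)
    return prop_id in types[type_id].get("properties", [])


if __name__ == "__main__":
    # Offline stand-in for:
    # json.load(urllib.request.urlopen("http://schema.rdfs.org/all.json"))
    sample = json.loads('{"types": {"CreativeWork": {"properties": ["about", "author"]}}}')
    print(type_has_property(sample, "CreativeWork", "hasPart"))
```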

Syntax error on http://schema.rdfs.org/all.ttl

After almost a year and several emails to Richard, the lines around 4030 are still wrong:

schema:accessibilityAPI a rdf:Property;
    rdfs:label "Accessibility API"@en;
    rdfs:comment "Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lists possible values).
     "@en;

It lacks """ .

I understand that probably, the source code is correct here on github .
But what's the use of source code that's not applied ?

The issue is way of letting it known for fixing the issue .

Happily the N-Ttriples at http://schema.rdfs.org/all.nt is correct :)
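Errors of this kind could be caught before publishing with a simple lint step. Below is a heuristic sketch (not a full Turtle parser, and not part of this project) that reports the line on which a single-quoted literal (`"..."` or `'...'`) contains a raw line break. It skips comments, `<IRIs>` and triple-quoted literals; `find_multiline_short_strings` is a hypothetical name.

```python
def find_multiline_short_strings(ttl):
    """Return 1-based line numbers where a short literal contains a raw newline."""
    problems = []
    i, n = 0, len(ttl)
    while i < n:
        c = ttl[i]
        if c == "#":                                  # comment: skip to end of line
            j = ttl.find("\n", i)
            i = n if j == -1 else j
        elif c == "<":                                # IRI reference: skip to '>'
            j = ttl.find(">", i)
            i = n if j == -1 else j + 1
        elif c in "\"'":
            if ttl.startswith(c * 3, i):              # long literal: may span lines
                j = ttl.find(c * 3, i + 3)
                if j == -1:
                    break
                while j + 3 < n and ttl[j + 3] == c:  # content may end with quotes
                    j += 1
                i = j + 3
            else:                                     # short literal
                start, j = i, i + 1
                while j < n and ttl[j] != c:
                    j += 2 if ttl[j] == "\\" else 1   # skip escaped characters
                if "\n" in ttl[start:j]:
                    problems.append(ttl.count("\n", 0, start) + 1)
                i = j + 1
        else:
            i += 1
    return problems


if __name__ == "__main__":
    with open("all.ttl", encoding="utf-8") as f:      # assumed local copy
        print(find_multiline_short_strings(f.read()))
```

On the excerpt quoted above, the function reports the `rdfs:comment` line. A real Turtle parser would give more precise diagnostics; this only targets the one failure mode described in this issue.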

Map to BBC PO and MO

Just wondering whether we could map the resulting schema to PO and MO. A couple of mappings that come to mind:

BBC PO (http://www.bbc.co.uk/ontologies/programmes):

schema:TVEpisode owl:equivalentClass po:Episode ; rdfs:subClassOf po:Programme .
schema:TVSeason owl:equivalentClass po:Series ; rdfs:subClassOf po:Programme .
schema:TVSeries owl:equivalentClass po:Brand ; rdfs:subClassOf po:Programme .
schema:episodes should not use a list as its range; I suggest using po:episode there.
schema:episodeNumber owl:equivalentProperty po:position .
schema:seasonNumber owl:equivalentProperty po:position .
schema:seasons owl:equivalentProperty po:series .

MO (http://musicontology.com):

schema:AudioObject owl:equivalentClass mo:AudioFile .
schema:MusicAlbum owl:equivalentClass mo:Record .
schema:MusicRecording owl:equivalentClass mo:Track . (not sure about this one; the schema.org definition is ambiguous, to say the least!)
schema:Event owl:equivalentClass event:Event .
schema:Festival owl:equivalentClass mo:Festival .
schema:MusicEvent owl:equivalentClass mo:Performance .

Another note - I am unsure how Duration is supposed to be used, as it seems to map straight to a datatype.
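If mappings like these are collected in the Mappings sub-project, they could be kept as plain triples and serialized to Turtle with the prefix declarations they need. The following is a small sketch under assumptions: the `po:` and `mo:` namespace IRIs shown are the ones these ontologies commonly use (check them against the ontology documents), and `to_turtle` is a hypothetical helper that refuses to emit undeclared prefixes.

```python
PREFIXES = {
    "schema": "http://schema.org/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "po": "http://purl.org/ontology/po/",    # assumed BBC Programmes namespace
    "mo": "http://purl.org/ontology/mo/",    # assumed Music Ontology namespace
}

# A few of the mappings proposed above, as (subject, predicate, object) CURIEs.
MAPPINGS = [
    ("schema:TVEpisode", "owl:equivalentClass", "po:Episode"),
    ("schema:TVEpisode", "rdfs:subClassOf", "po:Programme"),
    ("schema:episodeNumber", "owl:equivalentProperty", "po:position"),
    ("schema:MusicAlbum", "owl:equivalentClass", "mo:Record"),
]


def to_turtle(triples, prefixes):
    """Serialize CURIE triples as Turtle, grouping statements by subject."""
    used = {term.split(":", 1)[0] for triple in triples for term in triple}
    missing = used - set(prefixes)
    if missing:
        raise ValueError("undeclared prefixes: %s" % ", ".join(sorted(missing)))
    lines = ["@prefix %s: <%s> ." % (p, iri) for p, iri in sorted(prefixes.items())]
    lines.append("")
    grouped = {}  # dicts keep insertion order, so subjects appear as first seen
    for s, p, o in triples:
        grouped.setdefault(s, []).append("%s %s" % (p, o))
    for s, pairs in grouped.items():
        lines.append("%s %s ." % (s, " ;\n    ".join(pairs)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(MAPPINGS, PREFIXES))
```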

Kingsley's comments

See: http://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fschema.org%2FPlace .

isDefinedBy relation should associate a Class or Property with its Defining Ontology.
wdrs:describedby can be used to associate Object Name with Object Address.

Looking at your first cut, I would replace the current isDefinedBy relations with wdrs:describedby, and then add isDefinedBy relations between the Ontology Name and each of the Classes and Properties it defines.

The subtle difference between Description and Definition (a special kind of description) is highlighted by these suggested tweaks. Ditto the difference between Name and Address via wdrs:describedby.

OR

Sample page: http://uriburner.com/describe/?url=http%3A%2F%2Fschema.rdfs.org%2Fall&p=2&lp=4&first=&op=0&gp=2
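A sketch of the suggested tweak as a transformation over (subject, predicate, object) IRI triples. It assumes `http://schema.rdfs.org/all` as the ontology IRI (the one all.ttl declares as `owl:Ontology`) and the POWDER-S namespace for `wdrs:describedby`. `apply_describedby_tweak` is a hypothetical name, not an existing project function.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
WDRS_DESCRIBEDBY = "http://www.w3.org/2007/05/powder-s#describedby"
ONTOLOGY = "http://schema.rdfs.org/all"  # ontology IRI declared in all.ttl


def apply_describedby_tweak(triples, ontology=ONTOLOGY):
    """Turn (term, rdfs:isDefinedBy, page) into describedby + isDefinedBy ontology."""
    out = []
    for s, p, o in triples:
        if p == RDFS_IS_DEFINED_BY:
            out.append((s, WDRS_DESCRIBEDBY, o))           # the page describes the term
            out.append((s, RDFS_IS_DEFINED_BY, ontology))  # the ontology defines it
        else:
            out.append((s, p, o))
    return list(dict.fromkeys(out))  # drop duplicates, keep order
```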

Define URIs in schema.rdfs.org

(copied from message at LOD list)

Publishers behind http://schema.org URIs are unlikely ever to provide an RDF description, so why are those URIs declared as identifiers of RDFS classes in http://schema.rdfs.org/all.rdf? As far as I can see, http://schema.org/Person is the URI of an information resource, not of a class.
So I would rather have expected the schema.org URIs to be mirrored by schema.rdfs.org URIs, the latter being fully dereferenceable, proper RDFS classes that make the semantics of the former explicit, while keeping a reference to the source in a dcterms:source statement.

e.g., http://schema.rdfs.org/Person dcterms:source http://schema.org/Person

etc

Cannot scrape Schema.org

Scraping Schema.org classes and properties into CSV files does not work at the moment.

I got the following stack trace:
$> python scrape_csv.py classes.csv properties.csv
Traceback (most recent call last):
  File "scrape_csv.py", line 12, in <module>
    types = schema_scraper.get_all_types()
  File "/Users/emmanuel/Downloads/schema-org-rdf-master/scrapers/schema_scraper.py", line 20, in get_all_types
    types[id] = get_type_details(base_url + id)
  File "/Users/emmanuel/Downloads/schema-org-rdf-master/scrapers/schema_scraper.py", line 49, in get_type_details
    id = ancestor_links[-1].text_content()
IndexError: list index out of range

Strings with Newlines Need Triple-Quotes

In the 2012/11/30 publication of the all.ttl file, a number of rdfs:comment values contain multi-line strings but are delimited by single quotation marks (for example, schema:BusinessEntityType).

Triple quotes (""") should be used here, as the Turtle specification requires for strings containing raw line breaks.
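A sketch of how a generator could choose the literal form automatically; `turtle_literal` is a hypothetical helper, not the project's serializer. It switches to triple quotes whenever the text contains a line break and escapes backslashes and every double quote, so the body can never close the literal early.

```python
def turtle_literal(text, lang=None):
    """Serialize text as a Turtle string literal, with an optional language tag."""
    body = text.replace("\\", "\\\\").replace('"', '\\"')
    if "\n" in text or "\r" in text:
        quoted = '"""%s"""' % body  # long form: raw line breaks are allowed
    else:
        quoted = '"%s"' % body
    return quoted + ("@" + lang if lang else "")


if __name__ == "__main__":
    print(turtle_literal("Indicates possible values).\n     ", "en"))
```

An equally valid alternative is to keep the single-quoted form and escape line breaks as `\n`; triple quotes simply keep the published file closer to the original text.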

xsd:dateTime rather than xsd:date in some ranges

I know they say that the Date type is "A date value in ISO 8601 date format.", but for things like startDate and endDate, they say things like "The start date and time of the event (in ISO 8601 date format).", and they put a datetime in the example at the bottom of the Event page.
An event listing with only the date, and not the time, is not considered useful.
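If ranges such as startDate accepted both datatypes, a converter would need to pick the right one per value. Below is a minimal sketch using Python's `datetime.fromisoformat` to validate the value. `xsd_datatype_for` is a hypothetical helper, and note that before Python 3.11 `fromisoformat` rejects a trailing `Z`.

```python
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"


def xsd_datatype_for(value):
    """Return xsd:dateTime if an ISO 8601 value has a time part, else xsd:date."""
    if "T" in value:
        datetime.fromisoformat(value)  # raises ValueError if malformed
        return XSD + "dateTime"
    date.fromisoformat(value)          # raises ValueError if malformed
    return XSD + "date"
```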

JSON-LD examples use wrong @context

The JSON-LD examples use a context definition such as the following:

{
   "@context": "http://schema.org/jsonld-profile",
   "@type": "Thing", 
   "@subject": "http://example.org/things#the-thing",
   "description": "The Thing is a fictional character in the Fantastic Four.",
   "image": "http://upload.wikimedia.org/wikipedia/en/a/a3/Thing_v2_1_coverart.jpg", 
   "name": "The Thing",
   "url": "http://en.wikipedia.org/wiki/Thing_(comics)"
}

Not only is there no such context, but the intention is that the context be just http://schema.org/. At some point @danbri may get around to providing this via content negotiation, although many tools have a built-in representation for it, for example http://linter.structured-data.org/.
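The examples could be fixed mechanically. This sketch (with the hypothetical helper `fix_example`) swaps the nonexistent context for `http://schema.org/` as suggested above. It also renames `@subject` to `@id`, on the assumption that the examples should follow the published JSON-LD specification, where `@id` identifies a node and `@subject` is no longer a keyword.

```python
import json

OLD_CONTEXT = "http://schema.org/jsonld-profile"


def fix_example(doc):
    """Return a copy of a JSON-LD example with the context and node id fixed."""
    fixed = dict(doc)
    if fixed.get("@context") == OLD_CONTEXT:
        fixed["@context"] = "http://schema.org/"
    if "@subject" in fixed:
        fixed["@id"] = fixed.pop("@subject")
    return fixed


if __name__ == "__main__":
    example = {
        "@context": OLD_CONTEXT,
        "@type": "Thing",
        "@subject": "http://example.org/things#the-thing",
        "name": "The Thing",
    }
    print(json.dumps(fix_example(example), indent=2))
```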

Village support?

I wasn't able to find support for villages. A village is not a town, right?

There seems to be an error in schema-org-rdf/scrapers/scrape_rdf.py

Hi,
I tried running scrape_rdf.py to extract schema.org terms in RDF format from http://schema.org/docs/full.html (or schema.rdfa).
The resulting output.ttl file seems to contain only the header (see [below]).
I would appreciate any advice,
e.g. whether scrape_rdf.py no longer works with the current schema.org site.

[below][BEGIN]
@prefix schema: <http://schema.org/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct: <http://purl.org/dc/terms/>.

<http://schema.rdfs.org/all> a owl:Ontology;
    dct:title "The schema.org terms in RDFS+OWL"@en;
    dct:description "This is a conversion of the terms defined at schema.org to RDFS and OWL."@en;
    foaf:page <http://schema.rdfs.org/>;
    rdfs:seeAlso <http://schema.org/>;
    rdfs:seeAlso <http://github.com/mhausenblas/schema-org-rdf>;
    dct:hasFormat <http://schema.rdfs.org/all.ttl>;
    dct:hasFormat <http://schema.rdfs.org/all.rdf>;
    dct:hasFormat <http://schema.rdfs.org/all.nt>;
    dct:hasFormat <http://schema.rdfs.org/all.json>;
    dct:hasFormat [
        dct:hasPart <http://schema.rdfs.org/all-classes.csv>;
        dct:hasPart <http://schema.rdfs.org/all-properties.csv>;
    ];
    dct:source <http://schema.org/>;
    dct:license <http://schema.org/docs/terms.html>;
    dct:valid "2015-05-12"^^xsd:date;
    .
[END]

Schema.org docs say that you can use text in place of any other type

From the schema.org documentation:

We also expect that often, where we expect a property value of type Person, Place, Organization or some other subClassOf Thing, we will get a text string.

So, where we say that the range of some property is schema:Person, we should expect to get a simple string value some of the time. Possible answers:

  1. We document proper usage. While some improper usage is expected and consumers should deal with it, we don't have to document such usage.
  2. Well, who said that schema:Person is disjoint from strings? In fact, we could translate the quote above into RDFS like this:

xsd:string rdfs:subClassOf schema:Thing.
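Under option 1, consumers are expected to cope with text where an object was expected. A minimal consumer-side sketch for JSON(-LD)-style values; `as_thing` is a hypothetical helper, and wrapping the string as the item's `name` is one plausible convention, not something schema.org prescribes.

```python
def as_thing(value, expected_type):
    """Coerce plain-text values into minimal nodes of the expected schema.org type."""
    if isinstance(value, list):
        return [as_thing(v, expected_type) for v in value]
    if isinstance(value, str):
        return {"@type": expected_type, "name": value}
    return value  # already structured: leave untouched
```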

Instance data examples needed

Nobody likes to read schemas; everyone prefers to work from concrete examples.

It would be good to have some example instance data, perhaps populated from Freebase or DBpedia.

These should show both Microdata and RDFa1.1 representations, and not be framed as an "RDFa is better than Microdata" thing, but as an example of how things look in both notations.

Consider also using the HTML::HTML5::Microdata::Parser Perl module rather than 'scraping', i.e. emphasise that Microdata can be treated as an RDF notation.

Yandex Testing Tool supports JSON-LD

http://webmaster.yandex.com/microtest.xml

just announced on public-vocabs mailing list: http://lists.w3.org/Archives/Public/public-vocabs/2013Dec/0005.html

"Our main goal is to provide webmasters with simple and fast tool for checking markup on the webpages. We've already supported all the popular syntaxes such as microdata, rdfa-lite and microformats. JSON-LD becomes more and more popular and is used in different products (e.g., Yandex.Islands, GMail Actions, etc). Together with all schema.org partners we understand our responsibility to help webmasters with learning new language."

comment and comment_plain field always empty for datatypes and types

I am using the JSON data provided by
http://schema.rdfs.org/all.json

but for all datatypes and types, the fields comment and comment_plain are empty:
datatypes": {
"Boolean": {
"ancestors": [
"DataType"
],
"comment": "",
"comment_plain": "",
"id": "Boolean",
"instances": [
"False",
"True"
],
"label": "Boolean",
"properties": [],
"specific_properties": [],
"subtypes": [],
"supertypes": [
"DataType"
],
"url": "http://schema.org/Boolean"
},

and

"WearAction": {
"ancestors": [
"Thing",
"Action",
"ConsumeAction",
"UseAction"
],
"comment": "",
"comment_plain": "",

It would be nice if the comments could be filled in. For all properties, the comment fields are fine.

Cheers,

Uli
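To measure the extent of the problem, the empty entries can be listed directly from all.json. This sketch assumes that types live under a top-level "types" key alongside the "datatypes" key shown above; `empty_comment_ids` is a hypothetical helper.

```python
def empty_comment_ids(all_json, sections=("types", "datatypes")):
    """List (section, term_id) pairs whose comment and comment_plain are both empty."""
    missing = []
    for section in sections:
        for term_id, entry in sorted(all_json.get(section, {}).items()):
            if not entry.get("comment") and not entry.get("comment_plain"):
                missing.append((section, term_id))
    return missing
```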

Parsing all.ttl from http://schema.rdfs.org/ with rdf (ruby) generates a few errors

I downloaded all.ttl from http://schema.rdfs.org/ today on 26 May 2014.

I tried parsing it with a recent version of rdf (see versions below).

I got a few errors.

I can look deeper into it, if relevant.

The first error e.g. seems caused by an unexpected newline in the
middle of the string; the source ttl around line 4029 looks like:

...
schema:accessibilityAPI a rdf:Property;
    rdfs:label "Accessibility API"@en;
    rdfs:comment "Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lists possible values).
     "@en;
    rdfs:domain schema:CreativeWork;
...

The full error log is:

[16] pry(main)> uri = RDF::URI.new("schema_org.ttl")
=> #<RDF::URI:0x3fc8a19b1140 URI:schema_org.ttl>
[17] pry(main)> schema_graph = RDF::Graph.load(uri)
ERROR [line: 4029] With input '"Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lis': Invalid token "\"Indicates" (found "\"Indicates"), production = :_predicateObjectList_5
ERROR [line: 4029] With input 'WebSchemas wiki lists possible values).
     "@en;
    rdfs:domain schema:CreativeWork;
    rdfs:rang': Invalid token "WebSchemas" (found "WebSchemas"), production = :collection
ERROR [line: 4462] With input '"The target group associated with a given audience (e.g. veterans, car owners, musicians, etc.)
     ': Invalid token "\"The" (found "\"The"), production = :_predicateObjectList_5
ERROR [line: 4462] With input 'e.g. veterans, car owners, musicians, etc.)
      domain: Audience
      Range: Text
    "@en;
    rd': Invalid token "e.g." (found "e.g."), production = :collection
ERROR [line: 4462] undefined prefix "domain"
ERROR [line: 4462] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found "A"), production = :objectList
ERROR [line: 4462] undefined prefix "Range"
ERROR [line: 4462] With input 'Text
    "@en;
    rdfs:domain schema:Audience;
    rdfs:range xsd:string;
    rdfs:isDefinedBy <http': Invalid token "Text" (found "Text"), production = :_triples_1
ERROR [line: 4462] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
ERROR [line: 4463] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
ERROR [line: 4464] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
=> #<RDF::Graph:0x3fc8a19a0ffc(default)>
[18] pry(main)> schema_graph.count
=> 8717

➜  schema.org  gem list rdf

*** LOCAL GEMS ***

rdf (1.1.3)
rdf-aggregate-repo (1.1.0)
rdf-isomorphic (1.1.0)
rdf-json (1.1.0)
rdf-microdata (1.1.1.1)
rdf-n3 (1.1.0.1)
rdf-rdfa (1.1.3.1)
rdf-rdfxml (1.1.0.1)
rdf-trig (1.1.3.1)
rdf-trix (1.1.0)
rdf-turtle (1.1.3.1)
rdf-xsd (1.1.0)

lightweight json dataset

Lightweight applications consuming JSON might not need all of the data, such as domains, ranges, supertypes, url, etc., especially if the whole dataset (> 300 KB) is loaded client-side. For the Drupal project schemaorg, for example, I wrote a script to extract term ids and comments and regenerate the JSON file (48K). Do you think other JS apps might be interested in a lightweight JSON file, which could perhaps be hosted on schema.rdfs.org?
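A sketch of what such a reduction could look like in Python. This is not the Drupal script mentioned above. It assumes all.json has "types", "properties" and "datatypes" sections whose entries carry a "comment_plain" field, and `lightweight` and `dump_compact` are hypothetical names.

```python
import json


def lightweight(all_json, sections=("types", "properties", "datatypes")):
    """Keep only term ids and plain-text comments, grouped per section."""
    return {
        section: {term_id: entry.get("comment_plain", "")
                  for term_id, entry in sorted(all_json.get(section, {}).items())}
        for section in sections
    }


def dump_compact(data):
    # No indentation and no spaces after separators keeps the file small.
    return json.dumps(data, separators=(",", ":"), ensure_ascii=False)
```

Keeping the per-section grouping avoids any clash between a type and a datatype that share an id.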

xsd:string range excludes language-tagged strings

Where schema.org properties have the expected type “Text”, we express that as xsd:string. This is too narrow, because it excludes language-tagged strings, and surely a text tagged with a language is valid. The alternative, rdfs:Literal, is too broad because it includes many kinds of non-text literals.

We might want to wait until the RDF WG proposes a new datatype or class that includes both tagged and untagged strings, which is on their agenda.

(Originally pointed out by Aidan.)

Holger's comments

I suggest replacing the owl:unionOf ranges and domains with owl:allValuesFrom restrictions. Not only is this more extensible, but it also clarifies which range applies in which class. If you stick with owl:unionOf, please at least add the missing rdf:type owl:Class triple :)

It would also be great to make the single-valued properties owl:FunctionalProperty. I am not sure if this info can be scraped or whether heuristics (e.g. ending with 's' = plural) could help.
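The plural heuristic mentioned above can be sketched in a few lines; `looks_single_valued` and `functional_declarations` are hypothetical names. The heuristic is crude: a singular name such as `address` also ends in "s" and would be missed. Since declaring a property functional lets an OWL reasoner draw strong conclusions when two values appear (identity for resources, inconsistency for distinct literals), missing a functional property is safer than wrongly declaring one.

```python
def looks_single_valued(prop_id):
    """Heuristic from the comment above: a name ending in 's' is treated as plural."""
    return not prop_id.endswith("s")


def functional_declarations(prop_ids):
    """Turtle lines declaring the apparently single-valued properties functional."""
    return ["schema:%s a owl:FunctionalProperty ." % p
            for p in sorted(prop_ids) if looks_single_valued(p)]
```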
