linter's Introduction

Structured Data Linter

Extract and validate embedded RDF markup in HTML and other formats.

DESCRIPTION

The Structured Data Linter digests structured data, in the form of HTML marked up with RDFa, JSON-LD, or Microdata, or other RDF technologies supported in [Linked Data][linkeddata].

The linter is part of the structured-data.org project and runs at linter.structured-data.org.

Output is expressed as HTML+RDFa in a Snippet format.

Running locally

To run locally, do a bundle install to load required dependencies. Then run with foreman or rackup:

foreman start

or

rackup

Schema.org examples

To update the examples from schema.org, run rake schema:examples. Warnings for these examples can be generated into {file:etc/schema-warnings.txt} by running rake schema:warnings; remember to run bundle install first.

Code layout

This is a Sinatra application implemented in Ruby.

assets                -- Assets for web application
config.ru             -- [Rack][] configuration file, to start application
lib
  rdf
    linter
      parser.rb         -- Parse and transform input to RDFa.
      rdfa_template.rb  -- RDFa output templates in [Haml][]
      snippets          -- Snippet templates
      views             -- Templates for view generation in [Erubis][]
      writer.rb         -- Sub-class of [RDFa][] writer for generating snippet output.
    linter.rb         -- Controller defining HTTP endpoints
spec                  -- Tests

Dependencies

AUTHORS

Setup notes

  • public/.htaccess

  • Bundle installed using:

    bundle install --path vendor/bundler

  • Start the server with:

    bundle exec shotgun -p 3000 config.ru

FEEDBACK

Contributing

  • Do your best to adhere to the existing coding conventions and idioms.
  • Don't use hard tabs, and don't leave trailing whitespace on any line.
  • Do document every method you add using YARD annotations. Read the tutorial or just look at the existing code for examples.
  • Don't touch the .gemspec, VERSION or AUTHORS files. If you need to change them, do so on your private branch only.
  • Do feel free to add yourself to the CREDITS file and the corresponding list in the README. Alphabetical order applies.
  • Do note that in order for us to merge any non-trivial changes (as a rule of thumb, additions larger than about 15 lines of code), we need an explicit public domain dedication on record from you, which you will be asked to agree to on the first commit to a repo within the organization.

License

This is free and unencumbered public domain software. For more information, see https://unlicense.org/ or the accompanying {file:UNLICENSE} file.

linter's People

Contributors

dependabot[bot], gkellogg, scor, snyk-bot

linter's Issues

503 service unavailable validating URLs

Hi, we are often getting a 503 service unavailable when fetching validation results. We are not sure if it's something we are doing wrong or if we should just try again later. This happens on and off all day. The responses are taking about 16 seconds to come back. Below are attached an example request and response, although it happens through the web interface too. Any assistance would be really great! Thanks.

GET /?url=http:%2F%2Fwww.theguardian.com%2Fuk HTTP/1.1
Host: linter.structured-data.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json
CSP: active
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cookie: _gat=1; _ga=GA1.2.1146619526.1432724046

response:

HTTP/1.1 503 Service Unavailable
Connection: keep-alive
Server: Cowboy
Date: Tue, 02 Jun 2015 13:38:50 GMT
Content-Length: 484
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache, no-store

<!DOCTYPE html>
    <html>
    <head>
      <meta name="viewport" content="width=device-width, initial-scale=1">
      <style type="text/css">
        html, body, iframe { margin: 0; padding: 0; height: 100%; }
        iframe { display: block; width: 100%; border: none; }
      </style>
    <title>Application Error</title>
    </head>
    <body>
      <iframe src="//s3.amazonaws.com/heroku_pages/error.html">
        <p>Application Error</p>
      </iframe>
    </body>
    </html>

Run serverless SDL on AWS Lambda

Hey Gregg.
As you'll recall, we have tried to visualize some large graphs on your server.
When we ran into problems, your point was:
we should run them on our server because you are not set up, and never intended, to process large graphs.

An example is here:
https://gist.github.com/jaygray0919/00247a76f6f902fd936e8e98a8666d20

Per your suggestion, we've implemented SDL on an AWS server, following your installation instructions. You can see it here:
http://54.147.126.75:5000/
It is running on a spot AWS server.

Our next thought is:
modify your code to run serverless using AWS Lambda.
AWS Lambda does support recent versions of Ruby.
Our serverless goal is to minimize the expense of an always-on spot server and to visualize very large graphs.
For example, we have a fish database, currently in .n3 format, that is 43GB. We might reconfigure a subset for a schema.org + JSON-LD and visualize on SDL.

May we ask you questions about a path to serverless?

  1. Is it realistic to modify SDL to run serverless under AWS Lambda?
  2. What combination of your products and related services would you recommend we implement? For example, is there some combination of gems + services (e.g. sinatra, puma, shotgun, etc.) that we should be using?
  3. Has someone already done this, and has a roadmap that we could follow?

You also may have comments/suggestions about our current AWS AMI implementation, so please share those ideas if we are missing something or have done something wrong.

Thanks for your help here Gregg

/jay gray

"invalid byte sequence in US-ASCII" for UTF-8 encoded Literals

The Reviews RDFa example (http://linter.structured-data.org/?url=http://linter.structured-data.org/examples/google-rs/review.rdfa.html) produces the following error:

ArgumentError: invalid byte sequence in US-ASCII

However, the document is marked as UTF-8 XML using the <?xml?> tag.

Also, it does not detect the meta charset attribute:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8"/>
  <title>L’Amourita Pizza</title>
  <base href="http://example.org/"/>
</head>
<body>
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
  <span property="v:itemreviewed">L’Amourita Pizza</span>
</div>
</body>
</html>

NOTE: This only occurs if a text node is being used as an RDF literal -- e.g. the v:itemreviewed span node above. If you remove that, the data is processed correctly, even if it contains UTF-8 characters elsewhere (e.g. in the title tag above).
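
The symptom above is what Ruby reports when UTF-8 bytes sit in a string tagged as US-ASCII. The following is a minimal sketch of one way to pick the encoding from a meta charset declaration before parsing. The helper name `decode_html` and the fallback to UTF-8 are assumptions for illustration and are not taken from the linter's code.

```ruby
# Sketch only: decode raw HTML bytes using a declared charset, defaulting to UTF-8.
# `decode_html` is a hypothetical helper, not part of the linter's code base.
META_CHARSET = /<meta[^>]+charset=["']?([A-Za-z0-9_\-]+)/i

def decode_html(raw_bytes)
  binary = raw_bytes.b                      # binary copy: always safe to scan
  name = binary[META_CHARSET, 1] || 'UTF-8' # assumed default when nothing is declared
  encoding = begin
    Encoding.find(name)
  rescue ArgumentError
    Encoding::UTF_8                         # unknown charset name: assume UTF-8
  end
  text = raw_bytes.dup.force_encoding(encoding)
  text.valid_encoding? ? text : text.scrub  # replace stray bytes instead of raising
end

raw = "<meta charset=\"utf-8\"/><span>L\xE2\x80\x99Amourita Pizza</span>".b
ascii = raw.dup.force_encoding(Encoding::US_ASCII)
puts ascii.valid_encoding?            # the same bytes are not valid US-ASCII
puts decode_html(raw).valid_encoding? # valid once treated as UTF-8
```

Scanning a binary copy avoids running a regular expression over a string whose declared encoding does not match its bytes.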

XSS when using textarea

Inserting the following markup into the validate by direct input and clicking submit will potentially cause cross site scripting vulnerabilities. If you switch back to the direct input tab you see the heading "Hello World". Replacing it with a script tag will cause untrusted code to be run in some browsers.

<!DOCTYPE html>
<html lang="en">
<body>
           <textarea></textarea><h1>Hello World!</h1>
</body>
</html>

This can be fixed simply by escaping the < and >.
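
As a rough illustration of that fix, the sketch below escapes submitted markup with Ruby's standard `ERB::Util.html_escape` before it is echoed back into the page. The helper name `safe_for_textarea` is hypothetical, and where the linter actually renders the direct input is not shown here.

```ruby
require 'erb'

# Hypothetical helper: escape user-supplied markup before echoing it back
# into the page, for example inside the direct-input <textarea>.
def safe_for_textarea(markup)
  ERB::Util.html_escape(markup)
end

input = '<textarea></textarea><h1>Hello World!</h1>'
puts safe_for_textarea(input)
# prints: &lt;textarea&gt;&lt;/textarea&gt;&lt;h1&gt;Hello World!&lt;/h1&gt;
```

Once escaped, the closing textarea tag in the input can no longer end the form field early.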

Suggestion: Show actually parsed URL

It could be helpful for users to show the actually parsed URL (which may be different from the user input when content negotiation is involved - see e.g. #28) as part of the page.

Improve style of inferred types vs. explicit types

I like the fact that the linter also shows the inferred types. For example if I write:

typeof="schema:NewsArticle"

The linter will show schema:Thing schema:CreativeWork schema:Article along with schema:NewsArticle. But if you have a lot of markup / entities, it can get confusing for the user to know which one of those types were explicitly set in the HTML and which were inferred.

Suggested approach: style the inferred types to be different, either in italic or in grey color.
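
One possible shape for this, sketched under assumptions: the `SUPERCLASS` table is a hand-written excerpt of the schema.org hierarchy and the CSS class names are invented for the example. Neither reflects how the linter stores or renders types.

```ruby
# Hypothetical excerpt of the schema.org class hierarchy (child => parent).
SUPERCLASS = {
  'schema:NewsArticle' => 'schema:Article',
  'schema:Article' => 'schema:CreativeWork',
  'schema:CreativeWork' => 'schema:Thing'
}.freeze

# Returns [type, origin] pairs, with origin :explicit or :inferred.
def labelled_types(explicit)
  inferred = []
  explicit.each do |type|
    parent = SUPERCLASS[type]
    while parent
      inferred << parent unless explicit.include?(parent) || inferred.include?(parent)
      parent = SUPERCLASS[parent]
    end
  end
  explicit.map { |t| [t, :explicit] } + inferred.map { |t| [t, :inferred] }
end

labelled_types(['schema:NewsArticle']).each do |type, origin|
  # The class names "explicit" and "inferred" are placeholders for a stylesheet hook.
  puts %(<span class="#{origin}">#{type}</span>)
end
```

A stylesheet could then render the `inferred` class in italic or grey, as suggested above.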

Do properties also get their super-properties displayed?

missing nav tag?

While validating my website I found the Linter Message: Tag nav invalid

I did some research and it seems that nav is a valid HTML5 tag, so I guess it is missing from the linter!

<form> with action attribute results in white screen of death

As many web pages include form elements with action attributes, the linter is not quite as robust as one would hope.

Test case:

This works when pasted into the linter:

<!DOCTYPE html>
<html>
<body>
<div vocab="http://schema.org/" typeof="Book"></div>
<form>
  Enter your Name <input type="text" name="name">
</form>
</body>
</html>

Add in an "action" attribute for the form, and the linter dies with a white screen of death:

<!DOCTYPE html>
<html>
<body>
<div vocab="http://schema.org/" typeof="Book"></div>
<form action="whatever">
  Enter your Name <input type="text" name="name">
</form>
</body>
</html>

Making Script Input URL?

Could someone please help me script a URL input that only needs the components below?

  • Web page
  • CreativeWork
  • Organization
  • Person

So I would like to have a script that I can use in my own website and fill in the results according to my own idea with font and color.

Who would like to help me with that?

The URL input doesn't check the user inputs

Hi,

When I used your online tool to check the RDFa attributes of my own website, I got the following error:

error IOError: Failed to open blog.skyplabs.net: No such file or directory @ rb_sysopen - blog.skyplabs.net

The problem here is that I didn't specify http://. Consequently, the software tried to find blog.skyplabs.net as a local file. This issue leads to a directory traversal attack, allowing an attacker to disclose information about the remote system.

For example, it is possible to know if a directory exists or not (with ../etc/ssh as input):

error Errno::EISDIR: Is a directory @ io_fread - ../etc/ssh

When used on a file whose format is not recognised by the parser, the error message tends to leak some valuable information (with ../etc/os-release as input):

validation ../etc/os-release: Errors found during processing

validation ../etc/os-release: ERROR [line 1] Lexer error: With input 'NAME="Ubuntu" VERSION="14.04.5 LTS, Trusty Tahr" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 14.04.5': Invalid token "NAME=\"Ubuntu\"": {:production=>:statement, :token=>"NAME=\"Ubuntu\""}:

validation ../etc/os-release: FATAL recovery: statement: ["."]

For example, an attacker could use this vulnerability to reveal the installed and running services on the remote host (with ../etc/mysql/my.cnf as input):

validation ../etc/mysql/my.cnf: Errors found during processing

validation ../etc/mysql/my.cnf: ERROR [line 19] Lexer error: With input 'client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific progr': Invalid token "client]": {:production=>:predicateObjectList, :token=>"client]"}:

validation ../etc/mysql/my.cnf: FATAL recovery: predicateObjectList: ";"

validation ../etc/mysql/my.cnf: FATAL recovery: blankNodePropertyList: "]"

validation ../etc/mysql/my.cnf: FATAL recovery: triples: ["."]

validation ../etc/mysql/my.cnf: FATAL recovery: statement: ["."] 

We know now that the MySQL server is installed on the remote server and listens on port 3306. To check if it is currently running or not (with ../var/run/mysqld/mysqld.sock as input):

error IOError: Failed to open ../var/run/mysqld/mysqld.sock: No such file or directory @ rb_sysopen - ../var/run/mysqld/mysqld.sock 

To fix this issue, the user inputs need to be checked to ensure that they are real URL addresses and not local files.
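
A minimal sketch of such a check, using only Ruby's standard `uri` library. The helper name `remote_url?` is hypothetical, and restricting input to http and https is an assumption about what the linter should accept.

```ruby
require 'uri'

# Hypothetical guard: accept only absolute http(s) URLs that have a host,
# so bare host names and relative paths never reach a file-system open.
def remote_url?(input)
  uri = URI.parse(input.to_s.strip)
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

['http://blog.skyplabs.net', 'blog.skyplabs.net', '../etc/ssh'].each do |candidate|
  puts "#{candidate}: #{remote_url?(candidate)}"
end
```

`URI::HTTPS` is a subclass of `URI::HTTP`, so the single `is_a?` test covers both schemes while rejecting file paths and schemeless input.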

Inline SVG is marked as an invalid tag

Hi, when I try to validate this document, I get some errors:

<!DOCTYPE html>
<html><head><title>hi</title></head>
    <body>
        <svg width="400" height="200">
            <circle cx="150" cy="100" r="50" />
        </svg>
    </body>
</html>
Linter Messages
validation http://example.org/: Tag svg invalid
validation http://example.org/: Tag circle invalid

As far as I can tell it is valid. Am I missing something?

Form gets confused between Direct Input markup and URL tabs

Here is the scenario I was in: I pasted some content in the Direct Input tab, submitted the form and got the results. I then wanted to submit a URL, so I switched to the first tab and pasted my URL, but the results I got were from the pasted content (which I no longer see or care about after switching to the URL tab).

Valid urls marked as invalid

Hi, when I try to validate using: http://linter.structured-data.org/?url=http:%2F%2Fwww.theguardian.com%2Fsociety%2Fhousing
It says that

Linter Messages
property schema:url: Object <http://www.theguardian.com/society/housing> not compatible with rangeIncludes (schema:URL)

However that URL looks valid to me, assuming it's not including the less than and greater than in the url.

I'm not sure exactly which URL it's referring to and pasting in one ld+json block at a time doesn't show any problems.

Any suggestions or help you can give would be appreciated! John

UTC being added to openingHoursSpecification opens and closes

I noticed a small, non-critical behaviour that seems to be counter to the W3 XML schema for dateTime. When entering the below code into the linter:

<div itemscope itemtype="http://schema.org/LocalBusiness">
    <span itemprop="openingHoursSpecification" itemscope itemtype="http://schema.org/OpeningHoursSpecification">
        <link itemprop="dayOfWeek" href="http://purl.org/goodrelations/v1#Monday">
        <link itemprop="dayOfWeek" href="http://purl.org/goodrelations/v1#Tuesday">
        <link itemprop="dayOfWeek" href="http://purl.org/goodrelations/v1#Wednesday">
        M-W <time itemprop="opens" dateTime="08:00:00">8</time>-<time itemprop="closes" dateTime="18:00:00">6</time>
    </span>
    <span itemprop="openingHoursSpecification" itemscope itemtype="http://schema.org/OpeningHoursSpecification">
        <link itemprop="dayOfWeek" href="http://purl.org/goodrelations/v1#Thursday">
        Th <time itemprop="opens" dateTime="06:00:00">6</time>-<time itemprop="closes" dateTime="18:00:00">6</time>
    </span>
    <span itemprop="openingHoursSpecification" itemscope itemtype="http://schema.org/OpeningHoursSpecification">
        <link itemprop="dayOfWeek" href="http://purl.org/goodrelations/v1#Friday">F <time itemprop="opens" dateTime="08:00:00">8</time>-<time itemprop="closes" dateTime="18:00:00">6</time>
    </span>
    <span itemprop="openingHoursSpecification" itemscope itemtype="http://schema.org/OpeningHoursSpecification">
        <link itemprop="dayOfWeek" href="http://purl.org/goodrelations/v1#Saturday">Sa <time itemprop="opens" dateTime="08:00:00">8</time>-<time itemprop="closes" dateTime="12:00:00">12</time>
    </span>
</div>

the opens and closes properties display a spurious UTC after the times in the parsed output. Assuming that schema.org is actually following the W3 spec (yes, a big assumption) the timezone is optional, and if one is not present, it's assumed to be the "local" timezone. A UTC should only be added if there is a Z, -0:00, or +0:00 after the dateTime.
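
The rule described above can be sketched as a small check on the lexical form. The pattern and the helper name `explicit_timezone?` are illustrative assumptions, not the linter's implementation.

```ruby
# Sketch: a time or dateTime lexical value carries a timezone only when it
# ends in "Z" or a numeric offset such as +01:00.
TZ_SUFFIX = /(?:Z|[+-]\d{2}:\d{2})\z/

def explicit_timezone?(lexical)
  TZ_SUFFIX.match?(lexical)
end

%w[08:00:00 18:00:00Z 18:00:00+01:00].each do |value|
  puts "#{value}: #{explicit_timezone?(value)}"
end
```

A renderer following this rule would show a UTC marker only when the check returns true and would otherwise print the value as written.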

Request to update LINT BY DIRECT INPUT: linter.structured-data.org

Google's SDTT still can't handle multi-type entities on my Hill Web Creations site. Therefore, I tested several pages using the http://linter.structured-data.org/ tool, pages that the GSC confirms are rich and correct with structured data markup.

WHEN USING LINT URL

The results returned by the tool are: "Disclaimer: this preview is only shown as a example of what a search engine might display. It is to the discretion of each search engine provider to decide whether your page will be displayed as an enhanced search result or not in their search results pages."

And then:
"No structured data detected."

WHEN USING LINT BY DIRECT INPUT

Then, I see the results that I expected.

The same issue happens for site LinkedIn for Business.

schema:height

Am getting the following error message:

property schema:height: Object "85"^^<http://www.w3.org/2001/XMLSchema#integer> not compatible with rangeIncludes (schema:Distance,schema:QuantitativeValue)

This occurs when I don't quote the height integer.

If I put the integer in quotes it passes linter but then Google SDTT does not properly parse the number as the value for height.

Is there a 'correct' technique that is valid on both SDL and GSDTT?

Linter not parsing non-HTML content-types

Something broke in detecting and parsing non-HTML types. For example, https://w3c.github.io/rdf-tests/turtle/manifest.ttl should lint properly.

Reported on the Structured Data mailing list.

It seems that the linter sends quite a lot of acceptable media types in the Accept HTTP header, but unfortunately, when receiving non-HTML content (e.g. application/n-triples), it fails to provide any results and claims an error occurred. Is there any chance of having that corrected? I believe two solutions are available: remove unnecessary media types and leave only those that are actually supported (quick), or provide parsers for the other media types.

Linter should indicate if a superseded property is used.

For example:

schema:episodes a rdf:Property;
  schema:domainIncludes schema:Season, schema:TVSeason, schema:Series, schema:TVSeries, schema:RadioSeason, schema:RadioSeries;
  rdfs:label "episodes";
  schema:rangeIncludes schema:Episode;
  schema:supersededBy schema:episode .
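
A rough sketch of such a check, assuming the vocabulary's supersededBy statements have already been loaded into a lookup table. The `SUPERSEDED_BY` hash, the helper name, and the message wording are all hypothetical.

```ruby
# Hypothetical vocabulary excerpt: property => the property that supersedes it.
SUPERSEDED_BY = {
  'schema:episodes' => 'schema:episode'
}.freeze

# Returns one warning string per superseded property found in the input.
def superseded_warnings(properties)
  properties.map do |prop|
    replacement = SUPERSEDED_BY[prop]
    "property #{prop}: superseded by #{replacement}" if replacement
  end.compact
end

puts superseded_warnings(['schema:episodes', 'schema:name'])
# prints: property schema:episodes: superseded by schema:episode
```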

Ability to programmatically POST a HTML/JSON file for linting?

Re: "Programatic Access to the Linter" (http://linter.structured-data.org/about/),

The docs say:

To do this, construct an HTTP GET request using the Accept: application/json HTTP header with a url query parameter referencing the page to be processed.
curl -H 'Accept: application/json' http://linter.structured-data.org/?url=http://linter.structured-data.org/

Is there a way to do a POST request with an HTML string to be processed? I have a bunch of .jsonld files locally that I need to try linting, but they aren't online. Having the ability to POST a request would let me wrap the .jsonld files in a <script>...</script> tag and validate them easily using Node.
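
For reference, the documented GET request quoted above can be built in Ruby roughly as follows. This only mirrors the curl example and says nothing about whether a POST endpoint exists. The helper names `linter_request` and `lint` are made up for the sketch.

```ruby
require 'net/http'
require 'uri'

LINTER_HOST = 'linter.structured-data.org'

# Builds the URI and GET request for a page URL, mirroring the curl example.
def linter_request(page_url)
  uri = URI::HTTP.build(host: LINTER_HOST, path: '/',
                        query: URI.encode_www_form(url: page_url))
  [uri, Net::HTTP::Get.new(uri, 'Accept' => 'application/json')]
end

# Sends the request; not called here so the sketch runs without network access.
def lint(page_url)
  uri, request = linter_request(page_url)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

uri, request = linter_request('http://linter.structured-data.org/')
puts uri
puts request['Accept']
```

`URI.encode_www_form` percent-encodes the target URL, which avoids ambiguity when the page URL carries its own query string.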

Event v. EducationEvent

This structure works (2 events):
"performerIn": [ { "@type": "Event", "name": "Session 1", "offers": { "@type": "Offer", "category": "advice", "price": "0", "url": "https://en.wikipedia.org/wiki/Academic_advising" }, "workFeatured": { "@type": "CreativeWork", "name": "Review syllabus" }, "startDate": "2016-01-23", "location": { "@type": "Place", "name": "Review syllabus in ESH-103", "address": "ESH-103" } },{ "@type": "Event", "name": "Session 2", "offers": { "@type": "Offer", "category": "advice", "price": "0", "url": "https://en.wikipedia.org/wiki/Academic_advising" }, "workFeatured": { "@type": "CreativeWork", "name": "Problem set 1" }, "startDate": "2016-02-13", "location": { "@type": "Place", "name": "Problem set 1 in ESH-103", "address": "ESH-103" } }]

However, this structure fails to load (2 events):

"performerIn": [ { "@type": "EducationEvent", "name": "Session 1", "startDate": "2016-01-23", "doorTime": "2016-01-23T09:00:00+07:00", "duration": "PT3H00M00S", "offers": { "@type": "Offer", "category": "advice", "price": "0", "url": "https://en.wikipedia.org/wiki/Academic_advising" }, "workFeatured": { "@type": "CreativeWork", "name": "Review syllabus", "description": "", "educationalUse": ["group work"], "learningResourceType": ["presentation", "discussion"] }, "location": { "@type": "Place", "name": "Review syllabus in ESH-103", "address": "ESH-103" } },{ "@type": "EducationEvent", "name": "Session 2", "startDate": "2016-02-13", "doorTime": "2016-02-13T09:00:00+07:00", "duration": "PT3H00M00S", "offers": { "@type": "Offer", "category": "advice", "price": "0", "url": "https://en.wikipedia.org/wiki/Academic_advising" }, "workFeatured": { "@type": "CreativeWork", "name": "Problem set 1", "description": "", "educationalUse": ["group work", "assignment"], "learningResourceType": ["presentation", "lab results", "discussion"] }, "location": { "@type": "Place", "name": "Problem set 1 in ESH-103", "address": "ESH-103" } }]

The error messages are:

validation http://example.org/: Failed to parse input document: loading document failed: The same key is defined more than once: author.performerIn.2.@type

and

validation http://example.org/: FATAL Failed to parse input document: loading document failed: The same key is defined more than once: author.performerIn.2.@type: Called from /app/vendor/bundle/ruby/2.3.0/bundler/gems/json-ld-6b5af3315dcd/lib/json/ld/reader.rb:61

IMHO, linter is not evaluating EducationEvent in the same (correct) way it evaluates Event

Note: the JSON-LD may not be valid syntax; it's extracted from a longer file and closure may be wrong. However, the EducationEvent syntax is properly parsed by Google SDTT.

/jg

report: implementing SDL on AWS/Lambda

Structured Data Linter on AWS Lambda

See SDL on AWS/Lambda

Challenge 1:

Some Ruby gems build native extensions written in C.
Building native extensions requires compiling the C code into platform- and environment-specific machine code. How can this be done for AWS/Lambda?

Solution 1:

  • We compiled the extensions on the same environment as the AWS/Lambda machine.
  • We used the lambci/lambda:build-ruby2.5 Docker image, which provides the same environment as AWS uses.

Challenge 2:

How to synchronize AWS Ruby environment version with SDL Ruby environment version?

Solution 2:

  • We synchronized our AWS version with SDL Ruby Gemfile.

Challenge 3:

AWS Lambda limits the unzipped deployment package to 250 MB.
See Lambda payload limits
Our deployment package is ≥ 335 MB.
How to fit the SDL deployment package into AWS/Lambda?

Solution 3:

  • We removed several items to conform to AWS/Lambda constraints.

Challenge 4:

AWS/Lambda limits Response/Request payload size
See Lambda payload limits

Solution 4:

  • We used an AWS spot server and modified the SDL code to support a .zip file in the "Linter By Upload" option.

Challenge 5:

How can a GET or POST request be sent to the API Gateway endpoint and run successfully?

Solution 5:

  • We modified API Gateway settings to add a POST method.
  • We modified application.js link to point to a different location.
  • We changed the self link to the API Gateway endpoint link.

Challenge 6

You recently revised SDL. The new size is approximately 400MB.

Solution 6 (pending)

We need your guidance for removing libraries or other compaction techniques to reduce the new version to 250 MB.

Challenge 7

How to set up the API Gateway endpoint so that it can be invoked only from a specific website?

Solution (pending)

  • Working on it.

Follow up

If you have private questions or suggestions please contact me at [email protected]
ankita

Issue with preview and objects with an url property

Hi,

In my page I use the url property several times to link to different pages.

Example:

<div itemscope="itemscope" itemtype="http://schema.org/BusinessEvent">
<a itemprop="url" class="colorBlue linkOnHover" href="http://www.wiktik.com/formation/offres/english-langage-quotidien-1400">'Say it in English': le langage du quotidien</a>
<span itemprop="location" itemscope="itemscope" itemtype="http://schema.org/Place">
<span itemprop="address" itemscope="itemscope" itemtype="http://schema.org/PostalAddress">(<span itemprop="postalCode">34000</span> - <span itemprop="addressLocality">Montpellier</span>)</span></span>
</div>

I would not expect the data of such an object with a url property to be displayed as a potential snippet for the current page.

Concrete URL:

http://linter.structured-data.org/?url=http%3A%2F%2Fwww.wiktik.com%2Fformation%2Foffres%2Faffirmation-gestion-conflits-2978&commit=Submit&content=

Thanks and regards,
Jocelyn Fournier

Issue using creator property with http://www.schema.org/Review

Using the linter to validate my schema.org microdata and noticed that using the creator property instead of author in the Review itemscope gives warning messages. Below is test code to show the warning:

<!DOCTYPE html>
<html lang="en">
    <head>
    </head>
    <body itemscope itemtype="http://schema.org/Corporation" itemid="AllCareCorp">
        <div class="carousel-inner">
            <div class="item active" itemprop="review" itemscope itemtype="http://schema.org/Review">
                <h3 itemprop="reviewBody">"We get everything on time."</h3>
                <h4 itemprop="creator" itemscope itemtype="http://schema.org/Person"><span itemprop="name">John Doe</span>,
                    <em itemprop="worksFor" itemscope itemtype="http://schema.org/Organization"><span itemprop="name">A Company</span></em>
                </h4>
            </div>
            <div class="item" itemprop="review" itemscope itemtype="http://schema.org/Review">
                <h3 itemprop="reviewBody">"Amazing Team."</h3>
                <h4 itemprop="creator" itemscope itemtype="http://schema.org/Person"><span
                        itemprop="name">Jane Doe</span>,
                    <em itemprop="worksFor" itemscope itemtype="http://schema.org/Organization"><span
                            itemprop="name">Another Company</span></em>
                </h4>
            </div>
        </div>
    </body>
</html>

Returns:

Linter Messages
property schema:creator: Object _:g70059505877260(rdfs:Resource,rdf:List) not compatible with rangeIncludes (schema:Organization,schema:Person)

property schema:creator: Object _:g70059505146240(rdfs:Resource,rdf:List) not compatible with rangeIncludes (schema:Organization,schema:Person)

The schema.org docs indicate that creator should be fully valid as a Review is a CreativeWork.

Linter message: "not compatible with rangeIncludes"

Hello,
I am constructing a structured data bibliography with JSON-LD.

When I use the linter to evaluate a list of CreativeWorks that includes itemListElements with IRI identifiers (for example, the DOI of an article: "@id":"http://dx.doi.org/10.3917/psy.031.0093"), I receive the following message for some but not all of the IRIs:

property schema:itemListElement: Object <IRI> not compatible with rangeIncludes (schema:ListItem,schema:Text,schema:Thing)

I am confused by this message because each itemListElement is further identified as a type of CreativeWork (for example, "@type":"Chapter" or "@type":"ScholarlyArticle"). I would have thought that the identifier would apply to the CreativeWork rather than to the higher-level itemListElement container. If need be, I could certainly change the DOIs to simple "url" properties, but that does not address the cases in which I would like to use an "@id" to identify a periodical in which several articles have appeared.

The JSON-LD passes Google's structured data testing tool, and it is considered valid by the tools at http://jsonviewer.stack.hu/ and http://jsonlint.com/.

Are these error messages? Should I be worried about them?

Here's a more complete example, with the DOIs changed from "@id" to "url" but the "@id" for periodical retained:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ItemList",
  "@id": "https://example.com/en/writing.html#publications",
  "itemListElement": [
    {
      "@type": "ScholarlyArticle",
      "url": "http://dx.doi.org/10.3917/psy.031.0093",
      "name": "Entretien avec Frédéric Diart, peintre classique",
      "datePublished": "2014",
      "pageStart": "95",
      "pageEnd": "104",
      "author": [
        {
          "@type": "Person",
          "@id": "http://orcid.org/0000-0002-8518-7999"
        },
        {
          "@type": "Person",
          "name": "Véronique Sidoit"
        }
      ],
      "inLanguage": "fr",
      "isPartOf": {
        "@type": "PublicationIssue",
        "issueNumber": "31",
        "isPartOf": {
          "@type": "Periodical",
          "@id": "https://www.worldcat.org/title/psychanalyse/oclc/464588080"
        }
      }
    },
    {
      "@type": "ScholarlyArticle",
      "name": "La fin du monde",
      "datePublished": "2013",
      "pageStart": "61",
      "pageEnd": "76",
      "url": "http://dx.doi.org/10.3917/psy.028.0059",
      "author": {
        "@type": "Person",
        "@id": "http://orcid.org/0000-0002-8518-7999"
      },
      "inLanguage": "fr",
      "isPartOf": {
        "@type": "PublicationIssue",
        "issueNumber": "28",
        "isPartOf": {
          "@type": "Periodical",
          "@id": "https://www.worldcat.org/title/psychanalyse/oclc/464588080"
        }
      }
    },
    {
      "@type": "Chapter",
      "name": "Caring for Knowledge: Transmission in 'The Figure in the Carpet' and 'Nona Vincent'",
      "datePublished": "2013",
      "pageStart": "70",
      "pageEnd": "78",
      "author": {
        "@type": "Person",
        "@id": "http://orcid.org/0000-0002-8518-7999"
      },
      "inLanguage": "en",
      "isPartOf": {
        "@type": "Book",
        "name": "Henry James and the Poetics of Duplicity",
        "isbn": "9781443844178",
        "@id": "http://www.worldcat.org/isbn/9781443844178",
        "url": "http://www.cambridgescholars.com/henry-james-and-the-poetics-of-duplicity-14",
        "sameAs": "http://www.sudoc.fr/170467511",
        "editor": [
          "Annick Duperray",
          "Adrian Harding",
          "Dennis Tredy"
        ],
        "publisher": {
          "@type": "Organization",
          "name": "Cambridge Scholars Publishing"
        }
      }
    },
    {
      "@type": "Periodical",
      "@id": "https://www.worldcat.org/title/psychanalyse/oclc/464588080",
      "name": "Psychanalyse",
      "url": "http://www.editions-eres.com/collection/156/psychanalyse",
      "sameAs": "http://www.sudoc.fr/085440868",
      "issn": "1874-9062"
    }
  ]
}
</script>

The linter returns the following message for that code:
property schema:itemListElement: Object <https://www.worldcat.org/title/psychanalyse/oclc/464588080> not compatible with rangeIncludes (schema:ListItem,schema:Text,schema:Thing)

For some reason, the "@id" for the "Book" in the "Chapter" item above does not get flagged. It is true that this "@id" is not cited again on the same page, but then again the DOI "@id"s for the "ScholarlyArticle" items did get flagged even though they only appeared once.

Thanks in advance for any clarifications you can provide about this, and thanks also for this excellent tool, which has really helped my attempt to learn to use schema.org.

SDL and a Google AMP document

Here's a site with JSON-LD but composed using Google AMP requirements.
https://ontomatica.io/
SDL initially processes the JSON-LD but then bails out when it sees the <style amp-custom>.
Or maybe something else is going on that confuses SDL.
FYI the JSON-LD is properly parsed by Google Structured Data Testing tool (see the link in the footer).
Anything we should be doing differently to cure this problem?

/jay gray

Issue with parsing of openingHours

Putting this into the linter:

<div itemscope itemtype="http://schema.org/Pharmacy">
    <time itemprop="openingHours" datetime="Mo-Fr 08:30-18:00">M-F 8:30-6</time>; <time itemprop="openingHours" datetime="Sa 09:00-17:00">Sa 9-5</time>
</div>

Gives a warning of:
property schema:openingHours: Object "Mo-Fr 08:30-18:00" not compatible with rangeIncludes (schema:Duration)

property schema:openingHours: Object "Sa 09:00-17:00" not compatible with rangeIncludes (schema:Duration)

This should be valid according to schema.org that allows ranges denoted by a hyphen. GSDTT appears to parse it properly, as does Yandex and foolip.
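The format referred to above can be checked with a small pattern. This is only a rough sketch, assuming the schema.org description of openingHours (two-letter day codes Mo, Tu, We, Th, Fr, Sa, Su, optionally a day range or comma list, followed by an optional 24-hour time range). The constant and method names are made up for illustration and are not part of the linter.

```ruby
# Loose check of the schema.org openingHours text format.
# Assumption: day codes, then an optional "HH:MM-HH:MM" range.
DAY  = /(?:Mo|Tu|We|Th|Fr|Sa|Su)/
DAYS = /#{DAY}(?:-#{DAY}|(?:,#{DAY})*)/          # "Mo-Fr", "Tu,Th" or "Sa"
TIME = /(?:[01]\d|2[0-3]):[0-5]\d/               # 24-hour clock
OPENING_HOURS = /\A#{DAYS}(?: #{TIME}-#{TIME})?\z/

# Hypothetical helper: true when the value looks like an openingHours string.
def opening_hours?(value)
  OPENING_HOURS.match?(value)
end

["Mo-Fr 08:30-18:00", "Sa 09:00-17:00", "M-F 8:30-6"].each do |value|
  puts "#{value.inspect}: #{opening_hours?(value) ? 'looks valid' : 'not in the expected format'}"
end
```

Under these assumptions both datetime values from the Pharmacy example match as plain text, while the human-readable label "M-F 8:30-6" does not.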

Duplicated items when OG meta tags are used.

Hi!
I discovered an issue when OG meta tags are used.

Please look at this snippet and its result in linter:

<!DOCTYPE html>
<html>
<head>
<title>Home &laquo; Website</title>
<meta name="description" content="Website description">
<meta name="keywords" content="Website keywords">
</head>
<body itemscope itemtype="http://schema.org/WebPage">
</body>
</html>

screenshot 10-01-2014 21 36 26

And this one:

<!DOCTYPE html>
<html>
<head>
<title>Home &laquo; Website</title>
<meta name="description" content="Website description">
<meta name="keywords" content="Website keywords">
<meta property="og:title" content="Home">
<meta property="og:site_name" content="Website" />
<meta property="og:description" content="Website description" />
</head>
<body itemscope itemtype="http://schema.org/WebPage">
</body>
</html>

screenshot 10-01-2014 21 36 47

As you can see, these items:

md:item
rdf:type schema:WebPage schema:CreativeWork schema:Thing

are duplicated.

Just wanted to inform you about that.
Cheers and thank you for such a good tool:)

No structured data detected

We have a file that is harvested correctly using the Google Structured Data Testing Tool.
The source file is here: https://afdsi.org/rdf_50/en-US/
Here is the GSDTT link: https://search.google.com/structured-data/testing-tool/u/0/#url=https%3A%2F%2Fafdsi.org%2Frdf_50%2Fen-US%2F

Should we reorganize the structured data so that it can be processed by SDL?
We'd like to embed a link on our pages that passes the page to SDL so folks can better visualize the structure, as is possible on SDL.

/jay gray

JSON-LD array parsing: Only first array object is displayed in proper, embedded place

I am testing the linter with JSON-LD that represents multiple schema.org/branchOf locations, each of which has multiple schema.org/openingHoursSpecification objects. The first hours object displays properly in the hierarchy (under each branch), but subsequent objects show at the top level and appear to be referred to by the internal parser object number. I've seen those numbers as the @id on the JSON-LD Playground in the Flattened view, so I'm guessing it's a JSON-LD->RDFa parsing issue. Data used for the test can be found in this gist.

IOError: Failed to open ... Errors found during processing

Hi, I love your structured data linter, it's been a really good tool for learning about structured data.

I tried creating my own instance of your tool using the source you've uploaded here https://github.com/structured-data/linter/releases/tag/2.3.7 but some sites throw this error on my instance, while they work perfectly fine on http://linter.structured-data.org/

error IOError: Failed to open http://www.modelmayhem.com/: Errors found during processing

ERROR <http://www.modelmayhem.com/>: error parsing attribute name attributes construct error Couldn't find end of Start Tag n.length line 5 xmlParseEntityRef: no name StartTag: invalid element name Couldn't find end of Start Tag o line 5 EntityRef: expecting ';' Couldn't find end of Start Tag u line 5 Entity 'v.prototype' not defined AttValue: " or ' expected Couldn't find end of Start Tag script line 97 Opening and ending tag mismatch: head line 3 and script Opening and ending tag mismatch: html line 2 and head Extra content at the end of the document

MusicRelease is invalid

I am learning how to use the schema plans on my site, and one of the ones I am working on now is a MusicRelease.
However, when I try to validate a sample, it does not validate.
I even copied your JSON code into the Google validator, and it states that it is not valid:
The type MusicRelease is not a valid type.

The validator that I am using is here.
https://www.google.com/webmasters/markup-tester/

Could you please assist me in finding out why?

Thank You
Wayne Barron

figcaption fails validation

The figcaption element is being tagged as invalid: validation https://www.allcarepharmacy.com: Tag figcaption invalid. Example code:

<figcaption itemprop="caption">
    <span itemprop="copyrightHolder" itemscope itemtype="http://schema.org/Organization"> Copyright: <a href="http://www.123rf.com/profile_stylephotographs" itemprop="url">stylephotographs / 123RF Stock Photo</a>
    </span>
</figcaption>

problem with item-number and/or file-size

Greg, would you take a look at this gist:
https://gist.github.com/jaygray0919/4276ce845f53495ff73012faad4cda37

SDL seems to bail out on the 21st script (from the top). It properly handles supersededBy until it reaches references to trailing scripts (below 21).

Have checked this on GSDTT and it's valid - but GSDTT does not generate the hierarchy that is generated by SDL (an important 'customer education' feature that we want to emphasize).

In the past, we've raised issues with SDL and file size. We plan to include SDL links in specific web pages and hope to link to pages with larger data sets than in the test gist.

/jay

thumbNail

I do not understand this error message:
property schema:thumbnailURL: No property definition found

For example:
"@type": "CreativeWorkSeries", "name": "Syllabus", "thumbnailURL": "https://profalbrecht.files.wordpress.com/2012/07/easycapture3.jpg", ...

It is a property; why does it need a property definition?

/jg
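One possible explanation, offered as an assumption rather than a confirmed diagnosis: schema.org property names are case-sensitive, and the vocabulary defines thumbnailUrl rather than thumbnailURL, so a lookup for the latter finds no definition. The sketch below illustrates that kind of lookup. KNOWN_PROPERTIES is a hand-picked, hypothetical subset of property names, and the "did you mean" hint is invented for illustration; it is not something the linter is known to print.

```ruby
# Hypothetical, hand-picked subset of schema.org property names.
KNOWN_PROPERTIES = %w[name url image thumbnailUrl sameAs].freeze

# Known properties that differ from +name+ only by letter case.
def case_variants(name)
  KNOWN_PROPERTIES.select { |known| known != name && known.casecmp?(name) }
end

# Case-sensitive lookup; the hint part of the message is illustrative only.
def check_property(name)
  return "schema:#{name}: ok" if KNOWN_PROPERTIES.include?(name)

  variants = case_variants(name)
  hint = variants.empty? ? "" : " (did you mean #{variants.join(', ')}?)"
  "schema:#{name}: No property definition found#{hint}"
end

puts check_property("thumbnailURL")
puts check_property("thumbnailUrl")
```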

improved accept header

The Accept header being sent from your tool is:

application/n-triples, text/plain;q=0.5, application/n-quads, text/x-nquads, application/ld+json, application/x-ld+json, application/rdf+json, text/html;q=0.5, application/xhtml+xml;q=0.7, image/svg+xml, text/n3, text/rdf+n3, application/rdf+n3, text/turtle, text/rdf+turtle, application/turtle, application/x-turtle, application/rdf+xml, text/csv, text/tab-separated-values, application/csvm+json, application/trig, application/x-trig, application/trix, */*;q=0.1

It seems that text/csv is preferred over text/html?
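This follows from how quality values work in HTTP content negotiation: a media range without a q parameter defaults to q=1, so text/csv (no q) outranks text/html;q=0.5. A minimal sketch that ranks a shortened version of the header is shown below. The method name ranked_media_types is made up for illustration, and a real server may apply further rules, such as specificity of the media range.

```ruby
# Parse an Accept header into [media_type, q] pairs, highest preference first.
# Media ranges without an explicit q parameter default to q=1.0.
def ranked_media_types(accept)
  accept.split(",").map do |entry|
    type, *params = entry.strip.split(";").map(&:strip)
    q_param = params.find { |p| p.start_with?("q=") }
    q = q_param ? q_param.delete_prefix("q=").to_f : 1.0
    [type, q]
  end.sort_by.with_index { |(_type, q), i| [-q, i] } # ties keep header order
end

# Shortened version of the header quoted in the issue.
header = "text/html;q=0.5, application/xhtml+xml;q=0.7, text/turtle, text/csv, */*;q=0.1"

ranked_media_types(header).each do |type, q|
  puts format("%-24s q=%.1f", type, q)
end
```

With this ordering, text/turtle and text/csv come first, followed by application/xhtml+xml, then text/html, then the wildcard.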

more than one property for one subject - display issue

Example code:

http://pastebin.com/vFUurF18

In this markup, the object of both the "brand" and "manufacturer" properties is the same: an Organization. But the Linter displays them differently.
The Rich Snippets tool and http://www.w3.org/2012/pyMicrodata/ display the "organization" object for both "brand" and "manufacturer".

However, Linter displays:
* Organization link (itemid) as object for manufacturer property
* Type and name as objects for brand property.
Live example: http://www.kaahsap.com/ic-mekan-sandalye/21-patara1402-mutfak-sandalyesi.html

strange with link

screenshot from 2015-08-14 00 29 24
screenshot from 2015-08-14 00 28 05

<body itemprop="hasPart" itemscope itemtype="http://schema.org/WebPage">
        <link itemprop="breadcrumb" href="breadcrumb">
...
<div id="breadcrumb" itemscope itemtype="http://schema.org/BreadcrumbList"><meta itemprop="itemListOrder" content="http://schema.org/ItemListOrderUnordered"><ol class="breadcrumb"><li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"><a itemprop="item" title="Перейти к insteria." href="http://dev0.cms.dev.itgalaxy.company" data-async="true" class="home"><span itemprop="name">insteria</span></a><meta itemprop="position" content="1"></li>
<li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"><a itemprop="item" title="Перейти к рубрике Uncategorized" href="/category/uncategorized/" data-async="true" class="taxonomy category"><span itemprop="name">Uncategorized</span></a><meta itemprop="position" content="2"></li>
<li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"><span itemprop="name">Hello world!</span><meta itemprop="position" content="3"></li>
</ol></div>
...

It does not produce a tree structure.
But https://developers.google.com/structured-data/testing-tool/ parses it normally.

Updated UI

Some UI improvements suggested by Jarno Van Driel.
visual-graph-tool-1
visual-graph-tool-2

command line version

It would be really cool if this tool could be used in a CI (continuous integration) setup through a command-line version.
