specref's Introduction

Specref API

Specref is an open-source, community-maintained database of Web standards & related references.

API

The API is simple and supports three operations:

Get a set of bibliographic references

GET https://api.specref.org/bibrefs?refs=FileAPI,rfc2119

parameters:

refs=comma-separated,list,of,reference,IDs
callback=nameOfCallbackFunction

returns: a JSON object indexed by IDs

{
    "FileAPI": {
        "authors": [
            "Arun Ranganathan",
            "Jonas Sicking"
        ],
        "date": "12 September 2013",
        "deliveredBy": [
            {
                "shortname": "webapps",
                "url": "http://www.w3.org/2008/webapps/"
            }
        ],
        "edDraft": "http://dev.w3.org/2006/webapi/FileAPI/",
        "href": "http://www.w3.org/TR/FileAPI/",
        "id": "FileAPI",
        "publisher": "W3C",
        "status": "LCWD",
        "title": "File API"
    },
    "rfc2119": {
        "authors": [
            "S. Bradner"
        ],
        "date": "March 1997",
        "href": "http://www.ietf.org/rfc/rfc2119.txt",
        "id": "rfc2119",
        "publisher": "IETF",
        "status": "Best Current Practice",
        "title": "Key words for use in RFCs to Indicate Requirement Levels"
    }
}
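
As a minimal sketch of calling this operation from a script, the Python below builds the request URL and fetches the result using only the standard library. The helper names (`bibrefs_url`, `fetch_bibrefs`) and the timeout value are illustrative assumptions, not part of Specref.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.specref.org"  # base URL taken from the example above


def bibrefs_url(ref_ids):
    """Build a /bibrefs URL for the given reference IDs (hypothetical helper)."""
    # safe="," keeps the separating commas readable instead of encoding them as %2C
    query = urlencode({"refs": ",".join(ref_ids)}, safe=",")
    return f"{API_BASE}/bibrefs?{query}"


def fetch_bibrefs(ref_ids, timeout=10):
    """Fetch the references and return the decoded JSON object (needs network access)."""
    with urlopen(bibrefs_url(ref_ids), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_bibrefs(["FileAPI", "rfc2119"])` should return an object keyed by ID, shaped like the response shown above.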

Search bibliographic references

GET https://api.specref.org/search-refs?q=coffee

parameters:

q=search%20term
callback=nameOfCallbackFunction

returns: a JSON object indexed by IDs

{
    "rfc2324": {
        "authors": [
            "L. Masinter"
        ],
        "date": "1 April 1998",
        "href": "http://www.ietf.org/rfc/rfc2324.txt",
        "id": "rfc2324",
        "publisher": "IETF",
        "status": "Informational",
        "title": "Hyper Text Coffee Pot Control Protocol (HTCPCP/1.0)"
    },
    "rfc7168": {
        "authors": [
            "I. Nazar"
        ],
        "date": "1 April 2014",
        "href": "http://www.ietf.org/rfc/rfc7168.txt",
        "id": "rfc7168",
        "publisher": "IETF",
        "status": "Informational",
        "title": "The Hyper Text Coffee Pot Control Protocol for Tea Efflux Appliances (HTCPCP-TEA)"
    }
}

Returns the bibliographic references that include the search term in any of their attributes. This is useful for finding specs related to a given area of study, specs by a given editor, etc.
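
A small sketch of preparing a search request and summarizing its result, in Python. The function names are hypothetical; the only facts taken from this document are the endpoint, the `q` parameter, and the shape of the response.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.specref.org"  # base URL taken from the example above


def search_url(term):
    """Build a /search-refs URL, percent-encoding spaces as %20."""
    # quote_via=quote yields %20 for spaces (the default would produce "+")
    query = urlencode({"q": term}, quote_via=quote)
    return f"{API_BASE}/search-refs?{query}"


def titles_by_id(results):
    """Map each reference ID to its title, skipping entries without one."""
    return {
        ref_id: entry["title"]
        for ref_id, entry in results.items()
        if "title" in entry
    }
```

Skipping entries without a `title` is a defensive assumption; whether search results can contain such entries is not stated here.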

Reverse lookup

GET https://api.specref.org/reverse-lookup?urls=http://www.w3.org/TR/2012/WD-FileAPI-20121025/

parameters:

urls=comma-separated,list,of,reference,URLs
callback=nameOfCallbackFunction

returns: a JSON object indexed by URLs

{
    "http://www.w3.org/TR/2012/WD-FileAPI-20121025/": {
        "authors": [
            "Arun Ranganathan",
            "Jonas Sicking"
        ],
        "date": "12 September 2013",
        "deliveredBy": [
            {
                "shortname": "webapps",
                "url": "http://www.w3.org/2008/webapps/"
            }
        ],
        "edDraft": "http://dev.w3.org/2006/webapi/FileAPI/",
        "href": "http://www.w3.org/TR/FileAPI/",
        "id": "FileAPI",
        "publisher": "W3C",
        "status": "LCWD",
        "title": "File API"
    }
}

Note that this returns the canonical version of a spec, not the precise version the URL points to. This is by design.
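
To illustrate, here is a hedged Python sketch that builds a reverse-lookup URL and maps each queried URL to the canonical entry returned for it. The helper names are made up for this example.

```python
from urllib.parse import urlencode

API_BASE = "https://api.specref.org"  # base URL taken from the example above


def reverse_lookup_url(urls):
    """Build a /reverse-lookup URL for one or more spec URLs."""
    # safe=":/," leaves the looked-up URLs readable, as in the example above;
    # other reserved characters such as "&" or "#" are still percent-encoded
    query = urlencode({"urls": ",".join(urls)}, safe=":/,")
    return f"{API_BASE}/reverse-lookup?{query}"


def canonical_refs(response):
    """Map each queried URL to the (id, href) of the entry returned for it."""
    return {url: (entry["id"], entry["href"]) for url, entry in response.items()}
```

With the sample response above, `canonical_refs` maps the dated 2012 Working Draft URL to `("FileAPI", "http://www.w3.org/TR/FileAPI/")`, which makes the canonical-version behavior explicit.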

Aliases

Because of legacy references, case sensitivity issues and taste, many entries have multiple identifiers. Thus an aliasing system was put in place. It isn't that complicated really: an identifier either points directly to the reference object or to another identifier (through the aliasOf property), recursively. All aliases are resolved (there are tests for that) and when you query the API for a reference you always get all the objects necessary to resolve it in the same response. So for example, https://api.specref.org/bibrefs?refs=rfc7230 responds with:

{
    "rfc7230": {
        "authors": [
            "R. Fielding, Ed.",
            "J. Reschke, Ed."
        ],
        "date": "June 2014",
        "href": "https://tools.ietf.org/html/rfc7230",
        "id": "rfc7230",
        "publisher": "IETF",
        "status": "Proposed Standard",
        "title": "Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing"
    }
}

while https://api.specref.org/bibrefs?refs=HTTP11 gives you:

{
    "HTTP11": {
        "aliasOf": "RFC7230",
        "id": "HTTP11"
    },
    "RFC7230": {
        "aliasOf": "rfc7230",
        "id": "RFC7230"
    },
    "rfc7230": {
        "authors": [
            "R. Fielding, Ed.",
            "J. Reschke, Ed."
        ],
        "date": "June 2014",
        "href": "https://tools.ietf.org/html/rfc7230",
        "id": "rfc7230",
        "publisher": "IETF",
        "status": "Proposed Standard",
        "title": "Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing"
    }
}

This lets you get to the data by using a simple while loop over the response. The contract guaranteed by the API is to always let you resolve aliases.
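
That while loop could look like the following Python sketch. The function name is hypothetical, and the cycle check is an extra safeguard that should never trigger if aliases are always resolvable as described.

```python
def resolve_alias(response, ref_id):
    """Follow aliasOf links inside one API response until a full entry is reached."""
    entry = response[ref_id]
    seen = {ref_id}
    while "aliasOf" in entry:
        target = entry["aliasOf"]
        if target in seen:
            # Defensive only: guards against a circular alias chain
            raise ValueError(f"alias cycle detected at {target!r}")
        seen.add(target)
        entry = response[target]
    return entry
```

For the `HTTP11` response above, `resolve_alias(response, "HTTP11")` walks `HTTP11`, then `RFC7230`, and returns the `rfc7230` entry.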

Now whether you decide to display the result as [HTTP11], [rfc7230], [RFC7230], or even [1] is up to you. Of course, it's silly to reference both [HTTP11] and [rfc7230] in the same specification, but that's something for the editors and/or their tools to avoid.

Obsoleted references

Some entries have an obsoletedBy property which contains an array of identifiers. These identifiers reference specifications that replace this one and can be queried separately from the database.

Like aliases, these identifiers are resolved (there are tests for that), but, unlike aliases, they are not returned with the response to the initial query.

Note that these identifiers can themselves point to aliases or have their own obsoletedBy property, recursively.
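
Because replacements are not included in the initial response, a client has to issue follow-up queries. The Python sketch below shows one possible traversal; `fetch` is a hypothetical callable that takes a list of IDs and returns the decoded `bibrefs` response for them, and the function names are illustrative.

```python
def resolve_alias(response, ref_id):
    """Follow aliasOf links inside one response until a full entry is reached."""
    entry = response[ref_id]
    while "aliasOf" in entry:
        entry = response[entry["aliasOf"]]
    return entry


def current_replacements(ref_id, fetch):
    """Return the IDs of the non-obsoleted entries reachable from ref_id.

    If ref_id is not obsoleted, the result is just its own canonical ID.
    """
    current, seen, pending = [], set(), [ref_id]
    while pending:
        next_id = pending.pop()
        entry = resolve_alias(fetch([next_id]), next_id)
        if entry["id"] in seen:
            continue  # already visited, also protects against cycles
        seen.add(entry["id"])
        successors = entry.get("obsoletedBy", [])
        if successors:
            pending.extend(successors)  # keep following the chain
        else:
            current.append(entry["id"])
    return sorted(current)
```

This issues one request per identifier for simplicity; batching several IDs into a single query would reduce round trips.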

CORS

CORS is enabled for all origins. By default the service returns JSON data, which is great but not convenient for browsers that do not support CORS yet. For those, simply adding the callback parameter with the name of the callback function you want will switch the response to JSON-P.

Examples

Some examples should help:

// get references for SVG, REX, and DAHUT
GET https://api.specref.org/bibrefs?refs=SVG,REX,DAHUT

// the same as JSON-P
GET https://api.specref.org/bibrefs?refs=SVG,REX,DAHUT&callback=yourFunctionName

If you need to find a reference ID (for either bibliographic or cross-references) you need to look for it on specref.org.

Updating & adding new references

Commit rights

Specref loosely follows the process described in The Pull Request Hack. Contributors are generally granted commit access to the repo after their first pull request is successfully merged.

Contributors are expected to read up on how to make manual changes and to follow the review policy described below.

Review policy

The review policy has three key principles:

  1. Get non-trivial changes reviewed by someone.
  2. You can merge trivial changes yourself, but allow enough time for others to comment on them before you do.
  3. Never merge a pull request unless Travis is green.

We trust contributors to be a good judge of what is trivial, what isn't, and how long to wait before merging a trivial fix. Generally, the more trivial the fix, the shorter the wait.

Similarly, the more a commit message explains the why of a slightly unexpected fix, the less it requires a review.

For example, for a fix that changes an existing HTTPS URL to an HTTP one:

Bad:

Updating URL.

Good:

There now exists a Persistent URI Registry of EU Institutions and Bodies[1]
which is to be used when referencing such documents.
Unfortunately it doesn't use HTTPS yet.

[1]: http://data.europa.eu/

Hourly auto-updating

There are scripts that pull fresh data from IETF, W3C, WHATWG and WICG, and update their relevant files in the refs directory. These are now run hourly. Their output is tested, committed and deployed without human intervention. Content should now always be up to date.

Manual changes

Generally, manual changes should be limited to the refs/biblio.json file.

If you have commit rights, please don't commit to main directly. Commit to a separate branch (preferably to your fork) and send a pull request.

All changes are automatically tested using Travis and automatically deployed within minutes if all tests pass. You can check that your changes have been properly deployed on www.specref.org. If they haven't, @-mention @tobie in a pull request comment.

You can run the tests locally by installing node.js, project dependencies (by running $ npm install from the root of the repository) and running $ npm test. The test suite is quite large and can take a few minutes to run.

Some rules to observe when editing the refs/biblio.json file:

  • Don't remove entries unless you are 100% certain that no one is using them. Typically that only applies to cases in which you have just added a reference and want to remove it.
  • Don't duplicate entries. Make sure that what you want to add is not in the DB. If it is, add an alias.
  • Please use structured objects instead of raw strings.
  • Always favor HTTPS URLs.
  • The format for structured objects is described in JSON-schema. The schema is used to test new entries, so you had better abide by it. :) (Note I'm still looking for a tool to turn the JSON schema into something more easily consumed by human beings. Let me know if you have an idea, or better yet, send a pull request.)
  • When you want to update an existing reference, if you see that it uses the old string style, please convert it to a structured object. Edit both refs/biblio.json and refs/legacy.json in the same pull request, or you won't pass validation.
  • References in this database are expected to be to the “latest and greatest” version of a given specification. In some cases this may be the draft residing in the editor's repository, or it may be the latest snapshot as published by a Working Group into TR — this choice is left to your appreciation. If you really, really want to have a reference to a dated version, then use the versions property like so:
{
    "REFID": {
        "versions": {
            "YYYYMMDD": {
                "href": "http://..."
            }
        }
    }, //...
}
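
Some of the rules above can be checked mechanically before sending a pull request. The following Python sketch is an illustration only: it is not the project's test suite (which validates against the JSON schema), and the function name and warning messages are made up for this example.

```python
def lint_entry(ref_id, entry):
    """Return a list of warnings for one refs/biblio.json entry."""
    if isinstance(entry, str):
        # Old string-style entries should be converted to structured objects
        return [f"{ref_id}: raw string, convert to a structured object"]
    warnings = []
    if entry.get("href", "").startswith("http://"):
        warnings.append(f"{ref_id}: href is not HTTPS")
    for version, data in entry.get("versions", {}).items():
        # Version keys are expected to follow the YYYYMMDD pattern shown above
        if not (len(version) == 8 and version.isdigit()):
            warnings.append(f"{ref_id}: version key {version!r} is not YYYYMMDD")
        if data.get("href", "").startswith("http://"):
            warnings.append(f"{ref_id}: version {version} href is not HTTPS")
    return warnings
```

The HTTPS checks are warnings rather than errors, since some sources do not offer HTTPS at all, as the commit message example earlier shows.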

specref's Issues

Support xrefs via Shepherd

Ok, so I had a look at the output from Shepherd... unfortunately, the API is not very customizable, leading to massive downloads. For example, the HTML spec is 1MB of JSON... and DOM is 400KB. It's too much to send down to the client (especially phones)... so, what would be great would be to bring in data for known specs into Specref, then allow smarter queries like:

?spec=dom&terms=fire an event,the foo,the bar

And get back just the xrefs needed.

{
    "dom": {
        "fire an event": "#concept-event-fire",
        "the bar": null
    }
}

Probably need a bit more detail, like returning the "type" (so to correctly style)... but you get the idea.

DB has both "uievents" and "ui-events"

The biblio db has both "uievents" and "ui-events", which aren't aliases of each other, but clearly should be. If possible, we should keep the "ui-events" name, but this doesn't really matter, and you should use whatever name is easiest for your tools.

(This is currently messing up the bibliography of the DOM spec.)

Convert legacy HTML references to structured data.

These references are still in need of conversion:

Exposing ED urls

We're now getting ED urls directly from the W3C RDF.

Would like your thoughts on how to best expose them through the Specref API.

I see the following options:

  1. Pass an extra query param to specify what you want, e.g.:

http://specref.jit.su/bibrefs?refs=service-workers&type=ED|TR|LATEST

Which would give you, respectively:

This has the benefit that clients could be smart about it and pick ED when
the spec is in ED, but change to TR for publishing WDs for example.

  2. Automatically generate extra references, e.g.:

http://specref.jit.su/bibrefs?refs=service-workers-ED
http://specref.jit.su/bibrefs?refs=service-workers-TR
http://specref.jit.su/bibrefs?refs=service-workers-LATEST

  3. Simply add an edURL or similar prop to all references which have one.

We could also do a mix of the above.

Thoughts and use cases welcomed.

/cc @domenic, @marcoscaceres, @tabatkins, @darobin

beautifier

If there isn't already, I wonder if there could be some kind of beautifier added to the build process (don't know if travis supports such a thing - or whatever does the integration). But it would be good to not have to bug people about white space and beautification, as I tend to do :)

Document the format of entries

Just glancing over the entries, I can infer a format for each, but I'd greatly enjoy an actual format description, to ensure I don't miss anything.

WebIDL-1 reported as WD

WebIDL-1 was published as a Rec, but specref reports it as WD. Raising this as an issue since there is probably a problem in the synchronization with TR data.

Better cache indicators

Would it be possible to put in expires information into the responses? This would make ReSpec much faster, as I wouldn't need to hit the network as often.

Additionally, I want to use the expires information to know if I should update a local database with references.

Nice to have: if the data for a particular request hasn't been updated, respond with a 304. However, I understand that this is much trickier.

HTML pointing to multipage

At some point, HTML started pointing to "multipage":
https://html.spec.whatwg.org/multipage

This breaks all specs that were using fragment identifiers pointing back to HTML :( It also makes it really annoying, because now one needs to know in which "page" each concept or interface is defined (and makes referencing awkward and unique for this one spec - no other spec relies on multipage!).

Could we have a reference back to, please:
https://html.spec.whatwg.org/

Or, make a reference entry for each page:

[[HTML-Common-infrastructure]]

@annevk

Upgrade links to HTTPS

There are several links in biblio.json which don't use https where available.

Is it OK for me to go in and change links like this to point to https sources where appropriate?

Or do I need to change them in a specific location and regenerate them? Or have I missed the point entirely? :-)

(Was originally speced/bikeshed#966 and w3c/html#832 )

UAX21 and UAX27 need updating — but how?

The UAX21 and UAX27 entries are out of date. Not only are there later incarnations of the same documents, but these documents have also eventually been discontinued and their content merged into another spec (Core Specification of the Unicode Standard). What should we do?

It makes sense to me to:

  1. Update them to the latest standalone version
  2. add an obsoletedBy key to point to the document that replaces them.

However, client software that reads from specref does not currently support obsoletedBy. Should we just fix them? Should we mark it some other way?

--

issue raised from w3c/csswg-drafts#1169 (comment)
bikeshed issue about supporting obsoletedBy: speced/bikeshed#427

use https://www.w3.org/ where possible

@w3c updated their infrastructure to https and future references to w3c specs should use https-URLs.

While some of the changes to https might in the future come from an updated https://www.w3.org/2002/01/tr-automation/tr.rdf some APIs of specref might need to be updated to treat https and http URLs as the same.
E.g. http://www.w3.org/TR/2015/WD-FileAPI-20150421/ redirects to https://www.w3.org/TR/2015/WD-FileAPI-20150421/ already.
So specref should treat them as equal and answer queries about FILE_API using the https URL only.
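
One way to treat the two schemes as equal is to normalize URLs to a single scheme before comparing or indexing them. The Python sketch below only illustrates that idea; the function name is hypothetical and this is not how Specref is known to implement it.

```python
from urllib.parse import urlsplit, urlunsplit


def scheme_insensitive_key(url):
    """Return the URL with http rewritten to https, for use as a lookup key."""
    parts = urlsplit(url)
    if parts.scheme in ("http", "https"):
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

Indexing reverse-lookup data by `scheme_insensitive_key(url)` and normalizing incoming queries the same way would make both forms of a URL resolve to the same entry.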

Add shortname for WGs in deliveredBy

As discussed in:
web-platform-tests/wpt-pr-bot#1 (comment)
the transformation of the WG homepage url to a human readable shortname would probably be a useful addition to specref.

Not sure how to add the suggested structure deliveredBy:{shortname:"foo",url:"http://…"} without breaking backwards compatibility: what is the general approach to managing these changes in specref?

(The real full name of the WGs could also be obtained from http://www.w3.org/2000/04/mem-news/public-groups.rdf as I do in https://github.com/w3c-webmob/mobile-web-app-standards/blob/master/tools/extract-spec-data.py )

Document entry format

The entry format is documented using JSON schema.

It would be great to generate docs from the schema. Unfortunately, either the tools to do so are terrible or there are small issues with Specref's schemas that make them difficult to use.

  • check that Specref's schemas are JSON-schema-compliant.
  • fix them if not.
  • check that the test suites still run properly after fixing the schemas.
  • find a cool doc tool.
  • generate docs and include them either in the readme or on specref.org.
  • document the code generating process or add a script to do so in the scripts folder.

Mismatch between Bert's and Specref's data model?

Bert's biblio.ref file, which forms the basis of Specref from the history in the README, has several more pieces of data than appears possible to document in Specref.

In particular, the %I (Issuer) metadata shows up in several entries that don't otherwise have authors. Several have %O (Other) as well, though this appears to be mostly (entirely?) used to denote things that aren't yet Recs.

In total, when I last checked, all the following Refer codes were used at least once in Bert's file:

        "B": "bookName",
        "C": "city",
        "D": "date",
        "I": "issuer",
        "J": "journal",
        "L": "linkText",
        "N": "numberInVolume",
        "O": "other",
        "P": "pageNumber",
        "R": "reportNumber",
        "S": "status",
        "T": "title",
        "U": "url",
        "V": "volumeNumber",
        "X": "abstract",
        "A": "authors",
        "Q": "foreignAuthors",

Not all of these are necessary (I have no clue why Refer separates A and Q, for example), but several sound reasonable, particularly when referring to specific things outside of the specs we publish.

Improve caching a bit more

Emailed with mnot briefly, he suggested:

"You can drop Expires and just do CC: max-age."

I'm wondering if we could play with extending the cache duration for up to a week? If it's still going to revalidate against the hashed content (via etag), then it should be fairly safe to extend to a week.

Anyone have opinions?
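
From the client side, revalidation with an entity tag could be sketched as below in Python. This assumes the service sends an `ETag` response header and honors `If-None-Match`, which is implied by the discussion above but not confirmed in this document; the function names are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def conditional_request(url, etag=None):
    """Build a request that asks the server to skip the body if nothing changed."""
    headers = {"If-None-Match": etag} if etag else {}
    return Request(url, headers=headers)


def fetch_if_changed(url, etag=None, timeout=10):
    """Return (data, etag), or (None, etag) on 304 Not Modified (needs network)."""
    try:
        with urlopen(conditional_request(url, etag), timeout=timeout) as response:
            return json.load(response), response.headers.get("ETag")
    except HTTPError as error:
        if error.code == 304:
            # urllib reports 304 as an HTTPError; the cached copy is still valid
            return None, etag
        raise
```

A client keeping a local database of references could store the returned tag alongside the data and only rewrite its copy when `data` is not `None`.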

Are the CSS entries auto-updated?

I want to fix all the CSS entries to use the proper shortnaming, and have unversioned shortnames aliasing the latest versioned shortname. Are they currently maintained by an auto-update, such that my changes would be wiped out? If so, anything I can do about it?

If not, expect a decent-sized PR in the near future.

Is it ok for an entry to have only an edDraft, not an href?

We have a few CSSWG EDs that haven't been published yet. Are they allowed in the database? If so, what structure should they have? I'm imagining something like:

"css-color-4": {
  "authors": [
    "Tab Atkins Jr",
    "Chris Lilley"
  ],
  "title": "CSS Color Module Level 4",
  "status": "ED",
  "publisher": "W3C",
  "versions": {},
  "deliveredBy": [
    "http://www.w3.org/Style/CSS/members"
  ],
  "rawDate": "2015-04-16",
  "edDraft": "http://dev.w3.org/csswg/css-color-4"
},

Specs can have multiple deliveredBy

As far as I can tell, the database currently only exposes specs with a single Working Group owner (deliveredBy), while there are quite a few specs that have more than one.

I guess the import-from-rdf script needs to be fixed; I'm not sure whether consumers of the data are expecting that deliveredBy can be an array (not just a string).
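
A consumer can protect itself against the different shapes by normalizing the value before use. The Python sketch below assumes three possible shapes, all of which appear somewhere in this document: a single URL string, a list of URL strings, and a list of objects with `shortname` and `url`. The function name is hypothetical.

```python
def delivered_by_urls(entry):
    """Return the Working Group URLs of an entry as a list, whatever the input shape."""
    value = entry.get("deliveredBy", [])
    if isinstance(value, str):
        value = [value]  # a single owner given as a bare string
    # Objects carry the URL under "url"; plain strings are the URL itself
    return [item["url"] if isinstance(item, dict) else item for item in value]
```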

SVG should point to SVG1.1, not SVG1.0

Currently, the entry for "SVG" points to the SVG 1.0 spec, with "SVG10" as an alias for that.

The content of the "SVG" entry should be moved to "SVG10", and "SVG" should be listed as an alias for "SVG11".

SVG 1.0 should not be referenced by any modern spec; SVG 1.1 is much better defined. SVG 2 will be better defined still.

If you agree to this, I can make the changes.

Please specify license information

https://github.com/tobie/specref states specref is 'An open-source, community-maintained database'.
However, I am unable to find license information, so it is hard to reuse specref code and/or data.

https://help.github.com/articles/open-source-licensing/#what-happens-if-i-dont-choose-a-license

Generally speaking, the absence of a license means that the default copyright laws apply.

I would like to use this great data, please specify license information. Or please let me know if there is license information.
