
rdf's Introduction

RDF.rb: Linked Data for Ruby

This is a pure-Ruby library for working with Resource Description Framework (RDF) data.


Table of contents

  1. Features
  2. Differences between RDF 1.0 and RDF 1.1
  3. Differences between RDF 1.1 and RDF 1.2
  4. Tutorials
  5. Command Line
  6. Examples
  7. Reader/Writer convenience methods
  8. RDF 1.2
  9. Documentation
  10. Dependencies
  11. Installation
  12. Download
  13. Resources
  14. Mailing List
  15. Authors
  16. Contributors
  17. Contributing
  18. License

Features

  • 100% pure Ruby with minimal dependencies and no bloat.
  • Fully compatible with RDF 1.1 specifications.
  • Provisional support for RDF 1.2 specifications.
  • 100% free and unencumbered public domain software.
  • Provides a clean, well-designed RDF object model and related APIs.
  • Supports parsing and serializing N-Triples and N-Quads out of the box, with more serialization format support available through add-on extensions.
  • Includes in-memory graph and repository implementations, with more storage adapter support available through add-on extensions.
  • Implements basic graph pattern (BGP) query evaluation.
  • Plays nice with others: entirely contained in the RDF module, and does not modify any of Ruby's core classes or standard library.
  • Based entirely on Ruby's autoloading, meaning that you can generally make use of any one part of the library without needing to load up the rest.
  • Compatible with Ruby >= 3.0, Rubinius and JRuby 9.0+.
    • Note: changes in how Ruby 3+ maps hashes to keyword arguments may require that arguments be passed more explicitly, especially when the first argument is a Hash and there are optional keyword arguments. In this case, the Hash argument may need to be explicitly wrapped in {}, and the optional keyword arguments may need to be specified using **{} if there are no keyword arguments.
  • Performs auto-detection of input to select appropriate Reader class if one cannot be determined from file characteristics.
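The keyword-argument note in the list above can be illustrated without loading RDF.rb. The sketch below uses a hypothetical build_query method (not part of the library) with the shape described there: an optional positional Hash followed by optional keyword arguments. The behaviour shown assumes Ruby 3.0 or later.

```ruby
# Hypothetical method, for illustration only: an optional positional Hash
# followed by optional keyword arguments.
def build_query(patterns = {}, **options)
  { patterns: patterns, options: options }
end

# A bare "key: value" list is treated as keyword arguments.
as_keywords = build_query(name: :n)

# Wrapping the Hash in {} makes it the positional argument.
as_hash = build_query({ name: :n })

# Adding **{} states explicitly that no keyword arguments are passed.
explicit = build_query({ name: :n }, **{})

p as_keywords
p as_hash
p explicit
```

Only the first call routes the pair into the keyword arguments; the other two leave them empty.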

HTTP requests

RDF.rb uses Net::HTTP for retrieving HTTP and HTTPS resources. If the RestClient gem is included, that will be used instead to retrieve remote resources. Clients may also consider using RestClient Components to enable client-side caching of HTTP results using Rack::Cache or other Rack middleware.

See {RDF::Util::File} for configuring other mechanisms for retrieving resources.

Term caching and configuration

RDF.rb uses a weak-reference cache for storing internalized versions of URIs and Nodes. This is particularly useful for Nodes as two nodes are equivalent only if they're the same node.

By default, each cache can grow to an unlimited size, but this can be configured using {RDF.config}, for general limits, along with URI- or Node-specific limits.

For example, to limit the size of the URI intern cache only:

RDF.config.uri_cache_size = 10_000

The default for creating new caches without a specific initialization size can be set using:

RDF.config.cache_size = 100_000

Differences between RDF 1.0 and RDF 1.1

This version of RDF.rb is fully compatible with RDF 1.1, but it creates some marginal incompatibilities with RDF 1.0, as implemented in versions prior to the 1.1 release of RDF.rb:

  • Introduces {RDF::IRI} as a synonym for {RDF::URI}; either {RDF::IRI} or {RDF::URI} can be used interchangeably. Versions of RDF.rb prior to the 1.1 release were already compatible with IRIs. Internationalized Resource Identifiers (see [RFC3987][]) are a super-set of URIs (see [RFC3986][]) which allow for characters other than standard US-ASCII.
  • {RDF::URI} no longer uses the Addressable gem. As URIs typically don't need to be parsed, this provides a substantial performance improvement when enumerating or querying graphs and repositories.
  • {RDF::List} no longer emits an rdf:List type. However, it will now recognize any subjects that are {RDF::Node} instances as being list elements, as long as they have both rdf:first and rdf:rest predicates.
  • {RDF::Graph}: adding a graph_name to a graph may only be done when the underlying storage model supports graph_names (the default {RDF::Repository} does). The notion of graph_name in RDF.rb is treated equivalently to Named Graphs within an RDF Dataset, and graphs on their own are not named.
  • {RDF::Graph}, {RDF::Statement} and {RDF::List} now include {RDF::Value}, and not {RDF::Resource}. This makes it clear that being an {RDF::Graph} does not mean that it may be used within an {RDF::Statement}; for this, see {RDF::Term}.
  • {RDF::Statement} now is stricter about checking that all elements are valid when validating.
  • {RDF::NTriples::Writer} and {RDF::NQuads::Writer} now validate output by default, only allowing valid statements to be emitted. This may be disabled by setting the :validate option to false.
  • {RDF::Dataset} is introduced as a class alias of {RDF::Repository}. This allows closer alignment to the RDF concept of Dataset.
  • The graph_name of a graph within a Dataset or Repository may be either an {RDF::IRI} or {RDF::Node}. Implementations of repositories may restrict this to being only {RDF::IRI}.
  • There are substantial and somewhat incompatible changes to {RDF::Literal}. In RDF 1.1, all literals are typed, including plain literals and language-tagged literals. Internally, plain literals are given the xsd:string datatype and language-tagged literals are given the rdf:langString datatype. Creating a plain literal, without a datatype or language, will automatically provide the xsd:string datatype; similarly for language-tagged literals. Note that most serialization formats will remove this datatype. Code which depends on a literal having the xsd:string datatype being different from a plain literal (formally, without a datatype) may break. However, note that #has_datatype? will continue to return false for plain or language-tagged literals.
  • {RDF::Query#execute} now accepts a block and returns {RDF::Query::Solutions}. This allows enumerable.query(query) to behave like query.execute(enumerable) and either return an enumerable or yield each solution.
  • {RDF::Queryable#query} now returns {RDF::Query::Solutions} instead of an Enumerator if its argument is an {RDF::Query}.
  • {RDF::Util::File.open_file} now performs redirects and manages base_uri based on W3C recommendations:
    • base_uri is set to the original URI if a status 303 is provided, otherwise any other redirect will set base_uri to the redirected location.
    • base_uri is set to the content of the Location header if status is success.
  • Additionally, {RDF::Util::File.open_file} sets the result encoding from charset if provided, defaulting to UTF-8. Other access methods include last_modified and content_type.
  • {RDF::StrictVocabulary} added with an easy way to keep vocabulary definitions up to date based on their OWL or RDFS definitions. Most vocabularies are now StrictVocabularies meaning that an attempt to resolve a particular term in that vocabulary will error if the term is not defined in the vocabulary.
  • New vocabulary definitions have been added for ICal, Media Annotations (MA), Facebook OpenGraph (OG), PROV, SKOS-XL (SKOSXL), Data Vocabulary (V), VCard, VOID, Powder-S (WDRS), and XHV.

Notably, {RDF::Queryable#query} and {RDF::Query#execute} are now completely symmetric; this allows an implementation of {RDF::Queryable} to optimize queries using implementation-specific logic, allowing for substantial performance improvements when executing BGP queries.

Differences between RDF 1.1 and RDF 1.2

  • {RDF::Literal} has an optional direction property for directional language-tagged strings.
  • Removes support for legacy text/plain (as an alias for application/n-triples) and text/x-nquads (as an alias for application/n-quads).

Tutorials

Command Line

When installed, RDF.rb includes an rdf shell script which acts as a wrapper to perform a number of different operations on RDF files using available readers and writers.

  • count: Parse an RDF input and count the number of statements.
  • predicates: Returns unique predicates from parsed input.
  • objects: Returns unique objects from parsed input.
  • serialize: Parse an RDF input and re-serialize it to N-Triples or another available format using the --output-format option.
  • subjects: Returns unique subjects from parsed input.

The serialize command can also be used to serialize as a vocabulary.

Different RDF gems will augment the rdf script with more capabilities, which may require specifying the appropriate --input-format option to reveal them.

Examples

require 'rdf'
include RDF

Writing RDF data using the N-Triples format

require 'rdf/ntriples'
graph = RDF::Graph.new << [:hello, RDF::RDFS.label, "Hello, world!"]
graph.dump(:ntriples)

or

RDF::Writer.open("hello.nt") { |writer| writer << graph }

Reading RDF data in the N-Triples format

require 'rdf/ntriples'
graph = RDF::Graph.load("https://ruby-rdf.github.io/rdf/etc/doap.nt")

or

RDF::Reader.open("https://ruby-rdf.github.io/rdf/etc/doap.nt") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

Reading RDF data in other formats

{RDF::Reader.open} and {RDF::Repository.load} use a number of mechanisms to determine the appropriate reader to use when loading a file. The specific format can be forced with an option such as format: :ntriples, where the format symbol is determined by the available readers. Both also use the MIME type or file extension, where available.

require 'rdf/nquads'

graph = RDF::Graph.load("https://ruby-rdf.github.io/rdf/etc/doap.nq", format: :nquads)

A specific sub-type of Reader can also be invoked directly:

require 'rdf/nquads'

RDF::NQuads::Reader.open("https://ruby-rdf.github.io/rdf/etc/doap.nq") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

Reader/Writer implementations may override {RDF::Format.detect}, which takes a small sample of input and returns a boolean indicating whether it matches that specific format. If a format cannot be detected from the filename or other options, or if more than one format is identified, {RDF::Format.for} will query each loaded format by invoking its detect method, and the first successful match will be used to read the input.
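The detection fallback described above can be sketched with toy classes. Everything here is an assumption made for illustration: ToyTurtle, ToyNTriples, FORMATS and detect_format are hypothetical names, and the regular expressions are deliberately crude. RDF.rb's own formats are {RDF::Format} subclasses with their own detection logic.

```ruby
# Toy stand-ins for format classes. Each answers detect(sample) with a boolean.
class ToyTurtle
  def self.detect(sample)
    sample.match?(/^\s*@prefix\s/)
  end
end

class ToyNTriples
  def self.detect(sample)
    # An IRI subject, an IRI predicate, something else, then a closing dot.
    sample.match?(/^\s*<[^>]*>\s+<[^>]*>\s+\S.*\.\s*$/)
  end
end

# Hypothetical registry, in the order the formats were "loaded".
FORMATS = [ToyTurtle, ToyNTriples].freeze

# Ask each format in turn and use the first one that claims the sample.
def detect_format(sample)
  FORMATS.find { |format| format.detect(sample) }
end

p detect_format('<http://example/s> <http://example/p> "o" .')
p detect_format('@prefix ex: <http://example/> .')
p detect_format('no RDF here')
```

Because the first successful match wins, the order of the registry matters whenever two formats would both accept the same sample.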

Writing RDF data using other formats

{RDF::Writer.open}, {RDF::Enumerable#dump}, {RDF::Writer.dump} take similar options to {RDF::Reader.open} to determine the appropriate writer to use.

require 'linkeddata'

RDF::Writer.open("hello.nq", format: :nquads) do |writer|
  writer << RDF::Repository.new do |repo|
    repo << RDF::Statement.new(:hello, RDF::RDFS.label, "Hello, world!", graph_name: RDF::URI("http://example/graph_name"))
  end
end

A specific sub-type of Writer can also be invoked directly:

require 'rdf/nquads'

repo = RDF::Repository.new << RDF::Statement.new(:hello, RDF::RDFS.label, "Hello, world!", graph_name: RDF::URI("http://example/graph_name"))
File.open("hello.nq", "w") {|f| f << repo.dump(:nquads)}

Reader/Writer convenience methods

{RDF::Enumerable} implements to_{format} for each available instance of {RDF::Writer}. For example, if rdf/turtle is loaded, this allows the following:

graph = RDF::Graph.new << [:hello, RDF::RDFS.label, "Hello, world!"]
graph.to_ttl

Similarly, {RDF::Mutable} implements from_{format} for each available instance of {RDF::Reader}. For example:

graph = RDF::Graph.new
graph.from_ttl("[ a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Resource>]")

Note that no prefixes are loaded automatically, however they can be provided as arguments:

graph.from_ttl("[ a rdf:Resource]", prefixes: {rdf: RDF.to_uri})

Querying RDF data using basic graph patterns (BGPs)

require 'rdf/ntriples'

graph = RDF::Graph.load("https://ruby-rdf.github.io/rdf/etc/doap.nt")
query = RDF::Query.new({
  person: {
    RDF.type  => FOAF.Person,
    FOAF.name => :name,
    FOAF.mbox => :email,
  }
}, **{})

query.execute(graph) do |solution|
  puts "name=#{solution.name} email=#{solution.email}"
end

The same query may also be run from the graph:

graph.query(query) do |solution|
  puts "name=#{solution.name} email=#{solution.email}"
end

In general, querying through the queryable instance allows a specific implementation of queryable to perform query optimizations specific to the datastore on which it is based.

A separate SPARQL gem builds on basic BGP support to provide full support for SPARQL 1.1 queries.

Using pre-defined RDF vocabularies

DC.title      #=> RDF::URI("http://purl.org/dc/terms/title")
FOAF.knows    #=> RDF::URI("http://xmlns.com/foaf/0.1/knows")
RDF.type      #=> RDF::URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDFS.seeAlso  #=> RDF::URI("http://www.w3.org/2000/01/rdf-schema#seeAlso")
RSS.title     #=> RDF::URI("http://purl.org/rss/1.0/title")
OWL.sameAs    #=> RDF::URI("http://www.w3.org/2002/07/owl#sameAs")
XSD.dateTime  #=> RDF::URI("http://www.w3.org/2001/XMLSchema#dateTime")

Using ad-hoc RDF vocabularies

foaf = RDF::Vocabulary.new("http://xmlns.com/foaf/0.1/")
foaf.knows    #=> RDF::URI("http://xmlns.com/foaf/0.1/knows")
foaf[:name]   #=> RDF::URI("http://xmlns.com/foaf/0.1/name")
foaf['mbox']  #=> RDF::URI("http://xmlns.com/foaf/0.1/mbox")

RDF-star CG

RDF.rb includes provisional support for RDF-star with an N-Triples/N-Quads syntax for quoted triples in the subject or object position.

Support for RDF-star quoted triples is now deprecated, use RDF 1.2 triple terms instead.

RDF 1.2

RDF.rb includes provisional support for RDF 1.2 with an N-Triples/N-Quads syntax for triple terms in the object position. RDF.rb includes provisional support for RDF 1.2 directional language-tagged strings, which are literals of type rdf:dirLangString having both a language and direction.

Internally, an RDF::Statement is treated as another resource, along with RDF::URI and RDF::Node, which allows an RDF::Statement to have a #subject or #object which is also an RDF::Statement.

Note: This feature is subject to change or elimination as the standards process progresses.

Serializing a Graph containing quoted triples

require 'rdf/ntriples'
statement = RDF::Statement(RDF::URI('bob'), RDF::Vocab::FOAF.age, RDF::Literal(23))
graph = RDF::Graph.new << [statement, RDF::URI("ex:certainty"), RDF::Literal(0.9)]
graph.dump(:ntriples, validate: false)
# => '<<<bob> <http://xmlns.com/foaf/0.1/age> "23"^^<http://www.w3.org/2001/XMLSchema#integer>>> <ex:certainty> "0.9"^^<http://www.w3.org/2001/XMLSchema#double> .'

Reading a Graph containing quoted triples

By default, the N-Triples reader will reject a document containing a quoted triple as a subject resource.

nt = '<<<bob> <http://xmlns.com/foaf/0.1/age> "23"^^<http://www.w3.org/2001/XMLSchema#integer>>> <ex:certainty> "0.9"^^<http://www.w3.org/2001/XMLSchema#double> .'
graph = RDF::Graph.new do |graph|
  RDF::NTriples::Reader.new(nt) {|reader| graph << reader}
end
# => RDF::ReaderError

Documentation

https://ruby-rdf.github.io/rdf

RDF Object Model

  • {RDF::Value}
    • {RDF::Term}
      • {RDF::Literal}
        • {RDF::Literal::Boolean}
        • {RDF::Literal::Date}
        • {RDF::Literal::DateTime}
        • {RDF::Literal::Decimal}
        • {RDF::Literal::Double}
        • {RDF::Literal::Integer}
        • {RDF::Literal::Time}
        • RDF::XSD (extension)
      • {RDF::Resource}
        • {RDF::Node}
        • {RDF::URI}
    • {RDF::List}
    • {RDF::Graph}
    • {RDF::Statement}

RDF Serialization

  • {RDF::Format}
  • {RDF::Reader}
  • {RDF::Writer}

RDF Serialization Formats

The following is a partial list of RDF formats implemented either natively, or through the inclusion of other gems:

The meta-gem LinkedData includes many of these gems.

RDF Datatypes

RDF.rb only implements core datatypes from the RDF Datatype Map. Most other XSD and RDF datatype implementations can be found in the following:

  • {RDF::XSD}

Graph Isomorphism

Two graphs may be compared with each other to determine if they are isomorphic. As BNodes within two different graphs are not equal, graphs may not be directly compared. The RDF::Isomorphic gem may be used to determine if they make the same statements, aside from BNode identity (i.e., they each entail the other).

  • RDF::Isomorphic

RDF Storage

RDF Querying

  • {RDF::Query}
    • {RDF::Query::HashPatternNormalizer}
    • {RDF::Query::Pattern}
    • {RDF::Query::Solution}
    • {RDF::Query::Solutions}
    • {RDF::Query::Variable}
  • SPARQL (extension)

RDF Vocabularies

  • {RDF} - Resource Description Framework (RDF)
  • {RDF::OWL} - Web Ontology Language (OWL)
  • {RDF::RDFS} - RDF Schema (RDFS)
  • {RDF::RDFV} - RDF Vocabulary (RDFV)
  • {RDF::XSD} - XML Schema (XSD)

Change Log

See Release Notes on GitHub

Dependencies

Installation

The recommended installation method is via RubyGems. To install the latest official release of RDF.rb, do:

% [sudo] gem install rdf             # Ruby 3+

Download

To get a local working copy of the development repository, do:

% git clone https://github.com/ruby-rdf/rdf.git

Alternatively, download the latest development version as a tarball as follows:

% wget https://github.com/ruby-rdf/rdf/tarball/master

Resources

Mailing List

Authors

Contributors

Contributing

This repository uses Git Flow to manage development and release activity. All submissions must be on a feature branch based on the develop branch to ease staging and integration.

  • Do your best to adhere to the existing coding conventions and idioms.
  • Don't use hard tabs, and don't leave trailing whitespace on any line. Before committing, run git diff --check to make sure of this.
  • Do document every method you add using YARD annotations. Read the tutorial or just look at the existing code for examples.
  • Don't touch the .gemspec or VERSION files. If you need to change them, do so on your private branch only.
  • Do feel free to add yourself to the CREDITS file and the corresponding list in the README. Alphabetical order applies.
  • Don't touch the AUTHORS file. If your contributions are significant enough, be assured we will eventually add you in there.
  • Do note that in order for us to merge any non-trivial changes (as a rule of thumb, additions larger than about 15 lines of code), we need an explicit public domain dedication on record from you, which you will be asked to agree to on the first commit to a repo within the organization. Note that the agreement applies to all repos in the Ruby RDF organization.

License

This is free and unencumbered public domain software. For more information, see https://unlicense.org/ or the accompanying {file:UNLICENSE} file.

rdf's People

Contributors

abrisse, artob, bhuga, brixen, cbeer, cjcolvar, conorsheehan1, cpence, danny, devwout, doriantaylor, dwbutler, fumi, gkellogg, janschill, jcoyne, jfieber, jgeiger, jperville, kna, l00mi, mistydemeo, mmn80, nyarly, petervandenabeele, pezra, pius, tomjnixon, ujifgc, ursm


rdf's Issues

RDF literal escaping/unescaping

Consider using the String#rdf_escape and String#rdf_unescape monkey patches. They properly deal with going from UTF-8 to escaped ASCII and back, somewhat based on JSON utf8_to_json.

# coding: utf-8
require 'iconv'

class String
  #private
  # "Borrowed" from JSON utf8_to_json
  RDF_MAP = {
    "\x0" => '\u0000',
    "\x1" => '\u0001',
    "\x2" => '\u0002',
    "\x3" => '\u0003',
    "\x4" => '\u0004',
    "\x5" => '\u0005',
    "\x6" => '\u0006',
    "\x7" => '\u0007',
    "\b"  =>  '\b',
    "\t"  =>  '\t',
    "\n"  =>  '\n',
    "\xb" => '\u000B',
    "\f"  =>  '\f',
    "\r"  =>  '\r',
    "\xe" => '\u000E',
    "\xf" => '\u000F',
    "\x10" => '\u0010',
    "\x11" => '\u0011',
    "\x12" => '\u0012',
    "\x13" => '\u0013',
    "\x14" => '\u0014',
    "\x15" => '\u0015',
    "\x16" => '\u0016',
    "\x17" => '\u0017',
    "\x18" => '\u0018',
    "\x19" => '\u0019',
    "\x1a" => '\u001A',
    "\x1b" => '\u001B',
    "\x1c" => '\u001C',
    "\x1d" => '\u001D',
    "\x1e" => '\u001E',
    "\x1f" => '\u001F',
    '"'   =>  '\"',
    '\\'  =>  '\\\\',
    '/'   =>  '/',
  } # :nodoc:

  if defined?(::Encoding)
    # Funky way to define constant, but if parsed in 1.8 it generates an 'invalid regular expression' error otherwise
    eval %(ESCAPE_RE = %r([\u{80}-\u{10ffff}]))
  else
    ESCAPE_RE = %r(
                    [\xc2-\xdf][\x80-\xbf]    |
                    [\xe0-\xef][\x80-\xbf]{2} |
                    [\xf0-\xf4][\x80-\xbf]{3}
                  )nx
  end

  # Convert a UTF8 encoded Ruby string _string_ to an escaped string, encoded with
  # UTF16 big endian characters as \U????, and return it.
  #
  # \\:: Backslash
  # \':: Single quote
  # \":: Double quote
  # \n:: ASCII Linefeed
  # \r:: ASCII Carriage Return
  # \t:: ASCII Horizontal Tab
  # \uhhhh:: character in BMP with Unicode value U+hhhh
  # \U00hhhhhh:: character in plane 1-16 with Unicode value U+hhhhhh
  def rdf_escape
    string = self + '' # XXX workaround: avoid buffer sharing
    string.gsub!(/["\\\/\x0-\x1f]/) { RDF_MAP[$&] }
    if defined?(::Encoding)
      string.force_encoding(Encoding::UTF_8)
      string.gsub!(ESCAPE_RE) { |c|
                      s = c.dump.sub(/\"\\u\{(.+)\}\"/, '\1').upcase
                      (s.length <= 4 ? "\\u0000"[0,6-s.length] : "\\U00000000"[0,10-s.length]) + s
                    }
      string.force_encoding(Encoding::ASCII_8BIT)
    else
      string.gsub!(ESCAPE_RE) { |c|
                      s = Iconv.new('utf-16be', 'utf-8').iconv(c).unpack('H*').first.upcase
                      "\\u" + s
                    }
    end
    string
  end

  # Unescape characters in strings.
  RDF_UNESCAPE_MAP = Hash.new { |h, k| h[k] = k.chr }
  RDF_UNESCAPE_MAP.update({
    ?"  => '"',
    ?\\ => '\\',
    ?/  => '/',
    ?b  => "\b",
    ?f  => "\f",
    ?n  => "\n",
    ?r  => "\r",
    ?t  => "\t",
    ?u  => nil, 
  })

  if defined?(::Encoding)
    UNESCAPE_RE = %r(
      (?:\\[\\bfnrt"/])   # Escaped control characters, " and /
      |(?:\\U00\h{6})     # 6 byte escaped Unicode
      |(?:\\u\h{4})       # 4 byte escaped Unicode
    )x
  else
    UNESCAPE_RE = %r((?:\\[\\bfnrt"/]|(?:\\u(?:[A-Fa-f\d]{4}))+|\\[\x20-\xff]))n
  end

  # Reverse operation of escape
  # From JSON parser
  def rdf_unescape
    return '' if self.empty?
    string = self.gsub(UNESCAPE_RE) do |c|
      case c[1,1]
      when 'U'
        raise RdfException, "Long Unicode escapes not supported in Ruby 1.8" unless defined?(::Encoding)
        eval(c.sub(/\\U00(\h+)/, '"\u{\1}"'))
      when 'u'
        bytes = [c[2, 2].to_i(16), c[4, 2].to_i(16)]
        Iconv.new('utf-8', 'utf-16').iconv(bytes.pack("C*"))
      else
        RDF_UNESCAPE_MAP[c[1]]
      end
    end
    string.force_encoding(Encoding::UTF_8) if defined?(::Encoding)
    string
  rescue Iconv::Failure => e
    raise RdfException, "Caught #{e.class}: #{e}"
  end
end
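On Ruby 1.9 and later the Iconv branches above are no longer needed, and Iconv itself was removed from the standard library in Ruby 2.0. As a minimal sketch, assuming the goal is the escaping table documented in the comments above, the forward direction can be written with the standard library alone. The names SHORT_ESCAPES and escape_literal are hypothetical, and this is not RDF.rb's current implementation.

```ruby
# Characters with a short escape form. Everything else outside printable
# ASCII is written as a numeric \uXXXX or \UXXXXXXXX escape.
SHORT_ESCAPES = {
  "\b" => '\b', "\t" => '\t', "\n" => '\n', "\f" => '\f', "\r" => '\r',
  '"'  => '\"', "\\" => '\\\\'
}.freeze

def escape_literal(string)
  string.encode(Encoding::UTF_8).each_char.map do |char|
    code = char.ord
    if SHORT_ESCAPES.key?(char)
      SHORT_ESCAPES[char]
    elsif code.between?(0x20, 0x7E)
      char                        # printable ASCII passes through unchanged
    elsif code <= 0xFFFF
      format('\u%04X', code)      # character in the Basic Multilingual Plane
    else
      format('\U%08X', code)      # character in planes 1-16
    end
  end.join
end

puts escape_literal("caf\u00E9")     # prints caf\u00E9
puts escape_literal("\u{1F600}")     # prints \U0001F600
```

Unlike the monkey patch, this version leaves String untouched and does not escape the forward slash, which the original map also leaves unchanged.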

Interned URIs should be marked as frozen

Since interned RDF::URI instances are global to a Ruby process, being shared across different threads and varying use cases, they should be immutable in more than just principle.

The way to ensure this is for RDF::URI.intern to call #freeze whenever it constructs a new URI instance, which will then cause Ruby to throw a RuntimeError: can't modify frozen object exception if somebody inadvertently tries to modify a returned URI object.
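A minimal sketch of the proposal, using a hypothetical InternedURI class rather than RDF::URI itself. On Ruby 2.5 and later the exception raised is FrozenError, which is a subclass of RuntimeError.

```ruby
# Hypothetical stand-in for an interned URI class; not RDF.rb's implementation.
class InternedURI
  @cache = {}

  class << self
    # Return the shared, frozen instance for this string, creating it once.
    def intern(string)
      @cache[string] ||= new(string).freeze
    end
  end

  attr_reader :value

  def initialize(string)
    @value = string.dup.freeze   # freeze the string too; #freeze is shallow
  end

  def value=(string)
    @value = string.dup.freeze   # raises FrozenError on a frozen instance
  end
end

first  = InternedURI.intern("http://example.org/a")
second = InternedURI.intern("http://example.org/a")
puts first.equal?(second)   # true: both names refer to one shared object

error = begin
  first.value = "http://example.org/b"
  nil
rescue FrozenError => e
  e
end
puts error.class
```

Freezing the wrapped string as well matters, because Object#freeze does not reach into the objects an instance refers to.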

Time to XSD.time mapping is ambiguous

Ruby's Time class can represent either a datetime or just a time by itself. Currently, however, RDF.rb treats Time instances as if they always straightforwardly mapped to the XSD.time datatype. This is clearly wrong, as the following demonstrates:

>> RDF::Literal.new(Time.parse("2010-12-31T12:34:56Z"))
=> #<RDF::Literal::Time:0x80f9f378("12:34:56Z"^^<http://www.w3.org/2001/XMLSchema#time>)>

We need additional logic in RDF::Literal.new to ensure we correctly map Time instances to the XSD.dateTime datatype when the object in question contains a date component as well.
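The two lexical forms involved can be shown with the standard library alone. This is a sketch under stated assumptions: time_literal and its date_component flag are hypothetical, and how RDF::Literal.new should decide between the two cases is exactly the open question of this issue.

```ruby
require 'time'

XSD_NS = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical helper: return [lexical form, datatype IRI] for a Time,
# letting the caller say whether the date component is meaningful.
def time_literal(time, date_component: true)
  utc = time.getutc   # getutc returns a new Time; #utc would modify the receiver
  if date_component
    [utc.xmlschema, XSD_NS + "dateTime"]
  else
    [utc.strftime("%H:%M:%SZ"), XSD_NS + "time"]
  end
end

moment = Time.utc(2010, 12, 31, 12, 34, 56)
p time_literal(moment)
p time_literal(moment, date_component: false)
```

With the default, the date survives as 2010-12-31T12:34:56Z. The current behaviour described above corresponds to the second call, which keeps only 12:34:56Z.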

HTTP proxy support

open-uri has a :proxy option - we currently can't use rdf.rb for a client as their internal network uses a proxy to get out (yes, they're consuming their own data...).

RDF::Literal, RDF::Graph do not support #anonymous?

gkellogg noticed that RDF::Literal does not support #anonymous? or #unlabeled?, which are currently defined only on RDF::URI and RDF::Node.

I implemented #anonymous? on RDF::Literal and RDF::Graph and sent a pull request. Not sure you'll agree with the semantics for Graph but I think it's what we want.

Reader/Writer#prefix value should not be a URI

The current implementation of Reader/Writer#prefix takes an optional uri to associate with the prefix. In fact, this may not be a URI at all. The only requirement is that when the prefix value is attached to a suffix, the result is a URI. Consider these rules from RDF/XML, used for creating prefix mappings required for defining predicate relationships:

An XML namespace-qualified name (QName) has restrictions on the legal characters such that not all property URIs can be expressed
as these names. It is recommended that implementors of RDF serializers, in order to break a URI into a namespace name and a local
name, split it after the last XML non-NCName character, ensuring that the first character of the name is a Letter or '_'. If the
URI ends in a non-NCName character then throw a "this graph cannot be serialized in RDF/XML" exception or error.

One of the RDFa tests verifies that, without prefix mappings, dc:title will be treated as a URI, not a CURIE. It is, in fact, a valid URI. Following the process outlined above, you come up with a prefix mapping of "dc:", which, when applied to the suffix "title", re-generates the original URI "dc:title".

The change needed to #prefix would be to not cast the uri parameter as an RDF::URI, but just intern it as a string:

def prefix(name, uri = nil)
  name = name.to_s.empty? ? nil : (name.respond_to?(:to_sym) ? name.to_sym : name.to_s.to_sym)
  uri.nil? ? prefixes[name] : prefixes[name] = (uri.respond_to?(:to_sym) ? uri.to_sym : uri.to_s.to_sym)
end
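The splitting rule quoted from RDF/XML can be sketched as follows. The names LOCAL_NAME and split_uri are hypothetical, and the character class is an ASCII-only approximation of an XML NCName; real NCNames also permit many non-ASCII characters.

```ruby
# ASCII-only approximation of an NCName at the end of the string:
# a letter or '_' followed by letters, digits, '_', '.' or '-'.
LOCAL_NAME = /[A-Za-z_][A-Za-z0-9_.\-]*\z/

# Split a URI into [namespace, local name], or raise if no usable
# local name can be found at the end of the URI.
def split_uri(uri)
  match = LOCAL_NAME.match(uri)
  raise ArgumentError, "cannot split #{uri.inspect} for RDF/XML" unless match
  [match.pre_match, match[0]]
end

p split_uri("dc:title")
p split_uri("http://xmlns.com/foaf/0.1/name")
```

For dc:title the namespace part is "dc:", which is not an absolute URI on its own. That is the case the issue argues #prefix should accept.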

Make it easier to enumerate serialisers

Hello,

Please can you make it easier to enumerate the available serialisers. It is currently quite difficult to get the name, extensions and mime-type for each of the serialisers.

<link rel="alternate" type="application/rdf+xml" href="http://dbpedia.org/data/Oxford.rdf" title="Structured Descriptor Document (RDF/XML format)" />
<link rel="alternate" type="text/rdf+n3" href="http://dbpedia.org/data/Oxford.n3" title="Structured Descriptor Document (N3/Turtle format)" />
<link rel="alternate" type="application/json+rdf" href="http://dbpedia.org/data/Oxford.jrdf" title="Structured Descriptor Document (RDF/JSON format)" />
<link rel="alternate" type="application/json" href="http://dbpedia.org/data/Oxford.json" title="Structured Descriptor Document (RDF/JSON format)" />

It would be great to be able to do this:
>> f = RDF::Format.for(:ntriples)
=> RDF::NTriples::Format
>> f.name
=> "N-Triples"
>> f.content_types.first
=> "text/plain"
>> f.file_extensions.first
=> "nt"

nick.

Non-linear performance curve in graph traversal

The attached code runs the same test three times, each time with a larger source file. The test consists of: create a new graph, load the source document into the graph, identify a list of concept resources, and query for the rdfs:label of each concept resource. The time taken for the last step grows out of proportion with the size of the input document.

Here's the output I get on my machine:

ian@rowan-15 $ ruby rdf_misc_tests.rb 
Loaded suite rdf_misc_tests
Started
Initializing with account-code.ttl
 ... parsing complete in 1.1s producing 4711 triples
 ... got code list root, now indexing 
 ... got 587 concepts to index in 0.1s
 ... collected names in 17.3s.
4241.37 triples/sec parsing, 5579.79 resources/sec query, collected 34.00 names/sec
.Initializing with programme-object-group-code.ttl
 ... parsing complete in 3.1s producing 15895 triples
 ... got code list root, now indexing 
 ... got 1985 concepts to index in 0.4s
 ... collected names in 207.1s.
5086.36 triples/sec parsing, 5476.99 resources/sec query, collected 9.59 names/sec
.Initializing with programme-object-code.ttl
 ... parsing complete in 16.7s producing 38855 triples
 ... got code list root, now indexing 
 ... got 4855 concepts to index in 0.9s
 ... collected names in 1286.2s.
2333.01 triples/sec parsing, 5188.34 resources/sec query, collected 3.77 names/sec
.

Finished in 1533.469101951 seconds.

3 tests, 0 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed

Note before running the test that the last step takes over 20 minutes. For reference, I'm using Ruby 1.9.1 on a four-core 64 bit linux machine with 8Gb of memory. Ruby version says:

ian@rowan-15 $ ruby -v
ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux]
~/workspace/coins/ruby/bugrep

I'm using the following version of RDF.rb:

ian@rowan-15 $ gem list --local | grep rdf
rdf (0.2.1)
rdf-raptor (0.4.0)
rdf_context (0.5.6)

Ah. Just realised that I can't attach a file to this issue report (unless I'm missing something on github). Code is here: http://iandickinson.me.uk/download/rdf-ruby-perftest.tar
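A quick calculation on the figures reported above helps characterise the growth. The sketch divides the time spent collecting names by the number of concepts, and then by the number of triples in the graph. The numbers are copied from the test output; the reading that each label lookup scans the whole graph is an inference from them, not something the report states.

```ruby
# [triples in graph, concepts looked up, seconds spent collecting names]
runs = [
  [4711,  587,  17.3],
  [15895, 1985, 207.1],
  [38855, 4855, 1286.2]
]

# Cost of a single lookup, normalised by graph size, in microseconds.
costs = runs.map do |triples, concepts, seconds|
  per_lookup = seconds / concepts
  per_lookup / triples * 1_000_000
end

runs.zip(costs).each do |(triples, concepts, seconds), cost|
  puts format("%6d triples: %.4f s per lookup, %.2f microseconds per lookup per triple",
              triples, seconds / concepts, cost)
end
```

The normalised cost stays between roughly 6 and 7 microseconds across all three runs, so the time per lookup grows about linearly with graph size and the whole collection step grows about quadratically.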

RDF::Literal#canonicalize should downcase the language tag

Literal#language is currently transformed into a constant. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-plain-literal indicates that a plain literal may have a language tag as defined in RFC-3066, normalized to lower case. This includes tags with a primary-subtag and a subtag, such as "en-us". Converting with options[:language].to_sym disallows this, because :en-us is not a Ruby symbol.

Also, note that normalization should force the language value to lower-case.
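A minimal sketch of the requested normalization, with a hypothetical normalize_language helper. The bare literal :en-us does not parse as a single symbol, but a quoted symbol literal or String#to_sym can hold a hyphenated tag.

```ruby
# Hypothetical helper: lower-case the tag and return it as a symbol.
def normalize_language(tag)
  tag.to_s.downcase.to_sym
end

p normalize_language("en-US")   # prints :"en-us"
p normalize_language(:EN)       # prints :en
```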

Addressable ~> 2.1.2 does not allow 2.2.0

Addressable::URI 2.2.0 adds some important fixes to URI format checking. If another gem includes Addressable 2.2.0, RDF will fail when loading with the following:

RubyGem version error: addressable(2.2.0 not ~> 2.1.2) (Gem::LoadError)
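The failure can be reproduced with RubyGems' own requirement classes. The relaxed constraint shown is only an assumption about what a fix might look like, not the constraint the maintainers chose.

```ruby
require 'rubygems'

installed = Gem::Version.new("2.2.0")

strict  = Gem::Requirement.new("~> 2.1.2")   # means >= 2.1.2 and < 2.2
relaxed = Gem::Requirement.new("~> 2.1")     # means >= 2.1 and < 3.0

puts strict.satisfied_by?(installed)    # false
puts relaxed.satisfied_by?(installed)   # true
```

The pessimistic operator only lets the last listed segment increase, so a three-segment constraint pins the minor version.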

Literal subclasses must ensure datatype is a URI

Consider the following:

RDF::Literal.new("10", :datatype => "http://www.w3.org/2001/XMLSchema#integer").datatype.inspect

Note that this is a string, and not a URI. This is because Literal.new does a case comparison by first typecasting the datatype to a URI, but not using that type-casted value in the instantiation of a subclass.

RDF::Writer#insert_graph error since RDF.rb 0.3.2

Hello,

Since I updated to rdf-0.3.2 when I run:
require 'rdf'
src = %{
<http://rdf.rubyforge.org/RDF/Writer.html#insert_graph> <http://www.w3.org/1999/02/22-rdf-syntax-ns#label> "Writer#insert_graph test" .
}

reader = RDF::Reader.for(:ntriples).new(src)
graph = RDF::Graph.new << reader

RDF::Writer.open("insert_graph.nt") do |writer|
    writer.insert_graph graph
end

It raises:

insert_graph.rb:11: protected method `insert_graph' called for #<RDF::NTriples::Writer:0x1020d2548> (NoMethodError)
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:186:in `call'
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:186:in `initialize'
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:155:in `new'
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:155:in `open'
  from /Library/Ruby/Gems/1.8/gems/rdf-0.3.2/lib/rdf/writer.rb:154:in `open'
  from insert_graph.rb:10

If I use the method #write_graph instead, it works as expected, but the source code (lib/rdf/writer.rb:284) says:

# @deprecated replace by `RDF::Writable#insert_graph`

Am I missing something?

Thanks!

Vocabulary.new does not allow vocabulary to be enumerated

Create a new ad-hoc vocabulary such as the following:

foo = RDF::Vocabulary.new("http://foo.com#")

Running Vocabulary.each(&:to_s) should return the newly created vocabulary. This is necessary if you want to be able to use it for URI#qname, for example. Note that if you name the anonymous class, such as

RDF::FOO = Class.new(Vocabulary.new("http://foo.com#"))

it will be enumerated. Perhaps there should be either a #name= method or some other way to assign the ad-hoc vocabulary a name. Borrowing from ActiveSupport#constantize:

"RDF::FOO".constantize = Class.new(Vocabulary.new("http://foo.com#"))

XMLLiteral canonicalization

XMLLiterals need to be treated differently than other literals. In particular, it is necessary for XML and RDF readers to add namespace definitions to XMLLiterals. Also, equivalence tests look for two semantically equivalent XMLLiterals that are textually different to be equivalent; this is best handled by canonicalizing XMLLiterals.

Requirements are defined more specifically for RDFa [1], but should apply to all readers. Many tests look for equivalence of XMLLiterals that are defined somewhat differently, so the real thing to do is to perform an exclusive canonicalization [2]. See also in RDF Concepts [3].

In rdf-rdfxml this is handled incompletely by transferring namespaces and performing a partial rewrite of the XML. See Literal.xmlliteral in rdf-rdfxml. A more complete solution would involve using the c14n module from libXML2, which is not usable directly through the standard Ruby bindings (it is implemented at [4]).

RdfContext deals with this by performing a partial transformation with namespace transfer and minimal rewriting, and by putting the burden on the literal comparison (which could be done in rdf-isomorphic): each XML Literal is turned into a hash using ActiveSupport::XmlMini.parse and the hashes are compared.

[1] http://www.w3.org/TR/rdfa-core/#s_xml_literals
[2] http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/
[3] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral
[4] http://rubygems.org/gems/coupa-libxml-ruby

RDF::Literal equality for non-canonical literals intended?

This is the current behavior for non-canonical literals in HEAD:

irb(main):024:0* x = RDF::Literal.new("001", :datatype => RDF::XSD.integer)
=> #<RDF::Literal::Integer:0xb97094("001"^^<http://www.w3.org/2001/XMLSchema#integer>)>
irb(main):025:0> y = x.canonicalize
=> #<RDF::Literal::Integer:0xb96356("1"^^<http://www.w3.org/2001/XMLSchema#integer>)>
irb(main):026:0> y == x
=> true
irb(main):027:0> y.eql? x
=> true

Is this intended? I realized while doing the canonicalize option for rdf-isomorphic that this is the behavior, but this would mean it's not needed.

RDF::Mutable does not open URIs

RDF::Mutable does not open URIs via load:

RDF::Repository.load('http://datagraph.org/jhacker/foaf.nt')
Errno::ENOENT: No such file or directory - http://datagraph.org/jhacker/foaf.nt
    from /opt/local/lib/ruby/gems/1.8/gems/rdf-0.1.1/lib/rdf/reader.rb:107:in `initialize'
    ...

RDF::NTriples::Writer#format_uri should escape value

Just as literals must be escaped to be represented as valid RDF strings, URIs must also be escaped.

Consider making the following change:

def format_uri(uri, options = {})
  "<%s>" % escaped(uri_for(uri))
end

Here are specs I've used:

describe "utf-8 escaped" do
  {
    %(http://a/D%C3%BCrst)                => %(<http://a/D%C3%BCrst>),
    %(http://a/D\u00FCrst)                => %(<http://a/D\\u00FCrst>),
    %(http://b/Dürst)                     => %(<http://b/D\\u00FCrst>),
    %(http://a/\u{15678}another)          => %(<http://a/\\U00015678another>),
  }.each_pair do |uri, dump|
    it "should dump #{uri} as #{dump}" do
      RDF::URI.new(uri).to_ntriples.should == dump
    end
  end
end
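
A minimal sketch of the escaping these specs expect. The helper escape_non_ascii is hypothetical and stands in for whatever escaped does inside the writer. It only rewrites characters outside the ASCII range and does not handle control characters or other characters that are illegal inside angle brackets.

```ruby
# Hypothetical helper, not the RDF.rb implementation: rewrite every
# non-ASCII character as \uXXXX (BMP) or \UXXXXXXXX (beyond the BMP).
def escape_non_ascii(string)
  string.each_char.map do |char|
    codepoint = char.ord
    if codepoint < 0x80
      char                               # plain ASCII is left untouched
    elsif codepoint <= 0xFFFF
      format('\\u%04X', codepoint)       # four hex digits
    else
      format('\\U%08X', codepoint)       # eight hex digits
    end
  end.join
end

puts "<%s>" % escape_non_ascii("http://b/D\u00FCrst")
```

Percent-encoded input such as http://a/D%C3%BCrst is already pure ASCII, so it passes through unchanged.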

running a sparql query on a sesame based rdf store

Hi,

I am trying to run a basic sparql query on a sesame based rdf store.

I can connect to an RDF store on sesame and print out all of the results, but that's about it. Having some challenges with the documentation for doing more advanced stuff (and pretty new to ruby, but not coding)

Here is my simple query:

#SELECT ?title
#WHERE
#{
#  <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title .
#} 

So far I have the following:

puts "Trying a different method for test"
  urlTest = "http://localhost:8080/openrdf-sesame/repositories/test" 

the above works, but when I append the below phrase to the above I get nothing.

not sure if you're supposed to do it this way anyways

?query=SELECT+%3Ftitle+WHERE+{+http://example.org/book/book1+http://purl.org/dc/elements/1.1/title+%3Ftitle+.+}"

repositoryTest = RDF::Sesame::Repository.new(urlTest)
repositoryTest.each {|x| puts x} #(&block)
puts "run a query:"

not sure if this is the right way to set up a query

queryTest = RDF::Query.new( urlTest ) 
puts "New query instantiated"
query.select(:title)
puts "Title selected from query"
query.each {|x| puts x} #(&block)
puts "Query results printed out"

Thanks in advance,

Bryan

Implement RDF::List support

Support for the other RDF collection types can wait until someone actually needs them, but RDF::List is pretty crucial. Dealing with rdf:List structures in the form of blank nodes is just painful.

We laid the groundwork for collection support earlier in ensuring that we always first check that an object responds to #each_statement before we check for #each, which becomes important with containers that return non-statements from #each. Let's build from there.

XSD.string is a curious special case

The recent round of RDF::Literal updates left XSD.string in a strange place. Strings are an implicit default type. Thus, currently, RDF::Literal handles language directly, which shouldn't be the case, as it's only defined on strings.

I'd like to factor out Strings into their own RDF::Literal::String class, and further, to return for the Ruby version of the literal not an instance of String but of a subclass thereof, which contains language data. This will make round-tripping easier and let me cleanly solve Spira issue 15 at http://github.com/datagraph/spira/issues/#issue/15.

If I do this, will you merge it, or is there a reason that Strings are the way they are?

Consider graph validation option for RDF.rb's in-memory repository

I just tracked down an issue in Spira that could have been found earlier if we had a repository that performed validation before writing things down; a predicate was being saved as a string. It would be useful for testing if we had a version of RDF::Repository that performed input validation.

So I am thinking something like this:

RDF::Validating::Repository.new

or

RDF::Repository.new(:validate => true)

Whereupon:

RDF::Repository << RDF::Statement.new(RDF::DC.title, "a string", "another string")
#=> RDF::TypeError: Statement predicate must respond to #to_uri

If I implemented either of these, is that something you'd want to have available in core?
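
A rough sketch of the check being proposed, written against stand-in classes rather than RDF.rb itself. SketchValidatingStore and SketchURI are hypothetical names, and the store below is a plain array, not an RDF::Repository.

```ruby
# Hypothetical stand-in for a URI term: anything that responds to #to_uri.
SketchURI = Struct.new(:value) do
  def to_uri
    self
  end
end

# Hypothetical store that validates the predicate before accepting a triple.
class SketchValidatingStore
  def initialize
    @statements = []
  end

  # Accepts a [subject, predicate, object] array.
  def <<(statement)
    _subject, predicate, _object = statement
    unless predicate.respond_to?(:to_uri)
      raise TypeError, "Statement predicate must respond to #to_uri"
    end
    @statements << statement
    self
  end

  def count
    @statements.size
  end
end

store = SketchValidatingStore.new
store << [SketchURI.new("http://example.org/s"), SketchURI.new("http://purl.org/dc/terms/title"), "a string"]
puts store.count
```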

N-Triples output escaped incorrectly on Ruby 1.9

The following works in 1.8 but not 1.9 (forgive the invalid ntriples as input):

require 'rdf'
s = RDF::NTriples.unserialize '<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." '
RDF::NTriples.serialize(s)

1.8:

ben:rdf ben$ irb
>>     require 'rdf'
=> true
>>     s = RDF::NTriples.unserialize '<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." '
=> #<RDF::Statement:0x90bbb8(<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." .)>
>>     RDF::NTriples.serialize(s)
=> "<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jh\305\253l\304\201." .\n"

1.9:

ben:rdf ben$ irb1.9
irb(main):001:0>     require 'rdf'
=> true
irb(main):002:0>     s = RDF::NTriples.unserialize '<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." '
=> #<RDF::Statement:0x93f260(<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> "Jhūlā." .)>
irb(main):003:0>     RDF::NTriples.serialize(s)
=> "<http://openlibrary.org/b/OL3M> <http://RDVocab.info/Elements/titleProper> \"Jhūlā.\" .\n"

Enable Ruby-idiomatic aliases for camelCased property names

Instead of contaminating our Ruby code with camelCased monstrosities such as:

FOAF.firstName  #=> RDF::URI("http://xmlns.com/foaf/0.1/firstName")
RDFS.seeAlso    #=> RDF::URI("http://www.w3.org/2000/01/rdf-schema#seeAlso")  
OWL.sameAs      #=> RDF::URI("http://www.w3.org/2002/07/owl#sameAs")
XSD.dateTime    #=> RDF::URI("http://www.w3.org/2001/XMLSchema#dateTime")

...we ought to be able to stick with Ruby conventions and say:

FOAF.first_name #=> RDF::URI("http://xmlns.com/foaf/0.1/firstName") 
RDFS.see_also   #=> RDF::URI("http://www.w3.org/2000/01/rdf-schema#seeAlso")
OWL.same_as     #=> RDF::URI("http://www.w3.org/2002/07/owl#sameAs")
XSD.date_time   #=> RDF::URI("http://www.w3.org/2001/XMLSchema#dateTime")

There's no reason we can't transparently support both naming conventions.
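
One way this could work, sketched with a stand-in class. SketchVocabulary is hypothetical and returns plain strings instead of RDF::URI objects; it is not how RDF::Vocabulary is implemented. The sketch assumes that no real term name contains an underscore, since such a term would be ambiguous under this mapping.

```ruby
# Hypothetical stand-in for a vocabulary class.
class SketchVocabulary
  # Property-like names only; conversion hooks such as to_str are excluded.
  PROPERTY_NAME = /\A(?!to_)[a-z][A-Za-z0-9_]*\z/

  def initialize(base)
    @base = base
  end

  # Resolve both first_name and firstName to the same camelCased term.
  def method_missing(name, *args)
    return super unless args.empty? && PROPERTY_NAME.match?(name.to_s)
    "#{@base}#{camelize(name)}"
  end

  def respond_to_missing?(name, include_private = false)
    PROPERTY_NAME.match?(name.to_s) || super
  end

  private

  # "see_also" -> "seeAlso"; names without underscores are unchanged.
  def camelize(name)
    name.to_s.gsub(/_([a-z0-9])/) { Regexp.last_match(1).upcase }
  end
end

foaf = SketchVocabulary.new("http://xmlns.com/foaf/0.1/")
puts foaf.first_name
puts foaf.firstName
```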

Please pass :base_uri to readers

Please can you pass the URI being loaded as :base_uri to readers, so that it is possible to write:

graph = RDF::Graph.load('http://rdfa.digitalbazaar.com/test-suite/test-cases/xhtml1/0001.xhtml')

Instead of:

graph = RDF::Graph.load('http://rdfa.digitalbazaar.com/test-suite/test-cases/xhtml1/0001.xhtml', :base_uri => 'http://rdfa.digitalbazaar.com/test-suite/test-cases/xhtml1/0001.xhtml')

RuntimeError: can't modify frozen object on Ruby 1.9.2

I just did a local install of the latest checked in source (0.3.0.pre). A simple vocabulary expansion results in a "can't modify frozen object" error.

[rdf] irb
ruby-1.9.2-p0 > require 'rdf'
 => true 
ruby-1.9.2-p0 > RDF::FOAF.to_uri
RuntimeError: can't modify frozen object
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/util/cache.rb:58:in `define_finalizer'
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/util/cache.rb:58:in `define_finalizer!'
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/util/cache.rb:93:in `[]='
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/model/uri.rb:57:in `intern'
from /Users/gregg/.rvm/gems/ruby-1.9.2-p0/gems/rdf-0.2.3/lib/rdf/vocab.rb:93:in `to_uri'
from (irb):2
from /Users/gregg/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `<main>'

private method `puts' called for "spec/data/output.nt":String (NoMethodError)

When I run

require 'rdf'

graph = RDF::Graph.new

s = RDF::URI.new("http://gemcutter.org/gems/rdf")
p = RDF::DC.creator
o = RDF::URI.new("http://ar.to/#self")

graph << RDF::Statement.new(s, p, o)

graph.each do |elem|
  puts elem.inspect
end

RDF::Writer.for(:ntriples).new("spec/data/output.nt") do |writer|
  graph.each_statement do |statement|
    writer << statement
  end
end

I get this:

c:/ruby/lib/ruby/gems/1.8/gems/rdf-0.0.9/lib/rdf/writer.rb:248:in `puts': private method `puts' called for "spec/data/output.nt":String (NoMethodError)

URI#join and normalization issues

URI joining and normalization are not well documented, but the rules can be inferred from various W3C tests. They are best described in RFC 3986, section 5.2 [1]. Much of this is handled by Addressable::URI#join.

The following specs were created when developing RdfContext to ensure proper normalization of joined URIs:

describe "normalization" do
  {
    %w(http://foo ) =>  "http://foo/",
    %w(http://foo a) => "http://foo/a",
    %w(http://foo /a) => "http://foo/a",
    %w(http://foo #a) => "http://foo/#a",

    %w(http://foo/ ) =>  "http://foo/",
    %w(http://foo/ a) => "http://foo/a",
    %w(http://foo/ /a) => "http://foo/a",
    %w(http://foo/ #a) => "http://foo/#a",

    %w(http://foo# ) =>  "http://foo/", # Special case for Addressable
    %w(http://foo# a) => "http://foo/a",
    %w(http://foo# /a) => "http://foo/a",
    %w(http://foo# #a) => "http://foo/#a",

    %w(http://foo/bar ) =>  "http://foo/bar",
    %w(http://foo/bar a) => "http://foo/a",
    %w(http://foo/bar /a) => "http://foo/a",
    %w(http://foo/bar #a) => "http://foo/bar#a",

    %w(http://foo/bar/ ) =>  "http://foo/bar/",
    %w(http://foo/bar/ a) => "http://foo/bar/a",
    %w(http://foo/bar/ /a) => "http://foo/a",
    %w(http://foo/bar/ #a) => "http://foo/bar/#a",

    %w(http://foo/bar# ) =>  "http://foo/bar",
    %w(http://foo/bar# a) => "http://foo/a",
    %w(http://foo/bar# /a) => "http://foo/a",
    %w(http://foo/bar# #a) => "http://foo/bar#a",

    %w(http://foo/bar# #D%C3%BCrst) => "http://foo/bar#D%C3%BCrst",
    %w(http://foo/bar# #Dürst) => "http://foo/bar#D%C3%BCrst",
  }.each_pair do |input, result|
    it "should create <#{result}> from <#{input[0]}> and '#{input[1]}'" do
      RDF::URI.new(input[0]).join(input[1].to_s).normalize.to_s.should == result
    end
  end
end

Note that the rules for URIs are different from the rules for namespace declarations. A URI can/should be canonicalized (e.g. http://foo.com => http://foo.com/) but a namespace should not (e.g., @prefix foo: <http://foo.com#> . foo:a foo:b foo:c . => <http://foo.com#a> <http://foo.com#b> <http://foo.com#c>).

[1] http://tools.ietf.org/html/rfc3986#page-30
W3C rdfcore xmlbase tests: http://www.w3.org/2000/10/rdf-tests/rdfcore/xmlbase/
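
For comparison, a few of the cases above can be checked against Ruby's standard library, which implements the RFC 3986 reference resolution algorithm in URI.join. This is only a sketch using the stdlib URI module, not Addressable or RDF::URI, and it covers resolution only, not the normalization step. The constant name JOIN_CASES is an assumption made for the example.

```ruby
require "uri"

# [base, reference] => result expected by the specs above.
JOIN_CASES = {
  ["http://foo/bar",  "a"]  => "http://foo/a",
  ["http://foo/bar",  "/a"] => "http://foo/a",
  ["http://foo/bar",  "#a"] => "http://foo/bar#a",
  ["http://foo/bar/", "a"]  => "http://foo/bar/a",
  ["http://foo/bar/", "/a"] => "http://foo/a",
}.freeze

JOIN_CASES.each do |(base, reference), expected|
  joined = URI.join(base, reference).to_s
  puts "#{base} + #{reference} => #{joined} (spec expects #{expected})"
end
```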

Problem in N-Triples writer

A serious bug slipped through to the 0.1.0 release's N-Triples writer implementation:

NameError: undefined local variable or method `node' for #<RDF::NTriples::Writer:0x1023c6628>
    rdf-0.1.0/lib/rdf/ntriples/writer.rb:36:in `format_node'
    rdf-0.1.0/lib/rdf/writer.rb:226:in `format_value'
    rdf-0.1.0/lib/rdf/ntriples/writer.rb:26:in `write_triple'
    rdf-0.1.0/lib/rdf/ntriples/writer.rb:26:in `map'
    rdf-0.1.0/lib/rdf/ntriples/writer.rb:26:in `write_triple'
    rdf-0.1.0/lib/rdf/writer.rb:199:in `write_statement'
    rdf-0.1.0/lib/rdf/writer.rb:163:in `<<'

This affects the serialization of any statements that contain blank nodes. Fix coming up ASAP.

Add dump method to RDF::Enumerable

What do you think about adding the following method to RDF::Enumerable? Makes it super easy to serialise something...

def dump(args)
  RDF::Writer.for(*args).dump(self)
end

Literals should allow for validation and normalization

RDF places limitations on the lexical value of typed literals [1]. Values must belong to the lexical space of the relevant datatype. XML Schema defines the value space of the various primitive datatypes [2].

RDF::Literal should implement a #valid? method to verify the validity of typed literals.

Specs for various datatypes are implemented in RdfContext; the relevant mapping information is included here.

xsd:decimal:

  "1"                              => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "-1"                             => %("-1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1."                             => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1.0"                            => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1.00"                           => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "+001.00"                        => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "123.456"                        => %("123.456"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.345"                          => %("2.345"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1.000000000"                    => %("1.0"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.3"                            => %("2.3"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.234000005"                    => %("2.234000005"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.2340000000000005"             => %("2.2340000000000005"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.23400000000000005"            => %("2.234"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "2.23400000000000000000005"      => %("2.234"^^<http://www.w3.org/2001/XMLSchema#decimal>),
  "1.2345678901234567890123457890" => %("1.2345678901234567"^^<http://www.w3.org/2001/XMLSchema#decimal>),

xsd:boolean

  "true"  => %("true"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "false" => %("false"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "tRuE"  => %("true"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "FaLsE" => %("false"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "1"     => %("true"^^<http://www.w3.org/2001/XMLSchema#boolean>),
  "0"     => %("false"^^<http://www.w3.org/2001/XMLSchema#boolean>),

xsd:integer

  "01" => %("1"^^<http://www.w3.org/2001/XMLSchema#integer>),
  "1"  => %("1"^^<http://www.w3.org/2001/XMLSchema#integer>),
  "-1" => %("-1"^^<http://www.w3.org/2001/XMLSchema#integer>),
  "+1" => %("1"^^<http://www.w3.org/2001/XMLSchema#integer>),

xsd:double

  "1"         => %("1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "-1"        => %("-1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "+01.000"   => %("1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "1."        => %("1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "1.0"       => %("1.0E0"^^<http://www.w3.org/2001/XMLSchema#double>),
  "123.456"   => %("1.23456E2"^^<http://www.w3.org/2001/XMLSchema#double>),
  "1.0e+1"    => %("1.0E1"^^<http://www.w3.org/2001/XMLSchema#double>),
  "1.0e-10"   => %("1.0E-10"^^<http://www.w3.org/2001/XMLSchema#double>),
  "123.456e4" => %("1.23456E6"^^<http://www.w3.org/2001/XMLSchema#double>),

xsd:time, xsd:dateTime and xsd:date are implemented as follows:

    contents.is_a?(Time) ? contents.strftime("%H:%M:%S%Z").sub(/\+00:00|UTC/, "Z") : contents.to_s
    contents.is_a?(DateTime) ? contents.strftime("%Y-%m-%dT%H:%M:%S%Z").sub(/\+00:00|UTC/, "Z") : contents.to_s
    contents.is_a?(Date) ? contents.strftime("%Y-%m-%d%Z").sub(/\+00:00|UTC/, "Z") : contents.to_s

RdfContext also implements a Duration class that transforms integer milliseconds and floating point seconds into XSD format: [+1]PYYYYMMDDTHHMMSS.MMM
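
As an illustration, the xsd:integer and xsd:boolean mappings listed above can be reproduced with a few lines of plain Ruby. The helper names are hypothetical and this is not the RdfContext or RDF.rb code; both helpers raise ArgumentError for input they cannot interpret, which is one possible basis for a #valid? check.

```ruby
# Hypothetical helper: canonical lexical form of an xsd:integer.
# Integer(..., 10) drops a leading "+" and leading zeros.
def canonical_integer(lexical)
  Integer(lexical, 10).to_s
end

# Hypothetical helper: canonical lexical form of an xsd:boolean,
# following the case-insensitive mapping table above.
def canonical_boolean(lexical)
  case lexical.to_s.downcase
  when "true", "1"  then "true"
  when "false", "0" then "false"
  else raise ArgumentError, "not an xsd:boolean: #{lexical.inspect}"
  end
end

puts canonical_integer("+1")
puts canonical_boolean("tRuE")
```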

[1] http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
[2] http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#built-in-primitive-datatypesg

Enumerators on Ruby 1.8/1.9

Prompted by a recent contribution to fix Ruby 1.9 enumerator compatibility (to be included in RDF.rb 0.1.8), I'm investigating what it will take to ensure that our use of enumerators is safe and compatible with all Ruby baseline versions that we wish to support (that is, 1.8.2+ and 1.9.x).
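
One common pattern for this is to return an enumerator whenever an iterator method is called without a block. The class below is a hypothetical container used only to show the pattern; it is not an RDF.rb class.

```ruby
# Hypothetical container illustrating the block-or-enumerator pattern.
class SketchStatements
  include Enumerable

  def initialize(*items)
    @items = items
  end

  # Without a block, hand back an enumerator so that chained calls
  # such as each.with_index work as they do on core collections.
  def each
    return enum_for(:each) unless block_given?
    @items.each { |item| yield item }
    self
  end
end

statements = SketchStatements.new(:a, :b)
statements.each.with_index { |item, index| puts "#{index}: #{item}" }
```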

Expensive URI#qname could cache vocabulary

Consider allowing a vocabulary to be assigned to a URI, such as might happen from uri = RDF::FOAF.name, which could have the side effect of setting uri.vocab to RDF::FOAF. This would remove the O(N) lookup of the URI's vocabulary. Also, a URI#vocab method would be useful in determining the assigned vocabulary of a given URI.
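
A minimal sketch of the idea, using stand-in classes. SketchURI, SketchVocab and the qname helper are hypothetical names and do not reflect the RDF.rb API: a term created through a vocabulary remembers where it came from, so the scan over all vocabularies is only needed for URIs created some other way.

```ruby
# Hypothetical URI value that can carry its originating vocabulary.
SketchURI = Struct.new(:value, :vocab)

# Hypothetical vocabulary: a prefix plus a base URI string.
class SketchVocab
  attr_reader :prefix, :base

  def initialize(prefix, base)
    @prefix = prefix
    @base = base
  end

  # Terms created here are tagged with this vocabulary.
  def term(name)
    SketchURI.new("#{base}#{name}", self)
  end
end

# Use the remembered vocabulary if present, otherwise scan the list.
def qname(uri, vocabularies)
  vocab = uri.vocab || vocabularies.find { |candidate| uri.value.start_with?(candidate.base) }
  vocab && [vocab.prefix, uri.value[vocab.base.length..-1].to_sym]
end

foaf = SketchVocab.new(:foaf, "http://xmlns.com/foaf/0.1/")
puts qname(foaf.term(:name), []).inspect
```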

Inconsistent handling of context (quads)

Take this RDF_Mutable spec:

it "should not insert a statement twice" do
  @repository.insert(@statements.first)
  @repository.insert(@statements.first)
  @repository.count.should == 1
end

That is fine and good. But if I alter the second insert by adding (or changing) the context of the Statement object, I would expect @repository.count.should == 2. Yes, it is the same s-p-o, but in two different contexts. But with the RDF::Repository base implementation, the answer is still 1. Drilling down, that is because the == operator for Statement objects throws away the context.

There are a variety of fixes for this, and some of them are certainly wrong, so I combed through RDF.rb to pick out behaviors of note around context handling and offer them up here with my thought on what a correct fix would be.

First off, Statement objects behave explicitly as a triple with these methods:

  • ==
  • []
  • to_a, to_ary
  • to_hash

And they behave as a quad with these methods, which are closely related to those above:

  • eql?
  • ===
  • []=
  • to_s

I gather from the rdf-spec that the equality methods are intentional as they are, though I think I disagree with their current behavior. I think a Statement should always be treated as a quad, with the meaning of the context refined. I see two conflated API uses of the context: "I have a context or I don't have a context" versus "I don't care about the context". The current behavior of the == method is a problem because it injects the I-don't-care semantics into places where the I-do-or-I-don't-have-a-context distinction needs to be faithfully preserved, such as adding the same s-p-o into two different contexts of an RDF::Repository. The I-don't-care cases show up mostly in query-style APIs, such as Enumerable.has_statement?, and should be handled intentionally there.

My proposal would be to make all the Statement methods listed above under the triple-like behavior quad-like, to introduce a default context value of boolean false for statements with no defined context, and to keep the explicit value of nil for the I-don't-care case, consistent with the use of nil as a wildcard for s-p-o in various other query-oriented parts of the API. I'm fairly certain that will break some existing downstream things, so I'm putting this out for feedback and counter-proposals.
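
To make the proposal concrete, here is a small sketch using a stand-in value object. SketchStatement and matches? are hypothetical names; this is not RDF::Statement. Equality compares all four components, false marks "no context", and nil is treated as a wildcard only by the separate pattern-matching method.

```ruby
# Hypothetical quad value object. Struct#== compares every member,
# so two statements that differ only in context are not equal.
SketchStatement = Struct.new(:subject, :predicate, :object, :context) do
  # Pattern match: nil on either side means "don't care";
  # false means "explicitly no context" and only matches false.
  def matches?(other)
    to_a.zip(other.to_a).all? do |mine, theirs|
      mine.nil? || theirs.nil? || mine == theirs
    end
  end
end

default_graph = SketchStatement.new(:s, :p, :o, false)
named_graph   = SketchStatement.new(:s, :p, :o, :g1)
any_context   = SketchStatement.new(:s, :p, :o, nil)

puts default_graph == named_graph           # strict quad equality
puts any_context.matches?(named_graph)      # wildcard query
puts default_graph.matches?(named_graph)    # false is not a wildcard
```

With these semantics a store that de-duplicates on == keeps both statements, while a query with a nil context still finds either of them.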

So, on to some specific observations...

RDF::Mutable

Mutable.insert --- Rejects statements for which Statement.valid? is false. Valid admits statements without a context, which conspires to create a problem with Mutable.delete.

Mutable.delete --- Context is currently treated as a wildcard if not supplied. The problem: A statement without a context is valid to insert, but you cannot isolate it to delete it without also taking the same triple out of other contexts. If statements with no context have a distinct value for the context, say the boolean false, they could be distinguished from an explicit "don't care" value of nil.

Mutable.update --- Implies a delete, so must behave consistently. Current behavior tosses the context on the delete, which is certainly a bug.

RDF::Enumerable

Enumerable.has_statement? --- The base class implementation is Enumeration.include? so the meaning is dictated by the == method of Statement, which currently discards the context. It behaves the same as Enumerable.has_triple?, which is not what I'd expect if I supply a Statement with an explicit context. Like Mutable.delete, this method should be able to verify both the existence of a statement with specific context, and a triple with no context (context == false), and with an explicit nil context, behave like a wildcard.

Enumerable.triples, Enumerable.each_triple --- If we cast away the context, the same triple may appear more than once. Is that a problem?

RDF::Graph --- The has_statement?, insert_statement and delete_statement implementations all depend on the Statement.== method, which throws away the context. This happens to work for Graph because the context is coerced to the same value for all statements going in, so they would match if == was a quad match.

RDF::Repository --- Like Graph, the base class implementation depends on the Statement.== method and makes the latent bugs in Graph actual bugs.

Repository.has_statement? --- See Enumerable.has_statement? above.

Repository.insert_statement --- The duplicate check discards the context, so only one context can contain a given triple, which is a bug (and, incidentally, what led me into investigating all this).

Feedback welcome.

More flexible literal implementation

The current implementation of RDF::Literal has some default handling for dates, floats, and so forth, but it's somewhat inflexible and not extensible. The system ought to provide a way for different XSD types to do different things with different Ruby classes, so that one could, for example, get an XSD.float as a Rational, or an XSD.XMLLiteral as a parsed Nokogiri object.
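
One possible shape for such an extension point is a registry that maps datatype URIs to conversion procs. The names XSD_NS, CASTERS and cast are hypothetical and are not part of RDF.rb; the sketch works on plain strings rather than RDF::Literal objects.

```ruby
require "date"

XSD_NS = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry: datatype URI => proc turning a lexical form
# into a Ruby object. Applications may add or replace entries.
CASTERS = {
  "#{XSD_NS}integer" => ->(lexical) { Integer(lexical, 10) },
  "#{XSD_NS}date"    => ->(lexical) { Date.parse(lexical) },
}

# Unknown datatypes fall back to the unchanged lexical form.
def cast(lexical, datatype)
  CASTERS.fetch(datatype, ->(value) { value }).call(lexical)
end

# Example override: represent xsd:float values as Rationals.
CASTERS["#{XSD_NS}float"] = ->(lexical) { Rational(lexical) }

puts cast("1.5", "#{XSD_NS}float").inspect
```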

Using the RDF::RDF vocabulary

I'm having a problem accessing the RDF::RDF vocabulary. The following program fails:

require 'rdf'
puts "#{RDF::RDF.first}"

with:

ian@rowan-15 $ ruby rdf-ns-2.rb
rdf-ns-2.rb:4:in `': uninitialized constant RDF::RDF (NameError)

I think this is because the autoload isn't being triggered for RDF::RDF. If I manually force a load of the RDF vocabulary:

require 'rdf'
require 'rdf/vocab/rdf'

puts "#{RDF::RDF.first}"

then other things break:

ian@rowan-15 $ ruby rdf-ns-2.rb
/var/lib/gems/1.9.1/gems/rdf-0.2.1/lib/rdf/vocab.rb:83: warning: toplevel constant URI referenced by RDF::RDF::URI
/var/lib/gems/1.9.1/gems/rdf-0.2.1/lib/rdf/vocab.rb:83:in `[]': undefined method `intern' for URI:Module (NoMethodError)
    from /var/lib/gems/1.9.1/gems/rdf-0.2.1/lib/rdf/vocab.rb:74:in `block in property'
    from rdf-ns-2.rb:4:in `'

I'm pretty sure I'm doing something wrong, but for the time being I've resorted to defining my own RDF Namespace object, to avoid having to touch RDF::RDF.

N-Triples serializer sometimes serializes nodes invalidly

The N-Triples spec says that a node is identified as '_:' name, where name is [A-Za-z][A-Za-z0-9]*. However, on Ruby 1.8.7 from a recent Ubuntu distro, Node.new creates identifiers with a dash in them, which the N-Triples serializer incorrectly passes on to an output file, e.g.:

_:g-605660708 <http://www.w3.org/2000/01/rdf-schema#label> "Movie Tickets" .

This is kinda nasty, since rapper will reject them, thus breaking any serialization to other formats, too.

This is RDF.rb 0.2.0.1.
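
A small sketch of a check, and one possible repair, based on the name pattern quoted above. Both helpers are hypothetical and are not part of RDF.rb. Note that simply stripping characters can make two distinct identifiers collide (for example g-1 and g1), so a real fix would more likely change how identifiers are generated.

```ruby
# Name pattern as quoted in the issue: a letter, then letters or digits.
NODE_NAME = /\A[A-Za-z][A-Za-z0-9]*\z/

def valid_node_name?(name)
  NODE_NAME.match?(name)
end

# Hypothetical repair: drop disallowed characters and make sure
# the result starts with a letter.
def sanitize_node_name(name)
  cleaned = name.gsub(/[^A-Za-z0-9]/, "")
  cleaned =~ /\A[A-Za-z]/ ? cleaned : "g#{cleaned}"
end

puts valid_node_name?("g-605660708")
puts sanitize_node_name("g-605660708")
```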

RDF::URI join method doesn't work for URIs ending with a hash

$ irb -rrdf
>> p = RDF::URI('http://www.w3.org/ns/rdfa#')
=> #<RDF::URI:0x810d2150(http://www.w3.org/ns/rdfa#)>
>> p.join('term')
=> #<RDF::URI:0x810d0670(http://www.w3.org/ns/rdfa/term)>

I would expect that the result would be:

http://www.w3.org/ns/rdfa#term
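
For context, the expected result corresponds to namespace-style concatenation rather than RFC 3986 reference resolution, which discards the base fragment. The sketch below uses Ruby's standard URI module to show the difference; it says nothing about how RDF::URI#join is implemented.

```ruby
require "uri"

namespace = "http://www.w3.org/ns/rdfa#"

# RFC 3986 resolution: the base fragment is dropped and the relative
# reference replaces the last path segment of the base.
resolved = URI.join(namespace, "term").to_s

# Namespace expansion: plain string concatenation.
expanded = namespace + "term"

puts resolved   # http://www.w3.org/ns/term
puts expanded   # http://www.w3.org/ns/rdfa#term
```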

RDF vocabulary

The RDF vocabulary is defined and usable but not actually documented.
