It could be possible to inline the language tag for some rdf

Future improvement: rdf:langString (lang tag) inlining (rdf4cpp, closed)

rdf4cpp commented on June 13, 2024

Future improvement: rdf:langString (lang tag) inlining

Clueliss commented on June 13, 2024

More details about a possible implementation, now that I have thought about it.

First create an enum CommonLanguageTags or similar.

On rdf:langString creation, do roughly the following if the language tag is a common language tag:

Call NodeStorage::find_or_make_id(LiteralView{lexical_form, lang_tag}).
In the resulting NodeID, check whether the upper X bits of the LiteralID are unset:
- If yes, store the language tag (= the enum value) in these X bits and set the inlining tagging bit.
- If no, you are done; the language tag is not inlined for this literal.
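The creation steps above could be sketched roughly as follows. This is only an illustration under assumptions, not rdf4cpp's actual code: the 42-bit LiteralID width is taken from the 2^(42-X) figure later in this thread, X = 2 is picked arbitrarily, and `PackedLiteralId`, `to_common_tag`, `try_inline_lang_tag` and the four enum variants are hypothetical names. The `literal_id` argument stands in for the ID that `NodeStorage::find_or_make_id` returned.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Assumed layout for this sketch: a 42-bit LiteralID whose upper X bits are reused.
constexpr unsigned literal_id_bits = 42;
constexpr unsigned inlined_tag_bits = 2; // X, chosen arbitrarily here
constexpr unsigned payload_bits = literal_id_bits - inlined_tag_bits;

// Hypothetical set of common language tags (at most 2^X variants).
enum struct CommonLanguageTags : std::uint64_t { de = 0, en = 1, fr = 2, zh = 3 };

// Hypothetical result type: the (possibly modified) LiteralID plus the inlining tagging bit.
struct PackedLiteralId {
    std::uint64_t literal_id;
    bool lang_tag_inlined;
};

// Maps a language tag string to its enum value, if it is a common tag.
inline std::optional<CommonLanguageTags> to_common_tag(std::string_view lang_tag) {
    if (lang_tag == "de") return CommonLanguageTags::de;
    if (lang_tag == "en") return CommonLanguageTags::en;
    if (lang_tag == "fr") return CommonLanguageTags::fr;
    if (lang_tag == "zh") return CommonLanguageTags::zh;
    return std::nullopt;
}

// literal_id is the ID the backend handed out for {lexical_form, lang_tag}.
inline PackedLiteralId try_inline_lang_tag(std::uint64_t literal_id, std::string_view lang_tag) {
    assert(literal_id < (std::uint64_t{1} << literal_id_bits));

    auto const tag = to_common_tag(lang_tag);
    bool const upper_bits_unset = (literal_id >> payload_bits) == 0;
    if (!tag.has_value() || !upper_bits_unset) {
        return {literal_id, false}; // not inlined: uncommon tag or upper bits already in use
    }

    // Store the enum value in the upper X bits and set the tagging bit.
    auto const tag_bits = static_cast<std::uint64_t>(*tag) << payload_bits;
    return {literal_id | tag_bits, true};
}
```

Because the tagging bit is kept separately, the enum value 0 stays distinguishable from "nothing inlined".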

On language tag fetching, do the following:

If the inlining tagging bit is set, look at the upper X bits and index a lookup table that translates the enum value into a string in static memory. Otherwise, do the normal thing (fetch the language tag from the backend).
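A minimal sketch of that fetch path, again under assumptions: a 42-bit LiteralID (from the 2^(42-X) figure in this thread), X = 2, and a hypothetical table `common_language_tag_strings` indexed by enum value. `fetch_lang_tag_from_backend` is a made-up stand-in for the normal backend lookup and returns a fixed value here so the example is self-contained.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string_view>

// Assumed layout for this sketch: a 42-bit LiteralID whose upper X bits hold the tag.
constexpr unsigned literal_id_bits = 42;
constexpr unsigned inlined_tag_bits = 2; // X, chosen arbitrarily here
constexpr unsigned payload_bits = literal_id_bits - inlined_tag_bits;

// Lookup table in static memory; the index is the (hypothetical) enum value.
constexpr std::array<std::string_view, 4> common_language_tag_strings{"de", "en", "fr", "zh"};
static_assert(common_language_tag_strings.size() == (std::size_t{1} << inlined_tag_bits));

// Stand-in for "the normal thing"; a real backend lookup would go here.
inline std::string_view fetch_lang_tag_from_backend(std::uint64_t /*literal_id*/) {
    return "tlh";
}

inline std::string_view language_tag(std::uint64_t literal_id, bool lang_tag_inlined) {
    assert(literal_id < (std::uint64_t{1} << literal_id_bits));

    if (lang_tag_inlined) {
        // The upper X bits hold the enum value; no backend access is needed.
        auto const index = literal_id >> payload_bits;
        return common_language_tag_strings[index];
    }
    return fetch_lang_tag_from_backend(literal_id);
}
```

The inlined branch only touches static memory, which is where the saved backend access comes from.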

This approach has two interesting properties:

- Language tags are effectively stored twice (once in static memory and once in the backend), but that is probably fine and saves some branches elsewhere.
- rdf:langString is a type that can be partially stored in the backend, which is new for rdf4cpp.

The remaining question is: How many bits should this use?

- More bits mean more different language tags can be inlined (i.e. the number of variants in CommonLanguageTags increases).
- More bits also lower the threshold up to which language tag inlining is possible (i.e. more than 2^(42-X) literals means no more language tag inlining).
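To make the trade-off concrete, the two quantities can be computed for a given X. The sketch below assumes the 42-bit LiteralID implied by the 2^(42-X) figure above; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed LiteralID width, taken from the 2^(42-X) figure.
constexpr unsigned literal_id_bits = 42;

// Number of distinct language tags that fit into X bits.
constexpr std::uint64_t max_inlinable_tags(unsigned x) {
    assert(x <= literal_id_bits);
    return std::uint64_t{1} << x;
}

// Number of literals up to which the upper X bits are still unset,
// i.e. up to which language tag inlining remains possible.
constexpr std::uint64_t inlining_literal_limit(unsigned x) {
    assert(x <= literal_id_bits);
    return std::uint64_t{1} << (literal_id_bits - x);
}

// X = 4: 16 tags, inlining works for up to 2^38 literals.
static_assert(max_inlinable_tags(4) == 16);
static_assert(inlining_literal_limit(4) == 274'877'906'944ULL);

// X = 8: 256 tags, inlining works for up to 2^34 literals.
static_assert(max_inlinable_tags(8) == 256);
static_assert(inlining_literal_limit(8) == 17'179'869'184ULL);
```

Each additional bit doubles the number of inlinable tags and halves the number of literals that can still benefit from inlining.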
