jsonschema-rs's Introduction

jsonschema

A JSON Schema validator implementation. It compiles schemas into a validation tree to make validation as fast as possible.

Supported drafts:

  • Draft 7 (except optional idn-hostname.json test case)
  • Draft 6
  • Draft 4 (except optional bignum.json test case)

Partially supported drafts (some keywords are not implemented):

  • Draft 2019-09 (requires the draft201909 feature enabled)
  • Draft 2020-12 (requires the draft202012 feature enabled)

To get started, add the dependency to your Cargo.toml:

# Cargo.toml
jsonschema = "0.18"

To validate documents against some schema and get validation errors (if any):

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    let schema = json!({"maxLength": 5});
    let instance = json!("foo");
    let compiled = JSONSchema::compile(&schema)
        .expect("A valid schema");
    let result = compiled.validate(&instance);
    if let Err(errors) = result {
        for error in errors {
            println!("Validation error: {}", error);
            println!(
                "Instance path: {}", error.instance_path
            );
        }
    }
}

Each error has an instance_path attribute that points to the erroneous part of the validated instance. It can be converted to a JSON Pointer via .to_string() or to a Vec<String> via .into_vec().
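
For example (a minimal sketch of both conversions, using the API described above):

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    let schema = json!({"items": {"maxLength": 5}});
    let instance = json!(["a very long string"]);
    let compiled = JSONSchema::compile(&schema)
        .expect("A valid schema");
    if let Err(errors) = compiled.validate(&instance) {
        for error in errors {
            // JSON Pointer form, e.g. "/0"
            let pointer = error.instance_path.to_string();
            // Owned segments; `into_vec` consumes the path, so it goes last
            let segments: Vec<String> = error.instance_path.into_vec();
            println!("{} -> {:?}", pointer, segments);
        }
    }
}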

If you only need to know whether a document is valid (which is faster):

use jsonschema::is_valid;
use serde_json::json;

fn main() {
    let schema = json!({"maxLength": 5});
    let instance = json!("foo");
    assert!(is_valid(&schema, &instance));
}

Or use a compiled schema (preferred):

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    let schema = json!({"maxLength": 5});
    let instance = json!("foo");
    // Draft is detected automatically
    // with fallback to Draft7
    let compiled = JSONSchema::compile(&schema)
        .expect("A valid schema");
    assert!(compiled.is_valid(&instance));
}

Output styles

jsonschema supports basic & flag output styles from Draft 2019-09, so you can serialize the validation results with serde:

use jsonschema::{Output, BasicOutput, JSONSchema};
use serde_json::json;

fn main() {
    let schema_json = json!({
        "title": "string value",
        "type": "string"
    });
    let instance = json!("some string");
    let schema = JSONSchema::compile(&schema_json)
        .expect("A valid schema");
    
    let output: BasicOutput = schema.apply(&instance).basic();
    let output_json = serde_json::to_value(output)
        .expect("Failed to serialize output");
    
    assert_eq!(
        output_json, 
        json!({
            "valid": true,
            "annotations": [
                {
                    "keywordLocation": "",
                    "instanceLocation": "",
                    "annotations": {
                        "title": "string value"
                    }
                }
            ]
        })
    );
}

Custom keywords

jsonschema allows you to implement custom validation logic by defining custom keywords. To use your own keyword, you need to implement the Keyword trait and add it to the JSONSchema instance via the with_keyword method:

use jsonschema::{
    paths::{JSONPointer, JsonPointerNode},
    ErrorIterator, JSONSchema, Keyword, ValidationError,
};
use serde_json::{json, Map, Value};
use std::iter::once;

struct MyCustomValidator;

impl Keyword for MyCustomValidator {
    fn validate<'instance>(
        &self,
        instance: &'instance Value,
        instance_path: &JsonPointerNode,
    ) -> ErrorIterator<'instance> {
        // ... validate instance ...
        if !instance.is_object() {
            let error = ValidationError::custom(
                JSONPointer::default(),
                instance_path.into(),
                instance,
                "Boom!",
            );
            Box::new(once(error))
        } else {
            Box::new(None.into_iter())
        }
    }
    fn is_valid(&self, instance: &Value) -> bool {
        // ... determine if instance is valid ...
        true
    }
}

// You can create a factory function, or use a closure to create new validator instances.
fn custom_validator_factory<'a>(
    // Parent object where your keyword is defined
    parent: &'a Map<String, Value>,
    // Your keyword value
    value: &'a Value,
    // JSON Pointer to your keyword within the schema
    path: JSONPointer,
) -> Result<Box<dyn Keyword>, ValidationError<'a>> {
    // You may return a validation error if the keyword is misused for some reason
    Ok(Box::new(MyCustomValidator))
}

fn main() {
    let schema = json!({"my-type": "my-schema"});
    let instance = json!({"a": "b"});
    let compiled = JSONSchema::options()
        // Register your keyword via a factory function
        .with_keyword("my-type", custom_validator_factory)
        // Or use a closure
        .with_keyword("my-type-with-closure", |_, _, _| Ok(Box::new(MyCustomValidator)))
        .compile(&schema)
        .expect("A valid schema");
    assert!(compiled.is_valid(&instance));
}

Reference resolving and TLS

By default, jsonschema resolves HTTP references via reqwest without TLS support. If you'd like to resolve HTTPS references, you need to enable TLS support in reqwest:

reqwest = { version = "*", features = [ "rustls-tls" ] }

Otherwise, you might get validation errors like invalid URL, scheme is not http.
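
For illustration, a schema with a remote HTTPS reference (the URL is hypothetical) that would hit this error without a TLS-enabled reqwest:

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    // Hypothetical remote reference; resolving it requires HTTPS support
    let schema = json!({"$ref": "https://example.com/schema.json"});
    match JSONSchema::compile(&schema) {
        Ok(_) => println!("Remote reference resolved"),
        // Without TLS support this surfaces as an error like
        // "invalid URL, scheme is not http"
        Err(error) => println!("Failed to compile: {}", error),
    }
}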

Status

This library is functional and ready for use, but its API is still evolving toward a 1.0 release.

Bindings

  • Python - See the ./bindings/python directory
  • Ruby - a crate by @driv3r
  • NodeJS - a package by @ahungrynoob

Running tests

The tests in jsonschema/ depend on the JSON Schema Test Suite. Before calling cargo test, download the suite:

$ git submodule init
$ git submodule update

These commands clone the suite to jsonschema/tests/suite/.

Now, enter the jsonschema directory and run cargo test.

$ cd jsonschema
$ cargo test

Performance

There is a comparison with other JSON Schema validators written in Rust - jsonschema_valid==0.5.2 and valico==4.0.0.

Test machine: i8700K (12 cores), 32 GB RAM.

Input values and schemas:

Case            Schema size  Instance size
OpenAPI         18 KB        4.5 MB
Swagger         25 KB        3.0 MB
Canada          4.8 KB       2.1 MB
CITM catalog    2.3 KB       501 KB
Fast (valid)    595 B        55 B
Fast (invalid)  595 B        60 B

Here is the average time for each contender to validate. Ratios are given against the compiled JSONSchema using its validate method. The is_valid method is faster, but returns only a boolean:

Case            jsonschema_valid    valico               jsonschema (validate)  jsonschema (is_valid)
OpenAPI         - (1)               - (1)                3.500 ms               3.147 ms (x0.89)
Swagger         - (2)               180.65 ms (x32.12)   5.623 ms               3.634 ms (x0.64)
Canada          40.363 ms (x33.13)  427.40 ms (x350.90)  1.218 ms               1.217 ms (x0.99)
CITM catalog    5.357 ms (x2.51)    39.215 ms (x18.44)   2.126 ms               569.23 us (x0.26)
Fast (valid)    2.27 us (x4.87)     6.55 us (x14.05)     465.89 ns              113.94 ns (x0.24)
Fast (invalid)  412.21 ns (x0.46)   6.69 us (x7.61)      878.23 ns              4.21 ns (x0.004)

Notes:

  1. jsonschema_valid and valico do not handle valid path instances matching the ^\\/ regex.

  2. jsonschema_valid fails to resolve local references (e.g. #/definitions/definitions).

You can find the benchmark code in benches/jsonschema.rs; the Rust version is 1.78.

Support

If you have anything to discuss regarding this library, please join our gitter!

jsonschema-rs's Issues

Do not include empty nodes in the validation tree

When a schema is compiled, it is possible to end up with empty nodes. Example:

{
    "items": {"additionalProperties": true}
}

It compiles to items: {} because true is the default value for additionalProperties, which makes this subschema empty.
Such cases should be detected and removed from the tree.

Possible truncation & panic

e.g. in min_properties:

let limit = limit.as_u64().unwrap() as usize;

If the schema contains a negative or floating-point number for this keyword, this line will panic.

On a 32-bit platform, an integer that exceeds usize will be truncated, which may lead to wrong results during validation.

Affected validators:

  • max_items
  • max_length
  • max_properties
  • min_items
  • min_length
  • min_properties

As a result, we should enable the clippy::cast_possible_truncation lint and add test cases.
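
A sketch of a safer conversion (parse_limit is a hypothetical helper): it rejects floats and negative numbers and avoids 32-bit truncation:

use serde_json::Value;

// Returns None for floats, negative numbers, and values that do not fit
// into usize on the current platform
fn parse_limit(limit: &Value) -> Option<usize> {
    limit.as_u64().and_then(|value| usize::try_from(value).ok())
}

fn main() {
    assert_eq!(parse_limit(&Value::from(3_u64)), Some(3));
    assert_eq!(parse_limit(&Value::from(-1)), None);
    assert_eq!(parse_limit(&Value::from(1.5)), None);
}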

Bug in AdditionalPropertiesFalseValidator

There is no test case for it, but instead of:

fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
        if let Value::Object(item) = instance {
            return item.iter().next().is_some();
        }
        true
    }

it should be:

fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
        if let Value::Object(item) = instance {
            return item.iter().next().is_none();
        }
        true
    }

I.e., it is only valid for objects without properties.

Generate validators without dispatching

Even though compiling validators gives pretty good results, it is not the fastest way to perform validation in all circumstances. If we know the schema at build time, then we can generate code that will be more efficient than the current approach.

For example if we have this schema:

{"type": "string", "maxLength": 5}

then our current approach will basically iterate over a vector of trait objects and call their validate / is_valid methods.
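
For illustration, a rough sketch of that dispatch; the trait and validators are simplified stand-ins, not the crate's actual types:

use serde_json::Value;

trait Validate {
    fn is_valid(&self, instance: &Value) -> bool;
}

struct TypeString;

impl Validate for TypeString {
    fn is_valid(&self, instance: &Value) -> bool {
        instance.is_string()
    }
}

struct MaxLength(usize);

impl Validate for MaxLength {
    fn is_valid(&self, instance: &Value) -> bool {
        match instance {
            Value::String(value) => value.chars().count() <= self.0,
            _ => true,
        }
    }
}

fn is_valid(validators: &[Box<dyn Validate>], instance: &Value) -> bool {
    // Every keyword check goes through a vtable call
    validators.iter().all(|validator| validator.is_valid(instance))
}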

The idea is to generate code like this:

fn is_valid(instance: &Value) -> bool {
    match instance {
        Value::String(value) => value.len() <= 5,
        _ => false
    }
}

https://github.com/horejsek/python-fastjsonschema does this.

Macros to return validation error

Instead of

let message = format!("'{}' is too long", item);
return Err(ValidationError::ValidationError(message));

it can be

return validation_error!("`{}` is too long", item)
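
A minimal sketch of such a macro, assuming an error type shaped like the snippet above (both are illustrative, not the crate's actual definitions):

// Hypothetical error type mirroring the snippet above
enum ValidationError {
    ValidationError(String),
}

// The proposed macro: format the message and wrap it in Err in one step
macro_rules! validation_error {
    ($($arg:tt)*) => {
        Err(ValidationError::ValidationError(format!($($arg)*)))
    };
}

fn check_length(item: &str) -> Result<(), ValidationError> {
    if item.len() > 5 {
        return validation_error!("`{}` is too long", item);
    }
    Ok(())
}

fn main() {
    assert!(check_length("foo").is_ok());
    assert!(check_length("foobarbaz").is_err());
}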

Handle errors instead of `unwrap`

In some cases, it might be better to return an error to the caller instead; these cases are mostly in the resolver.

But for regexes & URLs that are known to be valid, we can use expect.

Refactor benchmarks

At the moment I see these disadvantages of the current implementation:

  • They test only the performance of is_valid. We should bench validate as well
  • Benchmark names are hardcoded and often duplicated. We should autogenerate them so they are not accidentally rewritten during a run
  • A lot of duplicated schemas. They could be reorganized with a macro
  • A lot of duplicate code in the benches implementation
  • Commented-out code. It would be better to uncomment it and select benchmarks by name

Restructure project

  • Resolver & validators should be at the same level
  • Errors should be grouped in the same file
  • Move format checkers into a separate file
  • Keep types separate

Avoid copying to ValidationError

In most cases, we copy data into a ValidationError instance like this (taken from the implementation of the required keyword):

    fn validate<'a>(&self, _: &'a JSONSchema, instance: &'a Value) -> ErrorIterator<'a> {
        if let Value::Object(item) = instance {
            for property_name in &self.required {
                if !item.contains_key(property_name) {
                    return error(ValidationError::required(instance, property_name.clone()));
                }
            }
        }
        no_error()
    }

instance is later wrapped in Cow::Borrowed, but property_name is cloned. Sometimes instance is cloned too, via ValidationError::into_owned (e.g. in the additional_properties keyword implementation), so it can be used in our error iterator.

I assume that it is possible to avoid cloning, but it will require some lifetime tweaks which I have failed to implement (a couple of times).

Group validators by the input type

The idea is to store validators in groups by the input type, e.g. all validators that can be applied to a number, object, array, string, etc.

What we can get from it

Less pattern matching on the instance type

Consider this schema: {"minimum": 1, "maximum": 10}

Essentially we have 2 validators that together roughly do the following:

if let Value::Number(item) = instance {
    let item = item.as_f64().unwrap();
    if item < self.limit {
        return false;
    }
}
if let Value::Number(item) = instance {
    let item = item.as_f64().unwrap();
    if item > self.limit {
        return false;
    }
}

That's pattern matching twice and item.as_f64().unwrap() twice. Instead, we can do this in the root validation method (and in nodes where it is appropriate):

... // some common validators for any type here
match instance {
    Value::Number(item) => {
        let item = item.as_f64().unwrap();
        // first validator inlined for illustration
        if item < self.limit {
            return false;
        };
        if item > self.limit {
            return false;
        }
        true
    }
    ...
}

In this arm, we can apply exclusiveMaximum, exclusiveMinimum, minimum, maximum, and multipleOf.

Much simpler validators

Instead of this:

    fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
        if let Value::Number(item) = instance {
            let item = item.as_f64().unwrap();
            if item < self.limit {
                return false;
            }
        }
        true
    }

we can do this:

    fn is_valid(&self, item: f64) -> bool {
        item < self.limit
    }

And there is no need to pass an unused reference to the JSONSchema instance. The same simplification can be applied to the validate method.

Faster execution for non-matching types

Currently, if we pass null to the validators above, we'll still call both of them in a loop, and they both will return true. With this idea, there will be only one pattern match at the root, plus maybe some small checks, which I'll describe below.

More insight into where to apply parallel execution

We can know for sure that there is no point in applying parallel execution to numeric validators, since they are fast and there are only 5 of them. In other words, the surface of possibilities will be smaller and more visible (only applicable to arrays and objects).

As a downside, there could be some extra logic needed to iterate over two vectors (common & specific validators), which may have higher overhead for small schemas with a single keyword.

Also, the implementation will require splitting into multiple traits.

But anyway, this option is worth exploring; maybe some other optimizations will become visible along the way.

I think this idea can also be applied to the compilation phase.

Optimize check_time

It might be faster with a single regex rather than with 4 calls to parse_from_str.
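
For illustration, a sketch of the single-regex idea, assuming the regex crate and an RFC 3339 full-time shape (the pattern is illustrative, not the crate's actual implementation):

use regex::Regex;

fn main() {
    // HH:MM:SS with an optional fraction and a "Z" or numeric offset.
    // Semantic checks (e.g. whether a leap second is plausible) would
    // still need extra code.
    let re = Regex::new(
        r"^([01]\d|2[0-3]):[0-5]\d:([0-5]\d|60)(\.\d+)?([Zz]|[+-]([01]\d|2[0-3]):[0-5]\d)$",
    )
    .expect("A valid regex");
    assert!(re.is_match("23:59:59Z"));
    assert!(re.is_match("12:34:56.789+02:00"));
    assert!(!re.is_match("24:00:00Z"));
}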

Canonicalise schemas during compilation

We can eliminate some inefficient constructs, e.g.:

{"anyOf": [{"type": "string"}, {"type": "number"}]}

can be simplified to:

{"type": ["string", "number"}

And if there are integer and number, then it can be replaced with {"type": "number"} since number includes integer.
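
A sketch of one such rewrite rule over serde_json values (collapse_any_of is a hypothetical helper, not the crate's code; merging integer into number would be a further step):

use serde_json::{json, Value};

// Collapse {"anyOf": [...]} into {"type": [...]} when every branch is a
// bare type check; otherwise return the schema unchanged
fn collapse_any_of(schema: &Value) -> Value {
    if let Some(branches) = schema.get("anyOf").and_then(Value::as_array) {
        let types: Option<Vec<&str>> = branches
            .iter()
            .map(|branch| {
                let object = branch.as_object()?;
                if object.len() != 1 {
                    return None;
                }
                object.get("type")?.as_str()
            })
            .collect();
        if let Some(types) = types {
            return json!({"type": types});
        }
    }
    schema.clone()
}

fn main() {
    let schema = json!({"anyOf": [{"type": "string"}, {"type": "number"}]});
    assert_eq!(collapse_any_of(&schema), json!({"type": ["string", "number"]}));
}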

Another approach that we can take additionally: detect and remove empty nodes produced during compilation, as described in the "Do not include empty nodes in the validation tree" issue above.

Update "Performance" section

It would be fairer to have two groups - compiled and not compiled. Currently, the results from jsonschema_valid and valico are measured with compiled validators. So, basically, we need to move the jsonschema (not compiled) column into a new table and compare it with the not-compiled versions of jsonschema_valid and valico.

The Rust compiler version & options will also be useful there. As for the benchmarks, it will probably be better to compile with LTO and RUSTFLAGS="--emit=asm".
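
For example, an LTO-enabled release profile could look like this (a sketch; the exact settings are open for discussion):

# Cargo.toml
[profile.release]
lto = true
codegen-units = 1

This would be combined with RUSTFLAGS="--emit=asm" on the command line when running the benchmarks.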

Store & use meta-schemas

If we validate the input schemas for conformance to their respective specs, then:

  • We can probably skip a lot of our own checks during the compilation process
  • There will be an understandable error message in case the input schema is not valid

Regarding the implementation details: it can be done via lazy_static! so the meta-schema is not re-compiled. In a perfect scenario, I'd like to have it done via code generation (as described in #46).
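
A sketch of the lazy_static approach (the meta-schema document here is a stub; the real one would be embedded or loaded from the spec):

use jsonschema::JSONSchema;
use lazy_static::lazy_static;
use serde_json::{json, Value};

lazy_static! {
    // Stub standing in for the full Draft 7 meta-schema document
    static ref DRAFT7_META: Value = json!({"type": "object"});
    // Compiled once on first use, never re-compiled afterwards
    static ref DRAFT7_VALIDATOR: JSONSchema =
        JSONSchema::compile(&DRAFT7_META).expect("A valid meta-schema");
}

fn main() {
    // Validate an input schema against the meta-schema before compiling it
    let input_schema = json!({"maxLength": 5});
    assert!(DRAFT7_VALIDATOR.is_valid(&input_schema));
}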

Restructure project

  • Rename Validator -> Validate
  • Rename Schema -> Validator
  • Move Scope and related things to a separate module
  • Put types.rs to relevant places
  • Rename validate_sub -> descend
  • Rename validators -> keywords
  • Move compile to the root
  • Move validators/mod.rs to the root

Improve validators debug representation

I think it would be better if this representation were closer to the original schema. E.g.:

<unique_items> vs {"uniqueItems": true}

It might be less confusing since it would use the same keywords as the original schema.
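
For illustration, a sketch of a keyword-shaped Debug impl (the validator type is hypothetical):

use std::fmt;

struct UniqueItemsValidator;

impl fmt::Debug for UniqueItemsValidator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Mirror the original schema keyword instead of an internal name
        write!(f, r#"{{"uniqueItems": true}}"#)
    }
}

fn main() {
    // Prints {"uniqueItems": true} rather than <unique_items>
    println!("{:?}", UniqueItemsValidator);
}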

Improve compilation

Currently, all possible sub-schemas are built. Maybe only subschemas for existing refs should be built instead?

Avoid mutable context - with it, compilation is harder to parallelize; simple clones should work.

Cache for loaded documents

Once a remote reference is resolved, I think it makes sense to cache it somewhere. I assume it could be done with RefCell - some kind of LRU cache with a small capacity (usually there are not many remote schemas under the same document).
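
A minimal sketch of such a cache, assuming single-threaded use with RefCell and a plain HashMap (a real LRU would also evict entries):

use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

use serde_json::{json, Value};

// Hypothetical cache keyed by the remote reference URL
struct DocumentCache {
    documents: RefCell<HashMap<String, Rc<Value>>>,
}

impl DocumentCache {
    fn new() -> Self {
        DocumentCache {
            documents: RefCell::new(HashMap::new()),
        }
    }

    // Return the cached document, loading it on the first request only
    fn get_or_load(&self, url: &str, load: impl FnOnce(&str) -> Value) -> Rc<Value> {
        if let Some(document) = self.documents.borrow().get(url) {
            return Rc::clone(document);
        }
        let document = Rc::new(load(url));
        self.documents
            .borrow_mut()
            .insert(url.to_string(), Rc::clone(&document));
        document
    }
}

fn main() {
    let cache = DocumentCache::new();
    // The loader would perform the actual network request; stubbed here
    let document = cache.get_or_load("https://example.com/schema.json", |_| json!({}));
    assert_eq!(*document, json!({}));
}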

Set up CI

  • GitHub Actions
  • On each commit: cargo fmt & cargo clippy
  • Test build
