jsonschema-rs's Introduction

jsonschema

A JSON Schema validator implementation. It compiles schemas into a validation tree to make validation as fast as possible.

Supported drafts:

  • Draft 7 (except optional idn-hostname.json test case)
  • Draft 6
  • Draft 4 (except optional bignum.json test case)

Partially supported drafts (some keywords are not implemented):

  • Draft 2019-09 (requires the draft201909 feature enabled)
  • Draft 2020-12 (requires the draft202012 feature enabled)

To get started, add the dependency to your Cargo.toml:

# Cargo.toml
jsonschema = "0.18"

To validate documents against some schema and get validation errors (if any):

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    let schema = json!({"maxLength": 5});
    let instance = json!("foo");
    let compiled = JSONSchema::compile(&schema)
        .expect("A valid schema");
    let result = compiled.validate(&instance);
    if let Err(errors) = result {
        for error in errors {
            println!("Validation error: {}", error);
            println!(
                "Instance path: {}", error.instance_path
            );
        }
    }
}

Each error has an instance_path attribute that points to the erroneous part of the validated instance. It can be converted to a JSON Pointer via .to_string() or to a Vec<String> via .into_vec().
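
For example (a minimal sketch of both conversions, using the API described above):

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    let schema = json!({"items": {"maxLength": 5}});
    let instance = json!(["a very long string"]);
    let compiled = JSONSchema::compile(&schema)
        .expect("A valid schema");
    if let Err(errors) = compiled.validate(&instance) {
        for error in errors {
            // JSON Pointer form, e.g. "/0"
            let pointer = error.instance_path.to_string();
            // Owned segments; `into_vec` consumes the path, so it goes last
            let segments: Vec<String> = error.instance_path.into_vec();
            println!("{} -> {:?}", pointer, segments);
        }
    }
}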

If you only need to know whether a document is valid (which is faster):

use jsonschema::is_valid;
use serde_json::json;

fn main() {
    let schema = json!({"maxLength": 5});
    let instance = json!("foo");
    assert!(is_valid(&schema, &instance));
}

Or use a compiled schema (preferred):

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    let schema = json!({"maxLength": 5});
    let instance = json!("foo");
    // Draft is detected automatically
    // with fallback to Draft7
    let compiled = JSONSchema::compile(&schema)
        .expect("A valid schema");
    assert!(compiled.is_valid(&instance));
}

Output styles

jsonschema supports basic & flag output styles from Draft 2019-09, so you can serialize the validation results with serde:

use jsonschema::{Output, BasicOutput, JSONSchema};
use serde_json::json;

fn main() {
    let schema_json = json!({
        "title": "string value",
        "type": "string"
    });
    let instance = json!("some string");
    let schema = JSONSchema::compile(&schema_json)
        .expect("A valid schema");
    
    let output: BasicOutput = schema.apply(&instance).basic();
    let output_json = serde_json::to_value(output)
        .expect("Failed to serialize output");
    
    assert_eq!(
        output_json, 
        json!({
            "valid": true,
            "annotations": [
                {
                    "keywordLocation": "",
                    "instanceLocation": "",
                    "annotations": {
                        "title": "string value"
                    }
                }
            ]
        })
    );
}

Custom keywords

jsonschema allows you to implement custom validation logic by defining custom keywords. To use your own keyword, you need to implement the Keyword trait and add it to the JSONSchema instance via the with_keyword method:

use jsonschema::{
    paths::{JSONPointer, JsonPointerNode},
    ErrorIterator, JSONSchema, Keyword, ValidationError,
};
use serde_json::{json, Map, Value};
use std::iter::once;

struct MyCustomValidator;

impl Keyword for MyCustomValidator {
    fn validate<'instance>(
        &self,
        instance: &'instance Value,
        instance_path: &JsonPointerNode,
    ) -> ErrorIterator<'instance> {
        // ... validate instance ...
        if !instance.is_object() {
            let error = ValidationError::custom(
                JSONPointer::default(),
                instance_path.into(),
                instance,
                "Boom!",
            );
            Box::new(once(error))
        } else {
            Box::new(None.into_iter())
        }
    }
    fn is_valid(&self, instance: &Value) -> bool {
        // ... determine if instance is valid ...
        true
    }
}

// You can create a factory function, or use a closure to create new validator instances.
fn custom_validator_factory<'a>(
    // Parent object where your keyword is defined
    parent: &'a Map<String, Value>,
    // Your keyword value
    value: &'a Value,
    // JSON Pointer to your keyword within the schema
    path: JSONPointer,
) -> Result<Box<dyn Keyword>, ValidationError<'a>> {
    // You may return a validation error if the keyword is misused for some reason
    Ok(Box::new(MyCustomValidator))
}

fn main() {
    let schema = json!({"my-type": "my-schema"});
    let instance = json!({"a": "b"});
    let compiled = JSONSchema::options()
        // Register your keyword via a factory function
        .with_keyword("my-type", custom_validator_factory)
        // Or use a closure
        .with_keyword("my-type-with-closure", |_, _, _| Ok(Box::new(MyCustomValidator)))
        .compile(&schema)
        .expect("A valid schema");
    assert!(compiled.is_valid(&instance));
}

Reference resolving and TLS

By default, jsonschema resolves HTTP references via reqwest without TLS support. If you'd like to resolve HTTPS references, you need to enable TLS support in reqwest:

reqwest = { version = "*", features = [ "rustls-tls" ] }

Otherwise, you might get validation errors like invalid URL, scheme is not http.
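
For illustration, a schema with a remote HTTPS reference (the URL is hypothetical) that would hit this error without a TLS-enabled reqwest:

use jsonschema::JSONSchema;
use serde_json::json;

fn main() {
    // Hypothetical remote reference; resolving it requires HTTPS support
    let schema = json!({"$ref": "https://example.com/schema.json"});
    match JSONSchema::compile(&schema) {
        Ok(_) => println!("Remote reference resolved"),
        // Without TLS support this surfaces as an error like
        // "invalid URL, scheme is not http"
        Err(error) => println!("Failed to compile: {}", error),
    }
}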

Status

This library is functional and ready for use, but its API is still evolving toward a 1.0 release.

Bindings

  • Python - See the ./bindings/python directory
  • Ruby - a crate by @driv3r
  • NodeJS - a package by @ahungrynoob

Running tests

The tests in jsonschema/ depend on the JSON Schema Test Suite. Before calling cargo test, download the suite:

$ git submodule init
$ git submodule update

These commands clone the suite to jsonschema/tests/suite/.

Now, enter the jsonschema directory and run cargo test.

$ cd jsonschema
$ cargo test

Performance

There is a comparison with other JSON Schema validators written in Rust - jsonschema_valid==0.5.2 and valico==4.0.0.

Test machine: i8700K (12 cores), 32 GB RAM.

Input values and schemas:

Case            Schema size  Instance size
OpenAPI         18 KB        4.5 MB
Swagger         25 KB        3.0 MB
Canada          4.8 KB       2.1 MB
CITM catalog    2.3 KB       501 KB
Fast (valid)    595 B        55 B
Fast (invalid)  595 B        60 B

Here is the average time for each contender to validate. Ratios are given against the compiled JSONSchema using its validate method. The is_valid method is faster, but returns only a boolean:

Case            jsonschema_valid    valico               jsonschema (validate)  jsonschema (is_valid)
OpenAPI         - (1)               - (1)                3.500 ms               3.147 ms (x0.89)
Swagger         - (2)               180.65 ms (x32.12)   5.623 ms               3.634 ms (x0.64)
Canada          40.363 ms (x33.13)  427.40 ms (x350.90)  1.218 ms               1.217 ms (x0.99)
CITM catalog    5.357 ms (x2.51)    39.215 ms (x18.44)   2.126 ms               569.23 us (x0.26)
Fast (valid)    2.27 us (x4.87)     6.55 us (x14.05)     465.89 ns              113.94 ns (x0.24)
Fast (invalid)  412.21 ns (x0.46)   6.69 us (x7.61)      878.23 ns              4.21 ns (x0.004)

Notes:

  1. jsonschema_valid and valico do not handle valid path instances matching the ^\\/ regex.

  2. jsonschema_valid fails to resolve local references (e.g. #/definitions/definitions).

You can find the benchmark code in benches/jsonschema.rs; the Rust version is 1.78.

Support

If you have anything to discuss regarding this library, please join our gitter!

jsonschema-rs's Issues

Do not include empty nodes in the validation tree

When a schema is compiled, it is possible to end up with empty nodes. Example:

{
    "items": {"additionalProperties": true}
}

It compiles to items: {} because true is the default value for additionalProperties, which makes this subschema empty.
Such cases should be detected and removed from the tree.

Possible truncation & panic

e.g. in min_properties:

let limit = limit.as_u64().unwrap() as usize;

If the schema contains a negative or floating-point number for this keyword, this line will panic.

On a 32-bit platform, an integer that exceeds usize will be truncated, which may lead to wrong results during validation.

Affected validators:

  • max_items
  • max_length
  • max_properties
  • min_items
  • min_length
  • min_properties

As a result, we should enable the clippy::cast_possible_truncation lint and add test cases.
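
A sketch of a safer conversion (parse_limit is a hypothetical helper): it rejects floats and negative numbers and avoids 32-bit truncation:

use serde_json::Value;

// Returns None for floats, negative numbers, and values that do not fit
// into usize on the current platform
fn parse_limit(limit: &Value) -> Option<usize> {
    limit.as_u64().and_then(|value| usize::try_from(value).ok())
}

fn main() {
    assert_eq!(parse_limit(&Value::from(3_u64)), Some(3));
    assert_eq!(parse_limit(&Value::from(-1)), None);
    assert_eq!(parse_limit(&Value::from(1.5)), None);
}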

Bug in AdditionalPropertiesFalseValidator

There is no test case for it, but instead of:

fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
        if let Value::Object(item) = instance {
            return item.iter().next().is_some();
        }
        true
    }

it should be:

fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
        if let Value::Object(item) = instance {
            return item.iter().next().is_none();
        }
        true
    }

I.e., it is only valid for objects without properties.

Generate validators without dispatching

Even though compiling validators gives pretty good results, it is not the fastest way to perform validation in all circumstances. If we know the schema at build time, then we can generate code that will be more efficient than the current approach.

For example if we have this schema:

{"type": "string", "maxLength": 5}

then our current approach will basically iterate over a vector of trait objects and call their validate / is_valid methods.
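
For illustration, a rough sketch of that dispatch; the trait and validators are simplified stand-ins, not the crate's actual types:

use serde_json::Value;

trait Validate {
    fn is_valid(&self, instance: &Value) -> bool;
}

struct TypeString;

impl Validate for TypeString {
    fn is_valid(&self, instance: &Value) -> bool {
        instance.is_string()
    }
}

struct MaxLength(usize);

impl Validate for MaxLength {
    fn is_valid(&self, instance: &Value) -> bool {
        match instance {
            Value::String(value) => value.chars().count() <= self.0,
            _ => true,
        }
    }
}

fn is_valid(validators: &[Box<dyn Validate>], instance: &Value) -> bool {
    // Every keyword check goes through a vtable call
    validators.iter().all(|validator| validator.is_valid(instance))
}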

The idea is to generate code like this:

fn is_valid(instance: &Value) -> bool {
    match instance {
        Value::String(value) => value.len() <= 5,
        _ => false
    }
}

https://github.com/horejsek/python-fastjsonschema does this.

Macros to return validation error

Instead of

let message = format!("'{}' is too long", item);
return Err(ValidationError::ValidationError(message));

it can be

return validation_error!("`{}` is too long", item)
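
A minimal sketch of such a macro, assuming an error type shaped like the snippet above (both are illustrative, not the crate's actual definitions):

// Hypothetical error type mirroring the snippet above
enum ValidationError {
    ValidationError(String),
}

// The proposed macro: format the message and wrap it in Err in one step
macro_rules! validation_error {
    ($($arg:tt)*) => {
        Err(ValidationError::ValidationError(format!($($arg)*)))
    };
}

fn check_length(item: &str) -> Result<(), ValidationError> {
    if item.len() > 5 {
        return validation_error!("`{}` is too long", item);
    }
    Ok(())
}

fn main() {
    assert!(check_length("foo").is_ok());
    assert!(check_length("foobarbaz").is_err());
}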

Handle errors instead of `unwrap`

In some cases, it might be better to return an error to the caller instead; these cases are mostly in the resolver.

But for regexes & URLs that are known to be valid, we can use expect.

Refactor benchmarks

At the moment I see these disadvantages of the current implementation:

  • They test only the performance of is_valid. We should bench validate as well
  • Benchmark names are hardcoded and often duplicated. We should autogenerate them so they are not accidentally rewritten during a run
  • A lot of duplicated schemas. They could be reorganized with a macro
  • A lot of duplicate code in the benches implementation
  • Commented-out code. It would be better to uncomment it and select benchmarks by name

Restructure project

  • Resolver & validators should be at the same level
  • Errors should be grouped in the same file
  • Move format checkers into a separate file
  • Keep types separate

Avoid copying to ValidationError

In most cases, we copy data into a ValidationError instance like this (taken from the implementation of the required keyword):

    fn validate<'a>(&self, _: &'a JSONSchema, instance: &'a Value) -> ErrorIterator<'a> {
        if let Value::Object(item) = instance {
            for property_name in &self.required {
                if !item.contains_key(property_name) {
                    return error(ValidationError::required(instance, property_name.clone()));
                }
            }
        }
        no_error()
    }

instance is later wrapped in Cow::Borrowed, but property_name is cloned. Sometimes instance is cloned too, via ValidationError::into_owned (e.g. in the additional_properties keyword implementation), so it can be used in our error iterator.

I assume that it is possible to avoid cloning, but it will require some lifetime tweaks which I have failed to implement (a couple of times).

Group validators by the input type

The idea is to store validators in groups by the input type, e.g. all validators that can be applied to a number, object, array, string, etc.

What we can get from it

Less pattern matching on the instance type

Consider this schema: {"minimum": 1, "maximum": 10}

Essentially we have 2 validators that together roughly do the following:

if let Value::Number(item) = instance {
    let item = item.as_f64().unwrap();
    if item < self.limit {
        return false;
    }
}
if let Value::Number(item) = instance {
    let item = item.as_f64().unwrap();
    if item > self.limit {
        return false;
    }
}

That's pattern matching twice and item.as_f64().unwrap() twice. Instead, we can do this in the root validation method (and in nodes where it is appropriate):

... // some common validators for any type here
match instance {
    Value::Number(item) => {
        let item = item.as_f64().unwrap();
        // first validator inlined for illustration
        if item < self.limit {
            return false;
        };
        if item > self.limit {
            return false;
        }
        true
    }
    ...
}

In this arm, we can apply exclusiveMaximum, exclusiveMinimum, minimum, maximum, and multipleOf.

Much simpler validators

Instead of this:

    fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
        if let Value::Number(item) = instance {
            let item = item.as_f64().unwrap();
            if item < self.limit {
                return false;
            }
        }
        true
    }

we can do this:

    fn is_valid(&self, item: f64) -> bool {
        item < self.limit
    }

And there is no need to pass an unused reference to the JSONSchema instance. The same simplification can be applied to the validate method.

Faster execution for non-matching types

Currently, if we pass null to the validators above, we'll still call both of them in a loop, and they both will return true. With this idea, there will be only one pattern match at the root, plus maybe some small checks, which I'll describe below.

More insight into where to apply parallel execution

We can know for sure that there is no point in applying parallel execution to numeric validators, since they are fast and there are only 5 of them. In other words, the surface of possibilities will be smaller and more visible (only applicable to arrays and objects).

As a downside, there could be some extra logic needed to iterate over two vectors (common & specific validators), which may have higher overhead for small schemas with a single keyword.

Also, the implementation will require splitting into multiple traits.

But anyway, this option is worth exploring; maybe some other optimizations will become visible along the way.

I think this idea can also be applied to the compilation phase.

Optimize check_time

It might be faster with a single regex rather than with 4 calls to parse_from_str.
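
For illustration, a sketch of the single-regex idea, assuming the regex crate and an RFC 3339 full-time shape (the pattern is illustrative, not the crate's actual implementation):

use regex::Regex;

fn main() {
    // HH:MM:SS with an optional fraction and a "Z" or numeric offset.
    // Semantic checks (e.g. whether a leap second is plausible) would
    // still need extra code.
    let re = Regex::new(
        r"^([01]\d|2[0-3]):[0-5]\d:([0-5]\d|60)(\.\d+)?([Zz]|[+-]([01]\d|2[0-3]):[0-5]\d)$",
    )
    .expect("A valid regex");
    assert!(re.is_match("23:59:59Z"));
    assert!(re.is_match("12:34:56.789+02:00"));
    assert!(!re.is_match("24:00:00Z"));
}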

Canonicalise schemas during compilation

We can eliminate some inefficient constructs, e.g.:

{"anyOf": [{"type": "string"}, {"type": "number"}]}

can be simplified to:

{"type": ["string", "number"}

And if there are integer and number, then it can be replaced with {"type": "number"} since number includes integer.
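
A sketch of one such rewrite rule over serde_json values (collapse_any_of is a hypothetical helper, not the crate's code; merging integer into number would be a further step):

use serde_json::{json, Value};

// Collapse {"anyOf": [...]} into {"type": [...]} when every branch is a
// bare type check; otherwise return the schema unchanged
fn collapse_any_of(schema: &Value) -> Value {
    if let Some(branches) = schema.get("anyOf").and_then(Value::as_array) {
        let types: Option<Vec<&str>> = branches
            .iter()
            .map(|branch| {
                let object = branch.as_object()?;
                if object.len() != 1 {
                    return None;
                }
                object.get("type")?.as_str()
            })
            .collect();
        if let Some(types) = types {
            return json!({"type": types});
        }
    }
    schema.clone()
}

fn main() {
    let schema = json!({"anyOf": [{"type": "string"}, {"type": "number"}]});
    assert_eq!(collapse_any_of(&schema), json!({"type": ["string", "number"]}));
}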

Another approach that we can take additionally: detect and remove empty nodes produced during compilation, as described in the "Do not include empty nodes in the validation tree" issue above.

Update "Performance" section

It would be fairer to have two groups - compiled and not compiled. Currently, the results from jsonschema_valid and valico are measured with compiled validators. So, basically, we need to move the jsonschema (not compiled) column into a new table and compare it with the not-compiled versions of jsonschema_valid and valico.

The Rust compiler version & options will also be useful there. As for the benchmarks, it will probably be better to compile with LTO and RUSTFLAGS="--emit=asm".
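
For example, an LTO-enabled release profile could look like this (a sketch; the exact settings are open for discussion):

# Cargo.toml
[profile.release]
lto = true
codegen-units = 1

This would be combined with RUSTFLAGS="--emit=asm" on the command line when running the benchmarks.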

Store & use meta-schemas

If we validate the input schemas for conformance to their respective specs, then:

  • We can probably skip a lot of our own checks during the compilation process
  • There will be an understandable error message in case the input schema is not valid

Regarding the implementation details: it can be done via lazy_static! so the meta-schema is not re-compiled. In a perfect scenario, I'd like to have it done via code generation (as described in #46).
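
A sketch of the lazy_static approach (the meta-schema document here is a stub; the real one would be embedded or loaded from the spec):

use jsonschema::JSONSchema;
use lazy_static::lazy_static;
use serde_json::{json, Value};

lazy_static! {
    // Stub standing in for the full Draft 7 meta-schema document
    static ref DRAFT7_META: Value = json!({"type": "object"});
    // Compiled once on first use, never re-compiled afterwards
    static ref DRAFT7_VALIDATOR: JSONSchema =
        JSONSchema::compile(&DRAFT7_META).expect("A valid meta-schema");
}

fn main() {
    // Validate an input schema against the meta-schema before compiling it
    let input_schema = json!({"maxLength": 5});
    assert!(DRAFT7_VALIDATOR.is_valid(&input_schema));
}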

Restructure project

  • Rename Validator -> Validate
  • Rename Schema -> Validator
  • Move Scope and related things to a separate module
  • Put types.rs to relevant places
  • Rename validate_sub -> descend
  • Rename validators -> keywords
  • Move compile to the root
  • Move validators/mod.rs to the root

Improve validators debug representation

I think it would be better if this representation were closer to the original schema. E.g.:

<unique_items> vs {"uniqueItems": true}

It might be less confusing since it would use the same keywords as the original schema.
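
For illustration, a sketch of a keyword-shaped Debug impl (the validator type is hypothetical):

use std::fmt;

struct UniqueItemsValidator;

impl fmt::Debug for UniqueItemsValidator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Mirror the original schema keyword instead of an internal name
        write!(f, r#"{{"uniqueItems": true}}"#)
    }
}

fn main() {
    // Prints {"uniqueItems": true} rather than <unique_items>
    println!("{:?}", UniqueItemsValidator);
}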

Improve compilation

Currently, all possible sub-schemas are built. Maybe only subschemas for existing refs should be built instead?

Avoid mutable context - with it, compilation is harder to parallelize; simple clones should work.

Cache for loaded documents

Once a remote reference is resolved, I think it makes sense to cache it somewhere. I assume it could be done with RefCell - some kind of LRU cache with a small capacity (usually there are not many remote schemas under the same document).
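
A minimal sketch of such a cache, assuming single-threaded use with RefCell and a plain HashMap (a real LRU would also evict entries):

use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

use serde_json::{json, Value};

// Hypothetical cache keyed by the remote reference URL
struct DocumentCache {
    documents: RefCell<HashMap<String, Rc<Value>>>,
}

impl DocumentCache {
    fn new() -> Self {
        DocumentCache {
            documents: RefCell::new(HashMap::new()),
        }
    }

    // Return the cached document, loading it on the first request only
    fn get_or_load(&self, url: &str, load: impl FnOnce(&str) -> Value) -> Rc<Value> {
        if let Some(document) = self.documents.borrow().get(url) {
            return Rc::clone(document);
        }
        let document = Rc::new(load(url));
        self.documents
            .borrow_mut()
            .insert(url.to_string(), Rc::clone(&document));
        document
    }
}

fn main() {
    let cache = DocumentCache::new();
    // The loader would perform the actual network request; stubbed here
    let document = cache.get_or_load("https://example.com/schema.json", |_| json!({}));
    assert_eq!(*document, json!({}));
}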

Set up CI

  • GitHub Actions
  • On each commit: cargo fmt & cargo clippy
  • Test build
