jsonschema's Introduction

jsonschema: JSON Schema Validation for Rust

⚠️ THIS LIBRARY IS WORK-IN-PROGRESS ⚠️

This crate is a from-scratch rewrite of jsonschema-rs that aims to address some of its design flaws. It started as a separate private repo, but I plan to move the development back to the original one. For an in-depth roadmap, please take a look here. This README represents the end goal and serves as the reference for the ongoing development.

The jsonschema crate offers performant and flexible JSON Schema validation for Rust. It provides both async and blocking reference resolving and is designed to be easy to use. The following JSON Schema drafts are supported:

  • Draft 4
  • Draft 6
  • Draft 7
  • Draft 2019-09
  • Draft 2020-12

Installation

Add this to your Cargo.toml:

[dependencies]
jsonschema = "0.18.0"

Quick Start

One-off validation:

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = json!({"type": "integer"});
    let instance = json!("a");
    // Propagate the first validation error instead of silently dropping the `Result`
    jsonschema::validate(&instance, &schema).await?;
    Ok(())
}

Usage

jsonschema provides an async API by default:

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = json!({"type": "integer"});
    let instance = json!("a");
    // Boolean result
    assert!(!jsonschema::is_valid(&instance, &schema).await);
    // Only first error as `Result<(), jsonschema::ValidationError>`
    jsonschema::validate(&instance, &schema).await?;
    // Iterate over all errors
    for error in jsonschema::iter_errors(&instance, &schema).await {
        println!("{}", error);
    }
    Ok(())
}

The async API is preferred if your schema includes external references, since resolving them requires non-blocking IO. The blocking API is available inside the blocking module. Use it if your schema does not contain any external references.

use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = json!({"type": "integer"});
    let instance = json!("a");
    // Boolean result
    assert!(!jsonschema::blocking::is_valid(&instance, &schema));
    // Only first error as `Result<(), jsonschema::ValidationError>`
    jsonschema::blocking::validate(&instance, &schema)?;
    // Iterate over all errors
    for error in jsonschema::blocking::iter_errors(&instance, &schema) {
        println!("{}", error);
    }
    Ok(())
}

If you need to validate multiple instances against the same schema, build a validator upfront:

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = json!({"type": "integer"});
    // Build once, reuse many times
    let validator = jsonschema::validator_for(&schema).await?;
    let instances = vec![json!(1), json!(2), json!("a"), json!(3)];
    for instance in instances {
        assert!(validator.is_valid(&instance));
        validator.validate(&instance)?;
        for error in validator.iter_errors(&instance) {
            println!("{}", error);
        }
    }
    Ok(())
}

Advanced Usage

Output formatting

jsonschema supports multiple output formats for validation results in accordance with the current proposal for the next version of the JSON Schema specification:

  • Flag
  • List
  • Hierarchical

use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ... omitted for brevity
    let hierarchical = jsonschema::evaluate(&instance, &schema).await.hierarchical();
    // Serialize validation output to JSON
    let serialized = serde_json::to_string(&hierarchical)?;
    Ok(())
}

Customization

use jsonschema::{Json, Draft, BuildResult, BoxedFormat, BoxedKeyword};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ... omitted for brevity
    struct Resolver;

    impl<J: Json> jsonschema::ReferenceResolver<J> for Resolver {
        fn resolve_external(&self, url: &str) -> impl core::future::Future<Output = BuildResult<J>> {
            async { Ok(J::from_str("{}")?) }
        }
    }

    struct FixedSize {
        size: usize,
    }

    impl jsonschema::Format for FixedSize {
        fn is_valid(&self, value: &str) -> bool {
            value.len() == self.size
        }
    }

    fn fixed_size_factory<J: Json>(schema: &J) -> BuildResult<BoxedFormat> {
        Ok(Box::new(FixedSize { size: 43 }))
    }

    #[derive(Debug)]
    struct AsciiKeyword {
        max_size: usize
    }

    impl<J: Json> jsonschema::Keyword<J> for AsciiKeyword {
        fn is_valid(&self, instance: &J) -> bool {
            if let Some(string) = instance.as_string().map(AsRef::as_ref) {
                if string.is_ascii() {
                    return string.len() <= self.max_size;
                }
            }
            true
        }
    }

    fn ascii_keyword_factory<J: Json>(schema: &J) -> BuildResult<BoxedKeyword<J>> {
        Ok(Box::new(AsciiKeyword { max_size: 42 }))
    }

    let validator = jsonschema::ValidatorBuilder::default()
        .draft(Draft::Draft07)
        .resolver(Resolver)
        .format(
            "fixed-size-1",
            |schema| -> BuildResult<BoxedFormat> {
                Ok(Box::new(FixedSize { size: 5 }))
            }
        )
        .format("fixed-size-2", fixed_size_factory)
        .keyword(
            "ascii",
            |schema| -> BuildResult<BoxedKeyword<_>> {
                Ok(Box::new(AsciiKeyword { max_size: 42 }))
            }
        )
        .keyword("also-ascii", ascii_keyword_factory)
        .build(&schema)
        .await?;

    Ok(())
}

jsonschema's Issues

Roadmap

This is a somewhat detailed roadmap for the development of this crate and its predecessor jsonschema-rs. I haven't been very active with jsonschema-rs recently, but I have a desire to work on it together with the community towards 1.0.

Here I describe the vision for 1.0, how I see the path there, and how to make it actually happen.

This project has been a lot of fun for me over the years. I learned a lot building it, and it was useful for me in other projects as well. I hope that it could serve the same purpose for others too.

Therefore I invite YOU to participate, especially if you have a use case for a fast JSON Schema validator and want to shape the development process. Feel free to open new issues/discussions and tag me there, ask any questions here (I want to make a Discord chat eventually), or invite people who might be interested. A contributing guide will be added soon, but briefly, contributions should be aligned with the goals described below.

I also plan to record the progress in a series of blog posts (monthly or so) in detail and share it here (and in the possible Discord channel / Twitter).

Please, let me know what you think!

❤️

Goals

I am dissatisfied with a number of decisions I made in the original implementation, and now I want to properly address them during the design phase.

  • Support all recent JSON Schema drafts (starting from Draft 4, including all output formats)
  • Better performance (less memory fragmentation, no intermediate allocations, real lazy validation)
  • Async ref resolving
  • Easier customization (custom keywords, format checks for Python)
  • More flexibility (in API, in the error type, generic JSON input, arbitrary number precision)
  • Simpler implementation (current boxing-everything and "combined" keywords drive me nuts)
  • Reusable components (I want to have a crate as a building block for JSON Schema canonicalization)
  • Feature parity in Python bindings. TBH, I'd like to have it as a drop-in replacement for jsonschema (or, at least, have near-feature parity)
  • Better project structure (workspace, extracting CLI and some other components into sub-crates)

Non-goals

  • Backward compatibility. Reaching the goals WILL break compatibility, especially around errors (they should not be bound to the validation input) and FFI (e.g. format check functions aren't usable outside Rust)
  • Macro-based validator compilation. For a long time, I wanted to try code generation like Python's fastjsonschema does, so this crate can just build a big function and avoid all the state machine transitions, but I'd put it off for later.
  • WASM. Great to have, and the new version should stay compatible, but nothing specific will be done for it.
  • Bindings for other languages. While desirable, I'd put it off for later.
  • no_std. I think it could be possible to store validator internals in a provided block of memory, but it is complex and could definitely be done later on.

Core ideas behind the vision

Vec-backed tree

The first idea is to store all keyword implementations in a single Vec, and work with them as an ID-tree or more generally.

This should reduce the memory fragmentation (I tried this approach in css-inline and it worked great) and therefore, improve performance.

Iterating over such a structure also helps to make the iteration over errors actually lazy. The current implementation collects validation results from subtrees into heap-allocated memory.
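To make the idea more concrete, here is a minimal sketch of a Vec-backed ID tree with lazy error iteration. It is only an illustration under simplifying assumptions: the `Node` variants, the `Tree` type, and the use of plain `i64` instances are hypothetical and not part of this crate's API.

```rust
/// Hypothetical keyword nodes; real keywords would carry more data.
#[derive(Debug)]
enum Node {
    /// `minimum`: passes if the value is >= the bound.
    Minimum(i64),
    /// `maximum`: passes if the value is <= the bound.
    Maximum(i64),
    /// `allOf`: children occupy the index range `start..end` of the same Vec.
    AllOf { start: usize, end: usize },
}

/// All nodes live in one Vec; children are referenced by index, not by `Box`.
struct Tree {
    nodes: Vec<Node>,
    root: usize,
}

impl Tree {
    /// Lazily yields the IDs of failing leaf nodes. The only state kept
    /// between steps is a small stack of pending node IDs; errors from
    /// subtrees are never collected into intermediate buffers.
    fn iter_errors(&self, value: i64) -> impl Iterator<Item = usize> + '_ {
        let mut stack = vec![self.root];
        std::iter::from_fn(move || {
            while let Some(id) = stack.pop() {
                match &self.nodes[id] {
                    Node::Minimum(bound) if value < *bound => return Some(id),
                    Node::Maximum(bound) if value > *bound => return Some(id),
                    // Push children in reverse so they are visited in order.
                    Node::AllOf { start, end } => stack.extend((*start..*end).rev()),
                    // A leaf whose check passed: nothing to report.
                    _ => {}
                }
            }
            None
        })
    }
}
```

Because the iterator only advances when polled, a caller that needs just the first error can stop after one `next()` call without visiting the rest of the tree.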

Any other thoughts on using state machines are welcome; I am not sure if this approach would be the most appropriate here.

Independent storage

Then, store all the keyword data in a format that is independent of the input, i.e. avoid the dependency on serde_json.

This will remove overhead in Python bindings (which is often >70%), as there will be no need for direct serialization/deserialization (though there could be some intermediate costs anyway). Users will be able to implement it for their own types, e.g. for some FFI wrappers, and use this library with C/C++ codebases.
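One possible shape of such input-independent storage is sketched below. The `JsonAccess` trait, the `MaxLength` keyword struct, and the `MyValue` type are all hypothetical names chosen for illustration; the actual trait in this crate may look quite different.

```rust
/// Hypothetical minimal view over a JSON-like value.
trait JsonAccess {
    /// Returns the string content if the value is a JSON string.
    fn as_str(&self) -> Option<&str>;
}

/// Keyword data is stored in plain Rust types, so it does not depend on
/// how the schema or the instance was originally represented.
struct MaxLength {
    limit: usize,
}

impl MaxLength {
    /// Works with any input type that implements `JsonAccess`.
    fn is_valid<J: JsonAccess>(&self, instance: &J) -> bool {
        match instance.as_str() {
            Some(text) => text.chars().count() <= self.limit,
            // `maxLength` only applies to strings; other types pass.
            None => true,
        }
    }
}

/// A user-defined input type, standing in for e.g. an FFI wrapper.
enum MyValue {
    Int(i64),
    Text(String),
}

impl JsonAccess for MyValue {
    fn as_str(&self) -> Option<&str> {
        match self {
            MyValue::Text(text) => Some(text.as_str()),
            MyValue::Int(_) => None,
        }
    }
}
```

With this split, the validator never needs to convert `MyValue` into another JSON representation before checking it.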

Resolve external references upfront

Resolving all reachable external references upfront makes it possible to build a state machine that will properly address recursive references. Currently, resolving happens at runtime behind an RwLock, which adds noticeable overhead.

This also isolates resolving as a separate step, making it easier to provide blocking/non-blocking interface to it.
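As a rough sketch of what "resolving upfront" could mean, the function below walks all reachable external references with a work queue before any validation happens. The `Document` type and the `fetch` callback are assumptions made for this example; a real implementation would operate on parsed schemas and could be async.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a parsed schema document: only its outgoing
/// external references matter for this sketch.
#[derive(Clone, Debug, PartialEq)]
struct Document {
    external_refs: Vec<String>,
}

/// Resolves every reachable external reference before validation starts.
/// `fetch` is an assumed callback that loads one document by URL.
fn resolve_all<F>(root: &Document, mut fetch: F) -> Result<HashMap<String, Document>, String>
where
    F: FnMut(&str) -> Result<Document, String>,
{
    let mut resolved = HashMap::new();
    let mut queue: VecDeque<String> = root.external_refs.iter().cloned().collect();
    while let Some(url) = queue.pop_front() {
        if resolved.contains_key(&url) {
            // Already fetched; this check also terminates reference cycles.
            continue;
        }
        let document = fetch(&url)?;
        // Newly discovered references are queued for later resolution.
        queue.extend(document.external_refs.iter().cloned());
        resolved.insert(url, document);
    }
    Ok(resolved)
}
```

Once the map is built it is only read during validation, so no lock is needed on the hot path, and the `fetch` step is the single place where a blocking or non-blocking implementation has to be chosen.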

Plan

At a high level, I think that the progress towards 1.0 could be split into a few phases:

  1. Updating the current repo
  2. Refine the API design
  3. Reference resolving
  4. State machine builder
  5. State machine transitions
  6. Implementing vocabularies (validation, applicators, etc)
  7. Extensions
  8. Bindings
  9. Migration

There is a description for each of them below. Over time I am going to expand each phase into a separate issue with more details.

Update jsonschema-rs repo

Fixing issues that need immediate attention + updating the README to reflect the plans.
There are a few issues I want to address in the main repo before diving into this one, specifically:

API design

Here I think there are still some unclear corners around custom validators, format checkers, and the validation output. I want to be sure that the API design is both ergonomic and feasible to implement.

Reference resolving

This phase will require implementing a more or less draft-agnostic reference resolving (in a separate crate), which will be pluggable into the main crate.

I took a lot of inspiration from Python's referencing and implemented some similar logic, but it is incomplete. I am also not sure if it should be as generic as referencing, because the scope here is only JSON Schema.

The implementation should be built around a generic JSON access interface, which is more or less done in the jsonlike crate.

It is great that there is a referencing test suite, so it can be verified :)

State machine compiler & execution

This part takes the input document and all resolved resources, and builds a state machine where each transition leads to another validation state and the next step performs some action. I.e. validation keywords can transition either to an "error" state or a "pass" state. Applicators are more complicated, but I believe they are feasible to implement too.

Probably worth taking some inspiration here. However, I am leaning more towards a DFA.
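A toy version of such a machine might look like the sketch below. The `Next` and `State` types and the `run` function are hypothetical and use `i64` instances only; the sketch also assumes the transition graph is acyclic, so it says nothing about how recursive references would be handled.

```rust
/// Where a transition leads: another state or a terminal outcome.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Next {
    State(usize),
    Pass,
    Error,
}

/// A hypothetical validation state: run `check`, then follow one of two edges.
struct State {
    check: fn(i64) -> bool,
    on_ok: Next,
    on_fail: Next,
}

/// Runs the machine from state 0 until it reaches a terminal outcome.
/// Assumes the transitions form an acyclic graph, so the loop terminates.
fn run(states: &[State], value: i64) -> bool {
    let mut current = Next::State(0);
    loop {
        match current {
            Next::Pass => return true,
            Next::Error => return false,
            Next::State(id) => {
                let state = &states[id];
                current = if (state.check)(value) {
                    state.on_ok
                } else {
                    state.on_fail
                };
            }
        }
    }
}
```

In this encoding, two plain validation keywords chain through their `on_ok` edges and fail fast through `on_fail`, while an `anyOf`-like applicator does the opposite: a failing branch moves on to the next branch, and any passing branch jumps straight to `Next::Pass`.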

Vocabularies

Each keyword implements some aspect of JSON Schema validation and should be reflected in the state machine execution.

Extensions

At this point, the overall structure should be clear and it is time to implement extensions like custom keywords, format checkers, etc. I am thinking about the approach taken in minijinja, i.e. storing each extension as a boxed function (Arc<dyn Fn>).
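For illustration, a registry of format checks stored as `Arc<dyn Fn>` could be sketched as follows. The `Formats` type and its methods are hypothetical names for this example and are not taken from minijinja or from this crate.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// A format check stored as a shared, type-erased function.
type FormatFn = Arc<dyn Fn(&str) -> bool + Send + Sync>;

/// Hypothetical registry mapping format names to their checks.
#[derive(Default)]
struct Formats {
    checks: HashMap<String, FormatFn>,
}

impl Formats {
    /// Accepts plain functions as well as closures that capture state.
    fn register<F>(&mut self, name: &str, check: F)
    where
        F: Fn(&str) -> bool + Send + Sync + 'static,
    {
        self.checks.insert(name.to_string(), Arc::new(check));
    }

    /// Unknown formats are treated as valid in this sketch.
    fn is_valid(&self, name: &str, value: &str) -> bool {
        self.checks.get(name).map_or(true, |check| check(value))
    }
}
```

Compared with a dedicated trait per extension, this keeps registration down to a single closure, at the cost of a dynamic call for every check.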

Bindings

After the API of the main crate is ready, the Python bindings will be updated. Generally, I don't see many issues here, but I need to ensure that the types in the main crate allow the Python bindings to be flexible enough.

Migration

Replace all content in the jsonschema-rs repo with the new one, write the migration guide, and rename the repo to jsonschema, dropping the -rs suffix.

The `assert_array_iter` test is unclear

assert_eq!(integer, CustomInteger::new(1));
}

This test assumes that the array is an integer array and that its first element is 1, which goes beyond what the test name conveys.

Since this test is used in another crate in this package, I would prefer to either give it a better name or remove the "value check" it contains.
