
cerberus's Introduction

Cerberus

Cerberus is a lightweight and extensible data validation library for Python.

>>> v = Validator({'name': {'type': 'string'}})
>>> v.validate({'name': 'john doe'})
True

Features

Cerberus provides type checking and other base functionality out of the box and is designed to be non-blocking and easily and widely extensible, allowing for custom validation. It has no dependencies, but has the potential to become yours.
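For example, a custom rule can be added by subclassing Validator. A minimal sketch (the isodd rule below is made up for illustration and is not part of Cerberus; registration details vary slightly between versions):

>>> from cerberus import Validator
>>> class MyValidator(Validator):
...     def _validate_isodd(self, isodd, field, value):
...         # custom rules are plain methods named _validate_<rulename>
...         if isodd and not value % 2:
...             self._error(field, "must be an odd number")
...
>>> v = MyValidator({'n': {'type': 'integer', 'isodd': True}})
>>> v.validate({'n': 4})
False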

Versioning & Interpreter support

Starting with Cerberus 1.2, it is maintained according to semantic versioning. So, a major release sheds off the old and defines a space for the new, minor releases ship further new features and improvements (you know the drill, new bugs are inevitable too), and micro releases polish a definite amount of features to glory.

We intend to test Cerberus against all CPython interpreters at least until half a year after their end of life and against the most recent PyPy interpreter as a requirement for a release. If you still need to use it with a potential security hole in your setup, it should most probably work with the latest minor version branch from the time when the interpreter was still tested. Subsequent minor versions have good chances as well. In any case, you are advised to run the contributed test suite on your target system.

Funding

Cerberus is an open source, collaboratively funded project. If you run a business and are using Cerberus in a revenue-generating product, it would make business sense to sponsor its development: it ensures the project that your product relies on stays healthy and actively maintained. Individual users are also welcome to make a recurring pledge or a one time donation if Cerberus has helped you in your work or personal projects.

Every single sign-up makes a significant impact towards making Cerberus possible. To learn more, check out our funding page.

Documentation

Complete documentation is available at http://docs.python-cerberus.org

Installation

Cerberus is on PyPI, so all you need to do is:

$ pip install cerberus

Testing

Just run:

$ python setup.py test

Or you can use tox to run the tests under all supported Python versions. Make sure the required python versions are installed and run:

$ pip install tox  # first time only
$ tox

Contributing

Please see the Contribution Guidelines.

Cerberus is an open source project by Nicola Iarocci. See the license file for more information.

cerberus's People

Contributors

arshsingh, baubie, calve, cd3, crunk1, dkellner, dnohales, eelkeh, entropiae, flargebla, funkyfuture, gilbsgilbs, girogiro, hvdklauw, inirudebwoy, joshvillbrandt, kynan, martijnvermaat, misja, mmellison, nicoddemus, nicolaiarocci, nikitavlaznev, oev81, otibsa, pohmelie, pws21, rredkovich, russellluo, s4heid


cerberus's Issues

Allow maxlength/minlength on lists and dicts

For instance, we have an i18n system.
The user has to post a title as a dict, with the language code as key and the actual string as value. It doesn't matter which one you provide, but you have to provide at least one.

We can generate the schema based on the languages we know we support, but none of them are really required. Having a minlength: 1 on the dict would be awesome.
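Until such a rule exists, a workaround is a small Validator subclass; a sketch (the minkeys rule is made up here, not part of Cerberus):

from cerberus import Validator

class LengthValidator(Validator):
    # hypothetical rule: fail when a dict has fewer than min_keys entries
    def _validate_minkeys(self, min_keys, field, value):
        if isinstance(value, dict) and len(value) < min_keys:
            self._error(field, "must contain at least %d key(s)" % min_keys)

schema = {'title': {'type': 'dict', 'minkeys': 1,
                    'schema': {'en': {'type': 'string'}, 'de': {'type': 'string'}}}}

v = LengthValidator(schema)
v.validate({'title': {}})            # False: no language provided
v.validate({'title': {'en': 'Hi'}})  # True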

Inline schema validation

Hello. My idea is to validate a value not only by type, but also against another schema. For example:

schemas = {}
schemas['dog'] = {
  'name': {
     'type': 'string'       
   },
   'owner': {
     'schema': 'person' # validate not by type, but by "person" schema 
   }
}

schemas['person'] = {
  'name': {
    'type': 'string'
  }
}

v = Validator(schemas)

How about this?
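For reference, later Cerberus releases (the 1.x series) added schema registries that cover this; a rough sketch, assuming such a release is installed:

from cerberus import Validator, schema_registry

schema_registry.add('person', {'name': {'type': 'string'}})

dog_schema = {
    'name': {'type': 'string'},
    # a string constraint for 'schema' is looked up in the registry
    'owner': {'type': 'dict', 'schema': 'person'},
}

v = Validator(dog_schema)
v.validate({'name': 'Rex', 'owner': {'name': 'john doe'}})  # True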

[BUG] Nested document validation is broken

Nested document validation seems to be broken in the master branch. After a bit of debugging, it seems that this is due to the fact that the self.document attribute of sub-document validators gets assigned a copy of the contents of the parent document (the context), causing further validation steps to fail.

To reproduce

import cerberus

schema = {
    'info': {
        'type': 'dict',
        'schema': {
            'name': {'type': 'string', 'required': True}
        }
    }
}

validator = cerberus.Validator(schema)
res = validator.validate({'info': {'name': 'my name'}})
if not res:
    print(validator._errors)

Rename `keyschema` to `valueschema`

this is a separate aspect of #83:

  • renaming keyschema to valueschema
    • makes the terminology more pythonic and unambiguous in that way
    • keyschema will be an alias for valueschema
      • should somehow yell out that it is deprecated
      • assuming that client code is tested, it could even raise an exception; that may be a step in a later major release
    • the sooner the better

so far there's only been affirmative feedback.

Proposal: Validation of dictionary-keys

unless i overlooked something, it is not possible to validate the dictionary keys of a document or of a dictionary within it. with jsonschema this is done with patternProperties. also, the key types should be validatable.

it should be enough to enable checks for dicts:

>>> v = Validator({'mapping': {'type': 'dict', 'propertyschema': {'type': 'string', 'regex': '^[a-zA-Z]+$'}}})
>>> v.validate({'mapping': {'foo': 'bar'}})
True
>>> v.validate({'mapping': {'foo_bar': 'foobar'}})
False
>>> v.validate({'mapping': {1: 2}})
False
>>> v.schema = {'mapping': {'type': 'dict', 'propertyschema': {'type': 'integer', 'min': 0, 'max': 9}}}
...

and one could use a trick to check the document's top-level properties, which would be worth noting imo:

>>> v.validate({'document': document})

though i'm not very happy with the term keyschema as of now, because it's confusing; from a pythonic point of view, valueschema is much less ambiguous, imo.

if there is consensus regarding the schema design, i may go ahead and implement this.

ERROR_EMPTY_BAD_TYPE is not an error I expect in validator.errors

v = Validator({'field': {'required': True, 'type': 'string', 'empty': False}})

v.validate({'field': 1})

I expected v.errors to only contain:

value of field 'field' must be of string type

but it also contained:

'empty' rule only applies to string fields

Which I would expect to get when my schema defines it for any type other than string, not when I validate something that should contain a string but doesn't.

Can't install with pip (Python 3.3)

I'm unable to install cerberus either from pypi or git. I'm actually trying to install eve, but cerberus seems to be the culprit (something in the LICENSE file?):

Downloading/unpacking cerberus
  Downloading Cerberus-0.3.0.tar.gz
  Running setup.py egg_info for package cerberus
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "C:\dev\misc\eve\env\build\cerberus\setup.py", line 17, in <module>
        license=open('LICENSE').read(),
      File "C:\dev\misc\eve\env\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 287: character maps to <undefined>

Permit min/max with floats

Currently min/max can only be applied to ints. It'd be helpful if this was extended to floats as well.

Thanks
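For illustration, this is what the requested behaviour looks like once min/max accept floats (a sketch; newer releases do support this):

from cerberus import Validator

v = Validator({'price': {'type': 'float', 'min': 0.0, 'max': 99.99}})
v.validate({'price': 10.5})   # True
v.validate({'price': 120.0})  # False: exceeds max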

`allow_unknown` does not apply to sub-dictionaries in a list

#40 doesn't seem to apply when the sub-dictionary is in a list.

To reproduce:

import cerberus
cerberus.__version__
# '0.8'
v = cerberus.Validator(allow_unknown=True)
schema = {
    'a_dict': {
        'type': 'dict',
        'schema': {
            'address': {'type': 'string'},
            'city': {'type': 'string', 'required': True}
        }
    }
}
document = {
    'a_dict': {
        'address': 'my address',
        'city': 'my town',
        'extra': True
    }
}
v.validate(document, schema)
# True
schema_w_list = {
    'list_o_dicts': {
        'type': 'list',
        'minlength': 1,
        'schema': {
            'type': 'dict',
            'schema': {
                'address': {'type': 'string'},
                'city': {'type': 'string', 'required': True}
            }
        }
    }
}
document_w_list = {
    'list_o_dicts': [{
        'address': 'my address',
        'city': 'my town',
        'extra': True
    }]
}
v.validate(document_w_list, schema_w_list)
# False
v.errors
# {'list_o_dicts': {0: {'extra': 'unknown field'}}}

On a side-note, cerberus is awesome.

Request: Allow pyyaml to be used in unit tests

I use YAML to write the schema in my applications, which is one of the greatest features of cerberus in my mind. I can write a schema for a configuration file in plain-text and load it into a nested dict.

I think the documentation should list this as a feature, but I'd also like to be able to write unit tests in YAML; it is a much cleaner syntax than a nested dict.
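A minimal sketch of that workflow with pyyaml (the schema keys here are only illustrative):

import yaml
from cerberus import Validator

schema = yaml.safe_load("""
name:
  type: string
age:
  type: integer
  min: 0
""")

v = Validator(schema)
v.validate({'name': 'john doe', 'age': 42})  # True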

`readonly` validation should happen before any other validation

Currently, when a field is marked as readonly and has a custom validation rule, that validation rule gets executed before readonly is checked. Ideally, if 'readonly': True is provided, that check should happen before any other validation, since it's pointless to validate a value that isn't allowed.

Use cases

Use Case 1 - record level validation:
Say I have two fields, "Amount" and "Calculation Type". Amount is always a float; however, when Calculation Type is "Percent", Amount must be bounded from 0 to 100.

Use Case 2 - table level validation:
Say I have multiple records and after grouping certain fields, no duplication can be present.

Use case 3 - Multiple coercion functions:
I need to convert a value to string, and perform a couple of other functions before validation. Any tips here?

Appreciate your help! This is probably one of the coolest python projects I've seen in a while.
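For use case 1, one possible approach is a Validator subclass whose custom rule looks at the sibling field via self.document; a sketch (percent_bounded is a made-up rule name, not part of Cerberus):

from cerberus import Validator

class RecordValidator(Validator):
    # hypothetical rule: bound 'amount' only when the calculation type is 'Percent'
    def _validate_percent_bounded(self, percent_bounded, field, value):
        if percent_bounded and self.document.get('calculation_type') == 'Percent':
            if not 0 <= value <= 100:
                self._error(field, "must be between 0 and 100 when calculation_type is 'Percent'")

schema = {
    'calculation_type': {'type': 'string', 'allowed': ['Fixed', 'Percent']},
    'amount': {'type': 'float', 'percent_bounded': True},
}

v = RecordValidator(schema)
v.validate({'calculation_type': 'Percent', 'amount': 150.0})  # False
v.validate({'calculation_type': 'Fixed', 'amount': 150.0})    # True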

Validator.validate_schema() not implemented

Hello!

I have installed Cerberus with 'pip install cerberus'. The version installed is Cerberus==0.7.
I have a problem when I do:

document_schema = {'curr': {'maxlength': 3, 'minlength': 3, 'required': True, 'type': 'string'},
'dep': {'minlength': 1, 'required': True, 'type': 'string'}}
v = Validator(document_schema)

v.validate_schema(document)


AttributeError Traceback (most recent call last)
in ()
----> 1 v.validate_schema(document)

AttributeError: 'Validator' object has no attribute 'validate_schema'

According to the documentation, Cerberus==0.7 implements this function, but it is not implemented. I need this function to validate the schema before validating documents at runtime. I cannot afford a schema exception at runtime.

Any solution?

A lot of Thanks! ;-)
Cerberus is fantastic.
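One way to fail early is to construct the Validator up front and catch schema errors there; a sketch, assuming a release where the schema is checked on construction and SchemaError is importable from the cerberus package:

from cerberus import Validator, SchemaError

document_schema = {'curr': {'maxlength': 3, 'minlength': 3, 'required': True, 'type': 'string'},
                   'dep': {'minlength': 1, 'required': True, 'type': 'string'}}

try:
    v = Validator(document_schema)  # schema problems surface here, not at validate() time
except SchemaError as exc:
    raise SystemExit('invalid schema: %s' % exc)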

Facilitate type conversions

This might be totally out of your intended scope of Cerberus, and the implementation I have in mind is not 100% pretty, but let me explain my use case :)

Data I want to validate can come in serialized as JSON, but also as HTTP form data, or even query string parameters. In the latter cases, everything is basically a string. You could also consider datetime values in JSON.

I'd like to cast/convert some values (e.g. integers serialized as strings in HTTP form data) in the Validator, such that only one pass over the schema and data is needed and validation rules on the converted value can be used as usual. This conversion should then be done before the other validation rules are applied.

Currently I use the following hack (example) in a Validator subclass to do any conversions in the type rules:

def _validate_type_integer(self, field, value):
    if isinstance(value, basestring):
        try:
            self.document[field] = int(value)
        except ValueError:
            pass
    super(ApiValidator, self)._validate_type_integer(field, self.document[field])

To make sure type rules are applied first, I applied this patch (subclassing or monkey-patching would basically mean duplicating the entire _validate method): martijnvermaat/cerberus@dd0de1f85

List and dictionary types are where it gets uglier. Some hacking is necessary to make sure converted values end up in the top-level validator document.

After validation, I can then take the data from the validator document field and directly work with the converted values.

The alternative is to do the conversion before validation, but that would mean duplicating Cerberus' schema parsing and document traversal.

So this is what I do now, and it works fine, but I'm not happy about having to use the patched _validate method. (In practice I have more intricate data conversions that just strings to integers.)

Other issues I see:

  • The naming "validate" no longer accurately describes what's being done.
  • You might not like the idea of the input document being modified (solution could be to generate a copy with converted values).

What are your thoughts? Do you see this as a valid use case? I could see this implemented in a cleaner way by having optional _convert_integer etc rules, which are always applied first.
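For the record, later Cerberus releases added a coerce rule that covers this use case; a sketch, assuming such a release:

from cerberus import Validator

schema = {'age': {'type': 'integer', 'coerce': int},
          'name': {'type': 'string', 'coerce': str}}

v = Validator(schema)
v.validate({'age': '42', 'name': 123})  # True: values are coerced before the type check
v.document                              # {'age': 42, 'name': '123'}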

Shouldn't it permit passing/posting multiple items for a list-type field?

Given this field schema (snippet):

'tels': {
  'type': 'list', 
  'items': [{
      'type': 'dict', 
      'schema': {
          'text': {'type': 'string', 'required': True},
          'ext': {'type': 'string'},
          'note': {'type': 'string', 'maxlength': 64}
      }}]}

and this test snippet:

  key = 'Apple'
  doc = {
      'cnam':  key,
      'tels': [{
          'text': self.random_string(8),
          'ext': self.random_string(3),
          'note': self.random_string(16),
      },{
          'text': self.random_string(8),
          'ext': self.random_string(3),
          'note': self.random_string(16),
      }],
    }
  payload = {}
  payload[key] = json.dumps(doc)
  r, status = self.post('/%s/' % url, data=payload)

Upon validation it returns:

"'tels': lenght of list should be 1"

Note: misspelling, it should be "length". :)
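For an arbitrary number of homogeneous entries, the schema rule (rather than items) is the tool to reach for; a sketch:

from cerberus import Validator

tels_schema = {
    'tels': {
        'type': 'list',
        'schema': {  # applied to every element of the list
            'type': 'dict',
            'schema': {
                'text': {'type': 'string', 'required': True},
                'ext': {'type': 'string'},
                'note': {'type': 'string', 'maxlength': 64},
            },
        },
    },
}

v = Validator(tels_schema)
v.validate({'tels': [{'text': '555-0100'}, {'text': '555-0101', 'ext': '12'}]})  # True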


allow_unknown does not respect custom validators

The Cerberus API allows custom validation properties and custom validation types to be added by creating a class that inherits from cerberus.Validator. It also allows validation rules to be supplied for arbitrary fields, via the allow_unknown property. However, the schema for unknown properties cannot make use of custom validation properties and custom validation types.

For example:

from cerberus import Validator

class CustomValidator(Validator):

    def _validate_type_foo(self, field, value):
        if not value == "foo":
            self.error(field, "Expected a foo")


v = CustomValidator({})
v.allow_unknown = {"type": "foo"}

v.validate( { "fred": "foo", "barney": "foo" } )

I would expect the call to .validate() to approve that document, but instead
I get a traceback:

Traceback (most recent call last):
  File "test-cerberus.py", line 13, in <module>
    v.validate( { "fred": "foo", "barney": "foo" } )
  File "cerberus/cerberus.py", line 165, in validate
    return self._validate(document, schema, update=update, context=context)
  File "cerberus/cerberus.py", line 228, in _validate
    self.allow_unknown})
  File "cerberus/cerberus.py", line 118, in __init__
    self.validate_schema(schema)
  File "cerberus/cerberus.py", line 278, in validate_schema
    errors.ERROR_UNKNOWN_TYPE % value)
cerberus.cerberus.SchemaError: unrecognized data-type 'foo'

Validating values of dicts with arbitrary/unknown names of keys - is it possible..?

Having a document like this one:

    document = {
        'aaa': {
            'bbb': [
                {'ddd': {'xxx': 123, 'yyy': 'some string'}},
                {'eee': {'zzz': 555}},
            ],
            'ccc': [
                {'ddd': {'xxx': 789, 'yyy': 'some other string'}},
            ]
        }
    }

...I can validate it with the following schema:

    schema = {
        'aaa': {
            'type': 'dict',
            'schema': {
                'bbb': {
                    'type': 'list',
                    'schema': {
                        'type': 'dict',
                        'schema': {
                            'ddd': {
                                'type': 'dict',
                                'schema': {
                                    'xxx': {'type': 'integer'},
                                    'yyy': {'type': 'string'},
                                },
                            },
                            'eee': {
                                'type': 'dict',
                                'schema': {
                                    'zzz': {'type': 'integer'},
                                },
                            }
                        }
                    }
                },
                'ccc': {
                    'type': 'list',
                    'schema': {
                        'type': 'dict',
                        'schema': {
                            'ddd': {
                                'type': 'dict',
                                'schema': {
                                    'xxx': {'type': 'integer'},
                                    'yyy': {'type': 'string'},
                                },
                            }
                        }
                    }
                }  # /ccc
            }
        }  # /aaa
    }

...but the thing is, I need the possibility to use arbitrary names for keys bbb and ccc (and yes, there may be an arbitrary number of them on this level).
In other words, how can I validate their values (which are lists of ddd and eee dicts, and those dicts always have the same structure), without knowing their names..? Is something like that possible with Cerberus..?
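Later Cerberus releases answer this with the valueschema rule, which validates every value of a mapping against one definition regardless of its key; a sketch, assuming such a release:

from cerberus import Validator

entry_schema = {
    'type': 'dict',
    'schema': {
        'ddd': {'type': 'dict', 'schema': {'xxx': {'type': 'integer'},
                                           'yyy': {'type': 'string'}}},
        'eee': {'type': 'dict', 'schema': {'zzz': {'type': 'integer'}}},
    },
}

schema = {
    'aaa': {
        'type': 'dict',
        'valueschema': {'type': 'list', 'schema': entry_schema},
    }
}

document = {'aaa': {'whatever': [{'ddd': {'xxx': 123, 'yyy': 'some string'}}],
                    'anything': [{'eee': {'zzz': 555}}]}}

Validator(schema).validate(document)  # True, whatever the key names under 'aaa' are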

Request: Allow list of schema to be validated against

It would be useful to allow the schema key to be a list, which would be interpreted as a set of possible schemas; the entry should validate if any of the schemas validates. It is already possible to give a list of types, but if I want to make sure that a value is a dict that has one of two possible formats, it is not possible without creating _validate_type_* functions.

validictory allows this because it handles custom types in a way similar to the proposal in Issue #96
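Later releases added the anyof / allof / oneof / noneof rules, which cover this; a rough sketch, assuming such a release:

from cerberus import Validator

schema = {
    'value': {
        'type': 'dict',
        'anyof': [
            {'schema': {'kind': {'allowed': ['point']},
                        'x': {'type': 'integer', 'required': True},
                        'y': {'type': 'integer', 'required': True}}},
            {'schema': {'kind': {'allowed': ['label']},
                        'text': {'type': 'string', 'required': True}}},
        ],
    }
}

v = Validator(schema)
v.validate({'value': {'kind': 'label', 'text': 'hello'}})  # True: matches the second variant
v.validate({'value': {'kind': 'point', 'x': 1}})           # False: fits neither variant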

Allow a linelength of 120 chars?!

i'm really much for pep8 compliance. however, i think that nowadays there's hardly a need to restrict line length to 80 characters, which also results in less readable code due to lots of line continuations.

what about setting the allowed line-length to 120 characters?

Pre-processing and collecting valid data

During validation of GET parameters I'd like to preprocess them first.
For example, we have the following query:

GET /resource?type=foo,bar,baz&relation_ids=[1,2,3]&count=true

Parsed parameters become a dictionary like this

parameters = {
    "type": "foo,bar,baz",
    "relation_ids": "[1,2,3]",
    "count": "true"
}

So validating against a schema like {"type": {"type": "string"}, "count": {"type": "string"}, "relation_ids": {"type": "string"}} doesn't make much sense.
This is why I'd suggest enabling a preprocessing rule for each field before validation.

Collecting valid data may also be useful, because otherwise we would perform the same processing of the input parameters, executing exactly the same code twice.

Pseudo code may describe the idea better

schema = {
    "type": {
        "type": "string_list",  # custom type
        "prepare": True  # invokes self._prepare_type() before any validations
        "store_to": "types"  # new field name in processed document
        "empty": False
    },
    "relation_ids": {
        "type": "ints_list",  # custom type
        "prepare": True,  # self._prepare_relation_ids()
        "store_to": "relation_ids",
        "empty": False
    },
    "count": {
        "type": "boolean",
        "prepare": True,  # self._prepare_count()
        "store_to": "count",
        "empty": False
    }
}

class HTTPAPIValidator(cerberus.Validator):
    def __init__(self, *args, **kwargs):
        self.prepare = kwargs.pop('prepare', False)  # global switch, off by default
        super(HTTPAPIValidator, self).__init__(*args, **kwargs)
        self._processed_document = {}

    @property
    def processed_document(self):
        return self._processed_document

    def _prepare_type(self, field, value):
        try:
            return value.strip('[]').split(',')
        except (TypeError, ValueError):
            self._error(field, 'cannot process field {0}'.format(field))

    def validate(self, *args, **kwargs):
        # add logic for storing to self._processed_document after validation
        ...

Currently, the same may be implemented by subclassing cerberus.Validator and using a schema like:

schema = {
    "type": {
        "type": "string", "empty": False,
        "split_to_field": "types"  # invokes self._validate_split_to_field(self, target_field, field, value)
    },
    "relation_ids": {
        "type": "ints_list", "empty": False,
        "int_split_to_field": True,  # self._validate_ints_split_to_field
    },
    "count": {
        'type': 'string', 'empty': False,
        'bool_to_field': 'count',
        'allowed': ['true', 'false']
    }
}

custom fields on a list of objects don't work properly

sample schema:

{'a': {'schema': {'b': {'oid': 'here', 'type': 'string'}}, 'type': 'list'}}

sample doc:

{'a': [ { 'b' : '33' } ] }

custom validation function for oid:

    def _validate_oid(self, *args):
        print(args)

output:

SchemaError: unknown rule 'b' for field '0'

If I change line 298-299 in cerberus.py to

validator = self.__class__(schema)
validator.validate(value[i])

output:

('here', 'b', '33')

as it should be.

Proposal: custom coerce-methods

i think it's better to include this feature before 0.9 is released. and as there is a reliable pattern for types, it wouldn't be much effort i guess.

one could also add an example to the docs and tests that illustrates a subclassed Validator that makes use of coercing in conjunction with contextual instance-properties.

Proposal: add exclusion-checks

it'd be handy to declare properties to exclude others:

{'property_a': {'excludes': 'property_b'},
 'property_b': {'excludes': ['property_a', 'property_c']}}

if a property is not present in the document, excluded by a present one, but marked required, the requirement should be dismissed. this way mutually exclusive requirements are possible:

schema = \
{'property_a': {'excludes': 'property_b', 'required': True},
 'property_b': {'excludes': 'property_a', 'required': True}}

valid_documents = \
[{'property_a': 'foo'}, {'property_b': 'bar'}]

invalid_documents = \
[{'property_a': 'foo', 'property_b': 'bar'}, {'property_c': 'baz'}]

since i can make use of this and may be implementing it in the next days, i'd appreciate any thoughts on this.

Validator options not passed to "sub-schemas"

To validate a nested dict I am using the "schema" rule to define the rules for each level of the dict. The schema rule creates a new Validator object with the schema defined by the rule and passes the corresponding value in the document to the validate function.

However, the options of the main Validator instance are not passed to these sub-Validator instances. So, if the "allow_unknown" attribute is set to true in the main Validator, it does not get set for all sub-Validators.

My idea would be to add a "set_parent" method to the Validator class that would set the "parent" attribute to some Validator instance and copy all setting attributes from it (so allow_unknown, ignore_none_values, etc. would be copied).

I think the idea of Validators having parents would be useful in general, because custom validation rules could then access data from an arbitrary position in the main schema no matter how far down they were created. If the documents passed into the validate function also knew about their parent, then you could write validation rules that check other fields in the document to decide if the current field is valid (which is what I ultimately want to do).

Allow to pass error-handlers to a `Validator`-instance

this is a follow-up to #89 and #90.

i propose to introduce error handlers in order to allow dealing with errors more flexibly.

to achieve that:

  • Validator.__init__ takes an optional error_handler-object
  • Validator._error is extended, so it stores the following data about an error:
    • trail - a list that represents the path to the field in the document (eg: ['a_dict', 'a_list']); a common prefix can be specified upon calling Validator.validate
    • field, value - as of now
    • constraint - the constraint that failed
    • message - a simple error message, like currently implemented
  • Validator.errors calls the format(?)-method of error_handler that may return errors in a desired format and / or do whatever its purpose is

there will be one default handler and two more as reference implementations:

  • BasicErrorHandler
    • returns errors as now
    • but concatenates trail and field
  • HumanReadableErrorHandler
    • is targeted to end-users
    • concatenates trail and field
    • suggests valid keys, in case of an unallowed value
    • the index of list-items is increased by one
    • list-items are prefixed with item #
  • YamlErrorHandler
    • structures errors in a dictionary
    • joins it into a yaml-file

TypeError "takes at least 2 arguments" in _validate_keyschema

Between 0.7 and 0.7.2 the _validate_keyschema function has changed.

0.7 https://github.com/nicolaiarocci/cerberus/blob/bc52d9f39b0f90ea9656e734119b63c8b47945f0/cerberus/cerberus.py#L356

def _validate_keyschema(self, schema, field, value):
    for key, document in value.items():
        validator = self.__class__(schema)
        validator.validate({key: document}, {key: schema})
        if len(validator.errors):
            self._error(field, validator.errors)

0.7.2 https://github.com/nicolaiarocci/cerberus/blob/0d2486f53becf757f7824ff9c40d92635b759a86/cerberus/cerberus.py#L403

def _validate_keyschema(self, schema, field, value):
    for key, document in value.items():
        validator = self.__class__()
        validator.validate(
            {key: document}, {key: schema}, context=self.document)
        if len(validator.errors):
            self._error(field, validator.errors)

But 0.7.2 gives me the following traceback in my Eve application:

Traceback (most recent call last):
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/eve/endpoints.py", line 55, in collections_endpoint
    response = post(resource)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/eve/methods/common.py", line 229, in rate_limited
    return f(*args, **kwargs)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/eve/auth.py", line 57, in decorated
    return f(*args, **kwargs)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/eve/methods/common.py", line 675, in decorated
    r = f(resource, **combined_args)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/eve/methods/post.py", line 146, in post
    validation = validator.validate(document)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/cerberus/cerberus.py", line 160, in validate
    return self._validate(document, schema, update=update, context=context)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/cerberus/cerberus.py", line 216, in _validate
    validator(definition[rule], field, value)
  File "/Users/roessland/.virtualenvs/**********/lib/python2.7/site-packages/cerberus/cerberus.py", line 403, in _validate_keyschema
    validator = self.__class__()
TypeError: __init__() takes at least 2 arguments (1 given)

Reverting it back to validator = self.__class__(schema) makes the Eve API work correctly again.

It seems like this change was done in this commit: 7d04a1c

Is this a bug?

Schemas aren't validated when constructing a Validator

This issue is more a matter of philosophy than a bug, but I thought I'd write it down anyway.

Impact:

If a schema contains an error then the validating program using the schema will only crash if it receives data pertaining to the erroneous part of the schema. This can make debugging hard.

Reproduction:

import cerberus

schema = {
    'ok': {'type': 'string'},
    'poop': {'not_valid_key_for_schema': 'wrong'}}
validator = cerberus.Validator(schema)  # works fine even though validation of 'poop' will fail
validator.validate({'ok': 'everything is awesome'})
validator.validate({'poop': 'ah, it broke'})

Suggested behaviour:

I wonder whether, when creating a validator or updating its schema, cerberus should validate the schema itself against an internal schema for what schema definitions can look like. This would avoid only hitting mistakes in a schema definition when runtime data hits that part of the schema, which could be quite rare.

Penny for your thoughts?

% string formatting doesn't allow missing parameters

Consider using the str.format() method so that if someone wants to override the error.* messages, they don't get an error when a %s placeholder is missing from the new message.

>>> 'some message' % 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
>>> 'some message'.format('x')
'some message'
>>>

Return doc/sub_doc and/or offset to failing items in a list.

When a list item fails, how would you know WHICH item(s) in the list failed? Would it help to return the offending item with the error? Perhaps list offsets pointing to failed items?

This may also relate to results returned when posting a list of base docs.

Proposal: allow_unknown should be configurable separately for each schema

Something like this:

schema = {
    'a_dict': {
        'type': 'dict',
        'allowed_unknown': True,
        'schema': {
            'address': {'type': 'string'},
            'city': {'type': 'string', 'required': True}
        }
    }
}

Why? Our use case:
We validate a big JSON document with mixed data. Some parts of this structure are the REST response from an external service which is under active development and often changes its data format; other parts are our data (they are more or less stable). It's very handy to disable/enable "allowed_unknown" for some parts of this struct during "step by step" API stabilization.

FR: Consider adding 'email' as a core type

It'd be useful if cerberus had 'email' as a core type so users don't have to find a regex to validate emails, etc. It's just a small thing to do, but given how almost every web app will want to validate email addresses, it'd speed up using the library if it were a core type. Otherwise users have to learn how to add custom types, write a regex, etc. Adding a 'regex' parameter to the string type would also be useful.

Just a small thing, but I think it'd be helpful.

Add unique in list validator

I use this custom validator a lot - it might be useful for others as well. It is most useful for objects with embedded lists of objects.

    def _validate_unique_in_list(self, unique_in_list, field, value):
        """ Enforce uniqueness of fields listed in unique_in_list against a
        list of objects in value.
        """
        # init error object
        errors = {}

        # force input to list
        unique_fields = unique_in_list
        if type(unique_fields) is not list:
            unique_fields = [unique_fields]

        for unique_field in unique_fields:
            # build hash set
            hashes = []
            for i, channel in enumerate(value):
                if isinstance(channel[unique_field], dict):
                    h = hash(frozenset(channel[unique_field].items()))
                else:
                    h = hash(channel[unique_field])
                hashes.append(h)

            # log duplicates
            for i, h in enumerate(hashes):
                if hashes.count(h) > 1:
                    if str(i) not in errors:
                        errors[str(i)] = {}
                    errors[str(i)][unique_field] = \
                        "value '%s' must be unique in list" % \
                        value[i][unique_field]

        # report errors
        if len(errors) > 0:
            self._error(field, errors)

This can be used like this where unique_in_list is a single field name or a list of field names:

'field_name': {
    'type': 'list',
    'unique_in_list': '_id',
    'schema': { ... }
}

Another improvement that could be made would be modifying the self._error() function to extend the errors object instead of assigning field to errors. This would allow me to have multiple self._error() calls without superfluous structure in the resulting errors object.

Why does the list type exclude strings?

I don't understand the rationale, and I'm hoping you might explain why you did this.

Thank you for your work!

edit
I'm creating an endpoint in eve, and I need to reference several other resources, but I don't see the point of telling the consumer that they are consuming IDs like this:

    related:
        [ {id: "1axZ"}, {id: "F2mA"}, {id: "uM5q"} ] 

I would like to do this:

    related:
        [ "1axZ", "F2mA", "uM5q" ]

Maybe I'm wrong, though, in how this should be constructed.

Proposal: option to purge unknown fields on validation

I have the need to not only validate the data but also remove unknown fields. I couldn't see an easy way in _validate to get a direct reference to the document element being parsed, so for now I subclass Validator and do a clean-up post validation based on the errors dict, see this gist. I was wondering: has this been discussed before, would it be something to include, and how best to handle it via _validate?
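For reference, later Cerberus releases grew a purge_unknown option that does exactly this during normalization; a sketch, assuming such a release:

from cerberus import Validator

schema = {'name': {'type': 'string'}}

v = Validator(schema, purge_unknown=True)
v.normalized({'name': 'john doe', 'stray': 'field'})  # {'name': 'john doe'}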

suggestion to make it easier to find this project

Looks like cerberus is exactly what I was looking for. Bad news is that I was searching for the schema and validation keywords on PyPI and didn't notice your package... Luckily, because you just did a release today, I noticed your project.

I would suggest adding the word schema somewhere in the project description and/or keywords, so people can find your project! Thanks.

why wrap _validate with validate and validate_update?

if all the validate and validate_update methods do is just pass the update parameter, why not just let the user do it?

this is much better:

validate(document)
validate(document, update=True)

than this:

validate(document)
validate_update(document)

allow_unknown in schema dict broken

The cerberus documentation gives an example of setting allow_unknown in your schema dictionary. However, even when following the example verbatim, cerberus chokes:

https://cerberus.readthedocs.org/en/latest/#allowing-the-unknown

from cerberus import Validator
v = Validator()
schema = {
  'name': {'type': 'string'},
  'a_dict': {
    'type': 'dict',
    'allow_unknown': True,
    'schema': {
      'address': {'type': 'string'}
    }
  }
}
v.validate({'name': 'john', 'a_dict':{'an_unknown_field': 'is allowed'}}, schema)

produces

cerberus.cerberus.SchemaError: unknown rule 'allow_unknown' for field 'a_dict'

Proposal: allow explicit rules per type

atm it is possible to test a value to be one of multiple types, eg:

'some_field': {'type': ['a_type', 'b_type']}

for simple cases this is fine. however:

  • in case it's a dict, you can't test schema if list is also allowed
  • any other rule can only be same for each possible type
  • schema-definitions become more or less unclear:
{'a_field': {'type': ['dict', 'list', 'string'],
             'schema': {'type': ['dict', 'string'],
                        'regex': 'foo.*',
                        'keyschema': {'type': 'string',
                                      'regex': '.*'}},
             'keyschema': {'type': 'string',
                           'regex': '.*'},
             'regex': 'foo.*'}}

so, i propose to extend and also allow this notation:

{'a_field': {'type': {'dict': {'keyschema': {'type': 'string', 'regex': '.*'}},
                      'list': {'schema': {'type': {'dict': {'keyschema': {'type': 'string', 'regex': '.*'}},
                                                   'string': {'regex': 'foo.*'}}}},
                      'string': {'regex': 'foo.*'}}}}

so, allowed types can also be mappings whose value is a schema that will be validated against the value if the key is recognized as a valid type of the value. thus (not in the example), rules could be very different depending on the actual type. it seems quite easy to implement and shouldn't break anything.

though that idea came up while debugging as a more-or-less workaround at first, i still find it's not a bad idea. imo, its sexiness is caused by its explicit nature. atm i don't need it, but that may change any moment.

default keyword

Although not part of validation, where would be the best place to specify default values?
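For what it's worth, later Cerberus releases added a default rule that is applied during normalization; a sketch, assuming such a release:

from cerberus import Validator

v = Validator({'amount': {'type': 'integer', 'default': 1}})
v.normalized({})  # {'amount': 1}
v.validate({})    # True, and v.document == {'amount': 1}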
