
formic's Introduction

formic's People

Contributors

boldewyn, darobin, mathiasbynens

formic's Issues

merge algorithm output depends on input key order

It looks like the algorithm for merging conflicting usages will produce different results depending on key order:

<form enctype='application/json'>
  <input name='mix' value='scalar'>
  <input name='mix[0]' value='array 1'>
  <input name='mix[key]' value='key key'>
</form>

produces

{
    "mix":  {
        "":     "scalar"
    ,   "0":    "array 1"
    ,   "key":  "key key"
    }
}

But:

<form enctype='application/json'>
  <input name='mix[0]' value='array 1'>
  <input name='mix' value='scalar'>
  <input name='mix[key]' value='key key'>
</form>

produces

{
    "mix":  {
        "0":    "array 1"
    ,   "1":     "scalar"
    ,   "key":  "key key"
    }
}

This is probably fine when relying on the DOM order on the client, but on the server it can produce seemingly random inconsistencies when using dictionary / hash structures with arbitrary key ordering.

Should the spec be updated to handle this situation?
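One way for a server-side implementation to avoid the inconsistency is to keep the submitted fields as an ordered list of pairs, rather than a dictionary, until the merge has run. A minimal sketch, assuming the fields arrive URL-encoded; `orderedPairs` is a hypothetical helper, not something the spec defines:

```javascript
// Hypothetical helper: parse a URL-encoded body into [name, value] pairs.
// URLSearchParams iterates pairs in the order they appear in the body,
// so the submission order survives (unlike an unordered dictionary).
function orderedPairs(body) {
  return Array.from(new URLSearchParams(body));
}

const pairs = orderedPairs("mix%5B0%5D=array+1&mix=scalar&mix%5Bkey%5D=key+key");
console.log(pairs.map(([name]) => name)); // names in submission order
```

Feeding such a list to the merge algorithm gives the same result as the DOM-order case on the client.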

Enforce CORS (was: Security section)

Is it worth including a security section? This opens up the possibility of attacking a JSON service with CSRF. If you rely on the likes of cookies for your service without any CSRF protection you likely deserve whatever's coming your way, but it's still a new attack.
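As an illustration of the kind of server-side protection such a section could recommend, here is a minimal sketch of an Origin allow-list check. It assumes the browser sends an Origin header with the form submission and that `headers` is a plain object with lower-cased names; `passesJsonOriginCheck` and the example origins are hypothetical:

```javascript
// Hypothetical check for a service that accepts application/json posts.
// `headers`: plain object with lower-cased header names (an assumption).
// `allowedOrigins`: allow-list supplied by the application.
function passesJsonOriginCheck(headers, allowedOrigins) {
  const type = (headers["content-type"] || "").split(";")[0].trim().toLowerCase();
  // Other content types are outside the scope of this particular check.
  if (type !== "application/json") return true;
  const origin = headers["origin"];
  // Reject JSON posts whose Origin header is missing or not allow-listed.
  return typeof origin === "string" && allowedOrigins.includes(origin);
}
```

Rejecting requests without an Origin header is a deliberately strict choice; it would also turn away non-browser clients, so a real service may need a different policy or a CSRF token instead.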

set values

Is the example in the spec on how to set the values?

Let's say I have data in JSON and a blank form, and I want to set the form's field values from it.
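The spec only describes the form-to-JSON direction, but the reverse can be sketched by flattening the JSON into field names that use the same bracket path syntax. `toFieldPairs` below is a hypothetical helper and assumes the root value is an object:

```javascript
// Hypothetical inverse of the encoding: flatten a JSON value into
// [name, value] pairs whose names use the spec's bracket path syntax.
function toFieldPairs(value, prefix) {
  if (value === null || typeof value !== "object") {
    return [[prefix, value]];
  }
  const result = [];
  for (const key of Object.keys(value)) {
    // Top-level keys are bare; nested keys and array indices go in brackets.
    const name = prefix === undefined ? key : prefix + "[" + key + "]";
    result.push(...toFieldPairs(value[key], name));
  }
  return result;
}
```

In a browser, each pair could then be applied with something like `form.elements[name].value = value`, assuming a field with that exact name exists. Keys that the syntax cannot express (for example the empty-string key) are not handled by this sketch.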

Allow for Escaped Characters and Numeric Names through Bracket Quotation

The bracket notation is unquoted, which means restricted characters are unavailable and numerals cannot be used as JSON names. The JSON spec is flexible in this regard, so it would be preferable for HTML JSON encoding to produce the full range of allowed JSON names rather than a subset of JSON.

To remedy this, the bracket notation can be updated to allow a quotation syntax using either single or double quotation marks for non-numeric values. The values within the quotations can be escaped using the backslash character to allow restricted characters. Numerical names can be quoted to avoid creating sparse arrays.

To retain the conciseness of simple names, the addition of "dot notation" would allow for deep object structures to be created concisely. For the set of allowed dot notation first characters it would be preferable to use the JavaScript syntax rules for uniformity with JavaScript property accessors.

Note that HTML entities have to be resolved prior to the JSON name parsing, and they need to be escaped if they resolve to a restricted character which would break the name parsing algorithm. The final stage would be to encode the name as a JSON string with escaping as necessary.

The set of restricted characters for the quoted JSON name should be the same as the JSON specification with the explicit inclusion of the single quote character to allow for the flexibility of HTML quotation characters. This character should be unescaped during encoding to conform exactly with the JSON specification and ensure interoperability with all JSON implementations.

Examples:

<input name="object['abc']">

  { "object" : { "abc": "" } }


<input name="object['123']">

  { "object" : { "123": "" } }


<input name="object['abc[]']">

  { "object" : { "abc[]": "" } }


<input name="object['abc\\']">

  { "object" : { "abc\\": "" } }


<input name="object['abc\'']">

  { "object" : { "abc'": "" } }


<input name="object[&quot;abc\&quot;&quot;]">

  { "object" : { "abc\"": "" } }


<input name="object['abc&#x0022;']">

  { "object" : { "abc\"": "" } }


<input name="object['abc\\&#x0022;']">

  { "object" : { "abc\\\"": "" } }


<input name="object['abc\u0022']">

  { "object" : { "abc\u0022": "" } }


<input name="object['abc\\u0022']">

  { "object" : { "abc\\u0022": "" } }


<input name="wow.such.deep[3].much.power['!']">

  {
    "wow":  {
        "such": {
            "deep": [
                null
            ,   null
            ,   null
            ,   {
                    "much": {
                        "power": {
                            "!":  "Amaze"
                        }
                    }
                }
            ]
        }
    }
}
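To make the proposal concrete, here is a rough sketch of a name parser for the quoted bracket and dot notation shown in the examples above. It is an assumption-laden illustration, not part of the spec: `parseQuotedPath` is a hypothetical name, the input is the attribute value after HTML entity resolution, the append construct is not handled, and only `\\`, escaped quotes and `\uXXXX` get special treatment (any other escaped character is taken literally).

```javascript
// Hypothetical parser for the proposed syntax; returns the list of keys,
// or [name] (the whole name as a single key) when parsing fails.
function parseQuotedPath(name) {
  const head = /^[^.\[]+/.exec(name);
  if (!head) return [name];
  const keys = [head[0]];
  let i = head[0].length;
  while (i < name.length) {
    if (name[i] === ".") {
      // Dot notation: a bare key up to the next "." or "[".
      const m = /^[^.\[]+/.exec(name.slice(i + 1));
      if (!m) return [name];
      keys.push(m[0]);
      i += 1 + m[0].length;
    } else if (name[i] === "[") {
      const quote = name[i + 1];
      if (quote === "'" || quote === '"') {
        // Quoted key: read up to the matching quote, honoring backslashes.
        let j = i + 2;
        let key = "";
        while (j < name.length && name[j] !== quote) {
          if (name[j] === "\\") {
            const next = name[j + 1];
            const hex = name.slice(j + 2, j + 6);
            if (next === "u" && /^[0-9a-fA-F]{4}$/.test(hex)) {
              key += String.fromCharCode(parseInt(hex, 16));
              j += 6;
            } else if (next !== undefined) {
              key += next; // escaped character taken literally
              j += 2;
            } else {
              return [name];
            }
          } else {
            key += name[j];
            j += 1;
          }
        }
        if (name[j] !== quote || name[j + 1] !== "]") return [name];
        keys.push(key); // quoted keys always stay strings, even "123"
        i = j + 2;
      } else {
        // Unquoted key: digits become an array index, anything else a string.
        const end = name.indexOf("]", i);
        if (end === -1 || end === i + 1) return [name];
        const raw = name.slice(i + 1, end);
        keys.push(/^\d+$/.test(raw) ? Number(raw) : raw);
        i = end + 1;
      }
    } else {
      return [name];
    }
  }
  return keys;
}
```

Numeric keys come back as numbers and quoted keys as strings, which is what lets `object['123']` produce an object property instead of a sparse array.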

Scalar to Array/Object casting

Without having fully understood the encoding path algorithm, I'm wondering how the following converts to JSON:

<input name="mix" value="alpha">
<input name="mix[0]" value="bravo">
<!--
  { mix: ["bravo"] } OR
  { mix: {"": "alpha", "0": "bravo"} }
  mix=alpha&mix[0]=bravo
-->

Considering that two name="mix" elements make an array, do two name="mix[0]" elements make an array as well? If so, this kind of breaks my model of accessing arrays:

<input name="mix[0]" value="alpha">
<input name="mix[0]" value="bravo">
<!--
  { mix: ["bravo"] } OR
  { mix: [["alpha", "bravo"]] } OR
  { mix: {"": ["alpha", "bravo"]} }
  mix[0]=alpha&mix[0]=bravo
-->

JSON root types

Currently we only support JSON objects at the root, which seems sufficient. But JSON can also support other root types. Are there strong use cases to add support for these (knowing that it would likely complicate matters a fair bit).

Empty-String As Key

I can't find the words to provide a proper argument and I can't think of a better solution, either. Using the empty-string as a key simply looks like a hack

<input name="mix" value="alpha">
<input name="mix[er]" value="bravo">
<!--
  { mix: {"": "alpha", "er": "bravo"} }
  mix=alpha&mix[er]=bravo
-->

Ambiguous Output, A One-Way Street

With application/x-www-form-urlencoded any of the following declarations can be identified (because data is serialized in document order and names are not touched):

<input name="mix" value="alpha">
<input name="mix" value="bravo">
<!--
  { mix: ["alpha", "bravo"] }
  mix=alpha&mix=bravo
-->

<input name="mix[]" value="alpha">
<input name="mix[]" value="bravo">
<!--
  { mix: ["alpha", "bravo"] }
  mix[]=alpha&mix[]=bravo
-->

<input name="mix[0]" value="alpha">
<input name="mix[1]" value="bravo">
<!--
  { mix: ["alpha", "bravo"] }
  mix[0]=alpha&mix[1]=bravo
-->

<input name="mix[1]" value="bravo">
<input name="mix[0]" value="alpha">
<!--
  { mix: ["alpha", "bravo"] }
  mix[1]=bravo&mix[0]=alpha
-->

Clarification in steps to set a JSON encoding value

In "steps to set a JSON encoding value", bullet 8.3.2.3:

Otherwise, set the context's property named by the step's key to object.

What is the word "Otherwise" referring to? I think it's unclear; as far as I can tell, there isn't any condition on whether to set the context's property named by the step's key to object. I think that it should always be done.

Context:

Else if current value is an Array, then rub the following subsubsteps:

  1. If step's next type is "array", return current value.
  2. Otherwise, run the following
  3. Let object be a new empty Object.
  4. For each item and zero-based index i in current value, if item is not undefined then set a property of object named i to item.
  5. Otherwise, set the context's property named by the step's key to object.
  6. Return object.
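Read without the stray "Otherwise", the substeps amount to something like the following sketch. The function and the `nextType` property name are hypothetical, chosen only to mirror the wording of the quoted steps:

```javascript
// Sketch of the array branch, with the assignment done unconditionally
// once the array has been converted to an object.
function arrayToObjectStep(context, step, currentValue) {
  // If the next step expects an array, the existing array is kept as-is.
  if (step.nextType === "array") return currentValue;
  const object = {};
  // Copy every defined item to a property named after its index.
  currentValue.forEach(function (item, i) {
    if (item !== undefined) object[i] = item;
  });
  // Always replace the array in the context with the new object.
  context[step.key] = object;
  return object;
}
```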

Base64

Is the representation for files the best, knowing that base64 incurs an overhead?
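For a sense of scale: padded base64 turns every 3 input bytes into 4 output characters, so the encoded size is about a third larger than the file. A small sketch of the arithmetic (`base64Length` is a hypothetical helper):

```javascript
// Padded base64 emits 4 characters for every group of up to 3 input bytes.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

const fileSize = 100 * 1024 * 1024; // a 100 MiB file
console.log(base64Length(fileSize) / fileSize); // ratio of encoded to raw size, about 4/3
```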

Example 10 erroneously talks about files

Looks like example 10 was copied and pasted from 9, and talks about files in its title and comment, when it shouldn't.

// EXAMPLE 10: Files
<form enctype='application/json'>
  <input name='error[good]' value='BOOM!'>
  <input name='error[bad' value='BOOM BOOM!'>
</form>

// assuming the user has selected two text files, produces:
{
    "error": {
        "good":   "BOOM!"
    }
,   "error[bad":  "BOOM BOOM!"
}

Should be something like:

// EXAMPLE 10: Invalid Syntax
<form enctype='application/json'>
  <input name='error[good]' value='BOOM!'>
  <input name='error[bad' value='BOOM BOOM!'>
</form>

// produces:
{
    "error": {
        "good":   "BOOM!"
    }
,   "error[bad":  "BOOM BOOM!"
}

Sparse array security

It's possible to create humongous payloads with sparse arrays. We should flag this for implementations to have a limit.
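A short name such as `a[1000000]` expands to a million `null` entries, several megabytes of JSON, from a few bytes of input. A minimal sketch of the kind of guard an implementation could apply; the function name and the limit value are assumptions, since the spec currently sets no limit:

```javascript
// Hypothetical guard: refuse array indices above a configurable limit.
const MAX_ARRAY_INDEX = 1000; // assumed value, not taken from the spec

function checkArrayIndex(index, limit = MAX_ARRAY_INDEX) {
  if (!Number.isInteger(index) || index < 0) {
    throw new TypeError("array index must be a non-negative integer");
  }
  if (index > limit) {
    throw new RangeError("array index " + index + " exceeds limit " + limit);
  }
  return index;
}
```

An implementation could call this while parsing each numeric path segment and either reject the submission or fall back to treating the segment as an object key.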

Is append needed?

We have an "append" construct (foo[]) that is used for cases in which the developer wishes to indicate that even if there is only one instance of a given field name it should nevertheless be captured as an array. It is possible to achieve the same effect by carefully generating array indices, though it is slightly cumbersome. The feature is not costly in complexity, but given that it basically desugars to arrays it could be dropped with no loss of functionality.

Sparse Arrays

<input name="mix[0]" value="alpha">
<input name="mix[5]" value="bravo">
<!--
  { mix: ["alpha", null, null, null, null, "bravo"] }
  mix[0]=alpha&mix[5]=bravo
-->

This has been pointed out before: This is not only "ugly", it's a trivial DoS waiting to happen.

While this was trying to be developer friendly, a simple "fix" can be found in PHP's json_encode: sequential vs. non-sequential array example. If the keys of a map do not fulfill the following condition, the map is not converted to array, but serialized to object:

var data = {"0": "alpha", "1": "bravo", "2": "charlie"};
var rawKeys = Object.keys(data);
// all indexes must be non-negative integers (digits only)
var _invalidKeys = rawKeys.some(function (key) { return !/^[0-9]+$/.test(key); });
var keys = rawKeys.map(Number);
// lowest index must be 0
var _lowerBound = Math.min.apply(Math, keys) !== 0;
// highest index must be exactly keys.length - 1
var _upperBound = Math.max.apply(Math, keys) !== keys.length - 1;

if (_invalidKeys || _lowerBound || _upperBound) {
  // serialize to Object
} else {
  // serialize to Array
}

Path syntax

The path syntax used in name attributes was selected to match that often seen in the wild in order to capture structure in forms, the idea being that it is simple and likely to be close to what is already supported, thereby enabling easy fallback to existing software during the transition and reusing developer habits. But if there are compelling reasons to use another syntax we can investigate them.

Expose encoding algorithm API

The steps required to implement the encoding algorithm can get complex.

http://darobin.github.io/formic/specs/json/index.html#h2_the-application-json-encoding-algorithm

formic/json/json.js

Lines 13 to 99 in 8bddf9f

function parseSteps (name) {
    var steps = []
    , orig = name // keep in case parsing fails
    , ok = false
    ;
    name = name.replace(/^([^\[]+)/, function (m, p1) {
        steps.push({ type: "object", key: p1 });
        ok = true;
        return "";
    });
    if (!ok) return [{ type: "object", key: orig, last: true }];
    if (!name.length) {
        steps[0].last = true;
        return steps;
    }
    while (name.length) {
        var ok = false;
        name = name.replace(/^\[\]/, function () {
            steps[steps.length - 1].append = true;
            ok = true;
            return "";
        });
        // we had a match, but appends can only occur at the end
        if (ok) {
            if (name.length) return [{ type: "object", key: orig, last: true }];
            break;
        }
        name = name.replace(/^\[(\d+)\]/, function (m, p1) {
            steps.push({ type: "array", key: (1 * p1) });
            ok = true;
            return "";
        });
        if (ok) continue;
        name = name.replace(/^\[([^\]]+)\]/, function (m, p1) {
            steps.push({ type: "object", key: p1 });
            ok = true;
            return "";
        });
        if (ok) continue;
        return [{ type: "object", key: orig, last: true }];
    }
    for (var i = 0, n = steps.length; i < n; i++) {
        var step = steps[i];
        if (i + 1 < n) step.next = steps[i + 1].type;
        else step.last = true;
    }
    return steps;
}
function setValue (context, step, current, value, isFile) {
    if (step.last) {
        // there is no key, just set it
        if (current === undefined) context[step.key] = step.append ? [value] : value;
        // there are already multiple keys, push it
        else if (isArray(current)) context[step.key].push(value);
        // we're trying to set a scalar on an object
        else if (isObject(current) && !isFile) {
            return setValue(current, { type: "object", key: "", last: true }, current[""], value, isFile);
        }
        // there's already a scalar, pimp to array
        else context[step.key] = [current, value];
        return context;
    }
    else {
        // there is no key, just define a new object
        if (current === undefined) return context[step.key] = (step.next === "array" ? [] : {});
        // it's already an object
        else if (isObject(current)) return context[step.key];
        // there is an array, we convert its defined items to an object
        else if (isArray(current)) {
            if (step.next === "array") return current;
            else {
                var obj = {};
                for (var i = 0, n = current.length; i < n; i++) {
                    var item = current[i];
                    if (item !== undefined) obj[i] = item;
                }
                return context[step.key] = obj;
            }
        }
        // there is a scalar
        else {
            return context[step.key] = { "": current };
        }
    }
}

It'd be wonderful if this algorithm was exposed in the browser somehow.

Why

  • A developer may want to build an application/json payload but do something just a little different, such as appending data from a custom widget/web component or posting with a different content type. Applying validations to a structured object is also nicer than having to interpret the form fields yourself.
  • Helps define a standardized API for client and server side implementations to replicate. Related josh/parse-url-encoded.
  • It makes the browser algorithm more test friendly.

How

A new function could be introduced somewhere. I don't even know what to call it; let's just say encodeJSONParameters. It would accept an Array of Array pairs.

encodeJSONParameters([
  [ "name", "Bender" ],
  [ "shiny", true ],
  [ "bottles", 1 ],
  [ "kids[1]", "Thelma" ],
  [ "kids[0]", "Ashley" ]
]);
// =>
{
  "name": "Bender",
  "shiny": true,
  "bottles": 1,
  "kids": [ "Ashley", "Thelma" ]
}
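To show the shape such a function might take, here is a simplified sketch. It is not the spec's algorithm: it handles object keys, array indices, a trailing append (`[]`) and repeated names, but it throws on conflicting shapes instead of applying the spec's merge rules, and `parsePath` is a hypothetical helper.

```javascript
// Simplified, hypothetical sketch of the proposed encodeJSONParameters().
function parsePath(name) {
  const m = /^([^\[]+)((?:\[[^\]]*\])*)$/.exec(name);
  if (!m) return { keys: [name], append: false }; // malformed: whole name is the key
  const keys = [m[1]];
  const segments = m[2].match(/\[[^\]]*\]/g) || [];
  let append = false;
  for (let i = 0; i < segments.length; i++) {
    const inner = segments[i].slice(1, -1);
    if (inner === "") {
      // an append is only valid as the final segment
      if (i !== segments.length - 1) return { keys: [name], append: false };
      append = true;
    } else {
      keys.push(/^\d+$/.test(inner) ? Number(inner) : inner);
    }
  }
  return { keys, append };
}

function encodeJSONParameters(pairs) {
  // Prototype-less objects keep names such as "__proto__" harmless.
  const root = Object.create(null);
  for (const [name, value] of pairs) {
    const { keys, append } = parsePath(name);
    let context = root;
    keys.forEach(function (key, i) {
      // In this sketch arrays take only numeric indices, objects only string keys.
      if (Array.isArray(context) !== (typeof key === "number")) {
        throw new Error("conflicting shapes for " + name);
      }
      const current = context[key];
      if (i < keys.length - 1) {
        if (current === undefined) {
          context[key] = typeof keys[i + 1] === "number" ? [] : Object.create(null);
        } else if (current === null || typeof current !== "object") {
          throw new Error("conflicting shapes for " + name);
        }
        context = context[key];
      } else if (current === undefined) {
        context[key] = append ? [value] : value;
      } else if (Array.isArray(current)) {
        current.push(value);
      } else {
        context[key] = [current, value]; // second value for the same name
      }
    });
  }
  return root;
}
```

Because the input is an ordered list of pairs, the result does not depend on dictionary ordering, and non-string values such as `true` or `1` pass through unchanged.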

I'm not sure how this would support files.

It might be possible to add something to the FormData interface.

var data = new FormData();
data.append('name', "Bender");
data.append('shiny', true);
// ...
data.toJSON()

Though, I don't know how FormData would work with non-string types. FormData also isn't something that's available on the server side, so it would be an awkward API to mirror in Node.js.

Also related is this thread about adding custom form participants.

http://lists.w3.org/Archives/Public/public-webapps/2014JanMar/0448.html

What do you think @darobin?

Boolean radio/checkbox

With the current algorithm, radio and checkbox inputs that are checked and have no value set are represented in JSON using a boolean true value instead of the on string used in other encodings. This seems like a better fit for a new encoding, but the departure from tradition may surprise some. (Note that unchecked inputs will still be absent, not false.)
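The rule described above can be sketched as follows, with a plain object standing in for the DOM element; `checkableJSONValue` and the `{ checked, value }` shape are assumptions made for the illustration:

```javascript
// `input` stands in for a checkbox or radio element:
// { checked: boolean, value: string | undefined }
// where an undefined value means the element has no value attribute.
function checkableJSONValue(input) {
  // Unchecked inputs are absent from the output, not false.
  if (!input.checked) return undefined;
  // Checked without a value attribute: boolean true instead of "on".
  return input.value === undefined ? true : input.value;
}
```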

Potential incompatibility with multiple values

Ruby on Rails generates this when you render a checkbox for a field:

  <input name="book[title]" type="hidden" value="0">
  <input id="book_title" name="book[title]" type="checkbox" value="1">

It does this so there would always be a default value, even if the checkbox wasn't checked. Both values are sent and on the server, the latter value overrides the former. With JSON form submit, two values would be sent and on the server, they would become an array.
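A server that wants to keep the Rails convention could normalize such fields after parsing. A minimal sketch, where `lastValue` is a hypothetical helper applied only to fields known to be single-valued:

```javascript
// Hypothetical normalization: when a single-valued field arrives as an
// array because its name was repeated, keep only the last value.
function lastValue(field) {
  return Array.isArray(field) && field.length > 0 ? field[field.length - 1] : field;
}

console.log(lastValue(["0", "1"])); // the checkbox value overrides the hidden default
```

This cannot be applied blindly, because it would also collapse fields that are legitimately multi-valued.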

/cc @tobie

Using arbitrary ID as key in an array

Many web applications use a notation like mymodel[141253][name] where 141253 is the identifier of the described mymodel object.

In the current spec, this will create a JSON array under the key mymodel. In this array, 141253 explicit null items will be created, which is suboptimal both for the size of the JSON message and for the recipient's processing time.

AFAIK, JSON doesn't allow an array structure with arbitrary keys without explicitly creating all keys in the sequence.

One solution would be to always convert mymodel[141253][name] to hash and never to array, generating

{
  "mymodel": {
    "141253": {
      "name": ""
    }
  }
}

This seems less smart than the current spec, but won't break on this common use case.

File Upload can be a memory problem

Inlining files like that may work if the files are small enough to fit into available buffers. The specification does not mention the words "asynchronous" and "streaming". Therefore I assume that a file

  1. has to be read to memory as a whole,
  2. transformed to base64 encoding in memory,
  3. pushed onto the composing data model in memory,
  4. has its memory released after the composing data model was serialized to string.

("in memory" can also mean "in memory file", "swapping", "whatever an implementor does to deal with scalable memory demand")

Being the naive type, I'd assume validating preconditions (exists, read-permissions, …) are done where the spec says, but actually reading, converting and pushing the file is done in step 4. This would assume that the serialized data is written directly onto network buffers and flushed appropriately.

The short question is what happens if I were to upload a 100MB file:

  1. will my browser need to allocate 100+ MB of memory?
  2. will the serialization block the event queue, making my UI unresponsive (when submitting into an iframe, which is common practice for file uploads)?

application/form+json instead of application/json

I think it would be more appropriate to introduce application/form+json media type instead of using application/json media type directly.

As long as the serialized JSON directly depends on the structure of HTML forms, it would be nice if request processors could distinguish the submitted payload format by its Content-Type and substitute appropriate parsers instead of "guessing"...

Potential benefits

  • Reduced coupling
  • It will be easier to refine serialization format according to HTML forms spec
  • It will be possible to apply schema validators
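With a distinct media type, a request processor could pick its parser from the Content-Type alone. A minimal sketch, assuming the proposed (unregistered) `application/form+json` type; `parseBody` and the `fromForm` marker are hypothetical:

```javascript
// Hypothetical dispatch table keyed by media type.
// "application/form+json" is the type proposed in this issue.
const bodyParsers = new Map([
  ["application/json", function (body) { return JSON.parse(body); }],
  ["application/form+json", function (body) {
    // Form-derived JSON could be routed to form-specific validation here.
    return { fromForm: true, data: JSON.parse(body) };
  }]
]);

function parseBody(contentType, body) {
  // Drop parameters such as "; charset=utf-8" before the lookup.
  const type = contentType.split(";")[0].trim().toLowerCase();
  const parser = bodyParsers.get(type);
  if (!parser) throw new Error("unsupported media type: " + type);
  return parser(body);
}
```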

Be more restrictive about keys

This contradicts the common HTML approach of "try to parse even the most obvious junk" but I'd like the following example to just fail:

<form enctype='application/json'>
  <input name='error[good]' value='BOOM!'>
  <input name='error[bad' value='BOOM BOOM!'>
</form>

I don't think it really helps anyone to allow a key like error[bad. Especially since this allows even more subtle error conditions like obj[this is meant ] to all be one key]. Maybe optional quoting could help here?

Unhandled case : a[1] and a[b]

What is the expected behaviour when there is one input named a[1] and one named a[b]? One dictates a hash, the other dictates an array.

Ignore charsets

The other existing encodings respect accept-charset (and other charset selecting methods). For JSON we blithely ignore them and just use UTF-8. This is of course the better thing to do, but we are open to considering that there could be issues with this approach.

Complicated Value Retrieval

As a receiver I can never know if a given data point is a string, an array or an object. Because you never trust incoming data, you'd have to type-check everything.

<input name="mix" value="the-value">
<!-- { mix: "the-value" } -->

<input name="mix" value="default-value">
<input name="mix" value="the-value">
<!-- { mix: ["default-value", "the-value"] } -->

<input name="mix" value="the-value">
<input name="mix[0]" value="nested-value">
<!-- { mix: {"": "the-value", "0": "nested-value"} } -->

To retrieve "the-value" from mix properly, my reader would have to do the following:

var mix;
if (typeof data.mix === "string") {
    mix = data.mix;
} else if (Array.isArray(data.mix) && data.mix.length) {
    mix = data.mix[data.mix.length -1];
} else if (typeof data.mix === "object" && "" in data.mix) {
    mix = data.mix[""];
}
console.log("the value of mix", mix);

Combine that with the thought of having to destructure the generated JSON for Example 6: Such Deep and you made absolutely sure that any data sent has to be accessed through a "middleware" component.

minor ambiguities in JSON encoding steps

We're working on implementing this spec in Python (see encode/django-rest-framework#2148 and encode/django-rest-framework#2682).

Overall I found everything to be very straightforward and well-documented, though I did come across a few minor misspellings and ambiguities in the steps to set a JSON encoding value:

  • 8.1.1/8.1.2: Does "return it" refer to the context or to the newly created array/object? (The implementation suggests that "it" refers to the new object, but it would be helpful if this was more explicit)
  • 8.3: rub -> run
  • 8.3.3: The "Otherwise" seems out of place - it appears this action should always happen whenever step 8.3 is reached. (At first I thought 8.3.3 was an "else" for the "if item is not undefined then..." clause in 8.3.2)

Another minor spelling question is in Example 4 - should "hearbeat" be "heartbeat"?

Key/vals vs named values?

Disclaimer: I have no strong opinion either way and am not advocating one approach over the other, just very curious about the thought process used. Assuming others may want to know as well.

Robin, I was wondering if you have considered serializing form values as:

{
  "name" : {"value" : "Bender"}
, "hind"   : {"value" : "Bitable"}
, "shiny" : {"value" : true}
}

instead of

{
  "name":   "Bender"
, "hind":   "Bitable"
, "shiny":  true
}

the arguable benefit being: extensibility. With the alternative format, you could do things like:

{
"born" : {"type" : "date", "value" : "..." }
}

Refer to JSON as defined in RFC 7159

I believe we can agree it is widely accepted that RFC 7159 defines what JSON is, so it should at least be mentioned in this spec.

My suggestion is that the spec state that any outcome of the JSON encoding conforms to RFC 7159, and also define the edge cases in which it wouldn't. (Although I think the outcome should always conform no matter what.)

JSON Submission parse in JavaScript

Consider the code below:

a['b']
a[b]

In JavaScript, the first b is a string, but the second is a variable.
Under the rules of JSON submission, however, b should be a string in both cases.
This mismatch may affect implementations written in JavaScript.

Grammar-based approach?

I think it'd be interesting, at the very least, to have a formal grammar for the names. Once you had that, you could move from the current algorithmic-parsing to one that applies different semantics to each syntactic goal. Perhaps I am too brain-damaged from reading ES specs but to me that would be easier to parse than the algorithm, and probably has other benefits in terms of guaranteed unambiguity etc.

Multiple "append" constructs

First, I want to say that an input with such a name seems completely useless to me. That said, going along with the whole "don't lose any data" goal, how should we handle it?

<form enctype='application/json'>
  <input name='foo[][]' value='0'>
  <input name='foo[][]' value='1'>
  <input name='foo[][][]' value='2'>
  <input name='foo[1][]' value='3'> <!-- append/merge? -->
</form>

Based on my understanding of the spec, I'm guessing it would result in

{
    "foo":
    [
        [
            0
        ],
        [
            [
                1,
                3
            ]
        ],
        [
            [
                2
            ]
        ]
    ]
}

Is this correct?

Note on supported HTTP methods.

Although the issue of HTTP methods supported by form submission is, strictly speaking, orthogonal to the encoding of the request, it would be worth having this document at least note the current status of support for HTTP methods in form submission.

It's likely that the introduction of this encoding would lead to users expecting to be able to make JSON-encoded PUT requests, so it'd be helpful if the document noted that JSON encodings will always be sent as POST requests.

Also what is the behavior if the "method" attribute is present on the form? Should it always be silently ignored and treated as POST?
