
mira's People

Contributors

davbre


mira's Issues

datapackage.json with tab delimiter not working

When "\t" is specified as the delimiter in datapackage.json, the resource fails to save to the datapackage_resources table. This is because a tab is a whitespace-only string, which counts as blank, so it fails the ActiveRecord presence validation:

validates :delimiter, presence: true
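
The failure can be reproduced outside Rails. The sketch below is plain Ruby: `blank_like_rails?` is a hypothetical helper that approximates the whitespace check behind `presence: true`, and `delimiter_ok?` is one possible replacement rule, not necessarily the fix adopted in mira.

```ruby
# Approximates the blank check that `presence: true` relies on:
# nil, empty, and whitespace-only strings all count as blank.
# (Hypothetical helper, written without Rails so it runs standalone.)
def blank_like_rails?(value)
  value.nil? || value.to_s.match?(/\A[[:space:]]*\z/)
end

# One possible alternative rule (an assumption, not mira's actual fix):
# accept any single-character delimiter, including a tab.
def delimiter_ok?(value)
  value.is_a?(String) && value.length == 1
end

puts blank_like_rails?("\t") # the tab is treated as blank, so validation fails
puts delimiter_ok?("\t")     # a length-based rule would accept it
puts delimiter_ok?("")       # while still rejecting an empty delimiter
```

In a Rails model, a comparable rule might be expressed with the length validator (for example `validates :delimiter, length: { is: 1 }`), assuming delimiters are always a single character.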

PostgreSQL error when trying to index large text columns

Currently every column is indexed by default. When a dataset contains large text columns, a message of this type is returned during the data import:

PG::ProgramLimitExceeded: ERROR: index row size 3432 exceeds maximum 2712 for index "index_xy36_69_on_provenance_text"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

One option would be to add an additional key-value pair to each field indicating whether or not it should be indexed, e.g. "index: false". Alternatively, the max length could be added to the field's "constraints" object, and this information then used to decide whether to index or not. Inclined to go with the first option.
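
The two options could be combined roughly as follows. This is only a sketch: `index_field?` is a hypothetical helper, the "index" key is the one proposed above, and using `constraints.maxLength` as the length hint is an assumption.

```ruby
# Limit quoted in the PostgreSQL error above, in bytes.
MAX_INDEX_ROW_BYTES = 2712

# Decide whether a field should get an index.
# `field` is a Hash parsed from datapackage.json; the "index" key and the
# use of constraints.maxLength are assumptions made for illustration.
def index_field?(field)
  # Option 1: an explicit per-field flag wins when present.
  return field["index"] if field.key?("index")

  # Option 2: fall back to the declared maximum length, if any.
  max_length = field.dig("constraints", "maxLength")
  return true if max_length.nil?

  # maxLength counts characters, not bytes, so this is only a rough guard.
  max_length <= MAX_INDEX_ROW_BYTES
end

puts index_field?({ "name" => "provenance_text", "index" => false })
```

Fields with neither hint keep the current behaviour and are indexed.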

text blank endpoint not working

Blank strings are being stored as either "" or NULL depending on the form of the upload. They should always be stored as NULL when empty.
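
A minimal sketch of the intended normalisation, applied to each row before it is saved. `normalize_blank` is a hypothetical helper, and treating whitespace-only values as empty is an assumption.

```ruby
# Convert empty or whitespace-only strings to nil so they are stored as NULL.
# Hypothetical helper; whether whitespace-only values should also become
# NULL is an assumption.
def normalize_blank(value)
  return nil if value.nil?
  return value unless value.is_a?(String)

  value.strip.empty? ? nil : value
end

row = { "name" => "Ada", "notes" => "", "city" => nil }
cleaned = row.transform_values { |v| normalize_blank(v) }
p cleaned
```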

column names with spaces breaking import

This bug was introduced in the last 6 weeks. For example, the refugee deaths dataset was failing to import as it has a column "cause of death". This was not the case previously.

Error Uploading CSV

Hello,

thanks for mira. Looks really promising.
Was trying it out but ran into a problem.
I uploaded a datapackage.json which is valid according to the validator,
but when I upload a CSV file the worker fails with:

Job ProcessCsvUpload (id=8) FAILED (0 prior attempts) with ActiveRecord::UnknownAttributeError: unknown attribute 'DATE' for Xy2_2.

Here is the excerpt from the datapackage.json

"schema": {
        "fields": [
          {
            "name": "DATE",
            "type": "datetime"
          }
....

The mira API for datapackage/fields returns:

[
  {
    "name": "DATE",
    "order": 1,
    "big_integer": null,
    "add_index": true,
    "format": null,
    "type": "datetime"
  },
...

Any help would be appreciated.
Thanks.

Does not handle datapackage.json files with schema lookups

Was trying this with our schema from the Carnegie Museums of Pittsburgh (datapackage.json). Unfortunately, it does not validate.

We have two different files that both use the same schema, so we are using the schemas property to share a single schema definition between both files.

However, when trying to upload the datapackage.json file, I get the following error:

Datapackage ["Resource 'schema' must be a Hash.", "Path: cmoa.csv.", "Resource 'schema' must be a Hash.", "Path: teenie.csv."]
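
One way such lookups could be supported is to resolve the references before validation runs. The sketch below assumes the package has a top-level "schemas" Hash keyed by name and that a resource's "schema" may be either a Hash or one of those names. `resolve_schema_refs`, the schema name "artwork", and the field "title" are hypothetical; only the two file paths come from the error above.

```ruby
require "json"

# Replace string schema references in each resource with the Hash they name.
# Assumes a top-level "schemas" Hash keyed by name; resources whose "schema"
# is already a Hash (or missing) are left untouched.
def resolve_schema_refs(package)
  schemas = package.fetch("schemas", {})
  resources = package.fetch("resources", []).map do |resource|
    ref = resource["schema"]
    next resource unless ref.is_a?(String)
    raise KeyError, "unknown schema '#{ref}'" unless schemas.key?(ref)

    resource.merge("schema" => schemas[ref])
  end
  package.merge("resources" => resources)
end

# Illustrative package; the schema name and field are made up.
package = JSON.parse(<<~JSON)
  {
    "schemas": { "artwork": { "fields": [ { "name": "title", "type": "string" } ] } },
    "resources": [
      { "path": "cmoa.csv", "schema": "artwork" },
      { "path": "teenie.csv", "schema": "artwork" }
    ]
  }
JSON

resolved = resolve_schema_refs(package)
puts resolved["resources"].map { |r| r["schema"].class }.inspect
```

After resolution every resource carries its schema as a Hash, which is the shape the validation message asks for.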
