bigmlcom / bigmler

A higher-level API to BigML's API

License: Apache License 2.0

Python 92.26% HTML 1.18% R 3.21% JavaScript 3.30% Jupyter Notebook 0.04%

bigmler's Issues

can't whitelist fields AND blacklist auto-generated date fields

let's say i have a dataset with fields a, b, c. note that a is a date. i create a model excluding c using a whitelist:

--model-fields a,b

then i decide i don't want a.month or a.day-of-month which are auto-generated, so i try

--model-fields a,b,-a.day-of-month,-a.month

this doesn't work; it says there's no such field.

excluding non-existent fields with `--model-fields` blows up

If you try to run

--model-fields " -foo,-bar"

and foo doesn't exist, then bigmler raises an exception.

For exclusions, I suggest only a warning be logged.

(For inclusions, on the other hand, a missing field should still raise an exception.)
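A rough sketch of the behaviour proposed above, not bigmler's actual code: `resolve_model_fields` and its arguments are hypothetical names, and the parsing rule (a leading `-` marks an exclusion) is assumed from the examples in this issue.

```python
import logging

logger = logging.getLogger("model_fields_sketch")


def resolve_model_fields(spec, available):
    """Hypothetical helper: turn a --model-fields value into the kept fields.

    Names prefixed with "-" are exclusions; every other name is an inclusion.
    """
    names = [name.strip() for name in spec.split(",") if name.strip()]
    included = [n for n in names if not n.startswith("-")]
    excluded = [n[1:] for n in names if n.startswith("-")]

    # A missing inclusion is still a hard error.
    missing = [n for n in included if n not in available]
    if missing:
        raise ValueError("unknown fields to include: %s" % ", ".join(missing))

    # A missing exclusion only produces a warning.
    for name in excluded:
        if name not in available:
            logger.warning("ignoring exclusion of unknown field %r", name)

    # With no explicit inclusions, start from every available field.
    base = included if included else list(available)
    return [n for n in base if n not in excluded]
```

With this, `" -foo,-bar"` against fields `a, b, bar` would log a warning about `foo` and keep `a` and `b`.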

KeyError: 'fields' when using bigmler analyze

Hi,
I have been using bigmler analyze with the following options:

bigmler analyze --features \
    --dataset "${DATASET_ID}" \
    --penalty 0.002 --staleness 3 --k-folds 2 \
    --predictions-csv \
    --optimize-category='True' \
    --optimize precision

And I'm getting the following error:

Creating the best features set..........
Traceback (most recent call last):
  File ".../.env/pyenv-2.7.10-default/bin/bigmler", line 11, in <module>
    sys.exit(main())
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/bigmler.py", line 97, in main
    analyze_dispatcher(args=new_args)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/dispatcher.py", line 117, in analyze_dispatcher
    resume=resume)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/k_fold_cv.py", line 239, in create_features_analysis
    objective_name=objective_name, resume=resume)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/k_fold_cv.py", line 586, in best_first_search
    fields = Fields(dataset)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigml/fields.py", line 175, in __init__
    resource_info = get_fields_structure(resource_or_fields, True)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigml/fields.py", line 116, in get_fields_structure
    fields = resource['fields']
KeyError: 'fields'

Any idea what is causing this? If you want to reproduce this, the dataset ID is dataset/574f4a0c7e0a8d09ab0093ae.

Best,
Aurélien

uses 3gb+ memory running predictions

here you can see an ensemble running

it seems to load half the models (1gb ram), run predictions, load the next half of models (2gb), run predictions and finally combine (3gb).

every time you load new models, can't you clear references to the old ones and let them be garbage collected?
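To illustrate what I mean, here is a minimal sketch, not bigmler's implementation. `predict_in_batches`, `load_model` and `batch_size` are hypothetical; `load_model` is assumed to return a callable that predicts for one input row.

```python
def predict_in_batches(model_ids, load_model, rows, batch_size=10):
    """Hypothetical sketch: keep at most one batch of local models alive."""
    votes = [[] for _ in rows]
    for start in range(0, len(model_ids), batch_size):
        batch_ids = model_ids[start:start + batch_size]
        models = [load_model(model_id) for model_id in batch_ids]
        for index, row in enumerate(rows):
            votes[index].extend(model(row) for model in models)
        # Drop the only reference to this batch so it can be garbage
        # collected before the next batch is loaded.
        del models
    return votes
```

Only the per-row votes are kept between batches, so peak memory should be closer to one batch of models than to the whole ensemble.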

when running a batch prediction, predictions.csv can be incomplete

Ran a batch prediction on 50k rows. This usually works. Got an incomplete predictions.csv file back, despite no complaints from bigmler. It exited with code 0 and appeared to work fine. The file was cut off after about 10k records, halfway through a line.

My guess is that it should catch any errors and retry.
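Something along these lines might work as a guard; it is only a sketch. `download` is a hypothetical callable that writes the predictions file to `path`, and the check assumes a complete file ends with a newline and has one row per input row plus a header.

```python
import csv
import os


def looks_complete(path, expected_rows, header=True):
    """Return True if the CSV ends with a newline and has the expected rows."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as handle:
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) != b"\n":
            return False  # cut off in the middle of a line
    with open(path, newline="") as handle:
        rows = sum(1 for _ in csv.reader(handle))
    return rows == expected_rows + (1 if header else 0)


def download_with_retry(download, path, expected_rows, attempts=3):
    """Call the hypothetical download(path) until the file looks complete."""
    for _ in range(attempts):
        download(path)
        if looks_complete(path, expected_rows):
            return True
    return False
```

If every attempt fails the function returns False, so the caller can exit with a non-zero code instead of reporting success.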

Adding ~ expansion to paths

The options that take paths do not expand the user home directory shortcuts ~ and ~user. We should add that.
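One possible way to do it, sketched with the standard library only: pass `os.path.expanduser` as the argparse `type` for path options. The parser below is a stand-alone illustration, not bigmler's real option definitions, and `~/bigmler_out` is just an example value.

```python
import argparse
import os

# Stand-alone parser for illustration; only the option name comes from bigmler.
parser = argparse.ArgumentParser()
parser.add_argument("--output-dir", type=os.path.expanduser)

# expanduser handles both "~" and "~user" prefixes and leaves
# every other path unchanged.
args = parser.parse_args(["--output-dir", "~/bigmler_out"])
print(args.output_dir)
```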

`--output` seems to be ignored

the latest version of bigmler seems to ignore the --output option...

for example, I specify /tmp/foo/ensemble and it outputs to /tmp/foo/ensembles

Little render error in docs: `bigmler delete`

the example on the option --older-than doesn't render correctly

bigmler delete --older-than 2014-03-20 --newer-than 2014-03-19

Adding links to developers docs in --[resource-type]-attributes

As suggested by a customer, some links to the attributes section of each resource could be added to the description on the --[resource-type]-attributes options to clarify which attributes are available and their JSON syntax.

`--test-split` and `--json-filter` don't work together

I've got a master dataset with lots of labels and features.

I want to manually generate a split for every label with --test-split so that I can create multiple ensembles and evaluations against it (different feature sets).

I want to use --json-filter to make sure that a split only contains labeled records - in other words, ["not",["missing", "labelX"]]. That way I can ensure that train and test really are 70%/30%.

Currently, --test-split and --json-filter don't work together - the filter is ignored.

--resources-log and --clear-logs fail in Python 3[.7]

To reproduce:

1. Install bigmler for Python 3
2. Run

bigmler execute \
                --code '(+ 1 1)' \
                --output-dir tmp \
                --resources-log foo

bigmler execute \
                --code '(+ 1 1)' \
                --output-dir tmp \
                --clear-logs

Expected behavior:
Both succeed, and the first command writes to the log file foo

Actual behavior:
Fails with stack trace

Traceback (most recent call last):
  File "/usr/local/bin/bigmler", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/bigmler/bigmler.py", line 97, in main
    bd.subcommand_dispatcher(subcommand, new_args)
  File "/usr/local/lib/python3.7/site-packages/bigmler/dispatchers.py", line 55, in subcommand_dispatcher
    return globals()["%s_dispatcher" % subcommand.replace("-", "_")](args)
  File "/usr/local/lib/python3.7/site-packages/bigmler/execute/dispatcher.py", line 63, in execute_dispatcher
    execute_whizzml(command_args, api, session_file)
  File "/usr/local/lib/python3.7/site-packages/bigmler/execute/dispatcher.py", line 78, in execute_whizzml
    clear_log_files([log])
  File "/usr/local/lib/python3.7/site-packages/bigmler/dispatcher.py", line 105, in clear_log_files
    open(log_file, 'w', 0).close()
ValueError: can't have unbuffered text I/O

Suggested fix:

This line

bigmler/bigmler/dispatcher.py

Line 105 in 919c0b7

open(log_file, 'w', 0).close()

This works in Python 2 but not in Python 3, where unbuffered I/O (buffering=0) is only allowed in binary mode. Suggestions online are to replace it with

open(log_file, 'wb', 0).close()

Which works in both (tested on Python 2.7.16)
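Another option, since the file is closed immediately, is to drop the buffering argument altogether. The sketch below reuses the name `clear_log_files` from the traceback but is not the actual bigmler source.

```python
def clear_log_files(log_files):
    """Truncate each log file to zero length."""
    for log_file in log_files:
        # Opening in "w" mode truncates the file. No buffering argument is
        # needed because nothing is written before the file is closed.
        with open(log_file, "w"):
            pass
```

Text mode without `buffering=0` is accepted by both Python 2 and Python 3.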

Little render error in docs

Here, where you explain Flatline, there is a missing link. It reads like this

Flatline expression <https://github.com/bigmlcom/flatline>

can't set `ordering` when creating evaluations

we want to perform multiple evaluations using the same dataset, so we need to be able to set ordering:

Specifies the type of ordering followed to pick the instances of the dataset to evaluate the model. There are three different types that you can specify:
0 Deterministic
1 Linear
2 Random

i believe bigmler currently doesn't support this.
