bigmlcom / bigmler
A higher-level API to BigML's API
License: Apache License 2.0
Let's say I have a dataset with fields a, b, c. Note that a is a date. I create a model excluding c using a whitelist:
--model-fields a,b
Then I decide I don't want a.month or a.day-of-month, which are auto-generated, so I try
--model-fields a,b,-a.day-of-month,-a.month
This doesn't work; it says there's no such field.
If you try to run
--model-fields " -foo,-bar"
and foo doesn't exist, then bigmler raises an exception.
For exclusions, I suggest only a warning be logged.
(For inclusions, on the other hand, a missing field should still raise an exception.)
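The behavior proposed above could be sketched roughly as follows. This is not bigmler's actual parsing code: `parse_model_fields` and `known_fields` are hypothetical names, and the sketch assumes the set of known field names (including auto-generated ones such as a.month) is already available.

```python
import logging

logger = logging.getLogger("model_fields_sketch")


def parse_model_fields(spec, known_fields):
    """Split a --model-fields style string into (included, excluded) lists.

    Hypothetical helper: unknown exclusions only log a warning,
    unknown inclusions raise ValueError.
    """
    included, excluded = [], []
    for raw in spec.split(","):
        name = raw.strip()
        if not name:
            continue
        if name.startswith("-"):
            name = name[1:]
            if name not in known_fields:
                # Excluding a field that is absent is harmless: warn and move on.
                logger.warning("Ignoring exclusion of unknown field: %s", name)
                continue
            excluded.append(name)
        else:
            if name not in known_fields:
                # A missing inclusion would silently change the model, so fail.
                raise ValueError("Unknown field in inclusion list: %s" % name)
            included.append(name)
    return included, excluded
```

With this sketch, `parse_model_fields(" -foo,-bar", {"a", "b"})` logs two warnings and returns two empty lists, while `parse_model_fields("a,zzz", {"a", "b"})` raises `ValueError`.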
Hi,
I have been using bigmler analyze with the following options:
bigmler analyze --features \
--dataset "${DATASET_ID}" \
--penalty 0.002 --staleness 3 --k-folds 2 \
--predictions-csv \
--optimize-category='True' \
--optimize precision
And I'm getting the following error:
Creating the best features set..........
Traceback (most recent call last):
  File ".../.env/pyenv-2.7.10-default/bin/bigmler", line 11, in <module>
    sys.exit(main())
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/bigmler.py", line 97, in main
    analyze_dispatcher(args=new_args)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/dispatcher.py", line 117, in analyze_dispatcher
    resume=resume)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/k_fold_cv.py", line 239, in create_features_analysis
    objective_name=objective_name, resume=resume)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigmler/analyze/k_fold_cv.py", line 586, in best_first_search
    fields = Fields(dataset)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigml/fields.py", line 175, in __init__
    resource_info = get_fields_structure(resource_or_fields, True)
  File ".../.env/pyenv-2.7.10-default/lib/python2.7/site-packages/bigml/fields.py", line 116, in get_fields_structure
    fields = resource['fields']
KeyError: 'fields'
Any idea what is causing this? If you want to reproduce this, the dataset ID is dataset/574f4a0c7e0a8d09ab0093ae.
Best,
Aurélien
Ran a batch prediction on 50k rows. This usually works. Got an incomplete predictions.csv file back, despite no complaints from bigmler. It exited with code 0 and appeared to work fine. The file was cut off after about 10k records, halfway through a line.
My guess is that it should catch any errors and retry.
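One way to detect a truncated file and retry might look like the sketch below. It is only an illustration of the idea, not how bigmler downloads predictions: `download` is a hypothetical callable that writes the CSV to a given path, and `expected_rows` is assumed to be known in advance (counting every row in the file, header included if there is one).

```python
import csv
import os


def looks_complete(path, expected_rows):
    """Return True if the CSV ends with a newline and has expected_rows rows."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) != b"\n":
            # The last line was cut off mid-record.
            return False
    with open(path, newline="") as handle:
        rows = sum(1 for _ in csv.reader(handle))
    return rows == expected_rows


def download_with_retry(download, path, expected_rows, attempts=3):
    """Call download(path) until the file looks complete; return the attempt used."""
    for attempt in range(1, attempts + 1):
        download(path)
        if looks_complete(path, expected_rows):
            return attempt
    raise IOError("%s still incomplete after %d attempts" % (path, attempts))
```

Failing loudly after the last attempt would also avoid the silent exit code 0 described above.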
The options that take paths do not expand the user home directory (~ and ~user). We should add that.
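A minimal sketch of what this could look like with the standard library, assuming the options are parsed with argparse. `expand_path` and `build_parser` are hypothetical names, and only `--output-dir` is shown; this is not bigmler's actual option-handling code.

```python
import argparse
import os


def expand_path(path):
    """Expand a leading ~ or ~user to the corresponding home directory."""
    return os.path.expanduser(path)


def build_parser():
    """Build a tiny parser whose path option is expanded while parsing."""
    parser = argparse.ArgumentParser(prog="path_expansion_sketch")
    # type= runs expand_path on the raw string before it is stored.
    parser.add_argument("--output-dir", type=expand_path, default=".")
    return parser
```

For example, `build_parser().parse_args(["--output-dir", "~/results"]).output_dir` would hold the path under the current user's home directory, while absolute paths pass through unchanged.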
The latest version of bigmler seems to ignore the --output option. For example, I specify /tmp/foo/ensemble and it outputs to /tmp/foo/ensembles.
The example for the --older-than option doesn't render correctly:
bigmler delete --older-than 2014-03-20 --newer-than 2014-03-19
As suggested by a customer, some links to the attributes section of each resource could be added to the description on the --[resource-type]-attributes options to clarify which attributes are available and their JSON syntax.
I've got a master dataset with lots of labels and features.
I want to manually generate a split for every label with --test-split so that I can create multiple ensembles and evaluations against it (different feature sets).
I want to use --json-filter to make sure that a split only contains labeled records, in other words ["not",["missing", "labelX"]]. That way I can ensure that train and test really are 70%/30%.
Currently, --test-split and --json-filter don't work together: the filter is ignored.
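For reference, the per-label filter expression described above can be generated programmatically. The sketch below only builds the JSON text; `labeled_only_filter` is a hypothetical helper, and how the result is handed to --json-filter is left out.

```python
import json


def labeled_only_filter(label):
    """Return the JSON filter that keeps rows where `label` is not missing."""
    return json.dumps(["not", ["missing", label]])


def filters_for_labels(labels):
    """Map each label name to its own 'not missing' filter expression."""
    return {label: labeled_only_filter(label) for label in labels}
```

Calling `labeled_only_filter("labelX")` yields the same expression as the one quoted in the report.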
To reproduce:
With bigmler for Python 3, run:
bigmler execute \
--code '(+ 1 1)' \
--output-dir tmp \
--resources-log foo
or
bigmler execute \
--code '(+ 1 1)' \
--output-dir tmp \
--clear-logs
Expected behavior:
Both succeed, and the first command writes to the log file foo
Actual behavior:
Fails with stack trace
Traceback (most recent call last):
  File "/usr/local/bin/bigmler", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/bigmler/bigmler.py", line 97, in main
    bd.subcommand_dispatcher(subcommand, new_args)
  File "/usr/local/lib/python3.7/site-packages/bigmler/dispatchers.py", line 55, in subcommand_dispatcher
    return globals()["%s_dispatcher" % subcommand.replace("-", "_")](args)
  File "/usr/local/lib/python3.7/site-packages/bigmler/execute/dispatcher.py", line 63, in execute_dispatcher
    execute_whizzml(command_args, api, session_file)
  File "/usr/local/lib/python3.7/site-packages/bigmler/execute/dispatcher.py", line 78, in execute_whizzml
    clear_log_files([log])
  File "/usr/local/lib/python3.7/site-packages/bigmler/dispatcher.py", line 105, in clear_log_files
    open(log_file, 'w', 0).close()
ValueError: can't have unbuffered text I/O
Suggested fix:
This line
Line 105 in 919c0b7
open(log_file, 'w', 0).close()
This works in Python 2 but not Python 3, where unbuffered I/O is only allowed in binary mode. The commonly suggested replacement is
open(log_file, 'wb', 0).close()
which works in both (tested on Python 2.7.16).
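A self-contained sketch of the suggested fix follows. The function name mirrors `clear_log_files` from the traceback, but the body is an assumption about what that function does (truncate each log file), not a copy of bigmler's source.

```python
def clear_log_files(log_files):
    """Truncate each log file to zero length.

    Binary mode is used because Python 3 rejects buffering=0 for text
    mode, while 'wb' with buffering=0 is accepted by Python 2 and 3.
    """
    for log_file in log_files:
        with open(log_file, "wb", 0):
            # Opening for writing truncates the file; nothing to write.
            pass
```

Since nothing is written, dropping the buffering argument altogether (`open(log_file, "w").close()`) would be another option on both versions.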
Here, where you explain Flatline, there is a missing link. It reads like this:
Flatline expression <https://github.com/bigmlcom/flatline>
We want to perform multiple evaluations using the same dataset, so we need to be able to set ordering:
Specifies the type of ordering followed to pick the instances of the dataset to evaluate the model. There are three different types that you can specify:
0 Deterministic
1 Linear
2 Random
I believe bigmler currently doesn't support this.