Comments (23)
FYI, see this recent update about Titanic scoring: https://www.kaggle.com/c/titanic/discussion/177265
I wondered why your new version was giving a .79 score when the old version gave me a .80 score 5 weeks ago. This Kaggle change probably accounts for the difference.
from carefree-learn.
That sounds strange, could you please tell me what's on line 954 of your cflearn/__init__.py?
And maybe the old problem occurred once again: one of your files was named cflearn, which overrode my cflearn module! 🤣
This was a new installation on a new server. I didn't modify any files this time, the first thing I did was run test_titanic.py.
Line 954 of cflearn/__init__.py is this:

    class Ensemble:
        def __init__(self, task_type: TaskTypes, config: Dict[str, Any] = None):
            self.task_type = task_type
            if config is None:
                config = {}
            self.config = config
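Incidentally, the `config=None` guard in that snippet follows a standard Python idiom. A minimal stand-alone sketch (not carefree-learn code, the function names are made up) shows why a mutable default like `config={}` would be a bug:

```python
# Sketch: why `config=None` + an `is None` check beats `config={}`.
# A mutable default is created once at definition time and shared by
# every call, so state leaks between otherwise independent calls.

def bad_ensemble(task_type, config={}):  # one dict shared by all calls!
    config.setdefault("seen", []).append(task_type)
    return config

def good_ensemble(task_type, config=None):  # fresh dict per call
    if config is None:
        config = {}
    config.setdefault("seen", []).append(task_type)
    return config

a = bad_ensemble("clf")
b = bad_ensemble("reg")
print(a is b, a["seen"])  # True ['clf', 'reg'] -- state leaked across calls

c = good_ensemble("clf")
d = good_ensemble("reg")
print(c is d, c["seen"], d["seen"])  # False ['clf'] ['reg']
```

With the guard, each `Ensemble` instance gets its own empty config dict instead of silently sharing one.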
I solved the problem by doing this:
git clone https://github.com/carefree0910/carefree-learn.git
cd carefree-learn
pip install -e .
So there might be something wrong with your first installation 😉
BTW: the AdaBoost solution will not provide a better result; I guess the reason is that the Titanic dataset is so small that even AdaBoost overfits it too hard 🤣
How do I get it to give the .80 Titanic score like it did with the previous version?
If I edit your test_titanic file to use fcnn instead of tree_dnn, it only gets a score of around .77 when I submit to Kaggle, and the tree_dnn version scores much lower.
When I try to run the same file I used in the old version, which got .80, I get this error:
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:22<00:00, 2.30s/it]
Traceback (most recent call last):
File "titanic.py", line 41, in <module>
test()
File "titanic.py", line 31, in test
x_te, _ = results.transformer.data.read_file(test_file, contains_labels=False)
AttributeError: 'RepeatResult' object has no attribute 'transformer'
I guess what I'm saying is that it would help if your Titanic example gave the best score possible.
- Yes indeed, I've found that cflearn.tune_with is somehow worse than before, and that's why I'm not releasing a new version 😢
- The current version is not quite backward-compatible 😆
I'm still able to produce a .80 result with the current version (with many more attempts than before), but the average performance dropped. I'm looking into it and will fix it ASAP!
After some struggle, I finally realized that the regression came from a different initialization of nn.Embedding.
Previous:
self.embedding = nn.Embedding(num_values, self._dim)
Now:
self.embedding = nn.Embedding(num_values, self._dim)
embedding_initializer = Initializer({"mean": self._mean, "std": self._std})
embedding_initializer.truncated_normal(self.embedding.weight)
where
self._mean = self.config.setdefault("embedding_mean", 0.0)
self._std = self.config.setdefault("embedding_std", 0.02)
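For intuition about the difference, here is a stdlib-only sketch (the `truncated_normal` helper is hypothetical, not carefree-learn's `Initializer`) contrasting PyTorch's default nn.Embedding initialization, which draws from N(0, 1), with a truncated normal at std=0.02:

```python
import random

def truncated_normal(n, mean=0.0, std=0.02, clip=2.0):
    """Sample n values from N(mean, std), rejecting anything farther
    than `clip` standard deviations from the mean (truncated normal)."""
    out = []
    while len(out) < n:
        v = random.gauss(mean, std)
        if abs(v - mean) <= clip * std:
            out.append(v)
    return out

random.seed(0)
# Default nn.Embedding weights are drawn from N(0, 1);
# the new init above uses a truncated N(0, 0.02) instead.
default_like = [random.gauss(0.0, 1.0) for _ in range(10_000)]
truncated = truncated_normal(10_000, std=0.02)

print(max(abs(v) for v in default_like))  # typically around 3 to 4
print(max(abs(v) for v in truncated))     # at most 0.04 by construction
```

So the new scheme starts the embeddings roughly 50x smaller and with no outliers, which can behave very differently on a tiny dataset.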
It's hard to tell which is better, since titanic is just a small dataset, and maybe the current embedding initialization simply overfits it too hard. So I'll leave the code as is 😉
However, it might be helpful to provide a better configuration, with a comment, in test_titanic.py, and I'm planning to do so 😆
I looked at the files just now and couldn't find where to change the nn.Embedding initialization code.
Maybe changing the configuration from the outside is a more elegant way. That is, in test_titanic.py, change _hpo_core from:
data_config = {"label_name": "Survived"}
hpo = cflearn.tune_with(
train_file,
model="tree_dnn",
temp_folder="__test_titanic1__",
task_type=TaskTypes.CLASSIFICATION,
data_config=data_config,
num_parallel=0,
)
to:
data_config = {"label_name": "Survived"}
model_config = {"default_encoding_configs": {"embedding_std": 1.0}}
hpo = cflearn.tune_with(
train_file,
model="tree_dnn",
temp_folder="__test_titanic1__",
task_type=TaskTypes.CLASSIFICATION,
model_config=model_config,
data_config=data_config,
num_parallel=0,
)
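Presumably such a nested override gets merged into the model's default configuration before the encoder reads `embedding_std`. A hypothetical sketch of that mechanism (the `deep_update` helper is illustrative, not carefree-learn's actual implementation):

```python
# Hypothetical sketch of how a nested override like
# {"default_encoding_configs": {"embedding_std": 1.0}} could be merged
# into the library's defaults without clobbering sibling keys.

def deep_update(defaults: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `defaults`."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"default_encoding_configs": {"embedding_mean": 0.0, "embedding_std": 0.02}}
model_config = {"default_encoding_configs": {"embedding_std": 1.0}}

config = deep_update(defaults, model_config)
print(config)
# {'default_encoding_configs': {'embedding_mean': 0.0, 'embedding_std': 1.0}}
```

Note that `embedding_mean` keeps its default of 0.0; only the key supplied from the outside changes.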
These are the results I got from that:
tree_dnn:
submissions_hpo.csv - 0.75598
submissions_adaboost.csv - 0.75598
fcnn:
submissions_hpo.csv - 0.76794
submissions_adaboost.csv - 0.73444
The AdaBoost results are expected to be worse; however, the HPO results seem strange...
I've tried it myself just now, and got .79 😕
Could you provide me the searched params of your cflearn.tune_with (as shown below)?
For that .79, are you using tree_dnn or fcnn?
----------------------------------------------------------------------------------------------------
acc (21542b40) (0.881706 ± 0.014418)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0035916670593215895}}
----------------------------------------------------------------------------------------------------
auc (21542b40) (0.936365 ± 0.016136)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0035916670593215895}}
----------------------------------------------------------------------------------------------------
best (21542b40)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0035916670593215895}}
----------------------------------------------------------------------------------------------------
I am running it again now with the new code you uploaded to https://github.com/carefree0910/carefree-learn/blob/dev/examples/titanic/test_titanic.py . I will let you know the results when it finishes.
In case you missed my comment, you need to uncomment something in that code before running it 🤣
I ran your new version of test_titanic.py (with the various # lines uncommented) and it got:
submissions_hpo.csv - 0.75358
The only other changes I made were:
num_parallel=8,
num_jobs=8,
Here's the output:
----------------------------------------------------------------------------------------------------
acc (c70268a7) (0.874972 ± 0.024462)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0025180681577456265}}
----------------------------------------------------------------------------------------------------
auc (c70268a7) (0.925608 ± 0.025753)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0025180681577456265}}
----------------------------------------------------------------------------------------------------
best (c70268a7)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0025180681577456265}}
Hmmm, I'll dive into it and see where it goes wrong 😢
Was your .79 today using tree_dnn or fcnn?
I used tree_dnn.
Also, I tried it now with fcnn and got .77. In the old version it got .80 most of the times I tried it with fcnn.
I've updated the code and now it should be fine!
Yes, that works now. For tree_dnn it got 0.78947
I also tried it for fcnn and it also got 0.78947.
Great! I'll close this issue now, and feel free to re-open it at anytime if needed 😉