
Comments (23)

commented on May 21, 2024

FYI, see this recent update about Titanic scoring: https://www.kaggle.com/c/titanic/discussion/177265
I wondered why your new version was giving a .79 score when the old version gave me a .80 score 5 weeks ago. This Kaggle change probably accounts for the difference.

from carefree-learn.

carefree0910 commented on May 21, 2024

That sounds strange. Could you please tell me what's on line 954 of your cflearn/__init__.py?

And maybe the old problem has occurred once again: one of your files was named cflearn, which shadowed my cflearn module! 🤣


commented on May 21, 2024

This was a new installation on a new server. I didn't modify any files this time; the first thing I did was run test_titanic.py.
Line 954 of cflearn/__init__.py is this:

class Ensemble:
    def __init__(self, task_type: TaskTypes, config: Dict[str, Any] = None):
        self.task_type = task_type
        if config is None:
            config = {}
        self.config = config


commented on May 21, 2024

I solved the problem by doing this:
git clone https://github.com/carefree0910/carefree-learn.git
cd carefree-learn
pip install -e .


carefree0910 commented on May 21, 2024

So there might be something wrong with your first installation 😉

BTW: the AdaBoost solution will not provide a better result; I guess the reason is that the Titanic dataset is so small that even AdaBoost overfits it too hard 🤣


commented on May 21, 2024

How do I get it to give the .80 Titanic score like it did with the previous version?
If I edit your test_titanic file to use fcnn instead of tree_dnn, it only gets a score of around .77 when I submit to Kaggle, and the tree_dnn version scores even lower.

When I try to run the same file I used in the old version, which got .80, I get this error:
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:22<00:00, 2.30s/it]
Traceback (most recent call last):
  File "titanic.py", line 41, in <module>
    test()
  File "titanic.py", line 31, in test
    x_te, _ = results.transformer.data.read_file(test_file, contains_labels=False)
AttributeError: 'RepeatResult' object has no attribute 'transformer'

I guess what I am saying is that it would help in your Titanic example if you made it give the best score possible.


carefree0910 commented on May 21, 2024

  1. Yes indeed, I've found out that cflearn.tune_with is somehow worse than before, and that's why I'm not releasing a new version 😢
  2. The current version is not quite backward-compatible 😆

I'm still able to produce a .80 result with the current version (with many more attempts than before), but the average performance dropped. I'm looking into it and will fix it ASAP!


carefree0910 commented on May 21, 2024

After some struggling, I finally realized that the regression came from the different initialization of nn.Embedding.

Previous:

self.embedding = nn.Embedding(num_values, self._dim)

Now:

self.embedding = nn.Embedding(num_values, self._dim)
embedding_initializer = Initializer({"mean": self._mean, "std": self._std})
embedding_initializer.truncated_normal(self.embedding.weight)

Where

self._mean = self.config.setdefault("embedding_mean", 0.0)
self._std = self.config.setdefault("embedding_std", 0.02)

It's hard to tell which is better, since Titanic is just a small dataset and maybe the current embedding initialization simply overfits it too hard. So I'll leave the code as is 😉

However, it might be helpful to provide a better configuration, with a comment, in test_titanic.py, and I'm planning to do so 😆
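For context: PyTorch's nn.Embedding default-initializes its weights from a standard normal N(0, 1), so the truncated normal with std 0.02 above produces far smaller initial embeddings, which plausibly explains the behavior change. The truncated-normal sampling itself can be sketched with the standard library alone (carefree-learn's Initializer is the real implementation; this rejection-sampling helper is purely illustrative):

```python
import random

def truncated_normal(n, mean=0.0, std=0.02, num_sigma=2.0):
    """Draw n samples from N(mean, std), re-sampling any value that
    falls more than num_sigma standard deviations from the mean."""
    samples = []
    while len(samples) < n:
        v = random.gauss(mean, std)
        if abs(v - mean) <= num_sigma * std:
            samples.append(v)
    return samples

random.seed(0)
weights = truncated_normal(1000, std=0.02)
# Every sample lies within 2 * 0.02 of the mean by construction.
print(all(abs(w) <= 0.04 for w in weights))  # True
```

With std=0.02 the initial embeddings are tightly clustered near zero; raising embedding_std (as suggested below in the thread) restores something closer to the old N(0, 1) scale.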


commented on May 21, 2024

I looked at the files just now and couldn't find where to change the nn.Embedding initialization code.


carefree0910 commented on May 21, 2024

Changing the configuration from the outside is probably a more elegant way. That is, in test_titanic.py, change _hpo_core from:

data_config = {"label_name": "Survived"}
hpo = cflearn.tune_with(
    train_file,
    model="tree_dnn",
    temp_folder="__test_titanic1__",
    task_type=TaskTypes.CLASSIFICATION,
    data_config=data_config,
    num_parallel=0,
)

to

data_config = {"label_name": "Survived"}
model_config = {"default_encoding_configs": {"embedding_std": 1.0}}
hpo = cflearn.tune_with(
    train_file,
    model="tree_dnn",
    temp_folder="__test_titanic1__",
    task_type=TaskTypes.CLASSIFICATION,
    model_config=model_config,
    data_config=data_config,
    num_parallel=0,
)
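Why this works, assuming model_config is passed through to the encoding config unchanged: the encoding code shown earlier reads the value via dict.setdefault, which returns an existing entry rather than the default. A tiny demonstration of that Python semantics (the key name matches the snippets above; nothing carefree-learn-specific runs here):

```python
# When the caller supplies embedding_std, setdefault returns it...
config = {"embedding_std": 1.0}
std = config.setdefault("embedding_std", 0.02)
print(std)  # 1.0 -> the supplied value wins over the 0.02 default

# ...otherwise the 0.02 default is inserted and returned.
config = {}
std = config.setdefault("embedding_std", 0.02)
print(std)  # 0.02
```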


commented on May 21, 2024

These are the results I got from that:

tree_dnn:
submissions_hpo.csv - 0.75598
submissions_adaboost.csv - 0.75598

fcnn:
submissions_hpo.csv - 0.76794
submissions_adaboost.csv - 0.73444


carefree0910 commented on May 21, 2024

The AdaBoost results are expected to be worse; however, the hpo results seem strange...

I've tried it myself just now, and got .79 😕

Could you send me the searched params of your cflearn.tune_with (as shown below)?

[screenshot of the searched params output]


commented on May 21, 2024

For that .79, are you using tree_dnn or fcnn?

----------------------------------------------------------------------------------------------------
acc  (21542b40) (0.881706 ± 0.014418)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0035916670593215895}}
----------------------------------------------------------------------------------------------------
auc  (21542b40) (0.936365 ± 0.016136)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0035916670593215895}}
----------------------------------------------------------------------------------------------------
best (21542b40)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0035916670593215895}}
----------------------------------------------------------------------------------------------------


commented on May 21, 2024

I am running it again now with the new code you uploaded to https://github.com/carefree0910/carefree-learn/blob/dev/examples/titanic/test_titanic.py . I will let you know the results when it finishes.


carefree0910 commented on May 21, 2024

In case you missed my comment, you need to uncomment some lines in that code before running it 🤣


commented on May 21, 2024

I ran your new version of test_titanic.py (with the various # lines uncommented) and it got:
submissions_hpo.csv - 0.75358
The only other changes I made were:
num_parallel=8,
num_jobs=8,

Here's the output:

----------------------------------------------------------------------------------------------------
acc  (c70268a7) (0.874972 ± 0.024462)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0025180681577456265}}
----------------------------------------------------------------------------------------------------
auc  (c70268a7) (0.925608 ± 0.025753)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0025180681577456265}}
----------------------------------------------------------------------------------------------------
best (c70268a7)
----------------------------------------------------------------------------------------------------
{'optimizer': 'rmsprop', 'optimizer_config': {'lr': 0.0025180681577456265}}


carefree0910 commented on May 21, 2024

Hmmm, I'll dive into it and see where it goes wrong 😢


commented on May 21, 2024

Was your .79 today using tree_dnn or fcnn?


carefree0910 commented on May 21, 2024

I used tree_dnn.


commented on May 21, 2024

Also, I tried it now with fcnn and got .77. In the old version it got .80 most of the time I tried it with fcnn.


carefree0910 commented on May 21, 2024

I've updated some code and now it should be fine!


commented on May 21, 2024

Yes, that works now. For tree_dnn it got 0.78947.
I also tried it with fcnn and it also got 0.78947.


carefree0910 commented on May 21, 2024

Great! I'll close this issue now, and feel free to re-open it at any time if needed 😉

