dreamquark-ai / tabnet
PyTorch implementation of the TabNet paper: https://arxiv.org/pdf/1908.07442.pdf
Home Page: https://dreamquark-ai.github.io/tabnet/
License: MIT License
From the experiment section of the TabNet paper:
"Adam optimization algorithm (Kingma & Ba, 2014) and Glorot uniform initialization are used for training of all models."
Also, in the TensorFlow implementation provided by the authors, they used tf.layers.dense, which seems to use glorot_uniform by default.
However, in tab_network.py:

import numpy as np
import torch

def initialize_non_glu(module, input_dim, output_dim):
    gain_value = np.sqrt((input_dim + output_dim) / np.sqrt(4 * input_dim))
    torch.nn.init.xavier_normal_(module.weight, gain=gain_value)
    # torch.nn.init.zeros_(module.bias)
    return

def initialize_glu(module, input_dim, output_dim):
    gain_value = np.sqrt((input_dim + output_dim) / np.sqrt(input_dim))
    torch.nn.init.xavier_normal_(module.weight, gain=gain_value)
    # torch.nn.init.zeros_(module.bias)
    return
So my questions are:
Why use Glorot normal initialization instead of Glorot uniform initialization as described in the paper?
What are the reasons behind the formulas used here to calculate the gain value? Is there any reference for this? The recommended gain value for a linear layer should be the default value 1.
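For reference (my own derivation, worth double-checking): torch.nn.init.xavier_normal_ samples from $\mathcal{N}(0, \sigma^2)$ with $\sigma = \text{gain} \cdot \sqrt{2/(\text{fan\_in}+\text{fan\_out})}$, so the gains above reduce to

$$\sigma_{\text{non-glu}} = \sqrt{\frac{d_{in}+d_{out}}{\sqrt{4 d_{in}}}} \cdot \sqrt{\frac{2}{d_{in}+d_{out}}} = d_{in}^{-1/4}, \qquad \sigma_{\text{glu}} = \sqrt{\frac{d_{in}+d_{out}}{\sqrt{d_{in}}}} \cdot \sqrt{\frac{2}{d_{in}+d_{out}}} = \sqrt{2}\, d_{in}^{-1/4}.$$

In other words, the chosen gains cancel fan_out entirely and leave a standard deviation that depends only on the input dimension. That may well be intentional, but a reference would still be nice.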
Thanks!
Describe the bug
What is the current behavior?
It's possible to train with n_independent=0 and n_shared=0 without any error, but looking at the code it seems that zero actually behaves as 1, so the effective minimum is 1; this should not be the case.
If the current behavior is a bug, please provide the steps to reproduce.
Expected behavior
Well I guess 0 and 0 should throw a clear error, but 0 should mean 0.
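A minimal sketch of the kind of guard that could enforce this (hypothetical helper, e.g. called from the network's __init__):

def check_glu_config(n_independent, n_shared):
    # Hypothetical validation: the feature transformer needs at least
    # one GLU layer, shared or independent.
    if n_independent == 0 and n_shared == 0:
        raise ValueError(
            "n_independent and n_shared cannot both be 0: "
            "at least one GLU layer is required."
        )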
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
According to the paper, it seems that in the feature transformer in Figure 4(a), all FC-BN-GLU blocks are shared. However, your implementation only shares the FC layers.
Is there a reason for this choice?
import torch
import torch.nn as nn
import transformers

class Roberta(transformers.BertPreTrainedModel):
    def __init__(self, conf):
        super(Roberta, self).__init__(conf)
        # ROBERTA_PATH comes from my own setup
        self.roberta = transformers.RobertaModel.from_pretrained(ROBERTA_PATH, config=conf)
        self.dropout = nn.Dropout(0.1)
        self.l0 = nn.Linear(768, 2)
        torch.nn.init.normal_(self.l0.weight, std=0.02)
I want to do something like this with TabNet and have my own custom model, so that I have all the liberties of using a neural net and don't have to work through a scikit-learn-style interface again.
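Something like the sketch below is what I have in mind, assuming the internal network in pytorch_tabnet.tab_network is importable and its forward returns (output, M_loss), as the tracebacks elsewhere on this page suggest (constructor arguments may differ between versions):

import torch
import torch.nn as nn
from pytorch_tabnet.tab_network import TabNet  # assumed importable

class CustomTabNet(nn.Module):
    # Hypothetical wrapper: use the raw TabNet network like any other
    # torch module and add whatever head/loss/training loop you want.
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.tabnet = TabNet(input_dim=input_dim, output_dim=output_dim)
        self.head = nn.Linear(output_dim, 2)

    def forward(self, x):
        out, m_loss = self.tabnet(x)  # m_loss is the sparsity regulariser
        return self.head(out), m_loss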
Describe the bug
new() received an invalid combination of arguments - got (list, int), but expected one of:
What is the current behavior?
If the current behavior is a bug, please provide the steps to reproduce.
Expected behavior
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
Currently, the library can't be used as simply as a scikit-learn model. It would be great to be fully scikit-learn compatible.
What is the expected behavior?
We need new classes for TabNetRegressor, TabNetClassifier.
We also need to get scikit-compatible global explanations.
What is motivation or use case for adding/changing the behavior?
How should this be implemented in your opinion?
Are you willing to work on this yourself?
yes
Describe the bug
What is the current behavior?
If the current behavior is a bug, please provide the steps to reproduce.
Expected behavior
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
Currently in the TabNet architecture, part of the output of the feature transformer is used for the predictions (n_d) and the rest (n_a) as input for the next attentive transformer.
But I see a flaw in this design: the feature transformer (let's call it FT_i) sees masked input from the previous attentive transformer (AT_{i-1}), so the input features of FT_i don't contain all the initial information. How can this help to select other useful features for the next step?
I think the attentive transformer should take the raw features as input to select the next step's features; using the previous masks as a prior to avoid always selecting the same features at each step would still work.
So an easy way to try this idea would be to use the feature transformer only for predictions. The attentive transformer could be preceded by its own feature transformer if necessary, but the inputs of an attentive block would be the initial data plus the prior from the previous masks, as sketched below.
This could potentially improve the attentive transformer part.
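A rough sketch of the proposed step loop (hypothetical module names; gamma and the prior update are as in standard TabNet):

import torch

def forward_with_raw_attention(x, attentive_transformers, feature_transformers,
                               n_d, gamma=1.3):
    # Hypothetical step loop for the proposal: the attentive transformer
    # always sees the raw features x (not the masked output of the
    # previous step); the prior alone prevents re-selecting the same
    # columns at every step.
    prior = torch.ones_like(x)
    output = x.new_zeros(x.size(0), n_d)
    for att, feat in zip(attentive_transformers, feature_transformers):
        mask = att(prior, x)               # raw features in, as proposed
        prior = prior * (gamma - mask)     # TabNet's usual prior update
        step_out = feat(mask * x)          # used for predictions only
        output = output + torch.relu(step_out[:, :n_d])
    return output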
If you find this interesting, don't hesitate to share your ideas in the comment section or open a PR to propose a solution!
Describe the bug
Models don't accept model_name, saving_path as initialization arguments.
What is the current behavior?
See above.
If the current behavior is a bug, please provide the steps to reproduce.
clf: TabNetClassifier = TabNetClassifier(saving_path="/home/user123/dev/", device_name="cpu")
Expected behavior
Models should accept model_name, saving_path as initialization arguments, as specified in the documentation.
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
On a related note: how can models be persisted? The mentioned init parameters strongly suggest that it is possible, but I couldn't find any information on this, neither in the documentation nor in the code.
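For what it's worth, tab_model.py (quoted in another issue on this page) persists the model with torch.save(self.network, ...), so until a proper save/load API lands, a manual workaround along these lines might work (untested sketch):

import torch

# Assumes clf has been fit already: the network attribute is currently
# only created inside fit().
torch.save(clf.network, "tabnet_network.pt")

# Later / elsewhere:
network = torch.load("tabnet_network.pt")
network.eval()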
Describe the bug
I'm having this CUDA error when fitting the classifier. I googled it and found out that this is a common PyTorch error, so I tried to solve it by explicitly setting the GPU device (I have only one GPU, a Tesla T4), but it didn't work. When setting the classifier with the parameter device_name='auto', it does recognise my GPU device.
I also tried different batch sizes, but without success.
It runs fine on CPU though, and I'm really not sure how to make it work on GPU. Would appreciate any help if you have encountered this issue already.
Also, I have checked my dataset multiple times to ensure there were no NaN or Inf values in it.
What is the current behavior?
If the current behavior is a bug, please provide the steps to reproduce.
Expected behavior
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
The details of the error:
RuntimeError Traceback (most recent call last)
<ipython-input> in <module>
7 batch_size=16384, virtual_batch_size=1024,
8 num_workers=0,
----> 9 drop_last=False
10 )
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_model.py in fit(self, X_train, y_train, X_valid, y_valid, loss_fn, weights, max_epochs, patience, batch_size, virtual_batch_size, num_workers, drop_last)
133 virtual_batch_size=self.virtual_batch_size,
134 momentum=self.momentum,
--> 135 device_name=self.device_name).to(self.device)
136
137 self.reducing_matrix = create_explain_matrix(self.network.input_dim,
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_network.py in init(self, input_dim, output_dim, n_d, n_a, n_steps, gamma, cat_idxs, cat_dims, cat_emb_dim, n_independent, n_shared, epsilon, virtual_batch_size, momentum, device_name)
250 device_name = 'cpu'
251 self.device = torch.device(device_name)
--> 252 self.to(self.device)
253
254 def forward(self, x):
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
423 return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
424
--> 425 return self._apply(convert)
426
427 def register_backward_hook(self, hook):
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
199 def _apply(self, fn):
200 for module in self.children():
--> 201 module._apply(fn)
202
203 def compute_should_use_set_data(tensor, tensor_applied):
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
199 def _apply(self, fn):
200 for module in self.children():
--> 201 module._apply(fn)
202
203 def compute_should_use_set_data(tensor, tensor_applied):
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
221 # with torch.no_grad():
222 with torch.no_grad():
--> 223 param_applied = fn(param)
224 should_use_set_data = compute_should_use_set_data(param, param_applied)
225 if should_use_set_data:
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in convert(t)
421
422 def convert(t):
--> 423 return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
424
425 return self._apply(convert)
RuntimeError: CUDA error: an illegal memory access was encountered
Describe the bug
What is the current behavior?
When setting verbose to a value > 1 together with a scheduler, the verbosities don't match:
see https://www.kaggle.com/tanulsingh077/achieving-sota-results-with-tabnet#877426
Expected behavior
Learning rates should follow the same verbosity (or potentially be hidden, not sure).
Additional context
What is the expected behavior?
Same as the outcome on CPUs and GPUs
What is motivation or use case for adding/changing the behaviour?
Better training performance
How should this be implemented in your opinion?
Similar to how TensorFlow/PyTorch send the data to the TPU.
Are you willing to work on this yourself?
Happy to contribute along with another experienced developer
Describe the bug
There is a problem with the way we deal with layer indexing, which leads to a bug.
What is the current behavior?
You'll get an error when trying to set n_shared to 1 and n_independent to 2, for example.
Expected behavior
We should be able to use any values without error.
A fairly simple fix should do it.
What is the expected behavior?
As mentioned in #102 with @hengck23, the ghost batch norm implementation could probably be improved; his code here could be a good solution: https://gist.github.com/hengck23/c21b8b6f2f34634687ebd8a4e963f560
What is motivation or use case for adding/changing the behavior?
Cleaner and faster implementation
How should this be implemented in your opinion?
see above
Are you willing to work on this yourself?
why not
Would be good to have __str__ and __repr__ methods.
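A minimal sketch of what this could look like, as a method added to the model class (hypothetical attribute names):

def __repr__(self):
    # Summarise the main hyperparameters, scikit-learn style.
    return (f"{self.__class__.__name__}(n_d={self.n_d}, n_a={self.n_a}, "
            f"n_steps={self.n_steps}, gamma={self.gamma})")

Defining __repr__ alone would be enough, since str() falls back to repr() by default.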
The TabNet architecture uses sequential steps in order to mimic some kind of random forest paradigm.
But since boosting algorithms often outperform random forests, shouldn't we try to move towards boosting methods instead of the random forest paradigm?
One solution I see here would be to predict different things at each step of TabNet to perform boosting, e.g. each step predicting the residual left by the previous steps (see the sketch below).
This looks like it could work quite easily for regression problems, but I'm not sure how it could work for classification tasks: you can't stay in the classification paradigm and try to predict residuals. If anyone knows about a specific loss function that would make that happen, I think it's worth a try!
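For regression, a rough sketch of that step-wise residual training (hypothetical names, just to pin down the idea):

import torch

def stepwise_boosting_loss(step_outputs, step_heads, y):
    # Hypothetical training objective for the residual idea: each TabNet
    # step output (one (batch, n_d) tensor per step) gets its own small
    # head and is fit against the residual left by the earlier steps.
    prediction = torch.zeros_like(y)
    loss = torch.zeros(())
    for head, step_output in zip(step_heads, step_outputs):
        step_pred = head(step_output)
        residual = y - prediction.detach()
        loss = loss + torch.nn.functional.mse_loss(step_pred, residual)
        prediction = prediction + step_pred
    return loss, prediction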
If you feel like this is interesting and would like to contribute, please share your ideas in comments or open a PR!
What is the expected behavior?
No behaviour changes. Rather add examples to the docs or the examples section of the repo.
What is motivation or use case for adding/changing the behavior?
Make it easy for users to adapt it into their ML workflow, as explainability is an important topic in the current atmosphere.
How should this be implemented in your opinion?
No implementation needed; docs and examples, either as a Python code snippet, a Jupyter notebook or a Kaggle kernel, will be sufficient.
Are you willing to work on this yourself?
yes
Save/load/average checkpoints.
What is the expected behavior?
What is motivation or use case for adding/changing the behavior?
Smarter early stopping and possibly better generalization on predictions.
How should this be implemented in your opinion?
Good source of inspiration here: https://github.com/Qwicen/node/blob/master/lib/trainer.py
Are you willing to work on this yourself?
yes
What is the expected behavior?
New example in the examples section of the repo.
What is motivation or use case for adding/changing the behavior?
Adding a new application area.
How should this be implemented in your opinion?
Just docs and examples of using tabnet with openai and small data.
Are you willing to work on this yourself?
Maybe. Not sure.
#Abhishek-eBook
Hello and thank you for your great work!
What was the idea behind passing the device parameter to the constructor of nn.Module and storing it? I've never seen that pattern before in PyTorch.
Hi all,
Thanks for the clean implementation of this model!
I'm comparing TabNet to an MLP and some gradient boosted tree models on a very large (~terabyte) dataset. TabNet is several orders of magnitude slower than the MLP with a comparable parameter count. It also seems to occupy a lot of memory on the GPU. Is this expected, and is there something I can do about it?
Describe the bug
What is the current behavior?
If the current behavior is a bug, please provide the steps to reproduce.
Expected behavior
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
Currently we are plotting scores every verbose epochs, but we should incorporate callbacks, or at least a history, to avoid calling matplotlib each time.
What is the expected behavior?
Something XGBoost like
What is motivation or use case for adding/changing the behavior?
Many
How should this be implemented in your opinion?
Not quite sure yet
Are you willing to work on this yourself?
yes
What is the expected behavior?
It would be very helpful to add sample weight support for regression problems. The idea would be to add a sample_weight parameter to the .fit() call to perform a weighted regression.
What is motivation or use case for adding/changing the behavior?
Many datasets involve different sample weights. This is especially common with sports data (where I work), but is frequently used elsewhere.
How should this be implemented in your opinion?
The usual implementation I've seen has been to multiply the individual residuals by the sample weight, but I am not very familiar with the underlying math here, so don't know how it would work.
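A plain-PyTorch sketch of that residual-weighting idea for a regression loss (not the library's API; names are illustrative):

import torch

def weighted_mse_loss(y_pred, y_true, sample_weight):
    # Scale each squared residual by its sample weight, then normalise
    # by the total weight so the loss scale stays comparable.
    sq_residuals = (y_pred - y_true) ** 2
    return (sample_weight * sq_residuals).sum() / sample_weight.sum()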
Are you willing to work on this yourself?
I am happy to help, but my understanding of the underlying code is lacking at the moment.
I notice that for every epoch, there are train and valid accuracies.
Is accuracy the metric used for the optimization? I am currently dealing with a binary classification problem, and I would like to use AUC or recall as a metric. Would I be able to do that too?
Thank you very much for your response.
Currently some things can be changed, like the scheduler or the optimizer, but it's not possible to change things like the loss function, the early stopping metrics, and probably other things that matter for specific problems.
We should find a simple way of using callbacks in order to customize more the training process.
Something that would resemble one of these:
The easier it is, and the less invasive the solution is for the code, the better.
If you feel like this is interesting and would like to contribute, please share your ideas in comments or open a PR!
It appears that the TabNetClassifier does not have a get_params method for hyperparameter estimation.
Is this reproducible on your end?
Many thanks
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-33-03d6c8d15377> in <module>()
4
5 start = time()
----> 6 randomSearch.fit(X_train, y_train)
7
8
1 frames
/usr/local/lib/python3.6/dist-packages/sklearn/base.py in clone(estimator, safe)
65 "it does not seem to be a scikit-learn estimator "
66 "as it does not implement a 'get_params' methods."
---> 67 % (repr(estimator), type(estimator)))
68 klass = estimator.__class__
69 new_object_params = estimator.get_params(deep=False)
TypeError: Cannot clone object 'TabNetClassifier(n_d=32, n_a=32, n_steps=5,
lr=0.02, seed=0,
gamma=1.5, n_independent=2, n_shared=2,
cat_idxs=[],
cat_dims=[],
cat_emb_dim=1,
lambda_sparse=0.0001, momentum=0.3,
clip_value=2.0,
verbose=1, device_name="auto",
model_name="DreamQuarkTabNet", epsilon=1e-15,
optimizer_fn=<class 'torch.optim.adam.Adam'>,
scheduler_params={'gamma': 0.95, 'step_size': 20},
scheduler_fn=<class 'torch.optim.lr_scheduler.StepLR'>, saving_path="./")' (type <class 'pytorch_tabnet.tab_model.TabNetClassifier'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
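For context, sklearn's clone() calls get_params on the estimator. One common way to get this for free (a sketch, not necessarily how this repo should do it) is to inherit from sklearn.base.BaseEstimator, which derives get_params/set_params from the __init__ signature:

from sklearn.base import BaseEstimator

class SklearnCompatibleTabNet(BaseEstimator):
    # BaseEstimator introspects __init__ to build get_params()/set_params(),
    # so every hyperparameter must be an explicit keyword argument stored
    # under an attribute of the same name.
    def __init__(self, n_d=8, n_a=8, n_steps=3, gamma=1.3):
        self.n_d = n_d
        self.n_a = n_a
        self.n_steps = n_steps
        self.gamma = gamma

Cloning then works because sklearn can read and replay the constructor arguments.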
What is the expected behavior?
During training, masks don't need to be available for users. We could skip some computations as discussed in #102
What is motivation or use case for adding/changing the behavior?
This should speed things up
How should this be implemented in your opinion?
not sure yet
Are you willing to work on this yourself?
yes
Creating an external module for embeddings generation would make the code clearer.
Some improvement to skip this part when no embeddings are needed would also make training faster (see #97).
What is the expected behavior?
Nothing would change, just code optimization
What is motivation or use case for adding/changing the behavior?
Code clearer and faster.
How should this be implemented in your opinion?
Are you willing to work on this yourself?
yes
What is the expected behavior?
What is motivation or use case for adding/changing the behavior?
How should this be implemented in your opinion?
Are you willing to work on this yourself?
yes
What is the expected behavior?
What is motivation or use case for adding/changing the behavior?
How should this be implemented in your opinion?
Are you willing to work on this yourself?
yes
Hi,
With pytorch-tabnet 1.0.4 on Windows I got this error:
OSError: [Errno 22] Invalid argument: './DreamQuarkTabNet_13-03-2020_12:47:25.pt'
In tab_model.py:
Lines 112-113
model_name is defined with:
dt_string = now.strftime("%d-%m-%Y%H:%M:%S")
self.model_name += dt_string
Once this is run, it produces the above error on Windows:
torch.save(self.network, self.saving_path+f"{self.model_name}.pt")
--> Please change line 113 to:
dt_string = now.strftime("%d-%m-%Y%H_%M_%S")
Describe the bug
What is the current behavior?
If you try to set cat_emb_dim to a value bigger than 1, you'll get a dimension error caused by the interaction of explain and embeddings.
If the current behavior is a bug, please provide the steps to reproduce.
Expected behavior
This should work and return the sum of importances over the embedded dimensions.
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
Describe the bug
I tried to run on GPU and then on CPU, and with different embedding sizes.
I still get a dimension error.
Here is the link to the notebook:
https://colab.research.google.com/drive/1wDQ28PNxtEJA1XZyN2eVA6iTSd6ctf-E?usp=sharing
Maybe related to #94
Describe the bug
I get this CUDA error when trying to fit the classifier (with GPU).
I've also tried switching to CPU and got a different error => "RuntimeError: Invalid index in gather at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:657". Here the error seems to be related to an index tensor that has invalid indices, and I'm not sure how to solve this.
What is the current behavior?
This error happens when fitting a classifier with exactly the same parameters as in the "census_example" notebook, but on a different dataset.
If the current behavior is a bug, please provide the steps to reproduce.
Expected behavior
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
RuntimeError Traceback (most recent call last)
<ipython-input> in <module>
7 batch_size=512, virtual_batch_size=128,
8 num_workers=0,
----> 9 drop_last=False
10 )
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_model.py in fit(self, X_train, y_train, X_valid, y_valid, loss_fn, weights, max_epochs, patience, batch_size, virtual_batch_size, num_workers, drop_last)
165 self.patience_counter < self.patience):
166 starting_time = time.time()
--> 167 fit_metrics = self.fit_epoch(train_dataloader, valid_dataloader)
168
169 # leaving it here, may be used for callbacks later
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_model.py in fit_epoch(self, train_dataloader, valid_dataloader)
222 DataLoader with valid set
223 """
--> 224 train_metrics = self.train_epoch(train_dataloader)
225 valid_metrics = self.predict_epoch(valid_dataloader)
226
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_model.py in train_epoch(self, train_loader)
487
488 for data, targets in train_loader:
--> 489 batch_outs = self.train_batch(data, targets)
490 if self.output_dim == 2:
491 y_preds.append(torch.nn.Softmax(dim=1)(batch_outs["y_preds"])[:, 1]
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_model.py in train_batch(self, data, targets)
530 self.optimizer.zero_grad()
531
--> 532 output, M_loss = self.network(data)
533
534 loss = self.loss_fn(output, targets)
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in call(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_network.py in forward(self, x)
254 def forward(self, x):
255 x = self.embedder(x)
--> 256 return self.tabnet(x)
257
258 def forward_masks(self, x):
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in call(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_network.py in forward(self, x)
130
131 for step in range(self.n_steps):
--> 132 M = self.att_transformers[step](prior, att)
133 M_loss += torch.mean(torch.sum(torch.mul(M, torch.log(M+self.epsilon)),
134 dim=1))
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in call(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/tab_network.py in forward(self, priors, processed_feat)
290 x = self.bn(x)
291 x = torch.mul(x, priors)
--> 292 x = self.sp_max(x)
293 return x
294
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in call(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/sparsemax.py in forward(self, input)
89
90 def forward(self, input):
---> 91 return sparsemax(input, self.dim)
92
93
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/sparsemax.py in forward(ctx, input, dim)
41 max_val, _ = input.max(dim=dim, keepdim=True)
42 input -= max_val # same numerical stability trick as for softmax
---> 43 tau, supp_size = SparsemaxFunction._threshold_and_support(input, dim=dim)
44 output = torch.clamp(input - tau, min=0)
45 ctx.save_for_backward(supp_size, output)
/opt/conda/lib/python3.7/site-packages/pytorch_tabnet/sparsemax.py in _threshold_and_support(input, dim)
74
75 support_size = support.sum(dim=dim).unsqueeze(dim)
---> 76 tau = input_cumsum.gather(dim, support_size - 1)
77 tau /= support_size.to(input.dtype)
78 return tau, support_size
RuntimeError: Invalid index in gather at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:657
Running the forest_example out of the box, the results differ significantly from the ones in the original paper. Specifically, I get the following:
preds = clf.predict_proba(X_test)
y_true = y_test
test_acc = accuracy_score(y_pred=np.argmax(preds, axis=1), y_true=y_true)

print(f"BEST VALID SCORE FOR {dataset_name} : {clf.best_cost}")
# BEST VALID SCORE FOR EPIGN : -0.8830427851320214
print(f"FINAL TEST SCORE FOR {dataset_name} : {test_acc}")
# FINAL TEST SCORE FOR EPIGN : 0.0499728922661205
Do you get similar results? Many thanks.
The TabNet architecture uses the sparsemax function in order to perform instance-wise feature selection, and this is one of the important features of TabNet.
One of the interesting properties of sparsemax is that its outputs sum to 1, but do we really want this?
Is it the role of the mask to perform both selection (0s for unused features) and importance (a value between 0 and 1)?
I would say that the feature transformer should be used to create importance (by summing the values of the ReLU outputs, as is done in the paper), and the masks should output binary masks that do not sum to 1.
One problem I see with non-binary masks is that they change the values seen by the next layers: if someone is 50 years old and the attention layer thinks that age is half of the solution, then the attention for age would be 0.5 and the next layer would see age=25. But how can the next layers differentiate between 75/3, 50/2 and 25? They can't really, so it seems that some information is lost along the way because of the masks; that's why I would be interested to see how binary masks perform!
I'm not quite sure if there are known solutions for this. Would thresholding a softmax work? Would you add this threshold as a parameter, or would it be learnt by the model itself? I'm not even sure that it would be learnable.
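To make the thresholding question concrete, here is a naive sketch (hypothetical, and non-differentiable as written; a learnable threshold would need something like a straight-through estimator, which is exactly the open question):

import torch

def hard_mask(scores, threshold=0.1):
    # Softmax for relative importance, then a hard cut-off: selected
    # features get exactly 1, so downstream layers see unscaled values.
    probs = torch.softmax(scores, dim=-1)
    return (probs > threshold).float()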
If you feel like this is interesting and would like to contribute, please share your ideas in comments or open a PR!
I created some research issues that would be interesting to work on, but it's hard to tell whether an idea is good without a clear benchmark on different datasets.
So it would be great to have a few notebooks that could run on different datasets in order to monitor the performance uplift of a new implementation.
What is the expected behavior?
The idea would be to run this for each improvement proposal and see whether it helped or not.
How should this be implemented in your opinion?
This issue could be closed little by little by adding new notebooks that each perform a benchmark on one well-known dataset.
Or maybe it's a better idea to incorporate TabNet into existing benchmarks like the CatBoost benchmarks: https://github.com/catboost/benchmarks
Are you willing to work on this yourself?
yes of course, but any help would be appreciated!
In order to improve speed, users could change num_workers directly in the model parameters or the fit parameters (probably better as fit parameters).
What is the expected behavior?
This would make it easy for users to use as many threads as possible via the num_workers argument of the torch DataLoaders.
What is motivation or use case for adding/changing the behavior?
See #97
How should this be implemented in your opinion?
Are you willing to work on this yourself?
yes
I am trying to use the ReduceLROnPlateau LR scheduler with TabNetRegressor and I am getting the following error:
step() missing 1 required positional argument: 'metrics'
I can't find any argument for passing in the metrics; I even went through the TabNet code. Help would be appreciated.
Thanks in advance
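For context, among the torch.optim.lr_scheduler classes, ReduceLROnPlateau is special in that its step() expects the monitored metric, which is presumably why a training loop that calls scheduler.step() without arguments breaks with it. In plain PyTorch the difference looks like this:

import torch

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=0.02)

step_lr = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.95)
step_lr.step()  # fine: no argument needed

plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")
val_loss = 0.5  # dummy validation metric
plateau.step(val_loss)  # step() requires the metric to monitor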
Currently the output of explain is a tensor; it should be numpy.
What is the expected behavior?
Should be a numpy array
What is motivation or use case for adding/changing the behavior?
Everyone expects numpy arrays
How should this be implemented in your opinion?
.detach().cpu().numpy() (the .cpu() is needed when the tensor lives on the GPU)
Are you willing to work on this yourself?
yes
Hi there! Could you please help verify that I've done the attention analysis right? I'm working off the fastai implementation, so it would be faster to read up here, but essentially I made a modification to the model so that it can return the masks. It currently looks like this:
import numpy as np
import torch
from scipy.sparse import csc_matrix

# learn, dl and matrix (the explain/reducing matrix) come from my setup.
learn.model.eval()
for batch_nb, data in enumerate(dl):
    with torch.no_grad():
        out, M_loss, M_explain, masks = learn.model(data[0], data[1], True)
    # Project the per-step masks back onto the original feature columns.
    for key, value in masks.items():
        masks[key] = csc_matrix.dot(value.numpy(), matrix)
    if batch_nb == 0:
        res_explain = csc_matrix.dot(M_explain.numpy(), matrix)
        res_masks = masks
    else:
        res_explain = np.vstack([res_explain,
                                 csc_matrix.dot(M_explain.numpy(), matrix)])
        for key, value in masks.items():
            res_masks[key] = np.vstack([res_masks[key], value])
From here, to plot, I do:

import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 3, figsize=(20, 20))
for i in range(3):
    axs[i].imshow(np.expand_dims(res_masks[0][i], 0))
Now, I chose to do the np.expand_dims as it lets us visualize what is going on at an individual item level. Is this the correct way to do this sort of analysis? Or should I have done it at a batch level (or does it really not make a difference in the end)?
Thanks!
Add CI to enforce conventional commit : https://www.conventionalcommits.org/en/v1.0.0/
Describe the bug
If the list of cat_idxs is unordered, the corresponding cat_dims used in the embeddings will not match.
What is the current behavior?
The bug appears in the forward of EmbeddingGenerator.
A for loop walks through the features and takes the embedding corresponding to each categorical feature from the self.embeddings list, which is built in the same order as cat_idxs.
If the current behavior is a bug, please provide the steps to reproduce.
Provide an unordered cat_idxs list with the corresponding cat_dims.
Solution
Sort cat_dims and the corresponding emb_dims with respect to cat_idxs:

self.embeddings = torch.nn.ModuleList()
# Sort dims by cat_idxs so each embedding lines up with its feature
sorted_idxs = np.argsort(cat_idxs)
cat_dims = [cat_dims[i] for i in sorted_idxs]
self.cat_emb_dims = [self.cat_emb_dims[i] for i in sorted_idxs]
for cat_dim, emb_dim in zip(cat_dims, self.cat_emb_dims):
    self.embeddings.append(torch.nn.Embedding(cat_dim, emb_dim))
The original TabNet classifier by Google is hard-coded to pass predictions in a multi-class format, regardless of whether num_classes is 2.
Is your implementation similar in this aspect?
What is the expected behavior?
The network attribute should be created as soon as a model classifier or regressor is instantiated.
What is motivation or use case for adding/changing the behavior?
The network's existence is independent of the fit function, and this will help with saving/loading features. None of the network parameters depend on any fit-only information.
How should this be implemented in your opinion?
Are you willing to work on this yourself?
yes
When training with large embedding dimensions, the mask size goes up.
One problem I see is that sparsemax does not know which columns come from the same embedded column; this could create something a bit difficult for the model to learn.
It's an open problem, but one way I see as promising is to create embedding-aware attention.
The idea would be to mask all dimensions from the same embedding the same way, either by using the mean or the max of the initial mask.
I implemented a first version here : #92
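A rough sketch of the "mean" variant, assuming we have a mapping from each post-embedding column to the index of the original feature it comes from (hypothetical names; the actual code in #92 may differ):

import torch

def embedding_aware_mask(mask, col_to_feature, n_features):
    # mask: (batch, post_embedding_dim); col_to_feature[j] is the index
    # of the original feature that post-embedding column j belongs to.
    idx = torch.as_tensor(col_to_feature)
    sums = torch.zeros(mask.size(0), n_features).index_add_(1, idx, mask)
    counts = torch.zeros(n_features).index_add_(0, idx, torch.ones(idx.numel()))
    # Per-feature mean, broadcast back so that all columns of one
    # embedding share the same attention value.
    return (sums / counts)[:, idx]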
If you feel like this is interesting and would like to contribute, please share your ideas in comments or open a PR!