
bert-toxic-comments-multilabel's People

Contributors

kaushaltrivedi

bert-toxic-comments-multilabel's Issues

AttributeError: 'BertForMultiLabelSequenceClassification' object has no attribute 'module'

When I run the notebook up to this cell:

model.module.unfreeze_bert_encoder()

I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-48-e5502767395c> in <module>
----> 1 model.module.unfreeze_bert_encoder()

c:\users\jiang\.conda\envs\python3.6\lib\site-packages\torch\nn\modules\module.py in __getattr__(self, name)
    946                 return modules[name]
    947         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 948             type(self).__name__, name))
    949 
    950     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

AttributeError: 'BertForMultiLabelSequenceClassification' object has no attribute 'module'

What am I missing?

What is in the classes.txt file?

Great work, thanks for sharing, @kaushaltrivedi. I am trying to run this code but am having issues with the input data. I downloaded the dataset from Kaggle, but some files, such as classes.txt, appear to be missing. Could you please explain their format?

Thanks.
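
Judging only from the loading code quoted in the "Wrong number of labels" issue further down this page (classes.txt is read with header=None and the first column is taken), the file appears to hold one label name per line with no header row. The sketch below writes such a file from the header of a Kaggle-style train.csv. The function name write_classes_file and the NON_LABEL_COLUMNS set are hypothetical, and the assumption that every column other than id and comment_text is a label should be checked against the actual data.

```python
import csv
from pathlib import Path

# Assumption: these header columns are not labels in the Kaggle train.csv.
NON_LABEL_COLUMNS = {"id", "comment_text"}


def write_classes_file(train_csv, classes_txt):
    """Write one label name per line, taken from the CSV header row."""
    with open(train_csv, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    # Keep the header order, skipping the columns assumed not to be labels.
    labels = [name for name in header if name not in NON_LABEL_COLUMNS]
    Path(classes_txt).write_text("\n".join(labels) + "\n", encoding="utf-8")
    return labels
```

Calling write_classes_file("train.csv", "classes.txt") would return the list of label names it wrote, which can be compared against the number of outputs the model is built with.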

Re: Installation of apex

Hi,

I am having trouble installing apex:

[screenshot 1]

As given in the blog, I tried the following commands:

!git clone https://github.com/NVIDIA/apex
cd apex
!pip install -v --no-cache-dir --global-option="--pyprof" --global-option="--cpp_ext" --global-option="--cuda_ext"

But the installation ends with this message:

[screenshot 2]

Kindly help as I am stuck.

Thanks,
Deepti

Unable to freeze/unfreeze

model.module.freeze_bert_encoder() and model.module.unfreeze_bert_encoder() produce an error. Calling those methods from model works fine.
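
A likely explanation, offered as an assumption about the notebook rather than a confirmed diagnosis: the .module attribute exists only when the model has been wrapped (torch.nn.DataParallel stores the wrapped model there), so on a single GPU or CPU the model is unwrapped and has no .module. A small helper that works in both cases can be sketched as follows. The stand-in classes TinyModel and Wrapper are hypothetical and only keep the example runnable without PyTorch.

```python
def unwrap_model(model):
    """Return model.module when a wrapper is present, else the model itself."""
    return getattr(model, "module", model)


# Stand-in classes so the sketch runs without PyTorch (hypothetical names).
class TinyModel:
    def __init__(self):
        self.frozen = True

    def unfreeze_bert_encoder(self):
        self.frozen = False


class Wrapper:
    """Mimics a wrapper that keeps the wrapped model in `.module`."""

    def __init__(self, module):
        self.module = module


plain = TinyModel()
wrapped = Wrapper(TinyModel())

# The same call works whether or not the model has been wrapped.
unwrap_model(plain).unfreeze_bert_encoder()
unwrap_model(wrapped).unfreeze_bert_encoder()
```

In the notebook this would amount to calling unwrap_model(model).unfreeze_bert_encoder() instead of model.module.unfreeze_bert_encoder().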

What is in the 'val.csv' file?

Hi, I am trying to run the BERT multilabel classification and was wondering what the 'val.csv' file contains. Thanks :)

Change `PreTrainedBertModel` to `BertPreTrainedModel`

Hi, thanks for sharing this wonderful work.
I recently tried to re-run the code and could not import from pytorch_pretrained_bert correctly.
I figured out that the class in huggingface/pytorch-pretrained-BERT has been renamed from PreTrainedBertModel to BertPreTrainedModel. Just a reminder for those facing the same issue.

Wrong number of labels

The function get_labels is used to get the labels from the source csv files, and the length of this is used to get the size of the last layer in the model. However, the first column is the ID of the document, not a label, so using this results in a size mismatch in the model, which is unable to train.

if self.labels == None: self.labels = list(pd.read_csv(os.path.join(self.data_dir, "classes.txt"),header=None)[0].values)

Removing the first value (or setting num_labels = len(labels) - 1) fixes this problem.

FileNotFoundError: File val.csv does not exist

While running the notebook I'm stuck at the above mentioned error. The code is:

# Eval Fn

eval_examples = processor.get_dev_examples(args['data_dir'], size=args['val_size'])
def eval():
......

Error:
FileNotFoundError Traceback (most recent call last)
in
1 # Eval Fn
----> 2 eval_examples = processor.get_dev_examples(args['data_dir'], size=args['val_size'])
3 def eval():
4 args['output_dir'].mkdir(exist_ok=True)
5

in get_dev_examples(self, data_dir, size)
22 filename = 'val.csv'
23 if size == -1:
---> 24 data_df = pd.read_csv(os.path.join(data_dir, filename))
25 # data_df['comment_text'] = data_df['comment_text'].apply(cleanHtml)
26 return self._create_examples(data_df, "dev")

/anaconda/envs/py36/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
676 skip_blank_lines=skip_blank_lines)
677
--> 678 return _read(filepath_or_buffer, kwds)
679
680 parser_f.__name__ = name

/anaconda/envs/py36/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
438
439 # Create the parser.
--> 440 parser = TextFileReader(filepath_or_buffer, **kwds)
441
442 if chunksize or iterator:

/anaconda/envs/py36/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
785 self.options['has_index_names'] = kwds['has_index_names']
786
--> 787 self._make_engine(self.engine)
788
789 def close(self):

/anaconda/envs/py36/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1012 def _make_engine(self, engine='c'):
1013 if engine == 'c':
-> 1014 self._engine = CParserWrapper(self.f, **self.options)
1015 else:
1016 if engine == 'python':

/anaconda/envs/py36/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1706 kwds['usecols'] = self.usecols
1707
-> 1708 self._reader = parsers.TextReader(src, **kwds)
1709
1710 passed_names = self.names is None

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'kaggle_data/toxic_comments/tmp/val.csv' does not exist
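
Since val.csv does not seem to ship with the download, one plausible way to obtain it is to hold out part of train.csv. This page does not say whether the author produced the file that way, so treat the following as a sketch under that assumption. The function name split_train_val, the 20% default and the seed are all hypothetical choices.

```python
import csv
import random


def split_train_val(source_csv, train_csv, val_csv, val_fraction=0.2, seed=42):
    """Shuffle the rows of source_csv and write them to two CSV files."""
    with open(source_csv, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_fraction)

    # Both output files repeat the original header row.
    for path, part in ((val_csv, rows[:n_val]), (train_csv, rows[n_val:])):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(rows) - n_val, n_val
```

The output paths would need to point into the directory the notebook reads from (the traceback above shows it looking under the tmp folder of the data directory).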

Confused details about label_ids

When I read the code, I found that you pass label_ids as a list. Although you define a dict named 'label_map', you don't convert label_ids to floating-point numbers. Is anything wrong with that?
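
For context: in multi-label classification each example's target is a vector with one entry per label, so a list is a natural shape for it, and a loss such as PyTorch's BCEWithLogitsLoss expects float targets shaped like the logits. Where the notebook performs the float conversion is not shown on this page. The sketch below only illustrates building such a vector from one CSV row; the helper name to_float_labels and the sample row are hypothetical.

```python
def to_float_labels(row, label_names):
    """Build a multi-hot target vector of floats from a CSV row (a dict)."""
    return [float(row[name]) for name in label_names]


# Label columns of the Kaggle toxic comments data.
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical row, as csv.DictReader would return it (all values are strings).
row = {
    "id": "0001", "comment_text": "example",
    "toxic": "1", "severe_toxic": "0", "obscene": "1",
    "threat": "0", "insult": "0", "identity_hate": "0",
}

target = to_float_labels(row, labels)
```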

Target size is not matching with the input size

I loaded the data (Toxic dataset) and tried to run the model with batch_size_per_gpu = 4,
but I am getting the error below.

ValueError: Target size (torch.Size([0, 6])) must be the same as input size (torch.Size([4, 6]))

Could you please help?

CUDA out of memory. What can I do to improve model performance?

I have a Tesla GPU with only 16 GB of memory, much less than what you used for the experiment described in the Medium article. As a result, I had to reduce the max sequence length from 512 to 128, and the batch size from 32 to 16. After 4 epochs, the validation accuracies of the various toxic comment categories were around 0.6 to 0.65. I wonder whether increasing the number of epochs would improve performance.

In addition, is there a way to continue training a model? For example, if the validation results are not good after 4 epochs, can I continue training rather than restart with a larger number of epochs? Is it sufficient to just rerun `fit()`?

Thanks !
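
One general technique for keeping an effective batch size of 32 while only 16 examples fit in memory is gradient accumulation: gradients from several smaller batches are averaged before a single weight update. Whether this notebook exposes such an option is not stated here. The pure-Python sketch below uses a hypothetical one-parameter model (y = w * x) solely to show that averaging the gradients of two equally sized micro-batches of 16 matches the gradient of the full batch of 32.

```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, micro_batch_size):
    """Average the gradients of equally sized micro-batches."""
    chunks = [batch[i:i + micro_batch_size]
              for i in range(0, len(batch), micro_batch_size)]
    return sum(mse_gradient(w, chunk) for chunk in chunks) / len(chunks)


# Hypothetical toy data: 32 points on the line y = 3x + 1.
data = [(float(x), 3.0 * x + 1.0) for x in range(32)]

full = mse_gradient(0.5, data)
accumulated = accumulated_gradient(0.5, data, micro_batch_size=16)
```

The equality holds because the micro-batches have the same size; with uneven chunks the per-chunk gradients would need to be weighted by chunk length.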

Problem Loading Bert Weights

I am having trouble with this line:

model = BertForMultiLabelSequenceClassification.from_pretrained(bert_model_path, num_labels=num_labels)

Where bert_model_path is a path to a pytorch_model.bin.tar.gz file.

First, I get a complaint that the bert_config.json file (in the same folder) is not in the new temp folder. If I move it there manually, I get an error (an INFO message really) saying:

Weights of MultiLabelBert not initialized from pretrained model

Is this a bug or am I missing something?

Issue in importing Apex

I am having issues while importing apex. I get an error similar to the ones posted in the run_classifier repository.

in
----> 1 import apex
2 import pandas as pd
3 import numpy as np
4 import torch
5

/anaconda3/lib/python3.7/site-packages/apex/__init__.py in
16 from apex.exceptions import (ApexAuthSecret,
17 ApexSessionSecret)
---> 18 from apex.interfaces import (ApexImplementation,
19 IApex)
20 from apex.lib.libapex import (groupfinder,

/anaconda3/lib/python3.7/site-packages/apex/interfaces.py in
8 pass
9
---> 10 class ApexImplementation(object):
11 """ Class so that we can tell if Apex is installed from other
12 applications

/anaconda3/lib/python3.7/site-packages/apex/interfaces.py in ApexImplementation()
12 applications
13 """
---> 14 implements(IApex)

/anaconda3/lib/python3.7/site-packages/zope/interface/declarations.py in implements(*interfaces)
481 # the coverage for this block there. :(
482 if PYTHON3:
--> 483 raise TypeError(_ADVICE_ERROR % 'implementer')
484 _implements("implements", interfaces, classImplements)
485

TypeError: Class advice impossible in Python3. Use the @implementer class decorator instead.

Could anyone help me with this?
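
The traceback shows the imported apex pulling in apex.interfaces and zope.interface, which suggests that the apex on the import path may be a different package that happens to share the name with NVIDIA's Apex, for example one installed from PyPI rather than from the cloned repository. This is an inference from the traceback, not a confirmed diagnosis. The sketch below locates the installed package without importing it and applies a simple heuristic based on the interfaces.py file seen in the traceback. Both function names are hypothetical.

```python
import importlib.util
from pathlib import Path


def find_package_dir(name="apex"):
    """Locate a top-level package on disk without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def looks_like_unrelated_apex(package_dir):
    """Heuristic: the package in the traceback ships an interfaces.py file."""
    return (Path(package_dir) / "interfaces.py").is_file()
```

If find_package_dir() returns a directory for which looks_like_unrelated_apex() is true, uninstalling that package before installing NVIDIA's Apex from source would be a reasonable thing to try.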
