The charmanteau-camready repository from vgtomahawk

harmful slurs present in dataset

The presence of harmful slurs severely hampers the usefulness of the dataset—I am unable to easily use it as a teaching resource, because I don't want to expose students to words that might harm them. It's useless to me as a resource for training or evaluating models, since I don't want my models to perpetuate harmful language, nor do I want to positively evaluate my models on their ability to perpetuate such language. (There are slurs in the dataset that have been used in violent ways against me and people like me in particular, and I'm not eager to see those pop up in the output of my own experiments.)

The paper indicates that these portmanteau words were "manually" collected, which makes it seem like they were hand-picked—if that's the case, then it should not affect the integrity of the research if you simply choose to include some examples and leave others out.

(Note that I'm not saying that it's illegitimate to study words with offensive, violent, harmful content—but the dataset should be clearly labeled as containing such content. Moreover, in my opinion there should be well-reasoned, published criteria for deciding what is included in the dataset, so that other researchers can make judgments on whether the dataset is appropriate for their uses, or propose and compare different criteria for creating their own datasets.)

AttributeError: 'list' object has no attribute 'encode'

After downgrading some libraries and fixing some path errors, I'm able to run `python barebones_enc_dec.py HOLDOUTTEST --dynet-mem 7000 --dynet-seed 786786` as described in README_CODE.txt. However, after the network is constructed and trained, the script fails when it tries to save the model:

[dynet] random seed: 786786
[dynet] allocating memory: 7000MB
[dynet] memory allocation done.
Training fold  0
321
321
40
40
['/Users/Macbook/code/Charmanteau-CamReady/Code/language_model/', '/Users/Macbook/code/Charmanteau-CamReady/Code', '/usr/local/lib/python2.7/site-packages', '/Users/Macbook/code/Charmanteau-CamReady/Code', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python27.zip', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-darwin', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-mac', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-tk', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-old', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-dynload', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pgmpy-0.1.6-py2.7.egg', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/dyNET-0.0.0-py2.7-macosx-10.12-x86_64.egg']
MAX_SEQUENCE_LENGTH=  60
MAX_VOCAB_SIZE =  1500
embeddings_dim =  50
Using TensorFlow backend.
--- Loading CMU data
length of cmu data  133784
A couple of samples...
a
a(1)
a's
------------
Ignoring MAX_VOCAB_SIZE
Found vocab size =  48
Printing few sample sequences...
[1 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 1  3 13 27 19 26 15  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0]
[ 1  3 13 21 14 27  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0]
params['embeddings_dim'] =  50
lstm_cell_size=  100
/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/keras/layers/core.py:1206: UserWarning: `TimeDistributedDense` is deprecated, And will be removed on May 1st, 2017. Please use a `Dense` layer instead.
  warnings.warn('`TimeDistributedDense` is deprecated, '
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
inp (InputLayer)                 (None, 59)            0
____________________________________________________________________________________________________
embedding_1 (Embedding)          (None, 59, 50)        2550        inp[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                    (None, 59, 100)       60400       embedding_1[0][0]
____________________________________________________________________________________________________
timedistributeddense_1 (TimeDist (None, 59, 50)        5050        lstm_1[0][0]
====================================================================================================
Total params: 68,000
Trainable params: 68,000
Non-trainable params: 0
____________________________________________________________________________________________________
None
2018-10-11 22:11:23.911843: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911864: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911870: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911876: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Loaded cache
Initializing Blind Cache
Initialized Blind Cache
Size of Blind Cache: 1624
The dy.parameter(...) call is now DEPRECATED.
        There is no longer need to explicitly add parameters to the computation graph.
        Any used parameter will be added automatically.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Saving Model
Traceback (most recent call last):
  File "barebones_enc_dec.py", line 1545, in <module>
    predictor.train(interEpochPrinting=False)
  File "barebones_enc_dec.py", line 1235, in train
    self.save_model()
  File "barebones_enc_dec.py", line 942, in save_model
    self.model.save(self.modelFile,[self.encoder,self.revcoder,self.decoder,self.encoder_params["lookup"],self.decoder_params["lookup"],self.decoder_params["R"],self.decoder_params["bias"]])
  File "_dynet.pyx", line 1448, in _dynet.ParameterCollection.save
  File "_dynet.pyx", line 1505, in _dynet.ParameterCollection.write_to_textfile
AttributeError: 'list' object has no attribute 'encode'
Dumped cache
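The traceback reads like an API mismatch. `save_model` passes a list of builders and parameters as the second argument to `self.model.save(...)`, and the installed DyNet's `ParameterCollection.save` appears to forward that argument to `write_to_textfile`, which calls `.encode()` on it as if it were a string. The sketch below reproduces that failure mode in plain Python, assuming only what the traceback shows; `FakeParameterCollection` and `try_save` are hypothetical names, not part of DyNet.

```python
# Hypothetical stand-in for DyNet's ParameterCollection, NOT the real class.
# It only models what the traceback suggests: save() passes its second
# positional argument to write_to_textfile(), which calls .encode() on it.
class FakeParameterCollection(object):
    def save(self, fname, name=""):
        # The second argument is treated as a string key, not a list of objects.
        return self.write_to_textfile(fname, name)

    def write_to_textfile(self, fname, name=""):
        # str has .encode(); a list does not, hence the AttributeError.
        return fname.encode("utf-8"), name.encode("utf-8")


def try_save(pc, *args):
    """Call pc.save(*args); return "ok" or the AttributeError message."""
    try:
        pc.save(*args)
        return "ok"
    except AttributeError as err:
        return str(err)


pc = FakeParameterCollection()
# Old-style call: filename plus a list of components (placeholder strings here).
print(try_save(pc, "model.txt", ["encoder", "revcoder", "decoder"]))
# prints: 'list' object has no attribute 'encode'

# Filename only: the string key defaults to "" and encodes fine.
print(try_save(pc, "model.txt"))
# prints: ok
```

If the installed DyNet is 2.x, which the `dy.parameter(...)` and `Trainer::update_epoch` deprecation warnings in the log suggest, one likely fix is to replace the call in `save_model` with the module-level `dy.save(self.modelFile, [...])` using the same list, and to restore with `dy.load(self.modelFile, self.model)`, which returns the saved objects in the order they were saved. Pinning the older DyNet release the code was written against is the other option. Both are suggestions to check against the DyNet documentation rather than a fix confirmed by the repository.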
