charmanteau-camready's People

Contributors

harsh19, vgtomahawk

charmanteau-camready's Issues

harmful slurs present in dataset

The presence of harmful slurs severely hampers the usefulness of the dataset. I can't easily use it as a teaching resource, because I don't want to expose students to words that might harm them. It's also useless to me as a resource for training or evaluating models, since I don't want my models to perpetuate harmful language, nor do I want to positively evaluate my models on their ability to perpetuate such language. (There are slurs in the dataset that have been used in violent ways against me and people like me in particular, and I'm not eager to see those pop up in the output of my own experiments.)

The paper indicates that these portmanteau words were "manually" collected, which makes it seem like they were hand-picked. If that's the case, then it should not affect the integrity of the research if you simply choose to include some examples and leave others out.

(Note that I'm not saying that it's illegitimate to study words with offensive, violent, harmful content—but the dataset should be clearly labeled as containing such content. Moreover, in my opinion there should be well-reasoned, published criteria for deciding what is included in the dataset, so that other researchers can make judgments on whether the dataset is appropriate for their uses, or propose and compare different criteria for creating their own datasets.)

AttributeError: 'list' object has no attribute 'encode'

After downgrading some libraries and fixing some path errors, I'm able to run python barebones_enc_dec.py HOLDOUTTEST --dynet-mem 7000 --dynet-seed 786786 as described in README_CODE.txt. However, after constructing and training the network, it fails when attempting to save the model:

[dynet] random seed: 786786
[dynet] allocating memory: 7000MB
[dynet] memory allocation done.
Training fold  0
321
321
40
40
['/Users/Macbook/code/Charmanteau-CamReady/Code/language_model/', '/Users/Macbook/code/Charmanteau-CamReady/Code', '/usr/local/lib/python2.7/site-packages', '/Users/Macbook/code/Charmanteau-CamReady/Code', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python27.zip', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-darwin', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-mac', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-tk', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-old', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/lib-dynload', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pgmpy-0.1.6-py2.7.egg', '/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/dyNET-0.0.0-py2.7-macosx-10.12-x86_64.egg']
MAX_SEQUENCE_LENGTH=  60
MAX_VOCAB_SIZE =  1500
embeddings_dim =  50
Using TensorFlow backend.
--- Loading CMU data
length of cmu data  133784
A couple of samples...
a
a(1)
a's
------------
Ignoring MAX_VOCAB_SIZE
Found vocab size =  48
Printing few sample sequences...
[1 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 1  3 13 27 19 26 15  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0]
[ 1  3 13 21 14 27  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0]
params['embeddings_dim'] =  50
lstm_cell_size=  100
/Users/Macbook/.pyenv/versions/2.7.10/lib/python2.7/site-packages/keras/layers/core.py:1206: UserWarning: `TimeDistributedDense` is deprecated, And will be removed on May 1st, 2017. Please use a `Dense` layer instead.
  warnings.warn('`TimeDistributedDense` is deprecated, '
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
inp (InputLayer)                 (None, 59)            0
____________________________________________________________________________________________________
embedding_1 (Embedding)          (None, 59, 50)        2550        inp[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                    (None, 59, 100)       60400       embedding_1[0][0]
____________________________________________________________________________________________________
timedistributeddense_1 (TimeDist (None, 59, 50)        5050        lstm_1[0][0]
====================================================================================================
Total params: 68,000
Trainable params: 68,000
Non-trainable params: 0
____________________________________________________________________________________________________
None
2018-10-11 22:11:23.911843: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911864: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911870: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-10-11 22:11:23.911876: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Loaded cache
Initializing Blind Cache
Initialized Blind Cache
Size of Blind Cache: 1624
The dy.parameter(...) call is now DEPRECATED.
        There is no longer need to explicitly add parameters to the computation graph.
        Any used parameter will be added automatically.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see https://github.com/clab/dynet/pull/695 for details.
Saving Model
Traceback (most recent call last):
  File "barebones_enc_dec.py", line 1545, in <module>
    predictor.train(interEpochPrinting=False)
  File "barebones_enc_dec.py", line 1235, in train
    self.save_model()
  File "barebones_enc_dec.py", line 942, in save_model
    self.model.save(self.modelFile,[self.encoder,self.revcoder,self.decoder,self.encoder_params["lookup"],self.decoder_params["lookup"],self.decoder_params["R"],self.decoder_params["bias"]])
  File "_dynet.pyx", line 1448, in _dynet.ParameterCollection.save
  File "_dynet.pyx", line 1505, in _dynet.ParameterCollection.write_to_textfile
AttributeError: 'list' object has no attribute 'encode'
Dumped cache
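
A possible reading of the traceback above: the error is raised inside ParameterCollection.save, where something calls .encode on the second argument. Assuming the installed DyNet is a 2.x release whose save method takes a file name followed by an optional string name (rather than a list of components, as the call at line 942 passes), the list would land in the string slot and fail in exactly this way. The sketch below does not use DyNet at all; StandInCollection and try_save are hypothetical names made up for illustration, and the stand-in only mimics the assumed signature to show how the same message arises.

```python
class StandInCollection(object):
    """Hypothetical stand-in for a parameter collection; this is NOT DyNet."""

    def save(self, fname, name="", append=False):
        # Assumed newer-style signature: the second positional argument is
        # expected to be a string key, which is encoded before being written.
        encoded_name = name.encode("utf-8")
        return (fname, encoded_name, append)


def try_save(collection, fname, *extra):
    """Call save() and return either its result or the error message."""
    try:
        return collection.save(fname, *extra)
    except AttributeError as err:
        return str(err)


pc = StandInCollection()

# Old-style call, shaped like the one in the traceback: a list of
# components is passed as the second positional argument.
old_style = try_save(pc, "model.bin", ["encoder", "decoder"])

# New-style call: only the file name is passed.
new_style = try_save(pc, "model.bin")

print(old_style)
print(new_style)
```

If that assumption holds, calling self.model.save(self.modelFile) without the list of components may avoid the error, and the corresponding load code would likely need a matching change. This has not been verified against this repository or a specific DyNet version.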
