neural_name_tagging's People

Contributors

limteng-rpi

neural_name_tagging's Issues

Verifying the effectiveness of this paper's method

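I believe the method in this paper does not work.
I ran experiments on Chinese datasets, using jieba for word segmentation. The TensorFlow code is as follows: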

        with tf.variable_scope('word_char_embedding_combine'):
            with tf.variable_scope('word_freq_conversion'):
                
                word_freq_compress = self.word_freq_placeholder * 0.005
                word_freq_conversion = tf.tanh(word_freq_compress, name='tanh_word_freq')
                self.word_freq_conversion = word_freq_conversion

            #with tf.variable_scope("word_embedding_gate"):
                # word-level parameters
                #word_w = tf.layers.dense(inputs=self.x_word_embedding, units=1, activation=None)
                #char_w = tf.layers.dense(inputs=x_char_cnn_embedding, units=1, activation=None)
            #    freq_w = tf.layers.dense(inputs=word_freq_conversion, units=1, activation=None)
            #    b_w = tf.get_variable(shape=[1], name="b_w")
            #    word_gate = tf.sigmoid(freq_w + b_w + word_w + char_w, name='word_embedding_gate')
            self.word_gate = word_freq_conversion

Please check whether the code above contains any mistakes.

The hidden-layer part is handled the same way.
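
To make the frequency conversion concrete, here is a minimal pure-Python sketch of the same scale-then-tanh mapping, without TensorFlow. The function name `freq_to_reliability` and the sample counts are hypothetical; only the 0.005 factor and the tanh come from the code above. Assuming raw counts are fed in, a count of 1 maps to roughly 0.005, which is consistent with the 0.00499996 logged for `<unk_word>` in the results.

```python
import math

SCALE = 0.005  # compression factor taken from the snippet above


def freq_to_reliability(freq: float) -> float:
    """Map a raw word frequency into [0, 1) via tanh(SCALE * freq)."""
    return math.tanh(SCALE * freq)


# Hypothetical counts, chosen only to show how the curve saturates.
for count in (1, 10, 100, 1000):
    print(count, round(freq_to_reliability(count), 6))
```

With this scaling, the converted value only approaches 1 once a word has been seen on the order of a thousand times.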

Experimental results

['同时', ',']
['同时<pad_char><pad_char>', ',<pad_char><pad_char><pad_char>']
word_freq: [1. 1. ]
word_gate: [0.98816895 0.9967069 ]
char_gate: [0.89430475 0.8514535 ]

['至此', ',']
['至此<pad_char><pad_char>', ',<pad_char><pad_char><pad_char>']
word_freq: [0.9715941 1. ]
word_gate: [0.99966156 0.9967069 ]
char_gate: [0.7587908 0.8514535 ]

['<unk_word>', '…']
['多大仇<pad_char>', '…<pad_char><pad_char><pad_char>']
word_freq: [0.00499996 1. ]
word_gate: [0.9449244 0.9953596 ]
char_gate: [0.7027143 0.87556285 ]

['<unk_word>', ',']
['鳞次栉比', ',<pad_char><pad_char><pad_char>']
word_freq: [0.00499996 1. ]
word_gate: [0.96310204 0.9967069 ]
char_gate: [0.68618333 0.8514535 ]

['<unk_word>', ',']
['正巧<pad_char><pad_char>', ',<pad_char><pad_char><pad_char>']
word_freq: [0.00499996 1. ]
word_gate: [0.7656248 0.9967069 ]
char_gate: [0.7440602 0.8514535 ]

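As you can see, the gate values here have no linear relationship with the frequency of the word itself; they look completely random.
My question is this: if the premise is that the embeddings of low-frequency words are unreliable, then the deciding factor is the word frequency. What does the (inaccurate) word embedding itself have to do with it?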
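
One way to check that the gate is not a function of frequency alone is to group the logged gate values by their `word_freq`. The sketch below copies the first-token values from the log above; the variable names are illustrative only. The three `<unk_word>` tokens share the same `word_freq`, yet their gate values differ by almost 0.2.

```python
from collections import defaultdict

# (word_freq, word_gate) for the first token of each example in the log above.
logged = [
    (1.0, 0.98816895),
    (0.9715941, 0.99966156),
    (0.00499996, 0.9449244),
    (0.00499996, 0.96310204),
    (0.00499996, 0.7656248),
]

# Collect every gate value observed for each distinct frequency.
gates_by_freq = defaultdict(list)
for freq, gate in logged:
    gates_by_freq[freq].append(gate)

# Spread (max - min) of the gate within each frequency group.
spread_by_freq = {f: max(g) - min(g) for f, g in gates_by_freq.items()}
for freq, spread in spread_by_freq.items():
    print(freq, round(spread, 4))
```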

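I then removed the word embeddings from the gate entirely, keeping only the word frequency as input and adding a linear transformation. The findings are:

  • the word-gate values range from 0.77 to 0.93;
  • if the tanh output is 1, i.e. the word embedding is very reliable, the gate takes the value 0.93;
  • if the tanh output is 0.004, i.e. the word embedding is very unreliable, the gate takes the value 0.77.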

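In other words, the gate has very little influence; its range was compressed during training.
The influence at the hidden layer is even smaller, with values between 0.72 and 0.77.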
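
To illustrate how compressed the word-level gate is, the sketch below assumes the frequency-only gate has the form sigmoid(w * t + b), where t is the tanh-converted frequency. The sigmoid is an assumption carried over from the commented-out gate code; the post itself only mentions a linear transformation. Under that assumption, the two reported endpoints (t = 1 giving 0.93, t = 0.004 giving 0.77) are enough to recover w and b. All names here are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Endpoints reported in the post: (tanh-converted frequency, gate value).
t_hi, g_hi = 1.0, 0.93
t_lo, g_lo = 0.004, 0.77

# Solve the two linear equations logit(g) = w * t + b for w and b.
w = (logit(g_hi) - logit(g_lo)) / (t_hi - t_lo)
b = logit(g_hi) - w * t_hi


def freq_only_gate(t: float) -> float:
    """Assumed form of the frequency-only gate: sigmoid(w * t + b)."""
    return sigmoid(w * t + b)


print(round(w, 3), round(b, 3))
```

Under these assumptions w comes out at roughly 1.4 with a bias of about 1.2, so the full range of frequencies moves the gate by only 0.16.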

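The final test results on the People's Daily, MSRA, boson, and other datasets were all unsatisfactory; the method performed worse than simply concatenating the char and word vectors.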

So the paper's method does not work.
