foolnltk's People

Contributors

alanoooaao, forin-xyz, rockyzhengwu

foolnltk's Issues

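Entity recognition in analysis ignores the custom user dictionary

Hello. I have recently been using your code for Chinese word segmentation. After loading a custom dictionary, segmentation handles the custom words correctly, but entity recognition does not appear to use the custom dictionary at all.
I would also like to ask whether the loaded dictionary could support custom part-of-speech tags.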

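A very handy tool, with a few suggestions

Will there be major changes in the future? I hope it stays as simple and practical as it is now.
Also:
Could custom part-of-speech dictionaries be supported?
Could custom entity dictionaries be supported?
Thanks!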

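About generating dev.txt, test.txt, and train.txt in the demo

Hello! word2vec can turn the corpus into vectors of the chosen dimension, but which tool generates the dev.txt, test.txt, and train.txt files in the source code? Is it the 4-tag annotation scheme (is that what the corpus is)? And how can it be extended with new characters?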

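In the ner method of LexicalAnalyzer in /fool/lexical.py, the end index returned seems to be 1 too large

As the title says.
Running the ner call in the tests gives:
ners: [[(2, 8, 'location', '北京***')], [(2, 5, 'location', '北京'), (9, 12, 'location', '非洲')], [], [(2, 8, 'location', '北京***')]]
==> But the index of entities such as '北京***' within the text should actually be (2, 7).
I have not yet checked whether this is intentional.
If the values are meant simply as the word's position within the sentence, the end index appears to be 1 too large.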
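The off-by-one can be checked directly from the reported tuples. The sketch below is only an illustration: `span_excess` is a hypothetical helper, not part of fool, and it assumes the expected convention is a half-open span where `end - start` equals the entity length.

```python
def span_excess(start, end, surface):
    """Return how far (end - start) exceeds the length of the entity text."""
    return (end - start) - len(surface)

# Tuples copied from the reported output: (start, end, label, surface).
reported = [(2, 5, 'location', '北京'), (9, 12, 'location', '非洲')]

# 0 would mean a half-open span [start, end); a positive value means the
# end index lies that many positions past the half-open end.
excess = [span_excess(s, e, surface) for s, e, _label, surface in reported]
```

A value of 1 for every entity would match the observation in this issue that the end index is one larger than expected.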

./main export

[screenshot of the error]

I got this error when running './main export'. How can I fix it?

UnicodeDecodeError on Windows

I'm trying it on Windows and get a UnicodeDecodeError. Could we replace "open(path)" with "codecs.open(path, encoding='utf-8')", e.g. at line 21 of dictionary.py, to avoid this? Thanks.
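A minimal sketch of the suggested change, assuming the dictionary file is UTF-8 encoded. It uses `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3; `read_lines_utf8` and the demo file are hypothetical and only stand in for the real reading code in dictionary.py.

```python
import io
import os
import tempfile

def read_lines_utf8(path):
    """Read a text file as UTF-8 regardless of the platform's default encoding."""
    with io.open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

# Demo file standing in for a user dictionary (hypothetical content).
demo_path = os.path.join(tempfile.mkdtemp(), "user_dict.txt")
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"饿了么 10\n")

lines = read_lines_utf8(demo_path)
```

Without an explicit encoding, `open(path)` falls back to the locale's default, which on Windows is often not UTF-8 and would explain the reported error.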

Cannot install foolnltk

The following error appears:
ERROR: Cannot uninstall 'wrapt'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
The installation fails. Please help.

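Training fails

Hello,
when I train a model on my own corpus, the following error occurs:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [6299,100] rhs shape= [1104,100]
(This happens at the train step; the earlier tfrecord step completed.)

How can I solve this?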

CRF

Which features does the CRF use for the part-of-speech tagging task?
If I train my own model, which features does the default CRF use?

Python 2.7 support

Only one change is needed, in model.py under the fool directory. Change
def load_map(path):
    with open(path, 'rb') as f:
        char_to_id, tag_to_id, id_to_tag = pickle.load(f)
    return char_to_id, id_to_tag
to:
def load_map(path):
    with open(path) as f:
        char_to_id, tag_to_id, id_to_tag = pickle.load(f)
    return char_to_id, id_to_tag
and that is all.

Is char_embeding misaligned?

I see that you prepend a row of zero vectors to char_embeding, but I found that this shifts the indices at lookup time.

        # zero_pad = tf.constant(0.0, dtype=tf.float32, shape=[1, config["char_dim"]])
        # self.char_embeding = tf.concat(axis=0, values=[zero_pad, tf.get_variable(name="char_embeding", initializer=embeddings)])
        self.char_embeding = tf.get_variable(name="char_embeding", initializer=embeddings)

@rockyzhengwu
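The shift can be reproduced with plain lists. This is a toy illustration only: `lookup` is a hypothetical stand-in for an embedding lookup, and the three rows are made-up values, not the project's embeddings.

```python
def lookup(table, ids):
    """Return the rows of `table` selected by `ids`, like an embedding lookup."""
    return [table[i] for i in ids]

embeddings = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # toy rows for ids 0, 1, 2
zero_pad = [[0.0, 0.0]]
padded = zero_pad + embeddings  # same effect as concatenating along axis 0

ids = [0, 2]
plain = lookup(embeddings, ids)   # rows originally assigned to ids 0 and 2
shifted = lookup(padded, ids)     # zero row for id 0, row of former id 1 for id 2
```

Whether the shift is a problem depends on how the character ids were assigned: if id 0 is reserved for padding when the vocabulary is built, the zero row lines up; if ids start at 0 for real characters, every lookup is off by one row.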

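Which NER categories are there, and what does each mean?

I tried it and the accuracy seems quite high, thumbs up!
But I have a question: I do not know which NER categories this project recognizes or what each one means exactly. For example, does the product entity refer to an actual product such as a car model, or does it cover anything created by people, such as a book? A clear definition would be appreciated. Thanks!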

Please upload map.zip

The project raises the following error at runtime:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/fool/map.zip'
Looking at the code, these files may also need to be loaded:
def _load_seg_model(self):
    self.seg_model = self._load_model("seg.pb", "char_map", "seg_map")

def _load_pos_model(self):
    self.pos_model = self._load_model("pos.pb", "word_map", "pos_map")

def _load_ner_model(self):
    self.ner_model = self._load_model("ner.pb", "char_map", "ner_map")

Could the author please provide the relevant download package? Much appreciated.

What is the license of the FoolNLTK source code, model, and corpus?

Hi all,

I want to integrate FoolNLTK into my project for personal use, and possibly in a commercial product.

So, what is the license of the FoolNLTK source code, model, and corpus?

Can the source code, model, and corpus be licensed separately?

Thanks in advance.

user dict has no effect in some cases

The user dict has an obvious effect in most cases, but not always. For example, with these contents of user_dict.txt:
二十 10000
四百 10000
一千 10000
running the command:
echo "一一千四百二十九" | python3 -m fool -d -u user_dict.txt
gives "一一千四百二十九", which is not segmented at all.
But if the sentence is "一千四百二十九", it is segmented into "一千 四百 二十 九", which is what I want.
How can these results be explained, and what can I do to get "一 一千 四百 二十 九"?

TypeError: the JSON object must be str, not 'bytes'

I ran the example in the README to segment a Chinese sentence, but a TypeError occurred as follows:

$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fool
>>> text="一个傻子在北京"
>>> print(fool.cut(text))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/fool/__init__.py", line 65, in cut
    all_words = LEXICAL_ANALYSER.cut(text)
  File "/usr/local/lib/python3.5/dist-packages/fool/lexical.py", line 98, in cut
    self._load_seg_model()
  File "/usr/local/lib/python3.5/dist-packages/fool/lexical.py", line 39, in _load_seg_model
    self.seg_model = self._load_model("seg.pb", "char_map", "seg_map")
  File "/usr/local/lib/python3.5/dist-packages/fool/lexical.py", line 35, in _load_model
    char_to_id, id_to_seg = _load_map_file(self.map_file_path, word_map_name, tag_name)
  File "/usr/local/lib/python3.5/dist-packages/fool/lexical.py", line 18, in _load_map_file
    data = json.load(myfile)
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

How to handle this error? Many thanks !

FYI:

  • foolnltk version: 0.1.1 (installed with sudo pip3 install foolnltk --no-cache-dir)
  • Python 3.5
  • Ubuntu 16.04 (x64)
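On Python 3.5, `json.loads` accepts only `str`; support for `bytes` input arrived in Python 3.6. A possible workaround is to decode the bytes before parsing. The sketch below assumes the map files are UTF-8 JSON stored inside a zip archive; `load_json_from_zip` and the member name `char_map.json` are hypothetical and do not reproduce fool's actual loader.

```python
import io
import json
import zipfile

def load_json_from_zip(zip_file, member_name):
    """Load a JSON member from a zip archive, decoding bytes to str first."""
    with zipfile.ZipFile(zip_file) as zf:
        with zf.open(member_name) as member:
            # member.read() returns bytes; decoding keeps Python 3.5 happy.
            return json.loads(member.read().decode("utf-8"))

# Build a tiny in-memory archive so the sketch runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("char_map.json", json.dumps({"一": 1, "个": 2}))
buffer.seek(0)

char_map = load_json_from_zip(buffer, "char_map.json")
```

Upgrading to Python 3.6 or later would likely avoid the error without code changes, since `json.load` then accepts binary file objects.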

Part-of-speech tag table

Hello, which standard part-of-speech tag set do you use for POS tagging?

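Loading a user dictionary has no effect, and an entity is not recognized

Hello. While using foolnltk I found that loading a user dictionary has no effect, and I do not know why. Details:
Environment: win10 + python3.6

fool.analysis('阿里收购饿了么')
Returns: ([[('阿里', 'nz'), ('收购', 'v'), ('饿', 'v'), ('了', 'y'), ('么', 'y')]], [[(0, 3, 'company', '阿里')]])

User dictionary format:
饿了么 10

fool.load_userdict(path)
fool.analysis('阿里收购饿了么')
Returns: ([[('阿里', 'nz'), ('收购', 'v'), ('饿', 'v'), ('了', 'y'), ('么', 'y')]], [[(0, 3, 'company', '阿里')]])

Loading the user dictionary seems to have no effect. "饿了么" is still split during segmentation, and it is not recognized as an entity either.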

./main train

When I train the model myself, how do I assign a specific GPU to use?
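One general approach, independent of this project, is the `CUDA_VISIBLE_DEVICES` environment variable, which limits the GPUs a CUDA process can see. A minimal sketch, assuming it runs before TensorFlow is imported; the index "0" is only an example:

```python
import os

# Must be set before TensorFlow is imported so the process sees only this GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # "0" is the index of the first GPU

# import tensorflow as tf  # import afterwards
```

The same variable can be set in the shell that launches the training command. Whether './main train' offers its own GPU option is not stated in this thread.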

Suppressing warnings

How can I suppress these annoying warnings? Note that fool itself also triggers warnings, which deserves attention.

np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING: Logging before flag parsing goes to stderr.
W0804 17:59:34.962490 140736280798080 deprecation_wrapper.py:119] From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fool/predictor.py:32: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W0804 17:59:34.962846 140736280798080 deprecation_wrapper.py:119] From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fool/predictor.py:33: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

W0804 17:59:35.032593 140736280798080 deprecation_wrapper.py:119] From /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fool/predictor.py:53: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-08-04 17:59:35.033006: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Run error: text = ""

import fool

text = ""
print(fool.cut(text))

/Users/howie/anaconda3/envs/python36/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
starting load model 
2017-12-28 21:16:02.363773: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
loaded model cost : 1.258011s
Traceback (most recent call last):
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnimplementedError: TensorArray has size zero, but element shape [?,100] is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
	 [[Node: prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArrayStack/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:@prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArray"], dtype=DT_FLOAT, element_shape=[?,100], _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArray, prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArrayStack/range, prefix/char_BiLSTM/bidirectional_rnn/bw/bw/while/Exit_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/howie/Documents/programming/python36/hi/01.py", line 10, in <module>
    print(fool.cut(text))
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/fool/__init__.py", line 37, in cut
    words, _, _ = LEXICAL_ANALYSER.cut(text)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/fool/lexical.py", line 121, in cut
    seg_path = self.seg_model.predict(input_chars)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/fool/predictor.py", line 59, in predict
    logits, trans = self.sess.run([self.logits, self.trans], feed_dict=feed_dict)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnimplementedError: TensorArray has size zero, but element shape [?,100] is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
	 [[Node: prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArrayStack/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:@prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArray"], dtype=DT_FLOAT, element_shape=[?,100], _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArray, prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArrayStack/range, prefix/char_BiLSTM/bidirectional_rnn/bw/bw/while/Exit_1)]]

Caused by op 'prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArrayStack/TensorArrayGatherV3', defined at:
  File "/Users/howie/Documents/programming/python36/hi/01.py", line 10, in <module>
    print(fool.cut(text))
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/fool/__init__.py", line 36, in cut
    _check_model()
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/fool/__init__.py", line 26, in _check_model
    LEXICAL_ANALYSER.load_model()
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/fool/lexical.py", line 78, in load_model
    self.seg_model = Predictor(os.path.join(data_path, "seg.pb"), self.map.num_seg)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/fool/predictor.py", line 42, in __init__
    self.graph = load_graph(model_path)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/fool/predictor.py", line 36, in load_graph
    tf.import_graph_def(graph_def, name="prefix")
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 313, in import_graph_def
    op_def=op_def)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/Users/howie/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

UnimplementedError (see above for traceback): TensorArray has size zero, but element shape [?,100] is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
	 [[Node: prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArrayStack/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:@prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArray"], dtype=DT_FLOAT, element_shape=[?,100], _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArray, prefix/char_BiLSTM/bidirectional_rnn/bw/bw/TensorArrayStack/range, prefix/char_BiLSTM/bidirectional_rnn/bw/bw/while/Exit_1)]]
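Until empty input is handled inside the library, a caller can guard against it. This is a sketch only: `safe_cut` is a hypothetical wrapper, `fake_cut` is a stand-in so the example runs without the model, and returning an empty list for empty input is an assumption about what callers would want.

```python
def safe_cut(text, cut_fn):
    """Call cut_fn only when text contains something other than whitespace."""
    if not text or not text.strip():
        return []
    return cut_fn(text)

# Stand-in segmenter for demonstration; in practice one would pass fool.cut.
def fake_cut(text):
    return [list(text)]

empty_result = safe_cut("", fake_cut)
normal_result = safe_cut("北京", fake_cut)
```

With this guard the zero-length sequence never reaches the BiLSTM graph, which is where the traceback above shows the UnimplementedError being raised.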

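Model files

Hello, I would like to ask: if I do not want to train a model for now, do your files include a pretrained model? I would like to use it directly for a segmentation test.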

Error when trying to load exported model

I trained the model and exported it following the instructions.

I wrote a script to load the model as described in the "load model" section.
The error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 1: invalid start byte"
occurred at this line:
fool.load_model(map_file=map_file, model_file=checkpoint_file)

I fixed it by changing load_graph in fool/model.py from
with tf.gfile.GFile(path) as f:
to
with tf.gfile.GFile(path, 'rb') as f:

Please fix it, thank you!
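The difference between the two modes can be reproduced with the built-in `open`, used here as a stand-in for `tf.gfile.GFile`. The four bytes are made up for the demonstration; the only property that matters is that 0xf7 can never start a UTF-8 character.

```python
import os
import tempfile

# Made-up binary payload with 0xf7 at position 1.
payload = b"\x0a\xf7\x01\x02"
path = os.path.join(tempfile.mkdtemp(), "model.pb")
with open(path, "wb") as f:
    f.write(payload)

try:
    with open(path, encoding="utf-8") as f:  # text mode tries to decode the bytes
        f.read()
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = exc

with open(path, "rb") as f:  # binary mode returns the bytes untouched
    data = f.read()
```

A serialized graph is binary data, so reading it in binary mode, as in the fix above, avoids any decoding step.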

A question

@rockyzhengwu
I have a question about the following code:

        with tf.variable_scope("crf_loss" if not name else name):
            small = -1000.0
            start_logits = tf.concat(
                [small * tf.ones(shape=[self.batch_size, 1, self.num_tags]), tf.zeros(shape=[self.batch_size, 1, 1])],
                axis=-1)

            pad_logits = tf.cast(small * tf.ones([self.batch_size, self.num_steps, 1]), tf.float32)
            logits = tf.concat([project_logits, pad_logits], axis=-1)
            logits = tf.concat([start_logits, logits], axis=1)
            targets = tf.concat(
                [tf.cast(self.num_tags * tf.ones([self.batch_size, 1]), tf.int32), self.targets], axis=-1)

            self.trans = tf.get_variable(
                "transitions",
                shape=[self.num_tags + 1, self.num_tags + 1],
                initializer=self.initializer)

            log_likelihood, self.trans = crf_log_likelihood(
                inputs=logits,
                tag_indices=targets,
                transition_params=self.trans,
                sequence_lengths=lengths + 1)

In the CRF layer, what is the purpose of padding the predictions and the targets with an extra dimension filled with -1000?
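One common reading of this pattern, offered as an interpretation and not confirmed by the author in this thread, is that it adds an artificial start tag: an extra time step at the front where only the new tag is allowed, so the CRF can learn transition scores from "start" into each real tag. The value -1000 acts like log(0). The plain-Python sketch below mirrors the tensor operations for a single sentence; `pad_with_start_tag` is a hypothetical name.

```python
SMALL = -1000.0  # stands in for log(0): a score so low the tag is never chosen

def pad_with_start_tag(project_logits, targets):
    """project_logits: [num_steps][num_tags] scores for one sentence.
    targets: [num_steps] gold tag ids. Returns the padded (logits, targets)."""
    num_tags = len(project_logits[0])
    start_tag = num_tags  # the extra tag takes the next free id
    # Step 0: every real tag is blocked, only the start tag is allowed.
    start_row = [SMALL] * num_tags + [0.0]
    # Real steps: keep the model's scores and block the start tag.
    padded_rows = [row + [SMALL] for row in project_logits]
    return [start_row] + padded_rows, [start_tag] + targets

logits, targets = pad_with_start_tag([[0.2, 1.5], [0.9, 0.1]], [1, 0])
```

This also matches the other details in the quoted code: the transition matrix has `num_tags + 1` rows and columns, and the sequence lengths are increased by 1 to account for the extra step.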

TypeError: 'NoneType' object is not callable

Hello, during testing I sometimes get:
TypeError: 'NoneType' object is not callable
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x0000000019C8A208>>
Running it again works fine. My platform is tensorflow 1.4.0 on win7. Is this a problem with my versions?

Run error on macOS

/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/altman/IdeaProjects/gopath/src/labs/fool_test/test.py
/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
starting load model
Traceback (most recent call last):
  File "/Users/altman/IdeaProjects/gopath/src/labs/fool_test/test.py", line 4, in <module>
    print(fool.cut(text))
  File "/usr/local/lib/python3.6/site-packages/fool/__init__.py", line 36, in cut
    _check_model()
  File "/usr/local/lib/python3.6/site-packages/fool/__init__.py", line 26, in _check_model
    LEXICAL_ANALYSER.load_model()
  File "/usr/local/lib/python3.6/site-packages/fool/lexical.py", line 75, in load_model
    self.map = DataMap(os.path.join(data_path, "maps.pkl"))
  File "/usr/local/lib/python3.6/site-packages/fool/lexical.py", line 20, in __init__
    self._load(path)
  File "/usr/local/lib/python3.6/site-packages/fool/lexical.py", line 26, in _load
    with open(path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/fool/maps.pkl'

Process finished with exit code 1

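About the structure of the segmentation and NER results

In earlier versions, fool.analysis and fool.cut both returned a flat list. Why do they now (0.1.3) return a nested list? Does this serve a particular purpose?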
