Comments (6)
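Oh, it looks like self.tokenizer.tokenize(triple[0])[1:-1] is there to strip the [CLS] and [SEP] tokens. So is this tokenizer behavior a bug? But if I switch to a different tokenizer, the prediction part of the code also returns results that are one character short, so I'm not sure what the original reasoning was.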
from casrel.
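Question 1: the tokenizer in the code appears to be written for English tokens. It applies WordPiece to each word and replaces the spaces between words with [unused1]. With Chinese text you get the behavior you describe, so the tokenizer still needs to be rewritten for Chinese.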
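To make the point above concrete, here is a minimal toy sketch, not the repository's tokenizer. `toy_tokenize` and `toy_tokenize_chars` are hypothetical names, and the sketch assumes the [unused1] marker is emitted after each whitespace-delimited word. It shows why unsegmented Chinese text ends up as a single "word", and one possible character-level workaround.

```python
def toy_tokenize(text):
    """Toy stand-in for an English-oriented tokenizer (not the repository's code)."""
    tokens = ["[CLS]"]
    for word in text.strip().split():  # words are found by whitespace only
        tokens.append(word)            # a real tokenizer would apply WordPiece here
        tokens.append("[unused1]")     # assumed marker standing in for the space
    tokens.append("[SEP]")
    return tokens


def toy_tokenize_chars(text):
    """Hypothetical Chinese-friendly variant: treat each character as a word."""
    return toy_tokenize(" ".join(ch for ch in text if not ch.isspace()))


english = toy_tokenize("New York City")
chinese = toy_tokenize("北京大学")             # no spaces, so one long "word"
chinese_chars = toy_tokenize_chars("北京大学")  # one token per character

print(english)
print(chinese)
print(chinese_chars)
```

Whether a per-character split is the right choice for a given Chinese dataset is a judgment call; it is only the simplest way to give the marker scheme something to work with.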
from casrel.
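I agree with @Phoeby2618. I tried three setups:
(1) Split the Chinese text into a space-separated, English-like format and use the HBTokenizer from the code.
(2) Keep the original Chinese text, use the stock Tokenizer with [unused1] added, and adjust ' '.join(sub.split('[unused1]')) in the metric function to match.
(3) Keep the original Chinese text, use the stock Tokenizer without [unused1], with the same metric change.
The first two give similar results. With the last one, the number of predicted relation entities is always 0. It seems [unused1] can't simply be dropped, but I haven't worked out why yet.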
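One possible explanation, offered only as a sketch: assume the decoding step takes tokens[head:tail] with the tail position excluded, because the tail is expected to point at a trailing [unused1]. Under that assumption, removing the marker makes every decoded entity lose its last token. `decode_span` and its `joiner` parameter are hypothetical; only the ' '.join(sub.split('[unused1]')) step comes from the thread.

```python
def decode_span(tokens, head, tail, joiner=" "):
    """Rebuild entity text from token positions; the tail position is excluded."""
    pieces = tokens[head:tail]  # assumption: tail points at a trailing [unused1]
    # Glue WordPiece continuation pieces ("##xx") back onto the previous piece.
    text = "".join(p[2:] if p.startswith("##") else p for p in pieces)
    # Same idea as ' '.join(sub.split('[unused1]')), with a configurable joiner.
    return joiner.join(text.split("[unused1]")).strip()


# English with markers: the entity "New York" spans positions 1..4.
eng = ["[CLS]", "New", "[unused1]", "York", "[unused1]", "[SEP]"]
eng_text = decode_span(eng, 1, 4)

# Chinese with markers: joining on "" avoids spaces between characters.
zh_marked = ["[CLS]", "北", "[unused1]", "京", "[unused1]", "[SEP]"]
zh_marked_text = decode_span(zh_marked, 1, 4, joiner="")

# Chinese without markers: the tail now points at a real character, which is dropped.
zh_plain = ["[CLS]", "北", "京", "[SEP]"]
zh_plain_text = decode_span(zh_plain, 1, 2, joiner="")

print(eng_text, zh_marked_text, zh_plain_text)
```

If this assumption holds, a single-character entity would decode to an empty string in setup (3), which would be consistent with the missing character and the empty predictions reported above. It is not confirmed against the repository.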
from casrel.
I've just run into this problem too and plan to try method (2) that you described above. Have you found a better approach since then?
from casrel.
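Here self.tokenizer.tokenize(triple[0])[1:-1] is indeed meant to strip the leading [CLS] tag and the trailing [SEP], both of which the function adds internally. There is one catch, though: if an entity token is not in the vocabulary, it gets split into multiple sub-tokens.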
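A small self-contained sketch of both effects, using a toy vocabulary and a simplified greedy WordPiece split rather than the real tokenizer. All names here (`word_piece`, `tokenize`, `find_head_idx`, `VOCAB`) are illustrative.

```python
def word_piece(word, vocab):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        if end == start:  # nothing in the vocabulary matched at this position
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """Tokenize a string and wrap it in [CLS] ... [SEP], as BERT-style tokenizers do."""
    tokens = ["[CLS]"]
    for word in text.split():
        tokens += word_piece(word, vocab)
    tokens.append("[SEP]")
    return tokens


def find_head_idx(tokens, entity_tokens):
    """Return the index where entity_tokens first occurs in tokens, or -1."""
    n = len(entity_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            return i
    return -1


VOCAB = {"paris", "is", "in", "fran", "##ce"}  # toy vocabulary without "france"
sentence = tokenize("paris is in france", VOCAB)
entity = tokenize("france", VOCAB)[1:-1]            # drop [CLS] and [SEP]
head = find_head_idx(sentence, entity)
unsliced = find_head_idx(sentence, tokenize("france", VOCAB))

print(sentence, entity, head, unsliced)
```

Without the [1:-1] slice the entity tokens still carry [CLS] and [SEP] and cannot be matched inside the sentence. With it, the out-of-vocabulary entity matches, but it spans two sub-tokens instead of one.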
from casrel.
Could anyone share their code for handling Chinese data, or explain what needs to be changed?
from casrel.