Comments (3)
Could someone explain how an article, once cut into k segments, is fed in for training? I couldn't find where the segments are "separately fed into the language model". If that's how it works, then in theory an article of any length could be cut into many segments and processed separately, with no need to truncate it at all?
The forward of BertForSequenceClassification:
def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None,
            position_ids=None, head_mask=None):
    flat_input_ids = input_ids.view(-1, input_ids.size(-1))
    flat_position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None
    flat_token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
    flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
For example with k=2, input_ids contains the two segments of the article, but view flattens it again, so hasn't the input length stayed the same? Isn't that just like not splitting at all?
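What the flattening obscures is that view only merges the batch and segment dimensions; the per-segment sequence length is untouched. A minimal sketch of the shapes (assuming a hypothetical batch of 2 articles, k=2 segments, max_len=128; not the repo's actual sizes):

```python
import torch

# Hypothetical input: 2 articles, each cut into k=2 segments of max_len=128 tokens.
input_ids = torch.randint(0, 1000, (2, 2, 128))  # (batch, k, max_len)

# The same flattening done in forward(): merge the batch and segment dims.
flat_input_ids = input_ids.view(-1, input_ids.size(-1))

print(flat_input_ids.shape)  # torch.Size([4, 128])
```

Each row BERT sees is still only max_len=128 tokens long; the k segments simply become extra rows in the batch, so no single forward pass exceeds the length limit.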
https://github.com/guoday/CCF-BDCI-Sentiment-Analysis-Baseline/blob/master/pytorch_transformers/modeling_bert.py
The operation is at lines 995-1004: the k segments are reshaped back and then passed through a GRU.
from ccf-bdci-sentiment-analysis-baseline.
About "the operation is at lines 995-1004, reshaping the k segments back and passing them through a GRU": you mean reshaping back into the original k segments and then running the GRU, right? That part I understand. What I'm still unclear about is how a sample, once cut into k segments, is fed into the BERT model. Does it effectively become k samples (all with the same label)? In output = pooled_output.reshape(input_ids.size(0), input_ids.size(1), -1).contiguous(), is pooled_output the output of the k segments?
Suppose there are m samples to start with and each is cut into k segments; the input is then (m*k, max_len). Feed that into BERT to get (m*k, max_len, dim), take the [CLS] vector to get (m*k, dim), then reshape to (m, k, dim), pass it through the GRU, and you get (m, dim).
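The pipeline described above can be sketched in a few lines. This is an illustrative sketch, not the repo's code: a random tensor stands in for BERT's pooled [CLS] outputs, and the sizes (m=4, k=3, dim=768) and the GRU configuration are assumptions:

```python
import torch
import torch.nn as nn

m, k, max_len, dim = 4, 3, 128, 768  # hypothetical sizes

# Stand-in for BERT's pooled [CLS] output over the m*k flattened segments.
pooled_output = torch.randn(m * k, dim)                # (m*k, dim)

# Reshape back so each article's k segment vectors form a sequence.
output = pooled_output.reshape(m, k, -1).contiguous()  # (m, k, dim)

# Run the segment sequence through a GRU and keep the last hidden state.
gru = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)
_, h_n = gru(output)                                   # h_n: (1, m, dim)
doc_repr = h_n.squeeze(0)                              # (m, dim), one vector per article

print(doc_repr.shape)  # torch.Size([4, 768])
```

So each article contributes k rows to BERT's batch, and only after pooling are those rows gathered back into a per-article sequence for the GRU.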
Related Issues (16)
- RuntimeError: set_storage is not allowed on Tensor created from .data or .detach() HOT 4
- How does the BERT output layer fuse the truncated segments? HOT 2
- In the classification demo, is an LSTM layer added after BERT? HOT 6
- Question: can these models run on a laptop without a GPU? HOT 1
- Could you share the data? Thanks. The website no longer allows downloading. HOT 1
- /home/ming/anaconda3/lib/python3.7/site-packages/sklearn/metrics/classification.py:1439: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true samples. 'recall', 'true', average, warn_for) test 0.06457949662369551 HOT 5
- If I don't want to split the text and want to feed it in whole, what should I change? Is changing the parameters enough?
- Since the test set labels are all dummy 0s, why load them into the dataloader? HOT 5
- Error when running robera-english
- Results with XLNet_zh_Large HOT 1
- Can it run on CPU, and how to set that up? HOT 2
- Difference between Roberta-large and Roberta-mid HOT 3
- Why does robert.sh use run_bert.py? HOT 2
- I don't understand what cls is here, or this line: return super(BertTokenizer, cls)._from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
- run_bert.py computes eval_loss incorrectly