
Comments (15)

nitkannen commented on September 3, 2024

Hi,
@Tribleave Thanks for your reply. What is your recommendation for the optimal number of epochs to train BERT or T5? It looks like performance peaks after a certain number of epochs but degrades if the contrastive loss is allowed to decrease further. Any idea why this happens? Thanks in advance!
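
For context, a minimal sketch of a SupCon-style supervised contrastive loss of the kind discussed in this thread, in PyTorch (the function name and temperature here are illustrative assumptions, not taken from the SCAPT codebase):

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss over a batch of sentence embeddings.

    embeddings: (batch, dim), assumed L2-normalized; labels: (batch,)
    sentiment labels. Positives are same-label samples in the batch.
    """
    sim = embeddings @ embeddings.t() / temperature
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, float("-inf"))   # exclude self-pairs
    log_prob = F.log_softmax(sim, dim=1)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)      # avoid division by zero
    # Average log-probability over each sample's positives, then the batch.
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count
    return loss.mean()
```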


Tribleave commented on September 3, 2024

Hi, regarding problems that can arise during pre-training, I'd like to know your experimental environment and batch size. If you changed anything, please check whether the loss is decreasing and converging.
As for checkpoints, we fine-tune from all checkpoints in the last 4 epochs and pick the best model; I suspect this is one reason your results differ. In our experiments we also found that, because the fine-tuning datasets are quite small, performance fluctuates across checkpoints.
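
As a rough illustration of this selection strategy (a sketch only; `finetune` and `evaluate` below are hypothetical stand-ins for the repo's actual fine-tuning and dev-set evaluation routines):

```python
from pathlib import Path

def select_checkpoint(ckpt_dir, finetune, evaluate):
    """Fine-tune from every saved pre-training checkpoint and keep the best.

    `finetune` and `evaluate` are caller-supplied callables (hypothetical
    stand-ins here); in practice the candidates would be the checkpoints
    saved during the last 4 pre-training epochs.
    """
    best_acc, best_ckpt = float("-inf"), None
    for ckpt in sorted(Path(ckpt_dir).glob("*.pt")):
        model = finetune(ckpt)    # fine-tune starting from this checkpoint
        acc = evaluate(model)     # dev-set accuracy after fine-tuning
        if acc > best_acc:
            best_acc, best_ckpt = acc, ckpt
    return best_ckpt, best_acc
```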


CreaterLL commented on September 3, 2024


Tribleave commented on September 3, 2024

Hi, that is indeed our model selection strategy. In our experiments we also found that the last checkpoint (or the last few) is not necessarily the best.
I have updated the code for the issue you reported, thanks for the feedback. That was an oversight of mine while cleaning up the code!


CreaterLL commented on September 3, 2024


StudentWorker commented on September 3, 2024

So this configuration does run for you? I keep getting out-of-GPU-memory errors.


StudentWorker commented on September 3, 2024

I'd like to know how much GPU memory this needs to hold up. I'm currently on a GTX 1060 with 3 GB.


CreaterLL commented on September 3, 2024

I'd like to know how much GPU memory this needs to hold up. I'm currently on a GTX 1060 with 3 GB.

I still haven't reproduced the results in the paper.
I wouldn't recommend running this configuration on your setup; 3 GB of GPU memory probably won't be enough.


Tribleave commented on September 3, 2024

I'd like to know how much GPU memory this needs to hold up. I'm currently on a GTX 1060 with 3 GB.

Our experiments ran on an RTX 3090; pre-training with 3 GB of GPU memory is not really feasible.


ljzsky commented on September 3, 2024

I didn't change any configuration in the code. How many steps of pre-training on YELP does it take before fine-tuning on the res14 dataset reaches the results in the paper? I tried the checkpoints saved every 20,000 steps; the best, at 140,000 steps, reached 86.7%, about 1% higher than without pre-training.


Tribleave commented on September 3, 2024

I didn't change any configuration in the code. How many steps of pre-training on YELP does it take before fine-tuning on the res14 dataset reaches the results in the paper? I tried the checkpoints saved every 20,000 steps; the best, at 140,000 steps, reached 86.7%, about 1% higher than without pre-training.

In our experiments, BERT pre-trains fairly quickly under SCAPT; the released YELP model is the one we obtained at the 5th epoch. An improvement of only 1% over no pre-training suggests something is off.


nitkannen commented on September 3, 2024

Hi @Tribleave,
Hope you are doing well! I have a question about how the sentence-level sentiment labels of the MAMS dataset were decided during contrastive pre-training. I'm asking because a MAMS sentence is not necessarily only positive or negative; it can contain multiple sentiments for different aspects, right? So how was a sentence label decided in such cases so that the contrastive loss could be minimized? I'd love to know your view. Thanks!


Tribleave commented on September 3, 2024

Hi @Tribleave, Hope you are doing well! I have a question about how the sentence-level sentiment labels of the MAMS dataset were decided during contrastive pre-training. I'm asking because a MAMS sentence is not necessarily only positive or negative; it can contain multiple sentiments for different aspects, right? So how was a sentence label decided in such cases so that the contrastive loss could be minimized? I'd love to know your view. Thanks!

Hi @nitkannen
It's an interesting question, because our pre-training method is applied to a sentence-level corpus, yet our models still work well on MAMS. We guess there are several sources of aspect-level knowledge:

  1. The masked aspect prediction objective. A model pre-trained with this task becomes sensitive to the aspects in reviews (see the sketch after this list).
  2. Aspect-aware fine-tuning. This brings aspect information into the classification, and we think it is the main reason the model can handle the multi-aspect scenario.
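
A minimal sketch of what a masked aspect prediction batch could look like in PyTorch (illustrative only; the function and argument names here are assumptions, not taken from the SCAPT code):

```python
import torch

def mask_aspect_tokens(input_ids, aspect_mask, mask_token_id):
    """Build a masked-aspect-prediction batch.

    input_ids: (batch, seq_len) token ids; aspect_mask: a boolean tensor of
    the same shape marking aspect tokens. Returns masked inputs plus labels,
    where -100 is PyTorch's cross-entropy ignore index for non-aspect tokens.
    """
    labels = torch.full_like(input_ids, -100)      # ignore non-aspect positions
    labels[aspect_mask] = input_ids[aspect_mask]   # predict only aspect tokens
    masked = input_ids.clone()
    masked[aspect_mask] = mask_token_id            # hide the aspects from the model
    return masked, labels
```

Feeding `masked` to an MLM head and computing cross-entropy against `labels` would then penalize the model only on the aspect positions.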


nitkannen commented on September 3, 2024

Hi @Tribleave
Thanks for your reply! It makes sense how the model learns aspect-level information. Could you specify how you collected the sentence-level sentiment corpus for pre-training? In the paper it is mentioned that SemEval-14 and MAMS data were used, but these are not sentence-level corpora, right? Were sentences with a single sentiment picked to construct the corpus? Thanks.


Tribleave commented on September 3, 2024

Hi @nitkannen,
Sorry for the late reply.

Our pre-training corpus for SemEval Restaurant and MAMS is collected from YELP, and we use a corpus collected from Amazon for SemEval Laptop (see https://github.com/Tribleave/SCAPT-ABSA#for-pre-training). The ABSA datasets themselves are not used in pre-training.

As you mentioned, the original YELP and Amazon reviews are document-level; we preprocessed them into sentence-level data. You can find the details in the paper, Sec. 4, the "Retrieved External Corpora" part (I think they are clear enough).
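
For illustration, a rough sketch of splitting document-level YELP reviews into sentence-level examples (assumptions: the YELP academic dataset's JSON-lines format with `stars` and `text` fields, and a crude star-rating rule for the weak sentence label; the actual retrieval and filtering described in Sec. 4 of the paper is more involved):

```python
import json
import re

def split_reviews(path):
    """Yield sentence-level examples from document-level YELP reviews.

    Assumes one JSON review per line with `stars` and `text` fields; the
    star-rating-to-label rule below is a simplifying assumption.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            review = json.loads(line)
            if review["stars"] == 3:          # skip ambiguous ratings
                continue
            label = "positive" if review["stars"] >= 4 else "negative"
            # Naive sentence split on terminal punctuation.
            for sent in re.split(r"(?<=[.!?])\s+", review["text"]):
                if sent.strip():
                    yield {"text": sent.strip(), "label": label}
```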

Thanks for your question, and feel free to ask if anything is still unclear.

