
instructabsa's People

Contributors

him1411, kevinscaria, siddharth2011, srsawant34


instructabsa's Issues

Experimental reproduction results

I tried to build the Laptop and Rest14 datasets, then ran aspect term extraction experiments on Laptop. I used two different conditions to determine a true positive in the `get_metrics` function, with the following results:
Complete (exact) match:

if pred_val.lower() == gt_val.lower():

Incomplete (partial) match:

if pred_val.lower() in gt_val.lower() or gt_val.lower() in pred_val.lower():

Code from `utils.py`:

for gt_val in gt_list:
    for pred_val in pred_list:
        # Complete (exact) match condition:
        # if pred_val.lower() == gt_val.lower():
        # Incomplete (partial, substring) match condition:
        if pred_val.lower() in gt_val.lower() or gt_val.lower() in pred_val.lower():
            tp += 1
            break

[image: experimental results under the two conditions]
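For context, here is a minimal, self-contained sketch of how the two conditions change the micro-F1 (the function name and the F1 bookkeeping are my own, not the repo's exact code):

def f1_under_condition(gt_lists, pred_lists, exact=True):
    """Micro-averaged F1; each element of the inputs is a list of aspect terms."""
    tp, n_gt, n_pred = 0, 0, 0
    for gt_list, pred_list in zip(gt_lists, pred_lists):
        n_gt += len(gt_list)
        n_pred += len(pred_list)
        for gt_val in gt_list:
            for pred_val in pred_list:
                if exact:
                    match = pred_val.lower() == gt_val.lower()
                else:
                    # partial: either string contains the other
                    match = (pred_val.lower() in gt_val.lower()
                             or gt_val.lower() in pred_val.lower())
                if match:
                    tp += 1
                    break
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

Since the partial condition can only match more pairs, it will always report an F1 at least as high as the exact condition.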

Unable to reproduce the results

Hello, I was not able to reproduce your results since your data is not open-sourced. I have transformed the data following the format you requested in your README, as shown in the attached image.

[image: transformed data sample]

However, the results I obtained were far from the results reported in your paper. Would it be possible for you to open-source the data or help point out if there is any mistake in the data format transformation that might have caused this issue?

[image: obtained results]

Partial results for long text blobs

Hi Kevin,

Such a great repo, thank you so much for the work. For this generative model, is there a parameter I can set to generate longer sequences rather than one sentence? I found that the model gives partial answers when I feed it a paragraph. I tried splitting the paragraph into sentences, but that is far too slow for an API. Do you have a better idea of how to do this?

Many thanks,
Bowen
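One thing that may help, assuming the model is loaded through Hugging Face transformers: the output length is capped by the `max_new_tokens` (or `max_length`) argument to `generate()`, not by an inherent model limit. A minimal sketch (the checkpoint name and length value are assumptions):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint for illustration; substitute your fine-tuned model.
model_name = "allenai/tk-instruct-base-def-pos"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

paragraph = "The food was great. The service was slow. The decor was charming."
inputs = tokenizer(paragraph, return_tensors="pt", truncation=True)
# max_new_tokens caps only the generated continuation; raising it lets the model
# emit predictions for every sentence instead of stopping after the first.
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))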

The resource of datasets

Hello,

I have some confusion regarding the datasets provided in your Git repository. In your paper, you mentioned that the benchmarks are sourced from the original SemEval 14, 15, and 16. However, I noticed that you included Peng's datasets in your repository. While Peng's datasets are refined versions of the SemEval data, there are still some differences between the two. Therefore, I would like to know which resources your test results are based on. Thank you for your attention to this matter.

Dataset used during training

Thanks for the great work! I wonder whether you used any non-English datasets during training?

requirements.txt?

Hi! Great repo :). Would you mind adding instructions on how to make this work?

For example, a requirements.txt file specifying which version of each library should be used, along with the Python version.

I can help you if you don't know how to do it.

Thank you!
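In the meantime, a minimal sketch of what such a file might contain (the packages are inferred from a standard Hugging Face stack; the version pins and Python version are assumptions, not the authors' tested setup):

# requirements.txt -- sketch; pins and Python >= 3.8 are assumptions
torch>=1.13
transformers>=4.26
datasets>=2.9
pandas>=1.5
numpy>=1.23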

Missing create_data_in_joint_task_format method at DatasetLoader

Hi. Thanks for sharing the code of your papers.

I'm trying to reproduce the training for the joint task, but this error appears:
AttributeError: 'DatasetLoader' object has no attribute 'create_data_in_joint_task_format'
Is there any way to solve it with the other methods in the class?

Thanks!
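Until the method is restored, here is a plausible reconstruction based on the single line of `create_data_in_joint_task_format` quoted in the separator issue further down (the signature, column names, and default keys are assumptions):

def create_data_in_joint_task_format(self, df, key='term', label_key='polarity', aspect_col='aspect'):
    # Join each row's aspect/polarity pairs into one 'term:polarity,term:polarity'
    # target string, mirroring the line quoted from this function in the separator issue.
    df = df.copy()
    df['labels'] = df[aspect_col].apply(
        lambda x: ','.join([f"{i[key]}:{i[label_key]}" for i in x]))
    return df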

I am getting this error: ValueError: Trainer: evaluation requires an eval_dataset.

ValueError Traceback (most recent call last)
in <cell line: 4>()
2 get_ipython().system(' pip install -U accelerate')
3 get_ipython().system(' pip install -U transformers')
----> 4 model_trainer = t5_exp.train(id_tokenized_ds, **training_args)

6 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in get_eval_dataloader(self, eval_dataset)
886 """
887 if eval_dataset is None and self.eval_dataset is None:
--> 888 raise ValueError("Trainer: evaluation requires an eval_dataset.")
889 eval_dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
890 data_collator = self.data_collator

ValueError: Trainer: evaluation requires an eval_dataset.
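This error is raised by the Hugging Face Trainer whenever evaluation is enabled but no eval split is provided. A minimal sketch of the two standard fixes (whether `t5_exp.train` forwards these arguments is an assumption; `model` and `id_tokenized_ds` are taken from the issue's context):

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Fix 1: make sure the tokenized dataset contains an eval split and that it
# actually reaches the Trainer:
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=id_tokenized_ds["train"],
    eval_dataset=id_tokenized_ds["validation"],  # must be present if evaluation is on
)

# Fix 2: disable evaluation in the training arguments instead:
args = Seq2SeqTrainingArguments(output_dir="out", evaluation_strategy="no")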

dataset

Hello!
Could you share your dataset? I want to try to reproduce the experimental results in your paper. If so, I would be very grateful!

Handling of the "conflict" label

Hi, I would like to know whether you do any dataset processing in the joint-task scenario. Your paper mentions that the "conflict" label is ignored in the ATSC task, so I want to know whether you deleted the sentences carrying "conflict" labels when evaluating the joint task. More specifically: if the model does not predict "conflict" as the polarity in the joint task, do you ignore that wrong prediction? Since the instruction only contains examples for the positive, neutral, and negative cases, I was wondering how you treat the "conflict" case in the joint task. Thank you.

ATE Training Script is not working

I tried to run your ate_train.sh training script on the SemEval16 dataset, but it's not working.

The parameters I passed on Colab:
[screenshot: training parameters]

The output shows:
[screenshot: error output]

Can you please add instructions to the README.md file on how to run this model properly?

Thanks

Mismatched separator in `get_metrics` function

I think there is a small separator mismatch between the `create_data_in_joint_task_format` function and the `get_metrics` function:

  • In the `create_data_in_joint_task_format` function, data is joined by ',':

      df['labels'] = df[aspect_col].apply(lambda x: ','.join([f"{i[key]}:{i[label_key]}" for i in x]))
    
  • In the `get_metrics` function, data is split by ', ':

       gt_list = gt.split(', ')
    

Can you help me check if there is a typo?
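A quick demonstration of the consequence (the label strings are made up):

labels = ','.join(['battery:positive', 'screen:negative'])
print(labels)              # battery:positive,screen:negative
print(labels.split(', '))  # ['battery:positive,screen:negative'] -- one element,
                           # not two, so multi-aspect examples are miscounted

Joining with ', ' (or splitting on ',') should make the two functions consistent.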

Many thanks,
Dan

Converting Huggingface Seq2SeqLMOutput object to (aspect, sentiment) form

Hello,

Using a Hugging Face pre-trained transformer, I am getting the output for
'The cab ride was amazing but the service was pricey' in the form of a Seq2SeqLMOutput object. I want to convert this into (aspect, sentiment) form.
I tried using model.decode(tokenizer.decode(predicted_output.logits[0], skip_special_tokens=True)), but I am getting this error:
argument 'ids': 'list' object cannot be interpreted as an integer

Google Colab link where I have implemented the model:
https://colab.research.google.com/drive/1gcHaM4ehqccX2zGIe8RbCeN6Q-hZh0bb?usp=sharing

Please help in this regard.
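The error comes from passing a logits tensor where `tokenizer.decode` expects token ids. The usual path is to call `generate()`, which returns ids directly, then parse the decoded string. A minimal sketch (the `model`/`tokenizer` variables and the comma-separated `aspect:sentiment` output format are assumptions drawn from the separator issue above):

import torch

text = "The cab ride was amazing but the service was pricey"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # generate() returns token ids, unlike a forward pass, which returns logits
    output_ids = model.generate(**inputs, max_new_tokens=64)
decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
# Parse 'aspect:sentiment' pairs, e.g. 'cab ride:positive,service:negative'
pairs = [tuple(p.split(":", 1)) for p in decoded.split(",") if ":" in p]
print(pairs)  # e.g. [('cab ride', 'positive'), ('service', 'negative')]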

Limiting labels

This package is great, thanks!

  1. For production, it would be desirable to limit the number of topic-sentiment pairs which can be output for each sample.
  2. The length of the output string is too short and cannot handle long topic labels well.
  3. It would be desirable to limit the total number of topics that can be output to avoid having the model generate rare labels which complicate post-processing.
  4. There could be a setting to enforce that only labels in the training data could be assigned.
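Until such settings exist upstream, items 1 through 4 can be approximated on the caller's side. A minimal sketch, assuming a Hugging Face seq2seq model and the comma-separated `aspect:sentiment` output format (the allowed-label set and the cap are made-up examples):

allowed_labels = {"battery", "screen", "service"}  # e.g. labels seen in training data
max_pairs = 5                                      # cap on topic-sentiment pairs per sample

inputs = tokenizer(text, return_tensors="pt")
# A larger max_new_tokens avoids truncating long topic labels (item 2).
output_ids = model.generate(**inputs, max_new_tokens=128)
decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
pairs = [tuple(p.split(":", 1)) for p in decoded.split(",") if ":" in p]
# Drop labels outside the training set (item 4) and cap the count (items 1 and 3).
pairs = [(a.strip(), s.strip()) for a, s in pairs if a.strip() in allowed_labels][:max_pairs]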

Are the checkpoints missing?

Hey there. I am not sure how to load the model for inference. Is it possible that the model checkpoints are missing?

Greetings from Germany! :D
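If the checkpoints are published on the Hugging Face Hub, loading them for inference would look like the sketch below (the model ID is a placeholder, not a confirmed checkpoint name):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "your-org/instructabsa-checkpoint"  # placeholder; substitute the real Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)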
