sarc's Issues

subsets to SARC Files

Hi, ~~ Thanks a million if you can help me clarify a data source question!
I visited the link to SARC Files. There are only two folders - main and pol for sarcasm evaluation, right?

However, I read two papers [1] [2], and both claim to have used two subsets, /r/movies and /r/technology, from your SARC dataset. [1] reports 8188 samples from /r/movies and 22510 samples from /r/technology. [2] reports 15019 samples from r/movies and 13485 samples from r/technology, and gives a link to SARC-2.0.

I checked that in the SARC-2.0 main folder, there are
train-balanced r/movies=1414, train-unbalanced r/movies=121595
test-balanced r/movies=364, test-unbalanced r/movies=27930
train-balanced r/technology=1652, train-unbalanced r/technology=73641
test-balanced r/technology=408, test-unbalanced r/technology=20104

It's impossible to get a balanced dataset of the size reported in either [1] or [2], right? The sarc samples are very few compared with the non-sarc samples.
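The gap can be checked with simple arithmetic. The sketch below only uses the numbers quoted in this post; the dictionary layout and the helper names `balanced_total` and `shortfall` are made up for illustration, and the post does not say exactly what each count measures.

```python
# Counts quoted above for the SARC-2.0 main folder (balanced splits only).
balanced = {
    "movies": {"train": 1414, "test": 364},
    "technology": {"train": 1652, "test": 408},
}

# Sample sizes reported by papers [1] and [2], as quoted above.
reported = {
    "[1]": {"movies": 8188, "technology": 22510},
    "[2]": {"movies": 15019, "technology": 13485},
}


def balanced_total(subreddit):
    """Train plus test count in the balanced split for one subreddit."""
    return sum(balanced[subreddit].values())


def shortfall(paper, subreddit):
    """How many more samples the paper reports than the balanced split holds."""
    return reported[paper][subreddit] - balanced_total(subreddit)


for paper in reported:
    for subreddit in balanced:
        print(paper, subreddit, balanced_total(subreddit), shortfall(paper, subreddit))
```

A positive shortfall for every pair means neither paper's numbers can come from the balanced files alone.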

Could you help me check whether there is any way the two papers could have obtained such data?
Thanks a million! This has confused me quite a lot.

[1] https://www.aclweb.org/anthology/P18-1093.pdf
[2] https://dl.acm.org/doi/pdf/10.1145/3308558.3313735

raw/sarc.csv find user comments

Hi~~~

I'm investigating sarcasm detection using your dataset, and in particular I'm now collecting per-user information. Please correct me where I have misunderstood.

  1. According to SARC/2.0/README.txt, raw/sarc.csv contains the sarcastic and non-sarcastic comments of the authors in authors.json. I read raw/sarc.csv, and the first example is [0, "Yousa guys didn't upvote nothing!", 'BritishEnglishPolice',
    'worldpolitics', 3, 3, 0, '2009-01', 1233446126,
    "Mafia business 'equal' to 9% of Italian GDP", 'c07e6gg',
    '7tvvp']
    My guess is that "Yousa guys didn't upvote nothing!" is a post, "BritishEnglishPolice" is the author who made this post, "worldpolitics" is the subreddit, "3, 3, 0" are the score, ups and downs respectively, and "2009-01" and "1233446126" are the date and UTC timestamp. "Mafia business 'equal' to 9% of Italian GDP" is a comment on this post, and "c07e6gg" and "7tvvp" are the sarc and non-sarc responses to this comment. But when I searched comments.json for "c07e6gg" and "7tvvp", it returned nothing. Besides, what does the "0" at the beginning mean? I see 0 in many examples. Could you help me understand this sarc.csv file? My goal is to acquire some sarc and non-sarc comments, or all comments, for a given author.

  2. I'm using SARC/2.0 main and pol datasets. I think SARC/2.0 should contain all those in SARC/1.0 and SARC/0.0, right?
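For the goal in point 1 (collecting comments per author), a minimal sketch is shown here. It assumes the column positions from my own reading of the example row: index 1 is the text and index 2 is the author. The meaning of index 0 is exactly what I am asking about, so it is only carried along unchanged. The name `comments_by_author` is hypothetical and not part of the SARC code.

```python
from collections import defaultdict

# Assumed positions, based only on the example row quoted above.
FIRST_FIELD, TEXT, AUTHOR = 0, 1, 2


def comments_by_author(rows):
    """Group (first_field, text) pairs by the author column."""
    grouped = defaultdict(list)
    for row in rows:
        if len(row) <= AUTHOR:
            continue  # skip rows too short to contain an author
        grouped[row[AUTHOR]].append((row[FIRST_FIELD], row[TEXT]))
    return dict(grouped)


example = [0, "Yousa guys didn't upvote nothing!", 'BritishEnglishPolice',
           'worldpolitics', 3, 3, 0, '2009-01', 1233446126,
           "Mafia business 'equal' to 9% of Italian GDP", 'c07e6gg',
           '7tvvp']

result = comments_by_author([example])
print(result)
```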

Thank you very much!
Best regards

IndexError: list index out of range

Hey all,

I'm running your code and hitting an error when running the bag-of-words model.

I've made a slight modification to the instructions, so maybe that's playing a role, but I'm running this command from within the SARC directory:
`directory stuff/SARC> python eval.py main -l

Load SARC data
Traceback (most recent call last):
  File "eval.py", line 119, in <module>
    main()
  File "eval.py", line 45, in main
    load_sarc_responses(train_file, test_file, comment_file, lower=args.lower)
  File "directory stuff\SARC\utils.py", line 34, in load_sarc_responses
    responses = row[1].split(' ')
IndexError: list index out of range`
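To narrow this down, a small check like the sketch below could report which rows are too short before `row[1].split(' ')` is reached. It assumes the file is read with `csv.reader`, a `|` delimiter, and at least three fields per row; none of that is confirmed by the traceback, and `split_rows` and `min_fields` are hypothetical names.

```python
import csv
import io


def split_rows(handle, delimiter="|", min_fields=3):
    """Separate well-formed rows from rows with too few fields."""
    good, bad = [], []
    reader = csv.reader(handle, delimiter=delimiter)
    for line_number, row in enumerate(reader, start=1):
        if len(row) < min_fields:
            # Blank lines and lines using another delimiter end up here.
            bad.append((line_number, row))
        else:
            good.append(row)
    return good, bad


# Tiny in-memory stand-in for a data file: one valid row, one blank line,
# and one comma-separated line that the '|' delimiter cannot split.
sample = io.StringIO("a b|c d|1 0\n\nnot,pipe,separated\n")
good, bad = split_rows(sample)
print(good)
print(bad)
```

If `bad` is non-empty for the real file, the row numbers point at the lines that would raise the IndexError.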

The reason I'm running this from within the SARC directory is that I've added a few lines (listed below) at the very top of eval.py so that it can find the text_embedding module, which shares a parent directory with SARC. This was the only way I could figure out to import text_embedding from eval.py, so if there's a better way I'm all ears!
`import sys

sys.path.append('../')`
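One possible alternative, sketched here, resolves the parent directory from the script's own location instead of the current working directory, so the command would not have to be run from inside SARC. The helper name `add_parent_to_path` and the demo path are hypothetical; in eval.py the argument would be `__file__`.

```python
import sys
from pathlib import Path


def add_parent_to_path(script_path):
    """Put the directory two levels above script_path on sys.path."""
    parent = str(Path(script_path).resolve().parent.parent)
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# Demo with a made-up location; eval.py would call add_parent_to_path(__file__).
parent = add_parent_to_path("/tmp/project/SARC/eval.py")
print(parent)
```

Unlike `sys.path.append('../')`, this does not depend on which directory the command is launched from.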

Final note: this error occurs with both the data linked in the README (https://nlp.cs.princeton.edu/SARC/2.0/) and the SARC data on Kaggle (https://www.kaggle.com/danofer/sarcasm). I tried the Kaggle data because the README data looks odd when viewed in Excel, and I initially thought that was interfering with the CSV parsing.

Thank you,

Matt
