Comments (14)

nlpjoe commented on August 19, 2024

from ccf-bdci-automotive-field-asc-2018.

hu-chia commented on August 19, 2024

I'm stuck at the same step. The contents of the final result set depend entirely on the 676.csv file, yet no step generates this file.

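nlpjoe commented on August 19, 2024

@Oldmoooon That file is the single-label classification result produced by another teammate. It is mainly used to fill in empty rows, and you can do without it.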

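hu-chia commented on August 19, 2024

In pack_sub_dt2.py, the contents of the final exported dt3_stacking_submission.csv come entirely from this file. I checked all the variables, and train_oof_pred seems to be the main result? If pack_sub_dt2.py can be modified simply and directly to skip the 676 file, please tell me the correct way to do it.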

nlpjoe commented on August 19, 2024

@Oldmoooon The lost_df that appears on line 53 holds the contents of the 676 file. Just remove it from the code that follows.

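hu-chia commented on August 19, 2024

# Excerpt from pack_sub_dt2.py; pd, content_ids, subjects, sentiment_values
# and n_none are defined earlier in that script.
# The 676.csv padding is commented out so the file is no longer required.
# pad_file = '../data/submit/676.csv'
# soft_df = pd.read_csv(pad_file)
# lost_df = soft_df[soft_df['content_id'].isin(lost_ids)]
submit_df = pd.DataFrame(
    {
        'content_id': content_ids,
        'subject': subjects,
        'sentiment_value': sentiment_values,
        'sentiment_word': [''] * len(subjects)  # one empty string per row
    }
)

print('n_none:', n_none)
# print('n_pad:', len(lost_df))

This is what I changed the code to, but sentiment_word is clearly wrong, and the exported sentiment_value also seems to have problems.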

hu-chia commented on August 19, 2024

It seems the single-label classification result is quite important. Without 676.csv, the vast majority of sentiment_value entries in the result are 0.

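nlpjoe commented on August 19, 2024

@Oldmoooon It has been so long that I don't remember clearly, but it is probably caused by the number of texts. You need to pad the lost_ids (empty rows) by setting subject to empty there. Step through it with pdb and it should become clear.

Single-label classification improves the score on this task by roughly 0.5 to 1 point.

The single-label result is just a way of overfitting to this particular task: it only fills in the cases where the multi-label classifier recognized no subject at all. In reality a text may have no subject, but every sample in the official test set has one, so that is what we did.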
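The padding step described above can be sketched in plain Python. This is a minimal illustration, not the code in pack_sub_dt2.py: the function name `pad_missing_rows`, the row layout, the sample ids and subjects, and the choice of 0 as the sentiment_value for padded rows are all assumptions made for the example. The optional `fallback_rows` argument stands in for the role 676.csv plays in the thread.

```python
def pad_missing_rows(pred_rows, all_ids, fallback_rows=None):
    """Add one row for every test id that the multi-label model left out.

    pred_rows: list of dicts with content_id, subject, sentiment_value.
    all_ids: every content_id in the test set, in order.
    fallback_rows: optional dict content_id -> row from a single-label
        model (hypothetical stand-in for the 676.csv contents).
    """
    predicted_ids = {row["content_id"] for row in pred_rows}
    lost_ids = [cid for cid in all_ids if cid not in predicted_ids]
    fallback_rows = fallback_rows or {}

    padded = []
    for cid in lost_ids:
        if cid in fallback_rows:
            # Use the single-label prediction when one is available.
            padded.append(dict(fallback_rows[cid]))
        else:
            # Otherwise emit an empty subject so the id still appears.
            # sentiment_value 0 is an assumed default, not from the thread.
            padded.append({"content_id": cid, "subject": "",
                           "sentiment_value": 0, "sentiment_word": ""})

    # sentiment_word is always exported as an empty string in the thread.
    rows = [dict(row, sentiment_word="") for row in pred_rows]
    return rows + padded, lost_ids


# Made-up data: the multi-label model found a subject only for "a1".
preds = [{"content_id": "a1", "subject": "price", "sentiment_value": 0}]
fallback = {"b2": {"content_id": "b2", "subject": "power",
                   "sentiment_value": 1, "sentiment_word": ""}}
rows, lost = pad_missing_rows(preds, ["a1", "b2", "c3"], fallback)
```

Calling the function without `fallback_rows` corresponds to dropping 676.csv entirely: every lost id then gets an empty subject.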

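hu-chia commented on August 19, 2024

The problem is not with subject; it is mainly with the two values sentiment_value and sentiment_word. According to your code, sentiment_word in the exported csv is always an empty string, and I have been running in debug mode the whole time without finding anything that sets it. As for sentiment_value being 0 across large portions of the output, I don't know whether your exported results look the same, because this differs somewhat from the training set.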

nlpjoe commented on August 19, 2024

@Oldmoooon Yes, there are quite a lot of 0s. It is the same on my side.
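One way to check how far the share of 0s in a submission departs from the training set is to compare the label proportions directly. The sketch below uses only the standard library; the helper name `label_shares` and both value lists are made up for illustration and are not taken from the competition data.

```python
from collections import Counter


def label_shares(values):
    """Return the fraction of rows carrying each label, keyed by label."""
    counts = Counter(values)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


# Made-up sentiment_value columns, standing in for the real files.
train_values = [0, 0, 0, 1, -1, 0, 0, 1]
submit_values = [0, 0, 0, 0, 0, 0, 0, 1]

train_shares = label_shares(train_values)
submit_shares = label_shares(submit_values)

for label in sorted(set(train_shares) | set(submit_shares)):
    print(label, train_shares.get(label, 0.0), submit_shares.get(label, 0.0))
```

A label that is present in the training shares but missing from the submission shares would suggest the model never predicts it.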

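desktoop commented on August 19, 2024

Same question. I ran into the same problem as above. The 676.csv file really is critical; without it the output is mostly 0, although the raw data is also mostly 0, with a very high proportion. Also, I replaced csvs/test_public.csv with some of my own data for prediction, but the output is still the result for the original test_public.csv. How should I change this? Do I need to run through the process in EDA.ipynb? I haven't looked closely, but it seems that not all cells are meant to run in order, because when I ran them sequentially some variables in earlier cells were undefined. Does something need to be modified here?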

nlpjoe commented on August 19, 2024

@desktoop The data is cached as pkl files. After changing the dataset, try deleting the cache and running again.
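Clearing the cache could look roughly like the sketch below. The thread does not say where the pkl files live, so the directory is left as a parameter and the demonstration runs against a temporary folder; `clear_pickle_cache` is a hypothetical helper, not part of the repository.

```python
import tempfile
from pathlib import Path


def clear_pickle_cache(cache_dir):
    """Delete every *.pkl file under cache_dir and return the removed names."""
    removed = []
    for path in sorted(Path(cache_dir).rglob("*.pkl")):
        path.unlink()
        removed.append(path.name)
    return removed


# Demonstration in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "test_public.pkl").write_bytes(b"stale cache")
    (Path(tmp) / "test_public.csv").write_text("content_id,content\n")
    removed = clear_pickle_cache(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Only the pickle files are removed, so the replaced csv is what gets read and re-cached on the next run.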

nlpjoe commented on August 19, 2024

@desktoop The cells are not executed in order.

desktoop commented on August 19, 2024

Then does this ipynb data-processing file need to be re-run?
