
Comments (2)

jeantirole commented on August 15, 2024

Hello, I have the same issue that Patrik encountered: "KeyError: 'doc_key'".

I'm afraid this issue still hasn't received a proper answer or suggestion.

from gen-arg.

raspberryice commented on August 15, 2024

Hi JaeWan, the doc_key field is part of the processed (tokenized) files, which are created by the data module's prepare_data function (line 169 of KAIROS_data_module.py).
PyTorch Lightning should call prepare_data automatically, since the data module is a LightningDataModule. (If it doesn't, you can try running the KAIROS_data_module.py file directly first.) Either way, the result is a preprocessed_KAIROS directory containing the tokenized files.
The files follow this format:

{"event_idx": 1, "doc_key": "scenario_en_kairos_65", "input_token_ids": [0, 50265, 21, 1710, 30, 50265, 634, 50265, 11, 50265, 809, 233, 19, 50265, 1131, 696, 23, 50265, 317, 2, 2, 83, 4221, 34, 57, 1348, 11, 10, 68, 112, 111, 153, 2672, 1658, 30, 10, 9955, 1393, 8601, 249, 9, 20037, 71, 37, 300, 2037, 62, 11, 5, 830, 336, 185, 111, 159, 9, 7550, 111, 19023, 6315, 4802, 16870, 479, 3460, 3001, 4832, 2448, 2936, 1060, 4767, 28299, 336, 7550, 908, 11, 4170, 161, 37, 21, 7785, 7, 244, 6840, 26546, 8363, 21, 50266, 1710, 50266, 77, 16870, 27579, 10, 17798, 8560, 11, 5, 124, 9, 39, 11148, 11, 830, 336, 479, 44, 48, 38, 33, 7, 28, 182, 7316, 142, 89, 16, 41, 1288, 7, 45, 9263, 143, 9, 5, 1110, 9, 5, 4221, 2156, 44, 46, 26546, 8363, 44, 27, 579, 2470, 2363, 381, 3494, 174, 41165, 230, 9763, 574, 479, 44, 48, 20, 445, 9, 2026, 2156, 38, 4443, 2156, 4922, 13, 1495, 11, 6203, 7, 99, 52, 1697, 479, 44, 46, 10187, 4832, 4170, 9955, 1393, 6773, 68, 112, 448, 2672, 136, 249, 132, 4832, 504, 4170, 9955, 1393, 6773, 68, 112, 448, 2672, 136, 249, 4170, 9955, 1393, 6773, 68, 112, 448, 2672, 136, 249, 280, 445, 9, 2026, 2156, 61, 1849, 491, 4756, 10, 5375, 9, 11, 628, 494, 199, 2156, 26, 249, 56, 55, 87, 615, 86, 7, 13192, 137, 16870, 300, 88, 26546, 8363, 44, 27, 579, 9955, 479, 20, 2745, 1292, 9, 896, 2156, 5, 4170, 168, 2156, 5997, 2681, 6506, 111, 1653, 625, 1975, 522, 1841, 8, 928, 522, 1841, 58, 1440, 25, 9483, 479, 3718, 1388, 874, 6859, 374, 5, 662, 9, 2049, 479, 158, 2156, 336, 2156, 121, 479, 208, 479, 1247, 9110, 5, 10376, 51, 56, 12333, 10, 98, 111, 373, 26301, 12623, 569, 11, 61, 10, 1563, 313, 26, 37, 21, 59, 7, 2883, 41, 908, 479, 20, 10376, 2006, 5, 313, 11, 5, 569, 25, 16870, 8, 10, 15714, 165, 7501, 39, 790, 11, 5997, 2681, 6506, 479, 497, 155, 4832, 2248, 181, 479, 475, 479, 2156, 16870, 373, 13, 10, 11148, 7, 185, 123, 7, 230, 4933, 11984, 11, 928, 479, 20, 2026, 1697, 14, 1135, 5, 249, 2621, 2156, 26546, 8363, 21, 45, 2294, 31, 6539, 88, 16870, 44, 27, 579, 
17139, 479, 16870, 172, 376, 66, 9, 5, 790, 8, 300, 88, 5, 124, 2418, 9, 5, 11148, 479, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "input_attn_mask": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
"tgt_token_ids": [0, 6840, 26546, 8363, 21, 1710, 30, 50265, 634, 50265, 11, 50265, 809, 233, 19, 50265, 1131, 696, 23, 50265, 317, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "tgt_attn_mask": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
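To check whether the preprocessed files actually contain doc_key, something like the sketch below might help. It assumes each tokenized file stores one JSON object per line, as in the sample above; the function name find_bad_records and the expected-key set are illustrative and not part of the gen-arg code.

```python
import json

# Fields taken from the sample record above; adjust if your files differ.
EXPECTED_KEYS = {
    "event_idx",
    "doc_key",
    "input_token_ids",
    "input_attn_mask",
    "tgt_token_ids",
    "tgt_attn_mask",
}


def find_bad_records(lines):
    """Return (line_number, missing_keys) for every record lacking a field.

    `lines` is any iterable of strings, e.g. an open file handle.
    Hypothetical helper for illustration; not part of the repository.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = EXPECTED_KEYS - record.keys()
        if missing:
            problems.append((lineno, sorted(missing)))
    return problems
```

You could call it as `find_bad_records(open(path))`, where `path` points to one of the files inside preprocessed_KAIROS. An empty list means every record has all six fields; if doc_key shows up as missing, the files were probably produced by a different preprocessing step and should be regenerated.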

