eaglew / paperrobot
Code for PaperRobot: Incremental Draft Generation of Scientific Ideas
Home Page: https://aclanthology.org/P19-1191
License: MIT License
Could you please share the parameters you used for training? How long did training take? Thanks.
Hi, when I train the Existing paper reading model, I get the following error. Can you help me?
Traceback (most recent call last):
File "D:/PyCharmProjects/PaperRobot-master/Existing paper reading/train.py", line 101, in <module>
graph, _ = load_graph(os.path.join(args.data_dir, 'train2id.txt'), num_ent)
File "D:\PyCharmProjects\PaperRobot-master\Existing paper reading\utils\utils.py", line 93, in load_graph
adj = torch.FloatTensor(nx.adjacency_matrix(graph.G, nodelist=range(num_ent)).todense())
TypeError: can't convert np.ndarray of type numpy.int32. The only supported types are: float64, float32, float16, int64, int32, int16, int8, and uint8.
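A possible workaround, not confirmed by the authors, is to cast the dense adjacency matrix to float32 explicitly before handing it to PyTorch. The sketch below shows only the NumPy side of that idea. The helper name `to_float32_dense` and the toy matrix are hypothetical, and the final tensor construction is left as a comment because whether it resolves the error depends on the installed NumPy and PyTorch versions.

```python
import numpy as np


def to_float32_dense(matrix):
    """Return a plain float32 ndarray built from a dense matrix-like object."""
    return np.asarray(matrix, dtype=np.float32)


# Toy 3-node adjacency matrix with an integer dtype, standing in for
# nx.adjacency_matrix(graph.G, nodelist=range(num_ent)).todense()
toy_adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.int32)
adj_np = to_float32_dense(toy_adj)
print(adj_np.dtype)

# In load_graph, the tensor could then be built from the float32 array
# (assumption, untested against the repository):
# adj = torch.from_numpy(to_float32_dense(dense_adj))
```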
Hi there,
Congrats on the excellent accomplishment of PaperRobot! I saw that you are going to release more code in the next few days. Out of curiosity, is the following the workflow for KG generation?
1. Perform NER with the PubTator API
2. Match Gene (NCBI), Disease (MEDIC), and chemicals to MeSH IDs
3. Establish relations between entities by looking up CTD data
Also, when is the code for the KG part expected to be released (if there's a plan)?
Thanks! :)
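The three-step workflow proposed in this question can be sketched as follows. This only illustrates the questioner's guess, which the source does not confirm. It calls no PubTator or CTD API, and every mention, identifier, and relation below is an invented placeholder.

```python
from itertools import permutations

# Step 1 (hypothetical NER output): paper id -> list of (mention, entity type)
ner_output = {
    "paper-1": [("aspirin", "Chemical"), ("inflammation", "Disease")],
}

# Step 2 (hypothetical lookup): mention -> identifier. Placeholder IDs only.
mention_to_id = {"aspirin": "C:0001", "inflammation": "D:0002"}

# Step 3 (hypothetical curated table): (head id, tail id) -> relation label
curated_relations = {("C:0001", "D:0002"): "therapeutic"}


def build_triples(ner_output, mention_to_id, curated_relations):
    """Link entities found in the same paper when a curated relation exists."""
    triples = set()
    for mentions in ner_output.values():
        # Keep only mentions that could be normalized to an identifier.
        ids = sorted({mention_to_id[m] for m, _ in mentions if m in mention_to_id})
        for head, tail in permutations(ids, 2):
            relation = curated_relations.get((head, tail))
            if relation is not None:
                triples.add((head, relation, tail))
    return sorted(triples)


triples = build_triples(ner_output, mention_to_id, curated_relations)
print(triples)
```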
Just wondering if/how 'terms' are related to head_entity_relation_tail_entity in the underlying KB at all? Thanks!
When I got to the Quickstart step,
I typed in
python train.py
and everything looked normal at first, but later the terminal showed a deprecation warning and then the process stopped.
python train.py
Found 23894 unique words (9146765 in total)
finish_dump
Finish loading train
Finish loading valid
Finish loading test
Epoch 0
/home/letsuya/miniconda3/envs/3.6env/lib/python3.7/site-packages/torch/nn/functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Killed
What should I do to fix it?
Thank you guys~
Hi, thanks for your interesting work! Where is the code for generating the KGs?
Hi, thanks for your excellent work. There is one thing that confuses me:
After training the link prediction model, I tried to run test.py in the Existing_model_reading folder, but it runs extremely slowly: after a week it had processed only about 6000 items in test2id.txt. It also uses a lot of RAM, about 200G by the time I noticed the problem. Do you have any idea where the problem might lie? My GPU configuration is shown in the following picture. I ran test.py as the README describes, except that I added nohup before the command to run it in the background. Thank you very much again!
I ran \Existing paper reading\train.py as described in the documentation,
but something went wrong:
Traceback (most recent call last):
File "C:/Users/Desktop/PaperRobot-master/PaperRobot-master/Existing paper reading/train.py", line 101, in <module>
graph, _ = load_graph(os.path.join(args.data_dir, 'train2id.txt'), num_ent)
File "C:\Users\shuzip\Desktop\PaperRobot-master\PaperRobot-master\Existing paper reading\utils\utils.py", line 93, in load_graph
adj = torch.FloatTensor(nx.adjacency_matrix(graph.G, nodelist=range(num_ent)).todense())
TypeError: can't convert np.ndarray of type numpy.int32. The only supported types are: float64, float32, float16, int64, int32, int16, int8, and uint8.
Hello, how do you use the model trained in existing paper reading for new paper writing? I understand that, for each title, you extract the top 10 related entities from the enriched knowledge graph. Where is the code corresponding to this? I don't see the GATA model being used in the code for new paper writing. Did you already run the model and save the results in paper_reading.zip? Also, can you please explain how you created the paper_reading dataset?
Thanks for your great work!
Given that model training on both tasks is pretty slow, would it be possible for you to share the trained model weights, especially for the link prediction model?
I would also like to know how you decide when to stop training on each task. How many epochs did you use to achieve the best performance?
Looking forward to your reply. Thanks!
OS: Ubuntu16.04 _x64 , 8 core * 16G RAM
CMD: python train.py --gpu=0
......
Finish loading valid
Finish loading test
Epoch 0
/usr/local/lib/python3.6/site-packages/torch/nn/functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "train.py", line 263, in <module>
train(start_epoch+epoch)
File "train.py", line 184, in train
ntt[0], ntt[1], ntt[2])
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/usr/PaperRobot/Existing paper reading/model/GATA.py", line 19, in forward
graph = self.graph(node_features, adj)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/usr/PaperRobot/Existing paper reading/model/GAT.py", line 18, in forward
x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
File "/usr/PaperRobot/Existing paper reading/model/GAT.py", line 18, in <listcomp>
x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/usr/PaperRobot/Existing paper reading/model/graph_attention.py", line 37, in forward
attention = F.dropout(attention, self.dropout, training=self.training)
File "/usr/local/lib/python3.6/site-packages/torch/nn/functional.py", line 830, in dropout
else _VF.dropout(input, p, training))
RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0
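The `12 vs 0` in this message is consistent with `posix_memalign` returning error code 12, which is `ENOMEM` (out of memory) on Linux. Because the traceback passes through attention over a dense `adj` matrix, a rough estimate of what one dense N x N float32 matrix costs may help when judging whether the host has enough RAM. The entity counts below are hypothetical and are not taken from the dataset.

```python
import errno


def dense_matrix_bytes(num_nodes, bytes_per_value=4):
    """Bytes needed for one dense num_nodes x num_nodes matrix (float32 = 4 bytes)."""
    return num_nodes * num_nodes * bytes_per_value


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


# Symbolic name of error code 12 on this platform.
print(errno.errorcode.get(12))

# Hypothetical entity counts, for illustration only.
for n in (10_000, 50_000, 100_000):
    print(n, "nodes:", round(to_gib(dense_matrix_bytes(n)), 2), "GiB per matrix")
```

Each attention head may hold several such matrices at once (scores, softmax output, dropout output), so the real footprint can be a multiple of this figure.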
system
ubuntu
Linux pve-ubuntu 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
python train.py --data_path data/pubmed_abstract --model_dp abstract_model/
Epoch 0/99
Traceback (most recent call last):
File "train.py", line 236, in <module>
batch_o_t, teacher_forcing_ratio=1)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/sync/ubuntu/PaperRobot-master/New paper writing/memory_generator/seq2seq.py", line 18, in forward
stopwords, sflag)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/sync/ubuntu/PaperRobot-master/New paper writing/memory_generator/Decoder.py", line 134, in forward
max_source_oov, term_output, term_id, term_mask)
File "/mnt/sync/ubuntu/PaperRobot-master/New paper writing/memory_generator/Decoder.py", line 68, in decode_step
term_context, term_attn = self.memory(_h.unsqueeze(0), term_output, term_mask, cov_mem)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/sync/ubuntu/PaperRobot-master/New paper writing/memory_generator/utils.py", line 32, in forward
e_t = self.vt_layers[i](torch.tanh(enc_proj + dec_proj).view(batch_size * max_enc_len, -1))
RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0
Hi! I am currently using this code (it's really great, by the way). I started training the abstract model and got a KeyError: 'METEOR' in the eval.py file. To get rid of the error, I commented out line 72, print('METEOR:\t', final_scores['METEOR']), in eval.py. Is METEOR important, and how can I fix the training code so that METEOR is computed?
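Rather than commenting the line out, the score printing could look each metric up defensively, so that a missing key is reported instead of raising. This is a minimal sketch. The function name `print_scores`, the metric names other than METEOR, and the score values are hypothetical, and it does not explain why the key is absent.

```python
def print_scores(final_scores, metrics=("BLEU", "METEOR", "ROUGE_L")):
    """Print each metric if present, and flag the ones that were not computed."""
    lines = []
    for name in metrics:
        if name in final_scores:
            lines.append(f"{name}:\t{final_scores[name]}")
        else:
            lines.append(f"{name}:\tnot computed")
    print("\n".join(lines))
    return lines


# Hypothetical scores with no METEOR entry.
lines = print_scores({"BLEU": 0.1, "ROUGE_L": 0.2})
```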
Hi! Thanks for the great work! Very enlightening tools! Looking forward to the release of the old paper reading part soon.
I am trying to understand how PaperRobot generates new knowledge. According to the Introduction, it does so by forming new links between existing entities, and those entities are selected from publicly annotated medical literature datasets (CTD and PubTator). However, Table 1 notes that bold words such as RT-PCR and western blotting represent "topically related entities", and Section 3.5 further notes that those two terms are the product of link prediction. But when I search for RT-PCR or western blotting in CTD or PubTator, they are not identified as previously labeled entities. So I am a bit confused about whether terms like RT-PCR or western blotting are inside the enriched knowledge graph (which should only contain new links, not new entities?) as new entities. If they are not, why would link prediction result in the creation of new ideas like them? I am quite new to the field, so I apologize if the question seems trivial, but it would be great if you could shed some light on this. Thanks!
Is this project still being updated? Is anyone still using it?
for tri in triple_dict[head]:
    single1 = (head, tri[0], tri[1])
    in_graph.add(single1)
for tri in triple_dict[tail]:
    single2 = (tail, tri[0], tri[1])
    in_graph.add(single2)
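Hello, while reading your code I noticed that in the get_subgraph() function in Existing paper reading/utils/utils.py, the two for loops at lines 246 and 249 do not seem to execute. I tried printing and found that triple_dict[head] is an empty dict.
My guess is that the cause is that head and tail are of type torch.Tensor, whereas triple_dict[head] needs head to be an int.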
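If that guess is right, converting `head` and `tail` to plain Python ints before the lookup would be one way to make the loops run, since a tensor and an int with the same value generally do not behave as the same dictionary key. The sketch below is an assumption-laden illustration rather than a patch to the repository: `FakeScalar` is a hypothetical stand-in for a 0-dim tensor so the example runs without PyTorch, and `as_int_key` and `collect_triples` are invented names.

```python
class FakeScalar:
    """Hypothetical stand-in for a 0-dim tensor: exposes .item() like torch.Tensor."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


def as_int_key(value):
    """Convert a scalar-like value to a plain int usable as a dict key."""
    return int(value.item()) if hasattr(value, "item") else int(value)


def collect_triples(head, tail, triple_dict):
    """Gather (node, tri[0], tri[1]) for head and tail, using int keys."""
    in_graph = set()
    for node in (as_int_key(head), as_int_key(tail)):
        for tri in triple_dict.get(node, ()):
            in_graph.add((node, tri[0], tri[1]))
    return in_graph


# Toy dictionary keyed by int entity ids.
triple_dict = {1: [(5, 2)], 2: [(6, 3)]}
result = collect_triples(FakeScalar(1), 2, triple_dict)
print(sorted(result))
```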
When I run
python train.py --data_path data/pubmed_abstract --model_dp abstract_model/ --gpu 1
I get this error:
----------
Epoch 0/99
0 batches processed. current batch loss: 11.326438
1 batches processed. current batch loss: 11.006483
2 batches processed. current batch loss: 10.861076
3 batches processed. current batch loss: 10.887144
4 batches processed. current batch loss: 11.033303
Traceback (most recent call last):
File "train.py", line 236, in <module>
batch_o_t, teacher_forcing_ratio=1)
File "/home/rongz/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/rongz/PaperRobot/New paper writing/memory_generator/seq2seq.py", line 18, in forward
stopwords, sflag)
File "/home/rongz/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/rongz/PaperRobot/New paper writing/memory_generator/Decoder.py", line 134, in forward
max_source_oov, term_output, term_id, term_mask)
File "/home/rongz/PaperRobot/New paper writing/memory_generator/Decoder.py", line 68, in decode_step
term_context, term_attn = self.memory(_h.unsqueeze(0), term_output, term_mask, cov_mem)
File "/home/rongz/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/rongz/PaperRobot/New paper writing/memory_generator/utils.py", line 32, in forward
e_t = self.vt_layers[i](torch.tanh(enc_proj + dec_proj).view(batch_size * max_enc_len, -1))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.91 GiB total capacity; 10.37 GiB already allocated; 5.06 MiB free; 1019.61 MiB cached)
Here is my GPU information:
➜ New paper writing git:(master) ✗ nvidia-smi
Sat Jun 15 20:48:37 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04 Driver Version: 418.40.04 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:04:00.0 Off | N/A |
| 25% 42C P0 58W / 250W | 0MiB / 12196MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Before I ran
python train.py --data_path data/pubmed_abstract --model_dp abstract_model/ --gpu 1
all 12196MiB of GPU memory was free.
Can you help me? Thank you very much!
Thanks for this great work! I am having memory trouble even when using a P100 GPU and a reduced batch size.
Is it possible to train on a small subset of the dataset? I could not quite understand the part that loads the train_set.
Many thanks
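One generic way to train on a smaller portion is to sample a fixed fraction of the examples after they are loaded and before batching. The sketch below assumes the training data is available as a Python list of examples, which the source does not show. The function name `take_subset` and its parameters are hypothetical and do not correspond to anything in train.py.

```python
import random


def take_subset(examples, fraction, seed=0):
    """Return a reproducible random sample covering `fraction` of the examples."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    size = max(1, int(len(examples) * fraction))
    rng = random.Random(seed)  # fixed seed so the subset is the same on every run
    return rng.sample(examples, size)


# Toy stand-in for a loaded training set.
toy_train_set = list(range(100))
small_train_set = take_subset(toy_train_set, 0.1)
print(len(small_train_set))
```

A smaller subset shortens each epoch but does not change the memory used per batch, so it may not resolve an out-of-memory error on its own.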