bjascob / amr_coref
A python library / model for creating co-references between AMR graph nodes.
License: MIT License
Hello,
I am a student studying GNNs, and I have recently been working on a project that uses AMR graphs.
I ran into an issue while trying to use the AMR graph library from the code uploaded to this repository.
I tried fixing it myself, but I'd like some feedback to make sure my changes are correct.
I eventually found that the error occurs because global variables are not shared across processes when using multiprocessing.
So I used functools.partial to pass those values as arguments to the worker function, which resolved the issue.
I'm not sure whether the uploaded code was originally correct.
Below is the code that I modified.
Could you please confirm whether it is correct? Thank you!
import re
import logging
from functools import partial
from multiprocessing import Pool
from tqdm import tqdm
import numpy

gfeaturizer, gmax_dist = None, None   # for multiprocessing

def build_coref_features(mdata, model, **kwargs):
    chunksize = kwargs.get('feat_chunksize', 200)
    maxtpc    = kwargs.get('feat_maxtasksperchild', 200)
    processes = kwargs.get('feat_processes', None)   # None = use os.cpu_count()
    show_prog = kwargs.get('show_prog', True)
    global gfeaturizer, gmax_dist
    gfeaturizer = CorefFeaturizer(mdata, model)
    gmax_dist   = model.config.max_dist if model.config.max_dist is not None else 999999999
    # Build the list of doc_names and mention indexes for multiprocessing and the output container
    idx_keys  = [(dn, idx) for dn, mlist in gfeaturizer.mdata.mentions.items() for idx in range(len(mlist))]
    feat_data = {}
    for dn, mlist in gfeaturizer.mdata.mentions.items():
        feat_data[dn] = [None]*len(mlist)
    # Loop through and get the pair features for all antecedents
    pbar = tqdm(total=len(idx_keys), ncols=100, disable=not show_prog)
    with Pool(processes=processes, maxtasksperchild=maxtpc) as pool:
        worker_with_args = partial(worker, gfeaturizer=gfeaturizer, gmax_dist=gmax_dist)
        for fdata in pool.imap_unordered(worker_with_args, idx_keys, chunksize=chunksize):
            dn, midx, sspans, dspans, words, sfeats, pfeats, slabels, plabels = fdata
            feat_data[dn][midx] = {'sspans':sspans, 'dspans':dspans, 'words':words,
                                   'sfeats':sfeats, 'pfeats':pfeats,
                                   'slabels':slabels, 'plabels':plabels}
            pbar.update(1)
    pbar.close()
    # Error check
    for dn, feat_list in feat_data.items():
        assert None not in feat_list
    return feat_data

def worker(idx_key, gfeaturizer, gmax_dist):
    doc_name, midx = idx_key
    mlist       = gfeaturizer.mdata.mentions[doc_name]
    mention     = mlist[midx]                # the head mention
    antecedents = mlist[:midx]               # all antecedents up to (not including) the head mention
    antecedents = antecedents[-gmax_dist:]   # truncate earlier values so the list is only max_dist long
    # Process the single and pair data
    sspan_vector = gfeaturizer.get_sentence_span_vector(mention)
    dspan_vector = gfeaturizer.get_document_span_vector(mention)
    word_indexes = gfeaturizer.get_word_indexes(mention)
    sfeats       = gfeaturizer.get_single_features(mention)
    pfeats       = gfeaturizer.get_pair_features(mention, antecedents)
    # Build target labels. Note that if there are no clusters in the mention data this will still
    # return a list of targets, though all singles will be 1 and pairs 0
    slabels, plabels = gfeaturizer.build_targets(mention, antecedents)
    return doc_name, midx, sspan_vector, dspan_vector, word_indexes, sfeats, pfeats, slabels, plabels
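For comparison, here is a minimal, self-contained sketch (not the repository's code; the toy featurizer and values are placeholders) of an alternative fix: setting the module globals inside each worker process through Pool's initializer hook. This also works under the 'spawn' start method and avoids repeatedly pickling the featurizer with each task chunk, which happens when it is bound into the partial.

from multiprocessing import Pool

gfeaturizer, gmax_dist = None, None   # module-level globals, as in coref_featurizer.py

def _init_worker(featurizer, max_dist):
    # Runs once in every worker process, so the globals exist there as well,
    # regardless of the start method ('fork' or 'spawn').
    global gfeaturizer, gmax_dist
    gfeaturizer, gmax_dist = featurizer, max_dist

def worker(idx_key):
    # gfeaturizer / gmax_dist are guaranteed to be set here.
    return idx_key, gmax_dist

if __name__ == '__main__':
    dummy_featurizer = {'mentions': {}}   # placeholder for CorefFeaturizer(mdata, model)
    with Pool(processes=2, initializer=_init_worker,
              initargs=(dummy_featurizer, 999999999)) as pool:
        for result in pool.imap_unordered(worker, range(5), chunksize=2):
            print(result)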
Hi Brad,
Could you please provide some more documentation/clarification on how I can use a pretrained model provided by you to do coreference resolution on an AMR graph?
I also have access to the AMR 3.0 dataset, so if needed I can train the model myself as well.
However, I still need to understand what exactly is happening here; do you have a research paper on this approach? Or perhaps a simple PDF with more specific instructions would do.
Please help; I would like to use this in my own research, and I would of course cite you/this repo/your paper in any publication.
I followed the instructions to run the inference script `` but I get the following error (below). It appears the module-level global variable gfeaturizer
is not being set in `amr_coref.amr_coref.coref.coref_featurizer`.
Before running, I added the downloaded model (from the releases section of this repo) and installed the LDC2020T02 corpus in the data directory.
Loading model from data/model_coref-v0.1.0/
Loading test data
ignoring epigraph data for duplicate triple: ('w', ':polarity', '-')
Clustering
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/work/third/amr_coref/amr_coref/coref/coref_featurizer.py", line 259, in worker
    mlist = gfeaturizer.mdata.mentions[doc_name]
AttributeError: 'NoneType' object has no attribute 'mdata'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/work/third/amr_coref/40_Run_Inference.py", line 67, in <module>
    cluster_dict = inference.coreference(ordered_pgraphs)
  File "/work/third/amr_coref/amr_coref/coref/inference.py", line 47, in coreference
    self.test_dloader = get_data_loader_from_data(tdata_dict, self.model, show_prog=self.show_prog, shuffle=False)
  File "/work/third/amr_coref/amr_coref/coref/coref_data_loader.py", line 41, in get_data_loader_from_data
    fdata = build_coref_features(mdata, model, **kwargs)
  File "/work/third/amr_coref/amr_coref/coref/coref_featurizer.py", line 243, in build_coref_features
    for fdata in pool.imap_unordered(worker, idx_keys, chunksize=chunksize):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 448, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
AttributeError: 'NoneType' object has no attribute 'mdata'
Compilation exited abnormally with code 1 at Fri Aug 5 18:19:19
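For what it's worth, this traceback is consistent with the 'spawn' start method, which is the multiprocessing default on macOS since Python 3.8: each worker process re-imports coref_featurizer, so the module global reverts to its initial None instead of inheriting the value assigned in the parent. A minimal stand-alone reproduction of that behaviour (toy names, not the project's code):

from multiprocessing import Pool, get_start_method

gvalue = None   # module-level global, initially unset (like gfeaturizer)

def worker(i):
    # Under 'spawn' the child re-imports this module, so gvalue is still None
    # even though the parent assigned it below; under 'fork' it is inherited.
    return i, gvalue

if __name__ == '__main__':
    gvalue = 'set in parent'
    print('start method:', get_start_method())
    with Pool(2) as pool:
        print(pool.map(worker, range(3)))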
Is there a straightforward way to take the output from this model and use it to merge together the multiple AMRs into one document-level / MS-AMR?
I am thinking I need to
This sort of works for me, but it feels complicated! I was wondering if there is a more efficient method.
Thanks for the great work as always!
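One possible shape for that merge, offered only as a rough sketch: rename variables per sentence graph, collapse each coreference cluster onto a single variable, and hang the sentence graphs off a multi-sentence root. This assumes cluster_dict maps a cluster name to (graph_index, variable) pairs; the real output format of inference.coreference() and the MS-AMR conventions should be checked against the repo.

import penman

def merge_to_document_amr(graph_strings, cluster_dict):
    """Merge per-sentence AMR strings into one document-level graph (rough sketch)."""
    graphs = [penman.decode(s) for s in graph_strings]
    # Prefix every variable with its sentence index so names stay unique after merging.
    rename = {(gi, v): f's{gi}_{v}' for gi, g in enumerate(graphs) for v in g.variables()}
    # Collapse each coreference cluster onto the first mention's variable.
    for cluster in cluster_dict.values():
        canon = rename[tuple(cluster[0])]
        for gi, var in cluster[1:]:
            rename[(gi, var)] = canon
    # Rebuild one triple list, keeping a single :instance triple per merged variable.
    triples, seen = [], set()
    top = 'd0'
    triples.append((top, ':instance', 'multi-sentence'))
    for gi, g in enumerate(graphs):
        triples.append((top, f':snt{gi + 1}', rename[(gi, g.top)]))
        for src, role, tgt in g.triples:
            src = rename.get((gi, src), src)
            tgt = rename.get((gi, tgt), tgt)
            if role == ':instance':
                if src in seen:
                    continue          # concept already set by an earlier mention
                seen.add(src)
            triples.append((src, role, tgt))
    return penman.encode(penman.Graph(triples, top=top))

Note that this keeps the first mention's concept when coreferent nodes carry different concepts and makes no attempt to merge duplicated subgraphs, which proper MS-AMR annotation handles more carefully.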
The only other system that I'm aware of is https://github.com/Sean-Blank/AMRcoref as described in the paper "End-to-End AMR Coreference Resolution". The scores in the paper are relatively similar to this project's scores.