bjascob / amr_coref
A python library / model for creating co-references between AMR graph nodes.
License: MIT License
Hello,
I am a student studying GNNs, and I have recently been working on a project that uses AMR graphs.
I ran into an issue while trying to use the AMR graph library from the code uploaded to this repository.
I tried fixing it myself, but I'd like some feedback to make sure my changes are correct.
I eventually found that the error occurs because global variables are not shared across processes when using multiprocessing.
So I used functools.partial to pass those values as arguments to the worker function, which resolved the issue.
I'm not sure whether the uploaded code was originally correct.
Below is the code that I modified.
Could you please confirm whether it is correct? Thank you!
import re
import logging
from functools import partial
from multiprocessing import Pool
from tqdm import tqdm
import numpy

gfeaturizer, gmax_dist = None, None   # for multiprocessing

def build_coref_features(mdata, model, **kwargs):
    chunksize = kwargs.get('feat_chunksize', 200)
    maxtpc    = kwargs.get('feat_maxtasksperchild', 200)
    processes = kwargs.get('feat_processes', None)   # None = use os.cpu_count()
    show_prog = kwargs.get('show_prog', True)
    global gfeaturizer, gmax_dist
    gfeaturizer = CorefFeaturizer(mdata, model)
    gmax_dist   = model.config.max_dist if model.config.max_dist is not None else 999999999
    # Build the list of doc_names and mention indexes for multiprocessing and the output container
    idx_keys  = [(dn, idx) for dn, mlist in gfeaturizer.mdata.mentions.items() for idx in range(len(mlist))]
    feat_data = {}
    for dn, mlist in gfeaturizer.mdata.mentions.items():
        feat_data[dn] = [None]*len(mlist)
    # Loop through and get the pair features for all antecedents
    pbar = tqdm(total=len(idx_keys), ncols=100, disable=not show_prog)
    with Pool(processes=processes, maxtasksperchild=maxtpc) as pool:
        worker_with_args = partial(worker, gfeaturizer=gfeaturizer, gmax_dist=gmax_dist)
        for fdata in pool.imap_unordered(worker_with_args, idx_keys, chunksize=chunksize):
            dn, midx, sspans, dspans, words, sfeats, pfeats, slabels, plabels = fdata
            feat_data[dn][midx] = {'sspans':sspans, 'dspans':dspans, 'words':words,
                                   'sfeats':sfeats, 'pfeats':pfeats,
                                   'slabels':slabels, 'plabels':plabels}
            pbar.update(1)
    pbar.close()
    # Error check
    for dn, feat_list in feat_data.items():
        assert None not in feat_list
    return feat_data

def worker(idx_key, gfeaturizer, gmax_dist):
    doc_name, midx = idx_key
    mlist       = gfeaturizer.mdata.mentions[doc_name]
    mention     = mlist[midx]                # the head mention
    antecedents = mlist[:midx]               # all antecedents up to (not including) the head mention
    antecedents = antecedents[-gmax_dist:]   # truncate earlier values so the list is only max_dist long
    # Process the single and pair data
    sspan_vector = gfeaturizer.get_sentence_span_vector(mention)
    dspan_vector = gfeaturizer.get_document_span_vector(mention)
    word_indexes = gfeaturizer.get_word_indexes(mention)
    sfeats       = gfeaturizer.get_single_features(mention)
    pfeats       = gfeaturizer.get_pair_features(mention, antecedents)
    # Build target labels. Note that if there are no clusters in the mention data this will still
    # return a list of targets, though all singles will be 1 and pairs 0
    slabels, plabels = gfeaturizer.build_targets(mention, antecedents)
    return doc_name, midx, sspan_vector, dspan_vector, word_indexes, sfeats, pfeats, slabels, plabels
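For comparison, here is a minimal, self-contained sketch (not the repository's code; the toy featurizer and values are placeholders) of an alternative fix: setting the module globals inside each worker process through Pool's initializer hook. This also works under the 'spawn' start method and avoids repeatedly pickling the featurizer with each task chunk, which happens when it is bound into the partial.

from multiprocessing import Pool

gfeaturizer, gmax_dist = None, None   # module-level globals, as in coref_featurizer.py

def _init_worker(featurizer, max_dist):
    # Runs once in every worker process, so the globals exist there as well,
    # regardless of the start method ('fork' or 'spawn').
    global gfeaturizer, gmax_dist
    gfeaturizer, gmax_dist = featurizer, max_dist

def worker(idx_key):
    # gfeaturizer / gmax_dist are guaranteed to be set here.
    return idx_key, gmax_dist

if __name__ == '__main__':
    dummy_featurizer = {'mentions': {}}   # placeholder for CorefFeaturizer(mdata, model)
    with Pool(processes=2, initializer=_init_worker,
              initargs=(dummy_featurizer, 999999999)) as pool:
        for result in pool.imap_unordered(worker, range(5), chunksize=2):
            print(result)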
Hi Brad,
Could you please provide some more documentation/clarification on how I can use a pretrained model provided by you to do coreference resolution on an AMR graph?
I also have access to the AMR 3.0 dataset, so if needed I can train the model myself as well.
However, I still need to understand what exactly is happening here; do you have a research paper on this approach? Or perhaps a simple PDF with more specific instructions would do.
Please help; I would like to use this in my own research, and I would of course cite you/this repo/your paper in any publication.
I followed the instructions to run the inference script `` but I get the following error (below). It appears the module-level global variable gfeaturizer
is not being set in `amr_coref.amr_coref.coref.coref_featurizer`.
Before running, I added the downloaded model (from the releases section of this repo) and installed the LDC2020T02 corpus in the data directory.
Loading model from data/model_coref-v0.1.0/
Loading test data
ignoring epigraph data for duplicate triple: ('w', ':polarity', '-')
Clustering
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/work/third/amr_coref/amr_coref/coref/coref_featurizer.py", line 259, in worker
    mlist = gfeaturizer.mdata.mentions[doc_name]
AttributeError: 'NoneType' object has no attribute 'mdata'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/work/third/amr_coref/40_Run_Inference.py", line 67, in <module>
    cluster_dict = inference.coreference(ordered_pgraphs)
  File "/work/third/amr_coref/amr_coref/coref/inference.py", line 47, in coreference
    self.test_dloader = get_data_loader_from_data(tdata_dict, self.model, show_prog=self.show_prog, shuffle=False)
  File "/work/third/amr_coref/amr_coref/coref/coref_data_loader.py", line 41, in get_data_loader_from_data
    fdata = build_coref_features(mdata, model, **kwargs)
  File "/work/third/amr_coref/amr_coref/coref/coref_featurizer.py", line 243, in build_coref_features
    for fdata in pool.imap_unordered(worker, idx_keys, chunksize=chunksize):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 448, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
AttributeError: 'NoneType' object has no attribute 'mdata'
Compilation exited abnormally with code 1 at Fri Aug 5 18:19:19
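For what it's worth, this traceback is consistent with the 'spawn' start method, which is the multiprocessing default on macOS since Python 3.8: each worker process re-imports coref_featurizer, so the module global reverts to its initial None instead of inheriting the value assigned in the parent. A minimal stand-alone reproduction of that behaviour (toy names, not the project's code):

from multiprocessing import Pool, get_start_method

gvalue = None   # module-level global, initially unset (like gfeaturizer)

def worker(i):
    # Under 'spawn' the child re-imports this module, so gvalue is still None
    # even though the parent assigned it below; under 'fork' it is inherited.
    return i, gvalue

if __name__ == '__main__':
    gvalue = 'set in parent'
    print('start method:', get_start_method())
    with Pool(2) as pool:
        print(pool.map(worker, range(3)))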
Is there a straightforward way to take the output from this model and use it to merge together the multiple AMRs into one document-level / MS-AMR?
I am thinking I need to
This sort of works for me, but it feels complicated! I was wondering if there is a more efficient method.
Thanks for the great work as always!
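One possible shape for that merge, offered only as a rough sketch: rename variables per sentence graph, collapse each coreference cluster onto a single variable, and hang the sentence graphs off a multi-sentence root. This assumes cluster_dict maps a cluster name to (graph_index, variable) pairs; the real output format of inference.coreference() and the MS-AMR conventions should be checked against the repo.

import penman

def merge_to_document_amr(graph_strings, cluster_dict):
    """Merge per-sentence AMR strings into one document-level graph (rough sketch)."""
    graphs = [penman.decode(s) for s in graph_strings]
    # Prefix every variable with its sentence index so names stay unique after merging.
    rename = {(gi, v): f's{gi}_{v}' for gi, g in enumerate(graphs) for v in g.variables()}
    # Collapse each coreference cluster onto the first mention's variable.
    for cluster in cluster_dict.values():
        canon = rename[tuple(cluster[0])]
        for gi, var in cluster[1:]:
            rename[(gi, var)] = canon
    # Rebuild one triple list, keeping a single :instance triple per merged variable.
    triples, seen = [], set()
    top = 'd0'
    triples.append((top, ':instance', 'multi-sentence'))
    for gi, g in enumerate(graphs):
        triples.append((top, f':snt{gi + 1}', rename[(gi, g.top)]))
        for src, role, tgt in g.triples:
            src = rename.get((gi, src), src)
            tgt = rename.get((gi, tgt), tgt)
            if role == ':instance':
                if src in seen:
                    continue          # concept already set by an earlier mention
                seen.add(src)
            triples.append((src, role, tgt))
    return penman.encode(penman.Graph(triples, top=top))

Note that this keeps the first mention's concept when coreferent nodes carry different concepts and makes no attempt to merge duplicated subgraphs, which proper MS-AMR annotation handles more carefully.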
The only other system that I'm aware of is https://github.com/Sean-Blank/AMRcoref as described in the paper "End-to-End AMR Coreference Resolution". The scores in the paper are relatively similar to this project's scores.