
amr_coref's People

Contributors

bjascob


Forkers

plandes

amr_coref's Issues

Multiprocessing Error

Hello,

I am a student studying GNNs, and I have recently been working on a project that uses AMR graphs.

I ran into an issue while trying to use the AMR graph library in the code uploaded to this repository.

I tried fixing it myself, but I'd like some feedback to make sure my changes are correct.

The error is that module-level global variables are not shared with the worker processes when using multiprocessing.

To work around this, I used functools.partial to pass those values to the worker function as arguments, which resolved the issue.

I'm not sure whether the uploaded code was originally correct.

Below is the code that I modified.

Could you please confirm if the code below is correct? Thank you!

import re
import logging
from   multiprocessing import Pool
from   tqdm import tqdm
import numpy
from   functools import partial

gfeaturizer, gmax_dist = None, None    # module-level defaults; values are now passed to workers via functools.partial
def build_coref_features(mdata, model, **kwargs):
    chunksize = kwargs.get('feat_chunksize',          200)
    maxtpc    = kwargs.get('feat_maxtasksperchild',   200)
    processes = kwargs.get('feat_processes',         None)    # None = use os.cpu_count()
    show_prog = kwargs.get('show_prog',              True)
    global gfeaturizer, gmax_dist

    gfeaturizer = CorefFeaturizer(mdata, model)
    gmax_dist   = model.config.max_dist if model.config.max_dist is not None else 999999999
    # Build the list of doc_names and mention indexes for multiprocessing and the output container
    idx_keys = [(dn, idx) for dn, mlist in gfeaturizer.mdata.mentions.items() for idx in range(len(mlist))]
    feat_data = {}
    for dn, mlist in gfeaturizer.mdata.mentions.items():
        feat_data[dn] = [None]*len(mlist)
    # Loop through and get the pair features for all antecedents
    pbar = tqdm(total=len(idx_keys), ncols=100, disable=not show_prog)
    with Pool(processes=processes, maxtasksperchild=maxtpc) as pool:
        worker_with_args = partial(worker, gfeaturizer=gfeaturizer, gmax_dist=gmax_dist)
        for fdata in pool.imap_unordered(worker_with_args, idx_keys, chunksize=chunksize):
            dn, midx, sspans, dspans, words, sfeats, pfeats, slabels, plabels = fdata
            feat_data[dn][midx] = {'sspans':sspans,   'dspans':dspans, 'words':words,
                                   'sfeats':sfeats,   'pfeats':pfeats,
                                   'slabels':slabels, 'plabels':plabels}
            pbar.update(1)
    pbar.close()
    # Error check
    for dn, feat_list in feat_data.items():
        assert None not in feat_list
    return feat_data

def worker(idx_key, gfeaturizer, gmax_dist):
    doc_name, midx = idx_key
    mlist       = gfeaturizer.mdata.mentions[doc_name]
    mention     = mlist[midx]               # the head mention
    antecedents = mlist[:midx]              # all antecedents up to (not including) the head mention
    antecedents = antecedents[-gmax_dist:]  # truncate earlier values so the list is at most max_dist long
    # Process the single and pair data
    sspan_vector = gfeaturizer.get_sentence_span_vector(mention)
    dspan_vector = gfeaturizer.get_document_span_vector(mention)
    word_indexes = gfeaturizer.get_word_indexes(mention)
    sfeats       = gfeaturizer.get_single_features(mention)
    pfeats       = gfeaturizer.get_pair_features(mention, antecedents)
    # Build target labels.  Note that if there are no clusters in the mention data this will still
    # return a list of targets, though all singles will be 1 and pairs 0
    slabels, plabels = gfeaturizer.build_targets(mention, antecedents)
    return doc_name, midx, sspan_vector, dspan_vector, word_indexes, sfeats, pfeats, slabels, plabels
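
For comparison, a fix that keeps the original worker signature is to set the globals inside each child process via the Pool's initializer/initargs arguments; this works under both the fork and spawn start methods. This is only a minimal sketch and assumes the CorefFeaturizer object is picklable:

from multiprocessing import Pool

gfeaturizer, gmax_dist = None, None    # per-process globals

def _init_worker(featurizer, max_dist):
    # Runs once in each child process and sets that process's globals,
    # so worker() can read them regardless of the start method.
    global gfeaturizer, gmax_dist
    gfeaturizer = featurizer
    gmax_dist   = max_dist

# In build_coref_features, create the pool like this instead:
# with Pool(processes=processes, maxtasksperchild=maxtpc,
#           initializer=_init_worker, initargs=(featurizer, max_dist)) as pool:
#     for fdata in pool.imap_unordered(worker, idx_keys, chunksize=chunksize):
#         ...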

Clarification needed to run

Hi Brad,
Could you provide some more documentation/clarification on how to use one of the pretrained models you provide to do coreference resolution on an AMR graph?
I also have access to the AMR 3.0 dataset, so if needed I can train the model myself.
However, I still need to understand what exactly is happening here: do you have a research paper on this approach? Or perhaps a simple PDF with more specific instructions would do.

Please help; I would like to use this in my own research, and I would of course cite you/this repo/your paper in any publication.

Inferencing does not work

I followed the instructions to run the inference script `40_Run_Inference.py`, but I get the following error (below). It appears the module-level global variable gfeaturizer is not being set in `amr_coref.amr_coref.coref.coref_featurizer`.

Before running, I added the downloaded model (from the releases section of this repo) and installed the LDC2020T02 corpus in the data directory.

Loading model from data/model_coref-v0.1.0/
Loading test data
ignoring epigraph data for duplicate triple: ('w', ':polarity', '-')
Clustering
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/work/third/amr_coref/amr_coref/coref/coref_featurizer.py", line 259, in worker
    mlist       = gfeaturizer.mdata.mentions[doc_name]
AttributeError: 'NoneType' object has no attribute 'mdata'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/work/third/amr_coref/40_Run_Inference.py", line 67, in <module>
    cluster_dict = inference.coreference(ordered_pgraphs)
  File "/work/third/amr_coref/amr_coref/coref/inference.py", line 47, in coreference
    self.test_dloader = get_data_loader_from_data(tdata_dict, self.model, show_prog=self.show_prog, shuffle=False)
  File "/work/third/amr_coref/amr_coref/coref/coref_data_loader.py", line 41, in get_data_loader_from_data
    fdata   = build_coref_features(mdata, model, **kwargs)
  File "/work/third/amr_coref/amr_coref/coref/coref_featurizer.py", line 243, in build_coref_features
    for fdata in pool.imap_unordered(worker, idx_keys, chunksize=chunksize):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 448, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
AttributeError: 'NoneType' object has no attribute 'mdata'

Compilation exited abnormally with code 1 at Fri Aug  5 18:19:19
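
For what it's worth, the paths in the traceback indicate macOS with Python 3.9, and since Python 3.8 multiprocessing on macOS defaults to the "spawn" start method. Under spawn, module globals assigned in the parent process (such as gfeaturizer) are never copied into the worker processes, which matches the NoneType error above. A quick way to check, plus a possible (untested) workaround:

import multiprocessing as mp

# Diagnose: prints 'spawn' on macOS/Windows, 'fork' on Linux
print(mp.get_start_method())

# Possible workaround: force fork semantics before any Pool is created, so
# globals set in the parent are inherited by workers. Note that fork can be
# unsafe on macOS if the parent process uses threads.
mp.set_start_method('fork', force=True)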

Question: Best way to create MS-AMR from output?

Is there a straightforward way to take the output from this model and use it to merge together the multiple AMRs into one document-level / MS-AMR?

I am thinking I need to:

  1. Prefix each node name with a token unique to its source AMR, so nodes don't accidentally merge with similarly named nodes from other graphs.
  2. Rename every mention of a coreferent node, across graphs, to a single shared node name.
  3. Create a new graph that is the union of the sets of triples from each AMR after the above transformations.

This sort of works for me, but it feels complicated! I was wondering if there is a more efficient method (a rough sketch of these steps is below).
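
For concreteness, here is a minimal sketch of those three steps using the penman library. The cluster format is hypothetical (adapt it to whatever inference.coreference actually returns), and if the merged triples form a disconnected graph you may need to add a dummy root before encoding:

import penman

def merge_to_ms_amr(graph_strs, clusters):
    """graph_strs : list of AMR strings, one per sentence.
       clusters   : {cluster_id: [(graph_index, variable), ...]}  (hypothetical)"""
    # Step 2 prep: map each coreferent (graph, variable) pair to a shared name
    shared = {(gi, v): 'c%s' % cid
              for cid, mentions in clusters.items() for gi, v in mentions}
    merged, seen = [], set()
    for i, s in enumerate(graph_strs):
        g = penman.decode(s)
        variables = g.variables()
        def rename(x):
            # Step 1: prefix non-coreferent variables with a per-graph token;
            # Step 2: collapse coreferent variables to their cluster name
            if x in variables:
                return shared.get((i, x), 'g%d_%s' % (i, x))
            return x  # concepts and constants pass through unchanged
        # Step 3: union of the renamed triples, de-duplicated
        for src, role, tgt in g.triples:
            t = (rename(src), role, rename(tgt))
            if t not in seen:
                seen.add(t)
                merged.append(t)
    return penman.Graph(merged)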

Thanks for the great work as always!
