axcell's People

Contributors

mkardas, piotrczapla, rjt1990, rstojnic, timbmg, vxenomac

axcell's Issues

permission denied of latex2html.sh

When I run extraction.ipynb, I get the following error:

docker.errors.APIError: 400 Client Error: Bad Request ("OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/files/latex2html.sh\": permission denied": unknown")

I can run docker without sudo and can successfully run the following sample using docker-py:

>>> import docker
>>> client = docker.from_env()
>>> client.containers.run("ubuntu:latest", "echo hello world")
'hello world\n'

Could you help? Thanks.
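
One possible cause, offered as an assumption rather than a diagnosis, is that the script mounted into the container as /files/latex2html.sh lacks the execute bit on the host. The sketch below shows how that bit could be checked and set from Python. The helper name ensure_executable is hypothetical, and the temporary file only stands in for the real script.

```python
import os
import stat
import tempfile


def ensure_executable(path):
    """Add the execute bits for user, group and others; return True if the mode changed."""
    mode = os.stat(path).st_mode
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted == mode:
        return False
    os.chmod(path, wanted)
    return True


# Demonstration on a throwaway file standing in for latex2html.sh (an assumption).
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as handle:
    handle.write("#!/bin/sh\necho ok\n")
    demo_script = handle.name
os.chmod(demo_script, 0o644)

changed = ensure_executable(demo_script)
is_executable = os.access(demo_script, os.X_OK)
```

To try this on a checkout, pass the host path of latex2html.sh to ensure_executable instead of the temporary file.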

SourceChangeWarning & WeightDropout error

Hi!
I am reproducing the results and encountered two problems:

1. SourceChangeWarning
When using ResultsExtractor in evaluation.ipynb, the following warning was raised: 453: SourceChangeWarning: source code of class 'torch.nn.modules.loss.BCEWithLogitsLoss' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
Could you please give me some advice on how to retrieve the original source?

2. WeightDropout
When extracting the results in evaluation.ipynb, an error was raised: AttributeError: 'WeightDropout' object has no attribute 'idxs'

Looking forward to your kind response :) Thank you very much!

Metric for Table Segmentation

It looks like the following code in nbsvm.py is used to compute precision and recall for the cell type classification task:

def metrics(preds, true_y):
    
    y = true_y
    p = preds
    acc = (p == y).mean()
    tp = ((y != 0) & (p == y)).sum()
    fp = ((p != 0) & (p != y)).sum()
    fn = ((y != 0) & (p == 0)).sum()
    prec = tp / (fp + tp)
    reca = tp / (fn + tp)
    return {
        "precision": prec,
        "accuracy": acc,
        "recall": reca,
        "TP": tp,
        "FP": fp,
    }

My understanding is that you are trying to exclude OTHER. If so, why is fp not calculated as fp = ((y != 0) & (p != y)).sum()? Also, why not use the standard approach that treats all classes identically?
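
To make the difference concrete, here is a small sketch that restates the quoted function in plain Python and compares it with a standard one-vs-rest computation for a single class. It assumes class 0 is OTHER, as the question does. The labels are made up for illustration, and per_class_metrics is a hypothetical helper that is not part of nbsvm.py.

```python
def axcell_style_metrics(preds, true_y):
    """Plain-Python restatement of the quoted metrics(); class 0 is assumed to be OTHER."""
    pairs = list(zip(preds, true_y))
    tp = sum(1 for p, y in pairs if y != 0 and p == y)
    fp = sum(1 for p, y in pairs if p != 0 and p != y)
    fn = sum(1 for p, y in pairs if y != 0 and p == 0)
    return {
        "precision": tp / (fp + tp),
        "recall": tp / (fn + tp),
        "TP": tp,
        "FP": fp,
        "FN": fn,
    }


def per_class_metrics(preds, true_y, label):
    """Standard one-vs-rest precision and recall for one class (hypothetical helper)."""
    pairs = list(zip(preds, true_y))
    tp = sum(1 for p, y in pairs if p == label and y == label)
    fp = sum(1 for p, y in pairs if p == label and y != label)
    fn = sum(1 for p, y in pairs if p != label and y == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


# Made-up labels: 0 = OTHER, 1 and 2 = two non-OTHER cell types.
true_y = [0, 1, 1, 2, 2, 0]
preds = [1, 1, 2, 2, 0, 0]

quoted = axcell_style_metrics(preds, true_y)
class_1 = per_class_metrics(preds, true_y, 1)
```

On this toy input the cell with true class 1 predicted as class 2 is counted as a false positive by the quoted code but not as a false negative, because fn only counts predictions equal to 0. The one-vs-rest version counts that same cell as a false negative for class 1.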

Missing file

Thanks for sharing your code. I have a quick question for you guys:

It seems to me that the file ("pwc/papers-with-abstracts.json") required to execute axcell/scripts/download_arxiv_s3_papers.sh is missing. How can I fetch this file?

Does this step (https://github.com/ymohit/axcell/blob/master/scripts/download_arxiv_s3_papers.sh#L10) list the arXiv papers mentioned in the CSV https://github.com/paperswithcode/axcell/releases/download/v1.0/arxiv-papers.csv.xz, or is it a different list?

Thanks

Inquiry about file path in notebooks

Hello! I am working on reproducing the results in the Jupyter notebooks on macOS, and I have some questions about the file paths in the following code:
ROOT_PATH = Path('data')
PWC_LEADERBOARDS_ROOT_PATH = Path('pwc-laderboards')
How should I handle these paths on macOS? I cannot find the directory 'data' under /axcell-master/notebooks. Where can I download the required files? Should I create a directory named data under /axcell-master/notebooks?
Looking forward to your kind reply.
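
On the path question itself: Path('data') is a relative path, so it is resolved against the current working directory of the notebook kernel, and that works the same way on macOS as on Linux. The sketch below shows how to see where it points and how to create the directory if it is missing. Whether the datasets are expected in that location is an assumption, and ensure_data_dir is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def ensure_data_dir(root):
    """Create the directory if needed and return its absolute path."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    return root.resolve()


# Demonstration inside a temporary working directory standing in for notebooks/.
original_cwd = Path.cwd()
workdir = tempfile.mkdtemp()
os.chdir(workdir)
try:
    ROOT_PATH = Path('data')
    absolute_root = ensure_data_dir(ROOT_PATH)
    resolved_under_cwd = absolute_root == (Path.cwd() / 'data').resolve()
finally:
    os.chdir(original_cwd)
```

In a notebook, printing Path('data').resolve() is enough to see which directory the code will read from.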

Dataset

Hi, regarding the three datasets you are using (ArxivPapers, SegmentedTables & LinkedResults, PWCLeaderboards): are the three data files used in the notebooks (arxiv-papers.csv.xz, segmented-tables.json.xz, pwc-leaderboards.json.xz) all that we need? Do we need to manually download each paper using the get_eprint_link(paper) function in the datasets notebook? If so, it would be great if a zip file with all the papers could be provided.

In addition, I have looked into paper_collection.py, which needs many .json files that are not provided either. Could you give some guidance on how to get those files as well?

ConnectionError when calling arxiv-vanity/engrafo

Hi, I really appreciate this work and am trying to reproduce the results.
However, I failed to perform the extraction, even from a single e-print archive for paper 1903.11816v1. More specifically, when the LatexConverter calls the arxiv-vanity/engrafo container (at line 65 of latex_converter.py), it reports:
Exception has occurred: ConnectionError
('Connection aborted.', PermissionError(13, 'Permission denied'))
File "/mnt/zr/axcell/axcell/helpers/latex_converter.py", line 65, in latex2html
self.client.containers.run("arxivvanity/engrafo:b3db888fefa118eacf4f13566204b68ce100b3a6", command, remove=True, volumes=volumes)
File "/mnt/zr/axcell/axcell/helpers/latex_converter.py", line 84, in to_html
self.latex2html(source_dir, output_dir)
File "/mnt/zr/axcell/axcell/helpers/paper_extractor.py", line 41, in call
html = self.latex.to_html(unpack_path)

Could you share some ways to resolve this problem? Thanks very much!
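
A PermissionError(13) wrapped in "Connection aborted" from docker-py often means the current user cannot open the Docker daemon socket, though that is an assumption about this setup and not something the traceback proves. A minimal check, assuming the usual Linux socket path /var/run/docker.sock (socket_access_report is a hypothetical helper):

```python
import os
import tempfile


def socket_access_report(socket_path="/var/run/docker.sock"):
    """Report whether the path exists and whether this user may read and write it."""
    exists = os.path.exists(socket_path)
    accessible = exists and os.access(socket_path, os.R_OK | os.W_OK)
    return {"path": socket_path, "exists": exists, "accessible": accessible}


# Check the assumed default socket location.
report = socket_access_report()

# Self-check on paths whose state is known.
missing = socket_access_report(os.path.join(tempfile.gettempdir(), "no-such-docker.sock"))
with tempfile.NamedTemporaryFile() as handle:
    readable = socket_access_report(handle.name)
```

If report shows that the socket exists but is not accessible, the user running the notebook probably lacks permission to talk to the Docker daemon.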

ConnectionRefusedError

Hi,

When I tried to run this line in the evaluation notebook,

results = Parallel(backend='multiprocessing', n_jobs=-1)(delayed(process_single)(index) for index in range(len(pc)))

I encountered the following error:

07/03/2020 13:32:37 - WARNING - elasticsearch - PUT http://127.0.0.1:9200/paper-fragments/_doc/1207.4708v2_1000 [status:N/A request:0.001s]
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/opt/anaconda3/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/opt/anaconda3/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

Do you have any idea how to resolve this issue?
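
"Connection refused" on 127.0.0.1:9200 means nothing accepted the TCP connection on that port. Before running the parallel job it may help to confirm that something is listening there. A minimal sketch using only the standard library (port_is_open is a hypothetical helper, and the host and port are taken from the warning in the log):

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Host and port as they appear in the elasticsearch warning.
elasticsearch_reachable = port_is_open("127.0.0.1", 9200)

# Self-check against a listener we control, then against the same port once closed.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
free_port = listener.getsockname()[1]
open_result = port_is_open("127.0.0.1", free_port)
listener.close()
closed_result = port_is_open("127.0.0.1", free_port)
```

If elasticsearch_reachable is False, the Elasticsearch service the notebook expects is most likely not running or not bound to that address.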

llvm module

I have encountered the following error while reproducing the result from evaluation.ipynb:
ModuleNotFoundError: No module named 'llvm'
More errors were generated while finishing the installation and setup, and I'm currently stuck at this step. After a thorough search, I found that llvmpy is a relatively old module. Is there any required setup for this part, or do you have any idea how to resolve this issue?

file not found in AWS S3

I have configured my AWS CLI and am able to download the arxiv_src_manifest.xml file. The code creates tars.txt but then fails as shown below:

[screenshot "Capture" not included]

How do I resolve this?

Steps to replicate the full code on Windows

Hi - thanks for the great work and for sharing the code.
I am new to Docker and I'm finding it difficult to understand exactly how to reproduce the code and results on Windows.
Could you please provide a more detailed README with step-by-step instructions for recreating the work on Windows?

Docker Requirement?

Hi all,

Super cool tool; thanks for making this! I was wondering if I could get a little more information about how Docker is used and whether it is possible to get around the Docker requirement (e.g., to run axcell in Colab). My best guess is that it is creating extra nodes for elasticsearch?

Thanks for your time,
Bernie

Estimated Cost for Using AWS API

Hi,

Have you estimated the cost of using the AWS S3 API to download all the related paper resources? AWS seems to charge for downloads by data size, so I have no idea what the approximate cost would be.
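
Since the charge scales with the amount of data transferred, a rough estimate is total size times price per GB. The sketch below only shows that arithmetic. The file sizes and the price are placeholders rather than real arXiv or AWS figures, so substitute the sizes of the archives you plan to fetch and the current rate for your region.

```python
def total_size_gb(sizes_bytes):
    """Sum a list of file sizes in bytes and convert to gigabytes (10**9 bytes)."""
    return sum(sizes_bytes) / 1e9


def estimate_transfer_cost_usd(sizes_bytes, price_per_gb_usd):
    """Estimate the transfer cost as total gigabytes times a per-gigabyte price."""
    return total_size_gb(sizes_bytes) * price_per_gb_usd


# Placeholder inputs for illustration only, not real arXiv sizes or AWS prices.
example_sizes_bytes = [500_000_000, 1_500_000_000]
example_price_per_gb_usd = 0.10

example_cost = estimate_transfer_cost_usd(example_sizes_bytes, example_price_per_gb_usd)
```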

AxCell CondaEnv. creation fails on Windows

Hey,

I am planning to extend AxCell and implement new features. Unfortunately, I do not have a Linux or macOS machine, and while trying to install AxCell on Windows 10 I ran into some issues. When I try to create the Conda environment (using Anaconda with Python 3.8) from the provided file, "magic_python" and "docker-compose" seem to be available only for Linux/macOS.

Is it possible to run AxCell on Windows?

How to use the API

Hi,

I'm trying to run the notebooks and use your pre-trained model to test it on a paper. I use the extraction.ipynb and results-extraction.ipynb notebooks but get some errors. It seems the notebooks need to download data. For example, in the extraction.ipynb notebook I get this:

[screenshot not included]

What folders and directories should be created, and what data should I download? It's not clear to me.

I also downloaded the three dataset CSV and JSON files, put them in the scripts folder, and ran download_arxiv_s3_papers.sh, but I get another error:

fatal error: Unable to locate credentials
warning: failed to load external entity "arXiv_src_manifest.xml"

I cannot find the arXiv_src_manifest.xml file.

Thanks
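
"Unable to locate credentials" is the AWS CLI reporting that it found no credentials to sign the request with, and the manifest warning may simply follow from that failed download. The sketch below checks two common places where credentials are configured: the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and the shared credentials file at ~/.aws/credentials. It does not cover every source the CLI supports, and find_aws_credentials is a hypothetical helper.

```python
import os
from pathlib import Path


def find_aws_credentials(env=None, home=None):
    """Return a list of places where AWS credentials appear to be configured."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    found = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        found.append("environment variables")
    if (home / ".aws" / "credentials").is_file():
        found.append("shared credentials file")
    return found


# An empty list here would be consistent with the "Unable to locate credentials" error.
sources = find_aws_credentials()
```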

OSError: [E053] Could not read config.cfg

When using the result_extraction notebook on a paper, it crashes at
ResultExtractor(...)
with the error:

[PID 496779] Load model table-structure-classifier.pth
/home/vivoli/miniconda3/envs/arxiv-manipulation/lib/python3.7/site-packages/spacy/util.py:715: UserWarning: [W094] Model 'en_core_sci_sm' (0.2.4) specifies an under-constrained spaCy version requirement: >=2.2.1. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.5,<3.1.0
  warnings.warn(warn_msg)

OSError: [E053] Could not read config.cfg from /home/vivoli/miniconda3/envs/axcell/lib/python3.7/site-packages/en_core_sci_sm/en_core_sci_sm-0.2.4/config.cfg

Do you have any idea how to solve it?
Thanks
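
One reading of the log, offered as an assumption: the model declares spacy_version >=2.2.1 and so was built for spaCy 2.x, while config.cfg is a file that spaCy 3.x looks for when loading a model. The version range suggested in the warning (>=3.0.5,<3.1.0) hints that a 3.x release is installed. The sketch below compares major versions for that kind of mismatch. The helper names are hypothetical, and the installed version used here is only the example from the warning text.

```python
import re


def major_version(version_text):
    """Extract the first major version number from text such as '>=2.2.1' or '3.0.5'."""
    match = re.search(r"(\d+)\.", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def majors_match(installed_version, model_requirement):
    """Return True if the installed library and the model target the same major version."""
    return major_version(installed_version) == major_version(model_requirement)


model_requirement = ">=2.2.1"  # spacy_version reported for en_core_sci_sm 0.2.4 in the warning
example_installed = "3.0.5"    # example only, taken from the range suggested in the warning

mismatch = not majors_match(example_installed, model_requirement)
```

In a real environment the installed version can be read from spacy.__version__ and passed in place of example_installed.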
