paperswithcode / axcell
Tools for extracting tables and results from Machine Learning papers
License: Apache License 2.0
When I run extraction.ipynb, I am facing the following issue:
docker.errors.APIError: 400 Client Error: Bad Request ("OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/files/latex2html.sh\": permission denied": unknown")
I can run docker without sudo and can successfully run the following sample using docker-py
import docker
client = docker.from_env()
>>> client.containers.run("ubuntu:latest", "echo hello world")
'hello world\n'
Could you help? Thanks.
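One likely cause (an assumption, not confirmed by the maintainers) is that latex2html.sh lost its execute bit, e.g. after a checkout on a filesystem that does not preserve permissions. A minimal sketch of restoring it from Python, equivalent to `chmod +x`:

```python
import os
import stat

def make_executable(path):
    """Add the execute bit for user, group and other (like `chmod +x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# e.g. make_executable("files/latex2html.sh")  # path taken from the error message
```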
Hi!
I am reproducing the results and encountered two problems as follows:
1. SourceChangeWarning
When using ResultsExtractor in evaluation.ipynb, the error was raised: 453: SourceChangeWarning: source code of class 'torch.nn.modules.loss.BCEWithLogitsLoss' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True
and use the patch tool to revert the changes.
Could you please give me some advice on how to retrieve the original source?
2. WeightDropout
When extracting the results in evaluation.ipynb, an error was raised: AttributeError: 'WeightDropout' object has no attribute 'idxs'
Looking forward to your kind response:) Thank you very much!
It looks like the following code in nbsvm.py is used to compute precision and recall for the cell type classification task:
def metrics(preds, true_y):
    y = true_y
    p = preds
    acc = (p == y).mean()
    tp = ((y != 0) & (p == y)).sum()
    fp = ((p != 0) & (p != y)).sum()
    fn = ((y != 0) & (p == 0)).sum()
    prec = tp / (fp + tp)
    reca = tp / (fn + tp)
    return {
        "precision": prec,
        "accuracy": acc,
        "recall": reca,
        "TP": tp,
        "FP": fp,
    }
My understanding is that you are trying to exclude OTHER. Then why is fp not calculated as fp = ((y != 0) & (p != y)).sum()? Also, why not use the standard approach that treats all classes identically?
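For comparison, here is a sketch of a "standard" micro-averaged precision/recall over the non-OTHER classes (my own formulation of the question, not code from the repository). Unlike the fn above, it also counts cross-class confusions, i.e. a non-OTHER label predicted as a different non-OTHER class, as a false negative:

```python
import numpy as np

def metrics_micro(preds, true_y, other=0):
    """Micro-averaged precision/recall, ignoring the OTHER class (label 0)."""
    p, y = np.asarray(preds), np.asarray(true_y)
    tp = ((p != other) & (p == y)).sum()  # correct non-OTHER predictions
    fp = ((p != other) & (p != y)).sum()  # predicted non-OTHER, but wrong
    fn = ((y != other) & (p != y)).sum()  # true non-OTHER, but missed
    return {"precision": tp / (tp + fp), "recall": tp / (tp + fn)}
```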
Thanks for sharing your code. I have a quick question for you guys:
It seems to me that the file ("pwc/papers-with-abstracts.json") required to execute axcell/scripts/download_arxiv_s3_papers.sh is missing. How can I fetch this file?
Does this step (https://github.com/ymohit/axcell/blob/master/scripts/download_arxiv_s3_papers.sh#L10) list the arXiv papers mentioned in the csv (https://github.com/paperswithcode/axcell/releases/download/v1.0/arxiv-papers.csv.xz), or is it a different list?
Thanks
Hello! I am working on reproducing the results in Jupyter notebooks on macOS but I have some questions on the following code in terms of the file path:
ROOT_PATH = Path('data')
PWC_LEADERBOARDS_ROOT_PATH = Path('pwc-laderboards')
How should I deal with the file path on macOS? I cannot find the mentioned 'data' directory under /axcell-master/notebooks. Where can I download the required files, and should I create a directory named data under /axcell-master/notebooks?
Looking forward to your kind reply.
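For what it's worth, pathlib behaves the same on macOS as on Linux: a relative path like Path('data') resolves against the notebook's current working directory, so the directory is expected to sit next to the notebook (creating it yourself is a reasonable assumption; the data itself still has to be downloaded separately). A minimal sketch:

```python
from pathlib import Path

# Relative paths resolve against the current working directory,
# i.e. the directory the notebook server was started in.
ROOT_PATH = Path('data')
ROOT_PATH.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
print(ROOT_PATH.resolve())
```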
Hi, as for the three datasets you are using (ArxivPapers, SegmentedTables & LinkedResults, PWCLeaderboards), are the three data files involved in the notebooks (arxiv-papers.csv.xz, segmented-tables.json.xz, pwc-leaderboards.json.xz) all that we need? Do we need to manually download each paper using the get_eprint_link(paper) function in the datasets notebook? If so, it would be great if a zip file of all papers could be provided.
In addition, I have looked into paper_collection.py, where many .json files are needed but not provided either. Could you give some guidance on how to get those files as well?
Hi, I really appreciate this work and am trying to reproduce the results.
However, I failed to perform the extraction, even from a single e-print archive (paper 1903.11816v1). To be more specific, when the LatexConverter calls the arxiv-vanity/engrafo API (at line 65 of latex_converter.py), it reports:
Exception has occurred: ConnectionError
('Connection aborted.', PermissionError(13, 'Permission denied'))
File "/mnt/zr/axcell/axcell/helpers/latex_converter.py", line 65, in latex2html
self.client.containers.run("arxivvanity/engrafo:b3db888fefa118eacf4f13566204b68ce100b3a6", command, remove=True, volumes=volumes)
File "/mnt/zr/axcell/axcell/helpers/latex_converter.py", line 84, in to_html
self.latex2html(source_dir, output_dir)
File "/mnt/zr/axcell/axcell/helpers/paper_extractor.py", line 41, in call
html = self.latex.to_html(unpack_path)
Could you share some ways to resolve this problem? Thanks very much!
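PermissionError(13) from docker-py usually means the current user cannot access the Docker daemon's socket (typically fixed by adding the user to the docker group). A quick check, assuming the default Unix socket path:

```python
import os

def can_use_docker(sock="/var/run/docker.sock"):
    """Return True if the current user can read/write the Docker socket."""
    return os.access(sock, os.R_OK | os.W_OK)

# If this prints False, docker-py calls will fail with PermissionError(13).
print(can_use_docker())
```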
Hi!
After successfully setting up the conda environment, I tried to run the result-extraction.ipynb notebook. Unfortunately, I faced the following error while loading the ResultsExtractor: 'LSTM' object has no attribute '_flat_weights_names'.
Have I done something wrong or missed some requirements/dependencies?
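This error typically shows up when a checkpoint serialized with one PyTorch version is loaded under a different one (an educated guess, not confirmed by the maintainers). A small sketch for checking which version is actually installed, to compare against the pin in the repository's environment file:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    """Return the installed version of pkg, or None if it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# e.g. compare installed_version("torch") against the pin in environment.yml
```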
Hi,
When I tried to run this line in the evaluation notebook,
results = Parallel(backend='multiprocessing', n_jobs=-1)(delayed(process_single)(index) for index in range(len(pc)))
I encountered the following error:
07/03/2020 13:32:37 - WARNING - elasticsearch - PUT http://127.0.0.1:9200/paper-fragments/_doc/1207.4708v2_1000 [status:N/A request:0.001s]
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/opt/anaconda3/lib/python3.7/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/opt/anaconda3/lib/python3.7/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
Do you have any idea how to resolve this issue?
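ConnectionRefusedError on 127.0.0.1:9200 means nothing is listening on the Elasticsearch port; the notebooks assume an Elasticsearch instance is already running (started via the repository's docker-compose setup). A minimal check, with host and port taken from the log line above:

```python
import socket

def port_open(host="127.0.0.1", port=9200, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If this prints False, start Elasticsearch before running the notebook.
print(port_open())
```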
I have encountered the following error while reproducing the result from evaluation.ipynb:
ModuleNotFoundError: No module named 'llvm'
More errors were generated while finishing the installation and setup, and I'm currently stuck at this step. After a thorough search, I also figured out that llvmpy is a relatively old module. Hence, I'm wondering if there are any required settings for this part, or if you have any idea how to resolve this issue.
Hi - thanks for the great work and sharing the code.
I am new to Docker and I'm finding it difficult to understand how exactly to reproduce the code and results on Windows.
Could you please provide a more detailed README with step-by-step actions to recreate the work on Windows?
Hi all,
Super cool tool; thanks for making this! I was wondering if I could get a little more information about how Docker is used and whether it is possible to get around the Docker requirement (e.g., to run axcell in Colab). My best guess is that it is creating extra nodes for elasticsearch?
Thanks for your time,
Bernie
Hi,
Have you estimated the cost of using the AWS S3 API to download all the related paper sources? AWS seems to charge for downloads by data size, and I have no idea of an approximate figure for the cost.
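A back-of-the-envelope sketch (the rates below are assumptions for illustration only; check current AWS pricing for your region and the requester-pays terms of the arXiv bucket):

```python
def estimate_s3_cost(total_gb, n_requests, per_gb=0.09, per_1k_requests=0.0004):
    """Rough S3 download cost: data-transfer-out plus GET request charges.

    per_gb and per_1k_requests are assumed example rates, not current prices.
    """
    return total_gb * per_gb + (n_requests / 1000) * per_1k_requests

# e.g. 1 TB of sources fetched with 100k GET requests:
print(estimate_s3_cost(1000, 100_000))
```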
Hey,
I am planning to further elaborate on and implement features for AxCell. Unfortunately, however, I do not own a Linux or macOS machine. While trying to install AxCell on Windows 10, I ran into some issues. When trying to install the conda environment (using Anaconda, Python 3.8) via the provided file, "magic_python" and "docker-compose" seem to be available only for Linux/macOS.
Is it possible to run AxCell on Windows?
Hi,
I'm trying to run the notebooks and use your pre-trained model to test it on a paper. I use the extraction.ipynb and results-extraction.ipynb notebooks but get some errors. It seems they need to download data. For example, in the extraction.ipynb notebook I get this:
What folders and directories should be created, and what data should I download? It's not clear to me.
I also downloaded the three dataset csv and json files, put them in the scripts folder, and ran download_arxiv_s3_papers.sh, but I get an error again:
fatal error: Unable to locate credentials
warning: failed to load external entity "arXiv_src_manifest.xml"
I cannot find the arXiv_src_manifest.xml file.
Thanks
When using the result_extraction notebook on a paper, it crashes at ResultExtractor(...) with the error:
[PID 496779] Load model table-structure-classifier.pth
/home/vivoli/miniconda3/envs/arxiv-manipulation/lib/python3.7/site-packages/spacy/util.py:715: UserWarning: [W094] Model 'en_core_sci_sm' (0.2.4) specifies an under-constrained spaCy version requirement: >=2.2.1. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.5,<3.1.0
warnings.warn(warn_msg)
OSError: [E053] Could not read config.cfg from /home/vivoli/miniconda3/envs/axcell/lib/python3.7/site-packages/en_core_sci_sm/en_core_sci_sm-0.2.4/config.cfg
Do you have any idea how to solve it?
Thanks