Code Monkey home page Code Monkey logo

tabbie's Introduction

Tabbie: Tabular Information Embedding

This repository includes scripts for Tabbie(Tabular Information Embedding) model. The link to the paper is as follows. https://arxiv.org/pdf/2105.02584.pdf

(setup 1): update cuda version from 10.0 to 10.1 (for AWS deep learning ami)

# https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-base.html
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda

setup 2: create conda env (default env name: "table_emb_dev")

git clone https://github.com/SFIG611/tabbie.git
cd tabbie
conda env create --file env/env.yml
conda activate table_emb_dev
pip install torch==1.5.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

setup 3: installing apex (gcc==5.4, cuda==10.1.168, cudnn==7.6-cuda_10.1)

conda activate table_emb_dev
mkdir -p third_party
git clone -q https://github.com/NVIDIA/apex.git third_party/apex
cd third_party/apex
export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.5"
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" ./

setup 4: download model

https://drive.google.com/drive/folders/1vAMv09j-VlWHKd5djiRGuC16yb-lhJO0
mv freq.tar.gz mix.tar.gz tabbie/model

corrupt cell detection

conda activate table_emb_dev
python pretrain_pred.py  # default input data is "data/pretrain/sample.jsonl"

Input Jsonl file

  • id (str): identical value for table
  • faked_cells (2d list, row_id/col_id=0,1,2...): [[row_id1, col_id1, "corrupt_cell1"], [row_id2, col_id2, "corrupt_cell2"], ...]
  • faked_headers (2d list, col_id=0,1,2...): [[col_id1, "corrupt_header1"], [col_id2, "corrupt_header2"], ...]

finetuning tables with cell labels

conda activate table_emb_dev
cd tabbie
python train.py --train_csv_dir ./data/ft_cell/train_csv --train_label_path ./data/ft_cell/train_label.csv
python pred.py --test_csv_dir ./data/ft_cell/test_csv --model_path ./out_model/model.tar.gz

finetuning tables with column labels

conda activate table_emb_dev
cd tabbie
python train.py --train_csv_dir ./data/ft_col/train_csv --train_label_path ./data/ft_col/train_label.csv
python pred.py --test_csv_dir ./data/ft_col/test_csv --model_path ./out_model/model.tar.gz

finetuning tables with table labels

conda activate table_emb_dev
cd tabbie
python train.py --train_csv_dir ./data/ft_table/train_csv --train_label_path ./data/ft_table/train_label.csv
python pred.py --test_csv_dir ./data/ft_table/test_csv --model_path ./out_model/model.tar.gz

tabbie's People

Contributors

sfig611 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

tabbie's Issues

TaBERT Evaluation

Hi @SFIG611, thank you for open-sourcing this great work! I am curious to know what inputs you feed to TaBERT in the experiments. Do you modify TaBERT to only take tables without any surrounding text?

Error installing apex

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" ./

I am trying to run TABBIE on the Adobe Sensei Cloud Computing platform but I get this error at this part during environment setup.

/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/commands/install.py:229: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
cmdoptions.check_install_build_global(options)
Using pip 21.2.2 from /opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip (python 3.6)
Processing /sensei-fs/users/raushah/tabbie/third_party/apex
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
pip 21.3 will remove support for this functionality. You can find discussion regarding this at pypa/pip#7555.
ERROR: Could not install packages due to an OSError.
Traceback (most recent call last):
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 316, in run
reqs, check_supported_wheels=not options.target_dir
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 75, in resolve
collected = self.factory.collect_root_requirements(root_reqs)
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 473, in collect_root_requirements
requested_extras=(),
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 438, in _make_requirement_from_install_req
version=None,
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 209, in _make_candidate_from_link
version=version,
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 301, in init
version=version,
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in init
self.dist = self._prepare()
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
dist = self._prepare_distribution()
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 306, in _prepare_distribution
self._ireq, parallel_builds=True
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
return self._prepare_linked_requirement(req, parallel_builds)
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 552, in _prepare_linked_requirement
self.download_dir, hashes
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 230, in unpack_url
_copy_source_tree(link.file_path, location)
File "/opt/conda/envs/table_emb_dev/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 157, in _copy_source_tree
copy_function=_copy2_ignoring_special_files,
File "/opt/conda/envs/table_emb_dev/lib/python3.6/shutil.py", line 365, in copytree
raise Error(errors)
shutil.Error: [('/sensei-fs/users/raushah/tabbie/third_party/apex/.git/objects/pack/pack-794c949bd698b1de95f28322821c11d2b779fa42.pack', '/tmp/pip-req-build-1x5f6ltg/.git/objects/pack/pack-794c949bd698b1de95f28322821c11d2b779fa42.pack', "[Errno 13] Permission denied: '/tmp/pip-req-build-1x5f6ltg/.git/objects/pack/pack-794c949bd698b1de95f28322821c11d2b779fa42.pack'"), ('/sensei-fs/users/raushah/tabbie/third_party/apex/.git/objects/pack/pack-794c949bd698b1de95f28322821c11d2b779fa42.idx', '/tmp/pip-req-build-1x5f6ltg/.git/objects/pack/pack-794c949bd698b1de95f28322821c11d2b779fa42.idx', "[Errno 13] Permission denied: '/tmp/pip-req-build-1x5f6ltg/.git/objects/pack/pack-794c949bd698b1de95f28322821c11d2b779fa42.idx'")]
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/9f/8b/a094f5da22d7abf5098205367b3296dd15b914f4232af5ca39ba6214d08c/pip-22.0-py3-none-any.whl#sha256=6cb1ea2bd7fda0668e26ae8c3e45188f301a7ef17ff22efe1f70f3643e56a822 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/4a/ca/e72b3b399d7a8cb34311aa8f52924108591c013b09f0268820afb4cd96fb/pip-22.0.tar.gz#sha256=d3fa5c3e42b33de52bddce89de40268c9a263cd6ef7c94c40774808dafb32c82 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/89/a1/2f4e58eda11e591fbfa518233378835679fc5ab766b690b3df85215014d5/pip-22.0.1-py3-none-any.whl#sha256=30739ac5fb973cfa4399b0afff0523d4fe6bed2f7a5229333f64d9c2ce0d1933 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/63/71/5686e51f06fa59da55f7e81c3101844e57434a30f4a0d7456674d1459841/pip-22.0.1.tar.gz#sha256=7fd7a92f2fb1d2ac2ae8c72fb10b1e640560a0361ed4427453509e2bcc18605b (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/83/b5/df8640236faa5a3cb80bfafd68e9fb4b22578208b8398c032ccff803f9e0/pip-22.0.2-py3-none-any.whl#sha256=682eabc4716bfce606aca8dab488e9c7b58b0737e9001004eb858cdafcd8dbdd (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/d9/c1/146b24a7648fdf3f8b4dc6521ab0b26ac151ef903bac0b63a4e1450cb4d1/pip-22.0.2.tar.gz#sha256=27b4b70c34ec35f77947f777070d8331adbb1e444842e98e7150c288dc0caea4 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/6a/df/a6ef77a6574781a668791419ffe366c8acd1c3cf4709d210cb53cd5ce1c2/pip-22.0.3-py3-none-any.whl#sha256=c146f331f0805c77017c6bb9740cec4a49a0d4582d0c3cc8244b057f83eca359 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/88/d9/761f0b1e0551a3559afe4d34bd9bf68fc8de3292363b3775dda39b62ce84/pip-22.0.3.tar.gz#sha256=f29d589df8c8ab99c060e68ad294c4a9ed896624f6368c5349d70aa581b333d0 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/4d/16/0a14ca596f30316efd412a60bdfac02a7259bf8673d4d917dc60b9a21812/pip-22.0.4-py3-none-any.whl#sha256=c6aca0f2f081363f689f041d90dab2a07a9a07fb840284db2218117a52da800b (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/33/c9/e2164122d365d8f823213a53970fa3005eb16218edcfc56ca24cb6deba2b/pip-22.0.4.tar.gz#sha256=b3a9de2c6ef801e9247d1527a4b16f92f2cc141cd1489f3fffaf6a9e96729764 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/79/3a/d341ae105c8b49eac912bee40739d496ae80f9441efa7df6c68f4997bbc8/pip-22.1b1-py3-none-any.whl#sha256=09e9e8f8e10f2515134b59600ad3630219430eabb734336079cbc6ffb2e01a0e (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/a7/c0/794f22836ef3202a7ad61f0872278ee7ac62e8c7617e4c9a08f01b5e82da/pip-22.1b1.tar.gz#sha256=f54ab61985754b56c5589178cfd7dfca5ed9f98d5c8f2de2eecb29f1341200f1 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/f3/77/23152f90de45957b59591c34dcb39b78194eb67d088d4f8799e9aa9726c4/pip-22.1-py3-none-any.whl#sha256=802e797fb741be1c2d475533d4ea951957e4940091422bd4a24848a7ac95609d (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/99/bb/696e256f4f445809f25efd4e4ce42ff99664dc089cafa1e097d5fec7fc33/pip-22.1.tar.gz#sha256=2debf847016cfe643fa1512e2d781d3ca9e5c878ba0652583842d50cc2bcc605 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/9b/e6/aa8149e048eda381f2a433599be9b1f5e5e3a189636cd6cf9614aa2ff5be/pip-22.1.1-py3-none-any.whl#sha256=e7bcf0b2cbdec2af84cc1b7b79b25fdbd7228fbdb61a4dca0b82810d0ba9d18b (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/3e/0a/6125e67aa4d3245faeed476e4e26f190b5209f84f01efd733ac6372eb247/pip-22.1.1.tar.gz#sha256=8dfb15d8a1c3d3085a4cbe11f29e19527dfaf2ba99354326fd62cec013eaee81 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/96/2f/caec18213f6a67852f6997fb0673ae08d2e93d1b81573edb93ba4ef06970/pip-22.1.2-py3-none-any.whl#sha256=a3edacb89022ef5258bf61852728bf866632a394da837ca49eb4303635835f17 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/4b/b6/0fa7aa968a9fa4ef63a51b3ff0644e59f49dcd7235b3fd6cceb23f202e08/pip-22.1.2.tar.gz#sha256=6d55b27e10f506312894a87ccc59f280136bad9061719fac9101bdad5a6bce69 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)

Minimal working example to use pretrained embeddings

Hi,
Thanks for sharing your work!

I am trying to make use of your pretrained embeddings, but have a hard time finding the right pieces of code.
Especially, the use of allennlp makes it hard for me to debug your code to find the correct classes I am looking for, since I have never worked with allennlp.

Is there an easy way to:

  • import the model
  • load the pretrained model
  • load a simple csv file
  • get the embedding representations of a row from a csv file

I think something like a minimal working example on how to use the pretrained embeddings would help a lot of people who want to use tabbie.
I have tried to do this but I am really struggling....

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.