acl-anthology-helper

License: MIT

A tool to help search, filter, and download papers from the ACL Anthology (https://aclanthology.org/).

Main Features

  • Retrieve papers from the ACL Anthology.
    Retrieve directly from the ACL Anthology website.
    e.g. Retriever.acl(2021, ConfConsts.LONG)
    Download all papers' info to a local MySQL database.
    e.g.
    db = AnthologyMySQL(cache_enable=True)
    db.create_tables()
    db.load_data() # load data and put into database
  • Use ABuilder to support chained operations on MySQL.
    e.g.
    data = ABuilder().table('paper').where({"year": ["in", years_limit]}).where({"venue": ["in", venue_limit]}).query()
  • Filter papers by keyword.
    e.g. filtered = papers.filter('title', 'xxx') | papers.filter('abstract', 'xxx')
    e.g. filtered = papers.and_containing_filter(attr, [keyword1, keyword2])
  • Download papers.
    e.g. downloader.multi_download(filtered, download_path)
  • Local cache available.
  • Log available.
  • Statistics available (although I only count the total number of papers).
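The keyword filtering above can be illustrated with a small self-contained sketch. Note that the Paper and PaperList classes below are hypothetical stand-ins written only for this illustration; they are not the project's actual classes, and the real filter and and_containing_filter may behave differently (for example in case handling).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Paper:
    """Hypothetical paper record; the real project's fields may differ."""
    title: str
    abstract: str


class PaperList:
    """Hypothetical stand-in for the project's paper collection."""

    def __init__(self, papers: Iterable[Paper]):
        self.papers: List[Paper] = list(papers)

    def filter(self, attr: str, keyword: str) -> "PaperList":
        # Keep papers whose attribute contains the keyword (case-insensitive).
        key = keyword.lower()
        return PaperList(p for p in self.papers if key in getattr(p, attr).lower())

    def and_containing_filter(self, attr: str, keywords: List[str]) -> "PaperList":
        # Keep papers whose attribute contains every one of the keywords.
        keys = [k.lower() for k in keywords]
        return PaperList(
            p for p in self.papers
            if all(k in getattr(p, attr).lower() for k in keys)
        )

    def __or__(self, other: "PaperList") -> "PaperList":
        # Union of two results: keeps first-seen order and drops duplicates.
        seen = set()
        merged = []
        for p in self.papers + other.papers:
            if p not in seen:
                seen.add(p)
                merged.append(p)
        return PaperList(merged)


papers = PaperList([
    Paper("Neural text generation", "A survey of decoding methods."),
    Paper("Parsing with transformers", "We study text generation as a side task."),
    Paper("Speech recognition", "Acoustic models."),
])

# Match the keyword in either the title or the abstract.
filtered = papers.filter("title", "generation") | papers.filter("abstract", "generation")

# Require both keywords in the title.
both = papers.and_containing_filter("title", ["neural", "generation"])
```

Here `|` acts as a union of two filter results, so a paper that matches in both the title and the abstract appears only once.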

Get Started

  • First, MySQL is required (I use MySQL 8).
    Configure your MySQL database and add a src/configuration/mysql_cfg.py file.
    An example src/configuration/mysql_cfg.py is as follows:
class MySQLCFG(object):
    HOST = 'localhost'
    PORT = 3306
    USER = "root"
    PASSWORD = "xxx"
    DB = "xxx"

Also create the corresponding database (the one named in DB) on your MySQL server.
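As a rough sketch of how such a configuration class might be consumed, the helpers below turn it into connection keyword arguments and into the SQL statement that creates the database. The function names connection_kwargs and create_database_sql are hypothetical and not part of this project, and the keyword names (host, port, user, password, database) are an assumption; check them against the MySQL driver you actually use.

```python
class MySQLCFG(object):
    # Same shape as the example configuration; replace the placeholder values.
    HOST = 'localhost'
    PORT = 3306
    USER = "root"
    PASSWORD = "xxx"
    DB = "xxx"


def connection_kwargs(cfg, with_db=True):
    """Hypothetical helper: collect the settings into a dict of keyword arguments."""
    kwargs = {
        "host": cfg.HOST,
        "port": cfg.PORT,
        "user": cfg.USER,
        "password": cfg.PASSWORD,
    }
    if with_db:
        # Leave this out when connecting before the database exists.
        kwargs["database"] = cfg.DB
    return kwargs


def create_database_sql(cfg):
    """Hypothetical helper: build the statement that creates the configured database."""
    # Backticks quote the identifier; a backtick inside the name is doubled.
    name = cfg.DB.replace("`", "``")
    return f"CREATE DATABASE IF NOT EXISTS `{name}`"


print(connection_kwargs(MySQLCFG, with_db=False))
print(create_database_sql(MySQLCFG))
```

The statement can be run once from any MySQL client (or through a connection opened without the database argument) before loading data.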

  • Second, if you want to use ABuilder:
    Previously you needed to create a tasks/database.py with your MySQL configuration.
    You can refer to the homepage of ABuilder.

    In the latest version, tasks/database.py gets its info from the configuration, so you no longer need to create this file.

  • Third, download and decompress the code, open a terminal, and change to the project's root directory.
    Then run:
pip install -r requirements.txt
cd tasks
python basic_task.py

Running basic_task.py first downloads all papers within a certain time span from the ACL Anthology to the local disk, and then searches the papers by the keywords given as input.

Note

1. Comments

I developed this project with Python 3.6; it does not support Python 2.

2023.6.14: The code was updated to support the latest ACL Anthology pages. The current Python version is 3.10.

2023.7.2: Updated the README.

2. A survey paper was written with this tool

@article{tang2022recent,
  title={Recent advances in neural text generation: A task-agnostic survey},
  author={Tang, Chen and Guerin, Frank and Li, Yucheng and Lin, Chenghua},
  journal={arXiv preprint arXiv:2203.03047},
  year={2022}
}

3. Others

The ACL Anthology homepage lists many conferences and the contents belonging to them.

Choose one to see its list of papers.

Contributors

tangg555
