Arxiv Assistant

Automatically fetch daily arXiv papers, filter them with GPT, and email you the results.

The program checks for new papers every 6 hours, uses GPT to select the papers related to your keywords, and emails them to you. It also saves the results as JSON files under papers/.

Quick Start

  1. Ensure that your network can reach the arXiv API and the OpenAI API.

  2. Install packages

    pip install openai arxiv markdown2
  3. Save your OpenAI API key in openai_key.txt. If you don't want to use the GPT filter or don't have an OpenAI API key, set gpt_filter=False when initializing ArxivAssistant.

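  4. Enable SMTP for your email account (Instructions) and save the connection details in mail_info.json. An example (the // comments are for explanation only; remove them in the real file, because JSON does not allow comments):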

    {
        "mail_host": "smtp.qq.com", // SMTP host
        "mail_user": "[email protected]", // your email address
        "mail_pass": "xxxxxxxx" // e.g. the SMTP authorization code
    }
  5. Run the routine. The program must keep running to check for new papers, so start it in a persistent session, e.g. with tmux on a server.

    import openai
    import json
    
    from assistant import ArxivAssistant
    
    with open("openai_key.txt") as f:
        # strip() drops the trailing newline that most editors add to the file
        openai.api_key = f.read().strip()
        
    with open("mail_info.json") as f:
        mail_info = json.load(f)
    
    assistant = ArxivAssistant(
        mail_host=mail_info["mail_host"],
        mail_user=mail_info["mail_user"],
        mail_pass=mail_info["mail_pass"],
        
        categories=['cs.CV', 'cs.CL', 'cs.LG', 'cs.AI'], # arXiv categories you are interested in. See https://arxiv.org/category_taxonomy
        keywords=['large language model', 'LLM'], # keywords describing your research interest
        negative_keywords=['medical'] # (Optional) keywords describing papers you don't want to read
    )
    
    assistant.run_routine()
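
Before starting the routine, it can help to check that mail_info.json parses and contains the three fields used above. The following is a minimal sketch, not part of this repository: the helper name load_mail_info is hypothetical, and it assumes the file is plain JSON without // comments.

```python
import json

# The three fields read from mail_info.json in the Quick Start script.
REQUIRED_KEYS = ("mail_host", "mail_user", "mail_pass")


def load_mail_info(path="mail_info.json"):
    """Load the SMTP settings and fail early if a field is missing or empty.

    Hypothetical helper for illustration; not part of arxiv-assistant.
    """
    with open(path, encoding="utf-8") as f:
        # json.load raises json.JSONDecodeError on invalid JSON,
        # including files that still contain // comments.
        info = json.load(f)

    if not isinstance(info, dict):
        raise ValueError(f"{path} must contain a JSON object")

    missing = [key for key in REQUIRED_KEYS if not info.get(key)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return info
```

With this helper, the `json.load` call in the script above could be swapped for `mail_info = load_mail_info()`, so a malformed file is reported at startup rather than when the first email is sent.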

Customize

  1. Configure the number of papers:

    1. max_results_per_category: If a category has more papers than this on a single day, only the first max_results_per_category papers are kept. Defaults to 500.

    2. max_papers_per_query: To avoid exceeding the context length of GPT, the papers are split into groups of at most this many papers, and each group is sent as a separate query. Defaults to 50.

    3. num_filtered_papers: The maximum number of output papers for each group. Defaults to 10.

  2. Configure routine interval: Set routine_interval_hours. Defaults to 6.

Note: arXiv publishes new papers at 20:00 EST every Sunday to Thursday. When the interval is less than 24 hours, the routine succeeds only once a day. When the interval is more than 24 hours, only the last publish date (yesterday / last Thursday) is considered.
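
The "last publish date" rule in the note can be sketched as follows. This is an illustration under stated assumptions, not the code used in assistant.py: the function name last_publish_date is hypothetical, the input is assumed to be a naive datetime already expressed in US Eastern time, and arXiv holidays are ignored.

```python
from datetime import date, datetime, timedelta

ANNOUNCE_HOUR = 20                   # 20:00 Eastern time, as stated in the note
ANNOUNCE_WEEKDAYS = {6, 0, 1, 2, 3}  # Sunday to Thursday (Monday == 0)


def last_publish_date(now_eastern: datetime) -> date:
    """Return the date of the most recent announcement at or before now_eastern."""
    day = now_eastern.date()
    # Before 20:00, today's batch is not out yet, so start from yesterday.
    if now_eastern.hour < ANNOUNCE_HOUR:
        day -= timedelta(days=1)
    # Step back over Friday and Saturday, which have no announcement.
    while day.weekday() not in ANNOUNCE_WEEKDAYS:
        day -= timedelta(days=1)
    return day


# Saturday at noon falls back to the previous Thursday.
print(last_publish_date(datetime(2023, 12, 23, 12, 0)))  # 2023-12-21
```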

  3. Configure GPT:

    1. temperature: sampling temperature of the GPT output. Defaults to 0.7.
    2. gpt_model: GPT model to use. Check its context length and adjust max_papers_per_query accordingly. Defaults to gpt-3.5-turbo-16k.
  4. Change email receivers: mail_receivers is a list of receivers' email addresses. Defaults to the mail sender's address.

  5. Customize the GPT prompt and the email content: Update the strings in prompts.py and single_paper_info in assistant.py. The email content is written in Markdown.
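
The interplay between max_papers_per_query and num_filtered_papers described above can be sketched as follows. The function names are hypothetical and the code only illustrates the grouping arithmetic; it is not the implementation in assistant.py.

```python
import math


def split_into_groups(papers, max_papers_per_query=50):
    """Split papers into consecutive groups of at most max_papers_per_query."""
    if max_papers_per_query < 1:
        raise ValueError("max_papers_per_query must be at least 1")
    return [
        papers[i:i + max_papers_per_query]
        for i in range(0, len(papers), max_papers_per_query)
    ]


def max_output_papers(num_papers, max_papers_per_query=50, num_filtered_papers=10):
    """Upper bound on the papers kept when each group yields at most num_filtered_papers."""
    num_groups = math.ceil(num_papers / max_papers_per_query)
    return num_groups * num_filtered_papers


papers = [f"paper-{i}" for i in range(120)]
groups = split_into_groups(papers)
print([len(group) for group in groups])  # [50, 50, 20]
print(max_output_papers(len(papers)))    # 30
```

With the default values, a day with 120 candidate papers results in three GPT queries and at most 30 papers in the email. A model with a shorter context length would need a smaller max_papers_per_query, which increases the number of queries.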

Acknowledgment

This repository is partially built on wbs2788/Arxiv-Daily.

Contributors

cpsxhao, wzk1015
