
Online Semantic Parsing

License: MIT | Paper

Online semantic parsing for latency reduction in task-oriented dialogue.

This repository sets up the online semantic parsing (OSP) task, formulates the Final Latency Reduction (FLR) evaluation metric, proposes a graph-based program prediction model, and develops general frameworks for parsing and executing utterances in an online fashion, exploring the trade-off between latency reduction and program accuracy, as described in the paper Online Semantic Parsing for Latency Reduction in Task-Oriented Dialogue (ACL 2022). If you use any source code or data included in this repository in your work, please cite the following paper:

@inproceedings{zhou-etal-2022-online,
    title = "Online Semantic Parsing for Latency Reduction in Task-Oriented Dialogue",
    author = "Zhou, Jiawei  and
      Eisner, Jason  and
      Newman, Michael  and
      Platanios, Emmanouil Antonios  and
      Thomson, Sam",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.110",
    doi = "10.18653/v1/2022.acl-long.110",
    pages = "1554--1576",
    abstract = "Standard conversational semantic parsing maps a complete user utterance into an executable program, after which the program is executed to respond to the user. This could be slow when the program contains expensive function calls. We investigate the opportunity to reduce latency by predicting and executing function calls while the user is still speaking. We introduce the task of online semantic parsing for this purpose, with a formal latency reduction metric inspired by simultaneous machine translation. We propose a general framework with first a learned prefix-to-program prediction module, and then a simple yet effective thresholding heuristic for subprogram selection for early execution. Experiments on the SMCalFlow and TreeDST datasets show our approach achieves large latency reduction with good parsing quality, with a 30{\%}{--}65{\%} latency reduction depending on function execution time and allowed cost.",
}

Getting Started

Install

Assuming Conda is the default environment management tool, run

bash install_with_conda.sh

This will set up a local conda environment in the current directory named ./cenv, with all necessary packages installed. Activate the environment with conda activate ./cenv or source activate ./cenv when you need to run the code interactively.

Code Structure

  • configs*/: configuration files for data processing, model architectures, and training setups
  • run/: scripts to build offline full-to-graph models for program/graph generation
  • run_prefix/: scripts to run experiments for prefix-to-graph model for online semantic parsing
  • run_utter_dlm/: scripts to run experiments for LM-complete pipeline: de-noising language model + full-to-graph
  • src/: core code for data structures, parsing state machine, and the action-pointer Transformer model architectures

The core program/graph generation models are developed and extended from the Action-Pointer Transformer (APT), a transition-based parser; our PyTorch implementation is formulated as an extension module to the Fairseq library.
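For orientation, here is a minimal sketch of how such an extension module typically hooks into Fairseq; the model and architecture names below are hypothetical placeholders, not the actual identifiers used in src/:

# Hypothetical sketch of a Fairseq extension module; see src/ for the real model.
from fairseq.models import register_model, register_model_architecture
from fairseq.models.transformer import TransformerModel, base_architecture

@register_model("action_pointer_transformer")  # placeholder registry name
class ActionPointerTransformer(TransformerModel):
    """A Transformer extended with pointer attention into the action history."""
    # ... the pointer mechanism over transition actions would be defined here ...

@register_model_architecture("action_pointer_transformer", "apt_base")
def apt_base(args):
    base_architecture(args)  # fall back to the standard Transformer defaults

Fairseq discovers such modules at runtime (e.g., through its --user-dir mechanism), which is how custom models become available to training without modifying the library itself.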

Data Download

The default root directories:

  • for all raw/processed data -> ../DATA (accessed through $DATADIR)
  • for all saved models/results -> ../SAVE (accessed through $SAVEDIR).

These are set (or can be changed) in set_default_dirs.sh.

Download the SMCalFlow2.0 data and TreeDST data from here, and decompress them in $DATADIR/smcalflow2.0/ and $DATADIR/treedst/, respectively.

For quick testing of a small sample dataset, run bash scripts/create_sample_dataset.sh. This will create a sample dataset of 100 training examples and 50 validation examples from SMCalFlow2.0 data.

Running Models & Evaluation

All of the commands take the following format

# run interactively, output logs in the console
bash run*/xxx.sh configs*/config_*/config_xxx.sh [seed, default=42]

or

# send the whole process to the background
bash run*/jbash_xxx.sh configs*/config_*/config_xxx.sh [seed] [gpu_id]

where all data/model/optimization/decoding setups are passed through a config_xxx.sh configuration file, and all detailed logs are saved to files.

Offline Parsing: Train a Program/Graph Prediction Model


Run the following script

# GPU device must be specified if you are working on a multi-GPU machine
CUDA_VISIBLE_DEVICES=0 bash run/run_model_gap.sh configs/config_gap_data_order-top_src-ct1-npwa_tgt-cp-str_roberta-large-top24_ep50/config_model_gap_ptr-lay6-h1_lr0.0005-bsz4096x4-wm4k-dp0.2.sh

This will run data processing (use run/run_data.sh to process the data alone), train the model, and decode for evaluation.

If you want to send the whole process to the background and specify the GPU device, run

# random seed 42, and GPU ID 1 for example
bash run/jbash_run_model_gap.sh configs/config_gap_data_order-top_src-ct1-npwa_tgt-cp-str_roberta-large-top24_ep50/config_model_gap_ptr-lay6-h1_lr0.0005-bsz4096x4-wm4k-dp0.2.sh 42 1

The configuration files in configs/ default to SMCalFlow2.0, whereas the configurations for TreeDST are in configs_treedst/.

Note: if you are working without an Internet connection and run into the following Fairseq error,

requests.exceptions.ProxyError: HTTPSConnectionPool(host='dl.fbaipublicfiles.com', port=443): Max retries exceeded with url: /fairseq/gpt2_bpe/encoder.json

but you have previously put the necessary files in the default cache directory, check the fix here.

Trained Checkpoints

We provide trained offline parsing models for the SMCalFlow2.0 and TreeDST data with top-down generation order, evaluated with beam-1 decoding:

Dataset      | Valid Exact Match | Model
------------ | ----------------- | --------
SMCalFlow2.0 | 80.7              | Download
TreeDST      | 90.8              | Download

Online Parsing: Train a Prefix-to-Graph Model


Run the following script

# GPU device must be specified if you are working on a multi-GPU machine
CUDA_VISIBLE_DEVICES=0 bash run_prefix/run_model_gap_prefix.sh configs/config_gap_data_prefix-last-8ps_order-top_src-ct1-npwa_tgt-cp-str_roberta-large-top24/config_model_gap_ptr-lay6-h1_lr0.0001-bsz4096x4-wm4k-dp0.2.sh

This will create training data consisting of utterance prefixes at the 8 largest relative lengths (all >= 30% of the utterance), paired with the full programs as targets in which entities not yet seen in the prefix are masked (use run_prefix/run_data_xxx.sh for the prefix data processing alone); it then trains the model and evaluates on all prefix lengths using the validation set.
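As a rough illustration of the prefix construction, here is a minimal sketch; the 10%-step grid from 30% to 100% is an assumption inferred from the description above, not necessarily the exact grid used in the scripts:

import math

def utterance_prefixes(tokens, ratios=(0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    # Yield deduplicated token prefixes at the last 8 relative lengths.
    seen = set()
    for r in ratios:
        k = max(1, math.ceil(r * len(tokens)))
        if k not in seen:  # short utterances collapse to fewer distinct prefixes
            seen.add(k)
            yield r, tokens[:k]

# Each prefix would be paired with the full target program, with entities not
# yet observable from the prefix replaced by mask tokens.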

If you want to send the whole process to the background, use run_prefix/jbash_run_model_gap_prefix.sh in the same way as for the offline models. For the TreeDST dataset, run with the configs_treedst/ configurations.

Note: if you are running the online parser directly from scratch without first running the offline parser, you need to process the offline data first by running the following script with the corresponding data configuration file:

# taking the SMCalFlow data for top-down generation as an example
CUDA_VISIBLE_DEVICES=0 bash run/run_data.sh configs/config_data/config_data_order-top_src-ct1-npwa_tgt-cp-str_roberta-large-top24.sh

Trained Checkpoints

We provide trained prefix-to-graph models from the above configurations:

Dataset      | Online Parsing Model
------------ | --------------------
SMCalFlow2.0 | Download
TreeDST      | Download

Online Parsing: Utterance LM-Complete Model from BART


Run the following script

# GPU device must be specified if you are working on a multi-GPU machine
CUDA_VISIBLE_DEVICES=0 bash run_utter_dlm/run_model_utter-dlm.sh configs/config_utter-dlm_src-ct1-npwa/config_dlm_bart-large_data_utter_abs-prefix-all_lr0.00003-scheduler-is-bsz2048x4-wm500-dp0.1-cn0.1_ep1.sh

For the TreeDST dataset, run with the configs_treedst/ configurations.

Evaluation: Final Latency Reduction (FLR)


Taking the prefix-to-graph model as an example, suppose we want the following execution simulation settings:

  • constant utterance token speaking time
  • time measured by the number of source tokens
  • iterate over different constant execution times
  • for each execution time, iterate over different subgraph selection thresholds to get the FLR–cost trade-off curve (a simplified simulation is sketched below)

Run the following command (assuming the prefix-to-graph model decoding results are in place)

bash run_prefix/eval_valid-prefix-allt_latency.sh
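For intuition, here is a heavily simplified, illustrative simulation of the FLR computation under the settings above; the repo's actual evaluation additionally handles subgraph dependencies, prediction correctness, and the allowed execution cost, all of which this sketch omits:

def final_latency(start_times, exec_time, utterance_end):
    # Time the user waits after finishing speaking, assuming calls run in parallel.
    finish = max(t + exec_time for t in start_times)
    return max(0, finish - utterance_end)

def flr(early_start_times, num_calls, exec_time, utterance_end):
    # Offline baseline: every call starts only once the utterance is complete.
    baseline = final_latency([utterance_end] * num_calls, exec_time, utterance_end)
    online = final_latency(early_start_times, exec_time, utterance_end)
    return baseline - online

# Example: a 10-token utterance (one time unit per token) and 3 calls of 5 units
# each, all launched early once their subgraphs crossed the selection threshold.
print(flr(early_start_times=[4, 6, 7], num_calls=3, exec_time=5, utterance_end=10))
# -> 3: the last call finishes at time 12 instead of 15.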

For other utterance speaking-time and execution timing models, run the analogous scripts:

  • bash run_prefix/eval_valid-prefix-allt_latency_wr-char-linear.sh: time measured in milliseconds, with word speaking time modeled as linear in character length
  • bash run_prefix/eval_valid-prefix-allt_latency_wr-real-voice.sh: time measured in milliseconds, with word speaking times fit from real-voice ASR data (both timing models are sketched below)
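The two word speaking-time models correspond roughly to the sketches below; the coefficients and lookup table are placeholders, not the values actually fit in the scripts:

def char_linear_ms(word, intercept_ms=100.0, ms_per_char=40.0):
    # wr-char-linear: speaking time grows linearly with the word's character length.
    return intercept_ms + ms_per_char * len(word)

def real_voice_ms(word, fitted_durations):
    # wr-real-voice: per-word durations fit from real-voice ASR data;
    # `fitted_durations` stands in for the fitted timing table.
    return fitted_durations.get(word, char_linear_ms(word))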

For running FLR evaluation with different models and/or on different datasets, check run_prefix/jbash_eval_latency_many*.sh.

For running the FLR evaluation with the LM-Complete + Full-to-Graph model, run the similarly named scripts in run_utter_dlm/.
