Speech2Vec Reality Check

Code for "Homophone Reveals the Truth: A Reality Check for Speech2Vec"
A brief version of this report has been accepted by ICASSP-2023: "A Reality Check and A Practical Baseline for Semantic Speech Embedding"

Requirements

  • Free GPU RAM >= 3GB
  • Free System RAM >= 30GB
  • PyTorch version >= 1.8.2

Data

Download the dataset.zip (2.4GB) from Google Drive. Unzip it to the project root. Its structure is shown below:

dataset/
├── info
│   ├── 500h_word2wav_keys.pkl
│   ├── 500h_word_counter.pkl
│   ├── 500h_word_split.pkl
│   └── eval
│       ├── all_words_5846.pkl
│       ├── files
│       │   ├── EN-MC-30.txt
│       │   ├── EN-MEN-TR-3k.txt
│       │   ├── EN-MTurk-287.txt
│       │   ├── EN-MTurk-771.txt
│       │   ├── EN-RG-65.txt
│       │   ├── EN-RW-STANFORD.txt
│       │   ├── EN-SIMLEX-999.txt
│       │   ├── EN-SimVerb-3500.txt
│       │   ├── EN-VERB-143.txt
│       │   ├── EN-WS-353-ALL.txt
│       │   ├── EN-WS-353-REL.txt
│       │   ├── EN-WS-353-SIM.txt
│       │   └── EN-YP-130.txt
│       └── homophone.txt
├── split_mfcc_dict.pkl
└── split_mfcc_mean_std.pkl
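
After unzipping, it can be useful to confirm that nothing is missing before starting a multi-day training run. The following is a minimal sketch that checks for the files in the tree above. The helper name `missing_dataset_files` and the `root` argument are hypothetical and not part of the repository; only the relative paths come from the listing.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "info/500h_word2wav_keys.pkl",
    "info/500h_word_counter.pkl",
    "info/500h_word_split.pkl",
    "info/eval/all_words_5846.pkl",
    "info/eval/homophone.txt",
    "split_mfcc_dict.pkl",
    "split_mfcc_mean_std.pkl",
]

# Word-similarity benchmark files listed under info/eval/files.
EVAL_FILES = [
    "EN-MC-30.txt",
    "EN-MEN-TR-3k.txt",
    "EN-MTurk-287.txt",
    "EN-MTurk-771.txt",
    "EN-RG-65.txt",
    "EN-RW-STANFORD.txt",
    "EN-SIMLEX-999.txt",
    "EN-SimVerb-3500.txt",
    "EN-VERB-143.txt",
    "EN-WS-353-ALL.txt",
    "EN-WS-353-REL.txt",
    "EN-WS-353-SIM.txt",
    "EN-YP-130.txt",
]


def missing_dataset_files(root="dataset"):
    """Return the expected relative paths that do not exist under `root`.

    Hypothetical helper: it only checks that the files are present,
    not that their contents are valid.
    """
    root = Path(root)
    expected = EXPECTED_FILES + [f"info/eval/files/{name}" for name in EVAL_FILES]
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_dataset_files()` from the project root returns an empty list when the archive was extracted completely.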

We also provided some speech sentence segment exmaples in SentenceSegmentExamples.zip.

This instruction describes how we generated those files.

Training

Just run python 1_train.py

The full training process (500 epochs) takes 8.4 days on an AMD 3900XT + RTX3090 machine.
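
As a rough guide for shorter runs, the reported figure works out to about 24 minutes per epoch. The sketch below does that arithmetic. It assumes the time per epoch is constant and that the hardware is comparable, neither of which the report confirms. The function name `estimate_days` is hypothetical.

```python
# Figures reported above: 500 epochs in 8.4 days.
REPORTED_DAYS = 8.4
REPORTED_EPOCHS = 500

# Average wall-clock minutes per epoch on the reported machine.
MINUTES_PER_EPOCH = REPORTED_DAYS * 24 * 60 / REPORTED_EPOCHS  # about 24.2


def estimate_days(epochs):
    """Linear estimate of training time in days for `epochs` epochs.

    Assumption: every epoch takes the same time as the reported average.
    """
    return epochs * MINUTES_PER_EPOCH / (24 * 60)
```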


Use wandb to view the training process:

  1. Create .wb_config.json file in the project root, using the following content:

    {
      "WB_KEY": "Your wandb auth key"
    }
    
  2. Add --dryrun=False to the training command, for example: python 1_train.py --dryrun=False
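
The config file from step 1 can also be generated from a script. This is a minimal sketch: the helper name `write_wandb_config` and its arguments are hypothetical, and only the file name `.wb_config.json` and the `WB_KEY` field come from the steps above.

```python
import json
from pathlib import Path


def write_wandb_config(auth_key, project_root="."):
    """Write .wb_config.json with the WB_KEY field into `project_root`.

    Hypothetical helper; `auth_key` is your own wandb auth key.
    Returns the path of the file that was written.
    """
    config_path = Path(project_root) / ".wb_config.json"
    config_path.write_text(json.dumps({"WB_KEY": auth_key}, indent=2) + "\n")
    return config_path


# Example (run from the project root):
# write_wandb_config("Your wandb auth key")
```

Because the file holds a secret, it is worth keeping it out of version control.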

CheckPoints & Embeddings

The checkpoints and embeddings of every epoch are in Full500EpochModelsEmbedings.zip (2.9GB)

The Rand Init model corresponds to: epoch-01_ws0.10_men0.08_loss-1.000000.pkl

The 500-Epoch model corresponds to: epoch499_ws0.15_men0.08_loss0.247943.pkl
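
The checkpoint file names encode four numeric fields. The sketch below extracts them with a regular expression inferred only from the two names shown above, so other files in the archive may not match it. Reading `ws` and `men` as scores on the WS-353 and MEN benchmarks is an assumption based on the evaluation file names, and `parse_checkpoint_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the two file names above (an assumption, not a spec).
CKPT_PATTERN = re.compile(
    r"^epoch(?P<epoch>-?\d+)"
    r"_ws(?P<ws>-?\d+\.\d+)"
    r"_men(?P<men>-?\d+\.\d+)"
    r"_loss(?P<loss>-?\d+\.\d+)\.pkl$"
)


def parse_checkpoint_name(filename):
    """Split a checkpoint file name into its epoch, ws, men and loss fields."""
    match = CKPT_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {filename!r}")
    return {
        "epoch": int(match["epoch"]),  # "-01" parses to -1 (the Rand Init model)
        "ws": float(match["ws"]),
        "men": float(match["men"]),
        "loss": float(match["loss"]),
    }
```

For the 500-Epoch model, `parse_checkpoint_name("epoch499_ws0.15_men0.08_loss0.247943.pkl")` gives epoch 499 and loss 0.247943.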

Contact

Feel free to contact me if you have any questions:
Email: [email protected]
