xmos / fwk_voice


Voice Framework

License: Other

CMake 5.21% C 50.77% Shell 0.35% Python 37.34% XC 0.34% Jupyter Notebook 1.21% C++ 4.78%

fwk_voice's People

shuchitak kevinyeungxmos uvvpavel atuxhe keithauxmos lin20121221 allan-xmos mbanth changxuding aflyingwolf openfnord jiuliguan jerrymccarthy brennangit ed-xmos lucianomartin

fwk_voice's Issues

Extmem is not even initialized when run from a flashed image on hardware

The use case for Avona is to use ddr at runtime, without data required to be loaded via the bootloader.

This is logged as a XTC bug: http://10.0.102.172/show_bug.cgi?id=18540

Implement GPIO interface

4 General Purpose Output pins. These can be configured as simple digital I/O pins, Pulse Width Modulated (PWM) outputs and rate adjustable LED flashers.
4 General Purpose Input pins. These can be used as simple logic inputs or event capture (edge detection).

AGC passing tests

Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Automatic Gain Controller. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.

600MHz core clock speed configuration is not reliable

Currently blocked waiting on fix in lib_xud

250k model no longer fits in SRAM

This branch from a commit on July 19 is the last one where the 250k model (barely) fits in SRAM. The very next commit (489ffe2) added support for the model being stored in the filesystem. As part of this change, several buffers were increased in size.

model runner stack went from 650 words to 2500 words
decode buffer size went from 30000 bytes to 35000 bytes
RTOS heap went from 120 * 1024 bytes to 256 * 1024 bytes

The heap is the main culprit, increasing by 139264 bytes (136 * 1024). Also, perhaps the decode buffer can return to 30000 bytes. Need to investigate.
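
The size changes listed above can be totalled with a quick calculation. This is only a sketch: it assumes a 4-byte stack word, and the dictionary keys are illustrative names, not identifiers from the repository.

```python
# Tally the SRAM growth from the three buffer changes listed above.
WORD_BYTES = 4  # assumption: one stack word is 4 bytes

changes = {
    # illustrative name: (old size in bytes, new size in bytes)
    "model_runner_stack": (650 * WORD_BYTES, 2500 * WORD_BYTES),
    "decode_buffer": (30000, 35000),
    "rtos_heap": (120 * 1024, 256 * 1024),
}

# Growth of each buffer and the combined growth
deltas = {name: new - old for name, (old, new) in changes.items()}
total = sum(deltas.values())

for name, delta in deltas.items():
    print(f"{name}: +{delta} bytes")
print(f"total: +{total} bytes")
```

Comparing the total with the margin shown in the build report would indicate how much of the increase needs to be clawed back for the 250k model to fit again.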

IC functionally complete

Re-design and implement the Interference Canceller to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked IC functions operate as intended.

ADEC functionally complete

Re-design and implement the Automatic Delay Estimation Controller and surrounding components to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked ADEC functions operate as intended.

Peripheral device configuration template functions

TODO: Need more definition

AGC documentation complete

Provide documentation that describes:

What the AGC does from a user's perspective,
Its mode(s) of operation, and
The purpose, parameters, return values and any constraints for each function in the component's API.

General component documentation shall use ReStructured Text. Function documentation shall use Doxygen.

Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.

Setup continuous integration build

Checkout latest SDK
Build a couple app configs and archive firmware
Build docs and archive

Add ability to wakeup host MCU when a wakeword is detected

Wakeup signal will be a GPO pin.

We will need to buffer N seconds of audio, including M milliseconds before the wakeword. We do not have a specification for N but we know M must be at least 500 milliseconds.

N & M may be configurable and we either have enough SRAM or not. The build report will tell us.
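
A rough way to see what N and M cost in SRAM is sketched below. The 16 kSPS rate, single channel and 4-byte samples are assumptions chosen for illustration, and `buffer_bytes` and `PreRollBuffer` are hypothetical names, not part of the firmware.

```python
from collections import deque


def buffer_bytes(seconds, sample_rate_hz=16000, channels=1, bytes_per_sample=4):
    """Bytes needed to hold `seconds` of audio (all defaults are assumptions)."""
    return int(seconds * sample_rate_hz) * channels * bytes_per_sample


class PreRollBuffer:
    """Ring buffer that keeps only the most recent `pre_roll_ms` of samples."""

    def __init__(self, pre_roll_ms, sample_rate_hz=16000):
        capacity = pre_roll_ms * sample_rate_hz // 1000
        self.samples = deque(maxlen=capacity)  # oldest samples drop off automatically

    def push(self, frame):
        """Append one frame (an iterable of samples)."""
        self.samples.extend(frame)

    def snapshot(self):
        """Return the buffered pre-roll audio, oldest sample first."""
        return list(self.samples)


# Cost of the minimum pre-roll (M = 500 ms) under the assumptions above
print(buffer_bytes(0.5), "bytes for 500 ms of pre-roll")
```

On a wakeword event, the snapshot would supply the M milliseconds preceding the detection, and the remainder of the N seconds would be captured going forward.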

Incorrect path to WW models in filesystem

In the file applications/avona/filesystem_support/create_fs.bat the 250k and 50k files need to go into a "ww" folder. So, these lines:

    cp "%WW_PATH%\models\common\WR_250k.en-US.alexa.bin" %temp%\fatmktmp\250kenUS.bin
    cp "%WW_PATH%\models\common\WS_50k.en-US.alexa.bin" %temp%\fatmktmp\50kenUS.bin

likely need to be changed to:

    cp "%WW_PATH%\models\common\WR_250k.en-US.alexa.bin" %temp%\fatmktmp\ww\250kenUS.bin
    cp "%WW_PATH%\models\common\WS_50k.en-US.alexa.bin" %temp%\fatmktmp\ww\50kenUS.bin

In addition, we no longer want to skip the creation of fat.fs when it exists already as this can hide issues.
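
The same staging step could be written in a platform-neutral way, which may also help with Windows support. The sketch below is an assumption-laden illustration: `stage_ww_models` is a hypothetical helper, and it covers only the copy into the "ww" folder and the clean rebuild of the staging directory, not the creation of the fat.fs image itself.

```python
import shutil
from pathlib import Path

# Source model file name -> name used inside the filesystem's "ww" folder
MODEL_FILES = {
    "WR_250k.en-US.alexa.bin": "250kenUS.bin",
    "WS_50k.en-US.alexa.bin": "50kenUS.bin",
}


def stage_ww_models(ww_path, staging_dir):
    """Copy both wakeword models into <staging_dir>/ww, starting from a clean directory."""
    staging_dir = Path(staging_dir)
    # Always rebuild from scratch so stale contents cannot hide problems
    if staging_dir.exists():
        shutil.rmtree(staging_dir)
    ww_dir = staging_dir / "ww"
    ww_dir.mkdir(parents=True)

    models_dir = Path(ww_path) / "models" / "common"
    for src_name, dst_name in MODEL_FILES.items():
        shutil.copyfile(models_dir / src_name, ww_dir / dst_name)
    return ww_dir
```

A missing model file raises `FileNotFoundError` from `shutil.copyfile`, so a bad `ww_path` fails loudly instead of producing an incomplete filesystem.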

Ensure version of the SDK being used is valid

When building the applications, it is possible to verify that the SDK version being used is the correct one by checking the https://github.com/xmos/xcore_sdk/blob/develop/settings.json file in the SDK.
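
One way such a check might look is sketched below. The layout of settings.json is not described here, so the `version` key is an assumption, and `read_sdk_version` and `require_sdk_version` are hypothetical helper names.

```python
import json


def read_sdk_version(settings_path, key="version"):
    """Return the version string stored in the SDK's settings.json.

    The key name is an assumption; adjust it to match the real file.
    """
    with open(settings_path, encoding="utf-8") as f:
        settings = json.load(f)
    if key not in settings:
        raise KeyError(f"{settings_path} has no '{key}' entry")
    return settings[key]


def require_sdk_version(settings_path, expected, key="version"):
    """Raise RuntimeError unless the SDK version matches `expected` exactly."""
    found = read_sdk_version(settings_path, key)
    if found != expected:
        raise RuntimeError(
            f"SDK version {found!r} does not match required {expected!r}"
        )
    return found
```

Calling `require_sdk_version` at the start of a build script would stop the build early when the checked-out SDK is not the expected one.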

Wakeword functional test fails

Current 250k model detects 6 or 7 out of 10 wakewords when the reference (x86 and xs3) detects 9/10.

Implement I2C device control interface

Keyword spotter proposal

Propose list of commands
Write scripts to extract and pre-process data

AEC passing tests

Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Acoustic Echo Canceller. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.

NS documentation complete

Provide documentation that describes:

What the NS does from a user's perspective,
Its mode(s) of operation, and
The purpose, parameters, return values and any constraints for each function in the component's API.

General component documentation shall use ReStructured Text. Function documentation shall use Doxygen.

Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.

Simplify CMake

When writing the CMakeLists.txt for the lib_agc module, I simplified it to the minimal set of commands so it is much cleaner than the equivalent in lib_aec. Throughout sw_avona, we have been copying CMakeLists.txt files with a lot of boiler-plate code and unnecessary repetition. This can all be simplified by removing unnecessary commands and setting properties and variables in a suitable hierarchy to avoid repetition (eg. the executable suffix ".xe" should be set at a high level in the repo, not set individually in every CMakeLists.txt that produces such an executable).

VAD documentation complete

Provide documentation that describes:

What the VAD does from a user's perspective,
Its mode(s) of operation, and
The purpose, parameters, return values and any constraints for each function in the component's API.

General component documentation shall use ReStructured Text. Function documentation shall use Doxygen.

Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.

AGC functionality complete

Re-design and implement the Automatic Gain Controller to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked AGC functions operate as intended.

NS passing tests

Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Noise Suppressor. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.

Implement USB audio test interface

Support the ability to route audio into the start of the pipeline from host.

Output the following synchronized audio signals:

Processed audio (ASR & comms)
Stereo reference audio
2x microphones

Host FAT filesystem generation scripts/applications

Add support for packed, 6 channel debug audio output over I2S

This includes host side app/scripts to unpack into a multi-channel wav file.

Implement USB audio interface

Reference audio in
Processed audio out
Adaptive mode

IC documentation complete

Provide documentation that describes:

What the IC does from a user's perspective,
Its mode(s) of operation, and
The purpose, parameters, return values and any constraints for each function in the component's API.

General component documentation shall use ReStructured Text. Function documentation shall use Doxygen.

Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.

Implement SPI host interface mode

Consideration of an additional host interface option (SPI). Note this is a non-real-time interface, but may allow a more standard system level architecture.

Some open issues related to ADEC testing

While testing sw_avona adec module using the ADEC tests ported from lib_audio_pipelines (https://github.com/xmos/lib_audio_pipelines/tree/develop/tests/test_delay_estimator_controller), I found some issues that I've described here.

I have 3 failing tests on sw_avona ADEC.
rapid_changes - False negatives
small_mic_increase - False positives
delay_at_start - False positives.

While debugging these failures, I realised that 3610 lib_aec calculates inverse_X_energy normDenom in a slightly different way than the python model.
3610 lib_aec: https://github.com/xmos/lib_aec/blob/develop/lib_aec/src/aec_calc_inv_energy_params.xc#L59
python model: https://github.com/xmos/py_aec/blob/develop/py_aec/aec.py#L581

So there's an extra factor of 2 that is multiplied to X_energy to do norm_denom = 2*X_energy + sigma_xx*gamma.
When I add this multiplication-by-2 factor in sw_avona lib_aec, the ADEC tests pass. If I remove this factor of 2 from the 3610 lib_aec code, I see the 3 ADEC test failures in lib_audio_pipelines as well.
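
The two variants of the normalisation denominator can be compared side by side. The sketch below reads `sigma_xxgamma` as the product of sigma_xx and gamma, which is an assumption, and the input values are made up purely for illustration; `inv_x_energy` is a hypothetical function, not the lib_aec API.

```python
def inv_x_energy(x_energy, sigma_xx, gamma, energy_scale=1.0):
    """Per-bin inverse energy: 1 / (energy_scale * X_energy + sigma_xx * gamma).

    energy_scale=1.0 corresponds to the form without the extra factor,
    energy_scale=2.0 to the form with the factor of 2 on X_energy.
    """
    return [1.0 / (energy_scale * e + sigma_xx * gamma) for e in x_energy]


# Made-up example values, for illustration only
x_energy = [0.5, 1.0, 4.0]
sigma_xx = 0.5
gamma = 2.0

without_factor = inv_x_energy(x_energy, sigma_xx, gamma, energy_scale=1.0)
with_factor = inv_x_energy(x_energy, sigma_xx, gamma, energy_scale=2.0)

for e, a, b in zip(x_energy, without_factor, with_factor):
    print(f"X_energy={e}: without factor {a:.4f}, with factor {b:.4f}")
```

With the factor of 2 the inverse energy is always smaller, which in a normalised adaptive filter means a smaller effective step size. That may explain why the two variants behave differently in the ADEC tests.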

I'm not sure how this discrepancy wrt the python model got implemented in the first place. I don't recall adding an extra multiplication while implementing 3610 lib_aec, and maybe the python model itself had the factor of 2 at some point?

When I run lib_audio_pipelines full keyword tests with and without this factor of 2 multiplication the results don't show any significant difference so both implementations seem okay.

While debugging these failing ADEC tests, here's what I found:

The failures happen because there's an early (frame 48) transition to DE mode in C code which doesn't happen on python.
The ERLE curve for C follows python (at least to begin with, before they diverge due to different shadow resets, copy etc.) but at a lower ERLE on C than python. This lower ERLE triggers DE on frame 48 in C code. This happens on both the sw_avona and 3610 AEC code and is most likely because of limited fixed point precision and not a bug in the C code.
Because of the 48th frame DE transition, the small_mic_increase and delay_at_start tests fail with an extra false positive.
The rapid_changes test is interesting. While the early DE transition has happened and AEC is in DE mode, an actual delay change happens in the stream where the mic signal becomes early wrt reference. This delay change happens too late, and before the AEC filter can converge to the new peak, we transition out of DE mode with the wrong measured delay. Since the actual delay is in fact mic-early, the filters never converge after this. This means that the initial shadow -> main filter copy never happens, which means ADEC doesn't run its logic anymore, since ADEC waits for the shadow -> main filter copy before monitoring AEC performance. As a result we get false negatives for this test case.
There's a watchdog in ADEC that is supposed to force trigger DE in a consistently bad AEC case, but the watchdog check itself is within the has_shadow_copy_happened check, so it never gets triggered.
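
The watchdog problem described above is a control-flow issue and can be shown with a toy model. Everything in this sketch is hypothetical (function names, the `bad_frames` counter, the return strings). It mirrors only the nesting described in the text, not the real ADEC code, and the second function is one possible restructuring rather than the fix that was adopted.

```python
def adec_step_nested(shadow_copy_happened, bad_frames, watchdog_limit):
    """Watchdog check nested inside the shadow-copy gate, as described above."""
    if shadow_copy_happened:
        # ... normal AEC performance monitoring would go here ...
        if bad_frames >= watchdog_limit:
            return "trigger_DE"
    return "no_action"


def adec_step_hoisted(shadow_copy_happened, bad_frames, watchdog_limit):
    """Same checks with the watchdog moved outside the gate."""
    if bad_frames >= watchdog_limit:
        return "trigger_DE"
    if shadow_copy_happened:
        pass  # ... normal AEC performance monitoring would go here ...
    return "no_action"


# Filters never converged: no shadow -> main copy, AEC bad for a long time
print(adec_step_nested(False, bad_frames=1000, watchdog_limit=100))
print(adec_step_hoisted(False, bad_frames=1000, watchdog_limit=100))
```

In the nested form the watchdog is unreachable in exactly the case it is meant to cover, namely when the filters never converge and so the copy never happens.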

Conclusion:

I think getting ADEC to work with compare_filters logic needs more work. This is already an item on the acoustic team's backlog.
I've decided to introduce the factor of 2 multiplication in sw_avona lib_aec as well to get adec tests passing and also because it doesn't seem to degrade anything.
AEC performance parameters like ERLE, peak to average ratio etc. that are used to make decisions in compare_filters and ADEC algorithms are different in C vs python. So using the C code to tune these algorithms would perhaps make more sense. Also, debugging various reported issues in future on C rather than python would be useful.
In the pipeline example, I'm going to only demonstrate ADEC configured in initial delay estimation mode (same as 3610 default setting) since I'm not sure of robustness of ADEC in automatic DE control mode.

Implement I2S audio interface

Slave mode

Reference audio in
Processed audio out

AEC development beyond v0.1.0

I'm adding a list of lib_aec features that are not supported in 0.1.0 but will be added in the future:

L2 API level task distribution scheme.
Example demonstrating L2 API use.
Double precision C model for the AEC.
Decide if delay_estimator will be a separate module and move it out of lib_aec if needed.
Convert aec_unit_tests from xc to c.
Remove sh use from run_xcoreai.py in examples to make Windows compatible.

USB processed and reference output audio channels are all zeros on MacOS

ADEC documentation complete

Provide documentation that describes:

What the ADEC and surrounding components do from a user's perspective,
Their mode(s) of operation, and
The purpose, parameters, return values and any constraints for each function in the component's API.

General component documentation shall use ReStructured Text. Function documentation shall use Doxygen.

Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.

Implement USB device control interface

IC passing tests

Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Interference Canceller. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.

Extend Wakeword functional test to determine pass/fail

Currently a log is generated that a user needs to manually compare with the output from the x86 application.

One idea is to parse this log in a pytest and compare to reference to determine pass fail.
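
A minimal sketch of that idea follows. The log line format ("wakeword detected at frame N") is purely hypothetical, since the real log format is not shown here, and the helper names and the frame tolerance are likewise assumptions.

```python
import re

# Hypothetical log line format: "wakeword detected at frame <N>"
DETECTION_RE = re.compile(r"wakeword detected at frame (\d+)", re.IGNORECASE)


def parse_detections(log_text):
    """Return the frame numbers of all detections found in a log."""
    return [int(m.group(1)) for m in DETECTION_RE.finditer(log_text)]


def matches_reference(found, reference, tolerance=0):
    """True if both runs have the same number of detections, each within `tolerance` frames."""
    if len(found) != len(reference):
        return False
    return all(abs(f - r) <= tolerance for f, r in zip(found, reference))


def test_wakeword_log_matches_reference():
    # In a real test these strings would be read from the device and x86 logs
    device_log = "boot\nwakeword detected at frame 120\nwakeword detected at frame 482\n"
    reference_log = "wakeword detected at frame 118\nwakeword detected at frame 480\n"
    assert matches_reference(
        parse_detections(device_log), parse_detections(reference_log), tolerance=5
    )
```

pytest would collect `test_wakeword_log_matches_reference` automatically, turning the manual comparison into a pass/fail result.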

Wakeword functional test

Some AGC loss control transitions to far-end speech don't appear possible

The current AGC implementation (and the python model in lib_agc) don't appear to support two loss control transitions:

near-end only to far-end only
double-talk to far-end only
These require a transition to "silence" in between.

Relevant code is here. In particular, the "far-end speech only" branch has "do nothing", so the timers are not adjusted, and decrementing lc_t_near requires silence.

ADEC passing tests

Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Automatic Delay Estimation Controller and surrounding components. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.

AEC functionally complete

Re-design and implement the Acoustic Echo Canceller to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked AEC functions operate as intended.

Build UA and INT configs using WW

Need a way to get the WW library from the CI jobs.

Implement sample rate conversion

Input:

48 kSPS -> 16 kSPS

Output:

16 kSPS -> 48 kSPS
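
Both directions are a rate change by a factor of 3, which can be sketched as a low-pass FIR filter combined with decimation or zero-stuffing. The code below is a floating-point illustration only: the tap count, the Hamming window and the cutoff are arbitrary choices, and a production implementation would more likely use a fixed-point polyphase filter.

```python
import math

FACTOR = 3  # 48 kSPS / 16 kSPS


def lowpass_taps(num_taps=63, cutoff=1.0 / (2 * FACTOR)):
    """Hamming-windowed sinc low-pass; cutoff is in cycles/sample at 48 kSPS."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def fir(signal, taps):
    """Direct-form FIR filter; samples before the start of the signal are treated as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * signal[i - k]
        out.append(acc)
    return out


def downsample_48k_to_16k(signal, taps):
    """Low-pass filter, then keep every third sample."""
    return fir(signal, taps)[::FACTOR]


def upsample_16k_to_48k(signal, taps):
    """Insert two zeros after each sample, then low-pass filter."""
    stuffed = []
    for s in signal:
        stuffed.extend([s * FACTOR, 0.0, 0.0])  # gain of 3 compensates for the zeros
    return fir(stuffed, taps)
```

The same taps serve both directions because the anti-aliasing filter before decimation and the anti-imaging filter after zero-stuffing share the same 8 kHz band edge.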

VAD passing tests

Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Voice Activity Detector. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.

Integrate wakeword engine

Implement placeholder pipeline

DSP Blocks:

NS functionally complete

Re-design and implement the Noise Suppressor to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked NS functions operate as intended.

AEC documentation complete

Provide documentation that describes:

What the AEC does from a user's perspective,
Its mode(s) of operation, and
The purpose, parameters, return values and any constraints for each function in the component's API.

General component documentation shall use ReStructured Text. Function documentation shall use Doxygen.

Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.

MacOS coreaudio device name is "UAC2"

Investigate why, on macOS, the device name is not "XVF3652".

VAD functionally complete

Re-design and implement the Voice Activity Detector and related components to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked VAD functions operate as intended.

Windows not fully supported

Building for Windows still has some issues that need to be cleaned up and the Windows build steps are not fully documented.

In addition, the filesystem creation process needs to support Windows.
