xmos / fwk_voice
License: Other
Voice Framework
The use case for Avona is to use DDR at runtime, without requiring data to be loaded via the bootloader.
This is logged as an XTC bug: http://10.0.102.172/show_bug.cgi?id=18540
Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Automatic Gain Controller. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.
Currently blocked waiting on fix in lib_xud
This branch, from a commit on July 19, is the last one where the 250k model (barely) fits in SRAM. The very next commit (489ffe2) added support for the model being stored in the filesystem. As part of this change, several buffers were increased in size.
The heap is the main culprit, increasing by 139400 bytes. Perhaps the decode buffer can return to 30000. This needs investigating.
Re-design and implement the Interference Canceller to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked IC functions operate as intended.
Re-design and implement the Automatic Delay Estimation Controller and surrounding components to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked ADEC functions operate as intended.
TODO: Need more definition
Provide documentation that describes:
General component documentation shall use reStructuredText. Function documentation shall use Doxygen.
Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.
Wakeup signal will be a GPO pin.
We will need to buffer N seconds of audio, including M milliseconds before the wakeword. We do not have a specification for N, but we know M must be at least 500 milliseconds.
N & M may be configurable and we either have enough SRAM or not. The build report will tell us.
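As a rough sizing sketch for the buffer described above, the following estimates the memory needed for N seconds of audio and for the M millisecond pre-roll. The sample rate (16 kHz), sample width (16-bit) and channel count (mono) are assumptions for illustration only and are not stated in this issue; `buffer_bytes` is a hypothetical helper, not part of the codebase.

```python
def buffer_bytes(total_seconds, pre_roll_ms, sample_rate_hz=16000,
                 bytes_per_sample=2, channels=1):
    """Return (total_bytes, pre_roll_bytes) for an audio buffer.

    The default audio format is an assumption, not a project specification.
    """
    if pre_roll_ms < 500:
        raise ValueError("M must be at least 500 milliseconds")
    if total_seconds * 1000 < pre_roll_ms:
        raise ValueError("N must be long enough to contain the pre-roll")
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    total = int(total_seconds * bytes_per_second)
    pre_roll = pre_roll_ms * bytes_per_second // 1000
    return total, pre_roll


if __name__ == "__main__":
    total, pre_roll = buffer_bytes(total_seconds=2, pre_roll_ms=500)
    print(f"total={total} bytes, pre-roll={pre_roll} bytes")
```

With these assumed defaults, N = 2 seconds needs 64000 bytes, of which 16000 bytes hold the 500 millisecond pre-roll. The build report remains the authority on whether a given N and M fit in SRAM.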
In the file applications/avona/filesystem_support/create_fs.bat, the 250k and 50k files need to go into a "ww" folder. So, these lines:
cp "%WW_PATH%\models\common\WR_250k.en-US.alexa.bin" %temp%\fatmktmp\250kenUS.bin
cp "%WW_PATH%\models\common\WS_50k.en-US.alexa.bin" %temp%\fatmktmp\50kenUS.bin
likely need to be changed to:
cp "%WW_PATH%\models\common\WR_250k.en-US.alexa.bin" %temp%\fatmktmp\ww\250kenUS.bin
cp "%WW_PATH%\models\common\WS_50k.en-US.alexa.bin" %temp%\fatmktmp\ww\50kenUS.bin
In addition, we no longer want to skip the creation of fat.fs when it already exists, as this can hide issues.
When building the applications, it is possible to verify that the correct SDK version is being used by checking the https://github.com/xmos/xcore_sdk/blob/develop/settings.json file in the SDK.
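A minimal sketch of such a check is shown below. The layout of settings.json is not described in this issue, so the `"version"` key is an assumption, and `sdk_version_matches` is a hypothetical helper; adjust the key to whatever the file actually contains.

```python
import json


def sdk_version_matches(settings_text, expected, key="version"):
    """Return True if the settings JSON reports the expected SDK version.

    The default key name is an assumption about the file layout.
    """
    settings = json.loads(settings_text)
    return settings.get(key) == expected


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    example = '{"version": "1.2.3"}'
    print(sdk_version_matches(example, "1.2.3"))
```

A build script could read the file from the SDK checkout and stop the build when the function returns False.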
Current 250k model detects 6 or 7 out of 10 wakewords when the reference (x86 and xs3) detects 9/10.
Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Acoustic Echo Canceller. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.
Provide documentation that describes:
General component documentation shall use reStructuredText. Function documentation shall use Doxygen.
Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.
When writing the CMakeLists.txt for the lib_agc module, I simplified it to the minimal set of commands, so it is much cleaner than the equivalent in lib_aec. Throughout sw_avona, we have been copying CMakeLists.txt files with a lot of boilerplate code and unnecessary repetition. This can all be simplified by removing unnecessary commands and setting properties and variables in a suitable hierarchy to avoid repetition (e.g., the executable suffix ".xe" should be set at a high level in the repo, not set individually in every CMakeLists.txt that produces such an executable).
Provide documentation that describes:
General component documentation shall use reStructuredText. Function documentation shall use Doxygen.
Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.
Re-design and implement the Automatic Gain Controller to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked AGC functions operate as intended.
Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Noise Suppressor. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.
Support the ability to route audio into the start of the pipeline from host.
Output the following synchronized audio signals:
This includes host side app/scripts to unpack into a multi-channel wav file.
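One possible shape for the host-side unpacking step is sketched below. It assumes the device delivers raw, interleaved, 16-bit little-endian PCM; the channel count and the 16 kHz sample rate are also assumptions, since the list of output signals is not given here. `write_multichannel_wav` is a hypothetical helper built on the standard-library `wave` module.

```python
import wave


def write_multichannel_wav(raw, path, channels, sample_rate_hz=16000):
    """Write interleaved 16-bit little-endian PCM bytes to a WAV file.

    Any trailing partial frame is dropped. Returns the number of frames written.
    """
    frame_bytes = 2 * channels  # 2 bytes per sample, one sample per channel
    usable = len(raw) - (len(raw) % frame_bytes)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(raw[:usable])
    return usable // frame_bytes
```

Because WAV stores 16-bit samples as little-endian interleaved frames, the assumed input layout can be written out unchanged, which keeps the channels sample-synchronized.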
Provide documentation that describes:
General component documentation shall use reStructuredText. Function documentation shall use Doxygen.
Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.
Consider an additional host interface option (SPI). Note that this is a non-real-time interface, but it may allow a more standard system-level architecture.
While testing sw_avona adec module using the ADEC tests ported from lib_audio_pipelines (https://github.com/xmos/lib_audio_pipelines/tree/develop/tests/test_delay_estimator_controller), I found some issues that I've described here.
I have 3 failing tests on sw_avona ADEC.
rapid_changes - False negatives
small_mic_increase - False positives
delay_at_start - False positives.
While debugging these failures, I realised that 3610 lib_aec calculates the inverse_X_energy normDenom in a slightly different way from the Python model.
3610 lib_aec: https://github.com/xmos/lib_aec/blob/develop/lib_aec/src/aec_calc_inv_energy_params.xc#L59
python model: https://github.com/xmos/py_aec/blob/develop/py_aec/aec.py#L581
So in 3610 lib_aec there's an extra factor of 2 multiplying X_energy, giving norm_denom = 2*X_energy + sigma_xx*gamma.
When I add this multiplication by 2 in sw_avona lib_aec, the ADEC tests pass. If I remove this factor of 2 from the 3610 lib_aec code, I see the 3 ADEC test failures in lib_audio_pipelines as well.
I'm not sure how this discrepancy with respect to the Python model got implemented in the first place. I don't recall adding an extra multiplication while implementing 3610 lib_aec, so maybe the Python model itself had the factor of 2 at some point?
When I run lib_audio_pipelines full keyword tests with and without this factor of 2 multiplication the results don't show any significant difference so both implementations seem okay.
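To make the two variants concrete, here is a small sketch of the denominator with and without the extra factor. It assumes the expression reads as norm_denom = scale*X_energy + sigma_xx*gamma, with scale = 2 for the 3610 behaviour and scale = 1 for the Python model; the function and parameter names are illustrative and do not come from either codebase.

```python
def norm_denom(x_energy, sigma_xx, gamma, x_energy_scale=1):
    """Normalisation denominator with a configurable X_energy scale.

    x_energy_scale=1 is assumed to mirror the Python model,
    x_energy_scale=2 the 3610 lib_aec behaviour described above.
    """
    return x_energy_scale * x_energy + sigma_xx * gamma


if __name__ == "__main__":
    # Illustrative values only, not taken from any test vector.
    print(norm_denom(3.0, 0.5, 2.0))                    # model-style
    print(norm_denom(3.0, 0.5, 2.0, x_energy_scale=2))  # 3610-style
```

A larger denominator gives a smaller inverse energy, so the 3610-style variant would scale the normalised quantity down whenever X_energy dominates the sigma_xx term.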
While debugging these failing ADEC tests, here's what I found:
Conclusion:
Slave mode
I'm adding a list of lib_aec features that are not supported in 0.1.0 but will be added in the future:
Provide documentation that describes:
General component documentation shall use reStructuredText. Function documentation shall use Doxygen.
Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.
Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Interference Canceller. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.
Currently a log is generated that a user needs to manually compare with the output from the x86 application.
One idea is to parse this log in a pytest and compare to reference to determine pass fail.
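A minimal sketch of that idea follows. The log format is not described here, so the comparison simply treats each non-empty line as one record and ignores surrounding whitespace; `diff_logs` and the sample log lines are hypothetical.

```python
def diff_logs(device_log, reference_log):
    """Return (line_number, device_line, reference_line) for each mismatch.

    Blank lines and leading/trailing whitespace are ignored. A missing line
    on either side is reported as None.
    """
    dev = [line.strip() for line in device_log.splitlines() if line.strip()]
    ref = [line.strip() for line in reference_log.splitlines() if line.strip()]
    mismatches = []
    for i in range(max(len(dev), len(ref))):
        d = dev[i] if i < len(dev) else None
        r = ref[i] if i < len(ref) else None
        if d != r:
            mismatches.append((i + 1, d, r))
    return mismatches


def test_device_log_matches_reference():
    # Hypothetical log content; a real test would read both logs from files.
    device = "frame 0: ok\nframe 1: ok\n"
    reference = "frame 0: ok\nframe 1: ok\n"
    assert diff_logs(device, reference) == []
```

Returning the list of mismatches, rather than a plain boolean, lets pytest show exactly which lines diverged when the assertion fails.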
The current AGC implementation (and the Python model in lib_agc) does not appear to support two loss control transitions:
Relevant code is here. In particular, the "far-end speech only" branch has "do nothing", so the timers are not adjusted, and decrementing lc_t_near requires silence.
Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Automatic Delay Estimation Controller and surrounding components. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.
Re-design and implement the Acoustic Echo Canceller to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked AEC functions operate as intended.
Need a way to get the WW library from the CI jobs.
Input:
Output:
Port and modify existing integration and component-level tests, design and implement any new tests and use them to show correct operation of the Voice Activity Detector. The tests shall reside in a repository separate from sw_avona. Each test may require specific hardware or a hardware simulator for operation.
DSP Blocks:
Re-design and implement the Noise Suppressor to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked NS functions operate as intended.
Provide documentation that describes:
General component documentation shall use reStructuredText. Function documentation shall use Doxygen.
Keep examples to a minimum (we will add more later), and only include ones necessary to describe the purpose of the component and its mode(s) of operation.
Investigate why, on macOS, the device name is not "XVF3652".
Re-design and implement the Voice Activity Detector and related components to provide necessary functionality. Implementation shall use C and be suitable for use under FreeRTOS or bare-metal. It should use lib_xs3_math and VPU optimisations where possible. Nothing in the implementation shall refer to or require specific hardware, e.g., an xCore.AI. Completion of this issue does not require all tests passing. It may involve re-design and implementation of unit tests at the discretion of the engineer to show that the re-worked VAD functions operate as intended.
Building for Windows still has some issues that need to be cleaned up and the Windows build steps are not fully documented.
In addition, the filesystem creation process needs to support Windows.