Dear Kumatani, Could you share any additional information about the beamforming co

conf file of circular array about distant_speech_recognition HOT 10 CLOSED

kkumatani commented on July 30, 2024

conf file of circular array

from distant_speech_recognition.

Comments (10)

kkumatani commented on July 30, 2024

I will add an example to the documentation later. In the meantime, instead of "linear", you can add the following fields to your JSON configuration file:
{
"array_type": "circular",
"microphone_positions": [[ 18.75, -32.475953, 0.0],
[ 37.5, 0.0, 0.0],
[ 18.75, 32.475953, 0.0],
[-18.75, 32.475953, 0.0],
[-37.5, 0.0, 0.0],
[-18.75, -32.475953, 0.0]],

...
}
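For reference, the six positions above are consistent with microphones spaced evenly on a circle of radius 37.5 in the xy-plane, starting at -60 degrees. The following is only a sketch of how such a list could be generated; the helper name `circular_array_positions` and its parameters are hypothetical and not part of distant_speech_recognition, and the "..." fields of the real configuration are not reproduced.

```python
import json
import math


def circular_array_positions(num_mics, radius, start_angle_deg=0.0):
    """Return [x, y, z] positions of microphones evenly spaced on a circle
    centred at the origin in the xy-plane (hypothetical helper)."""
    step = 2.0 * math.pi / num_mics
    start = math.radians(start_angle_deg)
    positions = []
    for i in range(num_mics):
        angle = start + i * step
        positions.append([radius * math.cos(angle),
                          radius * math.sin(angle),
                          0.0])
    return positions


# Radius 37.5 and a -60 degree start angle are inferred from the example above.
positions = circular_array_positions(6, 37.5, start_angle_deg=-60.0)

config_fragment = {
    "array_type": "circular",
    # Round to 6 decimals; adding 0.0 turns a possible -0.0 into 0.0.
    "microphone_positions": [[round(v, 6) + 0.0 for v in p] for p in positions],
}
print(json.dumps(config_fragment, indent=2))
```

Only the two fields shown in the example are generated here; any other fields the configuration needs would still have to be added by hand.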

from distant_speech_recognition.

kimchi88 commented on July 30, 2024

Thanks for this example.
Just to make sure:
each row of that matrix is the position (x,y,z) of a single microphone of the array starting from the centre of the circle, right?
Could you please also explain what the "Target" field that I can see inside the configuration is? If I got it right, it is the coordinates where to focus the beam. If this is the case, do you have any ready-to-use method to integrate the info coming from a DOA module?
Thanks again for your help

from distant_speech_recognition.

kkumatani commented on July 30, 2024

Sorry for my slow response. I did not notice you had questions. My answers are below.

Q1. each row of that matrix is the position (x,y,z) of a single microphone of the array starting from the centre of the circle, right?
A1. That is correct.

Q2. Could you please also explain what the "Target" field that I can see inside the configuration is? If I got it right, it is the coordinates where to focus the beam.

A2. Yes, you are right. It is a list of timestamp and position-vector entries.
The contents of the position vector depend on the shape of the array (linear, polar, planar or near-field).
For a linear array, it requires an azimuth value in radians.
For the polar and planar geometries, it should contain azimuth and polar angle values.
For the near-field case, it needs x, y and z coordinate values.

Q3. If this is the case, do you have any ready-to-use method to integrate the info coming from a DOA module?
A3. No, there is not. Currently, the position of the target source must be supplied in that format.

from distant_speech_recognition.

pfeatherstone commented on July 30, 2024

@kkumatani, in the case of polar/circular geometry, could you clarify what the target position values should be? Azimuth and elevation in radians? In that case, should the third entry in the position vector be null?

from distant_speech_recognition.

pfeatherstone commented on July 30, 2024

@kkumatani by polar angle, do you mean elevation?

from distant_speech_recognition.

pfeatherstone commented on July 30, 2024

@kkumatani also, what are the units for distances, angles and timestamps? For angles I imagine it's radians. Given that the speed of sound is hard-coded to 343740, it looks like distances are in millimeters. Could you clarify all of these?

from distant_speech_recognition.

kkumatani commented on July 30, 2024

Thanks for asking, pfeatherstone.

Q1. Could you clarify what the target position values should be in the case of the circular geometry?

Under the far-field assumption with the circular array geometry, the return value would be (polar angle, azimuth). Actually, I found a bug: numpy.array([phi, theta]) must be numpy.array([theta, phi]) at line 590 of lib/pytdoa.py.
And yes, the third entry should be null.

Q2. by polar angle, do you mean elevation?

No, the polar angle is not the elevation. The polar angle is measured from the z-axis, while the elevation is measured from the xy-plane (in spherical coordinates).
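The difference between the two angles can be sketched in a few lines of Python. This is an illustration of the standard physics convention only (theta as the polar angle from the z-axis, phi as the azimuth in the xy-plane); the function names are hypothetical and are not taken from lib/pytdoa.py.

```python
import math


def elevation_to_polar(elevation_rad):
    """Convert an elevation (angle above the xy-plane) to a polar angle
    (angle from the +z axis). Both are in radians."""
    return math.pi / 2.0 - elevation_rad


def cartesian_to_polar_azimuth(x, y, z):
    """Return (theta, phi) for a direction given in Cartesian coordinates:
    theta is the polar angle from +z, phi is the azimuth from +x."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the origin has no direction")
    theta = math.acos(z / r)
    phi = math.atan2(y, x)
    return theta, phi


# A source in the xy-plane has elevation 0 but polar angle pi/2.
theta_in_plane = elevation_to_polar(0.0)
# A source straight above the array has polar angle 0.
theta_up, phi_up = cartesian_to_polar_azimuth(0.0, 0.0, 1.0)
```

So a source level with a horizontal array corresponds to a polar angle of pi/2, not 0.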

Q3. also, what are the units for distances, angles and timestamps? For angles I imagine it's radians. Given that the speed of sound is hard-coded to 343740, it looks like distances are in millimeters. Could you clarify all of these?

Yes. Distances are in millimeters and angles are in radians.
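To see how these units fit together, here is a rough sketch of far-field (plane-wave) arrival-time differences for the six-microphone example in this thread. It uses the textbook relation delay = -(p . u) / c and the speed-of-sound value 343740 quoted above, read as millimeters per second. The function `far_field_delays` and its sign convention (negative means the wave reaches that microphone before it reaches the array centre) are assumptions for illustration, not the library's implementation.

```python
import math

SPEED_OF_SOUND_MM_PER_S = 343740.0  # value quoted in this thread


def far_field_delays(mic_positions_mm, polar_rad, azimuth_rad,
                     c=SPEED_OF_SOUND_MM_PER_S):
    """Return per-microphone delays in seconds relative to the array origin
    for a plane wave arriving from direction (polar angle, azimuth)."""
    # Unit vector pointing from the array origin toward the source.
    ux = math.sin(polar_rad) * math.cos(azimuth_rad)
    uy = math.sin(polar_rad) * math.sin(azimuth_rad)
    uz = math.cos(polar_rad)
    return [-(x * ux + y * uy + z * uz) / c for x, y, z in mic_positions_mm]


mics = [[18.75, -32.475953, 0.0],
        [37.5, 0.0, 0.0],
        [18.75, 32.475953, 0.0],
        [-18.75, 32.475953, 0.0],
        [-37.5, 0.0, 0.0],
        [-18.75, -32.475953, 0.0]]

# Source in the xy-plane along +x: polar angle pi/2, azimuth 0.
delays = far_field_delays(mics, math.pi / 2.0, 0.0)
```

With positions in millimeters and the speed of sound in millimeters per second, the resulting delays are in seconds, on the order of 0.1 ms across this array.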

from distant_speech_recognition.

pfeatherstone commented on July 30, 2024

@kkumatani So just to clarify again, you are using the physics convention for spherical coordinates. Mathmos use phi for the polar angle, not theta.

Also, the function description of instantaneous_position in lib/pytdoa.py says it returns [azimuth, polar angle], in which case returning numpy.array([phi, theta]) using the physics convention of spherical coordinates is correct. Are you sure there is a bug? Or should the description say it returns [polar angle, azimuth]?

from distant_speech_recognition.

pfeatherstone commented on July 30, 2024

@kkumatani yet another clarification. Could you confirm that 'microphone_positions' is always in Cartesian coordinates, regardless of geometry?

from distant_speech_recognition.

kkumatani commented on July 30, 2024

Q1. So just to clarify again, you are using the physics convention for spherical coordinates.

A1. Yes, I am. Theta is the polar angle in radians and phi is the azimuth.

Q2. Are you sure there is a bug?

A2. Yes, I think so. The other functions are assumed to return (polar angle, azimuth). Only instantaneous_position(self, frame_no) returns the opposite order, so it must return [polar angle, azimuth].

Q3. Could you confirm that 'microphone_positions' is always in Cartesian coordinates, regardless of geometry?

A3. Yes, it always has to be in Cartesian coordinates.

from distant_speech_recognition.
