
Comments (10)

kkumatani commented on July 30, 2024

I will add an example to the documentation later. In the meantime, instead of "linear", you can add the following fields to your JSON configuration file:
{
  "array_type": "circular",
  "microphone_positions": [[ 18.75, -32.475953, 0.0],
                           [ 37.5,    0.0,      0.0],
                           [ 18.75,  32.475953, 0.0],
                           [-18.75,  32.475953, 0.0],
                           [-37.5,    0.0,      0.0],
                           [-18.75, -32.475953, 0.0]],

  ...
}
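
For reference, the six positions above lie on a circle of radius 37.5 in the xy-plane, spaced 60 degrees apart and starting at -60 degrees. A minimal sketch that reproduces them with NumPy follows. The function name circular_array_positions and its parameters are hypothetical helpers for illustration, not part of the toolkit.

```python
import json

import numpy as np


def circular_array_positions(radius, num_mics, start_deg=-60.0):
    """Return a (num_mics, 3) array of Cartesian positions on a circle.

    Hypothetical helper: the microphones sit in the xy-plane, evenly
    spaced, with the first one at `start_deg` degrees from the x-axis.
    """
    step_deg = 360.0 / num_mics
    angles = np.deg2rad(start_deg + step_deg * np.arange(num_mics))
    x = radius * np.cos(angles)
    y = radius * np.sin(angles)
    z = np.zeros(num_mics)
    return np.stack([x, y, z], axis=1)


# Radius 37.5 with six microphones matches the example configuration.
positions = circular_array_positions(radius=37.5, num_mics=6)

# Only the two fields shown in the example are filled in here; the
# rest of the configuration file is left out.
config_fragment = {
    "array_type": "circular",
    "microphone_positions": np.round(positions, 6).tolist(),
}
print(json.dumps(config_fragment, indent=2))
```

Other radii or microphone counts can be generated the same way, but the row order and starting angle are assumptions taken from this one example.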

from distant_speech_recognition.

kimchi88 commented on July 30, 2024

Thanks for this example.
Just to make sure:
each row of that matrix is the position (x, y, z) of a single microphone of the array, measured from the centre of the circle, right?
Could you please also explain what the "Target" field that I can see in the configuration is? If I understand correctly, it holds the coordinates where the beam should be focused. If so, do you have any ready-to-use method to integrate the information coming from a DOA module?
Thanks again for your help.


kkumatani commented on July 30, 2024

Sorry for my slow response. I did not notice you had questions. My answers are below.

Q1. Each row of that matrix is the position (x, y, z) of a single microphone of the array, measured from the centre of the circle, right?
A1. That is correct.

Q2. Could you please also explain what the "Target" field that I can see in the configuration is? If I understand correctly, it holds the coordinates where the beam should be focused.

A2. Yes, you are right. It is a list of time stamp and position vector pairs.
The content of the position vector depends on the shape of the array (linear, polar, planar or near-field).
For a linear array, it requires an azimuth value in radians.
For the polar and planar geometries, it should contain azimuth and polar angle values.
For the near-field case, it needs x, y and z coordinate values.

Q3. If this is the case, do you have any ready-to-use method to integrate the information coming from a DOA module?
A3. No, there is not. Currently, it needs the position of the target source in that format.


pfeatherstone commented on July 30, 2024

@kkumatani, in the case of polar/circular geometry, could you clarify what the target position values should be? Azimuth and elevation in radians? In that case, should the third entry in the position vector be null?


pfeatherstone commented on July 30, 2024

@kkumatani By polar angle, do you mean elevation?


pfeatherstone commented on July 30, 2024

@kkumatani Also, what are the units for distances, angles and timestamps? For angles I imagine it's radians. Given that the speed of sound is hard-coded to 343740, it looks like distances are in millimeters. Could you clarify all of these?


kkumatani commented on July 30, 2024

Thanks for asking, pfeatherstone.

Q1. Could you clarify what the target position values should be in the case of the circular geometry?

Under the far-field assumption with the circular array geometry, the return value would be (polar angle, azimuth). Actually, I found a bug: numpy.array([phi, theta]) must be numpy.array([theta, phi]) at line 590 of lib/pytdoa.py.
And yes, the third entry should be null.

Q2. By polar angle, do you mean elevation?

No, the polar angle is not the elevation. The polar angle is measured from the z-axis, while the elevation is measured from the xy-plane (in spherical coordinates).
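
Since the two angles are complementary, a DOA estimate given as an elevation can be converted to a polar angle by subtracting it from pi/2. A minimal sketch, with hypothetical function names that are not part of the toolkit:

```python
import math


def elevation_to_polar(elevation_rad):
    """Convert an elevation (from the xy-plane) to a polar angle (from the z-axis)."""
    return math.pi / 2.0 - elevation_rad


def polar_to_elevation(polar_rad):
    """Convert a polar angle (from the z-axis) to an elevation (from the xy-plane)."""
    return math.pi / 2.0 - polar_rad


# A source in the horizontal plane has elevation 0 and polar angle pi/2.
horizontal_polar = elevation_to_polar(0.0)
# A source straight above the array has elevation pi/2 and polar angle 0.
overhead_polar = elevation_to_polar(math.pi / 2.0)
```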

Q3. Also, what are the units for distances, angles and timestamps? For angles I imagine it's radians. Given that the speed of sound is hard-coded to 343740, it looks like distances are in millimeters. Could you clarify all of these?

Yes. Distances are in millimeters and angles are in radians.
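
To see why the units matter, here is a rough sketch of far-field (plane-wave) time delays computed from microphone positions in millimeters and a speed of sound of 343740 mm/s. The function far_field_delays is hypothetical and the sign convention (a negative delay means the wavefront reaches that microphone before the array origin) is an assumption; the toolkit's own computation may differ.

```python
import numpy as np

# Value quoted in this thread, interpreted as millimeters per second.
SPEED_OF_SOUND_MM_PER_S = 343740.0


def far_field_delays(mic_positions_mm, polar_rad, azimuth_rad,
                     c=SPEED_OF_SOUND_MM_PER_S):
    """Return per-microphone delays in seconds relative to the array origin.

    Hypothetical helper. Physics convention: the polar angle is measured
    from the z-axis, the azimuth from the x-axis in the xy-plane.
    """
    positions = np.asarray(mic_positions_mm, dtype=float)
    # Unit vector pointing from the array origin towards the source.
    direction = np.array([
        np.sin(polar_rad) * np.cos(azimuth_rad),
        np.sin(polar_rad) * np.sin(azimuth_rad),
        np.cos(polar_rad),
    ])
    # Assumed sign: microphones closer to the source get negative delays.
    return -(positions @ direction) / c


# Two microphones on the x-axis, source in the xy-plane along +x.
delays = far_field_delays([[37.5, 0.0, 0.0], [-37.5, 0.0, 0.0]],
                          polar_rad=np.pi / 2.0, azimuth_rad=0.0)
```

If the positions were given in meters instead, the delays would come out a factor of 1000 too small, so the positions and the speed of sound must use the same length unit.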


pfeatherstone commented on July 30, 2024

@kkumatani So just to clarify again, you are using the physics convention for spherical coordinates. Mathmos use phi for the polar angle, not theta.

Also, the function description of instantaneous_position in lib/pytdoa.py says it returns [azimuth, polar angle], in which case returning numpy.array([phi, theta]) under the physics convention for spherical coordinates is correct. Are you sure there is a bug? Or should the description say it returns [polar angle, azimuth]?


pfeatherstone commented on July 30, 2024

@kkumatani Yet another clarification: could you confirm that 'microphone_positions' is always in Cartesian coordinates, regardless of geometry?


kkumatani commented on July 30, 2024

  • So just to clarify again, you are using the physics convention for spherical coordinates.

    Yes, I am. Theta is the polar angle in radians and phi is the azimuth.

  • Are you sure there is a bug?

    Yes, I think so. The other functions are assumed to return (polar angle, azimuth). Only instantaneous_position(self, frame_no) returns the opposite order, so it must return [polar angle, azimuth].

  • Could you confirm that 'microphone_positions' is always in Cartesian coordinates, regardless of geometry?

    Yes, that always has to be in Cartesian coordinates.
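
For anyone who has a source position in Cartesian coordinates and needs the [polar angle, azimuth] ordering discussed above, a minimal sketch under the physics convention might look like this. The function cartesian_to_polar_azimuth is a hypothetical helper, not part of the toolkit.

```python
import numpy as np


def cartesian_to_polar_azimuth(x, y, z):
    """Return numpy.array([theta, phi]) in radians for a Cartesian point.

    Hypothetical helper. theta is the polar angle from the z-axis and
    phi is the azimuth from the x-axis in the xy-plane.
    """
    r = np.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("direction is undefined at the origin")
    theta = np.arccos(z / r)
    phi = np.arctan2(y, x)
    return np.array([theta, phi])


# A source on the +y axis: polar angle pi/2, azimuth pi/2.
theta_phi = cartesian_to_polar_azimuth(0.0, 1000.0, 0.0)
```

Only the direction matters in the far-field case, so the distance r is discarded here.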

