
kaldiio's People

Contributors

csukuangfj, danpovey, kamo-naoyuki, nttcslab-sp-admin, ruabraun, shigekikarita, zh794390558, zinurist

kaldiio's Issues

Nnet example files

I'm trying to access the features that are used for kaldi's dnn model. It looks like these matrices are stored in a different type of file (Nnet3Eg, NumIo), and I don't see that these are supported. Would it be non-trivial to read them?

New version cannot work with multichannel .wav files

Hi,

I'm using kaldiio to process multichannel wav files in espnet, and noticed that in matio.py the call

mat = _load_mat(arkfd, offset, slices, endian=endian,
                as_bytes=as_bytes,
                use_scipy_wav=False)

cannot read the header successfully, whereas the older version (use_scipy_wav=offset is None) can.

I think this is because, when processing multichannel files, it first calls wavio.read_wav(), which modifies 'fd' but fails (I don't know why), and then calls wavio.read_wav_scipy() with the same 'fd', which can no longer reach the header.

If my understanding is wrong, please let me know. Thanks.
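
The failure mode described above (a first reader consumes part of the stream, fails, and leaves the second reader starting past the header) can be sketched in isolation. This is only an illustration under stated assumptions: `read_with_fallback`, `strict_reader` and `lenient_reader` are hypothetical names, not kaldiio functions, and the byte string is made up.

```python
import io

def read_with_fallback(fd, readers):
    """Try each reader in turn, rewinding fd after every failure."""
    start = fd.tell()
    last_error = None
    for reader in readers:
        try:
            return reader(fd)
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            fd.seek(start)          # undo whatever the failed reader consumed
    raise last_error

def strict_reader(fd):
    fd.read(4)                      # consumes bytes, then gives up
    raise ValueError("unsupported layout")

def lenient_reader(fd):
    return fd.read(4)               # expects to start at the header

fd = io.BytesIO(b"RIFFrest-of-file")
header = read_with_fallback(fd, [strict_reader, lenient_reader])
print(header)
```

Without the `fd.seek(start)` call, the second reader would begin four bytes into the stream, which matches the symptom reported here. Note that this only works for seekable streams; a pipe would need buffering instead.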

Reading models

Hi, this is one of the best tools for reading ark and scp files! However, I would like to know how to read models generated by kaldi, e.g. full UBM models or ivector extractor models, which kaldi usually writes with 'ubm'-style extensions, for example final.ubm or final.dubm.

ValueError: Unexpected format <Nnet3Eg>

It's a good package that helps me a lot, but when I use it to load scp files (exp/xvector_nnet/egs/egs.1.scp, for example) that kaldi prepared in the egs/sre16 scripts, a ValueError occurred. Here's the traceback:
Traceback (most recent call last):
File "", line 1, in
File "/home/torch/lib/python3.8/site-packages/kaldiio-2.17.0-py3.8.egg/kaldiio/matio.py", line 339, in load_ark
File "/home/torch/lib/python3.8/site-packages/kaldiio-2.17.0-py3.8.egg/kaldiio/matio.py", line 429, in read_kaldi
File "/home/torch/lib/python3.8/site-packages/kaldiio-2.17.0-py3.8.egg/kaldiio/matio.py", line 518, in read_matrix_or_vector
ValueError: Unexpected format: "<Nnet3Eg>". Now FM, FV, DM, DV, CM, CM2, CM3 are supported.
I suspect that <Nnet3Eg> is just an alias for a supported format, but I have no idea how to solve it. Are there any solutions? Thanks a lot.

Test failures

Hi,
The tests seem to run fine on a Mac, but when I tested on Linux I got some failures.
You might want to have a look.

test.txt

issues in README.md

The following
uttid1 cat /some/where/feats.ark:123 |
would not work because the :123 would not be interpreted by the shell.

Also there is an extra i in kaldiiio

matrix slices differ between KALDI and kaldiio

Hi,
I want to use the kaldiio library to read feats.scp, but when slices are included in the feats.scp file, the KALDI command and the kaldiio function give different results.

This shell script (test.sh) reproduces the problem:

#!/bin/bash
matrix-dim /path/to/test.ark:6[3:5]
python3 -c 'import kaldiio; print(kaldiio.load_mat("/path/to/test.ark:6[3:5]").shape)'

Execution result:

$ test.sh
matrix-dim /path/to/test.ark:6[3:5]
3       20
(2,20)

thanks.
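
The numbers above are consistent with Kaldi treating a range such as [3:5] as inclusive at both ends (rows 3, 4 and 5), while Python slicing excludes the end index (rows 3 and 4). A minimal NumPy sketch of the two conventions follows; the array is a made-up stand-in for the matrix in test.ark, and `kaldi_row_range` is a hypothetical helper, not part of kaldiio.

```python
import numpy as np

def kaldi_row_range(mat, start, end):
    """Select rows start..end inclusive, as a Kaldi range specifier does."""
    return mat[start:end + 1]

mat = np.zeros((10, 20), dtype=np.float32)  # stand-in for the matrix in test.ark
python_style = mat[3:5]                     # half-open: rows 3 and 4
kaldi_style = kaldi_row_range(mat, 3, 5)    # inclusive: rows 3, 4 and 5
print(python_style.shape, kaldi_style.shape)
```

The only difference is the `end + 1`, which is why the two tools disagree by exactly one row.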

Reading a wav.scp file that contains pipe commands (sox)

Hi all,
I want to use the kaldiio library to read a wav.scp file and a segments file, but the wav.scp file contains pipe commands like the following:
ui23faz_0101 /usr/bin/sox /path/ui23faz_0102/ui23faz_0102.wav -r 16000 -c 1 -b 16 -t wav - downsample |
The kaldiio reader does not work on it. Does kaldiio not support this kind of wav.scp?
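
For context, a trailing "|" in an scp entry means "run this command and read its standard output" rather than "open this file". The following is a rough sketch of that convention using only the standard library; `read_rxspecifier` is a hypothetical helper and says nothing about how kaldiio handles such entries internally, and `echo` stands in for the sox command.

```python
import subprocess

def read_rxspecifier(spec):
    """Return the bytes named by a Kaldi-style entry.

    A trailing "|" means "run this command and read its stdout";
    anything else is treated as a plain file path.
    """
    spec = spec.strip()
    if spec.endswith("|"):
        command = spec[:-1].strip()
        # shell=True because scp pipelines may chain several commands.
        result = subprocess.run(command, shell=True, check=True,
                                stdout=subprocess.PIPE)
        return result.stdout
    with open(spec, "rb") as f:
        return f.read()

# Harmless stand-in for the sox command from the report.
data = read_rxspecifier("echo hello |")
print(data)
```

Because the command is passed to a shell, this approach is only appropriate for scp files from a trusted source.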

Windows Read Error

I'd like to read an scp or ark file with Python on Windows, but it fails with the output below:

C:\Users\xun\AppData\Roaming\Python\Python37\site-packages\kaldiio\utils.py:328: UserWarning: An error happens at loading "data/align.ark:589971"
  'An error happens at loading "{}"'.format(ark_name))
Traceback (most recent call last):
  File "C:\Users\xun\AppData\Roaming\Python\Python37\site-packages\kaldiio\utils.py", line 325, in __getitem__
    return self._loader(ark_name)
  File "C:\Users\xun\AppData\Roaming\Python\Python37\site-packages\kaldiio\matio.py", line 204, in load_mat
    return _load_mat(fd, offset, slices, endian=endian, as_bytes=as_bytes)
    array = read_kaldi(fd, endian)
  File "C:\Users\xun\AppData\Roaming\Python\Python37\site-packages\kaldiio\matio.py", line 343, in read_kaldi
    array, size = read_ascii_mat(fd, return_size=True)
  File "C:\Users\xun\AppData\Roaming\Python\Python37\site-packages\kaldiio\matio.py", line 536, in read_ascii_mat
    array = np.loadtxt(StringIO(string), dtype=dtype, ndmin=ndmin)
  File "C:\Users\xun\AppData\Roaming\Python\Python37\site-packages\numpy\lib\npyio.py", line 1141, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "C:\Users\xun\AppData\Roaming\Python\Python37\site-packages\numpy\lib\npyio.py", line 1065, in read_data
    % line_num)
ValueError: Wrong number of columns at line 2

It seems that read_ascii_mat receives a malformed string. I printed it and found that it contains two parts:
the first part is the feat above, and the second is what we want.

117 117 117 117 117 117 117 117 69 69 69 69 81 81 81 81 81 107 107 107 107 107 107 65 65 65 65 65 65 65 65 65 65 65 65 38 38 38 38 38 38 38 38 38 16 16 16 16 16 16 16 16 44 44 44 44 44 44 44 44 65 65 65 65 65 65 65 65 65 65 65 34 34 34 34 34 34 34 34 34 34 34 34 34 94 94 94 94 94 94 94 94 94 94 94 94 94 94 
94 94 119 119 119 119 119 119 119 39 39 39 39 39 39 39 39 39 19 19 19 19 19 19 19 43 43 43 43 65 65 65 65 65 65 65 65 65 30 30 30 30 30 30 30 30 30 30 30 
30 30 30 30 30 30 30 16 16 16 16 16 16 16 16 16 16 16 87 87 87 87 87 118 118 118 118 118 118 118 118 37 37 37 37 37 37 66 66 66 66 66 66 66 66 66 66 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
ast100 83 83 83 83 83 83 83 83 83 83 83 83 83 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 85 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 117 117 117 117 117 117 117 117 117 117 69 69 69 69 69 69 61 61 61 61 61 61 61 61 61 61 61 61 61 61 46 46 46 46 46 46 46 46 46 65 65 65 65 65 65 65 65 38 38 38 38 38 38 19 19 19 19 19 19 24 24 24 24 24 24 24 78 78 78 78 78 78 78 78 78 78 78 78 78 36 36 36 36 36 36 36 36 36 36 36 36 81 81 81 37 37 37 37 37 37 32 32 32 32 32 32 32 32 32 32 32 32 32 32 16 16 16 16 
16 16 16 44 44 44 44 44 44 44 44 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 120 120 120 120 120 120 120 22 22 22 22 22 65 65 65 65 64 64 64 64 64 24 24 24 24 24 24 24 24 119 119 119 119 119 119 119 4 4 4 4 4 4 4 4 66 66 66 63 63 63 63 63 63 63 63 24 24 24 24 24 24 24 24 24 24 24 24 24 24

kaldiio.load_mat to contiguous numpy array

Hi, I have run into a problem with the numpy array returned by the kaldiio.load_mat function.

The loaded numpy array is not contiguous. Is there an option to load it as a contiguous array?
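
Whether load_mat offers such an option is not stated here. Independently of that, NumPy can produce a contiguous copy of any array after loading. A small sketch, where the sliced array merely stands in for a non-contiguous result:

```python
import numpy as np

full = np.arange(12, dtype=np.float32).reshape(3, 4)
view = full[:, 1:3]                      # column slice: a non-contiguous view
print(view.flags["C_CONTIGUOUS"])        # False

contiguous = np.ascontiguousarray(view)  # copies only if needed
print(contiguous.flags["C_CONTIGUOUS"])  # True
```

`np.ascontiguousarray` returns its input unchanged when the array is already C-contiguous, so it is cheap to call unconditionally.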

Error reading from absolute Windows path

Hi,
thank you for your great tool! Unfortunately, I have issues reading kaldi matrices / archives from absolute Windows paths, e.g.

my_mat = kaldiio.load_mat(r"C:\temp\my.mat")

The following error occurs:

  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\kaldiio\matio.py", line 232, in load_mat
    ark, offset, slices = _parse_arkpath(ark_name)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\site-packages\kaldiio\matio.py", line 275, in _parse_arkpath
    offset = int(offset)
ValueError: invalid literal for int() with base 10: '\\temp\\my.mat'

That happens because the absolute path is split at ":" to separate the path from the offset in kaldi archives. Of course, in this example, "\temp\my.mat" is not a valid integer offset.
Changing the directory with os.chdir and passing only the filename works, but this is a poor solution.
Is there a workaround for this (or am I using it wrong)?
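
One way to tell a drive-letter colon apart from an offset separator is to treat only a trailing ":<digits>" as an offset. The sketch below illustrates the idea; `split_ark_offset` is a hypothetical helper, not kaldiio's `_parse_arkpath`, and it deliberately ignores slice suffixes such as [3:5].

```python
import re

def split_ark_offset(ark_name):
    """Split 'path:offset' into (path, offset), tolerating drive letters.

    Hypothetical helper: only a trailing ':<digits>' counts as an offset,
    so the colon after a Windows drive letter is left alone.
    """
    match = re.match(r"^(.*):(\d+)$", ark_name)
    if match is None:
        return ark_name, None
    return match.group(1), int(match.group(2))

print(split_ark_offset(r"C:\temp\my.mat"))
print(split_ark_offset(r"C:\temp\feats.ark:123"))
print(split_ark_offset("data/align.ark:589971"))
```

With this rule the first call leaves the path intact and reports no offset, while the other two split at the last colon only.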

UnicodeDecodeError when load cmvn.ark

Thanks for your tool, which saves me a lot of time when working with kaldi files.

Recently I ran into a problem that I cannot solve. I first used the kaldi command apply-cmvn to compute the statistics of my dataset, which generated cmvn.ark. I want to see the values in this file, but when I use

import kaldiio

data = kaldiio.load_ark('cmvn.ark')
for k, v in data:
    print(k, v.shape)  # do something with each entry

I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/espnet/lib/python3.8/site-packages/kaldiio/matio.py", line 348, in load_ark
    array = read_kaldi(fd, endian)
  File "/opt/anaconda3/envs/espnet/lib/python3.8/site-packages/kaldiio/matio.py", line 441, in read_kaldi
    array = read_ascii_mat(fd)
  File "/opt/anaconda3/envs/espnet/lib/python3.8/site-packages/kaldiio/matio.py", line 588, in read_ascii_mat
    char = fd.read(1).decode(encoding=default_encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 0: unexpected end of data

However, this problem doesn't occur when loading feats.ark or other files. It seems that only cmvn.ark is affected. I thought the file was broken, but I was able to use it in further kaldi commands, such as dump.sh, to obtain the normalized features.

Is this file in a different format from feats.ark? Do I have to pass extra arguments to load_ark? I am no expert in kaldi, so I'd be grateful for any hints.
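
One way to investigate is to look at the first bytes of the file. Kaldi marks binary data with the two bytes "\0B"; in an archive that marker follows a key and a space, while a single object written without a key starts with the marker directly. The helper below is a hypothetical diagnostic, not part of kaldiio, and the sample headers are synthetic.

```python
def describe_kaldi_header(head):
    """Classify the first bytes of a Kaldi file (hypothetical helper)."""
    if head.startswith(b"\x00B"):
        return "binary object without a key"
    key, sep, rest = head.partition(b" ")
    if sep and rest.startswith(b"\x00B"):
        return "binary archive entry with key " + key.decode("ascii", "replace")
    return "text, or an unrecognised layout"

# Synthetic headers for illustration; for a real file, pass the first bytes
# read with open(path, "rb").read(64).
print(describe_kaldi_header(b"\x00BFM "))
print(describe_kaldi_header(b"uttid1 \x00BFM "))
print(describe_kaldi_header(b"uttid1  [ 1 2 3 ]"))
```

If cmvn.ark turns out to start with the marker and no key, it may be a single matrix rather than an archive, in which case kaldiio.load_mat could be the more suitable call. That is a guess about the cause, not a confirmed diagnosis.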
