riggsd / guano-py

Python reference implementation of the GUANO bat acoustics metadata specification

Home Page: http://guano-md.org

License: MIT License

Python 98.69% Makefile 1.31%

anabat bat-acoustics bat-detector bats guano metadata

guano-py's Issues

Type of Loc Accuracy is incorrect

A small thing, but I think the type of Loc Accuracy should be float, not int, per the spec here: https://www.wildlifeacoustics.com/SCHEMA/GUANO.html. That's at guano.py line 170.

This discrepancy means that in strict mode, the parser fails to parse this field, and throws an exception. This became an issue in real life when I couldn't parse out a real wav file with this guano data:

GUANO|Version: 1.0
Species Manual ID: MyoBra/MyoMys,MyoDau,MyoNat
Loc Position: (60.805893333, 24.600683333)
Loc Accuracy: 40.02338521414699
Anabat|CallType: Matala

Perhaps a further refinement would be to have the parser fail open even in strict mode, so that a single bad field doesn't prevent other fields from being parsed? And possibly, make the default mode lenient? I can't imagine that clients typically want the parser to choke on a minor issue such as this.
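The fail-open idea above could be sketched roughly as follows. This is not guano-py's actual parser: `COERCION_RULES` and `coerce_fields` are hypothetical names, and the table only assumes that Loc Accuracy becomes a float, as proposed in this issue.

```python
# Hypothetical coercion table (not the real guano.py rules).
# 'Loc Accuracy' is a float here, as this issue proposes.
COERCION_RULES = {
    'Loc Accuracy': float,
    'Samplerate': int,
}

def coerce_fields(raw_fields):
    """Coerce each field on its own; one bad field never blocks the others.

    Returns (parsed, errors). A field that fails coercion keeps its raw
    string value in `parsed` and its exception is recorded in `errors`.
    """
    parsed, errors = {}, {}
    for key, value in raw_fields.items():
        rule = COERCION_RULES.get(key, str)
        try:
            parsed[key] = rule(value)
        except ValueError as exc:
            parsed[key] = value   # fail open: keep the raw string
            errors[key] = exc
    return parsed, errors
```

A caller that wants strict behaviour can still raise when `errors` is non-empty, after every other field has been read.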

"GUANO" Namespace and "Version" Field Must Appear First

Clarify the spec to require that GUANO|Version field must be the first to appear in the metadata block, and all GUANO namespace fields must appear before top-level fields or vendor fields.

Cannot write MD to files located on different drive

When I try to write MD to files stored on a drive different from the one used by NamedTemporaryFile, I get an OSError: [WinError 17] The system cannot move the file to a different disk drive:

see: https://stackoverflow.com/questions/21116510/python-oserror-winerror-17-the-system-cannot-move-the-file-to-a-different-d

Could https://github.com/riggsd/guano-py/blob/master/guano.py#L489 be changed to use shutil.move?

This corrected the issue for me. I'd be happy to submit a pr.
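A minimal sketch of the proposed change, assuming the write path creates a temporary file and then moves it into place. `replace_file` is a hypothetical helper, not the code at the linked line.

```python
import shutil
import tempfile

def replace_file(dest_path, data):
    """Write data to a temporary file, then move it over dest_path."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        tmp_path = tmp.name
    # shutil.move falls back to copy-then-delete when a plain rename is
    # not possible, e.g. when source and destination are on different drives.
    shutil.move(tmp_path, dest_path)
```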

Guano specification

The link http://guano-md.org/ appears to 404.

Is the documentation now elsewhere?

Martin

GUANO specification not correctly implemented in ELEKON BatExplorer 2.1

Due to a buggy timestamp creation, Elekon BatExplorer 2.1 (https://www.batlogger.com/en/downloads/batexplorer/software/be_2.1/) creates GUANO WAV files whose timestamps have 7 fractional-second digits. strptime() in parse_timestamp() accepts at most 6 digits for %f:
timestamp = datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f') (line 126)
The function can't parse the timestamp and throws a ValueError. So the default guano-py cannot be used with Elekon wavefiles.
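One possible workaround is to trim the fractional part to 6 digits before calling strptime(). The sketch below is an assumption about how that could be done, not a patch to guano.py; `parse_long_fraction` is a hypothetical name and the sample timestamp is made up for illustration.

```python
import re
from datetime import datetime

def parse_long_fraction(s):
    """Parse a timestamp whose fractional seconds may exceed 6 digits."""
    # Keep at most 6 fractional digits; extra digits are truncated, not rounded.
    trimmed = re.sub(r'(\.\d{6})\d+', r'\1', s)
    return datetime.strptime(trimmed, '%Y-%m-%dT%H:%M:%S.%f')

# Illustrative 7-digit value (not taken from a real Elekon file).
ts = parse_long_fraction('2019-04-30T21:02:44.1234567')
```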

Location help please

I'm trying to use guano_edit.py to add Loc Position to a WAV file that contains other GUANO metadata, but I'm getting errors. It is probably a Python newbie issue, but this command produces an error:

guano_edit.py 'Loc Position: (1, 1)' 20231102_210205_NoID.wav
{'Loc Position': '(1, 1)'}

20231102_210205_NoID.wav
Traceback (most recent call last):
  File "/home/user/.local/bin/guano_edit.py", line 100, in <module>
    main()
  File "/home/user/.local/bin/guano_edit.py", line 96, in main
    update(gfile, md, dry_run=DRY_RUN)
  File "/home/user/.local/bin/guano_edit.py", line 75, in update
    print(gfile.to_string())
  File "/home/user/.local/lib/python3.10/site-packages/guano.py", line 418, in to_string
    v = self._serialize(k, v)
  File "/home/user/.local/lib/python3.10/site-packages/guano.py", line 228, in _serialize
    return serialize(value)
  File "/home/user/.local/lib/python3.10/site-packages/guano.py", line 177, in <lambda>
    'Loc Position': lambda value: '%f %f' % value,
TypeError: must be real number, not str

I have tried other types of Position formats with the same error.
What is the correct format please?
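The traceback shows that the serializer applies `'%f %f' % value`, which only works when the value is a pair of numbers rather than a string. The sketch below illustrates that mismatch. It is not a statement of what guano_edit.py accepts on the command line, and `parse_position` is a hypothetical helper.

```python
import re

def parse_position(text):
    """Turn '(1, 1)' or '1 1' into a (latitude, longitude) tuple of floats."""
    numbers = re.findall(r'[-+]?\d+(?:\.\d+)?', text)
    if len(numbers) != 2:
        raise ValueError('expected two numbers, got %r' % text)
    return float(numbers[0]), float(numbers[1])

position = parse_position('(1, 1)')
# Same format string as the serializer in the traceback; a tuple of floats works.
serialized = '%f %f' % position
```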

Voice Notes, Thumbnail Photos, and Large Binary Values

Devices targeted at active recording typically allow the user to record an audible-range voice note to accompany the ultrasonic bat recording. The spec currently doesn't define a top-level field for voice note; should we do so?

While voice notes are likely of a lower sample rate (e.g. 44.1 kHz), they may be even longer in duration than the actual bat recording, so the voice note could easily exceed a few MB in size (60 seconds of 16-bit 44.1 kHz mono .WAV is ~5 MB in size).

Should voice notes be embedded as a base64 binary field value?

Should we instead define a second chunk gbin for housing large binary "attachments", then reference them with a "pointer" inside the main guan chunk? With this strategy, reading implementations won't need to allocate memory and resources for reading these potentially large attachments unless they recognize that they want to. Additionally, by storing pure binary data in gbin a writing implementation won't need to base64 encode the data.

This issue applies not only to voice notes, but also thumbnail images (of rendered spectrogram, etc.), or any other "large" metadata value.
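To put numbers on the trade-off, the arithmetic for the 60-second voice note mentioned above can be checked directly. This only measures raw PCM payload versus its base64 encoding; it says nothing about how a gbin chunk would actually be laid out.

```python
import base64

SAMPLE_RATE = 44100     # Hz
BYTES_PER_SAMPLE = 2    # 16-bit mono
SECONDS = 60

# Raw PCM payload size, excluding any WAV header.
raw_size = SAMPLE_RATE * BYTES_PER_SAMPLE * SECONDS
# Size of the same payload once base64-encoded (silence used as stand-in data).
encoded_size = len(base64.b64encode(bytes(raw_size)))
overhead = encoded_size / raw_size
```

Base64 emits 4 output bytes for every 3 input bytes, so embedding the note as text costs about a third more space than storing it as raw binary.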

Read or write Guano MD from zero cross/Anabat format files

I've played around with this a bit, but it seems fairly complex, and maybe it's something someone else has already solved. The code from the Zcant package appears related, but I couldn't quite get it to work for this.

I'd be happy to work on this if it makes sense in the current package.

Timestamp displays as None from WA guano data in .wav (guano_dump.py)

Timestamp from Wildlife Acoustics Echo Meter Touch 2 (iPhone app) is shown as None in guano_dump output:

(venv) root@be7e71f34666:/var/www/webroot/media/sessions/Session 20190430_210036# guano_dump.py PIPPYG_20190430_210244.wav | grep Timestamp
Timestamp: None
(venv) root@be7e71f34666:/var/www/webroot/media/sessions/Session 20190430_210036# tail -c800 PIPPYG_20190430_210244.wav
[BINARY STRIPPED]guan/GUANO|Version: 1.0
Firmware Version: App 2.7.7
Length: 15.00
Loc Position: 51.41691 -0.0671
Loc Elevation: 55.62311
Make: Wildlife Acoustics
Model: Echo Meter Touch 2
Original Filename: 20190430_210244.wav
Samplerate: 256000
Serial: E2B02356
Species Auto ID: PIPPYG
Species Manual ID:
Timestamp: 2019-04-30 21:02:44+0100
Note:
WA|Echo Meter|Auto ID: PIPPYG
WA|Song Meter|Audio settings: [{"prefix":"Yamato","trig max len":"15.00","rate":"256000","trig window":"3.00","trig min freq":"30000.00","trig level":"2.00","trig max freq":"128000.00","gain":"0.00"}]wamd�Echo Meter Touch E2B02356     App 2.7.72019-04-30 21:02:44+0100WGS84,51.41691,-0.0671,55.62311

Version info from Pipfile.lock:

    "guano": {
        "hashes": [
            "sha256:a913804a1844866d74eab3b38ec5b7d7759d8ad6238e6d90b88bfa4fd4e4b235",
            "sha256:fff5b8a63e713d47c0d8bb438bee989f2afb15c2a5aa95abd6e664f64786279d"
        ],
        "index": "pypi",
        "version": "==1.0.12"
    },

I note while processing this timestamp programmatically in GuanoFile(filepath).item() that it is read from the guano data, but appears to have no timezone set, which causes a crash if you then call datetime.isoformat() on it. I would guess that parsing of this field is not setting a valid tz.

Happy to provide the wav file for testing; can also look at patching this myself at a later date once I'm more familiar with the library
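The timestamp in this file uses a space separator and a UTC offset without a colon. As a sketch of how that exact shape could be parsed into a timezone-aware value under Python 3 (this is not guano-py's own parse_timestamp(), and `parse_wa_timestamp` is a hypothetical name):

```python
from datetime import datetime

def parse_wa_timestamp(s):
    """Parse 'YYYY-MM-DD HH:MM:SS+ZZZZ' into a timezone-aware datetime."""
    return datetime.strptime(s, '%Y-%m-%d %H:%M:%S%z')

ts = parse_wa_timestamp('2019-04-30 21:02:44+0100')
iso = ts.isoformat()
```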

guano_edit.py Templates Don't Support Namespaces

Using a namespaced field or a field containing a space character within a template substitution causes guano_edit.py to fail with the following exception:

Traceback (most recent call last):
  File "./bin/guano_edit.py", line 95, in <module>
    main()
  File "./bin/guano_edit.py", line 91, in main
    update(gfile, md, dry_run=DRY_RUN)
  File "./bin/guano_edit.py", line 69, in update
    gfile[k] = Template(v).substitute(gfile)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py", line 176, in substitute
    return self.pattern.sub(convert, self.template)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py", line 173, in convert
    self._invalid(mo)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py", line 146, in _invalid
    (lineno, colno))
ValueError: Invalid placeholder in string: line 1, col 1

Examples:

$> guano_edit.py 'Species Auto ID: ${SB|AutoID}' foo/bar/*.wav

$> guano_edit.py 'Note: Recorded at ${Loc Position} by me.' foo/bar/*.wav

The class attribute string.Template.idpattern is a regular expression responsible for validating template patterns.
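One way to relax that validation is a Template subclass with a looser pattern for the braced form. The sketch below assumes Python 3.7 or later, where string.Template also has a `braceidpattern` attribute; the traceback above comes from Python 2.7, which only offers `idpattern`. `FieldTemplate` is a hypothetical name and is not part of guano_edit.py.

```python
from string import Template

class FieldTemplate(Template):
    # Inside ${...}, accept any run of characters other than braces, so
    # names such as 'Loc Position' and 'SB|AutoID' are valid placeholders.
    braceidpattern = r'[^{}]+'

# Illustrative metadata values.
metadata = {'Loc Position': '(1.0, 2.0)', 'SB|AutoID': 'Mylu'}
note = FieldTemplate('Recorded at ${Loc Position} by me.').substitute(metadata)
species = FieldTemplate('${SB|AutoID}').substitute(metadata)
```

Leaving `idpattern` untouched keeps the unbraced `$name` form restricted to ordinary identifiers.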

Remove `GUANO|Size` Field

The GUANO|Size field is at best redundant, and could possibly be in conflict with reality. Remove it from the spec.

The original use case is that implementations may wish to pre-allocate a large fixed size metadata block so that metadata may be edited without having to rewrite the other RIFF chunks in the file.

The recommended way to do this is to:

1. Write initial metadata, but pad it out to a total of 2048 bytes (for example) with whitespace.
2. Whenever editing the file, read the size of the guan RIFF chunk. If the UTF-8 rendered metadata block is less than 2048 bytes, go ahead and write it starting at chunk offset 0x00, and continue padding with whitespace out to 2048 total bytes written.
3. If the UTF-8 rendered metadata block is greater than 2048 bytes, you must rewrite the entire .WAV file since your edited metadata will not fit in the preallocated "guan" chunk. Consider increasing its size by double so that subsequent edits won't overflow this newly allocated block.
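The padding steps above can be sketched as follows. `pad_metadata` and `next_block_size` are hypothetical helpers, 2048 is the example size from the text, and the actual RIFF chunk I/O is left out.

```python
BLOCK_SIZE = 2048  # example preallocation size from the text

def pad_metadata(text, block_size=BLOCK_SIZE):
    """Return the UTF-8 metadata padded with spaces to block_size bytes.

    Returns None when the metadata does not fit, meaning the whole
    .WAV file has to be rewritten with a larger chunk.
    """
    encoded = text.encode('utf-8')
    if len(encoded) > block_size:
        return None
    return encoded.ljust(block_size, b' ')

def next_block_size(needed, block_size=BLOCK_SIZE):
    """Double the block size until `needed` bytes fit."""
    while block_size < needed:
        block_size *= 2
    return block_size
```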

Multiple Species

The GUANO 1.0 specification defines Species Auto ID and Species Manual ID fields, which allow specifying a single species label. The abstraction that a single recording equals a single "bat pass" doesn't match reality, and we frequently record multiple individuals - including multiple species - in a single recording.

How should GUANO metadata handle the case of multiple species present within a single recording?

Visualization of bat calls

This might be well out of scope for the guano-py package but I'll share anyway. In working with bat wav files and introspecting them with guano-py I wanted to have some way to visualize their contents.

Would a plot function for a GuanoFile object be useful to others? I'm kinda thinking something like this:

    g = GuanoFile("bat.wav")
    g.plot()

Which would produce the following matplotlib figure:

This would require additional dependencies (these could be optional) such as matplotlib, numpy and scipy. As such the additional complexity might not be desirable. At any rate I'll share the code I came up with for this in case others find it useful.
https://github.com/talbertc-usgs/Notebooks/blob/master/bats/VisualizingGuanoData.ipynb

I'd be happy to submit a pr for this. I'd also be happy to either start or contribute to a different package that had a collection of useful bat data utilities or applications.

Also, I'm relatively new to visualizing sound files and bat data, so suggestions on the plot are welcome.

Add support for Wildlife Acoustics W4V format

Would it be possible to add support for the Wildlife Acoustics W4V format? It uses GUANO metadata out of the box, but it's not possible to edit these files with guano-py.

Error reading WAV from echometer touch

I have some WAV files from an EchoMeter TouchPro that fail to load as a GuanoFile (see: EPTFUS_20210617_213005.zip ). Any help would be appreciated.

from guano import GuanoFile
path = r"\\nas3\NAS8_13Jan20\Sounds\read_metadata_test\WILDLIFE ACOUSTICS EM-TouchPRO\EPTFUS_20210617_213005.wav"
gf = GuanoFile(path)

Traceback (most recent call last):
  File "", line 1, in
    GuanoFile(path)
  File "", line 215, in __init__
    self._load()
  File "", line 285, in _load
    self._parse(metadata_buf)
  File "", line 307, in _parse
    self._md[namespace][key] = self._coerce(full_key, val)
  File "", line 221, in _coerce
    return self._coersion_rules[key](value)
ValueError: could not convert string to float:

Persist Other .WAV Chunks

GuanoFile.write() currently writes the fmt_, data, and guan chunks to an output file, but any other chunks are excluded.

We should (optionally?) persist all original chunks in the saved file.

Multiple Species Labels

The GUANO spec says that Species Auto ID and Species Manual ID are a "list of strings".

The serialized value should be comma-separated:

Species Manual ID: Mylu, Epfu

The Python library currently returns a str value, which in the above case would be "Mylu, Epfu".

This means that library users would need to split it and strip it to know how many and which species labels were present.

It means that they might need to parse it and reassemble it in order to append a new value. (The spec doesn't actually state that values must be unique, but for most use cases that's probably desired.)

Therefore I think we should set up coercion/serialization for these Species fields so that they return a Python list, as follows:

Species field not present: md.get("Species Auto ID") -> None, and "Species Auto ID" in md -> False
Species field empty/blank: md["Species Auto ID"] -> []
Species field has a single species label: md["Species Auto ID"] -> ["Mylu"]
Species field has multiple species labels: md["Species Auto ID"] -> ["Mylu", "Epfu"]

The spec defines these fields as optional, so the way to check whether a recording is a Mylu recording would be:

if "Mylu" in md.get("Species Auto ID", []): ...

It would be cleaner and more intuitive if these two fields were required, but Timestamp is the only required field at this time.

Another approach to making it slightly cleaner would be to add explicit accessor methods for all of the well-known fields: the in operator could still check whether the field was present, while def species_auto_id(self) -> list would always return a list, possibly empty.
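The coercion and serialization described in this issue could look roughly like this. It is a sketch of the proposal, not current library behaviour, and `parse_species` and `serialize_species` are hypothetical names.

```python
def parse_species(value):
    """Split a comma-separated species field into a list of labels.

    'Mylu, Epfu' -> ['Mylu', 'Epfu'], and a blank value -> [].
    """
    return [label.strip() for label in value.split(',') if label.strip()]

def serialize_species(labels):
    """Join labels back into the comma-separated form: 'Mylu, Epfu'."""
    return ', '.join(labels)

labels = parse_species('Mylu, Epfu')
if 'Myse' not in labels:          # append only when not already present
    labels.append('Myse')
field_value = serialize_species(labels)
```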

error running setup.py

I am trying to run python setup.py develop and I am getting the error below. I assume I have some issue with my environment, but I was able to run it after deleting the content of README.rst so maybe a bug?

Re-Add Wildlife Acoustics Namespace

The WAC namespace was removed in d2e65ac, re-add it to the specification doc.

Guano package no longer able to extract metadata

Unfortunately, the guano module (1.0.14) no longer seems able to properly parse the files generated by Wildlife Acoustics devices such as the SM4BAT.

Error:

Traceback (most recent call last):
  File "/test.py", line 11, in <module>
    print(g['Prefix'])
  File "/usr/local/lib/python3.10/dist-packages/guano.py", line 349, in __getitem__
    return self._md[namespace][key]
KeyError: ''

Attaching a test case for testing:
testcase.zip

Module Version: 1.0.14
Python: 3.10

Clarify Standard Version vs. Python Lib Version

Ensure that we clearly state the GUANO specification version in the actual spec and changelog, and clearly denote that the Python reference implementation's versioning is independent.

Support Python 3

Look Into Filesystem-based Metadata Caching

Idea: SQLite3 database or fast key-value database like berkeleydb named .guano.py.cache with index of (filename, filesize, timestamp, hash).

Hash should be a fast function that need not be cryptographically secure, like crc32, md5, sha1, xxHash.

If we determine that the file hasn't changed, load metadata from cache.

Would this be significantly faster given that we'd need to do full file reads to compute hash? Is (filename, filesize, timestamp) sufficient without a hash?
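A sketch of the cache key under discussion, with the hash made optional so the two variants can be compared. `cache_key` is a hypothetical helper, and crc32 from zlib stands in for whichever hash is eventually chosen.

```python
import os
import zlib

def cache_key(path, with_hash=False):
    """Build a (filename, filesize, mtime[, crc32]) key for a metadata cache."""
    st = os.stat(path)
    key = (os.path.basename(path), st.st_size, st.st_mtime)
    if with_hash:
        crc = 0
        with open(path, 'rb') as f:
            # This reads the whole file, which is the cost questioned above.
            for block in iter(lambda: f.read(1 << 20), b''):
                crc = zlib.crc32(block, crc)
        key += (crc,)
    return key
```

Without the hash, the key costs a single stat() call per file. With it, every lookup reads the full recording.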

Multi-Line String Escaping

Because the '\n' character is used as a field delimiter, the spec defines the two-character string "\n" as an embedded newline within a field value. Without further escaping rules, this makes it impossible to intentionally use those characters together.

For example, a field may have a Windows filesystem path as value, C:\bat_calls\new\, but a newline would inadvertently be inserted between C:\bat_calls and new\.

Should we then define the two-character string "\\" to represent the character '\'? The above example would then need to be encoded as C:\\bat_calls\\new\\.
It should also be decided if these special escape characters apply to all string values, or exclusively to fields which specify "multi-line string" values?

For what it's worth, escaping a single token like "\n" is simple; simultaneously escaping several tokens, especially tokens containing the '\' character (which itself escapes special characters in C-based languages), is non-trivial.
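If the "\\" rule were adopted, encoding and decoding could be sketched as below. This illustrates the proposal only, not behaviour defined by the spec, and `escape_value` and `unescape_value` are hypothetical names.

```python
import re

def escape_value(s):
    """Escape backslashes first, then newlines, so the two never collide."""
    return s.replace('\\', '\\\\').replace('\n', '\\n')

def unescape_value(s):
    """Decode in one left-to-right pass over backslash pairs."""
    def decode(match):
        ch = match.group(1)
        # A backslash followed by 'n' is a newline; otherwise keep the
        # escaped character itself (this covers the doubled backslash).
        return '\n' if ch == 'n' else ch
    return re.sub(r'\\(.)', decode, s, flags=re.DOTALL)

path = 'C:\\bat_calls\\new\\'    # the Windows path from the example above
encoded = escape_value(path)
```

Decoding in a single pass matters: running two sequential replace() calls in the decoder would misread an escaped backslash followed by the letter n as a newline.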

Fail on Raw D1000X Files

Failure when parsing raw Pettersson D1000X recordings:

Traceback (most recent call last):
    gfile = GuanoFile(fname, strict=False)
  File "/Users/driggs/workspace/guano-py/guano.py", line 202, in __init__
    self._load()
  File "/Users/driggs/workspace/guano-py/guano.py", line 256, in _load
    raise ValueError(e)
ValueError: unpack requires a string argument of length 4

This may be because the RIFF chunk size at offset 0x04 is incorrect: it lists the entire filesize in bytes, when the value should be filesize - 8. No... it looks like the files include dozens of kilobytes of null padding after the correctly-declared data chunk. Is this even legal for a RIFF file?? Our chunk parser is reading \00\00\00\00 as the chunkid with chunksize 0, over and over and over again
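A chunk reader that tolerates this kind of trailing padding might look like the sketch below. It is not guano-py's loader: `iter_chunks` is a hypothetical name, and treating an all-null chunk ID as end-of-data is an assumption about how to handle these files, not something the RIFF format prescribes.

```python
import struct

def iter_chunks(f):
    """Yield (chunk_id, data) from a RIFF/WAVE stream.

    Stops at end of file or at the first all-null chunk ID, which is
    taken here to mean trailing null padding.
    """
    riff, _size, wave = struct.unpack('<4sI4s', f.read(12))
    if riff != b'RIFF' or wave != b'WAVE':
        raise ValueError('not a RIFF/WAVE file')
    while True:
        header = f.read(8)
        if len(header) < 8:
            break                      # end of file or truncated header
        chunk_id, size = struct.unpack('<4sI', header)
        if chunk_id == b'\x00\x00\x00\x00':
            break                      # null padding after the last real chunk
        data = f.read(size)
        if size % 2:
            f.read(1)                  # chunks are padded to an even length
        yield chunk_id, data
```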
