
Comments (4)

riggsd commented on August 17, 2024

Below are the major options I've considered so far to address the issue of multiple species.

All examples are for a hypothetical recording which contains primarily an Epfu, but with a Mylu also present in the recording.

A. Do nothing

A single recording is theoretically a single "bat pass", and thus we apply at most one species label to a recording.

Species Auto ID: Epfu

This is the approach taken by every software autoclassifier in existence today. End users are forced to invent their own conventions for specifying multiple species in a recording (e.g. using a species label like Epfu+Mylu, or duplicating lines in a spreadsheet to make one file count as two), or else they ignore any additional species beyond the most "dominant".

By doing nothing (Auto Species ID and Manual Species ID continue to allow no more than one species label) we continue to support the abstraction that one recording equals one bat pass.

B. Allow multiple comma-separated species labels

This is the closest match to reality... but it explodes the complexity of reporting, visualization/organization of data, and adds a significant complication to all software. Shakespeare might reply "striving to better, oft we mar what is well".

Species Auto ID:  Epfu, Mylu

Details to iron out for this implementation include:

  • Are duplicate labels allowed (presumably representing multiple individuals or multiple passes)?
  • Is order meaningful, and if so how? Is the first species in the list the first chronologically, or the most dominant? In the case of two bats of equal dominance and simultaneous appearance, how do we choose?
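To make the open questions concrete, here is a minimal sketch of how a reader might split such a field. It assumes duplicates are allowed and preserved, and that order is kept as written without assigning it any meaning. The function name parse_species_labels is hypothetical and is not part of any existing library.

```python
from collections import Counter


def parse_species_labels(value):
    """Split a comma-separated species field into a list of labels.

    Assumption: duplicates are kept and order is preserved as written.
    Empty entries (e.g. from a trailing comma) are dropped.
    """
    return [label.strip() for label in value.split(",") if label.strip()]


labels = parse_species_labels("Epfu, Mylu, Mylu")
# Counting duplicates is one possible way to represent multiple individuals.
counts = Counter(labels)
```

Whether the repeated Mylu means two individuals or two passes is exactly the detail the standard would still need to define.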

C. Designate a primary and a secondary species field, no more.

Everything is almost as it is today, with the simple addition of the secondary species field. All other bat-related metadata fields refer to the primary species. The primary species is assumed to be dominant in the call, or "most significant" where significance() is intentionally undefined by the metadata standard. This approach lets us note when another bat is present, while still vastly simplifying the day-to-day working with recordings.

Species Auto ID:  Epfu
Species Auto ID 2nd:  Mylu

Other considerations

While there are currently no top-level "bat-related" metadata fields, these may be vendor-specified fields, and many may be likely candidates for future definition. Examples include Fc (characteristic frequency), F high, F low, Slope Total, Slope Upper, Slope Lower, TBC (time between calls), Duration, etc.

With any of the above options which allow specifying multiple species present, do we then allow/require also specifying multiple values for bat-related metadata values? For example, multiple comma-separated Fc values corresponding to the multiple Species Auto ID values?

Species Auto ID:  Epfu, Mylu
Fc:  27.2, 41.1
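If values were paired by position like this, a reader would need to check that both lists have the same length. The following is only a sketch under that assumption; pair_species_with_fc is a hypothetical helper, and positional pairing is one possible convention rather than anything decided here.

```python
def pair_species_with_fc(species_value, fc_value):
    """Pair each species label with the Fc value at the same position.

    Assumption: both fields are comma-separated and aligned by index.
    Raises ValueError when the two lists differ in length.
    """
    species = [s.strip() for s in species_value.split(",")]
    fc = [float(v) for v in fc_value.split(",")]
    if len(species) != len(fc):
        raise ValueError(
            "expected %d Fc values, found %d" % (len(species), len(fc))
        )
    return list(zip(species, fc))


pairs = pair_species_with_fc("Epfu, Mylu", "27.2, 41.1")
```

The length check is where this scheme gets fragile: a writer that lists two species but only one Fc leaves the reader with no way to tell which species the value belongs to.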

In the case of a Primary/Secondary species scheme, would it be more appropriate to duplicate the fields (e.g. Fc and Fc 2nd)...

Species Auto ID:  Epfu
Fc:  27.2
Species Auto ID 2nd:  Mylu
Fc 2nd:  41.1

...or to exclusively list values for the primary species (such that the secondary species label was simply to denote the presence of an additional bat without providing detailed information about it)?

Species Auto ID:  Epfu
Species Auto ID 2nd:  Mylu
Fc:  27.2
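A reader could handle both of these primary/secondary layouts with the same code by treating the secondary fields as optional. This is a rough sketch that uses a plain dict to stand in for parsed metadata; the field names "Species Auto ID 2nd" and "Fc 2nd" are the hypothetical ones from the examples above, and read_primary_secondary is an illustrative name.

```python
def _to_float(value):
    """Convert a metadata string to float, passing None through."""
    return None if value is None else float(value)


def read_primary_secondary(metadata):
    """Return (primary, secondary) records from a metadata dict.

    Assumption: secondary fields are optional. A secondary species with
    no "Fc 2nd" simply notes that another bat was present.
    """
    primary = {
        "species": metadata.get("Species Auto ID"),
        "fc": _to_float(metadata.get("Fc")),
    }
    secondary = None
    if metadata.get("Species Auto ID 2nd") is not None:
        secondary = {
            "species": metadata["Species Auto ID 2nd"],
            "fc": _to_float(metadata.get("Fc 2nd")),
        }
    return primary, secondary


primary, secondary = read_primary_secondary(
    {"Species Auto ID": "Epfu", "Fc": "27.2", "Species Auto ID 2nd": "Mylu"}
)
```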

from guano-py.

cjcorben commented on August 17, 2024

In my view, option B is the only one which makes any sense. It is not uncommon to get 5 different species in a single file and there is no way to avoid that, other than by very complex strategies to separate out different contributions from different bats. Furthermore, it is common to get more than one individual of a species in a file. Often, different individuals or different species overlap in time in a single file and it can be very difficult or impossible to assign individual pulses to different individuals, though it is a lot easier to be confident there is more than one individual. In AnalookW, in response to user requests, I made it impossible to have more than one label for a single species in a Species field, and immediately discovered I had broken several common use cases, so I made it possible to choose between enforcing just one label per species or alternatively allowing more than one. Both strategies have their merits. In AnalookW, the total species field is limited to 50 characters, but presumably in Guano you could avoid imposing such limits.


riggsd commented on August 17, 2024

It is not uncommon to get 5 different species in a single file and there is no way to avoid that, other than by very complex strategies to separate out different contributions from different bats. Furthermore, it is common to get more than one individual of a species in a file.

Chris, it sounds like you are a proponent of multiple species labels, including duplicate values.

Species Manual ID:  Epfu, Mylu, Mylu

(Where the above example represents a recording which contains an Epfu and two individual Mylu.)

If we were to go this route, it wouldn't necessarily mean that a writing implementation must allow the end user to select multiple instances of a single species label, nor even that a writing implementation must allow the end user to select multiple species labels. But it would mean that reading implementations must be prepared for the possibility of encountering both these cases, and must have a strategy for dealing with it when presenting and reporting on data.
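As one example of such a strategy, a reporting tool could count each recording once toward every species it contains, ignoring duplicate labels within a file. This is only a sketch of one possible policy, not a recommendation; files_per_species is a hypothetical name and the input is a plain list of species field strings.

```python
from collections import Counter


def files_per_species(species_fields):
    """Tally how many recordings contain each species.

    Assumption: a recording counts once per species, so duplicate
    labels within one field (e.g. "Mylu, Mylu") are collapsed.
    """
    tally = Counter()
    for value in species_fields:
        unique = {label.strip() for label in value.split(",") if label.strip()}
        tally.update(unique)
    return tally


report = files_per_species(["Epfu", "Epfu, Mylu, Mylu", "Mylu"])
```

A tool that instead wanted to count individuals would keep the duplicates rather than collapsing them, which is why readers need an explicit policy either way.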

Thanks for your feedback!


cjcorben commented on August 17, 2024

Personally, I think there is little merit in labelling on the fly (i.e. at recording time). A voice comment is a much more powerful way of dealing with the need. In a voice comment, you can easily deal with far more information and far more variability than a label can cope with. The way the Walkabout does this is pretty good. The voice comment is easily stored with the bat files and will always be accessible to the user, plus the size is negligible compared to bat files. But there are situations where labelling could be useful (and not necessarily for species) and I don't see any need to limit access to it. If you encounter the same label more than once, there is always a risk the additional labels were accidental. This is why I originally prevented it in AnalookW. For the most part, labelling should be an exercise for post-recording analysis. There is merit in being able to explain why you thought a bat was a particular species at record time, when the total experience is still fresh in the user's mind, but that will be much better (and much more rapidly) conveyed by voice.
