
Comments (92)

taruti avatar taruti commented on August 18, 2024 3

Typically when using audio my needs have been:

  1. Read from input source (typically system IO + slice of []int16 or []float32)
  2. Filter&downsample&convert to preferred internal format (typically []float32)
  3. Do all internal processing with that type (typically []float32)
  4. Maintain as little latency as possible by keeping cpu and memory allocation (and with that GC) in check

from audio.

taruti avatar taruti commented on August 18, 2024 3

A float32 math package (trigonometric, logarithmic, etc.) and SIMD optimization for arbitrary data types are two different things. In many cases just mult/add/sub/div are needed, and for those package math is not needed.

I think that math32 and SIMD are best kept separate from this proposal.

If we are thinking about performance, then converting buffers without needing to allocate can be important. For example, have one input buffer and one output buffer for the conversion, instead of allocating a new output buffer each time.
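
A minimal sketch of that idea (the helper name and normalization are mine, not from any of the proposals): convert into a caller-supplied slice so the output buffer is allocated once and reused per block.

// ConvertInt16ToFloat32 fills dst with normalized ([-1, 1]) float32
// equivalents of src and returns how many samples were converted.
// Nothing is allocated inside the loop, so the caller can reuse dst
// for every block.
func ConvertInt16ToFloat32(dst []float32, src []int16) int {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	for i := 0; i < n; i++ {
		dst[i] = float32(src[i]) / 32768
	}
	return n
}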

from audio.

mewmew avatar mewmew commented on August 18, 2024 1

There are many audio libraries in Go, would it help to look at them and see what actual minimal definitions would help e.g. when switching from one library to an another?

Just to add to the list: there exists a FLAC decoder written in Go at github.com/mewkiz/flac, with a front-end for the audio decoder interface defined by Azul3D at github.com/azul3d/engine/audio/flac.

My brother and I have begun implementing a front-end for the go-audio/audio interface, and it should not prove to be too difficult.

In general, I'd recommend that anyone who hasn't already take a look at the audio interface defined by Azul3D. It takes great inspiration from the image.Image package, and provides some inspiration for how an audio API in Go might look.

@slimsag, @karlek Feel free to join the discussion here if you have any input : )

from audio.

mattetti avatar mattetti commented on August 18, 2024 1

Something particular you would like to point out in the code?

I had a hard time trying to pinpoint what bothers me. Overall, there is nothing wrong with the approach so it's quite subjective. Let me try to explain what is going through my head and we can discuss each point.

I'm not a fan of the process32, process64 methods on the 3 buffers. It doesn't help that the code isn't documented but to me process means "a series of interdependent operations carried out by computer" (as per a random dictionary definition I found). In your implementation, you don't seem to be processing anything but populating the buffer with the passed data.

There is also the fact that the process methods don't handle buffers of different sizes.

You added interfaces like:

type Closer interface {
	Close() error
}

Which already exists in the std lib: https://golang.org/pkg/io/#Closer

I also couldn't find a way for a package depending on this implementation to declare a function taking any kind of buffer. Did I miss it?

Nigel mentioned frames and his suggestion uses frames. In his context, a frame is a series of samples for the same entry in a time series. For instance, a frame of a mono signal would contain 1 sample while a frame of a stereo signal would contain 2 samples, etc. While it can be cumbersome at times, it makes a lot of sense at other times (for instance, skipping to a certain position).

On the other hand what I like with the other proposal is that we get:

  1. generic interface that can be implemented different ways
  2. interface that can be implemented in a performant way
  3. very similar to the image package and its implementation

I do have some concerns about managing state and implementing some of the optimizations, but as of right now I think it's worth a shot. I unfortunately don't have a lot of free time in the coming weeks due to NAMM starting in a couple of weeks, but I'm planning on giving the implementation a try if nobody beats me to it. Of course, the buffer implementation is just one piece of the puzzle and we will need to see how nodes can be implemented. That second part might make us question the chosen design. But again, I think it's worth it.

from audio.

kisielk avatar kisielk commented on August 18, 2024 1

I also agree that the process* methods should probably stay off the buffers. I like types that have fewer responsibilities and so I think a buffer type should mostly just take care of carrying and describing the data. Best not to conflate the processing there as well.

Since the underlying data type of a buffer can vary I can see only a few paths for ways to develop functions that can take any kind of buffer:

  • The Buffer interface only provides functions that are descriptive of the dimensions, eg: number of frames, number of channels per frame, format, etc. A type switch is required to get a concrete buffer type from which the data can be read out.
  • The Buffer interface provides a single set of methods for reading out the data from the buffer. This would of course have to be only one data type and buffer implementations would have to deal with converting to that type.
  • The Buffer interface provides methods for reading / writing the data in multiple formats, this further complicates the implementation of the concrete types.

Are there some more options I'm missing here? Another thing worth considering is that when converting between concrete buffer types it may be desirable to use different forms of companding or dithering that may be independent of the buffer type, so maybe it's best for the conversions to be done outside the types.

@mattetti if you're heading to NAMM, I will also be there if you want to chat at some point

from audio.

egonelbre avatar egonelbre commented on August 18, 2024 1

Yup, interleaved is faster to process... I was concerned about whether using buffer[i+1] etc. would introduce a bounds check in the tight loop. But it seems it is not: https://play.golang.org/p/EkNPEjU3bS

BenchmarkInterleaved-8                  10000000               144 ns/op
BenchmarkInterleavedVariable-8           5000000               295 ns/op
BenchmarkInterleaved2-8                 10000000               151 ns/op
BenchmarkDeinterleaved-8                10000000               219 ns/op

It seems that the optimizer can deal with it nicely. Disabling bounds checks had no effect on the results. However, using a "variable" number of channels has quite an impact.

Still, it is more convenient to write code for deinterleaved :D. Maybe there is a nice way of automatically generating code for different types and channel counts based on a single-channel example?

Superpowered has chosen to support only stereo, which makes things much easier, and as far as I understand it only supports float32.
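
For reference, the kind of tight interleaved loop being benchmarked looks roughly like this (an illustrative sketch, not the playground code itself):

// applyGainStereo scales an interleaved stereo buffer in place.
// buf holds samples as [L0, R0, L1, R1, ...]; the i and i+1 accesses are
// exactly the pattern whose bounds checks were in question above.
func applyGainStereo(buf []float32, gain float32) {
	for i := 0; i+1 < len(buf); i += 2 {
		buf[i] *= gain   // left sample
		buf[i+1] *= gain // right sample
	}
}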

from audio.

faiface avatar faiface commented on August 18, 2024 1

Hi guys,

as you may have noticed on Reddit, a month ago we started working on an audio package in Pixel and we want to make it a separate library.

Throughout the month, we learned numerous things. One of the most important things we learned is: buffers in the high-level API are not a very good idea for real-time. Why? Management, swapping them, queueing them, unqueueing them, and so on, cost significant CPU resources. OpenAL uses buffers and we weren't able to get less than 1/20s latency, which is quite high.

Also, buffers don't work very well as an abstraction. They don't compose, and it's hard to build nice abstractions around them. They're simply data.

In Pixel audio, we chose a different approach. We built the whole library (not that it's finished) around this abstraction:

type Streamer interface {
    Stream(samples [][2]float64) (n int, ok bool)
    Err() error
}

The Streamer interface is very similar to io.Reader, except that it's optimized for audio. For a full explanation and documentation of the rest of the API, read here.

The thing is, this abstraction is extremely flexible and composes very well, just like io.Reader. Also, it's suitable for real-time audio processing (you can try for yourself, the library works).
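
To illustrate the composability claim, here is a hedged sketch of a gain node written against that Streamer interface (the Gain type and its fields are mine, not part of the Pixel package):

// Gain wraps another Streamer and scales every sample that passes through,
// much like wrapping an io.Reader.
type Gain struct {
	Source Streamer
	Factor float64
}

func (g *Gain) Stream(samples [][2]float64) (n int, ok bool) {
	n, ok = g.Source.Stream(samples)
	for i := 0; i < n; i++ {
		samples[i][0] *= g.Factor // left
		samples[i][1] *= g.Factor // right
	}
	return n, ok
}

func (g *Gain) Err() error { return g.Source.Err() }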

I'd like to suggest replacing the current go-audio with the audio package from Pixel. I know this is a big suggestion, but I think it would be worth it for go-audio. In case you don't want to do that, I'd at least suggest you change your abstractions away from buffers and more towards this streaming approach.

Thanks!

If you have any questions, please ask!

from audio.

mattetti avatar mattetti commented on August 18, 2024 1

We definitely need to have support for surround/multichannel audio formats. Let me take a look next week and write down my thoughts and see if we can go from there.

from audio.

faiface avatar faiface commented on August 18, 2024 1

@egonelbre With WAVE, only PCM support is planned so far, as PCM seems to be close to 100% of the existing usage. Currently supported are u8 and s16, but float32 support will be added shortly. I believe these cover an overwhelming majority of what's out there.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

The link to the alternate design: https://github.com/egonelbre/exp/tree/master/audio

101 of real-time audio http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing

from audio.

mattetti avatar mattetti commented on August 18, 2024

@egonelbre would you mind squashing your commits for the proposal, or maybe sending a PR? GitHub really makes it hard to comment on different parts of the code coming from different commits :(

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@mattetti sure no problem.

Say you are designing a sample-based synthesizer (eg: Akai MPC) and your project has an audio pool it is working with. You'll want to be storing those samples in memory in the native format of your DSP path so you don't have to waste time doing conversions every time you are reading from your audio pool.

@kisielk sure, if you have such a sample-based synth you probably need to track what notes are playing, etc. anyway, so you would have a Synth node that produces float32/float64, i.e. you pay the conversion per synth, not per sample. It's not as good as no conversion, but it just means you can have one less effect overall for the same performance.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@mattetti Here you go: egonelbre/exp@81ba19e

from audio.

kisielk avatar kisielk commented on August 18, 2024

Yes but the "synth" is not going to be limited to one sample, usually you have some number of channels, say 8-16, and each one can choose any part of any sample to play at any time. In my opinion processing audio in float64 is pretty niche, relegated to some high precision or quality filters which aren't commonly used. Even in that case, the data can be converted to float64 for processing just within that filter block, there's little reason to store it in anything but float32 otherwise. Even still most DSP is performed using float32 even on powerful architectures like x86, reason being that you can do twice as much with SIMD instructions in that case.

Of course I'm totally fine with having float64 as an option for a buffer type when appropriate, but I believe that float32 should be on par. I feel like it would certainly be the primary format for any real-time applications. Even for batch processing you are likely to see performance gains from using it.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@kisielk Yes, also, for my own needs float32 would be completely sufficient.

Forums seemed to agree that in most cases float64 isn't a significant improvement. However, if one of the intended targets is writing audio plugins, note that many plugin APIs include a float64 version (e.g. VST3) and DAWs have an option to switch between float32 and float64.

I agree that if only one should be chosen, then float32 seems more suitable. (Although I don't think I have the full knowledge of audio processing to say that definitively.) The only argument for float64 is that the math package works on float64, so using only float32 means there is a need for a math32 package.

from audio.

mattetti avatar mattetti commented on August 18, 2024

I agree that float32 is usually plenty, but as mentioned my problem is that the Go math package is float64 only. Are we willing to reimplement the math functions we need? It might make sense if we start doing asm optimizations, but that's quite a lot of work.

from audio.

kisielk avatar kisielk commented on August 18, 2024

Again, I don't think it's a binary choice, I just think that both should have equal support within the API. And yes, if I was using Go for realtime processing of audio I would definitely want a 32-bit version of the math package. I don't think the math package needs to dictate any limitations on any potential audio API.

from audio.

mattetti avatar mattetti commented on August 18, 2024

@kisielk sounds fair, just to be clear, would you be interested in using Go for realtime processing or at least giving it a try? You obviously do that for a living using C++ so your expertise would be invaluable.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

Are we willing to reimplement the math functions we need?

How many math functions are needed in practice? Initially the package could be a wrapper around math to make it more convenient, and then we could start optimizing the bottlenecks. I've never needed more than sin/cos/exp/abs/rand, but I've never done anything complicated either.

I suspect some of the first bottlenecks and candidates for "asm optimized" code will be []int16 -> []float32 conversion, buffer multiplication, and/or adding two buffers together.

from audio.

kisielk avatar kisielk commented on August 18, 2024

@mattetti that is something I'm definitely interested in. I'm not exactly a DSP expert, but I work enough with it day to day to be fairly familiar with the domain.

@egonelbre Gain is also a big one that benefits from optimization. (edit: maybe that's what you meant by buffer multiplication, or did you mean convolution?)

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@kisielk yeah, I meant gain :), my brain's language unit seems to be severely malfunctioning today.

from audio.

kisielk avatar kisielk commented on August 18, 2024

@taruti +:100:

from audio.

kisielk avatar kisielk commented on August 18, 2024

Speaking of conversion between buffers, I think it's important the API has a way to facilitate conversion between buffers of different data types and sizes without allocation (eg: 2 channels to 1, etc). The actual conversion method would be determined by the application but at least the API should be able to help facilitate this without too much additional complexity.
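
As a concrete (purely illustrative) example of the kind of non-allocating conversion meant here, a 2-channel to 1-channel downmix into a caller-supplied buffer might look like:

// DownmixStereoToMono averages interleaved stereo frames from src into the
// caller-supplied mono buffer dst and returns the number of frames written.
// Reusing dst avoids allocating a new output buffer per block; the averaging
// itself is just one possible conversion policy, as noted above.
func DownmixStereoToMono(dst, src []float32) int {
	frames := len(src) / 2
	if len(dst) < frames {
		frames = len(dst)
	}
	for f := 0; f < frames; f++ {
		dst[f] = (src[2*f] + src[2*f+1]) / 2
	}
	return frames
}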

from audio.

mattetti avatar mattetti commented on August 18, 2024

Alright, here is my suggestion. I'll add you guys to the organization and we can figure out an API for real time processing and from there see how it works for offline. Ideally I would love to end with:

  • a generic audio API (what we are discussing here)
  • a list of codecs (I started with wav and aiff, they still need work and refinement but they work)
  • a set of transforms (gain, dynamics, eq, lfos)
  • analyzers (FFT and things like chromagrams, key, onset detectors...)
  • generators

@rakyll and I also discussed adding wrappers for things like CoreAudio on Mac so we could have an end-to-end experience without having to rely on things like portaudio. This is outside the scope of what I have in mind, but I figured I should mention it.

I like designing APIs against real usage, so maybe a first good step is to define an example we would like to build and from there define the components we need. Thoughts?

from audio.

kisielk avatar kisielk commented on August 18, 2024

That sounds like a good idea to me. However I would propose we limit the scope of the core audio package to the first two points (and perhaps a couple of very general utilities from point 3). I feel like the rest would be better suited for other packages. My main reasoning behind this is that I feel like the first two items can be achieved (relatively) objectively and there can be one canonical implementation. As you go down the list it becomes increasingly application-dependent.

from audio.

mattetti avatar mattetti commented on August 18, 2024

I think the audio API should be in its own package and each of those things in separate packages. For instance I have the wav and aiff packages isolated. That's another reason why having a GitHub organization is nice.

from audio.

kisielk avatar kisielk commented on August 18, 2024

Just noticed that when looking at the org page. Looks good to me 👍

from audio.

nigeltao avatar nigeltao commented on August 18, 2024

There's the original proposal. @egonelbre has an alternative proposal. Here are a couple more (conflicting) API ideas for a Buffer type. I'm not saying that either of them is any good, but there might be a useful core in there somewhere. See also another API design in the github.com/azul3d/engine/audio package.

Reader/Writer-ish:

type Buffer interface {
	Format() Format

	// The ReadFrames and WriteFrames methods are roughly analogous to bulk
	// versions of the Image.At and Image.Set methods from the standard
	// library's image and image/draw packages.

	// ReadFrames converts that part of the buffer's data in the range [offset
	// : offset + n] to float32 samples in dst[:n], and returns n, the minimum
	// of length and the number of samples that dst can hold.
	//
	// offset, length and n count frames, not samples (slice elements). For
	// example, stereo audio might have two samples per frame. To convert
	// between a frame count and a sample count, multiply or divide by
	// Format().SamplesPerFrame().
	//
	// The offset is relative to the start of the buffer, which is not
	// necessarily the start of any underlying audio clip.
	//
	// The n returned is analogous to the built-in copy function, where
	// copy(dst, src) returns the minimum of len(dst) and len(src), except that
	// the methods here count frames, not samples (slice elements).
	//
	// Unlike the io.Reader interface, ReadFrames should read (i.e. convert) as
	// many frames as possible, rather than returning short. The conversion
	// presumably does not require any further I/O.
	//
	// TODO: make this return (int, error) instead of int, and split this into
	// audio.Reader and audio.Writer interfaces, analogous to io.Reader and
	// io.Writer, so that you could write "mp3.Decoder(anIOReader)" to get an
	// audio.Reader?
	ReadFrames(dst []float32, offset, length int) (n int)

	// WriteFrames is like ReadFrames except that it converts from src to this
	// Buffer, instead of converting from this Buffer to dst.
	WriteFrames(src []float32, offset, length int) (n int)
}

type BufferI16 struct {
	Fmt  Format
	Data []int16
}

type BufferF32 struct {
	Fmt  Format
	Data []float32
}
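
As a rough sketch of how a concrete type could satisfy that contract, BufferI16's ReadFrames might look like the following (this assumes a Format.SamplesPerFrame() method, as the doc comment above suggests, plus signed 16-bit normalization; it is not part of the proposal itself):

func (b *BufferI16) ReadFrames(dst []float32, offset, length int) (n int) {
	spf := b.Fmt.SamplesPerFrame() // assumed accessor, per the comment above
	n = length
	if max := len(dst) / spf; n > max {
		n = max // capped by what dst can hold
	}
	if max := len(b.Data)/spf - offset; n > max {
		n = max // capped by the data actually available
	}
	if n < 0 {
		return 0
	}
	for i := 0; i < n*spf; i++ {
		dst[i] = float32(b.Data[offset*spf+i]) / 32768
	}
	return n
}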

Have Buffer be a concrete type, not an interface type:

type Buffer struct {
	Format Format

	DataType DataType

	// The DataType field selects which slice field to use.
	U8  []uint8
	I16 []int16
	F32 []float32
	F64 []float64
}

type DataType uint8

const (
	DataTypeUnknown DataType = iota
	DataTypeU8_U8
	DataTypeU8_I16BE
	DataTypeU8_I16LE
	DataTypeU8_F32BE
	DataTypeU8_F32LE
	DataTypeI16
	DataTypeF32
	DataTypeF64
)

from audio.

mattetti avatar mattetti commented on August 18, 2024

In addition, here is another comment from @nigeltao about the math library:

As for a math32 library, I'm not sure if it's necessary. It's slow to call (64-bit) math.Sin inside your inner loop. Instead, I'd expect to pre-compute a global sine table, such as "var sineTable = [4096]float32{ etc }". Compute that table at "go generate" time, and you don't need the math package (or a math32 package) at run time.

I really like this idea, which can also apply to log. It might come at an extra memory cost, but I am personally OK with that.
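
A minimal sketch of the table idea (the osc package name is made up, and the table is filled at init time here only to keep the sketch short; a go generate step would emit it as a literal instead, as Nigel describes):

package osc

import "math"

// sineTable is a precomputed lookup table. Emitting it as a literal at
// "go generate" time would remove the need for math (or math32) at run time.
var sineTable [4096]float32

func init() {
	for i := range sineTable {
		sineTable[i] = float32(math.Sin(2 * math.Pi * float64(i) / float64(len(sineTable))))
	}
}

// Sin returns an approximate sine for a phase in [0, 1), using only a table
// lookup in the hot path. The table size is a power of two, so the index can
// be wrapped with a mask.
func Sin(phase float32) float32 {
	return sineTable[int(phase*float32(len(sineTable)))&(len(sineTable)-1)]
}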

Let's try to summarize the pros and cons of the different approaches and discuss what we value and the direction we want to take. I am now convinced that my initial proposal, while fitting my needs, doesn't work well in other scenarios and shouldn't be left as is.

from audio.

nigeltao avatar nigeltao commented on August 18, 2024

A broader point, re the proposal to add packages to the Go standard library or under golang.org/x, is that I think it is too early to say what the 'right' API should be just by looking at an interface definition. As rsc said on golang/go#18497 (comment): "The right way to start is to create a package somewhere else (github.com/go-audio is great) and get people to use it. Once you have experience with the API being good, then it might make sense to promote to a subrepo or eventually the standard library (the same basic path context followed)." Emphasis added.

The right way might actually involve letting a hundred API flowers bloom, and trying a few different APIs before making a push for any particular flower.

I'd certainly like to see more experience with how audio codecs fit into any API proposal: how does the Buffer type (whatever it is) interact with sources (which can block on I/O, e.g. playing an mp3 stream over the network) and sinks (which you don't want to glitch)?

WAV and AIFF are a good start, but handling some sort of compressed audio would be even better. A full-blown mp3 decoder is a lot of work, but as far as kicking API tyres, it might suffice to write a decoder for a toy audio codec where "c3d1e3c1e2c2e4" decoded to "play a C sine wave for 3 seconds, D for 1 second, E for 3 seconds, etc", i.e. to play a really ugly version of "doe a deer".
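
For what it's worth, the decode half of such a toy codec is small enough to sketch (everything below -- package name, note table, tuning -- is made up purely for illustration):

package toy

import "math"

// noteFreq maps note letters to rough frequencies (Hz) around middle C.
var noteFreq = map[byte]float64{
	'c': 261.63, 'd': 293.66, 'e': 329.63, 'f': 349.23,
	'g': 392.00, 'a': 440.00, 'b': 493.88,
}

// Decode turns a string such as "c3d1e3c1e2c2e4" into mono float32 PCM:
// each letter+digit pair becomes a sine wave at the note's frequency,
// lasting <digit> seconds, rendered at the given sample rate.
func Decode(src string, sampleRate int) []float32 {
	var out []float32
	phase := 0.0
	for i := 0; i+1 < len(src); i += 2 {
		freq, ok := noteFreq[src[i]]
		if !ok {
			continue
		}
		seconds := int(src[i+1] - '0')
		for n := 0; n < seconds*sampleRate; n++ {
			out = append(out, float32(math.Sin(phase)))
			phase += 2 * math.Pi * freq / float64(sampleRate)
		}
	}
	return out
}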

from audio.

nigeltao avatar nigeltao commented on August 18, 2024

Back on API design brainstorming and codecs, there might be some more inspiration in the golang.org/x/text/encoding/... and golang.org/x/text/transform packages, which let you e.g. convert between character encodings like Shift JIS, Windows 1252 and UTF-8.

Text encodings are far simpler than audio codecs, though, so it might not end up being relevant.

from audio.

kisielk avatar kisielk commented on August 18, 2024

Some more API inspiration, from C++:

https://www.juce.com/doc/classAudioBuffer
https://www.juce.com/doc/classAudioProcessor

JUCE is one of the most-used audio processing libraries out there.

from audio.

kisielk avatar kisielk commented on August 18, 2024

Obviously the API isn't very go-like since it's C++ (and has a fair amount of pre-C++11 legacy, though it is gradually being modernized), but it's worth taking a look at how they put things together.

from audio.

mattetti avatar mattetti commented on August 18, 2024

JUCE uses overloading quite heavily and, as mentioned, isn't very go-like (it's also a framework more than a suite of libraries, but it is well written and very popular). My hope is that we can come up with a more modern and accessible API instead of a "port"; I would really like audio in Go to be much easier for new developers. On a side note, I did port over some parts of JUCE, such as https://www.juce.com/doc/classValueTree, for better interop with audio plugins.

from audio.

kisielk avatar kisielk commented on August 18, 2024

I'm not suggesting porting it, but I think the concepts in the library are pretty well thought out and cover most of what you would want to do with audio processing. It's worth getting familiar with. I don't think the use of overloading really matters, it's pretty easy to do that in other ways with Go.

from audio.

mattetti avatar mattetti commented on August 18, 2024

@nigeltao I agree with rsc and to be honest my goal was more to get momentum than to get the proposal accepted. I'm very happy to have found a group of motivated people who are interested in tackling the same issue.

I'll open a couple issues to discuss code styling and "core values" of this project.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@nigeltao I think my design would also benefit from a Stream/Seeker (or similar) interface, but I'm not sure what the right approach is. I will try to implement some basic "remote-streaming", to find out what is essential. I have a feeling that it could fit together with Buffer32 nicely.

from audio.

mattetti avatar mattetti commented on August 18, 2024

I really like @nigeltao's proposal here #3 (comment). I had something similar earlier: https://github.com/mattetti/audio/blob/master/pcm_buffer.go#L43 but I couldn't find a way to properly read/write the buffer; Nigel solves that nicely with:

  ReadFrames(dst []float32, offset, length int) (n int)
  WriteFrames(src []float32, offset, length int) (n int)

The part I don't understand is how you avoid the type conversion if you aren't working in float32. Let's say you want to stay in int16 or float64, what do you do? What if you worked in float32 and need to go to int16, what's the API for that?

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

I think we need to clarify the terms. Here's how I understand the core terms of audio:

  1. Buffer: uncompressed PCM buffer for processing (usually []float32 array, with some additional info)
  2. Stream (...Writer/...Reader): it is often seekable, and can sometimes only be read/written in Frames.
  3. Frame: a chunk of audio data. The Frame format/size can be different from Buffer and may need conversion. The internals of Frame depend highly on the Stream producing it. Frame-s can change mid-stream. (Buffers are equivalent to Frames in some cases)
  4. Node/Processor: processes Buffers, can have internal mutable state, and potentially uses or can be a seekable/unseekable Stream
  5. Codec: takes io.Reader/io.Writer and implements a Stream, also has ways of detecting whether some io.Reader is a valid encoding
  6. OutputDevice/InputDevice: implements unseekable StreamReader/StreamWriter and needs to talk to the hardware, ideally the Buffer and Frame formats/sizes match.
  7. metadata: Stream and Frame often have more information than Buffers.

PS: these are tentative.

from audio.

briansorahan avatar briansorahan commented on August 18, 2024

This is my 2 cents as well as shameless self promotion.
I created package sc specifically so that I could make synths with Go without having to worry about whether or not it is suitable for real-time audio.
Admittedly, I hate sclang as a programming language and my use case is pretty specific. I simply want to make synths that run on my laptop which I can wire up to my MIDI controller.
I'm curious to hear why we want to do realtime audio in Go when there are so many other tools out there that do this very well (SuperCollider, ChucK, faust, Extempore, etc, etc).
What are the use cases?
If the goal is to experiment with writing dsp algorithms in your favorite programming language then more power to ya. I'm definitely curious to see what kind of realtime audio processing the Go runtime can handle without glitching, and I'm all for hands-on learning.
But if the goal is adoption among people who have a practical interest in realtime audio, I think absence of glitching is much more important than choice of programming language.

from audio.

kisielk avatar kisielk commented on August 18, 2024

Real-time audio processing isn't only about synthesis or effects for music / experimental purposes. There's lots of other practical applications, eg: VoIP

Other tools are good, but they also impose additional overhead on a project. If the rest of your application is in Go, you now need to have some way to interface with those. Building and distributing your software becomes more complicated. It's the same reason why it's preferable to have native Go code instead of linking to C libraries via cgo or calling out to external processes.

I don't see why you think glitching would be a big concern. So long as the application is able to keep up with the audio sample rate, there should be no glitches. As of Go 1.8 the typical worst case GC pause is under 100 µsec. If you avoid doing a lot of allocations, that should be even better.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

I'm curious to hear why we want to do realtime audio in Go when there are so many other tools out there that do this very well.

My main reason is writing games with dynamic audio in Go (e.g. first thing I'm going to try is adding it to https://github.com/loov/zombies-on-ice; the hammer would have pitch based on the speed and smashes are panned based on player location). I could use existing audio-libs, but it means a huge annoyance in compiling things.

I'm definitely curious to see what kind of realtime audio processing the Go runtime can handle without glitching, and I'm all for hands-on learning.

The lowest I have gotten on Windows is 512 samples ~ 11ms latency. It was pretty much the first implementation and I haven't started extensive debugging --- quite likely I'm using the Windows API incorrectly, or should be using WASAPI, or I have some stupid mistake in the Go code. (https://github.com/loov/synth)

But if the goal is adoption among people who have a practical interest in realtime audio, I think absence of glitching is much more important than choice of programming language.

I agree. Any professional plugin will probably still be written in C/C++ (or whatever the state-of-the-art is).

I think the target demographic is "enthusiast level real-time audio". (This of course still means doing our best in implementing things and trying to be as "professional-level" as possible.)

from audio.

mattetti avatar mattetti commented on August 18, 2024

At work we do server-side processing and analysis; we are also planning on doing more on desktop, and there is little reason not to use Go. We don't currently really need real-time audio, but that might become a thing as we grow a good library. Go could also make for a great language to write instruments on something like a Pi, which should be plenty powerful enough to behave very well. Remember that there are DAWs written in Java ;)

from audio.

nigeltao avatar nigeltao commented on August 18, 2024

how do you avoid the type conversion if you aren't working in float32. Let say you want to stay in int16 or float64

It'd be similar to the image.Image type in the standard library, and how the image/draw package type switches for faster (or less allocate-y) code paths. An audio gain filter would take a Buffer argument (the interface type). It would type switch on some well known, common, concrete types, such as F32Buffer or I16Buffer. For an I16Buffer, it would read and write int16 values, without calling the ReadFrames or WriteFrames methods per se.

If none of the type switch cases match, it would fall back to ReadFrames and WriteFrames, possibly working with its own (possibly lazily allocated) [1024]float32 scratch space.
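
A hedged sketch of what that could look like for a gain filter, written against the Buffer / ReadFrames / WriteFrames sketch earlier in the thread (the concrete type names follow that earlier sketch, and the fallback loop assumes ReadFrames returns 0 once the offset is past the end):

// Gain scales every sample in buf. Known concrete types get a fast path that
// touches the data directly; anything else goes through the generic
// ReadFrames/WriteFrames fallback using a small scratch buffer.
func Gain(buf Buffer, factor float32) {
	switch b := buf.(type) {
	case *BufferF32:
		for i := range b.Data {
			b.Data[i] *= factor
		}
	case *BufferI16:
		for i := range b.Data {
			b.Data[i] = int16(float32(b.Data[i]) * factor)
		}
	default:
		var scratch [1024]float32
		spf := buf.Format().SamplesPerFrame() // assumed accessor, as before
		frames := len(scratch) / spf
		for off := 0; ; off += frames {
			n := buf.ReadFrames(scratch[:], off, frames)
			if n == 0 {
				return
			}
			for i := 0; i < n*spf; i++ {
				scratch[i] *= factor
			}
			buf.WriteFrames(scratch[:], off, n)
		}
	}
}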

from audio.

mattetti avatar mattetti commented on August 18, 2024

I really like this approach, here is the Draw function code: https://golang.org/src/image/draw/draw.go?s=2824:2903#L90

It keeps the API simple, avoids a lot of duplicated methods and offers a nice fallback.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@nigeltao first version of toy audio codec in my version:

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

And... after thinking about it, I realized that when you do not expose the Stream/Device internals, the code becomes much clearer. This of course doesn't prevent you from still using the internals via type switching when you do know them -- alternatively, Stream could have a "PreferredBuffer32"/"PreferredBuffer64" method.

from audio.

kisielk avatar kisielk commented on August 18, 2024

Do you think it's really necessary to return the number of samples processed? What's a situation where you'd want to process something smaller than the given buffer size? Also does the number indicate number of bytes, or number of frames?

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

Do you think it's really necessary to return the number of samples processed? What's a situation where you'd want to process something smaller than the given buffer size?

The main use case is when you are doing conversion of an existing audio with (large) buffers.

The alternative approach is to expose the number of frames still available, but this could get problematic with streams where you don't know that number in advance. Unfortunately, I don't know a good design that avoids it. The Frame approach could do it, but it makes some other code much more complicated.

Returning the number of samples processed is not necessary for real-time audio or audio effects. Also, the initial version didn't have it; I added it after I started working on the stream/conversion examples.

Also does the number indicate number of bytes, or number of frames?

Number of samples (== frames * number of channels). You use it like this:

n, err := node.Process32(buffer)
unprocessed := buffer.Data[n:]

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

PS: Process32 means a generic operation, read or write; e.g. it could mean reading data from a microphone.

from audio.

kisielk avatar kisielk commented on August 18, 2024

It seems like the buffer needs to have something equivalent to a len and capacity. Suppose you pass a buffer of size M to node1. It fills the buffer with N frames and returns n. Now you pass the buffer to node2. How does node2 know that it should only process N frames when the buffer is of size M?

from audio.

taruti avatar taruti commented on August 18, 2024

I feel like we might be approaching this too abstract-API-first.

There are many audio libraries in Go, would it help to look at them and see what actual minimal definitions would help e.g. when switching from one library to an another?

Easy things to unify could be:

  • audio sample format constants
  • how to read samples from a library audio input source in the preferred format (the available formats of a source may vary) without copying
  • how to read samples from an library audio input source with conversion
  • how to write samples to audio output library in their preferred format without copying
  • how to write samples to audio output library with conversion

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

It seems like the buffer needs to have something equivalent to a len and capacity.

Yeah, I thought about it. You would need a ReadHead and a WriteHead; you already have the capacity in the Buffer. Playing around with cap/len would be possible, but it would make writing to the buffer more annoying.

Suppose you pass a buffer of size M to node1. It fills the buffer with N frames and returns n. Now you pass the buffer to node2. How does node2 know that it should only process N frames when the buffer is of size M?

Currently I change the buffer outside the node and make a temporary "Buffer" with the same backing slice, but with a different head and len. See https://github.com/egonelbre/exp/blob/master/audio/copy.go#L6

However, I agree that it is error-prone and not having to return the sample count would be cleaner. Have to experiment with that design.

from audio.

mattetti avatar mattetti commented on August 18, 2024

I would like to focus on the buffer only for now and keep the API small. So far I like Nigel's proposal the best; we can remove the AsXxx methods from my proposal and get something similar to image.Image.

Once we are happy with that and have an implementation, we can test it further against other abstractions and other existing libraries. (The game engine Nigel linked to is quite interesting.)

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

I feel like we might be approaching this too abstract-API-first.

Something particular you would like to point out in the code?

There are many audio libraries in Go, would it help to look at them and see what actual minimal definitions would help e.g. when switching from one library to an another?

Much of my design has also involved going through multiple audio APIs and designs, although I haven't looked at Go audio libs in particular.

However, I think the approach of basing a new library on trying to unify or improve the existing ones doesn't yield good results. The approach of "implement first" and then seeing how it fits into/with other libs has given me better code. Of course, this doesn't mean everyone has to use the same approach -- different approaches expose different issues.

how to read samples from a library audio input source in the preferred format (the available formats of a source may vary) without copying
how to write samples to audio output library in their preferred format without copying

See the Frame design and Nigel's approach. But not converting seems to add a big burden of managing all the different buffer formats.

Output/Input devices can also benefit from a callback based approach. (And many libraries have chosen that route.)

how to read samples from an library audio input source with conversion
how to write samples to audio output library with conversion

See Process32/Process64.

from audio.

mattetti avatar mattetti commented on August 18, 2024

@mewmew https://godoc.org/azul3d.org/engine/audio#Buffer is interesting but the implementation bothers me a little, especially the custom types such as https://godoc.org/azul3d.org/engine/audio#Slice

It would be interesting to see if we can make our Buffer interface and implementations compatible with azul3d's audio lib. Take a look at #3 (comment) which is a simplified version of what I proposed. I believe it addresses the main issue of having AsFloatBuffer() *FloatBuffer etc.. in the interface.

I don't see why we couldn't get azul's buffer to conform to this API too even if "our" buffer implementations would be simpler.

from audio.

briansorahan avatar briansorahan commented on August 18, 2024

@kisielk I raise the concern about glitches since, typically, code that is expected to run in realtime should never do any allocations. glibc's malloc takes locks since it is thread-safe, and hence should never be used in realtime code. Maybe another libc would be better, I'm not sure. Totally agree on the annoyance of using external C code or co-processes, btw. I have plenty of complaints about SuperCollider now that I have tried to use the server without the language.
@egonelbre Looks like fun!
Thanks for the responses everyone, and hope I didn't pollute this thread too much.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

Was wondering about interface perf and did some basic benchmarks... tl;dr: the overhead of the interface seems to be insignificant. I did not examine whether there are cases where it could inline the direct calls.

But, using sample based callbacks is significantly worse (2x).

Raw stats:

BenchmarkInterface-8             1000000              1288 ns/op               0 B/op          0 allocs/op
BenchmarkDirect-8                1000000              1246 ns/op               0 B/op          0 allocs/op
BenchmarkInline-8                2000000               671 ns/op               0 B/op          0 allocs/op
BenchmarkInterfaceInline-8       2000000               677 ns/op               0 B/op          0 allocs/op

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

It doesn't help that the code isn't documented

There are two reasons why I prefer not to comment prototypes:

  • it wastes time on things that can change rapidly (i.e. several times a day)
  • it makes mistakes in the API more obvious, e.g. if someone with sufficient knowledge doesn't understand what something does just by reading the code, it indicates there's something wrong with the API

I'm not a fan of the process32, process64 methods on the 3 buffers.

Yeah, after reading what you and @kisielk; and some thinking on my own. I agree.

One of the problems is that it's unclear what they mean. (Currently they meant converting to one of the standard buffers.) But that conversion cannot be done in a stateless manner in some cases, e.g. downsampling. When downsampling you probably need some additional state to carry over for dithering.

process means "a series of interdependent operations carried out by computer"...

Process is a very common interface name (VST, AU, JUCE...) for audio nodes; diverging from that name doesn't make sense to me.

However, after thinking about it, and the comments about removing n, I think audio.Processor, audio.Reader and audio.Writer should be different entities. When using them it would be easy to make a mistake by not cutting the buffer "the right way".

While it can be cumbersome at times, it makes a lot of sense at other times (for instance, skipping to a certain position).

I'm uncertain about using the "frames" part in APIs. But it would certainly be usable when we have methods for Buffers to cut the front or back by frames.

With regard to seeking, I think using time.Duration makes more sense than frames. The only case where I prefer using frames or samples is when the audio system glitches and I need to skip to the correct position. I'm also somewhat afraid that there are audio streams/formats where the sample rate varies mid-stream, and hence using frames would be unreliable. (I don't have any examples to back this up.)

(Just realized that there might be a need for a Nop buffer for skipping, something similar to JUCE processBlockBypassed.)

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

I didn't seem to find a way for a package depending on this implementation to declare a function taking any kind of buffers?

Yeah, I have to agree that having a single interface Buffer and accepting that in nodes makes the API cleaner and avoids some of the duplication in setup/teardown for processor nodes. I will adjust my design accordingly.

from audio.

mattetti avatar mattetti commented on August 18, 2024

With regards to seeking I think using time.Duration makes more sense than frames

I very rarely use the concept of duration when working with audio. You referred to VST/AU and the process function of plugins. In this specific case, the plugin implements a function that receives a pointer to an array and modifies that buffer in place. The receiver has no notion of duration: if the developer wanted to implement a delay, for instance, the host BPM would be used, and the delay would be calculated in samples/frames and applied to the buffer. Now let's say the developer wanted a fixed-duration delay of 0.5s; she would also probably convert that duration into samples/frames so she could apply the effect across multiple buffers.
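
For what it's worth, that duration-to-frames conversion is one line of Go (assuming the sample rate is known):

// 0.5s at 44100 Hz gives 22050 frames.
frames := int(d.Seconds() * float64(sampleRate)) // d is a time.Duration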

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

I very rarely use the concept of duration when working with audio.

For writing audio code, yes. But for using audio code I would say that you exclusively expose durations; e.g. when you create an Echo node, you specify the delay in seconds, not frames. Internally, yes, everything eventually ends up as frames.

For the seeking part. The most likely usage I can imagine is this:

decoder, _ := audio.NewDecoder(file)
player := NewPlayer(decoder)
go player.Play()

for {
	key := GetKey()
	switch key {
	case '2':
		player.Seek(player.Duration()/5, audio.SeekStart)
	case KeyRight: // hypothetical key constant
		player.Seek(5*time.Second, audio.SeekCurrent)
	}
}

Or did you mean some other use-case when you talked about seeking?

from audio.

mattetti avatar mattetti commented on August 18, 2024

Back from NAMM; I need to catch up on a few things and then I'll work on implementing Nigel's suggestion to see how it looks.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

Few notes about my experiments with different things:

  1. Implementing Processor-s on an interleaved buffer is much more effort than on a non-interleaved version.
  2. Supporting multiple buffer formats manually is annoying.

from audio.

mattetti avatar mattetti commented on August 18, 2024

@egonelbre can you say more, especially about #1?

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

The main issue comes from trying to handle the different channels; just as an example, it would often end up looking something like this:

switch buf.ChannelCount {
case 1:
	for i := 0; i < len(buf.Data); i++ {
		buf.Data[i] = float32(phase)
		phase += speed
		speed = (speed + target) * 0.5
	}
case 2:
	for i := 0; i < len(buf.Data); i += 2 {
		buf.Data[i] = float32(phase)
		buf.Data[i+1] = buf.Data[i]
		phase += speed
		speed = (speed + target) * 0.5
	}
default:
	// alternatively a slow path ...
	return 0, errors.New("unsupported channel count")
}

// vs this version, that handles all channels

c0 := buf.Channel(0)
for i := range c0 {
	c0[i] = float32(phase)
	phase += speed
	speed = (speed + target) * 0.5
}
for k := 1; k < buf.ChannelCount; k++ {
	copy(buf.Channel(k), c0)
}

And this only accounts for two channels. For more channels, you would need more cases to properly optimize them and handle the striding. I suspect that because of the += 2 or += 3 the compiler will do worse optimizations, but I haven't checked this. With convolution something similar happens: you have to write the filters to work with arbitrary strides, rather than just with 1.

When you have data in separate channels, you can write the single-channel version (assuming you don't need to analyze multiple channels) and use the same code for the other channels.

But I do understand that when reading or writing, the interleaved version is better, because you can process in smaller chunks. Also, as far as I skimmed different APIs, they used non-interleaved approaches.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@mattetti I just converted my experiment to use split channels, the full difference can be seen here egonelbre/exp@8c77c79?diff=split#diff-b1e66adfee4cfc554526b30559e7e612

from audio.

mattetti avatar mattetti commented on August 18, 2024

I'm back at looking at what we can do to design a generic buffer API. @egonelbre I don't think isolating channels is the way to go; we can more than likely implement an API similar to your Channel(), passing an iteration function that will be fed every sample in a channel, for instance. From everything I've seen so far, samples are always interleaved (that said, I'm sure there are counter-examples).

I did hit an issue today when I needed to feed a PCM data chunk in int32 or float32 and my current API was only providing int. So I'm going to explore the image and text packages to see if there is a good flexible solution there. I looked at azul3d which is quite well done but I'm not a fan of their buffer/slice implementation: https://github.com/azul3d/engine/blob/master/audio/slice.go

from audio.

mattetti avatar mattetti commented on August 18, 2024

Taking notes:

Text transformer & chain interfaces: https://godoc.org/golang.org/x/text/transform#Transformer (similar to what an audio transformer interface could be but passing a buffer)

In regards to the Draw package, it starts from an Image interface:

// Image is a finite rectangular grid of color.Color values taken from a color
// model.
type Image interface {
	// ColorModel returns the Image's color model.
	ColorModel() color.Model
	// Bounds returns the domain for which At can return non-zero color.
	// The bounds do not necessarily contain the point (0, 0).
	Bounds() Rectangle
	// At returns the color of the pixel at (x, y).
	// At(Bounds().Min.X, Bounds().Min.Y) returns the upper-left pixel of the grid.
	// At(Bounds().Max.X-1, Bounds().Max.Y-1) returns the lower-right one.
	At(x, y int) color.Color
}

The interface is implemented by many concrete types such as:

// NRGBA is an in-memory image whose At method returns color.NRGBA values.
type NRGBA struct {
	// Pix holds the image's pixels, in R, G, B, A order. The pixel at
	// (x, y) starts at Pix[(y-Rect.Min.Y)*Stride + (x-Rect.Min.X)*4].
	Pix []uint8
	// Stride is the Pix stride (in bytes) between vertically adjacent pixels.
	Stride int
	// Rect is the image's bounds.
	Rect Rectangle
}

func (p *NRGBA) ColorModel() color.Model { return color.NRGBAModel }

func (p *NRGBA) Bounds() Rectangle { return p.Rect }

func (p *NRGBA) At(x, y int) color.Color {
	return p.NRGBAAt(x, y)
}

or

// Paletted is an in-memory image of uint8 indices into a given palette.
type Paletted struct {
	// Pix holds the image's pixels, as palette indices. The pixel at
	// (x, y) starts at Pix[(y-Rect.Min.Y)*Stride + (x-Rect.Min.X)*1].
	Pix []uint8
	// Stride is the Pix stride (in bytes) between vertically adjacent pixels.
	Stride int
	// Rect is the image's bounds.
	Rect Rectangle
	// Palette is the image's palette.
	Palette color.Palette
}

func (p *Paletted) ColorModel() color.Model { return p.Palette }

func (p *Paletted) Bounds() Rectangle { return p.Rect }

func (p *Paletted) At(x, y int) color.Color {
	if len(p.Palette) == 0 {
		return nil
	}
	if !(Point{x, y}.In(p.Rect)) {
		return p.Palette[0]
	}
	i := p.PixOffset(x, y)
	return p.Palette[p.Pix[i]]
}

A gif image is implemented as such:

type GIF struct {
	Image     []*image.Paletted
        //...
}

When drawing using the generic Image interface, a type switch is done to optimize the flow:
https://golang.org/src/image/draw/draw.go?s=2824:2903#L114

But there is always a slow fallback.

from audio.

kisielk avatar kisielk commented on August 18, 2024

@mattetti that's a good find.

Maybe the audio interface could have functions that return / set values for a particular (channel, sample) pair as either float64 or int. Then the underlying data could be in a more optimized form, and functions that need the highest performance can use a type switch and operate on the data directly.

from audio.

kisielk avatar kisielk commented on August 18, 2024

I'm thinking something like:

type Buffer interface {
	Size() Format // returns number of channels and samples, not sure of the naming
	ReadInt(ch, sample int) int // Maybe read into a passed-in buffer instead?
	ReadFloat(ch, sample int) float64 // ditto
}

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

From everything I've seen so far, samples are always interleaved (that said, I'm sure there are counter-examples).

There seem to be two uses:

  1. reading & writing... buffer usually interleaved. e.g. most audio formats and devices.
  2. processing... buffer usually split. e.g. VST, AU (V2), WebAudio, AAX, RTAS, JUCE

With callbacks and a call per sample, the overhead is an issue; if they could be inlined and better optimized, it would make some things much nicer.

For case 1, you don't always have random access, or it is expensive for compressed streams. So building a buffer with random, sample-based access doesn't make sense... I don't think random access is necessary there; you want to read or write a chunk of samples. Interleaved when you want to output immediately or do basic processing; deinterleaved for more complicated things.

For case 2, you want deinterleaved buffers to make processing simpler. To ensure that processing nodes can communicate, you want at most three different buffer formats that work with it.

The case against doing swizzling when reading/writing is performance... but when you want performance, in those cases you probably need to use the native format anyway, which might be uint16... but then there might also be issues with sample rate or mono-stereo conversions.

from audio.

mattetti avatar mattetti commented on August 18, 2024

I was apparently wrong:

The buffer contains data in the following format: non-interleaved IEEE754 32-bit linear PCM with a nominal range between -1 and +1, that is, 32bits floating point buffer, with each samples between -1.0 and 1.0. If the AudioBuffer has multiple channels, they are stored in separate buffer.

But the API clearly exposes a way to get the channel count: AudioBuffer.numberOfChannels. You can also decode a stereo track into a buffer, so I was super confused. OK, I'm not confused anymore: the documentation is misleading. Web Audio stores the per-channel PCM data in separate internal buffers accessible through getChannelData(channel), confirming what you were saying.

// Stereo
var channels = 2;

// Create an empty two second stereo buffer at the
// sample rate of the AudioContext
var frameCount = audioCtx.sampleRate * 2.0;
var myArrayBuffer = audioCtx.createBuffer(channels, frameCount, audioCtx.sampleRate);

button.onclick = function() {
  // Fill the buffer with white noise;
  // just random values between -1.0 and 1.0
  for (var channel = 0; channel < channels; channel++) {
    // This gives us the actual array that contains the data
    var nowBuffering = myArrayBuffer.getChannelData(channel);
    for (var i = 0; i < frameCount; i++) {
      // Math.random() is in [0; 1.0]
      // audio needs to be in [-1.0; 1.0]
      nowBuffering[i] = Math.random() * 2 - 1;
    }
  }

  // Get an AudioBufferSourceNode.
  // This is the AudioNode to use when we want to play an AudioBuffer
  var source = audioCtx.createBufferSource();

  // set the buffer in the AudioBufferSourceNode
  source.buffer = myArrayBuffer;

  // connect the AudioBufferSourceNode to the
  // destination so we can hear the sound
  source.connect(audioCtx.destination);

  // start the source playing
  source.start();

}

from audio.

mattetti avatar mattetti commented on August 18, 2024

Quick update: this interleaved vs. non-interleaved issue got me stuck. I instead opted to do a lot of work on my wav and aiff decoders, making sure the buffered approach worked and was documented/had examples. I spent a decent amount of time improving the 24-bit audio support for the codecs (my implementation was buggy but hard to verify; tests were added).

At this point, I still think the Nigel/image pkg approach is the most interesting, but I don't have the bandwidth to build a full implementation. Egonelbre's implementation shows some of the challenges we face depending on the design decisions we take.

I'll focus on my current edge cases and real world implementation to see how a generic API would best benefit my own usage.

On a different note, @brettbuddin wrote a very interesting synth in Go and I think he would be a good addition to this discussion: https://github.com/brettbuddin/eolian

from audio.

brettbuddin avatar brettbuddin commented on August 18, 2024

Took me a bit to get caught up on the state of the discussion.

One of the early decisions I made with Eolian was to focus on a single channel of audio. This is mostly because I didn't want to deal with interleaving in processing and I didn't have a decent implementation for keeping channels separate at the time. I've wanted to implement 1-to-2 (and back) conversion modules for some time now, but have been stuck in a similar mode of trying to decide how not to disrupt the current mono structure too drastically.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

While implementing different things I realized there is one important case where int16 is preferable -- ARM, or mobile devices in general. Looking at pure op stats, there's around a 2x performance difference between using float32 and int16. Of course, I'm not sure how directly these measurements translate into audio code, but it is something to be considered.

from audio.

mattetti avatar mattetti commented on August 18, 2024

Here is the API superpowered uses to deal with offline processing:
https://github.com/superpoweredSDK/Low-Latency-Android-Audio-iOS-Audio-Engine/blob/master/Examples_iOS/SuperpoweredOfflineProcessingExample/SuperpoweredOfflineProcessingExample/ViewController.mm#L79

Here is how they deal with real time audio: https://github.com/superpoweredSDK/Low-Latency-Android-Audio-iOS-Audio-Engine/blob/master/Examples_iOS/SuperpoweredFrequencyDomain/SuperpoweredFrequencyDomain/ViewController.mm#L19 ( with a hint at interleaved vs not)

from audio.

brettbuddin avatar brettbuddin commented on August 18, 2024

Is there a case where you wouldn't be aware of how many channels of audio there are in your stream? The variable method doesn't seem all that valuable to me.

Edit: Nevermind. I misread the example. Disregard this question.

from audio.

faiface avatar faiface commented on August 18, 2024

(Reposting from Pixel gitter).

Just a quick point here from me (got some work right now, will come with more ideas later). Looking at go-audio, I don't really like the core Buffer interface at all. It's got way too many methods, none of which enables reading data from it without converting it to one of those 3 standard formats. Which makes it kind of pointless to use any other format.

Which is very bad, because a playback library might want to use a different format.

Which would require extensive conversions for every piece of audio data, maybe even 2 conversions for each piece, and that's a lot of mess.

from audio.

mattetti avatar mattetti commented on August 18, 2024

I wasn't aware of this new audio package. I'll definitely check it out. One thing that surprised me is the fact that your streamer interface seems to be locked to stereo. Is that correct?

from audio.

faiface avatar faiface commented on August 18, 2024

Yes, that is correct. This is no problem for mono, since mono can always be easily turned into stereo. It is only a problem for sources with more channels than stereo offers. I'm open to solutions for this. So far the stance has been "stereo is enough", but that might not be true.

from audio.

faiface avatar faiface commented on August 18, 2024

The idea of replacing go-audio was a bit too rushed; upon second thought, it's probably not a good idea. However, I still suggest that go-audio changes its core interfaces away from buffers and towards streaming.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@faiface I'm unclear about the comment that "buffers in the high-level API are not a very good idea for real-time". I mean your design still uses buffers i.e. the samples [][2]float32 parameter.

I do agree that the end-user of things like players, effects, decoding, streaming, and encoding shouldn't have to worry about buffers. But internally, handling buffers is unfortunate yet necessary. E.g. with a 16-track audio sequencer, you will most likely need some way to generate/mix the tracks separately, because one might cause ducking on another track.

Note that the go-audio/audio is not the only attempt at audio lib - i.e. https://github.com/loov/audio.

I will try to go over the package in more details later, but preliminary comments based on seeing the API:

  1. There can be N channels.
  2. There can be multiple SampleRate-s in your input/processing/output.
  3. There can be multiple sample types e.g. int16, float32, float64 come to mind.

I do agree that as a simple game audio package it will be sufficient... And not handling the diversity can be a good trade-off.

from audio.

faiface avatar faiface commented on August 18, 2024

@egonelbre Of course, internally there usually are some buffers; however, this API minimizes their use. This is in contrast to OpenAL, which requires you to create and fill a buffer if you want to play anything. This results in additional copying between buffers and so on. Maybe that can be implemented efficiently, but OpenAL does not implement it efficiently enough. Note that when calling streamer.Stream(buffer), no additional buffer needs to be created. This is not the case for OpenAL's way of handling buffers. We switched to an ALSA backend for Linux and that enabled us to have millisecond(-ish) latency. Data is only copied to temporary on-stack buffers when we need to do mixing; other than that, data is copied directly to the lowest level.

Regarding channels, yeah, that's a trade-off.

Regarding sample rates. Yes, there can be multiple sample rates. The reason we adopted the "unified sample rate" approach is that it simplifies signal processing code in major ways. For example, if the Stream method took an additional format argument, it would always need to handle conversions between formats. This would not only result in a lot more code, it would also result in worse performance.

However, unified sample rate is not a problem, IMHO. Audio sources, such as files, can always be resampled on-the-fly, or the sample rate can be adjusted according to them. The unified sample rate is only important for the intermediate audio processing stage.
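As an illustration of on-the-fly resampling, a naive linear-interpolation resampler for a mono []float64 stream can be sketched like this (real resamplers use windowed-sinc or polyphase filters for better quality; the names here are hypothetical):

package resample

// LinearResample converts mono samples from srcRate to dstRate using
// linear interpolation between neighbouring input samples.
func LinearResample(src []float64, srcRate, dstRate int) []float64 {
	if len(src) == 0 {
		return nil
	}
	ratio := float64(srcRate) / float64(dstRate)
	dst := make([]float64, int(float64(len(src))/ratio))
	for i := range dst {
		pos := float64(i) * ratio
		j := int(pos)
		frac := pos - float64(j)
		if j+1 < len(src) {
			dst[i] = src[j]*(1-frac) + src[j+1]*frac
		} else {
			dst[i] = src[j]
		}
	}
	return dst
}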

The same holds for different sample types. Any sample type can be converted to float64. If the audio source contains int16 samples, it's no problem. They simply get converted to float64 when streaming.
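The int16-to-float64 step itself is tiny; a sketch (writing into a caller-supplied destination slice so nothing is allocated per call):

package convert

// Int16ToFloat64 converts signed 16-bit PCM samples to float64 in [-1, 1).
// dst must be at least as long as src.
func Int16ToFloat64(src []int16, dst []float64) {
	for i, s := range src {
		dst[i] = float64(s) / (1 << 15) // 1<<15 is the magnitude of the most negative int16
	}
}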

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

Yup, I agree that within a single pipeline you will most likely use a single sample rate. However, a server that does audio processing in multiple threads might need different sample rates. But I'm also not completely clear on how big of a problem this is in the real world.

With regards to float64. On ARM there's a (potential) performance hit that you take due to processing float64 instead of float32 or int16.

I understand very well that handling all these cases complicate the implementation. (To the extent that for a game, I will probably just use stereo float32, myself.)

With regards to performance, I would like to specialize on all the different parameters. Effectively, to have a type Buffer_SampleRate44100_Stereo struct and get all the effects/filters implemented automatically for all the variations, and with SIMD (as much as possible) -- but I still don't have a good idea of how it would look in practice. This might be an unrealistic goal, but definitely something to think about.

I do have some thoughts: 1. code-gen and the Nile stream processing language for defining effects; 2. optimization passes similar to the Go compiler's SSA that generate SIMD code; 3. a single interfaced package backed by multiple implementations; etc.

But, generally, every decision has trade-offs -- whether you care about the "trade-off" is dependent on the domain and your use-cases -- and it's completely fine for some domain not to care about some of these trade-offs.

from audio.

faiface avatar faiface commented on August 18, 2024

I believe that you and I agree that having to implement each filter/effect/compositor for each sampling format (mono, stereo, int8, int16, float32, ...) is awful. So yeah, one way probably is code generation, although I'm not sure how feasible this is.

The question I think is in place is: is it worth to support all the formats within the audio processing stage? I decided to answer this question with: no. Of course, it's necessary to support all the formats for decoding and encoding audio files. But I think it ends there.

Let me show you. Here's the implementation of the "gain effect" (an unsophisticated volume slider): https://github.com/faiface/beep/blob/master/effects.go#L8. Now, the important thing is: this is all I had to write. And it works with everything. I can use it to adjust the volume of music, sound effects, or individual sounds, and I can change the volume in real time. If I were to support all the different formats in the audio processing stage, I can't really see how I would achieve this level of convenience. And convenience like this makes it possible to implement all of the complicated effects, such as 3D sound and Doppler effect things, in very few lines of code.
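Roughly, the shape of such a decorator is sketched below. The Streamer interface here is a simplified stand-in for illustration (stereo float64 frames), not a copy of beep's implementation; see the linked effects.go for the real code:

package effects

// Streamer fills the given buffer of stereo frames and reports how many
// frames it wrote and whether the source still has data.
type Streamer interface {
	Stream(samples [][2]float64) (n int, ok bool)
}

// Gain wraps another Streamer and scales every sample by Factor. Because it
// operates on the streamed frames directly, the same few lines work for
// music, sound effects, or any other source.
type Gain struct {
	Source Streamer
	Factor float64
}

func (g *Gain) Stream(samples [][2]float64) (n int, ok bool) {
	n, ok = g.Source.Stream(samples)
	for i := range samples[:n] {
		samples[i][0] *= g.Factor
		samples[i][1] *= g.Factor
	}
	return n, ok
}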

Beep is currently less than 1K LOC (not counting https://github.com/hajimehoshi/oto, which implements the actual playback backend) and already supports loading WAV files, mixing, sequencing, controlling playback (pausing, tracking time), playing only parts of an audio file or any streamer, and makes it easy to create your own streamers and effects.

I'm sorry if I sounded like I wanted to destroy go-audio before :). I eventually came to the conclusion that it's best to keep Beep and go-audio separate. I just want to point out one way of doing audio and show its benefits. And you guys can take inspiration from it, or not. No problem there.

EDIT: And I don't see why a server with multiple threads could possibly need to use different sample rates anywhere.

from audio.


egonelbre avatar egonelbre commented on August 18, 2024

I completely understand the reasoning for not supporting the different options I described. I think it's important to examine the potential design-space as fully as possible.

Note: the Gain function you have there can produce a clicking noise, e.g. try switching between gain 0 and 1 every ~100 ms. I'm not sure where the gain value comes from, how it's modified, or how big your buffers are... so it might not happen in practice. And, oh yes, I know all the pain of writing processor code for multiple formats/options: https://github.com/loov/audio/blob/master/example/internal/effect/gain.go#L20. It avoids some of the aliasing, but can still have problems with it. Also, it doesn't handle control-signal input. But I digress.

Do note, this is also the reason go-audio/audio and loov/audio are separate -- so we can experiment independently and come up with unique solutions. (E.g. I noticed this in the wave decoder: https://github.com/faiface/beep/blob/master/wav/decode.go#L125 -- see the similar issue I posted against go-audio/wav: go-audio/wav#5 (comment).)

Eventually we can converge on some of the issues... or maybe we will create multiple packages for different domains, but there is still code that could potentially be shared between all the implementations (e.g. ASM/SIMD code on []float32 and []float64 slices for different platforms, or converting a MULAW byte stream to []float64).
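As an example of the kind of shareable code, here is a sketch of standard G.711 mu-law expansion to []float64; it illustrates the algorithm and is not taken from any of the packages discussed:

package g711

// MulawToFloat64 expands G.711 mu-law encoded bytes into float64 samples in
// roughly [-1, 1). dst must be at least as long as src so callers can reuse
// buffers instead of allocating on every call.
func MulawToFloat64(src []byte, dst []float64) {
	for i, b := range src {
		u := ^b // mu-law stores the bitwise complement of the code
		sign := u & 0x80
		exponent := (u >> 4) & 0x07
		mantissa := u & 0x0F
		// Rebuild the biased magnitude, then remove the bias (0x84 = 132).
		magnitude := ((int32(mantissa) << 3) + 0x84) << exponent
		sample := magnitude - 0x84
		if sign != 0 {
			sample = -sample
		}
		// The decoded value lands in the 16-bit PCM range (max 32124).
		dst[i] = float64(sample) / (1 << 15)
	}
}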

EDIT: And I don't see why a server with multiple threads could possibly need to use different sample rates anywhere.

A contrived example: imagine a resampling service where you submit a wave file, specify the desired output sample rate, and later download the result.

from audio.

faiface avatar faiface commented on August 18, 2024

One of the reasons why I concluded that beep is not a good fit for go-audio is that I deliberately make compromises to enable simplicity and remove the pain of audio programming, but that gets in the way of universality, so we're aiming at slightly different goals.

Regarding clicking in Gain, I don't think it's Gain's responsibility to fix that. You know, if you start playing PCM data in the middle of a wave (at value 0.4, for example), a click is what's supposed to happen. I think it's the user's responsibility to adjust the gain value smoothly, e.g. by lerping. And the buffer size can be as small as 1/200 s (only on Linux at the moment, but we're working on getting the latency down on other platforms too).
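A sketch of that per-buffer lerping on the user's side could look like this (the type and method names are hypothetical):

package effects

// SmoothGain ramps linearly from the previous gain to Target across one
// buffer of stereo frames, so an abrupt change (e.g. 0 -> 1) doesn't click.
type SmoothGain struct {
	current float64
	Target  float64
}

func (g *SmoothGain) Process(samples [][2]float64) {
	if len(samples) == 0 {
		return
	}
	step := (g.Target - g.current) / float64(len(samples))
	for i := range samples {
		g.current += step
		samples[i][0] *= g.current
		samples[i][1] *= g.current
	}
	g.current = g.Target // snap to the target to avoid rounding drift
}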

The decoding design, that's interesting, although I think WAVE only supports uint8, int16 and float32, right? So I'm not sure it's worth it, but I'll think about it.

And the resampling server: if you take a look at our roadmap, two of the things to be done are a Buffer struct and a Resample decorator. Don't be confused, the Buffer struct is more like a bytes.Buffer for samples and less like an OpenAL buffer. So the sample rate conversion will be done something like this:

buf := beep.NewBuffer(fmt2)
buf.Collect(beep.Resample(source, fmt1.SampleRate, fmt2.SampleRate))

or even directly to a file (in which case the audio is never fully in memory; note that wav.Encode is not implemented yet):

wav.Encode(fw, fmt2, beep.Resample(source, fmt1.SampleRate, fmt2.SampleRate))

The beep.SampleRate only comes into play where it's important, and that will be documented.

from audio.

egonelbre avatar egonelbre commented on August 18, 2024

@faiface Some WAVE samples here: https://en.wikipedia.org/wiki/WAV#WAV_file_audio_coding_formats_compared. And a list of things libsndfile supports: http://www.mega-nerd.com/libsndfile/. Although you can get pretty far by just supporting PCM u8, s16, s24, s32, s64; IEEE_FLOAT 32, 64; mulaw and alaw.

from audio.

wsc1 avatar wsc1 commented on August 18, 2024

@faiface Audacity uses float32 WAV.

from audio.

wsc1 avatar wsc1 commented on August 18, 2024

Hi all,

There is an issue about pre-emption in the Go runtime which can influence the reliability of audio I/O.

Also, zc/sio is a work in progress to deal with audio callback APIs like CoreAudio, AAudio, and JACK, which operate on a foreign, usually real-time, thread.
The goal is to make the path to the callback free of syscalls (cgo isn't).

from audio.
