API Documentation

Below is the API documentation for Onda.jl.

For general information regarding the Onda Format itself, please see beacon-biosignals/OndaFormat.

For a nice introduction to the package, see the Onda Tour.

Support For Generic Path-Like Types

Onda.jl attempts to be as agnostic as possible with respect to the storage system that sample data, Arrow files, etc. are read from/written to. As such, any path-like argument accepted by an Onda.jl API function should generically "work" as long as the argument's type supports:

Base.read(path)::Vector{UInt8} (return the bytes stored at path)
Base.write(path, bytes::Vector{UInt8}) (write bytes to the location specified by path)
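As a rough illustration of this contract, the sketch below defines a hypothetical in-memory path type that supports the two methods listed above. `MemoryPath` and its fields are invented for this example and are not part of Onda.jl; whether a particular Onda.jl function needs further methods for such a type is not confirmed here.

```julia
# Hypothetical in-memory path type; `MemoryPath` is illustrative, not part of Onda.jl.
struct MemoryPath
    storage::Dict{String,Vector{UInt8}}  # stands in for a remote storage backend
    key::String
end

# Return the bytes stored at `path`.
Base.read(path::MemoryPath) = copy(path.storage[path.key])

# Write `bytes` to the location specified by `path`, returning the byte count.
function Base.write(path::MemoryPath, bytes::Vector{UInt8})
    path.storage[path.key] = copy(bytes)
    return length(bytes)
end

storage = Dict{String,Vector{UInt8}}()
path = MemoryPath(storage, "samples/eeg.lpcm")
write(path, UInt8[0x01, 0x02, 0x03])
roundtripped = read(path)
```
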

For backends which support direct byte range access (e.g. S3), Onda.read_byte_range may be overloaded for the backend's corresponding path type to enable further optimizations:

Onda.read_byte_range — Function

read_byte_range(path, byte_offset, byte_count)

Return the equivalent of read(path)[(byte_offset + 1):(byte_offset + byte_count)], but try to avoid reading unreturned intermediate bytes. Note that the effectiveness of this method depends on the type of path.
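The difference between the naive expression and a range-aware read can be sketched for local files as follows. Both function names are hypothetical and neither is Onda's implementation; a backend package would instead add a method to Onda.read_byte_range for its own path type.

```julia
# Naive reference: read everything, then slice (byte_offset is zero-based).
naive_byte_range(path, byte_offset, byte_count) =
    read(path)[(byte_offset + 1):(byte_offset + byte_count)]

# Seek-based variant for local files: skips the leading bytes instead of reading them.
function seeking_byte_range(path::AbstractString, byte_offset, byte_count)
    return open(path, "r") do io
        seek(io, byte_offset)
        read(io, byte_count)
    end
end

# Write ten known bytes (0x00 through 0x09) to a temporary file.
path, io = mktemp()
write(io, UInt8.(0:9))
close(io)

naive = naive_byte_range(path, 3, 4)
seeking = seeking_byte_range(path, 3, 4)
```

Both calls return the same four bytes; only the seek-based variant avoids loading the bytes before the offset.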

`*.onda.annotations.arrow`

Onda.Annotation — Type

Annotation(annotations_table_row)
Annotation(recording, id, span; custom...)
Annotation(; recording, id, span, custom...)

Return an Annotation instance that represents a row of an *.onda.annotations.arrow table.

The names, types, and order of the columns of an Annotation instance are guaranteed to result in a *.onda.annotations.arrow-compliant row when written out via write_annotations.

This type primarily exists to aid in the validated construction of such rows/tables, and is not intended to be used as a type constraint in function or struct definitions. Instead, you should generally duck-type any "annotation-like" arguments/fields so that other generic row types will compose with your code.

This type supports Tables.jl's AbstractRow interface (but does not subtype AbstractRow).

Onda.read_annotations — Function

read_annotations(io_or_path; materialize::Bool=false, validate_schema::Bool=true)

Return the *.onda.annotations.arrow-compliant table read from io_or_path.

If validate_schema is true, the table's schema will be validated to ensure it is a *.onda.annotations.arrow-compliant table. An ArgumentError will be thrown if any schema violation is detected.

If materialize is false, the returned table will be an Arrow.Table while if materialize is true, the returned table will be a NamedTuple of columns. The primary difference is that the former has a conversion-on-access behavior, while for the latter, any potential conversion cost has been paid up front.

Onda.write_annotations — Function

write_annotations(io_or_path, table; kwargs...)

Write table to io_or_path, first validating that table is a *.onda.annotations.arrow-compliant table. An ArgumentError will be thrown if any schema violation is detected.

kwargs is forwarded to an internal invocation of Arrow.write(...; file=true, kwargs...).

Onda.merge_overlapping_annotations — Function

merge_overlapping_annotations(annotations)

Given the *.onda.annotations.arrow-compliant table annotations, return a table corresponding to annotations except that overlapping entries have been merged.

Specifically, two annotations a and b are determined to be "overlapping" if a.recording == b.recording && TimeSpans.overlaps(a.span, b.span). Merged annotations' span fields are generated via calling TimeSpans.shortest_timespan_containing on the overlapping set of source annotations.

The returned annotations table only has a single custom column named from whose entries are Vector{UUID}s populated with the ids of the generated annotations' source(s). Note that every annotation in the returned table has a freshly generated id field and a non-empty from field, even if the from only has a single element (i.e. corresponds to a single non-overlapping annotation).

Note that this function internally works with Tables.columns(annotations) rather than annotations directly, so it may be slower and/or require more memory if !Tables.columnaccess(annotations).
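The merging rule can be modeled in a few lines. This is a simplified standalone sketch, not Onda's implementation: spans are half-open (start, stop) integer tuples rather than TimeSpans.jl values, ids are plain integers rather than UUIDs, and the freshly generated id field of each merged row is omitted. The names `spans_overlap` and `merge_overlapping` are hypothetical.

```julia
# Assumption: spans are half-open, so spans that merely touch do not overlap.
spans_overlap(a, b) = a[1] < b[2] && b[1] < a[2]

function merge_overlapping(rows)
    merged = NamedTuple[]
    for recording in unique(r.recording for r in rows)
        # Sorting by start lets a single pass grow the current merged span.
        group = sort([r for r in rows if r.recording == recording]; by=r -> r.span[1])
        current = (recording=recording, span=group[1].span, from=[group[1].id])
        for r in group[2:end]
            if spans_overlap(current.span, r.span)
                # Shortest span containing both the current span and r.span.
                span = (current.span[1], max(current.span[2], r.span[2]))
                current = (recording=recording, span=span, from=vcat(current.from, r.id))
            else
                push!(merged, current)
                current = (recording=recording, span=r.span, from=[r.id])
            end
        end
        push!(merged, current)
    end
    return merged
end

rows = [(recording="r1", id=1, span=(0, 10)),
        (recording="r1", id=2, span=(5, 15)),
        (recording="r1", id=3, span=(20, 30)),
        (recording="r2", id=4, span=(0, 10))]
merged = merge_overlapping(rows)
```

Every output row carries a non-empty from vector, including rows that correspond to a single non-overlapping annotation.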

`*.onda.signals.arrow`

Onda.Signal — Type

Signal(signals_table_row)
Signal(recording, file_path, file_format, span, kind, channels, sample_unit,
       sample_resolution_in_unit, sample_offset_in_unit, sample_type, sample_rate;
       custom...)
Signal(; recording, file_path, file_format, span, kind, channels, sample_unit,
       sample_resolution_in_unit, sample_offset_in_unit, sample_type, sample_rate,
       custom...)
Signal(info::SamplesInfo; recording, file_path, file_format, span, custom...)

Return a Signal instance that represents a row of an *.onda.signals.arrow table.

The names, types, and order of the columns of a Signal instance are guaranteed to result in a *.onda.signals.arrow-compliant row when written out via write_signals. The exception is the file_path column, whose type is unchecked in order to allow callers to utilize custom path types.

This type primarily exists to aid in the validated construction of such rows/tables, and is not intended to be used as a type constraint in function or struct definitions. Instead, you should generally duck-type any "signal-like" arguments/fields so that other generic row types will compose with your code.

This type supports Tables.jl's AbstractRow interface (but does not subtype AbstractRow).

Onda.SamplesInfo — Type

SamplesInfo(; kind, channels, sample_unit,
            sample_resolution_in_unit, sample_offset_in_unit,
            sample_type, sample_rate,
            validate::Bool=Onda.validate_on_construction())
SamplesInfo(kind, channels, sample_unit,
            sample_resolution_in_unit, sample_offset_in_unit,
            sample_type, sample_rate;
            validate::Bool=Onda.validate_on_construction())
SamplesInfo(signals_table_row; validate::Bool=Onda.validate_on_construction())

Return a SamplesInfo instance whose fields are a subset of a *.onda.signals.arrow row:

kind
channels
sample_unit
sample_resolution_in_unit
sample_offset_in_unit
sample_type
sample_rate

The SamplesInfo struct bundles together the fields of a *.onda.signals.arrow row that are intrinsic to a signal's sample data, leaving out extrinsic file or recording information. This is useful when the latter information is irrelevant or does not yet exist (e.g. if sample data is being constructed/manipulated in-memory without yet having been serialized).

Bundling these fields together under a common type facilitates dispatch for various Onda API functions. Additionally:

If validate is true, then Onda.validate is called on new instances upon construction.
The provided sample_type may be either an Onda-compliant string or a DataType. If it is a string, it will be converted to its corresponding DataType.

Onda.validate — Function

validate(info::SamplesInfo)

Returns nothing, checking that the given info.kind, info.channels and info.sample_unit are valid w.r.t. the Onda specification. If a violation is found, an ArgumentError is thrown.

validate(samples::Samples)

Returns nothing, checking that the given samples are valid w.r.t. the underlying samples.info and the Onda specification's canonical LPCM representation. If a violation is found, an ArgumentError is thrown.

Properties that are validated by this function include:

encoded element type matches samples.info.sample_type
the number of rows of samples.data matches the number of channels in samples.info

Onda.read_signals — Function

read_signals(io_or_path; materialize::Bool=false, validate_schema::Bool=true)

Return the *.onda.signals.arrow-compliant table read from io_or_path.

If validate_schema is true, the table's schema will be validated to ensure it is a *.onda.signals.arrow-compliant table. An ArgumentError will be thrown if any schema violation is detected.

If materialize is false, the returned table will be an Arrow.Table while if materialize is true, the returned table will be a NamedTuple of columns. The primary difference is that the former has a conversion-on-access behavior, while for the latter, any potential conversion cost has been paid up front.

Onda.write_signals — Function

write_signals(io_or_path, table; kwargs...)

Write table to io_or_path, first validating that table is a compliant *.onda.signals.arrow table. An ArgumentError will be thrown if any schema violation is detected.

kwargs is forwarded to an internal invocation of Arrow.write(...; file=true, kwargs...).

Onda.channel — Method

channel(x, name)

Return i where x.channels[i] == name.

Onda.channel — Method

channel(x, i::Integer)

Return x.channels[i].

Onda.channel_count — Method

channel_count(x)

Return length(x.channels).

Onda.sample_count — Method

sample_count(x, duration::Period)

Return the number of multichannel samples that fit within duration given x.sample_rate.

Onda.sizeof_samples — Method

sizeof_samples(x, duration::Period)

Returns the expected size (in bytes) of an encoded Samples object corresponding to x and duration:

sample_count(x, duration) * channel_count(x) * sizeof(x.sample_type)
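The arithmetic behind these two functions can be worked through with a plain NamedTuple standing in for x. The sketch assumes that the sample count is the floor of the duration in seconds times the sample rate; the `sketch_*` names are hypothetical and do not call Onda.jl.

```julia
using Dates

# Assumption: the count is floor(duration in seconds * sample_rate).
function sketch_sample_count(sample_rate, duration::Period)
    nanoseconds = Dates.value(convert(Nanosecond, duration))
    return floor(Int, nanoseconds * sample_rate / 1_000_000_000)
end

# Mirrors the formula above: samples * channels * bytes per encoded value.
sketch_sizeof_samples(info, duration::Period) =
    sketch_sample_count(info.sample_rate, duration) *
    length(info.channels) * sizeof(info.sample_type)

info = (channels=["c3", "c4"], sample_type=Int16, sample_rate=256.0)
count = sketch_sample_count(info.sample_rate, Second(10))   # 2560 samples
bytes = sketch_sizeof_samples(info, Second(10))             # 2560 * 2 * 2 bytes
```
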

`Samples`

Onda.Samples — Type

Samples(data::AbstractMatrix, info::SamplesInfo, encoded::Bool;
        validate::Bool=Onda.validate_on_construction())

Return a Samples instance with the following fields:

data::AbstractMatrix: A matrix of sample data. The ith row of the matrix corresponds to the ith channel in info.channels, while the jth column corresponds to the jth multichannel sample.
info::SamplesInfo: The SamplesInfo object that describes the Samples instance.
encoded::Bool: If true, the values in data are LPCM-encoded as prescribed by the Samples instance's info. If false, the values in data have been decoded into the info's canonical units.

If validate is true, Onda.validate is called on the constructed Samples instance before it is returned.

Note that getindex and view are defined on Samples to accept normal integer indices, but also accept channel names for row indices and TimeSpan values for column indices; see Onda/examples/tour.jl for a comprehensive set of indexing examples.

See also: load, store, encode, encode!, decode, decode!

Base.:== — Method

==(a::Samples, b::Samples)

Returns a.encoded == b.encoded && a.info == b.info && a.data == b.data.

Onda.channel — Function

channel(x, name)

Return i where x.channels[i] == name.

channel(x, i::Integer)

Return x.channels[i].

channel(samples::Samples, name)

Return channel(samples.info, name).

This function is useful for indexing rows of samples.data by channel names.

channel(samples::Samples, i::Integer)

Return channel(samples.info, i).

Onda.channel_count — Function

channel_count(x)

Return length(x.channels).

channel_count(samples::Samples)

Return channel_count(samples.info).

Onda.sample_count — Function

sample_count(x, duration::Period)

Return the number of multichannel samples that fit within duration given x.sample_rate.

sample_count(samples::Samples)

Return the number of multichannel samples in samples (i.e. size(samples.data, 2)).

Onda.encode — Function

encode(sample_type::DataType, sample_resolution_in_unit, sample_offset_in_unit,
       sample_data, dither_storage=nothing)

Return a copy of sample_data quantized according to sample_type, sample_resolution_in_unit, and sample_offset_in_unit. sample_type must be a concrete subtype of Onda.VALID_SAMPLE_TYPE_UNION. Quantization of an individual sample s is performed via:

round(S, (s - sample_offset_in_unit) / sample_resolution_in_unit)

where S is sample_type, with additional special casing to clip values exceeding the encoding's dynamic range.

If dither_storage isa Nothing, no dithering is applied before quantization.

If dither_storage isa Missing, dither storage is allocated automatically and triangular dithering is applied to sample_data prior to quantization.

Otherwise, dither_storage must be a container of similar shape and type to sample_data. This container is then used to store the random noise needed for the triangular dithering process, which is applied to sample_data prior to quantization.

If:

sample_type === eltype(sample_data) &&
sample_resolution_in_unit == 1 &&
sample_offset_in_unit == 0

then this function will simply return sample_data directly without copying/dithering.
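The quantization formula above, without dithering, can be sketched as a standalone function. `sketch_encode` is a hypothetical name and not Onda.encode; it is restricted to small integer sample types so that the clipping bounds are exactly representable in Float64, and the way it clips is an assumption about the "special casing" mentioned above.

```julia
# Standalone sketch of quantization without dithering.
function sketch_encode(::Type{S}, resolution, offset, sample_data) where {S<:Union{Int8,Int16,Int32}}
    return map(sample_data) do s
        scaled = (s - offset) / resolution
        clipped = clamp(scaled, typemin(S), typemax(S))  # clip to the dynamic range
        round(S, clipped)
    end
end

decoded = [1.0, 2.2, -0.1, 1.0e9]
encoded = sketch_encode(Int16, 0.5, 1.0, decoded)
```

The last input lies far outside the Int16 range and is clipped to typemax(Int16) rather than raising an error.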

encode(samples::Samples, dither_storage=nothing)

If samples.encoded is false, return a Samples instance that wraps:

encode(samples.info.sample_type,
       samples.info.sample_resolution_in_unit,
       samples.info.sample_offset_in_unit,
       samples.data, dither_storage)

If samples.encoded is true, this function is the identity.

Onda.encode! — Function

encode!(result_storage, sample_type::DataType, sample_resolution_in_unit,
        sample_offset_in_unit, sample_data, dither_storage=nothing)
encode!(result_storage, sample_resolution_in_unit, sample_offset_in_unit,
        sample_data, dither_storage=nothing)

Similar to encode(sample_type, sample_resolution_in_unit, sample_offset_in_unit, sample_data, dither_storage), but write encoded values to result_storage rather than allocating new storage.

sample_type defaults to eltype(result_storage) if it is not provided.

If:

sample_type === eltype(sample_data) &&
sample_resolution_in_unit == 1 &&
sample_offset_in_unit == 0

then this function will simply copy sample_data directly into result_storage without dithering.

encode!(result_storage, samples::Samples, dither_storage=nothing)

If samples.encoded is false, return a Samples instance that wraps:

encode!(result_storage,
        samples.info.sample_type,
        samples.info.sample_resolution_in_unit,
        samples.info.sample_offset_in_unit,
        samples.data, dither_storage)

If samples.encoded is true, return a Samples instance that wraps copyto!(result_storage, samples.data).

Onda.decode — Function

decode(sample_resolution_in_unit, sample_offset_in_unit, sample_data)

Return sample_resolution_in_unit .* sample_data .+ sample_offset_in_unit.

If:

sample_data isa AbstractArray &&
sample_resolution_in_unit == 1 &&
sample_offset_in_unit == 0

then this function is the identity and will return sample_data directly without copying.
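A worked instance of the decoding formula, written as a standalone one-liner (`sketch_decode` is a hypothetical name, not Onda.decode). Unlike the identity case described above, this sketch always allocates a new array.

```julia
# Standalone sketch: scale encoded values by the resolution, then add the offset.
sketch_decode(resolution, offset, sample_data) = resolution .* sample_data .+ offset

encoded = Int16[0, 2, -2]
decoded = sketch_decode(0.5, 1.0, encoded)
```
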

decode(samples::Samples)

If samples.encoded is true, return a Samples instance that wraps

decode(samples.info.sample_resolution_in_unit, samples.info.sample_offset_in_unit, samples.data)

If samples.encoded is false, this function is the identity.

Onda.decode! — Function

decode!(result_storage, sample_resolution_in_unit, sample_offset_in_unit, sample_data)

Similar to decode(sample_resolution_in_unit, sample_offset_in_unit, sample_data), but write decoded values to result_storage rather than allocating new storage.

decode!(result_storage, samples::Samples)

If samples.encoded is true, return a Samples instance that wraps

decode!(result_storage, samples.info.sample_resolution_in_unit, samples.info.sample_offset_in_unit, samples.data)

If samples.encoded is false, return a Samples instance that wraps copyto!(result_storage, samples.data).

Onda.load — Function

load(signal[, span]; encoded::Bool=false)
load(file_path, file_format::Union{AbstractString,AbstractLPCMFormat}, info::SamplesInfo[, span]; encoded::Bool=false)

Return the Samples object described by signal/file_path/file_format/info.

If span is present, return load(...)[:, span], but attempt to avoid reading unreturned intermediate sample data. Note that the effectiveness of this optimized method versus the naive approach depends on the types of file_path (i.e. if there is a fast method defined for Onda.read_byte_range(::typeof(file_path), ...)) and file_format (i.e. does the corresponding format support random or chunked access).

If encoded is true, do not decode the Samples object before returning it.

Onda.store — Function

store(file_path, file_format::Union{AbstractString,AbstractLPCMFormat}, samples::Samples)

Serialize the given samples to file_format and write the output to file_path.

store(file_path, file_format::Union{AbstractString,AbstractLPCMFormat}, samples::Samples,
      recording::UUID, start::Period; custom...)

Serialize the given samples to file_format and write the output to file_path, returning a Signal instance constructed from the provided arguments (any provided custom keyword arguments are forwarded to an invocation of the Signal constructor).

LPCM (De)serialization API

Onda.jl's LPCM (De)serialization API facilitates low-level streaming sample data (de)serialization and provides a storage-agnostic abstraction layer that can be overloaded to support new file/byte formats for (de)serializing LPCM-encodeable sample data.

Onda.AbstractLPCMFormat — Type

AbstractLPCMFormat

A type whose subtypes represent byte/stream formats that can be (de)serialized to/from Onda's standard interleaved LPCM representation.

All subtypes of the form F<:AbstractLPCMFormat must call Onda.register_lpcm_format! and define an appropriate file_format_string method.

See also:

format
deserialize_lpcm
deserialize_lpcm_callback
serialize_lpcm
LPCMFormat
LPCMZstFormat
AbstractLPCMStream

Onda.AbstractLPCMStream — Type

AbstractLPCMStream

A type that represents an LPCM (de)serialization stream.

See also:

deserializing_lpcm_stream
serializing_lpcm_stream
finalize_lpcm_stream

Onda.LPCMFormat — Type

LPCMFormat(channel_count::Int, sample_type::Type)
LPCMFormat(info::SamplesInfo)

Return an LPCMFormat<:AbstractLPCMFormat instance corresponding to Onda's default interleaved LPCM format assumed for sample data files with the "lpcm" extension.

channel_count corresponds to length(info.channels), while sample_type corresponds to info.sample_type.

Note that bytes (de)serialized to/from this format are little-endian (per the Onda specification).
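The byte layout implied by "interleaved" and "little-endian" can be sketched without Onda.jl. The helpers below are hypothetical stand-ins, not Onda's serialization code; they rely on the fact that a channels-by-timesteps matrix in Julia's column-major storage is already interleaved sample by sample.

```julia
# Standalone sketch of interleaved little-endian LPCM bytes.
function sketch_serialize(samples::AbstractMatrix)
    # htol converts each value to little-endian byte order (a no-op on most hosts).
    return collect(reinterpret(UInt8, vec(htol.(samples))))
end

function sketch_deserialize(::Type{T}, channel_count::Integer, bytes::Vector{UInt8}) where {T}
    values = ltoh.(reinterpret(T, bytes))
    return reshape(values, channel_count, :)
end

samples = Int16[1 2 3; 4 5 6]          # 2 channels, 3 multichannel samples
bytes = sketch_serialize(samples)
restored = sketch_deserialize(Int16, 2, bytes)
```

The first four bytes hold channel 1 and channel 2 of the first multichannel sample, the next four hold the second sample, and so on.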

Onda.LPCMZstFormat — Type

LPCMZstFormat(lpcm::LPCMFormat; level=3)
LPCMZstFormat(info::SamplesInfo; level=3)

Return an LPCMZstFormat<:AbstractLPCMFormat instance that corresponds to Onda's default interleaved LPCM format compressed by zstd. This format is assumed for sample data files with the "lpcm.zst" extension.

The level keyword argument sets the same compression level parameter as the corresponding flag documented by the zstd command line utility.

See https://facebook.github.io/zstd/ for details about zstd.

Onda.format — Function

format(file_format::AbstractString, info::SamplesInfo; kwargs...)

Return f(info; kwargs...) where f constructs the AbstractLPCMFormat instance that corresponds to file_format. f is determined by matching file_format to a suitable format constructor registered via register_lpcm_format!.

See also: deserialize_lpcm, serialize_lpcm

Onda.deserialize_lpcm — Function

deserialize_lpcm(format::AbstractLPCMFormat, bytes,
                 samples_offset::Integer=0,
                 samples_count::Integer=typemax(Int))
deserialize_lpcm(stream::AbstractLPCMStream,
                 samples_offset::Integer=0,
                 samples_count::Integer=typemax(Int))

Return a channels-by-timesteps AbstractMatrix of interleaved LPCM-encoded sample data by deserializing the provided bytes in the given format, or from the given stream constructed by deserializing_lpcm_stream.

Note that this operation may be performed in a zero-copy manner such that the returned sample matrix directly aliases bytes.

The returned segment is at most samples_offset samples offset from the start of stream/bytes and contains at most samples_count samples. This ensures that overrun behavior is generally similar to the behavior of Base.skip(io, n) and Base.read(io, n).
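One way to read these "at most" semantics is as clamping against the number of available samples. The sketch below applies that reading to an already-deserialized matrix; `sketch_window` is a hypothetical helper, and the exact overrun behavior of Onda's own methods is not confirmed here.

```julia
# Standalone sketch: clamp the offset and count to what the matrix actually holds.
function sketch_window(samples::AbstractMatrix, samples_offset::Integer=0,
                       samples_count::Integer=typemax(Int))
    total = size(samples, 2)
    offset = min(samples_offset, total)          # skip at most `total` samples
    count = min(samples_count, total - offset)   # return at most what remains
    return samples[:, (offset + 1):(offset + count)]
end

samples = [1 2 3 4 5; 6 7 8 9 10]
tail = sketch_window(samples, 3)        # default count: everything after the offset
middle = sketch_window(samples, 1, 2)
overrun = sketch_window(samples, 10)    # offset past the end yields zero columns
```
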

This function is the inverse of the corresponding serialize_lpcm method, i.e.:

serialize_lpcm(format, deserialize_lpcm(format, bytes)) == bytes

Onda.serialize_lpcm — Function

serialize_lpcm(format::AbstractLPCMFormat, samples::AbstractMatrix)
serialize_lpcm(stream::AbstractLPCMStream, samples::AbstractMatrix)

Return the AbstractVector{UInt8} of bytes that results from serializing samples to the given format (or serialize those bytes directly to stream) where samples is a channels-by-timesteps matrix of interleaved LPCM-encoded sample data.

Note that this operation may be performed in a zero-copy manner such that the returned AbstractVector{UInt8} directly aliases samples.

This function is the inverse of the corresponding deserialize_lpcm method, i.e.:

deserialize_lpcm(format, serialize_lpcm(format, samples)) == samples

Onda.deserialize_lpcm_callback — Function

deserialize_lpcm_callback(format::AbstractLPCMFormat, samples_offset, samples_count)

Return (callback, required_byte_offset, required_byte_count) where callback accepts the byte block specified by required_byte_offset and required_byte_count and returns the samples specified by samples_offset and samples_count.

As a fallback, this function returns (callback, missing, missing), where callback requires all available bytes. AbstractLPCMFormat subtypes that support partial/block-based deserialization (e.g. the basic LPCMFormat) can overload this function to only request exactly the byte range that is required for the sample range requested by the caller.

This allows callers to handle the byte block retrieval themselves while keeping Onda's LPCM Serialization API agnostic to the caller's storage layer of choice.
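For an uncompressed interleaved format, the required byte range follows directly from the sample range. The sketch below is a standalone model of that idea, assuming sizeof(sample_type) * channel_count bytes per multichannel sample; `sketch_callback` is a hypothetical name and does not reproduce Onda's method.

```julia
# Standalone sketch for uncompressed interleaved little-endian LPCM.
function sketch_callback(::Type{T}, channel_count::Integer, samples_offset::Integer,
                         samples_count::Integer) where {T}
    bytes_per_sample = sizeof(T) * channel_count
    required_byte_offset = samples_offset * bytes_per_sample
    required_byte_count = samples_count * bytes_per_sample
    # The callback only ever sees the requested block, never the whole file.
    callback = byte_block -> reshape(ltoh.(reinterpret(T, byte_block)), channel_count, :)
    return callback, required_byte_offset, required_byte_count
end

all_samples = Int16[1 2 3 4; 5 6 7 8]
all_bytes = collect(reinterpret(UInt8, vec(htol.(all_samples))))

callback, byte_offset, byte_count = sketch_callback(Int16, 2, 1, 2)
# The caller fetches the block itself, e.g. via a byte-range read on its storage layer.
block = all_bytes[(byte_offset + 1):(byte_offset + byte_count)]
window = callback(block)
```
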

Onda.deserializing_lpcm_stream — Function

deserializing_lpcm_stream(format::AbstractLPCMFormat, io)

Return a stream::AbstractLPCMStream that wraps io to enable direct LPCM deserialization from io via deserialize_lpcm.

Note that stream must be finalized after usage via finalize_lpcm_stream. Until stream is finalized, io should be considered to be part of the internal state of stream and should not be directly interacted with by other processes.

Onda.serializing_lpcm_stream — Function

serializing_lpcm_stream(format::AbstractLPCMFormat, io)

Return a stream::AbstractLPCMStream that wraps io to enable direct LPCM serialization to io via serialize_lpcm.

Note that stream must be finalized after usage via finalize_lpcm_stream. Until stream is finalized, io should be considered to be part of the internal state of stream and should not be directly interacted with by other processes.

Onda.finalize_lpcm_stream — Function

finalize_lpcm_stream(stream::AbstractLPCMStream)::Bool

Finalize stream, returning true if the underlying I/O object used to construct stream is still open and usable. Otherwise, return false to indicate that the underlying I/O object was closed as a result of finalization.

Onda.register_lpcm_format! — Function

Onda.register_lpcm_format!(create_constructor)

Register an AbstractLPCMFormat constructor so that it can automatically be used when format is called. Authors of new AbstractLPCMFormat subtypes should call this function for their subtype.

create_constructor should be a unary function that accepts a single file_format::AbstractString argument and returns either a matching AbstractLPCMFormat constructor or nothing. Any returned AbstractLPCMFormat constructor f should be of the form f(info::SamplesInfo; kwargs...)::AbstractLPCMFormat.

Note that if Onda.register_lpcm_format! is called in a downstream package, it must be called within the __init__ function of the package's top-level module to ensure that the function is always invoked when the module is loaded (not just during precompilation). For details, see https://docs.julialang.org/en/v1/manual/modules/#Module-initialization-and-precompilation.
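The create_constructor contract can be modeled with a small standalone registry. Everything below is hypothetical (`CREATE_CONSTRUCTORS`, `register!`, `lookup_format`, `ToyFormat`); in particular, the search order and the error raised for an unknown file_format are assumptions, not documented Onda behavior.

```julia
# Hypothetical registry of unary functions mapping a file_format string
# to a format constructor, or to `nothing` when the string does not match.
const CREATE_CONSTRUCTORS = Function[]

register!(create_constructor) = push!(CREATE_CONSTRUCTORS, create_constructor)

function lookup_format(file_format::AbstractString, info; kwargs...)
    for create_constructor in CREATE_CONSTRUCTORS
        f = create_constructor(file_format)
        f === nothing || return f(info; kwargs...)
    end
    throw(ArgumentError("unknown file_format: $file_format"))
end

struct ToyFormat
    channel_count::Int
end

# Constructor of the form f(info; kwargs...), as described above.
toy_constructor(info; kwargs...) = ToyFormat(length(info.channels))

register!(file_format -> file_format == "toy" ? toy_constructor : nothing)

info = (channels=["c3", "c4"],)
toy = lookup_format("toy", info)
```

In a real downstream package, the registration call would sit inside the module's __init__ function for the reason given above.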

Onda.file_format_string — Function

file_format_string(format::AbstractLPCMFormat)

Return the String representation of format to be written to the file_format field of a *.onda.signals.arrow file.

Utilities

Onda.gather — Function

gather(column_name, tables...; extract=((table, idxs) -> view(table, idxs, :)))

Gather rows from tables into a unified cross-table index along column_name. Returns a Dict whose keys are the unique values of column_name across tables, and whose values are tuples of the form:

(rows_matching_key_in_table_1, rows_matching_key_in_table_2, ...)

The provided extract function is used to extract rows from each table; it takes as input a table and a Vector{Int} of row indices, and returns the corresponding subtable. The default definition is sufficient for DataFrames tables.

Note that this function may internally call Tables.columns on each input table, so it may be slower and/or require more memory if any(!Tables.columnaccess, tables).
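The shape of the returned Dict is easiest to see on small inputs. The sketch below is a standalone model over vectors of NamedTuples, not Onda.gather itself; `sketch_gather` is a hypothetical name, and it assumes that a key absent from a table maps to an empty selection for that table.

```julia
# Standalone sketch of a cross-table index along one column.
function sketch_gather(column_name::Symbol, tables...;
                       extract=(table, idxs) -> table[idxs])
    # For each table, map every key to the row indices where it occurs.
    index_maps = map(tables) do table
        idxs_by_key = Dict{Any,Vector{Int}}()
        for (i, row) in enumerate(table)
            push!(get!(() -> Int[], idxs_by_key, row[column_name]), i)
        end
        idxs_by_key
    end
    all_keys = Set{Any}()
    for m in index_maps
        union!(all_keys, keys(m))
    end
    # One tuple entry per input table, in the order the tables were passed.
    return Dict(key => map((t, m) -> extract(t, get(m, key, Int[])), tables, index_maps)
                for key in all_keys)
end

signals = [(recording="a", kind="eeg"), (recording="b", kind="ecg"), (recording="a", kind="emg")]
annotations = [(recording="b", id=1)]
gathered = sketch_gather(:recording, signals, annotations)
```

Here gathered["a"] pairs two signal rows with an empty annotation selection, while gathered["b"] pairs one row from each table.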

Onda.validate_on_construction — Function

Onda.validate_on_construction()

Returns true by default.

If this function returns true, various Onda objects will be validated upon construction for compliance with the Onda specification.

Users may interactively redefine this method to return false in order to disable this extra layer of validation, which can be useful when working with malformed Onda datasets.

See also: Onda.validate

Onda.upgrade_onda_dataset_to_v0_5! — Function

upgrade_onda_dataset_to_v0_5!(dataset_path;
                              verbose=true,
                              uuid_from_annotation=(_ -> uuid4()),
                              signal_file_path=((uuid, kind, ext) -> joinpath("samples", string(uuid), kind * "." * ext)),
                              signal_file_format=((ext, opts) -> ext),
                              kwargs...)

Upgrade an Onda Format v0.3/v0.4 dataset to Onda Format v0.5 by converting the dataset's recordings.msgpack.zst file into upgraded.onda.signals.arrow and upgraded.onda.annotations.arrow files written to the root of the dataset (w/o deleting existing content).

Returns a tuple (signals, annotations) where signals is the table corresponding to upgraded.onda.signals.arrow and annotations is the table corresponding to upgraded.onda.annotations.arrow.

If verbose is true, this function will print out timestamped progress logs.
uuid_from_annotation is a function that takes in an Onda Format v0.3/v0.4 annotation (as a Dict{String}) and returns the id field to be associated with that annotation.
signal_file_path is a function that takes in a signal's recording UUID, the signal's kind (formerly the name field), and the signal's file_extension field and returns the file_path field to be associated with that signal.
signal_file_format is a function that takes in a signal's file_extension field and file_options field and returns the file_format field to be associated with that signal.
kwargs is forwarded to internal invocations of Arrow.write(...; file=true, kwargs...) used to write the *.arrow files.

To upgrade a dataset that is older than Onda Format v0.3, first use an older version of Onda.jl to upgrade the dataset to Onda Format v0.3 or above.