API Documentation
If you're a newcomer to Legolas.jl, please familiarize yourself with the tour before diving into this documentation.
Legolas Schemas
Legolas.SchemaVersion — Type
Legolas.SchemaVersion{name,version}
A type representing a particular version of a Legolas schema. The relevant name (a Symbol) and version (an Integer) are surfaced as type parameters, allowing them to be used for dispatch.
For more details and examples, please see Legolas.jl/examples/tour.jl and the "Schema-Related Concepts/Conventions" section of the Legolas.jl documentation.
The constructor SchemaVersion{name,version}() will throw an ArgumentError if version is negative.
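To illustrate why it helps to surface the name and version as type parameters, here is a minimal plain-Julia sketch. `SketchSchemaVersion` and `describe` are hypothetical names made up for this illustration and are not part of Legolas; the sketch only mirrors the behavior documented above (parameters usable for dispatch, an `ArgumentError` for a negative version).

```julia
# Hypothetical stand-in for a schema-version type; NOT the Legolas implementation.
struct SketchSchemaVersion{name,version}
    function SketchSchemaVersion{name,version}() where {name,version}
        name isa Symbol || throw(ArgumentError("`name` must be a Symbol"))
        (version isa Integer && version >= 0) ||
            throw(ArgumentError("`version` must be a non-negative Integer"))
        return new{name,version}()
    end
end

# Convenience constructor that moves the values into the type parameters.
SketchSchemaVersion(name::AbstractString, version::Integer) =
    SketchSchemaVersion{Symbol(name),version}()

# Generic fallback: works for every name/version combination.
describe(::SketchSchemaVersion{name,version}) where {name,version} = "$name@$version"

# Specialized method: selected by dispatch for one specific schema version only.
describe(::SketchSchemaVersion{Symbol("example.foo"),1}) = "the first version of example.foo"

println(describe(SketchSchemaVersion("example.foo", 1)))
println(describe(SketchSchemaVersion("example.foo", 2)))
```

Because the name and version live in the type, the specialized `describe` method is chosen at dispatch time without any runtime branching on the values.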
See also: Legolas.@schema
Legolas.@schema — Macro
@schema "name" Prefix
Declare a Legolas schema with the given name. Types generated by subsequent @version declarations for this schema will be prefixed with Prefix.
For more details and examples, please see Legolas.jl/examples/tour.jl.
Legolas.@version — Macro
@version RecordType begin
    declared_field_expression_1
    declared_field_expression_2
    ⋮
end
@version RecordType > ParentRecordType begin
    declared_field_expression_1
    declared_field_expression_2
    ⋮
end
Given a prior @schema declaration of the form:
@schema "example.name" Name
...the nth version of example.name can be declared in the same module via a @version declaration of the form:
@version NameV$(n) begin
    declared_field_expression_1
    declared_field_expression_2
    ⋮
end
...which generates type definitions for the NameV$(n) type (a Legolas.AbstractRecord subtype) and the NameV$(n)SchemaVersion type (an alias of typeof(SchemaVersion("example.name", n))), as well as the necessary definitions to overload relevant Legolas methods with specialized behaviors in accordance with the declared fields.
If the declared schema version has a parent, it should be specified via the optional > ParentRecordType clause. ParentRecordType should refer directly to an existing Legolas-generated record type.
Each declared_field_expression declares a field of the schema version, and is an expression of the form field::F = rhs where:
- field is the corresponding field's name.
- ::F denotes the field's type constraint (if elided, defaults to ::Any).
- rhs is the expression which produces field::F (if elided, defaults to field).
Accounting for all of the aforementioned allowed elisions, valid declared_field_expressions include:
- field::F = rhs
- field::F (interpreted as field::F = field)
- field = rhs (interpreted as field::Any = rhs)
- field (interpreted as field::Any = field)
F is generally a type literal, but may also be an expression of the form (<:T), in which case the declared schema version's generated record type will expose a type parameter (constrained to be a subtype of T) for the given field. For example:
julia> @schema "example.foo" Foo
julia> @version FooV1 begin
           x::Int
           y::(<:Real)
       end
julia> FooV1(x=1, y=2.0)
FooV1{Float64}: (x = 1, y = 2.0)
julia> FooV1{Float32}(x=1, y=2)
FooV1{Float32}: (x = 1, y = 2.0f0)
julia> FooV1(x=1, y="bad")
ERROR: TypeError: in FooV1, in _y_T, expected _y_T<:Real, got Type{String}
This macro will throw a Legolas.SchemaVersionDeclarationError if:
- The provided RecordType does not follow the $(Prefix)V$(n) format, where Prefix was previously associated with a given schema by a prior @schema declaration.
- There are no declared field expressions, duplicate fields are declared, or a given declared field expression is invalid.
- (if a parent is specified) The @version declaration does not comply with its parent's @version declaration, or the parent hasn't yet been declared at all.
Note that this macro expects to be evaluated within top-level scope.
For more details and examples, please see Legolas.jl/examples/tour.jl and the "Schema-Related Concepts/Conventions" section of the Legolas.jl documentation.
Legolas.@check — Macro
@check expr
Define a constraint for a schema version (e.g. @check x > 0) from a boolean expression. The expr should evaluate to true if the constraint is met or false if the constraint is violated. Multiple constraints may be defined for a schema version. All @check constraints defined within a @version declaration must follow all fields defined by the schema version.
For more details and examples, please see Legolas.jl/examples/tour.jl.
Legolas.is_valid_schema_name — Function
Legolas.is_valid_schema_name(x::AbstractString)
Return true if x is a valid schema name, return false otherwise.
Valid schema names are lowercase, alphanumeric, and may contain hyphens or periods.
Legolas.parse_identifier — Function
Legolas.parse_identifier(id::AbstractString)
Given a valid schema version identifier id of the form:
$(names[1])@$(versions[1]) > $(names[2])@$(versions[2]) > ... > $(names[n])@$(versions[n])
return an n-element Vector{SchemaVersion} whose ith element is SchemaVersion(names[i], versions[i]).
Throws an ArgumentError if the provided string is not a valid schema version identifier.
For details regarding valid schema version identifiers and their structure, see the "Schema-Related Concepts/Conventions" section of the Legolas.jl documentation.
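The identifier format and the schema-name rule described above can be made concrete with a small plain-Julia sketch. `sketch_is_valid_name` and `sketch_parse_identifier` are hypothetical helpers written for this illustration; they are not the Legolas implementation, they return `(name, version)` tuples rather than `SchemaVersion` instances, and they assume that "lowercase, alphanumeric" means ASCII `a-z` and `0-9`.

```julia
# Assumed reading of the naming rule: ASCII lowercase letters, digits, '-' and '.'.
sketch_is_valid_name(x::AbstractString) =
    all(c -> c in 'a':'z' || c in '0':'9' || c == '-' || c == '.', x)

# Split "name@version > name@version > ..." into (name, version) tuples.
function sketch_parse_identifier(id::AbstractString)
    parsed = Tuple{String,Int}[]
    for segment in split(id, '>')
        parts = split(strip(segment), '@')
        length(parts) == 2 ||
            throw(ArgumentError("malformed segment: $(repr(segment))"))
        name, raw_version = parts
        version = tryparse(Int, raw_version)
        ok = sketch_is_valid_name(name) && version !== nothing && version >= 0
        ok || throw(ArgumentError("invalid name or version in: $(repr(segment))"))
        push!(parsed, (String(name), version))
    end
    return parsed
end

println(sketch_parse_identifier("example.child@2 > example.parent@1"))
```

The first element of the result corresponds to the leftmost segment of the identifier, matching the ordering described for `parse_identifier`.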
Legolas.name — Function
Legolas.name(::Legolas.SchemaVersion{n})
Return n.
Legolas.version — Function
Legolas.version(::Legolas.SchemaVersion{n,v})
Return v.
Legolas.identifier — Function
Legolas.identifier(::Legolas.SchemaVersion)
Return this Legolas.SchemaVersion's fully qualified schema version identifier. This string is serialized as the "legolas_schema_qualified" field value in table metadata for tables written via Legolas.write.
Legolas.schema_provider — Function
Legolas.schema_provider(::SchemaVersion)
Returns a NamedTuple with keys name and version. The name is a Symbol corresponding to the package which defines the schema version, if known; otherwise nothing. Likewise the version is a VersionNumber or nothing.
Legolas.parent — Function
Legolas.parent(sv::Legolas.SchemaVersion)
Return the Legolas.SchemaVersion instance that corresponds to sv's declared parent.
Legolas.declared_fields — Function
Legolas.declared_fields(sv::Legolas.SchemaVersion)
Return a NamedTuple{...,Tuple{Vararg{DataType}}} whose fields take the form:
<name of field declared by `sv`> = <field's type>
If sv has a parent, the returned fields will include declared_fields(parent(sv)).
Legolas.declaration — Function
Legolas.declaration(sv::Legolas.SchemaVersion)
Return a Pair{String,Vector{NamedTuple}} of the form
schema_version_identifier::String => declared_field_infos::Vector{Legolas.DeclaredFieldInfo}
where DeclaredFieldInfo has the fields:
- name::Symbol: the declared field's name
- type::Union{Symbol,Expr}: the declared field's declared type constraint
- parameterize::Bool: whether or not the declared field is exposed as a parameter
- statement::Expr: the declared field's full assignment statement (as processed by @version, not necessarily as written)
Note that declaration is primarily intended to be used for interactive discovery purposes, and does not include the contents of declaration(parent(sv)).
Legolas.record_type — Function
Legolas.record_type(sv::Legolas.SchemaVersion)
Return the Legolas.AbstractRecord subtype associated with sv.
See also: Legolas.schema_version_from_record
Legolas.schema_version_from_record — Function
Legolas.schema_version_from_record(record::Legolas.AbstractRecord)
Return the Legolas.SchemaVersion instance associated with record.
See also: Legolas.record_type
Legolas.declared — Function
Legolas.declared(sv::Legolas.SchemaVersion{name,version})
Return true if the schema version name@version has been declared via @version in the current Julia session; return false otherwise.
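The introspection functions documented in this section can be exercised together on a freshly declared schema version. The following is a sketch that assumes the Legolas package is installed and that the code runs at top-level scope; the schema name "example.demo" and the `Demo` prefix are made up for this illustration.

```julia
using Legolas
using Legolas: @schema, @version

# Hypothetical schema used only for this illustration.
@schema "example.demo" Demo

@version DemoV1 begin
    x::Int
    y::String
end

sv = DemoV1SchemaVersion()
record = DemoV1(x=1, y="a")

println(Legolas.name(sv))             # schema name, a Symbol
println(Legolas.version(sv))          # version number
println(Legolas.identifier(sv))       # fully qualified identifier
println(Legolas.declared(sv))         # whether `sv` was declared in this session
println(Legolas.record_type(sv))      # generated record type
println(Legolas.declared_fields(sv))  # NamedTuple of field name => type
println(Legolas.schema_version_from_record(record) === sv)
```

Since this schema version has no parent, its identifier consists of a single `name@version` segment.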
Legolas.find_violation — Function
Legolas.find_violation(ts::Tables.Schema, sv::Legolas.SchemaVersion)
For each field f::F declared by sv:
- Define A = Legolas.accepted_field_type(sv, F)
- If f::T is present in ts, ensure that T <: A or else immediately return f::Symbol => T::DataType.
- If f isn't present in ts, ensure that Missing <: A or else immediately return f::Symbol => missing::Missing.
Otherwise, return nothing.
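The per-field procedure above can be sketched in plain Julia. This is not the Legolas implementation: `sketch_find_violation` and `sketch_accepted` are hypothetical names, the table schema is modeled as a `Dict` from column name to column type instead of a `Tables.Schema`, and only one widening rule (`String` to `AbstractString`) is reproduced from the documented defaults of `accepted_field_type`.

```julia
# Hypothetical stand-in for Legolas.accepted_field_type: identity by default,
# plus the documented String -> AbstractString widening.
sketch_accepted(T::Type) = T
sketch_accepted(::Type{String}) = AbstractString

# `table_fields` maps column names to column types; `declared` is a vector of
# `field_name => declared_type` pairs. Returns the first violation, or `nothing`.
function sketch_find_violation(table_fields::AbstractDict, declared::AbstractVector)
    for (f, F) in declared
        A = sketch_accepted(F)
        if haskey(table_fields, f)
            T = table_fields[f]
            T <: A || return f => T          # present, but with an unaccepted type
        else
            Missing <: A || return f => missing  # absent, and `missing` is not accepted
        end
    end
    return nothing
end

declared = [:x => Int, :y => String, :z => Union{Missing,Float64}]

println(sketch_find_violation(Dict(:x => Int, :y => SubString{String}), declared))
println(sketch_find_violation(Dict(:x => Float64, :y => String), declared))
println(sketch_find_violation(Dict(:x => Int), declared))
```

In the first call the column `z` is absent, yet there is no violation because its declared type accepts `missing`; in the third call the absent column `y` is a violation because `AbstractString` does not.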
To return all violations instead of just the first, use Legolas.find_violations.
See also: Legolas.validate, Legolas.complies_with, Legolas.find_violations.
Legolas.find_violations — Function
Legolas.find_violations(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Return a Vector{Pair{Symbol,Union{Type,Missing}}} of all of ts's violations with respect to sv.
This function's notion of "violation" is defined by Legolas.find_violation, which immediately returns the first violation found; prefer to use that function instead of find_violations in situations where you only need to detect any violation instead of all violations.
See also: Legolas.validate, Legolas.complies_with, Legolas.find_violation.
Legolas.complies_with — Function
Legolas.complies_with(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Return isnothing(find_violation(ts, sv)).
See also: Legolas.find_violation, Legolas.find_violations, Legolas.validate
Legolas.validate — Function
Legolas.validate(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Throw a descriptive ArgumentError if any violations are found; otherwise, return nothing.
See also: Legolas.find_violation, Legolas.find_violations, Legolas.complies_with
Legolas.accepted_field_type — Function
Legolas.accepted_field_type(sv::Legolas.SchemaVersion, T::Type)
Return the "maximal supertype" of T that is accepted by sv when evaluating a field of type >:T for schematic compliance via Legolas.find_violation; see that function's docstring for an explanation of this function's use in context.
SchemaVersion authors may overload this function to broaden particular type constraints that determine schematic compliance for their SchemaVersion, without needing to broaden the type constraints employed by their SchemaVersion's record type.
Legolas itself defines the following default overloads:
accepted_field_type(::SchemaVersion, T::Type) = T
accepted_field_type(::SchemaVersion, ::Type{Any}) = Any
accepted_field_type(::SchemaVersion, ::Type{UUID}) = Union{UUID,UInt128}
accepted_field_type(::SchemaVersion, ::Type{Symbol}) = Union{Symbol,AbstractString}
accepted_field_type(::SchemaVersion, ::Type{String}) = AbstractString
accepted_field_type(sv::SchemaVersion, ::Type{<:Vector{T}}) where T = AbstractVector{<:(accepted_field_type(sv, T))}
accepted_field_type(::SchemaVersion, ::Type{Vector}) = AbstractVector
accepted_field_type(sv::SchemaVersion, ::Type{Union{T,Missing}}) where {T} = Union{accepted_field_type(sv, T),Missing}
Outside of these default overloads, this function should only be overloaded against specific SchemaVersions that are authored within the same module as the overload definition; to do otherwise constitutes type piracy and should be avoided.
Validating/Writing/Reading Legolas Tables
Legolas.extract_schema_version — Function
Legolas.extract_schema_version(table)
Attempt to extract Arrow metadata from table via Arrow.getmetadata(table).
If Arrow metadata is present and contains "legolas_schema_qualified" => s, return first(parse_identifier(s)). Otherwise, return nothing.
Legolas.write — Function
Legolas.write(io_or_path, table, sv::SchemaVersion; validate::Bool=true, kwargs...)
Write table to io_or_path, inserting the appropriate legolas_schema_qualified field in the written-out Arrow metadata.
If validate is true, Legolas.validate(Tables.schema(table), sv) will be invoked before the table is written out to io_or_path.
Any other provided kwargs are forwarded to an internal invocation of Arrow.write.
Note that io_or_path may be any type that supports Base.write(io_or_path, bytes::Vector{UInt8}).
Legolas.read — Function
Legolas.read(io_or_path; validate::Bool=true)
Read and return an Arrow.Table from io_or_path.
If validate is true, Legolas.read will attempt to extract a Legolas.SchemaVersion from the deserialized Arrow.Table's metadata and use Legolas.validate to verify that the table's Tables.Schema complies with the extracted Legolas.SchemaVersion before returning the table.
Note that io_or_path may be any type that supports Base.read(io_or_path)::Vector{UInt8}.
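A write/read round trip ties together the functions documented in this section. The following is a sketch that assumes the Legolas package is installed and that the code runs at top-level scope; the schema name "example.point" and the `Point` prefix are hypothetical, and the table is a plain NamedTuple of column vectors.

```julia
using Legolas
using Legolas: @schema, @version

# Hypothetical schema used only for this illustration.
@schema "example.point" Point

@version PointV1 begin
    x::Int
    y::Float64
end

# A column table whose schema complies with PointV1.
table = (x = [1, 3], y = [2.0, 4.0])

# Write to an in-memory buffer; validation runs before writing by default.
io = IOBuffer()
Legolas.write(io, table, PointV1SchemaVersion())

# Read the bytes back; validation runs against the schema version found in the metadata.
roundtripped = Legolas.read(seekstart(io))

println(collect(roundtripped.x))
println(Legolas.extract_schema_version(roundtripped))
```

The schema version recovered by `extract_schema_version` comes from the `legolas_schema_qualified` metadata entry inserted by `Legolas.write`.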
Legolas.tobuffer — Function
Legolas.tobuffer(args...; kwargs...)
A convenience function that constructs a fresh io::IOBuffer, calls Legolas.write(io, args...; kwargs...), and returns seekstart(io).
Analogous to the Arrow.tobuffer function.
Utilities
Legolas.lift — Function
lift(f, x)
Return f(x) unless x isa Union{Nothing,Missing}, in which case return missing.
This is particularly useful when handling values from Arrow.Table, whose null values may present as either missing or nothing depending on how the table itself was originally constructed.
See also: construct
lift(f)
Returns a curried function, x -> lift(f,x)
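The semantics described for lift can be sketched in a few lines of plain Julia. `sketch_lift` is a hypothetical re-creation for illustration only, not the Legolas definition; it shows both the two-argument form and the curried form.

```julia
# Apply `f` to ordinary values...
sketch_lift(f, x) = f(x)
# ...but map both kinds of "null" to `missing` without calling `f`.
sketch_lift(f, ::Union{Nothing,Missing}) = missing
# Curried form: returns a function of one argument.
sketch_lift(f) = x -> sketch_lift(f, x)

println(sketch_lift(uppercase, "abc"))
println(sketch_lift(uppercase, nothing))

# The curried form is convenient with `map` over a column containing nulls.
column = [1, missing, nothing]
println(map(sketch_lift(x -> x + 1), column))
```

Note that `nothing` and `missing` inputs both come out as `missing`, which normalizes the two null representations mentioned above.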
Legolas.construct — Function
construct(T::Type, x)
Construct T(x) unless x is of type T, in which case return x itself. Useful in conjunction with the lift function for types that lack a constructor accepting an instance of the type itself (e.g. T(::T)).
Examples
julia> using Legolas: construct
julia> construct(Float64, 1)
1.0
julia> Some(Some(1))
Some(Some(1))
julia> construct(Some, Some(1))
Some(1)
Use the curried form when using lift:
julia> using Legolas: lift, construct
julia> lift(Some, Some(1))
Some(Some(1))
julia> lift(construct(Some), Some(1))
Some(1)
Legolas.record_merge — Function
record_merge(record::AbstractRecord; fields_to_merge...)
Return a new AbstractRecord with the same schema version as record, whose fields are computed via Tables.rowmerge(record; fields_to_merge...). The returned record is constructed by passing these merged fields to the AbstractRecord constructor that matches the type of the input record.
Legolas.gather — Function
Legolas.gather(column_name, tables...; extract=((table, idxs) -> view(table, idxs, :)))
Gather rows from tables into a unified cross-table index along column_name. Returns a Dict whose keys are the unique values of column_name across tables, and whose values are tuples of the form:
(rows_matching_key_in_table_1, rows_matching_key_in_table_2, ...)
The provided extract function is used to extract rows from each table; it takes as input a table and a Vector{Int} of row indices, and returns the corresponding subtable. The default definition is sufficient for DataFrames tables.
Note that this function may internally call Tables.columns on each input table, so it may be slower and/or require more memory if any(!Tables.columnaccess, tables).
Note that we intend to eventually migrate this function from Legolas.jl to a more appropriate package.
Legolas.locations — Function
locations(collections::Tuple)
Return a Dict whose keys are the set of all elements across all provided collections, and whose values are the indices that locate each corresponding element across all provided collections.
Specifically, locations(collections)[k][i] will return a Vector{Int} whose elements are the index locations of k in collections[i]. If !(k in collections[i]), this Vector{Int} will be empty.
For example:
julia> Legolas.locations((['a', 'b', 'c', 'f', 'b'],
                          ['d', 'c', 'e', 'b'],
                          ['f', 'a', 'f']))
Dict{Char, Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}} with 6 entries:
'f' => ([4], [], [1, 3])
'a' => ([1], [], [2])
'c' => ([3], [2], [])
'd' => ([], [1], [])
'e' => ([], [3], [])
'b' => ([2, 5], [4], [])
This function is useful as a building block for higher-level tabular operations that require indexing/grouping along specific sets of elements.
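To make the shape of the returned Dict concrete, here is a plain-Julia sketch that reproduces the documented behavior on the example above. `sketch_locations` is a hypothetical helper, not the Legolas implementation, and it assumes each collection is 1-indexed (as a `Vector` is) so that `enumerate` yields valid index locations.

```julia
function sketch_locations(collections::Tuple)
    n = length(collections)
    result = Dict{Any,NTuple{n,Vector{Int}}}()
    for (i, collection) in enumerate(collections)
        for (j, item) in enumerate(collection)
            # On first sight of `item`, allocate one empty index vector per collection.
            slots = get!(() -> ntuple(_ -> Int[], n), result, item)
            push!(slots[i], j)
        end
    end
    return result
end

locs = sketch_locations((['a', 'b', 'c', 'f', 'b'],
                         ['d', 'c', 'e', 'b'],
                         ['f', 'a', 'f']))

println(locs['b'])  # positions of 'b' in each of the three collections
println(locs['f'])
```

Every key maps to a tuple with exactly one index vector per input collection, so a lookup such as `locs[k][i]` never needs a separate membership check.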
Legolas.materialize — Function
Legolas.materialize(table)
Return a fully deserialized copy of table.
This function is useful when table has built-in deserialize-on-access or conversion-on-access behavior (like Arrow.Table) and you'd like to pay such access costs upfront before repeatedly accessing the table.
Note that we intend to eventually migrate this function from Legolas.jl to a more appropriate package.