API Documentation
If you're a newcomer to Legolas.jl, please familiarize yourself with the tour before diving into this documentation.
Legolas Schemas
Legolas.SchemaVersion — Type
Legolas.SchemaVersion{name,version}
A type representing a particular version of a Legolas schema. The relevant name (a Symbol) and version (an Integer) are surfaced as type parameters, allowing them to be used for dispatch.
For more details and examples, please see Legolas.jl/examples/tour.jl and the "Schema-Related Concepts/Conventions" section of the Legolas.jl documentation.
The constructor SchemaVersion{name,version}() will throw an ArgumentError if version is negative.
See also: Legolas.@schema
Legolas.@schema — Macro
@schema "name" Prefix
Declare a Legolas schema with the given name. Types generated by subsequent @version declarations for this schema will be prefixed with Prefix.
For more details and examples, please see Legolas.jl/examples/tour.jl.
Legolas.@version — Macro
@version RecordType begin
    declared_field_expression_1
    declared_field_expression_2
    ⋮
end
@version RecordType > ParentRecordType begin
    declared_field_expression_1
    declared_field_expression_2
    ⋮
end
Given a prior @schema declaration of the form:
@schema "example.name" Name
...the nth version of example.name can be declared in the same module via a @version declaration of the form:
@version NameV$(n) begin
    declared_field_expression_1
    declared_field_expression_2
    ⋮
end
...which generates type definitions for the NameV$(n) type (a Legolas.AbstractRecord subtype) and the NameV$(n)SchemaVersion type (an alias of typeof(SchemaVersion("example.name", n))), as well as the necessary definitions to overload relevant Legolas methods with specialized behaviors in accordance with the declared fields.
If the declared schema version has a parent, it should be specified via the optional > ParentRecordType clause. ParentRecordType should refer directly to an existing Legolas-generated record type.
Each declared_field_expression declares a field of the schema version, and is an expression of the form field::F = rhs where:
- field is the corresponding field's name.
- ::F denotes the field's type constraint (if elided, defaults to ::Any).
- rhs is the expression which produces field::F (if elided, defaults to field).
Accounting for all of the aforementioned allowed elisions, valid declared_field_expressions include:
- field::F = rhs
- field::F (interpreted as field::F = field)
- field = rhs (interpreted as field::Any = rhs)
- field (interpreted as field::Any = field)
F is generally a type literal, but may also be an expression of the form (<:T), in which case the declared schema version's generated record type will expose a type parameter (constrained to be a subtype of T) for the given field. For example:
julia> @schema "example.foo" Foo
julia> @version FooV1 begin
    x::Int
    y::(<:Real)
end
julia> FooV1(x=1, y=2.0)
FooV1{Float64}: (x = 1, y = 2.0)
julia> FooV1{Float32}(x=1, y=2)
FooV1{Float32}: (x = 1, y = 2.0f0)
julia> FooV1(x=1, y="bad")
ERROR: TypeError: in FooV1, in _y_T, expected _y_T<:Real, got Type{String}
This macro will throw a Legolas.SchemaVersionDeclarationError if:
- The provided RecordType does not follow the $(Prefix)V$(n) format, where Prefix was previously associated with a given schema by a prior @schema declaration.
- There are no declared field expressions, duplicate fields are declared, or a given declared field expression is invalid.
- (if a parent is specified) The @version declaration does not comply with its parent's @version declaration, or the parent hasn't yet been declared at all.
Note that this macro expects to be evaluated within top-level scope.
For more details and examples, please see Legolas.jl/examples/tour.jl and the "Schema-Related Concepts/Conventions" section of the Legolas.jl documentation.
Legolas.@check — Macro
@check expr
Define a constraint for a schema version (e.g. @check x > 0) from a boolean expression. The expr should evaluate to true if the constraint is met or false if the constraint is violated. Multiple constraints may be defined for a schema version. All @check constraints defined within a @version declaration must come after all fields declared by the schema version.
For more details and examples, please see Legolas.jl/examples/tour.jl.
Legolas.is_valid_schema_name — Function
Legolas.is_valid_schema_name(x::AbstractString)
Return true if x is a valid schema name, return false otherwise.
Valid schema names are lowercase, alphanumeric, and may contain hyphens or periods.
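The naming rule above can be illustrated with a small standalone check. This is only a sketch of the documented rule, not Legolas.jl's own code; the names SKETCH_ALLOWED_CHARS and sketch_is_valid_schema_name are hypothetical and exist purely for illustration.

```julia
# Hypothetical helper, not part of Legolas.jl: mirrors the documented rule that
# valid schema names use only lowercase letters, digits, hyphens, and periods.
const SKETCH_ALLOWED_CHARS = Set(Char['-', '.', 'a':'z'..., '0':'9'...])

# A name passes when every character belongs to the allowed set.
sketch_is_valid_schema_name(x::AbstractString) = all(in(SKETCH_ALLOWED_CHARS), x)

sketch_is_valid_schema_name("example.foo-bar2")  # true
sketch_is_valid_schema_name("Example.Foo")       # false (uppercase letters)
sketch_is_valid_schema_name("example_foo")       # false (underscore)
```

In real code, prefer calling Legolas.is_valid_schema_name directly so that any details of the rule not captured here are respected.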
Legolas.parse_identifier — Function
Legolas.parse_identifier(id::AbstractString)
Given a valid schema version identifier id of the form:
$(names[1])@$(versions[1]) > $(names[2])@$(versions[2]) > ... > $(names[n])@$(versions[n])
return an n element Vector{SchemaVersion} whose ith element is SchemaVersion(names[i], versions[i]).
Throws an ArgumentError if the provided string is not a valid schema version identifier.
For details regarding valid schema version identifiers and their structure, see the "Schema-Related Concepts/Conventions" section of the Legolas.jl documentation.
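To make the identifier format concrete, here is a minimal standalone sketch that splits such a string into (name, version) pairs. It is not Legolas.jl's implementation: sketch_parse_identifier is a hypothetical name, it returns plain tuples rather than SchemaVersion instances, and it does not check that each name is a valid schema name.

```julia
# Hypothetical sketch, not Legolas.jl's implementation: split an identifier of the
# documented form "name@version > name@version > ..." into (name, version) pairs.
function sketch_parse_identifier(id::AbstractString)
    parsed = Tuple{Symbol,Int}[]
    for part in split(id, '>')
        pieces = split(strip(part), '@')
        length(pieces) == 2 || throw(ArgumentError("malformed segment: $part"))
        name, version = pieces
        n = tryparse(Int, version)
        # Reject versions that are not non-negative integers.
        (n === nothing || n < 0) && throw(ArgumentError("invalid version: $version"))
        push!(parsed, (Symbol(name), n))
    end
    return parsed
end

sketch_parse_identifier("example.bar@3 > example.foo@1")
# yields (Symbol("example.bar"), 3) followed by (Symbol("example.foo"), 1)
```

The pairs come back in the same left-to-right order as they appear in the identifier string.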
Legolas.name — Function
Legolas.name(::Legolas.SchemaVersion{n})
Return n.
Legolas.version — Function
Legolas.version(::Legolas.SchemaVersion{n,v})
Return v.
Legolas.identifier — Function
Legolas.identifier(::Legolas.SchemaVersion)
Return this Legolas.SchemaVersion's fully qualified schema version identifier. This string is serialized as the "legolas_schema_qualified" field value in table metadata for tables written via Legolas.write.
Legolas.schema_provider — Function
Legolas.schema_provider(::SchemaVersion)
Return a NamedTuple with keys name and version. The name is a Symbol corresponding to the package which defines the schema version, if known; otherwise nothing. Likewise, the version is a VersionNumber or nothing.
Legolas.parent — Function
Legolas.parent(sv::Legolas.SchemaVersion)
Return the Legolas.SchemaVersion instance that corresponds to sv's declared parent.
Legolas.declared_fields — Function
Legolas.declared_fields(sv::Legolas.SchemaVersion)
Return a NamedTuple{...,Tuple{Vararg{DataType}}} whose fields take the form:
<name of field declared by `sv`> = <field's type>
If sv has a parent, the returned fields will include declared_fields(parent(sv)).
Legolas.declaration — Function
Legolas.declaration(sv::Legolas.SchemaVersion)
Return a Pair{String,Vector{NamedTuple}} of the form
schema_version_identifier::String => declared_field_infos::Vector{Legolas.DeclaredFieldInfo}
where DeclaredFieldInfo has the fields:
- name::Symbol: the declared field's name
- type::Union{Symbol,Expr}: the declared field's declared type constraint
- parameterize::Bool: whether or not the declared field is exposed as a parameter
- statement::Expr: the declared field's full assignment statement (as processed by @version, not necessarily as written)
Note that declaration is primarily intended to be used for interactive discovery purposes, and does not include the contents of declaration(parent(sv)).
Legolas.record_type — Function
Legolas.record_type(sv::Legolas.SchemaVersion)
Return the Legolas.AbstractRecord subtype associated with sv.
See also: Legolas.schema_version_from_record
Legolas.schema_version_from_record — Function
Legolas.schema_version_from_record(record::Legolas.AbstractRecord)
Return the Legolas.SchemaVersion instance associated with record.
See also: Legolas.record_type
Legolas.declared — Function
Legolas.declared(sv::Legolas.SchemaVersion{name,version})
Return true if the schema version name@version has been declared via @version in the current Julia session; return false otherwise.
Legolas.find_violation — Function
Legolas.find_violation(ts::Tables.Schema, sv::Legolas.SchemaVersion)
For each field f::F declared by sv:
- Define A = Legolas.accepted_field_type(sv, F)
- If f::T is present in ts, ensure that T <: A or else immediately return f::Symbol => T::DataType.
- If f isn't present in ts, ensure that Missing <: A or else immediately return f::Symbol => missing::Missing.
Otherwise, return nothing.
To return all violations instead of just the first, use Legolas.find_violations.
See also: Legolas.validate, Legolas.complies_with, Legolas.find_violations.
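The per-field rule described for find_violation can be sketched without any dependencies. In this hypothetical sketch, plain NamedTuples of types stand in for both Tables.Schema and the schema version's declared fields, and the accepted keyword stands in for Legolas.accepted_field_type (defaulting to the identity); none of these names come from Legolas.jl.

```julia
# Hypothetical sketch of the documented rule, using plain NamedTuples in place of
# Tables.Schema and SchemaVersion. `accepted` stands in for accepted_field_type.
function sketch_find_violation(table_types::NamedTuple, declared::NamedTuple; accepted=identity)
    for (f, F) in pairs(declared)
        A = accepted(F)
        if haskey(table_types, f)
            # Field is present: its type must be a subtype of the accepted type.
            T = table_types[f]
            T <: A || return f => T
        else
            # Field is absent: only acceptable if the field may be missing.
            Missing <: A || return f => missing
        end
    end
    return nothing
end

declared = (x=Int, y=Union{Missing,String})
sketch_find_violation((x=Int, y=String), declared)      # nothing
sketch_find_violation((x=Float64, y=String), declared)  # :x => Float64
sketch_find_violation((y=String,), declared)            # :x => missing
sketch_find_violation((x=Int,), declared)               # nothing (y may be missing)
```

Note that columns present in the table but not declared are simply ignored by this sketch, since the rule only iterates over declared fields.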
Legolas.find_violations — Function
Legolas.find_violations(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Return a Vector{Pair{Symbol,Union{Type,Missing}}} of all of ts's violations with respect to sv.
This function's notion of "violation" is defined by Legolas.find_violation, which immediately returns the first violation found; prefer to use that function instead of find_violations in situations where you only need to detect any violation instead of all violations.
See also: Legolas.validate, Legolas.complies_with, Legolas.find_violation.
Legolas.complies_with — Function
Legolas.complies_with(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Return isnothing(find_violation(ts, sv)).
See also: Legolas.find_violation, Legolas.find_violations, Legolas.validate
Legolas.validate — Function
Legolas.validate(ts::Tables.Schema, sv::Legolas.SchemaVersion)
Throw a descriptive ArgumentError if any violations are found; otherwise return nothing.
See also: Legolas.find_violation, Legolas.find_violations, Legolas.complies_with
Legolas.accepted_field_type — Function
Legolas.accepted_field_type(sv::Legolas.SchemaVersion, T::Type)
Return the "maximal supertype" of T that is accepted by sv when evaluating a field of type >:T for schematic compliance via Legolas.find_violation; see that function's docstring for an explanation of this function's use in context.
SchemaVersion authors may overload this function to broaden particular type constraints that determine schematic compliance for their SchemaVersion, without needing to broaden the type constraints employed by their SchemaVersion's record type.
Legolas itself defines the following default overloads:
accepted_field_type(::SchemaVersion, T::Type) = T
accepted_field_type(::SchemaVersion, ::Type{Any}) = Any
accepted_field_type(::SchemaVersion, ::Type{UUID}) = Union{UUID,UInt128}
accepted_field_type(::SchemaVersion, ::Type{Symbol}) = Union{Symbol,AbstractString}
accepted_field_type(::SchemaVersion, ::Type{String}) = AbstractString
accepted_field_type(sv::SchemaVersion, ::Type{<:Vector{T}}) where T = AbstractVector{<:(accepted_field_type(sv, T))}
accepted_field_type(::SchemaVersion, ::Type{Vector}) = AbstractVector
accepted_field_type(sv::SchemaVersion, ::Type{Union{T,Missing}}) where {T} = Union{accepted_field_type(sv, T),Missing}
Outside of these default overloads, this function should only be overloaded against specific SchemaVersions that are authored within the same module as the overload definition; to do otherwise constitutes type piracy and should be avoided.
Validating/Writing/Reading Legolas Tables
Legolas.extract_schema_version — Function
Legolas.extract_schema_version(table)
Attempt to extract Arrow metadata from table via Arrow.getmetadata(table).
If Arrow metadata is present and contains "legolas_schema_qualified" => s, return first(parse_identifier(s)).
Otherwise, return nothing.
Legolas.write — Function
Legolas.write(io_or_path, table, sv::SchemaVersion; validate::Bool=true, kwargs...)
Write table to io_or_path, inserting the appropriate legolas_schema_qualified field in the written out Arrow metadata.
If validate is true, Legolas.validate(Tables.schema(table), sv) will be invoked before the table is written out to io_or_path.
Any other provided kwargs are forwarded to an internal invocation of Arrow.write.
Note that io_or_path may be any type that supports Base.write(io_or_path, bytes::Vector{UInt8}).
Legolas.read — Function
Legolas.read(io_or_path; validate::Bool=true)
Read and return an Arrow.Table from io_or_path.
If validate is true, Legolas.read will attempt to extract a Legolas.SchemaVersion from the deserialized Arrow.Table's metadata and use Legolas.validate to verify that the table's Tables.Schema complies with the extracted Legolas.SchemaVersion before returning the table.
Note that io_or_path may be any type that supports Base.read(io_or_path)::Vector{UInt8}.
Legolas.tobuffer — Function
Legolas.tobuffer(args...; kwargs...)
A convenience function that constructs a fresh io::IOBuffer, calls Legolas.write(io, args...; kwargs...), and returns seekstart(io).
Analogous to the Arrow.tobuffer function.
Utilities
Legolas.lift — Function
lift(f, x)
Return f(x) unless x isa Union{Nothing,Missing}, in which case return missing.
This is particularly useful when handling values from Arrow.Table, whose null values may present as either missing or nothing depending on how the table itself was originally constructed.
See also: construct
lift(f)
Return a curried function, x -> lift(f, x).
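The two documented forms of lift can be restated as a few lines of plain Julia. This is a hypothetical re-statement of the behavior described above (sketch_lift is not a Legolas.jl name), intended only to show how the null-handling and the curried form fit together.

```julia
# Hypothetical re-statement of the documented behavior (not Legolas.jl's source).
sketch_lift(f, x) = f(x)                             # ordinary values: apply f
sketch_lift(f, ::Union{Nothing,Missing}) = missing   # nothing/missing: return missing
sketch_lift(f) = x -> sketch_lift(f, x)              # curried form

sketch_lift(sqrt, 4.0)      # 2.0
sketch_lift(sqrt, nothing)  # missing
sketch_lift(sqrt, missing)  # missing

# The curried form is convenient with map over a column containing nulls:
map(sketch_lift(string), [1, missing, nothing])  # "1", missing, missing
```

The curried form avoids writing an anonymous function at each call site when transforming whole columns.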
Legolas.construct — Function
construct(T::Type, x)
Construct T(x) unless x is of type T, in which case return x itself. Useful in conjunction with the lift function for types which don't have a constructor which accepts instances of itself (e.g. T(::T)).
Examples
julia> using Legolas: construct
julia> construct(Float64, 1)
1.0
julia> Some(Some(1))
Some(Some(1))
julia> construct(Some, Some(1))
Some(1)
Use the curried form when using lift:
julia> using Legolas: lift, construct
julia> lift(Some, Some(1))
Some(Some(1))
julia> lift(construct(Some), Some(1))
Some(1)
Legolas.record_merge — Function
record_merge(record::AbstractRecord; fields_to_merge...)
Return a new AbstractRecord with the same schema version as record, whose fields are computed via Tables.rowmerge(record; fields_to_merge...). The returned record is constructed by passing these merged fields to the AbstractRecord constructor that matches the type of the input record.
Legolas.gather — Function
Legolas.gather(column_name, tables...; extract=((table, idxs) -> view(table, idxs, :)))
Gather rows from tables into a unified cross-table index along column_name. Returns a Dict whose keys are the unique values of column_name across tables, and whose values are tuples of the form:
(rows_matching_key_in_table_1, rows_matching_key_in_table_2, ...)
The provided extract function is used to extract rows from each table; it takes as input a table and a Vector{Int} of row indices, and returns the corresponding subtable. The default definition is sufficient for DataFrames tables.
Note that this function may internally call Tables.columns on each input table, so it may be slower and/or require more memory if any(!Tables.columnaccess, tables).
Note that we intend to eventually migrate this function from Legolas.jl to a more appropriate package.
Legolas.locations — Function
locations(collections::Tuple)
Return a Dict whose keys are the set of all elements across all provided collections, and whose values are the indices that locate each corresponding element across all provided collections.
Specifically, locations(collections)[k][i] will return a Vector{Int} whose elements are the index locations of k in collections[i]. If !(k in collections[i]), this Vector{Int} will be empty.
For example:
julia> Legolas.locations((['a', 'b', 'c', 'f', 'b'],
                          ['d', 'c', 'e', 'b'],
                          ['f', 'a', 'f']))
Dict{Char, Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}} with 6 entries:
'f' => ([4], [], [1, 3])
'a' => ([1], [], [2])
'c' => ([3], [2], [])
'd' => ([], [1], [])
'e' => ([], [3], [])
'b' => ([2, 5], [4], [])
This function is useful as a building block for higher-level tabular operations that require indexing/grouping along specific sets of elements.
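For readers who want to see how such an index could be assembled, the following is a minimal sketch of the described behavior. It is not Legolas.jl's implementation; sketch_locations is a hypothetical name, and the sketch assumes each collection is 1-indexed (such as a Vector) and uses an untyped Dict key for simplicity.

```julia
# Hypothetical sketch, not Legolas.jl's implementation: build a Dict mapping each
# element to a tuple of index vectors, one vector per input collection.
function sketch_locations(collections::Tuple)
    N = length(collections)
    result = Dict{Any,NTuple{N,Vector{Int}}}()
    for (i, collection) in enumerate(collections)
        for (j, item) in enumerate(collection)
            # On first sight of an item, create one empty index vector per collection.
            slots = get!(() -> ntuple(_ -> Int[], N), result, item)
            push!(slots[i], j)
        end
    end
    return result
end

locs = sketch_locations((['a', 'b', 'c', 'f', 'b'],
                         ['d', 'c', 'e', 'b'],
                         ['f', 'a', 'f']))
locs['b']  # ([2, 5], [4], Int64[])
locs['f']  # ([4], Int64[], [1, 3])
```

Grouping rows across tables by key, as Legolas.gather is described as doing, can then be expressed as a lookup into an index of this shape.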
Legolas.materialize — Function
Legolas.materialize(table)
Return a fully deserialized copy of table.
This function is useful when table has built-in deserialize-on-access or conversion-on-access behavior (like Arrow.Table) and you'd like to pay such access costs upfront before repeatedly accessing the table.
Note that we intend to eventually migrate this function from Legolas.jl to a more appropriate package.