API Documentation

If you're a newcomer to Legolas.jl, please familiarize yourself with it via the tour before diving into this documentation.

Legolas Schemas and Rows

Legolas.@row - Macro
@row("name@version", field_expressions...)
@row("name@version" > "parent_name@parent_version", field_expressions...)

Define a new Legolas.Schema{name,version} whose required fields are specified by field_expressions. Returns Legolas.Row{Legolas.Schema{name,version}}, which can be aliased to the caller's preferred binding to serve as a row constructor for Legolas.Schema{name,version}.

Each element of field_expressions defines a required field for Legolas.Schema{name,version}, and is an expression of the form field::F = rhs where:

  • field is the corresponding field's name.
  • ::F denotes the field's type constraint (if elided, defaults to ::Any).
  • rhs is the expression which produces field::F (if elided, defaults to field).

As implied above, the following alternative forms are also allowed:

  • field::F (interpreted as field::F = field)
  • field = rhs (interpreted as field::Any = rhs)
  • field (interpreted as field::Any = field)

For more details and examples, please see Legolas.jl/examples/tour.jl and the "Tips for Schema Authors" section of the Legolas.jl documentation.
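As a minimal sketch of the field expression forms listed above, the following declares a hypothetical schema and builds one row from keyword arguments. The schema name example.measurement@1, the binding Measurement, and the field names are made up for illustration, and the sketch assumes a Legolas.jl release that provides @row.

```julia
using Legolas
using Legolas: @row

# Hypothetical schema; the name and fields are illustrative only.
const Measurement = @row("example.measurement@1",
                         id::String,                       # form: field::F
                         value::Float64 = Float64(value),  # form: field::F = rhs
                         note)                             # form: field

# The returned Row type is used here as a keyword constructor.
m = Measurement(id="m1", value=3, note="ok")

# Fields are read back as properties of the row.
(m.id, m.value, m.note)
```

Aliasing the result of @row to a const binding, as done here, is the usage the description above suggests; the tour remains the authoritative reference for real schemas.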

Legolas.Row - Type
Legolas.Row(schema::Schema; fields...)
Legolas.Row(schema::Schema, row)

Return a Legolas.Row <: Tables.AbstractRow instance whose fields are the provided fields (or the fields of row) validated/transformed in accordance with provided schema.

For more details and examples, please see Legolas.jl/examples/tour.jl.

Legolas.Schema - Type
Legolas.Schema{name,version}

A type representing the schema of a Legolas.Row. The name (a Symbol) and version (an Integer) are surfaced as type parameters, allowing them to be utilized for dispatch.

For more details and examples, please see Legolas.jl/examples/tour.jl and the "Tips for Schema Authors" section of the Legolas.jl documentation.

See also: schema_name, schema_version, schema_parent

Legolas.is_valid_schema_name - Function
Legolas.is_valid_schema_name(x::AbstractString)

Return true if x is a valid schema name, return false otherwise.

Valid schema names are lowercase, alphanumeric, and may contain hyphens or periods.
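The naming rule can be approximated in plain Julia as follows. This is a sketch written from the sentence above, not the library's implementation: the helper names is_allowed_char and looks_like_schema_name are hypothetical, "alphanumeric" is assumed to mean ASCII a-z and 0-9, and edge cases such as the empty string are not covered by the description.

```julia
# Hypothetical approximation of the stated rule (ASCII-only assumption).
is_allowed_char(c::Char) =
    ('a' <= c <= 'z') || ('0' <= c <= '9') || c == '-' || c == '.'

# A name passes if every character is allowed.
looks_like_schema_name(x::AbstractString) = all(is_allowed_char, x)

looks_like_schema_name("onda.signal")  # true
looks_like_schema_name("my-schema")    # true
looks_like_schema_name("My-Schema")    # false: uppercase letters
looks_like_schema_name("my_schema")    # false: underscore
```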

Legolas.schema_name - Function
schema_name(::Type{<:Legolas.Schema{name}})
schema_name(::Legolas.Schema{name})

Return name.

Legolas.schema_version - Function
schema_version(::Type{Legolas.Schema{name,version}})
schema_version(::Legolas.Schema{name,version})

Return version.

Legolas.schema_qualified_string - Function
schema_qualified_string(::Legolas.Schema{name,version})

Return this Legolas.Schema's fully qualified schema identifier string. This string is serialized as the "legolas_schema_qualified" field value in table metadata for tables written via Legolas.write.

Legolas.schema_parent - Function
schema_parent(::Type{Legolas.Schema{name,version}})
schema_parent(::Legolas.Schema{name,version})

Return the Legolas.Schema instance that corresponds to the parent of the given Legolas.Schema.

Validating/Writing/Reading Legolas Tables

Legolas.extract_schema - Function
Legolas.extract_schema(table)

Attempt to extract Arrow metadata from table via Arrow.getmetadata(table).

If Arrow metadata is present and contains "legolas_schema_qualified" => s, return Legolas.Schema(s).

Otherwise, return nothing.

Legolas.validate - Function
Legolas.validate(tables_schema::Tables.Schema, legolas_schema::Legolas.Schema)

Throws an ArgumentError if tables_schema does not comply with legolas_schema, otherwise returns nothing.

Specifically, tables_schema is considered to comply with legolas_schema if:

  • every field required by legolas_schema whose type is not >:Missing is present in tables_schema.
  • T <: S for each field f::T in tables_schema that matches a required legolas_schema field f::S.

Legolas.validate(table, legolas_schema::Legolas.Schema)

Attempt to determine s::Tables.Schema from table and return Legolas.validate(s, legolas_schema).

If a Tables.Schema cannot be determined, a warning message is logged and nothing is returned.

Legolas.validate(table)

If Legolas.extract_schema(table) returns a valid Legolas.Schema, return Legolas.validate(table, Legolas.extract_schema(table)).

Otherwise, if a Legolas.Schema isn't found or is invalid, an ArgumentError is thrown.
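The two compliance conditions given for Legolas.validate(tables_schema, legolas_schema) can be illustrated with a small plain-Julia sketch. The function complies_sketch and its Dict-based inputs are hypothetical stand-ins for Tables.Schema and Legolas.Schema, and it returns a Bool, whereas the real function throws an ArgumentError on non-compliance.

```julia
# Hypothetical helper: both arguments map field name => type.
function complies_sketch(table_fields::AbstractDict, required_fields::AbstractDict)
    for (name, S) in required_fields
        if haskey(table_fields, name)
            # A present field f::T must satisfy T <: S.
            table_fields[name] <: S || return false
        elseif !(S >: Missing)
            # An absent field is acceptable only if its required type admits missing.
            return false
        end
    end
    return true
end

required = Dict(:a => Real, :b => Union{Missing,String})

complies_sketch(Dict(:a => Float64), required)               # true: b may be absent
complies_sketch(Dict(:b => String), required)                # false: a is absent
complies_sketch(Dict(:a => String, :b => String), required)  # false: String is not <: Real
```

Note that fields present in the table but not required by the schema are not checked, which matches the conditions as stated.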

Legolas.write - Function
Legolas.write(io_or_path, table, schema::Schema; validate::Bool=true, kwargs...)

Write table to io_or_path, inserting the appropriate legolas_schema_qualified field in the written-out Arrow metadata.

If validate is true, Legolas.validate will be called on the table before it is written out.

Any other provided kwargs are forwarded to an internal invocation of Arrow.write.

Note that io_or_path may be any type that supports Base.write(io_or_path, bytes::Vector{UInt8}).

Legolas.read - Function
Legolas.read(io_or_path; validate::Bool=true)

Read and return an Arrow.Table from io_or_path.

If validate is true, Legolas.validate will be called on the table before it is returned.

Note that io_or_path may be any type that supports Base.read(io_or_path)::Vector{UInt8}.
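A write/read round trip might look like the sketch below. The schema example.point@1, the binding Point, the column names, and the file name are hypothetical; the sketch assumes a Legolas.jl release that provides @row and uses a file path for io_or_path.

```julia
using Legolas
using Legolas: @row

# Hypothetical schema, declared so that validation knows the required fields.
const Point = @row("example.point@1", x::Int, y::String)

# A NamedTuple of column vectors is a Tables.jl-compatible table.
table = (x=[1, 2], y=["a", "b"])
path = joinpath(mktempdir(), "points.arrow")

# Validates the table against the schema (validate=true by default), then writes it.
Legolas.write(path, table, Legolas.Schema("example.point@1"))

# Reads the file back as an Arrow.Table, validating it by default.
roundtripped = Legolas.read(path)

# Columns are available as properties of the returned table.
collect(roundtripped.x)
```

Passing validate=false to either call skips the corresponding Legolas.validate step, which may be useful when the data is already known to comply.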

Utilities

Legolas.lift - Function
lift(f, x)

Return f(x) unless x isa Union{Nothing,Missing}, in which case return missing.

This is particularly useful when handling values from Arrow.Table, whose null values may present as either missing or nothing depending on how the table itself was originally constructed.

lift(f)

Return a curried function, x -> lift(f, x).
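The behavior of both methods can be sketched in plain Julia. The name lift_sketch is a hypothetical stand-in written from the descriptions of lift(f, x) and lift(f); it is not the library's source.

```julia
# Hypothetical stand-in for Legolas.lift, following the documented behavior.
lift_sketch(f, x) = x isa Union{Nothing,Missing} ? missing : f(x)

# Curried form: fix f now, supply x later.
lift_sketch(f) = x -> lift_sketch(f, x)

lift_sketch(uppercase, "abc")    # "ABC"
lift_sketch(uppercase, nothing)  # missing
lift_sketch(uppercase, missing)  # missing

# The curried form is convenient with map over a column that mixes null styles.
column = [4.0, missing, nothing]
lifted = map(lift_sketch(sqrt), column)  # 2.0, missing, missing
```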

Legolas.gather - Function
gather(column_name, tables...; extract=((table, idxs) -> view(table, idxs, :)))

Gather rows from tables into a unified cross-table index along column_name. Returns a Dict whose keys are the unique values of column_name across tables, and whose values are tuples of the form:

(rows_matching_key_in_table_1, rows_matching_key_in_table_2, ...)

The provided extract function is used to extract rows from each table; it takes as input a table and a Vector{Int} of row indices, and returns the corresponding subtable. The default definition is sufficient for DataFrames tables.

Note that this function may internally call Tables.columns on each input table, so it may be slower and/or require more memory if any(!Tables.columnaccess, tables).

Note that we intend to eventually migrate this function from Legolas.jl to a more appropriate package.

Legolas.locations - Function
locations(collections::Tuple)

Return a Dict whose keys are the set of all elements across all provided collections, and whose values are the indices that locate each corresponding element across all provided collections.

Specifically, locations(collections)[k][i] will return a Vector{Int} whose elements are the index locations of k in collections[i]. If !(k in collections[i]), this Vector{Int} will be empty.

For example:

julia> Legolas.locations((['a', 'b', 'c', 'f', 'b'],
                          ['d', 'c', 'e', 'b'],
                          ['f', 'a', 'f']))
Dict{Char, Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}} with 6 entries:
  'f' => ([4], [], [1, 3])
  'a' => ([1], [], [2])
  'c' => ([3], [2], [])
  'd' => ([], [1], [])
  'e' => ([], [3], [])
  'b' => ([2, 5], [4], [])

This function is useful as a building block for higher-level tabular operations that require indexing/grouping along specific sets of elements.
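To make the shape of the result concrete, here is a naive plain-Julia sketch that reproduces the documented output for the example above. The name locations_sketch is hypothetical, and the approach (one findall pass per key per collection) is chosen for clarity only; it makes no claim about how Legolas.locations is implemented.

```julia
# Hypothetical, deliberately naive re-creation of the documented result shape.
function locations_sketch(collections::Tuple)
    # Every distinct element seen in any collection becomes a key.
    ks = unique(Iterators.flatten(collections))
    # For each key, record its index locations within each collection.
    return Dict(k => map(c -> findall(isequal(k), c), collections) for k in ks)
end

locs = locations_sketch((['a', 'b', 'c', 'f', 'b'],
                         ['d', 'c', 'e', 'b'],
                         ['f', 'a', 'f']))

locs['b'][1]  # [2, 5]: positions of 'b' in the first collection
locs['b'][3]  # empty: 'b' does not occur in the third collection
```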

Legolas.materialize - Function
materialize(table)

Return a fully deserialized copy of table.

This function is useful when table has built-in deserialize-on-access or conversion-on-access behavior (like Arrow.Table) and you'd like to pay such access costs upfront before repeatedly accessing the table.

Note that we intend to eventually migrate this function from Legolas.jl to a more appropriate package.
