TransformSpecifications.jl

Enabling structured transformations via defined I/O specifications.

Introduction & Overview

This package provides tools to define explicitly-specified transformation components. Such components can then be used to define pipelines that are themselves composed of individual explicitly-specified components, or facilitate distributed computation. One primary use-case is in creating explicitly defined pipelines that chain components together. These pipelines are in the form of directed acyclic graphs (DAGs), where each node of the graph is a component, and the edges correspond to data transfers between the components. The graph is "directed" since data flows in one direction (from the outputs of a component to the inputs of another), and "acyclic" since cycles are not allowed; one component cannot supply data to another which then supplies data back to the original component.

Later sections of the documentation cover the tools that this package provides in much more detail. First, here are the high-level steps one follows to define such a pipeline using this package.

  1. Define the inputs and outputs of each step. TransformSpecifications itself does not provide (nor require) specific types for defining inputs and outputs, but this is commonly implemented via Legolas.jl schemas.
  2. Define functions that take each set of inputs to the corresponding outputs. For the purposes of setting up the pipeline, these can be placeholder functions that don't actually do anything, but once you want to run the pipeline, they will need to do whatever work is required to generate the outputs from the inputs. Again, this step is independent of any code in TransformSpecifications.jl itself.
  3. Package up steps (1) and (2) into AbstractTransformSpecifications, like TransformSpecification and NoThrowTransform. These are the "components", the nodes of the graph.
  4. Create input_assemblers for each component to route necessary outputs of previous components into the inputs of the component. This creates the edges of the graph.
  5. Create a DAG using DAGStep or NoThrowDAG to assemble all of the components and assemblers into a DAG.
  6. Use it! Apply the DAG to inputs using transform! or transform, and create a mermaid diagram using mermaidify.

With these general steps in mind, it can help to see some examples.
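As a rough illustration of the six steps above, here is a minimal two-step sketch. The schema name `sketch-text` and the functions `shout` and `exclaim` are hypothetical, made up for this illustration; the package calls follow the same pattern as the NoThrowDAG example in this documentation, and the sketch assumes both Legolas.jl and TransformSpecifications.jl are installed.

```julia
using Legolas: @schema, @version
using TransformSpecifications

# Step 1: define the input/output specification (hypothetical schema).
@schema "sketch-text" SketchTextSchema
@version SketchTextSchemaV1 begin
    text::String
end

# Step 2: define functions that map inputs to outputs (hypothetical functions).
shout(x) = SketchTextSchemaV1(; text=uppercase(x.text))
exclaim(x) = SketchTextSchemaV1(; text=x.text * "!")

# Step 3: package specifications and functions into components.
shout_transform = NoThrowTransform(SketchTextSchemaV1, SketchTextSchemaV1, shout)
exclaim_transform = NoThrowTransform(SketchTextSchemaV1, SketchTextSchemaV1, exclaim)

# Step 4: route the output of "shout" into the input of "exclaim".
exclaim_assembler = input_assembler(upstream -> (; text=upstream["shout"][:text]))

# Step 5: assemble components and assemblers into a DAG.
# The first step has no upstream steps, so its assembler is `nothing`.
dag = NoThrowDAG([DAGStep("shout", nothing, shout_transform),
                  DAGStep("exclaim", exclaim_assembler, exclaim_transform)])

# Step 6: apply the DAG to an input.
out = transform!(dag, SketchTextSchemaV1(; text="hello"))
```

The returned value is a NoThrowResult; its `result` field holds the output of the last step, and its `violations` field is empty when every step succeeded.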

TransformSpecification

TransformSpecifications.TransformSpecification - Type
TransformSpecification{T<:Type,U<:Type} <: AbstractTransformSpecification

Basic component that specifies a transform that, when applied to input of type T, will return output of type U.

See also: TransformSpecification

Fields

  • input_specification::T
  • output_specification::U
  • transform_fn::Function: Function with signature transform_fn(::input_specification) -> output_specification

Example

using Legolas: @schema, @version

@schema "example-in" ExampleInSchema
@version ExampleInSchemaV1 begin
    in_name::String
end

@schema "example-out" ExampleOutSchema
@version ExampleOutSchemaV1 begin
    out_name::String
end

function apply_example(in_record)
    out_name = in_record.in_name * " earthling"
    return ExampleOutSchemaV1(; out_name)
end
ts = TransformSpecification(ExampleInSchemaV1, ExampleOutSchemaV1, apply_example)

# output
TransformSpecification{ExampleInSchemaV1,ExampleOutSchemaV1}: `apply_example`

transform!(ts, ExampleInSchemaV1(; in_name="greetings"))

# output
ExampleOutSchemaV1: (out_name = "greetings earthling",)

TransformSpecifications.transform! - Method
transform!(ts::TransformSpecification, input)

Return output_specification(ts) by applying ts.transform_fn to input. May error if:

  • input does not conform to input_specification(ts), i.e., convert_spec(input_specification(ts), input) errors,
  • ts.transform_fn errors when applied to the interpreted input, or
  • the output generated by ts.transform_fn is not an output_specification(ts)

For a non-erroring alternative, see NoThrowTransform.

See also: convert_spec


NoThrowTransform

NoThrowTransforms are a way to wrap a transform such that any errors encountered during the application of the transform will be returned as a NoThrowResult rather than thrown as an exception.

Debugging tip

To get the stack trace for a violation generated by a NoThrowTransform, call transform_force_throw! on it instead of transform!.

TransformSpecifications.NoThrowResult - Type
NoThrowResult(result::T, violations, warnings) where {T}
NoThrowResult(result; violations=String[], warnings=String[])
NoThrowResult(; result=missing, violations=String[], warnings=String[])

Type that specifies the result of a transformation, indicating successful application of a transform through the absence (or presence) of violations. Consists of either a non-missing result with no violations (success state) or non-empty violations and a result of type Missing (failure state).

Note that constructing a NoThrowResult from an input result that is itself a NoThrowResult, e.g., NoThrowResult(::NoThrowResult{T}, ...), collapses down to a single NoThrowResult{T}; any inner and outer warnings and violations fields are concatenated and returned in the resultant NoThrowResult{T}.

See also: nothrow_succeeded
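A small sketch of the collapsing behavior described above. It assumes the keyword constructor accepts a plain `String` as `result` and that wrapping one NoThrowResult in another uses the same keyword form; the payload and warning strings are made up for illustration.

```julia
using TransformSpecifications

# Inner result carrying one warning (illustrative values).
inner = NoThrowResult("payload"; warnings=["inner warning"])

# Wrapping it again is described as collapsing to a single NoThrowResult,
# with inner and outer warnings concatenated.
outer = NoThrowResult(inner; warnings=["outer warning"])

# The wrapped value is not nested: `outer.result` is the original payload.
collapsed_payload = outer.result
all_warnings = outer.warnings
```

The order in which inner and outer warnings appear is not stated in this documentation, so the sketch does not rely on it.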

Fields

  • warnings::AbstractVector{<:AbstractString}: List of generated warnings that are not critical enough to be violations.
  • violations::AbstractVector{<:AbstractString}: List of reason(s) result was not able to be generated.
  • result::T: Generated result; missing if any violations encountered.

Example

using Legolas: @schema, @version
@schema "example" ExampleSchemaA
@version ExampleSchemaAV1 begin
    name::String
end

NoThrowResult(ExampleSchemaAV1(; name="yeehaw"))

# output
NoThrowResult{ExampleSchemaAV1}: Transform succeeded
  ✅ result:
ExampleSchemaAV1: (name = "yeehaw",)

NoThrowResult(ExampleSchemaAV1(; name="huzzah"); warnings="Hark, watch your step...")

# output
NoThrowResult{ExampleSchemaAV1}: Transform succeeded
  ⚠️  Hark, watch your step...
  ✅ result:
ExampleSchemaAV1: (name = "huzzah",)

NoThrowResult(; violations=["Epic fail!", "Slightly less epic fail!"],
                warnings=["Uh oh..."])

# output
NoThrowResult{Missing}: Transform failed
  ❌ Epic fail!
  ❌ Slightly less epic fail!
  ⚠️  Uh oh...

TransformSpecifications.NoThrowTransform - Type
NoThrowTransform{TransformSpecification{T<:Type,U<:Type}} <: AbstractTransformSpecification

Wrapper around a basic TransformSpecification that returns a NoThrowResult of type NoThrowResult{T}, where T is the output specification of the inner transform. If calling transform! on a NoThrowTransform errors, due to either incorrect input/output types or an exception during the transform itself, the exception will be caught and returned as a NoThrowResult{Missing}, with the error(s) in the result's violations field. See NoThrowResult for details.

Note that results of a NoThrowTransform collapse down to a single NoThrowResult when nested, such that if the output_specification of the inner TransformSpecification is itself a NoThrowResult{T}, the output_specification of the NoThrowTransform will be that same NoThrowResult{T}, and not NoThrowResult{NoThrowResult{T}}.

Fields

  • transform_spec::TransformSpecification{T,U}

Example 1: Successful transformation

Set-up:

using Legolas: @schema, @version

@schema "example-a" ExampleSchemaA
@version ExampleSchemaAV1 begin
    in_name::String
end

@schema "example-b" ExampleSchemaB
@version ExampleSchemaBV1 begin
    out_name::String
end

function apply_example(in_record)
    out_name = in_record.in_name * " earthling"
    return ExampleSchemaBV1(; out_name)
end
ntt = NoThrowTransform(ExampleSchemaAV1, ExampleSchemaBV1, apply_example)

# output
NoThrowTransform{ExampleSchemaAV1,ExampleSchemaBV1}: `apply_example`

Application of transform:

transform!(ntt, ExampleSchemaAV1(; in_name="greetings"))

# output
NoThrowResult{ExampleSchemaBV1}: Transform succeeded
  ✅ result:
ExampleSchemaBV1: (out_name = "greetings earthling",)

Example 2: Failing transformation

Set-up:

force_failure_example(in_record) = NoThrowResult(; violations=["womp", "womp"])
ntt = NoThrowTransform(ExampleSchemaAV1, ExampleSchemaBV1, force_failure_example)

# output
NoThrowTransform{ExampleSchemaAV1,ExampleSchemaBV1}: `force_failure_example`

Application of transform:

transform!(ntt, ExampleSchemaAV1(; in_name="greetings"))

# output
NoThrowResult{Missing}: Transform failed
  ❌ womp
  ❌ womp

TransformSpecifications.transform! - Method
transform!(ntt::NoThrowTransform, input; verbose_violations=false)

Return NoThrowResult of applying ntt.transform_spec.transform_fn to input. Transform will fail (i.e., return a NoThrowResult{Missing}) if:

  • input does not conform to input_specification(ntt), i.e., convert_spec(input_specification(ntt), input) throws an error,
  • ntt.transform_spec.transform_fn returns a NoThrowResult{Missing} when applied to the interpreted input,
  • ntt.transform_spec.transform_fn errors when applied to the interpreted input, or
  • the output generated by ntt.transform_spec.transform_fn is not a Union{NoThrowResult{Missing},output_specification(ntt)}

In any of these failure cases, this function will not throw, but instead will return the cause of failure in the output violations field.

If verbose_violations=true, then much more verbose violation strings will be generated in the case of unexpected violations (including full stacktraces).

Note

For debugging purposes, it may be helpful to bypass the "no-throw" feature so as to have access to a call stack. To do this, use transform_force_throw! in place of transform!.

See also: convert_spec
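The contrast between the throwing and non-throwing variants can be sketched as follows. The schema `sketch-name` and the function `always_errors` are hypothetical; `transform_force_throw!` is called with its module prefix so the sketch does not depend on whether that name is exported.

```julia
using Legolas: @schema, @version
using TransformSpecifications

@schema "sketch-name" SketchNameSchema
@version SketchNameSchemaV1 begin
    name::String
end

# Hypothetical transform function that always errors.
always_errors(record) = error("simulated failure for $(record.name)")

make_name_input() = SketchNameSchemaV1(; name="greetings")

# A plain TransformSpecification lets the exception propagate.
ts = TransformSpecification(SketchNameSchemaV1, SketchNameSchemaV1, always_errors)
ts_threw = try
    transform!(ts, make_name_input())
    false
catch
    true
end

# A NoThrowTransform catches the exception and reports it as a violation.
ntt = NoThrowTransform(SketchNameSchemaV1, SketchNameSchemaV1, always_errors)
result = transform!(ntt, make_name_input())

# For debugging, force the NoThrowTransform to throw instead.
ntt_threw = try
    TransformSpecifications.transform_force_throw!(ntt, make_name_input())
    false
catch
    true
end
```

Here `result.result` is `missing` and `result.violations` is non-empty, while both `try` blocks end up in their `catch` branch.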


NoThrowDAG

NoThrowDAGs are a way to compose multiple specified transforms (DAGStep) into a DAG, such that any errors encountered during the application of the DAG will be returned as a NoThrowResult rather than thrown as an exception.

Debugging tips

To debug the source of a returned violation from a NoThrowDAG, call transform_force_throw! on it instead of transform!. Errors (and their stack traces) will be thrown directly, rather than returned nicely as NoThrowResults. Alternatively/additionally, create your DAG from a subset of its constituent steps. Bisecting the full DAG chain can help zero in on errors in DAG construction: e.g., transform!(NoThrowDAG(steps[1:4]), input), etc.

TransformSpecifications.DAGStep - Type
DAGStep

Helper struct, used to construct NoThrowDAGs. Requires fields

  • name::String: Name of step, must be unique across a constructed DAG
  • input_assembler::TransformSpecification: Transform used to construct step's input; see input_assembler for details.
  • transform_spec::AbstractTransformSpecification: Transform applied by step

TransformSpecifications.NoThrowDAG - Type
NoThrowDAG <: AbstractTransformSpecification
NoThrowDAG(steps::AbstractVector{DAGStep})

Transform specification constructed from a DAG of transform specification nodes (steps), such that calling transform! on the DAG iterates through the steps, first constructing each step's input from all preceding upstream step outputs and then applying that step's own transform to the constructed input.

The DAG's input_specification is that of the first step in the DAG; its output_specification is that of the last step. As the first step's input is by definition the same as the overall input to the DAG, its step.input_assembler must be nothing.

DAG construction tip

As the input to the DAG is by definition the input to the first step in that DAG, only the first step will have access to the input directly passed in by the caller. To grant access to this top-level input to downstream tasks, construct the DAG with an initial step that is an identity transform, i.e., is_identity_no_throw_transform(first(steps)) returns true. Downstream steps can then depend on the output of specific fields from this initial step. The single argument TransformSpecification constructor creates such an identity transform.
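The following sketch illustrates the idea of an initial pass-through step. As an assumption made to stay within the three-argument constructor shown elsewhere in this documentation, the first step is written out explicitly with Julia's `identity` function rather than with the single-argument constructor mentioned above; the schemas and functions are hypothetical.

```julia
using Legolas: @schema, @version
using TransformSpecifications

@schema "sketch-one-var" SketchOneVarSchema
@version SketchOneVarSchemaV1 begin
    var::String
end

@schema "sketch-two-var" SketchTwoVarSchema
@version SketchTwoVarSchemaV1 begin
    var1::String
    var2::String
end

add_suffix(x) = SketchOneVarSchemaV1(; var=x.var * "_a")
combine(x) = SketchOneVarSchemaV1(; var=x.var1 * "+" * x.var2)

steps = [# Pass-through first step: exposes the caller's input to later steps.
         DAGStep("init", nothing,
                 NoThrowTransform(SketchOneVarSchemaV1, SketchOneVarSchemaV1, identity)),
         DAGStep("step_a",
                 input_assembler(up -> (; var=up["init"][:var])),
                 NoThrowTransform(SketchOneVarSchemaV1, SketchOneVarSchemaV1, add_suffix)),
         # This step reads both the original input and the output of step_a.
         DAGStep("step_b",
                 input_assembler(up -> (; var1=up["init"][:var], var2=up["step_a"][:var])),
                 NoThrowTransform(SketchTwoVarSchemaV1, SketchOneVarSchemaV1, combine))]

out = transform!(NoThrowDAG(steps), SketchOneVarSchemaV1(; var="raw"))
```

Because `step_b` pulls `var1` from the `init` step, it sees the caller's original value even though `step_a` ran in between.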

DAG construction warning

It is the caller's responsibility to implement a DAG, and to not introduce any recursion or cycles. What will happen if you do? To quote Tom Lehrer, "well, you ask a silly question, you get a silly answer!"

Storage of intermediate values

The output of each step in the DAG is stored locally in memory for the entire lifetime of the transform operation, whether or not it is actually accessed by any later steps. Large intermediate outputs may result in unexpected memory pressure relative to function composition or even local evaluation (since they are not visible to the garbage collector).

Fields

The following fields are constructed automatically when constructing a NoThrowDAG from a vector of DAGSteps:

  • step_transforms::OrderedDict{String,AbstractTransformSpecification}: Ordered dictionary of processing steps
  • step_input_assemblers::Dict{String,TransformSpecification}: Dictionary with functions for constructing the input for each key in step_transforms as a function that takes in a Dict{String,NoThrowResult} of all upstream step_transforms results.
  • _step_output_fields::Dict{String,Dict{Symbol,Any}}: Internal mapping of upstream step outputs to downstream inputs, used to e.g. validate that the input to each step can be constructed from the outputs of the upstream steps.

Example

using Legolas: @schema, @version

@schema "example-one-var" ExampleOneVarSchema
@version ExampleOneVarSchemaV1 begin
    var::String
end

@schema "example-two-var" ExampleTwoVarSchema
@version ExampleTwoVarSchemaV1 begin
    var1::String
    var2::String
end

# Say we have three functions we want to chain together:
fn_a(x) = ExampleOneVarSchemaV1(; var=x.var * "_a")
fn_b(x) = ExampleOneVarSchemaV1(; var=x.var * "_b")
fn_c(x) = ExampleOneVarSchemaV1(; var=x.var1 * x.var2 * "_c")

# First, specify these functions as transforms: what is the specification of the
# function's input and output?
step_a_transform = NoThrowTransform(ExampleOneVarSchemaV1, ExampleOneVarSchemaV1, fn_a)
step_b_transform = NoThrowTransform(ExampleOneVarSchemaV1, ExampleOneVarSchemaV1, fn_b)
step_c_transform = NoThrowTransform(ExampleTwoVarSchemaV1, ExampleOneVarSchemaV1, fn_c)

# Next, set up the DAG between the upstream outputs into each step's input:
step_b_assembler = input_assembler(upstream -> (; var=upstream["step_a"][:var]))
step_c_assembler = input_assembler(upstream -> (; var1=upstream["step_a"][:var],
                                                var2=upstream["step_b"][:var]))
# ...note that step_a is skipped, as there are no steps upstream from it.

steps = [DAGStep("step_a", nothing, step_a_transform),
         DAGStep("step_b", step_b_assembler, step_b_transform),
         DAGStep("step_c", step_c_assembler, step_c_transform)]
dag = NoThrowDAG(steps)

# output
NoThrowDAG (ExampleOneVarSchemaV1 => ExampleOneVarSchemaV1):
  🌱  step_a: ExampleOneVarSchemaV1 => ExampleOneVarSchemaV1: `fn_a`
   ·  step_b: ExampleOneVarSchemaV1 => ExampleOneVarSchemaV1: `fn_b`
  🌷  step_c: ExampleTwoVarSchemaV1 => ExampleOneVarSchemaV1: `fn_c`

This DAG can then be applied to an input, just like a regular TransformSpecification can:

input = ExampleOneVarSchemaV1(; var="initial_str")
transform!(dag, input)

# output
NoThrowResult{ExampleOneVarSchemaV1}: Transform succeeded
  ✅ result:
ExampleOneVarSchemaV1: (var = "initial_str_ainitial_str_a_b_c",)

Similarly, this transform will fail if the input specification is violated, but because it returns a NoThrowResult, it will fail gracefully:

# What is the input specification?
input_specification(dag)

# output
ExampleOneVarSchemaV1

transform!(dag, ExampleTwoVarSchemaV1(; var1="wrong", var2="input schema"))

# output
NoThrowResult{Missing}: Transform failed
  ❌ Input to step `step_a` doesn't conform to specification `ExampleOneVarSchemaV1`. Details: ArgumentError("Invalid value set for field `var`, expected String, got a value of type Missing (missing)")

To visualize this DAG, you may want to generate a plot via mermaid, which is a markdown-like plotting language that is rendered automatically via GitHub and various other platforms. To create a mermaid plot of a DAG, use mermaidify:

mermaid_str = mermaidify(dag)

# No need to dump full output string here, but let's check that the results are
# the same as in our generated output test, so that we know that the rendered graph
# in the documentation stays synced with the code.
print(mermaid_str)

# output
flowchart

%% Define steps (nodes)
subgraph OUTERLEVEL["` `"]
direction LR
subgraph STEP_A["Step a"]
  direction TB
  subgraph STEP_A_InputSchema["Input: ExampleOneVarSchemaV1"]
    direction RL
    STEP_A_InputSchemavar{{"var::String"}}
    class STEP_A_InputSchemavar classSpecField
  end
  subgraph STEP_A_OutputSchema["Output: ExampleOneVarSchemaV1"]
    direction RL
    STEP_A_OutputSchemavar{{"var::String"}}
    class STEP_A_OutputSchemavar classSpecField
  end
  STEP_A_InputSchema:::classSpec -- fn_a --> STEP_A_OutputSchema:::classSpec
end
subgraph STEP_B["Step b"]
  direction TB
  subgraph STEP_B_InputSchema["Input: ExampleOneVarSchemaV1"]
    direction RL
    STEP_B_InputSchemavar{{"var::String"}}
    class STEP_B_InputSchemavar classSpecField
  end
  subgraph STEP_B_OutputSchema["Output: ExampleOneVarSchemaV1"]
    direction RL
    STEP_B_OutputSchemavar{{"var::String"}}
    class STEP_B_OutputSchemavar classSpecField
  end
  STEP_B_InputSchema:::classSpec -- fn_b --> STEP_B_OutputSchema:::classSpec
end
subgraph STEP_C["Step c"]
  direction TB
  subgraph STEP_C_InputSchema["Input: ExampleTwoVarSchemaV1"]
    direction RL
    STEP_C_InputSchemavar1{{"var1::String"}}
    class STEP_C_InputSchemavar1 classSpecField
    STEP_C_InputSchemavar2{{"var2::String"}}
    class STEP_C_InputSchemavar2 classSpecField
  end
  subgraph STEP_C_OutputSchema["Output: ExampleOneVarSchemaV1"]
    direction RL
    STEP_C_OutputSchemavar{{"var::String"}}
    class STEP_C_OutputSchemavar classSpecField
  end
  STEP_C_InputSchema:::classSpec -- fn_c --> STEP_C_OutputSchema:::classSpec
end

%% Link steps (edges)
STEP_A:::classStep -..-> STEP_B:::classStep
STEP_B:::classStep -..-> STEP_C:::classStep

end
OUTERLEVEL:::classOuter ~~~ OUTERLEVEL:::classOuter

%% Styling definitions
classDef classOuter fill:#cbd7e2,stroke:#000,stroke-width:0px;
classDef classStep fill:#eeedff,stroke:#000,stroke-width:2px;
classDef classSpec fill:#f8f7ff,stroke:#000,stroke-width:1px;
classDef classSpecField fill:#fff,stroke:#000,stroke-width:1px;

See this rendered plot in the built documentation.

To display a mermaid plot via e.g. Documenter.jl, additional setup will be required.


TransformSpecifications.get_step - Method
get_step(dag::NoThrowDAG, name::String) -> DAGStep
get_step(dag::NoThrowDAG, step_index::Int) -> DAGStep

Return DAGStep with name or step_index.


TransformSpecifications.input_assembler - Method
input_assembler(conversion_fn) -> TransformSpecification{Dict{String,Any}, NamedTuple}

Special transform used to convert the outputs of upstream steps in a NoThrowDAG into a NamedTuple that can be converted into the input specification of the step it is paired with.

conversion_fn must be a function that

  • takes as input a Dictionary with keys that are the names of upstream steps, where the value of each of these keys is the output of that upstream_step, as specified by output_specification(upstream_step).
  • returns a NamedTuple that can be converted, via convert_spec, to the specification of an AbstractTransformSpecification that it is paired with in a DAGStep.

Note that the current implementation is a stopgap for a better-defined implementation defined in https://github.com/beacon-biosignals/TransformSpecifications.jl/issues/8
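To make the expected shape of conversion_fn concrete, here is a package-free sketch that calls such a function directly on a hand-built dictionary. The inner `Dict`s keyed by field name are stand-ins chosen for illustration; in a real DAG the values come from the upstream steps, and the step names and field names here are hypothetical.

```julia
# Hypothetical conversion function for a step that needs one field
# from each of two upstream steps.
conversion_fn = upstream -> (; var1=upstream["step_a"][:var],
                             var2=upstream["step_b"][:var])

# Stand-in for the upstream outputs, keyed by step name.
upstream = Dict{String,Any}("step_a" => Dict(:var => "x_a"),
                            "step_b" => Dict(:var => "x_a_b"))

# The result is a NamedTuple whose field names match the fields of the
# downstream step's input specification.
assembled = conversion_fn(upstream)
```

Passing the same function to input_assembler wraps it in a transform that a DAGStep can use.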


TransformSpecifications.transform! - Method
transform!(dag::NoThrowDAG, input; verbose_violations=false)

Return NoThrowResult of sequentially transform!ing all dag.step_transforms, after passing input to the first step.

Before each step, that step's input_assembler is called on the results of all previous processing steps; this constructor generates input that conforms to the step's input_specification.

If verbose_violations=true, then much more verbose violation strings will be generated in the case of unexpected violations (including full stacktraces).

See also: transform_force_throw!


TransformSpecifications.transform_force_throw! - Method
transform_force_throw!(dag::NoThrowDAG, input)

Utility for debugging NoThrowDAGs by consecutively applying transform!(step, input) on each step, such that the output of each step is of type output_specification(step.transform_spec) rather than a NoThrowResult, and any failure will result in throwing an error.
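A sketch of the two debugging approaches, forcing a throw and bisecting the steps, under the assumption that DAGStep values can be reused across several NoThrowDAG instances. The schema, the failing function `bad_fn`, and the helper names are hypothetical.

```julia
using Legolas: @schema, @version
using TransformSpecifications

@schema "sketch-debug" SketchDebugSchema
@version SketchDebugSchemaV1 begin
    var::String
end

ok_fn(x) = SketchDebugSchemaV1(; var=x.var * "_ok")
bad_fn(x) = error("simulated failure")  # hypothetical buggy step

# Small helpers to keep the step definitions short.
spec(fn) = NoThrowTransform(SketchDebugSchemaV1, SketchDebugSchemaV1, fn)
from(step_name) = input_assembler(up -> (; var=up[step_name][:var]))
debug_input() = SketchDebugSchemaV1(; var="start")

steps = [DAGStep("first", nothing, spec(ok_fn)),
         DAGStep("second", from("first"), spec(bad_fn)),
         DAGStep("third", from("second"), spec(ok_fn))]

# The full DAG fails gracefully: the error is reported as a violation.
result = transform!(NoThrowDAG(steps), debug_input())

# Bisecting: find the shortest prefix of steps that already fails.
first_failing = findfirst(eachindex(steps)) do n
    partial = transform!(NoThrowDAG(steps[1:n]), debug_input())
    return !isempty(partial.violations)
end

# Forcing a throw surfaces the underlying exception (and its stack trace).
threw = try
    TransformSpecifications.transform_force_throw!(NoThrowDAG(steps), debug_input())
    false
catch
    true
end
```

In this sketch the search points at the second step, which is the one built around `bad_fn`.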


Plotting NoThrowDAGs

Here is the mermaid plot generated for the example DAG in NoThrowDAG:

[Rendered mermaid diagram of the example DAG: steps a, b, and c, each showing its input and output specification, linked a -> b -> c. The diagram source is the mermaidify output shown in the NoThrowDAG example.]
TransformSpecifications.mermaidify - Method
mermaidify(dag::NoThrowDAG; direction="LR",
           style_step="fill:#eeedff,stroke:#000,stroke-width:2px;",
           style_spec="fill:#f8f7ff,stroke:#000,stroke-width:1px;",
           style_outer="fill:#cbd7e2,stroke:#000,stroke-width:0px;",
           style_spec_field="fill:#fff,stroke:#000,stroke-width:1px;")

Generate mermaid plot of dag, suitable for inclusion in markdown documentation.

Args:

  • direction: option that specifies the orientation/flow of the dag's steps; most useful options for dag plotting are LR (left to right) or TB (top to bottom); see the mermaid documentation for full list of options.
  • style_step: styling of the box containing an individual dag step (node)
  • style_spec: styling of the boxes containing the input and output specifications for each step
  • style_outer: styling of the box bounding the entire DAG
  • style_spec_field: styling of the boxes bounding each specification's individual field(s)

For each style kwarg, see the mermaid documentation for style string options.

To include in markdown, do

```mermaid
{{mermaidify output}}
```

or for html (i.e., for Documenter.jl), do

<div class="mermaid">
{{mermaidify output}}
</div>

For an example of the raw output, see NoThrowDAG; for an example of the rendered output, see the built documentation.
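The HTML variant above can be produced with a one-line helper. This is a package-free sketch: `wrap_mermaid_html` is a hypothetical helper, not part of TransformSpecifications, and the short flowchart string is a stand-in for the string that mermaidify returns.

```julia
# Hypothetical helper: wrap a mermaid string in the div shown above.
wrap_mermaid_html(mermaid_str::AbstractString) =
    string("<div class=\"mermaid\">\n", mermaid_str, "\n</div>")

# Stand-in for the output of `mermaidify(dag)`.
mermaid_str = "flowchart\nA --> B"

html = wrap_mermaid_html(mermaid_str)
```

The resulting string can be written into a documentation page; rendering it still requires the additional mermaid setup mentioned in the NoThrowDAG example.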


TransformSpecifications interface

TransformSpecifications provides a general interface which allows the creation of new subtypes of AbstractTransformSpecification that can be used to implement transformation.

New transformation types must subtype AbstractTransformSpecification, and implement the following required methods.

Required interface type

TransformSpecifications.AbstractTransformSpecification - Type
abstract type AbstractTransformSpecification

Transform specifications are represented by subtypes of AbstractTransformSpecification. Each leaf should be immutable and define methods for

  • input_specification, which returns the type expected/allowed as transform input,
  • output_specification, which returns the output type generated by successfully completed processing, and
  • transform!, which transforms an input of type input_specification and returns an output of type output_specification.

It may additionally define a custom non-mutating transform function.
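A minimal sketch of a new leaf type implementing the three methods listed above. `UppercaseTransform` is a hypothetical type invented for this illustration, and the package functions are extended with their module prefix so the sketch does not depend on which names are exported.

```julia
using TransformSpecifications

# Hypothetical immutable leaf type with no fields.
struct UppercaseTransform <: TransformSpecifications.AbstractTransformSpecification end

# Declare the expected input and output types.
TransformSpecifications.input_specification(::UppercaseTransform) = String
TransformSpecifications.output_specification(::UppercaseTransform) = String

# Apply the transform; this particular one does not need to mutate its input.
function TransformSpecifications.transform!(::UppercaseTransform, input::String)
    return uppercase(input)
end

ts = UppercaseTransform()
output = TransformSpecifications.transform!(ts, "abc")
```

This sketch performs no validation of its own; a fuller implementation would presumably check that inputs and outputs conform to the declared specifications, as the built-in types are described as doing.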


Required interface methods

TransformSpecifications.transform! - Function
transform!(ts::AbstractTransformSpecification, input)

Return result of applying ts to an input of type input_specification(ts), where result is an output_specification(ts). May mutate input.

See also: transform


Other interface methods

These methods have reasonable fallback definitions and should only be defined for new types if there is some reason to prefer a custom implementation over the default fallback.

TransformSpecifications.transform - Function
transform(ts::AbstractTransformSpecification, input)

Return result of applying ts to an input of type input_specification(ts), where result is an output_specification(ts). Must not mutate input.

See also: transform!
