TransformSpecifications.jl

Enabling structured transformations via defined I/O specifications.

Introduction & Overview

This package provides tools to define explicitly-specified transformation components. Such components can then be used to define pipelines that are themselves composed of individual explicitly-specified components, or facilitate distributed computation. One primary use-case is in creating explicitly defined pipelines that chain components together. These pipelines are in the form of directed acyclic graphs (DAGs), where each node of the graph is a component, and the edges correspond to data transfers between the components. The graph is "directed" since data flows in one direction (from the outputs of a component to the inputs of another), and "acyclic" since cycles are not allowed; one component cannot supply data to another which then supplies data back to the original component.
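The "directed" and "acyclic" properties can be illustrated without the package at all. The following is a minimal, package-free sketch: the `steps` layout and the `is_valid_dag_order` helper are hypothetical names invented for this illustration and are not part of TransformSpecifications.jl. It checks that every step only consumes outputs of steps listed before it, which rules out cycles.

```julia
# Hypothetical model of a pipeline: each step name is paired with the names
# of the upstream steps whose outputs it consumes.
steps = ["step_a" => String[],
         "step_b" => ["step_a"],
         "step_c" => ["step_a", "step_b"]]

# An ordered step list cannot contain a cycle if every step depends only on
# steps that appear earlier in the list.
function is_valid_dag_order(steps)
    seen = Set{String}()
    for (name, upstream) in steps
        all(in(seen), upstream) || return false
        push!(seen, name)
    end
    return true
end

is_valid_dag_order(steps)                                      # true
is_valid_dag_order(["x" => ["y"], "y" => ["x"]])               # false: x and y feed each other
```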

Later sections of the documentation cover the tools this package provides in much more detail. But first, let us look at the high-level steps one follows to define such a pipeline using this package.

1. Define the inputs and outputs of each step. TransformSpecifications itself neither provides nor requires specific types for defining inputs and outputs, but this is commonly implemented via Legolas.jl schemas.
2. Define functions that take each set of inputs to the corresponding outputs. For the purposes of setting up the pipeline, these can be placeholder functions that don't actually do anything, but once you want to run the pipeline, they will need to do whatever work is required to generate the outputs from the inputs. Again, this step is independent of any code in TransformSpecifications.jl itself.
3. Package up steps (1) and (2) into AbstractTransformSpecifications, such as TransformSpecification and NoThrowTransform. These are the "components", the nodes of the graph.
4. Create input_assemblers for each component to route the necessary outputs of previous components into the inputs of the component. This creates the edges of the graph.
5. Create a DAG using DAGStep or NoThrowDAG to assemble all of the components and assemblers into a DAG.
6. Use it! Apply the DAG to inputs using transform! or transform, and create a mermaid diagram using mermaidify.

With these general steps in mind, it can help to see some examples.

- For an example of all of these steps together, see NoThrowDAG.
- For a basic concrete transform, see TransformSpecification.
- For transforms that catch exceptions and return them as formatted violations, see NoThrowTransform (and NoThrowResult).
- For the abstract interface, see TransformSpecifications interface.
- For a compound transform that is itself a concrete AbstractTransformSpecification and is constructed from a DAG of AbstractTransformSpecifications, see NoThrowDAG.
- For a plotted graph visualization of such a DAG, see Plotting NoThrowDAGs.

`TransformSpecification`

TransformSpecifications.TransformSpecification — Type

TransformSpecification{T<:Type,U<:Type} <: AbstractTransformSpecification

Basic component that specifies a transform that, when applied to input of type T, will return output of type U.
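As a rough illustration, the sketch below builds a TransformSpecification from two Legolas schemas and applies it. It assumes that the three-argument constructor takes the same `(input_specification, output_specification, transform_fn)` form that the NoThrowDAG example uses for NoThrowTransform; the schema names and the `greet` function are hypothetical and exist only for this example.

```julia
using TransformSpecifications
using Legolas: @schema, @version

# Hypothetical input and output schemas for this sketch.
@schema "example-in" ExampleInSchema
@version ExampleInSchemaV1 begin
    in_name::String
end

@schema "example-out" ExampleOutSchema
@version ExampleOutSchemaV1 begin
    out_name::String
end

# Plain function that does the actual work.
greet(rec) = ExampleOutSchemaV1(; out_name=rec.in_name * " earthling")

# Assumed constructor form: (input specification, output specification, function).
ts = TransformSpecification(ExampleInSchemaV1, ExampleOutSchemaV1, greet)

result = transform!(ts, ExampleInSchemaV1(; in_name="greetings"))
result.out_name
```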

`NoThrowTransform`

NoThrowTransforms are a way to wrap a transform such that any errors encountered during the application of the transform will be returned as a NoThrowResult rather than thrown as an exception.

Debugging tip

To get the stack trace for a violation generated by a NoThrowTransform, call transform_force_throw! on it instead of transform!.
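A minimal sketch of this behavior follows, assuming the `NoThrowTransform(input_specification, output_specification, transform_fn)` constructor form used in the NoThrowDAG example. The schemas and the `divide` function are hypothetical; `div` is chosen because it throws a `DivideError` for a zero denominator.

```julia
using TransformSpecifications
using Legolas: @schema, @version

# Hypothetical schemas for this sketch.
@schema "example-ratio" ExampleRatioSchema
@version ExampleRatioSchemaV1 begin
    numerator::Int
    denominator::Int
end

@schema "example-quotient" ExampleQuotientSchema
@version ExampleQuotientSchemaV1 begin
    quotient::Int
end

# Integer division throws a DivideError when the denominator is zero.
divide(rec) = ExampleQuotientSchemaV1(; quotient=div(rec.numerator, rec.denominator))

ntt = NoThrowTransform(ExampleRatioSchemaV1, ExampleQuotientSchemaV1, divide)

ok = transform!(ntt, ExampleRatioSchemaV1(; numerator=6, denominator=3))
bad = transform!(ntt, ExampleRatioSchemaV1(; numerator=1, denominator=0))

ok.result.quotient   # the wrapped output record's field
bad.violations       # non-empty: the error is reported here instead of being thrown

# While debugging, the same call can be made to throw instead:
# transform_force_throw!(ntt, ExampleRatioSchemaV1(; numerator=1, denominator=0))
```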

TransformSpecifications.NoThrowResult — Type

NoThrowResult(result::T, violations, warnings) where {T}
NoThrowResult(result; violations=String[], warnings=String[])
NoThrowResult(; result=missing, violations=String[], warnings=String[])

Type that specifies the result of a transformation, indicating successful application of a transform through presence (or lack thereof) of violations. Consists of either a non-missing result (success state) or non-empty violations and type Missing (failure state).

Note that constructing a NoThrowResult from an input result of type NoThrowResult, e.g., NoThrowResult(::NoThrowResult{T}, ...), collapses down to a single NoThrowResult{T}; any inner and outer warnings and violations fields are concatenated and returned in the resultant NoThrowResult{T}.
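The constructors listed above can be exercised directly. The sketch below uses only the signatures shown in this docstring and assumes that `result`, `violations`, and `warnings` are readable as fields of the returned object; the string values are made up for illustration.

```julia
using TransformSpecifications

# Success state: a non-missing result, optionally with warnings.
success = NoThrowResult("all good"; warnings=["input was trimmed"])

# Failure state: no result, at least one violation.
failure = NoThrowResult(; violations=["input was empty"])

success.result       # the wrapped value
failure.result       # missing
failure.violations   # the reasons the transform failed

# Wrapping an existing NoThrowResult collapses to a single level and
# carries the inner and outer warnings along together.
nested = NoThrowResult(success; warnings=["outer warning"])
nested.warnings
```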

`NoThrowDAG`

NoThrowDAGs are a way to compose multiple specified transforms (DAGSteps) into a DAG, such that any errors encountered during the application of the DAG will be returned as a NoThrowResult rather than thrown as an exception.

Debugging tips

To debug the source of a returned violation from a NoThrowDAG, call transform_force_throw! on it instead of transform!. Errors (and their stack traces) will be thrown directly, rather than returned nicely as NoThrowResults. Alternatively/additionally, create your DAG from a subset of its constituent steps. Bisecting the full DAG chain can help zero in on errors in DAG construction: e.g., transform!(NoThrowDAG(steps[1:4]), input), etc.

TransformSpecifications.DAGStep — Type

DAGStep

Helper struct, used to construct NoThrowDAGs. Requires fields

name::String: Name of step, must be unique across a constructed DAG
input_assembler::TransformSpecification: Transform used to construct step's input; see input_assembler for details.
transform_spec::AbstractTransformSpecification: Transform applied by step

TransformSpecifications.NoThrowDAG — Type

NoThrowDAG <: AbstractTransformSpecification
NoThrowDAG(steps::AbstractVector{DAGStep})

Transform specification constructed from a DAG of transform specification nodes (steps), such that calling transform! on the DAG iterates through the steps, first constructing that step's input from all preceding upstream step outputs and then applying that step's own transform to the constructed input.

The DAG's input_specification is that of the first step in the DAG; its output_specification is that of the last step. As the first step's input is by definition the same as the overall input to the DAG, its step.input_assembler must be nothing.

DAG construction tip

As the input to the DAG is by definition the input to the first step in that DAG, only the first step will have access to the input directly passed in by the caller. To grant access to this top-level input to downstream steps, construct the DAG with an initial step that is an identity transform, i.e., is_identity_no_throw_transform(first(steps)) returns true. Downstream steps can then depend on the output of specific fields from this initial step. The single argument TransformSpecification constructor creates such an identity transform.

DAG construction warning

It is the caller's responsibility to implement a DAG, and to not introduce any recursion or cycles. What will happen if you do? To quote Tom Lehrer, "well, you ask a silly question, you get a silly answer!"

Storage of intermediate values

The output of each step in the DAG is stored locally in memory for the entire lifetime of the transform operation, whether or not it is actually accessed by any later steps. Large intermediate outputs may result in unexpected memory pressure relative to function composition or even local evaluation (since they are not visible to the garbage collector).

Fields

The following fields are constructed automatically when constructing a NoThrowDAG from a vector of DAGSteps:

step_transforms::OrderedDict{String,AbstractTransformSpecification}: Ordered dictionary of processing steps
step_input_assemblers::Dict{String,TransformSpecification}: Dictionary with functions for constructing the input for each key in step_transforms as a function that takes in a Dict{String,NoThrowResult} of all upstream step_transforms results.
_step_output_fields::Dict{String,Dict{Symbol,Any}}: Internal mapping of upstream step outputs to downstream inputs, used to e.g. validate that the input to each step can be constructed from the outputs of the upstream steps.

Example

using TransformSpecifications
using Legolas: @schema, @version

@schema "example-one-var" ExampleOneVarSchema
@version ExampleOneVarSchemaV1 begin
    var::String
end

@schema "example-two-var" ExampleTwoVarSchema
@version ExampleTwoVarSchemaV1 begin
    var1::String
    var2::String
end

# Say we have three functions we want to chain together:
fn_a(x) = ExampleOneVarSchemaV1(; var=x.var * "_a")
fn_b(x) = ExampleOneVarSchemaV1(; var=x.var * "_b")
fn_c(x) = ExampleOneVarSchemaV1(; var=x.var1 * x.var2 * "_c")

# First, specify these functions as transforms: what is the specification of the
# function's input and output?
step_a_transform = NoThrowTransform(ExampleOneVarSchemaV1, ExampleOneVarSchemaV1, fn_a)
step_b_transform = NoThrowTransform(ExampleOneVarSchemaV1, ExampleOneVarSchemaV1, fn_b)
step_c_transform = NoThrowTransform(ExampleTwoVarSchemaV1, ExampleOneVarSchemaV1, fn_c)

# Next, set up the DAG between the upstream outputs into each step's input:
step_b_assembler = input_assembler(upstream -> (; var=upstream["step_a"][:var]))
step_c_assembler = input_assembler(upstream -> (; var1=upstream["step_a"][:var],
                                                var2=upstream["step_b"][:var]))
# ...note that step_a is skipped, as there are no steps upstream from it.

steps = [DAGStep("step_a", nothing, step_a_transform),
         DAGStep("step_b", step_b_assembler, step_b_transform),
         DAGStep("step_c", step_c_assembler, step_c_transform)]
dag = NoThrowDAG(steps)

# output
NoThrowDAG (ExampleOneVarSchemaV1 => ExampleOneVarSchemaV1):
  🌱  step_a: ExampleOneVarSchemaV1 => ExampleOneVarSchemaV1: `fn_a`
   ·  step_b: ExampleOneVarSchemaV1 => ExampleOneVarSchemaV1: `fn_b`
  🌷  step_c: ExampleTwoVarSchemaV1 => ExampleOneVarSchemaV1: `fn_c`

This DAG can then be applied to an input, just like a regular TransformSpecification can:

input = ExampleOneVarSchemaV1(; var="initial_str")
transform!(dag, input)

# output
NoThrowResult{ExampleOneVarSchemaV1}: Transform succeeded
  ✅ result:
ExampleOneVarSchemaV1: (var = "initial_str_ainitial_str_a_b_c",)

Similarly, this transform will fail if the input specification is violated, but because it returns a NoThrowResult, it will fail gracefully:

# What is the input specification?
input_specification(dag)

# output
ExampleOneVarSchemaV1

transform!(dag, ExampleTwoVarSchemaV1(; var1="wrong", var2="input schema"))

# output
NoThrowResult{Missing}: Transform failed
  ❌ Input to step `step_a` doesn't conform to specification `ExampleOneVarSchemaV1`. Details: ArgumentError("Invalid value set for field `var`, expected String, got a value of type Missing (missing)")

To visualize this DAG, you may want to generate a plot via mermaid, which is a markdown-like plotting language that is rendered automatically via GitHub and various other platforms. To create a mermaid plot of a DAG, use mermaidify:

mermaid_str = mermaidify(dag)

# No need to dump full output string here, but let's check that the results are
# the same as in our generated output test, so that we know that the rendered graph
# in the documentation stays synced with the code.
print(mermaid_str)

# output
flowchart

%% Define steps (nodes)
subgraph OUTERLEVEL["` `"]
direction LR
subgraph STEP_A["Step a"]
  direction TB
  subgraph STEP_A_InputSchema["Input: ExampleOneVarSchemaV1"]
    direction RL
    STEP_A_InputSchemavar{{"var::String"}}
    class STEP_A_InputSchemavar classSpecField
  end
  subgraph STEP_A_OutputSchema["Output: ExampleOneVarSchemaV1"]
    direction RL
    STEP_A_OutputSchemavar{{"var::String"}}
    class STEP_A_OutputSchemavar classSpecField
  end
  STEP_A_InputSchema:::classSpec -- fn_a --> STEP_A_OutputSchema:::classSpec
end
subgraph STEP_B["Step b"]
  direction TB
  subgraph STEP_B_InputSchema["Input: ExampleOneVarSchemaV1"]
    direction RL
    STEP_B_InputSchemavar{{"var::String"}}
    class STEP_B_InputSchemavar classSpecField
  end
  subgraph STEP_B_OutputSchema["Output: ExampleOneVarSchemaV1"]
    direction RL
    STEP_B_OutputSchemavar{{"var::String"}}
    class STEP_B_OutputSchemavar classSpecField
  end
  STEP_B_InputSchema:::classSpec -- fn_b --> STEP_B_OutputSchema:::classSpec
end
subgraph STEP_C["Step c"]
  direction TB
  subgraph STEP_C_InputSchema["Input: ExampleTwoVarSchemaV1"]
    direction RL
    STEP_C_InputSchemavar1{{"var1::String"}}
    class STEP_C_InputSchemavar1 classSpecField
    STEP_C_InputSchemavar2{{"var2::String"}}
    class STEP_C_InputSchemavar2 classSpecField
  end
  subgraph STEP_C_OutputSchema["Output: ExampleOneVarSchemaV1"]
    direction RL
    STEP_C_OutputSchemavar{{"var::String"}}
    class STEP_C_OutputSchemavar classSpecField
  end
  STEP_C_InputSchema:::classSpec -- fn_c --> STEP_C_OutputSchema:::classSpec
end

%% Link steps (edges)
STEP_A:::classStep -..-> STEP_B:::classStep
STEP_B:::classStep -..-> STEP_C:::classStep

end
OUTERLEVEL:::classOuter ~~~ OUTERLEVEL:::classOuter

%% Styling definitions
classDef classOuter fill:#cbd7e2,stroke:#000,stroke-width:0px;
classDef classStep fill:#eeedff,stroke:#000,stroke-width:2px;
classDef classSpec fill:#f8f7ff,stroke:#000,stroke-width:1px;
classDef classSpecField fill:#fff,stroke:#000,stroke-width:1px;

See this rendered plot in the built documentation.

To display a mermaid plot via e.g. Documenter.jl, additional setup will be required.

TransformSpecifications.get_step — Method

get_step(dag::NoThrowDAG, name::String) -> DAGStep
get_step(dag::NoThrowDAG, step_index::Int) -> DAGStep

Return DAGStep with name or step_index.
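A short usage sketch, modeled on the NoThrowDAG example: the schema, the `shout` and `exclaim` functions, and the step names are hypothetical. It assumes that the returned DAGStep exposes the `name` field listed in the DAGStep docstring, and it calls `get_step` with the module prefix so that nothing depends on whether the function is exported.

```julia
using TransformSpecifications
using Legolas: @schema, @version

# Hypothetical schema shared by both steps.
@schema "example-text" ExampleTextSchema
@version ExampleTextSchemaV1 begin
    text::String
end

shout(rec) = ExampleTextSchemaV1(; text=uppercase(rec.text))
exclaim(rec) = ExampleTextSchemaV1(; text=rec.text * "!")

shout_transform = NoThrowTransform(ExampleTextSchemaV1, ExampleTextSchemaV1, shout)
exclaim_transform = NoThrowTransform(ExampleTextSchemaV1, ExampleTextSchemaV1, exclaim)

# The second step reads the `text` field of the first step's output.
exclaim_assembler = input_assembler(upstream -> (; text=upstream["shout"][:text]))

dag = NoThrowDAG([DAGStep("shout", nothing, shout_transform),
                  DAGStep("exclaim", exclaim_assembler, exclaim_transform)])

# Look up the same step by name and by position.
by_name = TransformSpecifications.get_step(dag, "exclaim")
by_index = TransformSpecifications.get_step(dag, 2)
by_name.name == by_index.name
```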

TransformSpecifications.input_assembler — Method

input_assembler(conversion_fn) -> TransformSpecification{Dict{String,Any}, NamedTuple}

Special transform used to convert the outputs of upstream steps in a NoThrowDAG into a NamedTuple that can be converted into that type's input specification.

conversion_fn must be a function that

- takes as input a Dictionary with keys that are the names of upstream steps, where the value of each of these keys is the output of that upstream_step, as specified by output_specification(upstream_step).
- returns a NamedTuple that can be converted, via convert_spec, to the specification of an AbstractTransformSpecification that it is paired with in a DAGStep.

Note that the current implementation is a stopgap for a better-defined implementation defined in https://github.com/beacon-biosignals/TransformSpecifications.jl/issues/8

TransformSpecifications.input_specification — Method

input_specification(dag::NoThrowDAG)

Return input_specification of first step in dag, which is the input specification of the entire DAG.

TransformSpecifications.output_specification — Method

output_specification(dag::NoThrowDAG) -> Type{<:Legolas.AbstractRecord}

Return output_specification of last step in dag, which is the output specification of the entire DAG.

TransformSpecifications.transform! — Method

transform!(dag::NoThrowDAG, input; verbose_violations=false)

Return NoThrowResult of sequentially transform!ing all dag.step_transforms, after passing input to the first step.

Before each step, that step's input_assembler is called on the results of all previous processing steps; this constructor generates input that conforms to the step's input_specification.

If verbose_violations=true, then much more verbose violation strings will be generated in the case of unexpected violations (including full stacktraces).

Plotting `NoThrowDAG`s

Here is the mermaid plot generated for the example DAG in NoThrowDAG:

[Rendered mermaid flowchart of the example DAG: boxes for Step a, Step b, and Step c, each showing the fields of its input and output schemas, with links from Step a to Step b and from Step b to Step c. The diagram source is the same mermaidify output printed in the NoThrowDAG example.]

TransformSpecifications.mermaidify — Method

mermaidify(dag::NoThrowDAG; direction="LR",
           style_step="fill:#eeedff,stroke:#000,stroke-width:2px;",
           style_spec="fill:#f8f7ff,stroke:#000,stroke-width:1px;",
           style_outer="fill:#cbd7e2,stroke:#000,stroke-width:0px;",
           style_spec_field="fill:#fff,stroke:#000,stroke-width:1px;")

Generate mermaid plot of dag, suitable for inclusion in markdown documentation.

Args:

direction: option that specifies the orientation/flow of the dag's steps; most useful options for dag plotting are LR (left to right) or TB (top to bottom); see the mermaid documentation for full list of options.
style_step: styling of the box containing an individual dag step (node)
style_spec: styling of the boxes containing the input and output specifications for each step
style_outer: styling of the box bounding the entire DAG
style_spec_field: styling of the boxes bounding each specification's individual field(s)

For each style kwarg, see the mermaid documentation for style string options.

To include in markdown, do

```mermaid
{{mermaidify output}}
```

or for html (i.e., for Documenter.jl), do

<div class="mermaid">
{{mermaidify output}}
</div>

For an example of the raw output, see NoThrowDAG; for an example of the rendered output, see the built documentation.
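The two embedding forms above can be produced with a few lines of plain Julia. This is a package-free sketch: `as_markdown` and `as_html` are hypothetical helper names, and `mermaid_str` is a tiny stand-in for the string that `mermaidify(dag)` would return.

```julia
# Stand-in for the output of `mermaidify(dag)`; any mermaid source works here.
mermaid_str = "flowchart\nA --> B"

# Build the three-backtick fence programmatically so that this snippet can
# itself be placed inside a markdown code block.
fence = "`"^3

# Hypothetical helpers (not part of TransformSpecifications.jl).
as_markdown(str) = string(fence, "mermaid\n", str, "\n", fence, "\n")
as_html(str) = string("<div class=\"mermaid\">\n", str, "\n</div>\n")

print(as_markdown(mermaid_str))
print(as_html(mermaid_str))
```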

TransformSpecifications interface

TransformSpecifications provides a general interface which allows the creation of new subtypes of AbstractTransformSpecification that can be used to implement transformation.

New transformation types must subtype AbstractTransformSpecification, and implement the following required methods.

Required interface type

TransformSpecifications.AbstractTransformSpecification — Type

abstract type AbstractTransformSpecification

Transform specifications are represented by subtypes of AbstractTransformSpecification. Each leaf should be immutable and define methods for

- input_specification, which returns the type expected/allowed as transform input
- output_specification, which returns the output type generated by successfully completed processing
- transform!, which transforms an input of type input_specification and returns an output of type output_specification.

It may additionally define a custom non-mutating transform function.
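A new leaf type might look like the following sketch. `UppercaseTransform` is a hypothetical type invented for this illustration; the sketch assumes that the three interface functions can be extended by importing them from the package, and it uses plain `String` as both the input and output specification rather than a Legolas schema.

```julia
using TransformSpecifications
import TransformSpecifications: input_specification, output_specification, transform!

# Hypothetical leaf type: immutable, with no fields.
struct UppercaseTransform <: TransformSpecifications.AbstractTransformSpecification end

# The three methods required by the interface described above.
input_specification(::UppercaseTransform) = String
output_specification(::UppercaseTransform) = String
transform!(::UppercaseTransform, input::String) = uppercase(input)

ts = UppercaseTransform()
out = transform!(ts, "hello")

# The result should be an instance of the declared output specification.
out isa output_specification(ts)
```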

Required interface methods

TransformSpecifications.transform! — Function

transform!(ts::AbstractTransformSpecification, input)

Return result of applying ts to an input of type input_specification(ts), where result is an output_specification(ts). May mutate input.

Other interface methods

These methods have reasonable fallback definitions and should only be defined for new types if there is some reason to prefer a custom implementation over the default fallback.

TransformSpecifications.transform — Function

transform(ts::AbstractTransformSpecification, input)

Return result of applying ts to an input of type input_specification(ts), where result is an output_specification(ts). Must not mutate input.