Upgrading from Legolas v0.4 to v0.5

This guide is incomplete; please add to it if you encounter items which would help other upgraders along their journey.

See here for a comprehensive log of changes from Legolas v0.4 to Legolas v0.5.

Some main changes to be aware of

  • In Legolas v0.4, every Legolas.Row field's type was available as a type parameter of Legolas.Row. For example, the type of a field y specified as y::Real in a Legolas.@row declaration would be surfaced as Legolas.Row{...,NamedTuple{(...,:y,...),Tuple{...,typeof(y),...}}}. In Legolas v0.5, the schema version author controls which fields have their types surfaced as type parameters of Legolas-generated record types, using the field::(<:F) syntax in Legolas.@version.
    • Additionally, fields declared with type parameters in a parent schema version must be re-declared in the child schema version if the child's record type should also carry those type parameters. For example, the package LegolasFlux declares a ModelV1 version with a field weights::(<:Union{Missing,Weights}). LegolasFlux includes an example with a schema extension DigitsRowV1, which extends ModelV1; the DigitsRowV1 @version call must re-declare the field weights as parametric in order for the DigitsRowV1 struct to also have a type parameter for this field.
  • In Legolas v0.4, @row-generated Legolas.Row constructors accepted and propagated any non-schema-declared fields provided by the caller. In Legolas v0.5, @version-generated record type constructors will discard any non-schema-declared fields provided by the caller. When upgrading code that formerly "implicitly extended" a given schema version by propagating non-declared fields, it is advisable to instead explicitly declare a new extension of the schema version to capture the propagated fields as declared fields; or, if it makes more sense for a given use case, one may instead define a new schema version that adds these propagated fields as declared fields directly to the schema (likely declared as ::Union{Missing,T} to allow them to be missing).
  • Before Legolas v0.5, the documented guidance for schema authors on how new fields affect schema version breakage was misleading: it implied that adding a new declared field to an existing schema version is non-breaking if the field's type allows Missing values. This is incorrect. Adding a new declared field to an existing schema version is a breaking change unless the field's type and value are both completely unconstrained in the declaration; that is, the field's type constraint must be ::Any and the declaration must not feature a value-constraining or value-transforming assignment expression.
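The first two changes above can be sketched as follows. This is a minimal illustration assuming Legolas v0.5 is installed; the schema names example.param and example.child and the record types ParamV1 and ChildV1 are hypothetical. It shows a parametric field surfacing as a type parameter, the re-declaration an extension needs to keep that parameter, and a non-declared field being dropped by the generated constructor.

```julia
using Legolas: @schema, @version

# Hypothetical schema used only for illustration.
@schema "example.param" Param

# `x` uses the field::(<:F) syntax, so the concrete type of its value becomes
# a type parameter of ParamV1; `y` is declared normally and is not surfaced.
@version ParamV1 begin
    x::(<:Union{Missing,Real})
    y::String
end

@schema "example.child" Child

# ChildV1 extends ParamV1. Re-declaring `x` with the (<:F) syntax is what gives
# ChildV1 its own type parameter for that field.
@version ChildV1 > ParamV1 begin
    x::(<:Union{Missing,Real})
    z::Int
end

p = ParamV1(; x=1.5, y="hello")  # a ParamV1{Float64}
c = ChildV1(; x=2, y="hi", z=3)  # a ChildV1{Int}

# `extra` is not declared by ParamV1, so the generated constructor drops it
# instead of propagating it (unlike a v0.4 Legolas.Row).
q = ParamV1((x=missing, y="hi", extra=42))
hasproperty(q, :extra)  # false
```

To keep such a field, declare it explicitly, either in an extension of ParamV1 or in a new version of the schema.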

Deserializing old tables with Legolas v0.5

Generally, tables serialized with earlier versions of Legolas can be deserialized with Legolas v0.5, which makes the upgrade a "code-breaking" change rather than a "data-breaking" change. However, to be sure this holds for your own tables, it is strongly suggested to add reference tests that deserialize and verify checked-in tables serialized before Legolas v0.5.
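A reference test of that kind might look like the sketch below, assuming Legolas v0.5, Tables.jl, and Test are available. The schema example.sample, the record type SampleV1, and the helper check_reference_table are hypothetical. In a real test suite, path would point at a table written by the pre-v0.5 version of your package and checked into the repository; the sketch writes a fresh file with Legolas.write only so that it runs on its own.

```julia
import Legolas
using Legolas: @schema, @version
using Tables
using Test

@schema "example.sample" Sample

@version SampleV1 begin
    id::Int
    label::String
end

# Hypothetical helper: read the table at `path` (Legolas.read validates its
# schema metadata), convert every row to SampleV1, and compare field by field.
function check_reference_table(path, expected::Vector{<:NamedTuple})
    table = Legolas.read(path)
    records = SampleV1.(Tables.rowtable(table))
    return length(records) == length(expected) &&
           all(((r, e),) -> r.id == e.id && r.label == e.label, zip(records, expected))
end

# Stand-in for a checked-in pre-v0.5 fixture (e.g. one stored under test/data/).
expected = [(id=1, label="a"), (id=2, label="b")]
path = joinpath(mktempdir(), "sample.arrow")
Legolas.write(path, expected, SampleV1SchemaVersion())

@testset "reference tables" begin
    @test check_reference_table(path, expected)
end
```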

Additionally, serialized Arrow tables containing nested Legolas-v0.4-defined Legolas.Row values (i.e. a table with a row that has a field that is, itself, a Legolas.Row value, or that contains such values) require special handling if you want users to be able to deserialize them with Legolas.read using the Legolas-v0.5-ready version of your package. Note that these tables remain deserializable as plain Arrow tables regardless, so it may not be worthwhile to provide a bespoke deprecation/compatibility pathway in the Legolas-v0.5-ready version of your package unless your use case merits it (i.e. the impact surface would be high for your package's users).

If you would like to provide such a pathway, though:

Recall that under Legolas v0.4, @row-generated Legolas.Row constructors accepted and propagated arbitrary non-schema-declared fields, whereas Legolas v0.5's @version-generated record types may only contain schema-declared fields. Therefore, one must decide what to do with any non-declared fields present in serialized Legolas.Row values upon deserialization. A common approach is to implement a deprecation/compatibility pathway within the relevant surrounding @version declaration. For example, this LegolasFlux example uses a function compat_config to handle old Legolas.Row values, but does not add any handling for non-declared fields, which are discarded if present. If one does not want non-declared fields to be silently discarded, they could be handled by throwing an error or emitting a warning, by defining a schema version extension that captures them, or by defining a new version of the relevant schema that captures them (e.g. by adding a field like extras::Union{Missing,NamedTuple}).
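The warning-based option can be sketched as below. It is loosely in the spirit of LegolasFlux's compat_config but not copied from it; the schemas example.inner and example.outer, the record types InnerV1 and OuterV1, and the helper compat_inner are hypothetical. The sketch assumes that an old nested Legolas.Row value reaches the OuterV1 constructor as a NamedTuple; check what your own old tables actually deserialize to and adjust the compat_inner methods accordingly.

```julia
using Legolas: @schema, @version

@schema "example.inner" Inner

@version InnerV1 begin
    lr::Float64
    epochs::Int
end

# Hypothetical compat helper. Already-upgraded values pass through unchanged.
compat_inner(x::InnerV1) = x

# ASSUMPTION: an old nested v0.4 Legolas.Row value arrives as a NamedTuple.
# Warn about non-declared fields, then let the InnerV1 constructor drop them.
function compat_inner(x::NamedTuple)
    dropped = setdiff(keys(x), fieldnames(InnerV1))
    isempty(dropped) || @warn "discarding non-declared fields" dropped
    return InnerV1(x)
end

@schema "example.outer" Outer

@version OuterV1 begin
    name::String
    # This assignment expression runs whenever an OuterV1 is constructed,
    # including when rows read from an old table are converted to OuterV1.
    inner::InnerV1 = compat_inner(inner)
end

old = (name="run-1", inner=(lr=0.1, epochs=3, note="legacy"))
r = OuterV1(old)  # logs a warning listing :note
```

Replacing the @warn call with error(...) turns this into the stricter, error-throwing option.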