Upgrading from Legolas v0.4 to v0.5
This guide is incomplete; please add to it if you encounter items which would help other upgraders along their journey.
See here for a comprehensive log of changes from Legolas v0.4 to Legolas v0.5.
Some main changes to be aware of
- In Legolas v0.4, every `Legolas.Row` field's type was available as a type parameter of `Legolas.Row`; for example, the type of a field `y` specified as `y::Real` in a `Legolas.@row` declaration would be surfaced like `Legolas.Row{..., NamedTuple{(...,:y,...),Tuple{...,typeof(y),...}}}`. In Legolas v0.5, the schema version author controls which fields have their types surfaced as type parameters in Legolas-generated record types via the `field::(<:F)` syntax in `Legolas.@version`.
  - Additionally, to include type parameters associated with fields in a parent schema, they must be re-declared in the child schema. For example, the package LegolasFlux declares a `ModelV1` version with a field `weights::(<:Union{Missing,Weights})`. LegolasFlux includes an example with a schema extension `DigitsRowV1` which extends `ModelV1`. This `@version` call must re-declare the field `weights` to be parametric in order for the `DigitsRowV1` struct to also have a type parameter for this field.
- In Legolas v0.4, `@row`-generated `Legolas.Row` constructors accepted and propagated any non-schema-declared fields provided by the caller. In Legolas v0.5, `@version`-generated record type constructors will discard any non-schema-declared fields provided by the caller. When upgrading code that formerly "implicitly extended" a given schema version by propagating non-declared fields, it is advisable to instead explicitly declare a new extension of the schema version to capture the propagated fields as declared fields; or, if it makes more sense for a given use case, one may instead define a new schema version that adds these propagated fields as declared fields directly to the schema (likely declared as `::Union{Missing,T}` to allow them to be missing).
- Before Legolas v0.5, the documented guidance for schema authors surrounding new fields' impact on schema version breakage was misleading, implying that adding a new declared field to an existing schema version is non-breaking if the field's type allowed for `Missing` values. This is incorrect. For clarity, adding a new declared field to an existing schema version is a breaking change unless the field's type and value are both completely unconstrained in the declaration, i.e. the field's type constraint must be `::Any` and may not feature a value-constraining or value-transforming assignment expression.
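The first two changes above can be illustrated with a minimal sketch. The schema names `example.parent` and `example.child`, their record types, and their fields are hypothetical and exist only for illustration; the sketch assumes Legolas v0.5 is installed.

```julia
using Legolas
using Legolas: @schema, @version

# Hypothetical parent schema, for illustration only.
@schema "example.parent" Parent

@version ParentV1 begin
    x::Real        # constrained to Real, but not surfaced as a type parameter
    y::(<:Real)    # surfaced as a type parameter of the record type
end

# Hypothetical child schema that extends the parent.
@schema "example.child" Child

@version ChildV1 > ParentV1 begin
    y::(<:Real)                # re-declared so that ChildV1 is also parametric in `y`
    z::Union{Missing,String}   # new field that is allowed to be missing
end

# Construct records from keyword arguments.
parent = ParentV1(; x=1, y=2.0)
child = ChildV1(; x=1, y=2.0, z="hello")

# Construct a record from a row that carries a non-declared field.
# Under v0.5 the `extra` field is discarded rather than propagated.
row_with_extra = (x=1, y=2.0, extra="not declared by the schema")
trimmed = ParentV1(row_with_extra)
```

Here `typeof(parent)` should carry the concrete type of `y` as its parameter, while `hasproperty(trimmed, :extra)` should be `false`. Code that relied on `extra` surviving construction would need a schema extension (or a new schema version) that declares it.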
Deserializing old tables with Legolas v0.5
Generally, tables serialized with earlier versions of Legolas can be deserialized with Legolas v0.5, making the upgrade a "code-breaking" change rather than a "data-breaking" change. To be sure, however, it is strongly suggested to keep reference tests in which checked-in serialized tables (written before Legolas v0.5) are deserialized and verified.
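A reference test of this kind might look roughly like the following sketch. The schema `example.measurement`, the helper `load_reference_records`, and the file path in the comment are all hypothetical; in a real package the schema would be the one your old tables were written against, and the file would be a table serialized before the upgrade and checked in to the repository.

```julia
using Legolas, Tables
using Legolas: @schema, @version

# Hypothetical schema standing in for the one the old table was written with.
@schema "example.measurement" Measurement

@version MeasurementV1 begin
    id::String
    value::Float64
end

# Hypothetical helper: read a serialized table from `path` and rebuild
# v0.5 records from its rows, so that tests can compare them to known values.
function load_reference_records(path)
    table = Legolas.read(path)
    return [MeasurementV1(row) for row in Tables.rows(table)]
end

# In a test suite this might be used as follows (the path is an assumption):
#   records = load_reference_records(joinpath(@__DIR__, "reference", "measurements.arrow"))
#   @test records[1].id == "expected-id"
```

Comparing the deserialized records against values recorded when the file was written gives a direct check that the upgrade did not change how existing data is read.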
Additionally, serialized Arrow tables containing nested Legolas-v0.4-defined `Legolas.Row` values (i.e. a table that contains a row that has a field that is, itself, a `Legolas.Row` value, or contains such values) require special handling to deserialize under Legolas v0.5, if you wish users to be able to deserialize them with `Legolas.read` using the Legolas-v0.5-ready version of your package. Note that these tables are still deserializable as plain Arrow tables regardless, so it may not be worthwhile to provide a bespoke deprecation/compatibility pathway in the Legolas-v0.5-ready version of your package unless your use case merits it (i.e. the impact surface would be high for your package's users).
If you would like to provide such a pathway, though:
Recall that under Legolas v0.4, `@row`-generated `Legolas.Row` constructors may accept and propagate arbitrary non-schema-declared fields, whereas Legolas v0.5's `@version`-generated record types may only contain schema-declared fields. Therefore, one must decide what to do with any non-declared fields present in serialized `Legolas.Row` values upon deserialization. A common approach is to implement a deprecation/compatibility pathway within the relevant surrounding `@version` declaration. For example, this LegolasFlux example uses a function `compat_config` to handle old `Legolas.Row` values, but does not add any handling for non-declared fields, which will be discarded if present. If one did not want non-declared fields to be discarded, these fields could be handled by throwing an error or warning, or defining a schema version extension that captured them, or defining a new version of the relevant schema to capture them (e.g. adding a field like `extras::Union{Missing, NamedTuple}`).
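As a minimal sketch of the last option, the following collects non-declared fields of an old row into an `extras` field instead of discarding them. The schema `example.config`, the record type `ConfigV1`, and the helper `upgrade_old_row` are hypothetical names introduced for illustration; this is not the LegolasFlux implementation, and the old row is modeled here as a plain `NamedTuple`.

```julia
using Legolas
using Legolas: @schema, @version

# Hypothetical schema with an `extras` field that captures non-declared fields.
@schema "example.config" Config

@version ConfigV1 begin
    name::String
    extras::Union{Missing,NamedTuple}
end

# Hypothetical helper: convert an old row into a `ConfigV1`, moving every
# field that the schema does not declare into `extras`.
function upgrade_old_row(row)
    declared = (:name, :extras)
    leftover = filter(n -> !(n in declared), propertynames(row))
    # Build a NamedTuple of the leftover fields, or `missing` if there are none.
    extras = isempty(leftover) ? missing : (; (n => getproperty(row, n) for n in leftover)...)
    return ConfigV1(; name=row.name, extras=extras)
end

old_row = (name="model-a", learning_rate=0.1, epochs=3)
upgraded = upgrade_old_row(old_row)
```

With this approach `upgraded.extras` holds `learning_rate` and `epochs`, so nothing is silently dropped. Throwing an error or logging a warning when `leftover` is non-empty would be an equally reasonable variant if the extra fields are not expected.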