May 24th, 2024 Building a Flexible Type System By Jeffrey M. Barber

When I started building Adama, I decided to simply focus on making maximal forward progress. With respect to the type system, this has been a beautiful mistake. I have my work cut out for me in improving the type system to make it better, but I’m happy where things are at the moment in terms of being useful. I have a very limited form of type inference in play, and I want to cover one non-trivial facet of it today: type unification.

Now, the beauty of writing a programming language is that you have this opportunity to make nice things (or, at least, try). One of the things I like about many dynamic languages is the ability to make an inline or anonymous object. It’s super convenient to describe a piece of data in a compact way without ceremony. Thus, I wanted this in Adama.

public formula some_object = {name:"John Jones"};

In dynamic languages, the type of some_object is simply an object as objects default to associative maps. Since Adama is strongly statically typed and translates to Java, I didn’t want objects to simply be a proxy to a map. Instead, I want the type of some_object to be a class where the object fields map to class fields. This introduces a problem: I need to invent names for anonymous objects. So, some_object has a class name of RTxAnonymousObject1 or something. This introduces another problem: deduplicating classes as the naive approach for:

public formula some_object1 = {name:"John Jones"};
public formula some_object2 = {name:"James John"};

will produce two class names: RTxAnonymousObject1 and RTxAnonymousObject2. Beyond blowing up the bytecode and cache locality, this requires solving finding an existing class based on the object field types.

Boss Fight

The boss fight for this complexity appeared with arrays.

public formula many_objects = [{name:"John Jones"}, {name:"James John"}];

This forced me to realize that I needed a two pass algorithm to join two types such that the type of many_objects is like RTxAnonymousObject1[]. This is a feature at the level of expressions within Adama via the SupportsTwoPhaseTyping interface. The first phase is to estimate a type which asks the expression “Hey, given this type, please estimate a compatible type or type yourself in isolation”. In the case of many_objects, {name:"John Jones"} is going to type as RTxAnonymousObject1 and {name:"John Jones"} as RTxAnonymousObject2. Care must be taken during estimation to be read-only as estimation is going to create intermediate types (potentially many) which need to be thrown away.

After all the types have been estimates, they are joined into a maximum type. In our previous example, it will pick RTxAnonymousObject1 as the common type. However, we can look at a more complex case:

public formula complex_many_objects = [{x:1}, {y:2}];

Each of the objects ({x:1} and {y:2}) will type as different classes. However, when we compute the maximum type, an entirely new class is created to look like this:

message MaxUnion1 {
  maybe<int> x;
  maybe<iny> y;
}

This is the final type. Once this is computed, the second phase is evaluated which is “upgrade type”. In the case of objects, the anonymous objects will introduce default values for y and x. This will unify the objects such that the final type is RTxMaxUnion1[] in Java.

Fun Edge Case

Arrays were the annoying case, but there is also the ternary operation:

public bool picker;
public formula xyz = picker ? {x:1} : (({y:2}));

Broken shit

The two phase typing only works with a few expressions, so a new expression needs to be introduced to convert a type. For example, this is currently broken

message JustX { int x; }

function jx(int x) -> JustX {
  return {x:1};
}

public formula xyz = [jx(1),{y:2}];

Fixing this is a reasonable project for a savage hire, so let me know if you want to part of cleaning up a type system.