A Weird Imagination

Devlog: Factorio reflection mod

The problem

When developing the "garbage collector" for the Pacifist mod, I noted that I couldn't actually know which strings should be treated as references. As a workaround, I just assumed all strings were references, which worked well enough, but I wondered if there was a way to get more precise type information. Additionally, when I was trying to figure this out, the developer of exfret's randomizer expressed interest in getting access to such information for that mod.

The solution

The Factorio documentation includes a machine-readable version of the prototype API documentation. My ReflectionLibrary mod provides access to that information from within a Factorio mod, effectively faking a reflection API for type information on data.raw during the Prototype Stage:

local data_raw = ReflectionLibraryMod.typed_data_raw

local bb = data_raw['blueprint-book']['blueprint-book']
log(bb.inventory_size._type.typeKind)  -- prints "literal"
bb.inventory_size = 42
log(bb.inventory_size._type.typeKind)  -- prints "alias"
log(bb.inventory_size._type.name)      -- prints "ItemStackIndex"

The details

Getting the machine-readable prototype API into the mod

As discussed in my previous post, I wrote a script convert-to-lua.sh that downloads prototype-api.json from the Factorio website and turns it into a file prototype-api.lua which assigns that JSON data as a Lua table to a global, ReflectionLibraryMod.prototype_api.

How to know if I'm interpreting it right?

Getting the data was the easy part. Now I had to actually interpret it correctly. My basic strategy was to write a type-checker that asserted that data.raw as generated by the base mod satisfied my interpretation of the prototype API. That is, all of the required properties are present and all of the properties contain values of the expected types, recursively. Since the base-game data is definitely correct, if that type check failed, that would almost certainly indicate a bug in my type checker.

(Awkwardly, my type-checker initially had a bug that took me a while to notice: it was reading a Lua dictionary as an array, which almost always results in an empty array, so it was barely checking anything and was "passing" a trivial type check of zero objects.)
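That pitfall is easy to reproduce in plain Lua. The following is a minimal sketch, with a made-up table standing in for one section of data.raw (it is not the mod's code): ipairs only walks the integer keys 1, 2, 3, and so on, so it visits nothing in a string-keyed table, while pairs visits every entry.

```lua
-- Hypothetical stand-in for one section of data.raw: a string-keyed dictionary.
local section = {
  ["iron-plate"] = { type = "item", name = "iron-plate" },
  ["copper-plate"] = { type = "item", name = "copper-plate" },
}

-- Counts how many entries a given iterator factory (ipairs or pairs) visits.
local function count_visited(iter, tbl)
  local n = 0
  for _ in iter(tbl) do
    n = n + 1
  end
  return n
end

local visited_as_array = count_visited(ipairs, section) -- 0: no key 1 exists
local visited_as_dict = count_visited(pairs, section)   -- 2: every entry
print(visited_as_array, visited_as_dict)
```

A check that loops with ipairs over such a table succeeds without ever looking at a value, which is why counting the objects actually checked makes a useful sanity test.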

Resolving types

The core of the type checker is in the ReflectionLibrary mod's functions.lua in the resolve_type() function. That function takes a value and its declared type and returns the value's actual concrete runtime type. That is not as trivial as it sounds, because the declared type and the concrete type can differ.

Easier cases

For a concrete declared type (struct, array, dictionary, tuple, literal, or builtin), that involves checking that it actually is a valid value of that type (which usually involves recursively checking its properties) and basically returning the declared type back with a little extra information if the checks pass. Aliases are only slightly more complicated as they basically just recurse, keeping the alias name as it's important for identifying the semantics of builtin values like EntityID.

One detail that I missed at first is that even for types with no options like string, the code still has to always check the value really is of that type because otherwise unions containing that type will not get handled correctly. And it took me a few tries to get the check right on builtins.
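To make the easier cases concrete, here is a minimal sketch of that kind of resolution. It is not the mod's resolve_type(): the descriptor fields (kind, name, value, element, target) and the aliases table are invented for illustration, and ItemStackIndex is assumed to wrap a numeric builtin.

```lua
-- Hypothetical alias table; ItemStackIndex is assumed to wrap a numeric builtin.
local aliases = {
  ItemStackIndex = { kind = "builtin", name = "uint16" },
}

-- Returns the resolved type, or raises an error if value does not fit declared.
local function resolve(value, declared)
  if declared.kind == "builtin" then
    -- Even "no options" types must verify the value; unions depend on it.
    local expected = declared.name == "string" and "string" or "number"
    if type(value) ~= expected then
      error("expected " .. declared.name .. ", got " .. type(value))
    end
    return declared
  elseif declared.kind == "literal" then
    if value ~= declared.value then
      error("expected literal " .. tostring(declared.value))
    end
    return declared
  elseif declared.kind == "array" then
    if type(value) ~= "table" then
      error("expected array, got " .. type(value))
    end
    for _, element in ipairs(value) do
      resolve(element, declared.element)
    end
    return declared
  elseif declared.kind == "alias" then
    -- Recurse into the alias target, but keep the alias name on the result.
    local target = resolve(value, aliases[declared.name])
    return { kind = "alias", name = declared.name, target = target }
  end
  error("unhandled kind " .. tostring(declared.kind))
end

local index_type = resolve(42, { kind = "alias", name = "ItemStackIndex" })
print(index_type.kind, index_type.name)
```

Keeping the alias name on the returned table is what lets a caller tell an ItemStackIndex apart from any other number.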

Slightly more complicated are prototypes, which have subtypes as seen in the inheritance tree in the docs: any value declared as a prototype might actually be one of its subtypes, but it's straightforward to figure out which one by reading the type property.
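A rough sketch of that subtype lookup, assuming two small hand-written tables in place of the real hierarchy (the names parent_of, by_typename, and resolve_prototype are hypothetical, and the tables are only an illustrative excerpt):

```lua
-- Illustrative excerpt only: child prototype name -> parent prototype name.
local parent_of = {
  ToolPrototype = "ItemPrototype",
}

-- Illustrative excerpt only: value of the type property -> prototype name.
local by_typename = {
  item = "ItemPrototype",
  tool = "ToolPrototype",
}

-- Walks up the parent chain to see whether name is ancestor or derives from it.
local function is_subtype(name, ancestor)
  while name do
    if name == ancestor then
      return true
    end
    name = parent_of[name]
  end
  return false
end

-- Returns the concrete prototype name for value, given its declared prototype.
local function resolve_prototype(value, declared_name)
  local concrete = by_typename[value.type]
  if not concrete then
    error("unknown prototype type " .. tostring(value.type))
  end
  if not is_subtype(concrete, declared_name) then
    error(concrete .. " is not a subtype of " .. declared_name)
  end
  return concrete
end

local concrete_name = resolve_prototype({ type = "tool" }, "ItemPrototype")
print(concrete_name)
```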

Union types

The most complicated case is union types. Since a union type is a list of multiple possible types for a value, a valid value of a union is valid for one of those types, and presumably invalid for all of the others. That introduces the complication that some type checks really will fail while processing valid values, because they're type checks for the options of unions that aren't being used. Needless to say, debugging issues where a type-checker bug caused none of the types listed in a union to match got pretty confusing, especially when the value was nested inside multiple layers of unions.

A lot of types (e.g. CapsuleAction or NoiseExpression) are documented as distinguishing the union just by reading the type property and similarly NoiseFunctionApplication is documented as reading the function_name property, so the union resolution does special-case checking those first, but it can't rely on that entirely as other unions like Animation4Way don't have a type property.
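One way to picture the "check the discriminating property first" step is the sketch below. It is an assumption-laden illustration, not the mod's logic: each option carries a hypothetical discriminator field holding the literal it requires in value.type, and options without one (as with unions like Animation4Way) leave it nil.

```lua
-- Narrows a union's options using value.type when possible.
-- option.discriminator is a hypothetical field invented for this sketch.
local function candidate_options(value, options)
  if type(value) == "table" and value.type ~= nil then
    local matching = {}
    for _, option in ipairs(options) do
      if option.discriminator == value.type then
        matching[#matching + 1] = option
      end
    end
    if #matching > 0 then
      return matching
    end
  end
  -- No usable discriminator: every option still has to be tried.
  return options
end

-- Made-up options for demonstration.
local options = {
  { name = "OptionA", discriminator = "a" },
  { name = "OptionB", discriminator = "b" },
  { name = "OptionWithoutType" },
}

local narrowed = candidate_options({ type = "b" }, options)
local unnarrowed = candidate_options({ width = 1 }, options)
print(#narrowed, narrowed[1].name, #unnarrowed)
```

Narrowing first keeps the number of expected failures small, and the fallback still covers unions that have no type property at all.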

Property checks

Since there are a few different ways an object can have multiple properties (a prototype, a struct, or a struct nested inside a union or array), the logic to handle that is in a helper function resolve_struct_properties(). It doesn't actually return anything: it just takes a value and a type, attempts to resolve all the properties specified by the type, and errors out if it fails on any of them (and the union logic catches this error).
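The shape of such a helper might look roughly like the following. This is a simplified sketch and not the mod's resolve_struct_properties(): the property "type" here is just a Lua type name, check_property is a hypothetical leaf checker, and sprite_like is a made-up struct description.

```lua
-- Hypothetical leaf checker: here a property "type" is just a Lua type name.
local function check_property(field, expected, name)
  if type(field) ~= expected then
    error(name .. ": expected " .. expected .. ", got " .. type(field))
  end
end

-- Returns nothing; raises an error on the first property that fails.
local function resolve_struct_properties(value, struct_type)
  if type(value) ~= "table" then
    error("expected a table, got " .. type(value))
  end
  for _, prop in ipairs(struct_type.properties) do
    local field = value[prop.name]
    if field == nil then
      if not prop.optional then
        error("missing required property " .. prop.name)
      end
    else
      check_property(field, prop.type, prop.name)
    end
  end
end

-- Made-up struct description for demonstration.
local sprite_like = {
  properties = {
    { name = "filename", type = "string" },
    { name = "width", type = "number", optional = true },
  },
}

local ok_valid = pcall(resolve_struct_properties, { filename = "a.png" }, sprite_like)
local ok_missing, missing_message = pcall(resolve_struct_properties, { width = 32 }, sprite_like)
print(ok_valid, ok_missing, missing_message)
```

Because failure is signalled only by error(), a caller that expects failures, such as union resolution, has to wrap the call in pcall.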

Error handling

As I got the type-checker closer to actually working, I had to adjust the error messages to be more precise to help track down less obvious bugs. Originally, the messages just said which section of data.raw it failed to process:

ERROR: Type check failed on data.raw.noise-expression

The first refinement was to show the specific value:

ERROR: Type check failed on data.raw.noise-expression.0_17-lakes-elevation

At this point, I was still reporting errors by having functions return nil on failure, mostly because I was new to Lua and didn't know how its exception handling worked. Once the issues were down to non-obvious edge cases, it was more important to know more information about the errors. But also, as mentioned above, understanding type-checking failures was difficult because the union logic means a value failing to type-check isn't always an error: it might just really not be that option of a union. That means distinguishing between a bug and a correct type-checking failure inside a union can be really confusing.

I switched to error() to throw exceptions instead of returning nil, with the union code using pcall to catch the exceptions and try other options, only throwing itself (with all of the errors concatenated together) if all of the options fail. After I added error() in more places and added a tag to filter out "errors" that only exist to filter union members (the filtering was buggy and needed to be fixed in a later commit), the errors looked like

__ReflectionLibrary__/functions.lua:118: No options matched for union (10 options):
__ReflectionLibrary__/functions.lua:118: No options matched for union (34 options):
__ReflectionLibrary__/functions.lua:49: Tuple element 3 (table, .type=function-application) does not match expected type NoiseNumber. __ReflectionLibrary__/functions.lua:118: No options matched for union (38 options):
[...skipping 123(!) lines...]
__ReflectionLibrary__/functions.lua:118: No options matched for union (34 options):

Needless to say, over a hundred lines of that was pretty incomprehensible, so I also added a parameter that gets passed down to keep track of the full name of the value being type-checked so it can be displayed in errors, making it much easier to find the values in data.raw to examine them. Then the updated error looked like

__ReflectionLibrary__/functions.lua:132: No options matched for union (34 options) at data.raw["noise-expression"]["0_17-lakes-elevation"].expression.arguments[3].arguments[1].arguments[1].arguments[1].arguments[1].arguments[1]:
__ReflectionLibrary__/functions.lua:61: Tuple element 1 (table, .type=function-application) does not match expected type NoiseNumber in data.raw["noise-expression"]["0_17-lakes-elevation"].expression.arguments[3].arguments[1].arguments[1].arguments[1].arguments[1].arguments. __ReflectionLibrary__/functions.lua:132: No options matched for union (38 options) at data.raw["noise-expression"]["0_17-lakes-elevation"].expression.arguments[3].arguments[1].arguments[1].arguments[1].arguments[1].arguments[1]:
__ReflectionLibrary__/functions.lua:132: No options matched for union (34 options) at data.raw["noise-expression"]["0_17-lakes-elevation"].expression.arguments[3].arguments[1].arguments[1].arguments[1].arguments[1].arguments[1]:
[...skipping 123(!) lines...]
__ReflectionLibrary__/functions.lua:132: No options matched for union (34 options) at data.raw["noise-expression"]["0_17-lakes-elevation"].expression.arguments[3].arguments[1].arguments[1].arguments[1].arguments[1].arguments[1]:

That gave enough information to know where to look, and was readable enough for me to notice the bug mentioned above that was accidentally omitting some useful messages. Because of course, I needed an even longer error message:

__ReflectionLibrary__/functions.lua:132: No options matched for union (10 options) at data.raw["noise-expression"]["0_17-lakes-elevation"].expression:
__ReflectionLibrary__/functions.lua:132: No options matched for union (34 options) at data.raw["noise-expression"]["0_17-lakes-elevation"].expression:
__ReflectionLibrary__/functions.lua:61: Tuple element 3 (table, .type=function-application) does not match expected type NoiseNumber in data.raw["noise-expression"]["0_17-lakes-elevation"].expression.arguments. __ReflectionLibrary__/functions.lua:132: No options matched for union (38 options) at data.raw["noise-expression"]["0_17-lakes-elevation"].expression.arguments[3]:
[...skipping 187(!!) lines...]
__ReflectionLibrary__/functions.lua:310: Value's .type property does not match expected value of array-construction (was variable) in data.raw["noise-expression"]["0_17-lakes-elevation"].expression.arguments[3].arguments[1].arguments[1].arguments[1].arguments[1].arguments[1].arguments.points.

Although verbose, that final line in fact points out the exact relevant issue.
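The two mechanisms from this section, pcall-based union fallback and a path threaded through the recursion, can be sketched together in a few lines. This is a simplified illustration rather than the code in functions.lua: child_path, resolve_union, and resolve_option are hypothetical names, and each union option is reduced to a Lua type name.

```lua
-- Appends one key to a display path such as data.raw["noise-expression"].
local function child_path(path, key)
  if type(key) == "number" then
    return path .. "[" .. key .. "]"
  elseif key:match("^[%a_][%w_]*$") then
    return path .. "." .. key
  end
  return path .. string.format("[%q]", key)
end

-- Tries each option; returns the first success, otherwise raises one error
-- that concatenates every option's failure message.
local function resolve_union(value, options, resolve_option, path)
  local failures = {}
  for _, option in ipairs(options) do
    local ok, result = pcall(resolve_option, value, option, path)
    if ok then
      return result
    end
    failures[#failures + 1] = tostring(result)
  end
  error("No options matched for union (" .. #options .. " options) at "
    .. path .. ":\n" .. table.concat(failures, "\n"))
end

-- Hypothetical option checker: each option is just a Lua type name.
local function resolve_option(value, option, path)
  if type(value) ~= option then
    error("expected " .. option .. ", got " .. type(value) .. " in " .. path)
  end
  return option
end

local path = child_path(child_path("data.raw", "noise-expression"), "expression")
local matched = resolve_union("x", { "number", "string" }, resolve_option, path)
local ok, message = pcall(resolve_union, true, { "number", "string" }, resolve_option, path)
print(matched)
print(message)
```

When unions nest, every level adds its own header plus all of its options' failures, which is how a single bad value can produce hundreds of lines.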

It's the docs that are wrong.

My refinements to the error display eventually pointed me at a failure on

data.raw["noise-expression"]["0_17-lakes-elevation"]
  .expression.arguments[3].arguments[1].arguments[1]
  .arguments[1].arguments[1].arguments[1]
  .arguments.points

The many nested .arguments made it difficult to narrow down where the actual error was occurring since each one is a union type, but between adding the path to the error messages and filtering out the definitely uninteresting errors for unions, I was able to determine that this specific property was causing problems: it contained a NoiseVariable, but that function's arguments.points was documented as being of type NoiseArrayConstruction (it has since been fixed to be a union that can be either type).

At that point I determined my type-checker was working well enough that it seemed to be able to find bugs in the documentation as opposed to bugs in itself. I carefully looked at the values and documentation and most of the other issues were fields that were documented as being optional under some condition in the human-readable text but not marked with the machine-readable optional flag. I decided I could express that slightly more precisely than the documentation format by adding optional_if and rest_optional_if to indicate a property may be nil if some other property is not nil, which is a common pattern.
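As a rough illustration of the optional_if idea, the check could look like the sketch below. The exact shape of the override is an assumption here: optional_if is taken to name a single sibling property, the helper name may_be_absent is hypothetical, and the filename/layers pair is only an example of the pattern. rest_optional_if is not covered.

```lua
-- Assumption for this sketch: optional_if names a single sibling property.
-- A property may be nil if it is optional outright, or if that sibling is set.
local function may_be_absent(value, prop)
  if prop.optional then
    return true
  end
  return prop.optional_if ~= nil and value[prop.optional_if] ~= nil
end

-- Example override: filename may be omitted when layers is present.
local filename_prop = { name = "filename", optional = false, optional_if = "layers" }

local absent_with_layers = may_be_absent({ layers = {} }, filename_prop)
local absent_without_layers = may_be_absent({ width = 32 }, filename_prop)
print(absent_with_layers, absent_without_layers)
```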

I put all of the adjustments I made to the documentation in my mod's data.lua and also posted a bug report to the forums reporting the issues both so they'll get fixed in the actual documentation if they are mistakes on their end and so I could get feedback to confirm I had not made mistakes myself identifying those issues.

Next steps

At this point, the code is functional. It can be used to inspect and edit data.raw. But it's a thrown-together, barely-working prototype. Some follow-ups I'd like to do:

  1. Streamline code and avoid recomputation. There are various things that are more complicated than they need to be because I wrote the code as I was figuring out how they worked. For example, the prototype subtype resolution could be simplified by caching the type hierarchy information.
  2. Update to latest version of documentation. Some of the overrides can be removed for places the documentation has been fixed in response to my bug report.
  3. Improve the API. The information is all available, but it isn't as easy to use as it could be. The example above uses the metatable-based API I will cover in a future blog post, which is more ergonomic than having to call the functions in the ReflectionLibrary mod for every operation, but it still needs some work.
