
9 November 2023 (11 minute read)
Iteratively adding types to our multi-platform codebase
At SimplyDo we maintain a relatively large and feature-rich ecosystem of products and services, especially given the size of the engineering team. We are proud of the fact that everyone on the team has the knowledge and ability to touch any part of this system. In most situations this is great: it lets us work on features and resolve issues incredibly quickly, and since there is a huge amount of shared accountability, anyone can check anyone else’s work before it is pushed to our clients.
However, there is a drawback: any change to the stack or infrastructure needs to be greenlit and understood by the entire team first. Through this we try to avoid siloing as much as possible, but as a consequence it is difficult for us to make sweeping changes or migrate to newer, shinier tech. This means we try to keep the knowledge required for the different platforms and libraries we use minimal; for example, all our frontend platforms, namely the web, mobile and Microsoft Teams applications, are built with React (Native).
While this works great and maintains an overall similarity between projects, we still failed to make them slot together into one cohesive codebase. The mobile and Teams applications joined the web application as separate projects at some point along the expansion of SimplyDo’s feature suite. For the most part they were developed by one or maybe two engineers and don’t share a lot of code aside from some copied snippets.
Also, even though TypeScript had been around for a while, we still mostly used plain JavaScript for these packages. While everyone else was seemingly adopting TypeScript as the new default, it was difficult for us to prioritise it while keeping up with the rest of our work. When the Teams application became an interesting new focus during the heavy push to remote work during Covid, it turned out its boilerplate template came with TypeScript, and this finally led us to consider adopting it more widely. After all, we had no reason not to use it, and we imagined that the type-safety would bring more confidence, fewer bugs and a better development experience through IDE hints. Unfortunately this proved slightly more difficult than we initially thought…
Let the types commence
Our team loves innovation in technology and we always try to keep our product as up to date as possible. Updating to the likes of Vite and pnpm has massively improved some of our core issues with previous tooling. We were all on board with adopting TypeScript, but with thousands of files and hundreds of thousands of lines of code this was not a change we could commit to all at once.
Doing so would have required us to invest a significant amount of time into rewriting all of these files, often with multiple components per file, in TypeScript. We have yet to fully move away from React class components, simply because of how quickly the ecosystem as a whole moves forward. There was also the risk of rewriting files with hundreds of lines of code all at once: they had been working perfectly fine, and a wholesale rewrite would potentially introduce bugs.
This meant that we slowly started migrating components to TS as we touched the files for other work. The upside was that we had time to figure out how to work with this new approach. We knew that if we migrated everything at once we would fall into all kinds of traps and anti-patterns, likely forcing us to redo a lot of the work later. By going slowly, we were able to learn and improve along the way.
Unfortunately we didn’t escape this fate completely. Since we didn’t have a precise roadmap of how and where these types would be used and stored, we slowly accumulated an ever-growing type definition file. It became very hard to maintain an overview of which types already existed, in which files they were shared, and how they affected the system overall. As some endpoints dynamically add or remove fields from the objects in our database, we often found ourselves either writing separate types in individual components or declaring fields as optional. This caused many head-scratching moments as we tried to figure out how, where and why types were located and how they fit together.
// An example of patching a user response type
type IUser = { [...] };
type IUserMe = Omit<IUser, "groups" | "organisation"> & {
  organisation: [...];
  groups: [...];
};
Something I have not mentioned yet, but which is important to add: our backend is not written in JavaScript or TypeScript; it is a Python Flask application. This means that we cannot easily share types between the frontend and backend. Any change to an API response required a change to the TS schemas, which sometimes cascaded down to other components that weren’t supposed to be affected by the change. Our Python code is also not typed; however, we do use the schema package for runtime validation of payloads. This meant we had to manually synchronise the request payloads in our TypeScript definitions with our Python schema definitions whenever we made a change to either. Adding and modifying anything became a chore. It felt like we were backing ourselves further and further into a corner while trying to maintain a sensible application.
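To illustrate the duplication with a minimal, hypothetical payload (the names below are invented for this example), the same shape had to be maintained by hand in two places at once:

// TypeScript side: payload for a hypothetical "update idea" request
type UpdateIdeaPayload = {
  name: string;
  description?: string;
};

// Python side, mirrored by hand using the schema package:
//   update_idea = Schema({"name": str, Optional("description"): str})
// Renaming or adding a field meant remembering to edit both definitions.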
Standardisation to the rescue
To combat these issues we went looking for a solution that would give us a strong source of truth for our database models, plus route-based type definitions that could be exported to both Python and TypeScript. Luckily, there is a well-maintained, widely recognised standard with a large community: the OpenAPI specification. By adopting OpenAPI we were able to escape the lock-in of our technology stack and focus on creating the specification of our schemas and types separately, mirroring the desired functionality of the product.
This is the general layout of our openapi package:
openapi
├─ index.yaml
├─ routes
│ ├─ ideas.yaml
│ ├─ users.yaml
│ └─ [...]
├─ schemas
│ ├─ assigns
│ │ ├─ ideaAuthors.yaml
│ │ └─ [...]
│ ├─ ideas.yaml
│ ├─ users.yaml
│ └─ [...]
- The main index file just imports all "schemas" and "routes". All you need to be aware of here is that "/" needs to be replaced by "~1", as per a quirk of OpenAPI’s specification language. Excerpt of the index file:

# index.yaml

components:
  schemas:
    Idea:
      $ref: './schemas/ideas.yaml'
    User:
      $ref: './schemas/users.yaml'

paths:
  /ideas/{id}:
    $ref: './routes/ideas.yaml#/~1{id}'
- The "schemas" reflect the data structure of our database as-is. This means that every file represents exactly one collection in our database, and with that all the possible types of the documents within. By creating these schemas as a base for our types, we can always be sure that we have a very well defined reflection of the data in our database. On top of this we can then build the actual API responses. An example of such a file:

# schemas/ideas.yaml

description: >
  User generated ideas
type: object
required:
  - _id
properties:
  _id:
    $ref: '../schemas/other/objectid.yaml'
  user:
    $ref: '../schemas/other/objectid.yaml'
  [...]
- The "routes" are reflective of the paths in our API. For organisational purposes we use the first element in the path of a request as the filename. For example, "/ideas/{id}" is hosted within "routes/ideas.yaml". An example of what a route might look like:

# routes/ideas.yaml

/{id}:
  get:
    tags:
      - ideas
    summary: Get idea by id
    description: |
      Get idea by id
    parameters:
      - name: id
        in: path
        description: The idea id
        required: true
        schema:
          $ref: '../schemas/other/objectid.yaml'
    responses:
      '200':
        description: OK
        content:
          application/json:
            schema:
              allOf:
                - $ref: '../schemas/ideas.yaml'
                - $ref: '../schemas/assigns/ideaAuthors.yaml'
As you can see, this particular route returns the schema of an idea almost exactly as it exists in our database. The only addition to the base model is an "assign" action; we use these in situations where multiple endpoints enrich the same data at return time. We define them as separate files that act as mixins, which lets us add information to our return data without muddling it with our strong database types.
One example of what such an assignment file may look like:
# schemas/assigns/ideaAuthors.yaml

type: object
properties:
  owner:
    properties:
      profile:
        $ref: '../../schemas/users.yaml'
In this example, we add an "owner" to the idea, whose "profile" has the type of a user object. This is useful as it lets us attach the user’s full profile instead of just the user’s ID.
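In TypeScript terms, the allOf merge behaves roughly like an intersection type. A sketch, where "Idea" and "User" stand in for the types generated from the schema files:

// Placeholders for the generated schema types (illustrative only)
type Idea = Record<string, unknown>;
type User = Record<string, unknown>;

// Roughly the combined shape the route's allOf produces:
type IdeaWithAuthors = Idea & {
  owner?: {
    profile?: User;
  };
};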
Bringing it all together
Now, a definition of the API with all its responses, plus a definition of everything in our database, is nice for documentation’s sake, but it does not actually help us in the code yet. To actually use them in TypeScript and Python requires an additional transformation:
- First we build the entire definition into a single .json file using the openapi-generator-cli. This is useful for further processing in TS and Python. We also use this to host the docs using a very simple Express server and Swagger UI (swagger-ui-dist).
- This single file is then exported as a TypeScript schema using openapi-typescript. Both steps are sketched below.
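As a rough sketch of the two steps (the paths and output names here are illustrative, not necessarily our actual layout):

# 1) Bundle the multi-file spec into a single JSON document
npx @openapitools/openapi-generator-cli generate -i openapi/index.yaml -g openapi -o build/

# 2) Generate TypeScript definitions from the bundled spec
npx openapi-typescript build/openapi.json -o schemas.d.ts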
Right away we started replacing a lot of our manual type definitions with the auto-generated ones. In most cases a simple drop-in replacement does the trick.
- import { IUser } from 'simplydo/schemas';
+ import { OpenAPI } from 'simplydo/schemas';

type IUserChipBase = {
-  user: Partial<IUser>,
+  user: Partial<OpenAPI.Schemas["User"]>,
}
For more complex situations where we need the actual return type of a specific route, we added the following helpers. They allow us to extract the “success” response body directly as a type. Generally our API returns JSON when successful and an HTTP error code otherwise, so we don’t have to worry about extracting error data.
type Paths = OpenAPI.paths;

type RawResponses<Path extends keyof Paths, Method> = Method extends keyof Paths[Path]
  ? Paths[Path][Method] extends { responses: infer Y }
    ? Y
    : never
  : never;

type Response<Path extends keyof Paths> = {
  [method in keyof Paths[Path]]: 200 extends keyof RawResponses<Path, method>
    ? RawResponses<Path, method>[200] extends { content: { 'application/json': infer Y } }
      ? Y
      : never
    : never;
};
This is further narrowed down into individual response types per method. By supplying just a path, the type automatically resolves to the correct payload.
export type GET<Path extends keyof Paths> = Response<Path> extends { get: infer R } ? R : never;

export type POST<Path extends keyof Paths> = Response<Path> extends { post: infer R } ? R : never;

export type PUT<Path extends keyof Paths> = Response<Path> extends { put: infer R } ? R : never;

export type DELETE<Path extends keyof Paths> = Response<Path> extends { delete: infer R } ? R : never;
We use this in all of those situations where our API responds with a custom payload instead of the actual database schema (which, *hint*, happens a lot). For example, to store the result of the "/ideas/{id}" call we can use this response type instead of just the base schema. This enables access to the "owner" object as assigned by the mixin we defined earlier:

const [idea, setIdea] = useState<Schemas<"Idea">>();
idea.owner // ✗ ts-error here

const [idea, setIdea] = useState<GET<"/ideas/{id}">>();
idea.owner // ✓ this is now allowed
- Finally, we also export the definition as a Python schema validator. We built a custom code generator for this, since most solutions out there create way more boilerplate code than we needed. It simply reads the tree-like structure of OpenAPI and exposes an "is_valid(payload, get_reason=False)" call to check any payload. On the Python side we can import and use our openapi package as follows:
from openapi.validation import idea_validator

is_valid = idea_validator.is_valid({
    "name": name,
})
if not is_valid:
    raise errors.APIException("Invalid request", status=400)
This approach has allowed us to slowly introduce types, and with them more confidence, into our everyday work. The only commitment we have made is that pull requests must include the corresponding OpenAPI schema changes whenever parts of the API change. Apart from that, we can now work on the important parts of the product and let the types help us, without sacrificing huge chunks of time just to introduce them everywhere. We also don’t have to think about how to structure the types, as they already exist in exactly the format that our API actually responds with.
Hopefully this has provided some insight into how OpenAPI can be an incredibly useful tool to incrementally add typing to a large codebase such as ours.