How to Maintain Clean Core APIs for Research

Building a library for research and experiments is quite different from building other types of software. A key challenge is that, in research, abstractions and APIs are rarely set in stone: users may want to propose a slight variant or modification to literally ANYWHERE in the whole program, just because they have a new idea.

In deep learning libraries, these variants can be a different implementation of a layer, a change in optimization algorithm, or a small modification to the training logic, etc.

Designing and maintaining these "research APIs" is difficult because users so frequently want to change their behavior. Such changes are often implemented by simply adding features to the target API, e.g. by adding a new flag, or by adding a new abstraction that generalizes the target API towards the user’s use case.

However, when maintaining a generic core library meant to be adopted by diverse use cases over the long term, this approach does not scale and causes many problems (discussed below).

This note lists a few principles when working with "research APIs" that should help answer:

  • How to maintain a clean set of core APIs in research libraries.
  • How library maintainers & users can collaborate to meet users’ diverse needs without complicating the core APIs.

Core does not aim to implement all use cases

Researchers' job is about doing things in new ways. Hence their needs are so diverse that a core library should not aim to include or implement features for all possible use cases. The library should aim to only include the most popular and standardized features (more on the criteria later).

Core should allow most features to be implemented out-of-core

For features not included in the core, ideally there should be a way for users to implement them out-of-core as extensions, without too much overhead / repetition.

This requires continuous design evolution to make the core more modular and composable, so that core code can be reused in users’ new implementations.

A good sanity check for library maintainers is to ask the following question:

For any feature currently in the core library, suppose we remove it today, how much effort would it take for users to reimplement it out-of-core?

A well-designed library should be decoupled such that most of its features are just extensions of itself, and they can be implemented out-of-core the same way as they are in the core.
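One way to picture this sanity check is a core that exposes small building blocks, where a "feature" is nothing more than a composition of them. The sketch below is purely illustrative: `scale`, `clip`, `scale_then_clip`, and `clip_then_scale` are hypothetical names, not part of any real library.

```python
from typing import List

# --- "core" building blocks (hypothetical names) ---
def scale(xs: List[float], factor: float) -> List[float]:
    return [x * factor for x in xs]

def clip(xs: List[float], lo: float, hi: float) -> List[float]:
    return [min(max(x, lo), hi) for x in xs]

# --- a feature that ships in core, built only from the blocks above ---
def scale_then_clip(xs: List[float], factor: float, lo: float, hi: float) -> List[float]:
    return clip(scale(xs, factor), lo, hi)

# --- a user's out-of-core variant, written the same way ---
def clip_then_scale(xs: List[float], factor: float, lo: float, hi: float) -> List[float]:
    return scale(clip(xs, lo, hi), factor)
```

If `scale_then_clip` were removed from core, a user could rebuild it in one line, which is the property the question above is probing for.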

Criteria for feature inclusion

There are 3 criteria for feature inclusion in core, ordered by their importance.

  • Popularity: Whether the feature is used by many users
  • Standardization: Whether the feature’s API is standardized/agreed among its users
  • Simplicity: Whether the feature is simple to implement

To understand the criteria more, let’s ask: what if the feature is —

Popular but not standardized: sometimes a feature is popular, but its users don’t yet agree on the proper parameterization, its API, or the subtle implementation details. Including such features is risky, as it may create unclear semantics or impede standardization in the future. It’s still OK to include such a feature if it’s very popular (popularity is the most important criterion), but try to do it in a composable way and with warning signs.

As a negative example, "Transformer" is a popular but not standardized feature. It's included in PyTorch, but it received many complaints, and many projects (e.g. fairseq, DETR) eventually had to fork and reimplement their own Transformer.

Simple but not popular/standardized: Simplicity alone is not sufficient for inclusion, no matter how simple the feature is. If everyone adds a simple feature they need, the result is complex.

Popular, standardized but not simple: Simplicity is only the third most important factor. If something is complex but very popular & standardized (e.g. BatchNorm, a headache for DL library developers), it should be included. In fact, this is where a library can provide a lot of value to users.

Concern of New Arguments

When a user wants to change the behavior of a "research API" def func() defined in core, adding new arguments is often the quickest way to get things done. But it may introduce a number of maintenance problems.

Simple Flags / Options

New flag:

    def func(flag=False):
        ...
        if flag:
            ...  # variant here
        ...

New argument:

    def func(multiplier_of_x=1.0):
        ...
        x = x * multiplier_of_x
        ...

Adding a simple argument to control the behavior like above is OK, if we think that the new option is very clear and popular. But as a "research API", many users will want to add their own customizations. This could lead to the following problems:

  • Poor code health: The library may gradually accumulate too many features that are:

    • Hard to read due to branching (as there are too many flags). Ideally, readers should not pay too much extra mental overhead for logic they don’t care about
    • Hard to maintain because the contextual knowledge about them is distributed among different developers
  • Confusing behaviors: More and more features added over time may not interact with each other in a clear way, causing confusing or silently wrong behaviors

    • E.g. featureA becomes a no-op when featureB is enabled
    • E.g. featureA and featureB are conflicting / overlapping in semantics
    • E.g. featureA’s semantics becomes undefined when featureB is enabled
  • "More general" may mean "less general": A common argument for adding options like this, is that it doesn't change existing behavior and "makes the function more general".

    However, keep in mind that when a function becomes more general in one aspect, it often becomes less general in others. Generalizing in one direction may not be a net win, because research code has too many possible directions to generalize in, and picking one direction may rule out picking others in the future. We will show what this means shortly.
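As an illustration of the "confusing behaviors" problem, here is a minimal sketch of a function that picked up two flags over time. `normalize`, `center`, and `unit_range` are hypothetical names chosen for this example only.

```python
def normalize(xs, center=False, unit_range=False):
    """Hypothetical core function that accumulated two flags over time."""
    if unit_range:
        lo, hi = min(xs), max(xs)
        # This path returns early, so `center` is silently ignored here.
        return [(x - lo) / (hi - lo) for x in xs]
    if center:
        mean = sum(xs) / len(xs)
        return [x - mean for x in xs]
    return list(xs)


data = [1.0, 2.0, 3.0]
both_flags = normalize(data, center=True, unit_range=True)
only_unit_range = normalize(data, unit_range=True)
# `center=True` made no difference once `unit_range=True` was set.
same_result = both_flags == only_unit_range
```

Each flag looked harmless when it was added; the no-op interaction only appears when a caller combines them, and nothing warns that caller.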

Callbacks

New behaviors can also be encapsulated inside an argument:

Inject custom behaviors through callbacks:

    def func(callback):
        ...
        callback(x)
        ...

Use object.method as callbacks:

    def func(obj):
        ...
        obj.do(x)
        ...

This appears useful, since the custom logic is not implemented in core, but in a user-provided callback. For example, given the original code below, a researcher who wants to compute y differently may propose a compute_y_fn argument, as in the callback version that follows it.

Original:

    def func():
        a = something ...
        x = something ...
        y = x.norm()
        z = ...
        ...

With callbacks:

    def func(compute_y_fn=lambda x: x.norm()):
        a = something ...
        x = something ...
        y = compute_y_fn(x)
        z = ...
        ...

However, this design may be even more problematic:

  • Premature abstractions: Assumptions/constraints are implicitly created about where the callback is triggered, what arguments it needs and what it returns. These assumptions may be bad.

    For example, a 2nd researcher may want to compute y using both x and a; a 3rd researcher may want to compute y, z in one function compute_y_z_fn because it's more efficient. These variants conflict with the 1st researcher's design.

    In the future, after seeing enough use cases, we might realize that xyz = compute_xyz(a) is a truly good abstraction. However, by then the premature abstraction of compute_y_fn will get in the way of implementing compute_xyz. In other words, although the current design makes the computation of y "more general", the abstraction limits our ability to generalize the function in other ways. That's why we said earlier that "more general" may mean "less general".

  • Obscure logic: readers can't easily figure out what this function does: they need to look at the caller of this function to see which callback is supplied, and then look at the implementation of the callback function. The aforementioned issue of "confusing behaviors" also applies here.
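The premature-abstraction problem can be made concrete with a small runnable sketch. The values of `a` and `x`, the helper `l2_norm`, and the constant `A_COPY` are stand-ins invented for this example; only `func` and `compute_y_fn` come from the text above.

```python
import math


def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))


def func(compute_y_fn=l2_norm):
    a = 2.0              # stand-in for "a = something"
    x = [3.0, 4.0]       # stand-in for "x = something"
    y = compute_y_fn(x)  # implicit contract: receives x only, returns y only
    z = y + a
    return y, z


# 1st researcher: the variant fits the contract.
y1, z1 = func(compute_y_fn=lambda x: max(abs(v) for v in x))

# 2nd researcher wants y to depend on both x and a. The callback never
# receives `a`, so the workaround is to copy its value out of func():
A_COPY = 2.0  # must be kept in sync with func() by hand
y2, z2 = func(compute_y_fn=lambda x: l2_norm(x) / A_COPY)
```

The second variant works, but only by duplicating an internal detail of `func()`. Widening the callback signature to `compute_y_fn(x, a)` would instead break the first researcher's code.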

Sometimes callbacks are good and useful abstractions. But because they are so powerful, I have frequently seen them abused to alter a behavior into something that's strongly overfitted to a small number of use cases. In code reviews, I usually frown upon APIs that require callbacks/user-defined functions.

Prefer forks over new arguments

To customize a "research API" def func() defined in core, we have the following options:

  1. Out-of-core, e.g. a def func_v2() in user code. (Or a class ClassV2 for classes).
  2. In-core, but keep existing APIs unaffected, e.g. a def func_v2() in core.
  3. In-core, and change existing APIs, e.g. a new option in def func(option).

The best choice is heavily subjective and should be evaluated case-by-case. Due to the concern of new arguments, in general we recommend methods (1) and (2), i.e. prefer forking func() over changing func().

  • If a fork will create significant code duplication, choose (2) and try to reduce duplication with private abstractions (see next section).
  • Adding flags / simple args is acceptable for simple, clean, popular additions.
  • Adding callbacks / new abstractions requires scrutiny, and should come with more than a handful of use cases in mind.

This also echoes the Flax design philosophy, which says "prefer duplication over adding options / bad abstractions".
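A minimal sketch of option (1), assuming a toy `func` that normalizes a list so it sums to one. The function bodies and the `eps` parameter are hypothetical; the point is only that the variant lives in user code and `func()` keeps its signature.

```python
# --- core (unchanged) ---
def func(xs):
    total = sum(xs)
    return [x / total for x in xs]


# --- user code: a fork, rather than a new `eps` argument on func() ---
def func_v2(xs, eps=1e-8):
    total = sum(xs) + eps  # the variant: guard against an all-zero input
    return [x / total for x in xs]
```

The fork duplicates two lines, but readers of `func()` never have to think about `eps`, and the user is free to change `func_v2()` again without a core review.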

Accept duplication, but aim to reduce it later

Users/developers may find that the core design is not good enough yet, and recreating a variant of func() without touching it may lead to too much code duplication. For example, ... is duplicated between the two functions below.

Existing API in core:

    def func():
        ...
        ...

New variant:

    def func_v2():
        ...
        # New custom logic in between.
        ...

Such duplication is acceptable in the short term. We do NOT mean to encourage users to heavily fork core code. Instead, users and core developers should engage with each other and aim to evolve the core design to reduce duplication. But design changes take time, and duplication is preferred until a good design is found.

How to reduce duplication

The lowest-risk way to reduce duplication is to move the duplicated parts into shared reusable code:

Existing API in core:

    def func():
        ...  # fewer duplications than before
        _reusable_parts()
        ...

New variant:

    def func_v2():
        ...  # fewer duplications than before
        _reusable_parts()
        # Variant logic inserted here
        ...

This should be the preferred way to reduce duplications. The benefits are:

  • No change to the API of func(), hence little risk.
  • Create reusable sub-routines that may be useful to new use cases.
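As a concrete, hypothetical instance of this refactoring, the sketch below pulls a shared computation into a private helper. `_weighted_sum` plays the role of `_reusable_parts()`, and the function bodies are invented for illustration.

```python
def _weighted_sum(xs, ws):
    """Private helper shared by both variants (hypothetical)."""
    return sum(x * w for x, w in zip(xs, ws))


def func(xs, ws):
    # Public API: signature and behavior are unchanged by the refactoring.
    return _weighted_sum(xs, ws) / sum(ws)


def func_v2(xs, ws):
    s = _weighted_sum(xs, ws)
    s = max(s, 0.0)  # variant logic inserted here
    return s / sum(ws)
```

Because the helper is private, it can later be renamed or reshaped as more variants appear, without any change visible to callers of `func()`.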

However, there are also challenges:

  • This adds a new API (_reusable_parts()) to maintain.
  • Sometimes it's difficult to identify a clean & reusable subset that can be easily split from the duplicated code. It may require a small refactoring to expose a clean subset. Also, remember that the approach that removes the most duplication might not be the one with the best abstraction.

The above challenges are less significant if _reusable_parts() is private. Therefore:

  • If func_v2() is in core, make _reusable_parts() private.
  • If func_v2() must be out-of-core, consider _reusable_parts() as "internal/experimental APIs".

Inheritance, e.g. class ModuleV2(ModuleCore), may also reduce duplication between two variants. However, this is generally less preferable than composition like above. The reason is similar to why callbacks are not preferred: overriding methods is like passing callbacks. Both are user-defined functions and suffer from the same limitation: users are constrained by assumptions about when/where/how the methods/callbacks are triggered.
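The contrast between overriding and composing can be sketched as follows. `ModuleCore` and `ModuleV2` follow the naming in the text; `ModuleV3`, `shift_to_zero`, and all method bodies are hypothetical.

```python
def shift_to_zero(x):
    """Reusable building block (hypothetical)."""
    lo = min(x)
    return [v - lo for v in x]


class ModuleCore:
    def __call__(self, x):
        h = shift_to_zero(x)
        return self.compute(h)  # the only place a subclass can hook in

    def compute(self, h):
        return sum(h)


class ModuleV2(ModuleCore):
    # Inheritance: fine while the variant fits the hook, but compute()
    # never sees the raw input x, just like a callback would not.
    def compute(self, h):
        return max(h)


class ModuleV3:
    # Composition: reuse the building block and own the control flow.
    def __call__(self, x):
        h = shift_to_zero(x)
        return max(h) + min(x)  # free to use both h and x
```

`ModuleV3` could not be written as an override of `compute()` without first changing `ModuleCore`, which is the same constraint the callback example ran into.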

Prefer branching at a shallower code path

We generally prefer adding a new implementation over adding new conditional branches to the existing implementation, but branches will probably happen somewhere anyway: after all, the new feature variant likely ends up as a new option/argument in the end users' config.

If branching has to happen, we prefer it at an earlier, shallower code path:

Branch earlier:

    class Module():
        def __call__(self, flag):
            ...
            if flag:
                func()
            else:
                func_v2()
            ...

Branch later:

    def func(flag):
        ...
        if flag:
            ...

    class Module():
        def __call__(self, flag):
            ...
            func(flag)
            ...

By branching earlier, we keep a clean func() unaffected by the new variant. This recommendation is consistent with the preference to fork func_v2(), not to add flag to func().
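A runnable sketch of branching early, assuming toy bodies for `func` and `func_v2` and a hypothetical `use_v2` constructor option standing in for the end user's config:

```python
def func(x):
    return x * 2  # existing behavior, untouched by the variant


def func_v2(x):
    return x * 2 + 1  # the variant lives in its own function


class Module:
    def __init__(self, use_v2=False):
        # Branch once, at the shallow layer that reads the user's config.
        self._fn = func_v2 if use_v2 else func

    def __call__(self, x):
        return self._fn(x)
```

For instance, `Module()(3)` calls `func` and `Module(use_v2=True)(3)` calls `func_v2`; neither function contains a conditional for the other's behavior.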
