How to Maintain Clean Core APIs for Research

Building a library for research and experiments is quite different from building other types of software. A key challenge is that, in research, abstractions and APIs are rarely set in stone: users may want to propose a slight variant or modification to literally ANYWHERE in the whole program, just because they have a new idea.

In deep learning libraries, such a variant might be a different implementation of a layer, a change to the optimization algorithm, or a small modification to the training logic.

When users want to make such changes, they often implement variants by simply adding features to the target API they want to modify, e.g. by adding a new flag to the API plus some control statements, or by adding a new abstraction that generalizes the target API towards the users’ use case.

However, when maintaining a generic core library that is meant to be adopted by diverse use cases over the long term, this approach does not scale and poses many problems (discussed below).

This document lists a few principles about:

  • How to maintain a clean set of core APIs in research libraries.
  • How library maintainers & users can work together to achieve users’ diverse needs without complicating the core APIs.

Core does not aim to implement all use cases

A researcher's job is to do things in new ways. Their needs are therefore so diverse that a core library should not aim to include or implement features for all possible use cases. The library should aim to include only the most popular and standardized features (see the criteria below).

Core should allow most features to be implemented out-of-core

For features not included in the core, ideally there should be a way for users to implement them out-of-core as extensions, without too much overhead / repetition.

This requires continuous design evolution to make the core more modular and composable, so that core code can be reused in users' new implementations.

A good sanity check for library maintainers is to ask: for any feature currently in the core library, if we removed it today, how much effort would it take for users to reimplement it out-of-core? A well-designed library should be decoupled so that most of its features are just extensions of itself, and could be implemented out-of-core in the same way they are implemented in core.
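As a minimal sketch of this sanity check, assume a hypothetical core that exposes two small building blocks, `clip_by_norm()` and `sgd_update()` (illustrative names, not from any real library). Because the pieces are decoupled, a feature such as "SGD with gradient clipping" can be written out-of-core in one line, exactly as it would be written in core.

```python
import math
from typing import List

# --- Hypothetical core building blocks (illustrative names, not a real library) ---

def clip_by_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale grads so that their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * (max_norm / norm) for g in grads]


def sgd_update(params: List[float], grads: List[float], lr: float) -> List[float]:
    """Apply one plain SGD step: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]


# --- A feature written out-of-core, using only the public building blocks ---

def clipped_sgd_update(
    params: List[float], grads: List[float], lr: float, max_norm: float
) -> List[float]:
    """SGD with gradient-norm clipping, composed from the core pieces."""
    return sgd_update(params, clip_by_norm(grads, max_norm), lr)
```

If `clipped_sgd_update()` were removed from a core that exposes these pieces, users could restore it with almost no effort, which is the kind of decoupling the sanity check looks for.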

Criteria for feature inclusion

There are 3 criteria for feature inclusion in core, ordered by their importance.

  • Popularity: Whether the feature is used by many users
  • Standardization: Whether the feature’s API is standardized/agreed among its users
  • Simplicity: Whether the feature is simple to implement

To understand the criteria better, consider what happens when a feature is:

Popular but not standardized: sometimes a feature is popular, but its users don't yet agree on the proper parameterization, its API, or the subtle implementation details. Including such features is risky, as it may create unclear semantics or impede their standardization in the future. It's still OK to include one if it's very popular (popularity is the most important criterion), but try to do it in a composable way and with warning signs.

As a negative example, "Transformer" is a popular but not standardized feature. It's included in PyTorch but has received many complaints, and many projects (e.g. fairseq, DETR) eventually had to fork and reimplement their own Transformer.

Simple but not popular/standardized: simplicity alone is not sufficient for inclusion, no matter how simple the feature is, because if everyone adds the simple features they need, together they become complex.

Popular, standardized but not simple: simplicity is the third most important criterion. If something is complex but very popular and standardized (e.g. BatchNorm, a well-known headache for DL library developers), it should be included. In fact, this is where a library can provide a lot of value to users.

Prefer new functions/classes over complicating existing APIs

Suppose a user wants to change the behavior of a function def func() defined in core. Based on an assessment of the above three criteria, the new behavior may be implemented in one of the following ways:

  1. Out-of-core, e.g. a def func_v2() in user code. (Or a class ClassV2 for classes).
  2. In-core, but keep existing APIs unaffected, e.g. a def func_v2() in core.
  3. In-core, and change existing APIs, e.g. a new option in def func(option).

We recommend that methods (1) and (2), i.e. adding a separate implementation func_v2(), should generally be preferred over (3).
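As a sketch of the difference, assume a hypothetical core function `layer_norm()` (a simplified layer normalization without learnable parameters) and a user who wants an RMSNorm-style variant that skips mean-centering. Methods (1)/(2) add `rms_norm()` as a separate function; method (3) would instead add a hypothetical `center` option to the existing API. All names are illustrative.

```python
import math
from typing import List


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Existing core API (simplified, no learnable scale/shift):
    subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


# Methods (1)/(2): the variant is a separate function; layer_norm() is untouched.
def rms_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """RMSNorm-style variant: skip mean-centering, divide by the root mean square."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


# Method (3), for contrast: the same variant as an option on the existing API.
# Every reader and maintainer of this function now has two code paths to track.
def layer_norm_with_option(
    x: List[float], eps: float = 1e-5, center: bool = True
) -> List[float]:
    mean = sum(x) / len(x) if center else 0.0
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]
```

Both designs compute the same numbers; the difference is that with (1)/(2) the existing `layer_norm()` and its callers are never touched.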

For features to be included in core, adding them like (3) is often a quick way to get the job done, but could lead to long-term issues. To show why, let’s look at the two typical ways options are added:

  1. New flags / arguments that control the behavior:

    New flag:
        from functools import partial

        def func_core(flag=False):
            ...
            if flag:
                ...  # Variant here
            ...

        func_v2 = partial(func_core, flag=True)

    Other new argument:
        def func_core(x, multiplier_of_x=1.0):
            ...
            x = x * multiplier_of_x
            ...

    This is OK if we determine the new option is very clear and popular. But be aware of the potential problems:

    • Poor code health: The core may gradually accumulate too many features that are:
      • Hard to read due to branching. Ideally, readers should not have to pay extra mental overhead for logic they don't care about
      • Hard to maintain because knowledge about them is distributed among different developers
      • Inconsistent in style/convention due to distributed responsibility
    • Confusing behaviors: Features added over time may not interact with each other in a clear way, causing confusing or silently wrong behaviors
      • E.g. featureA becomes a no-op when featureB is enabled
      • E.g. featureA and featureB are conflicting / overlapping in semantics
      • E.g. featureA’s semantics becomes undefined when featureB is enabled
  2. New logic encapsulated in new abstractions

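    Adding an object to control behavior:
        def func_core(obj):
            ...  # delegates the variant logic to the user-provided obj

    Adding a callback to control behavior:
        def func_core(callback):
            ...  # delegates the variant logic to the user-provided callback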

    This may appear nice, since the variant logic is not implemented in core, but in a user-provided obj or callback. However, it’s very easy to create premature abstractions this way.

    For example, the callback-based interface needs to make assumptions/constraints on where the callback is triggered, what arguments it needs and what it returns. A single use case may not be sufficient to make good assumptions on them.

    Sometimes callbacks are good and useful abstractions, but they are often abused to alter the behavior of existing code in a way that is strongly overfitted to a small number of use cases. In code reviews, I often frown upon APIs that contain callbacks/user-defined functions.

    Other than the potentially premature abstraction, the extra redirection caused by the new abstraction also makes code harder to read and maintain.
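To make the premature-abstraction risk concrete, here is a minimal sketch assuming a hypothetical core `train_loop()` whose callback contract, `(step, loss) -> None` called after each step, was designed around a single use case (logging). A second use case, decaying the learning rate, does not fit the assumed contract. All names are illustrative.

```python
from typing import Callable, List, Optional, Tuple


def train_loop(
    losses: List[float],
    lr: float,
    on_step_end: Optional[Callable[[int, float], None]] = None,
) -> List[float]:
    """Hypothetical core loop. The callback contract, (step, loss) -> None,
    was designed around one use case: logging.
    Returns the learning rate used at each step."""
    lrs_used = []
    for step, loss in enumerate(losses):
        # A real loop would compute `loss` here and update parameters with `lr`.
        lrs_used.append(lr)
        if on_step_end is not None:
            on_step_end(step, loss)
    return lrs_used


# Use case 1, the one the interface was designed for: logging works.
log: List[Tuple[int, float]] = []
train_loop([0.9, 0.5, 0.3], lr=0.1,
           on_step_end=lambda step, loss: log.append((step, loss)))


# Use case 2: decay the learning rate after each step.
def decay_lr(step: int, loss: float) -> None:
    # There is no `lr` to modify here, and any return value would be ignored,
    # so this callback cannot express the new behavior.
    return None


lrs = train_loop([0.9, 0.5, 0.3], lr=0.1, on_step_end=decay_lr)
```

Supporting the second use case would require changing the callback's arguments or return contract, which likely breaks existing callbacks; one use case was not enough to pick a good interface.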

Therefore, the recommendation is, for variants to be added in core:

  • In general, prefer creating new functions/classes over adding features to existing APIs, unless the repetition is too significant (the next section discusses ways to reduce it)
  • Adding flags/args is acceptable for simple, clean additions.
  • Adding a new abstraction requires scrutiny, and should come with more than a handful of use cases in mind.
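The "featureA becomes a no-op when featureB is enabled" problem can be sketched with a hypothetical `preprocess()` that accumulated two independently added options, `scale` and `standardize`. Standardizing after scaling silently cancels any positive `scale`. The alternative splits the behaviors into single-purpose functions. All names are illustrative.

```python
import statistics
from typing import List


def preprocess(
    xs: List[float], scale: float = 1.0, standardize: bool = False
) -> List[float]:
    """Hypothetical core API that accumulated two flags added by different people."""
    ys = [x * scale for x in xs]
    if standardize:
        # Added later: re-centers and rescales, so any positive `scale`
        # above has no effect on the output. Nothing warns the caller.
        mu = statistics.mean(ys)
        sd = statistics.pstdev(ys)
        ys = [(y - mu) / sd for y in ys]
    return ys


# Alternative: small single-purpose functions. The caller composes them,
# so the order of operations and its consequences are visible at the call site.
def scale_by(xs: List[float], scale: float) -> List[float]:
    return [x * scale for x in xs]


def standardize(xs: List[float]) -> List[float]:
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]
```

With the separate functions, `standardize(scale_by(xs, 10.0))` makes it obvious that the scaling is undone; with the flags, a caller passing `scale=10.0, standardize=True` gets no hint.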

Accept duplication, but aim to reduce it later

Users/developers may find that the core design is not yet good enough, and that implementing a variant of func_core() without touching it leads to too much code duplication. For example, ... is duplicated between the two functions below.

Existing API in core:
    def func_core():
        ...
New variant:
    def func_v2():
        ...
        # Variant logic inserted here.
        ...

Such duplication is acceptable in the short term. This also echoes the Flax philosophy: "prefer duplication over adding options / bad abstractions".

We do NOT mean to encourage users to heavily fork core code. Instead, users and core developers should engage and aim to evolve the core design to reduce duplication — but design change takes time to happen, and duplication is preferred before a good design is found.
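For illustration only, here is what such short-term duplication might look like for a hypothetical training step. `train_step_core()`, `train_step_v2()`, `wd` (weight decay) and `max_abs` (a clipping bound) are invented names; the variant adds per-entry gradient clipping without touching the core function.

```python
from typing import List


def train_step_core(
    params: List[float], grads: List[float], lr: float, wd: float = 0.01
) -> List[float]:
    """Hypothetical existing core API: weight decay, then an SGD update."""
    grads = [g + wd * p for p, g in zip(params, grads)]  # duplicated below
    return [p - lr * g for p, g in zip(params, grads)]   # duplicated below


def train_step_v2(
    params: List[float], grads: List[float], lr: float,
    wd: float = 0.01, max_abs: float = 1.0,
) -> List[float]:
    """New variant, written without touching train_step_core()."""
    grads = [g + wd * p for p, g in zip(params, grads)]  # duplicated
    # Variant logic inserted here: clip each gradient entry to [-max_abs, max_abs].
    grads = [max(-max_abs, min(max_abs, g)) for g in grads]
    return [p - lr * g for p, g in zip(params, grads)]   # duplicated
```

The duplicated lines are small here; in real code the shared part is usually much larger, which is what eventually motivates reducing it.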

How to reduce duplication

The most risk-free way to reduce duplication is to move it into shared, reusable code:

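Shared code:
    def _reusable_parts():
        ...  # code previously duplicated in func_core() and func_v2()
Existing API in core:
    def func_core():
        _reusable_parts()
        ...  # fewer duplications than before
New variant:
    def func_v2():
        _reusable_parts()
        ...  # fewer duplications than before
        # Variant logic inserted here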

This should be the preferred way to reduce duplication. The benefits are:

  • No change to the API of func_core(), hence little risk.
  • Creates reusable subroutines that may be useful for new use cases.

However, there are also challenges:

  • This adds a new API (_reusable_parts()) to maintain.
  • Sometimes it's difficult to identify a clean, reusable subset that can be easily split from the duplicated code. It may require a small refactoring to expose a clean subset. Also, remember that the approach that removes the most duplication might not be the one with the best abstraction.

The above challenges are less significant if _reusable_parts() is private. Therefore:

  • If func_v2() is in core, make _reusable_parts() private.
  • If func_v2() is out-of-core, consider _reusable_parts() as "internal/experimental APIs".
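A minimal sketch of this refactoring, assuming a hypothetical core `train_step_core()` (weight decay followed by an SGD update) and an in-core variant `train_step_v2()` that adds per-entry gradient clipping. The two private helpers stand in for `_reusable_parts()`; all names are illustrative.

```python
from typing import List


# Private shared pieces, playing the role of _reusable_parts() in the text.
def _add_weight_decay(params: List[float], grads: List[float], wd: float) -> List[float]:
    return [g + wd * p for p, g in zip(params, grads)]


def _sgd_update(params: List[float], grads: List[float], lr: float) -> List[float]:
    return [p - lr * g for p, g in zip(params, grads)]


def train_step_core(
    params: List[float], grads: List[float], lr: float, wd: float = 0.01
) -> List[float]:
    """Public API and behavior unchanged; the body now delegates to helpers."""
    grads = _add_weight_decay(params, grads, wd)
    return _sgd_update(params, grads, lr)


def train_step_v2(
    params: List[float], grads: List[float], lr: float,
    wd: float = 0.01, max_abs: float = 1.0,
) -> List[float]:
    """In-core variant: only the clipping line is new; the rest is reused."""
    grads = _add_weight_decay(params, grads, wd)
    grads = [max(-max_abs, min(max_abs, g)) for g in grads]  # variant logic
    return _sgd_update(params, grads, lr)
```

Because both variants live in core, the helpers stay private (leading underscore), so they can be reshaped later without breaking users.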

Inheritance, e.g. class ModuleV2(ModuleCore), may also reduce duplication between two variants. However, it is generally less preferable than composition as shown above. The reason is similar to why callbacks are not preferred: overriding methods is like passing callbacks. Both are user-defined functions and suffer from the same limitation: users are constrained by the assumptions about when, where, and how the methods/callbacks are triggered.
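The constraint can be sketched with a hypothetical `ModuleCore` whose only override point is a `preprocess()` hook, called once on the input. A variant that needs to change the output gains nothing from the hook and ends up overriding `__call__` anyway, while a composed function simply reuses the shared piece. All names are illustrative.

```python
def _main(x: float) -> float:
    """Shared computation (hypothetical)."""
    return 2.0 * x


class ModuleCore:
    """Hypothetical core module with one override point, preprocess(),
    which is called exactly once, on the input, before _main()."""

    def __call__(self, x: float) -> float:
        return _main(self.preprocess(x))

    def preprocess(self, x: float) -> float:
        return x


class ModuleV2(ModuleCore):
    """Variant that must modify the *output*. The hook only sees the input,
    so the subclass overrides __call__ and re-implements its body anyway."""

    def __call__(self, x: float) -> float:
        return _main(self.preprocess(x)) + 1.0


# Composition: the variant calls the shared piece and places its own logic
# wherever it needs to, without relying on where a hook is triggered.
def module_v2(x: float) -> float:
    return _main(x) + 1.0
```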

Prefer branching at shallower code path

We generally prefer adding a new implementation over adding new conditional branches to the existing implementation, but branches will probably happen somewhere anyway: after all, the new feature variant probably ends up as a new option/argument in the end users' config.

If branching has to happen, we prefer it at earlier, shallower code path:

Branch earlier:
    class Module():
        def __call__(self, flag):
            if flag:
                func_v2()
            else:
                func_core()
Branch later:
    def func_core(flag):
        ...
        if flag:
            ...  # variant logic
        ...
    class Module():
        def __call__(self, flag):
            func_core(flag)

By branching earlier, we keep func_core() clean and unaffected by the new variant. This recommendation is a natural consequence of our preference for a new implementation func_v2() over adding a flag to func_core().
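A minimal sketch of branching early, assuming a hypothetical boolean config option `use_v2`. As one variation, it moves the branch even earlier than `__call__`: the decision is made once when `Module` is constructed, and neither `func_core()` nor the call path ever sees a flag. All names are illustrative.

```python
from typing import Callable


def func_core(x: float) -> float:
    """Existing core implementation; contains no trace of the variant."""
    return 2.0 * x


def func_v2(x: float) -> float:
    """New variant, implemented separately (in core or out-of-core)."""
    return 2.0 * x + 1.0


class Module:
    def __init__(self, use_v2: bool = False):
        # Branch once, at the shallowest point: when the module is built
        # from the user's config.
        self._fn: Callable[[float], float] = func_v2 if use_v2 else func_core

    def __call__(self, x: float) -> float:
        # The call path itself has no conditionals.
        return self._fn(x)
```

Usage: `Module(use_v2=cfg_value)` is the only place the option is read, so removing or replacing the variant later touches a single line.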