Safe Static Initialization, No Destruction

After joining Google Brain, I brought PyTorch to Google's internal infrastructure and owned its maintenance. Google is well known as a "tech island" where almost everything works differently from the outside world, and that creates many challenges when building a massive library like PyTorch.

Among those challenges were a few tricky bugs related to the static initialization order fiasco (SIOF) and to the destruction of static objects. I was forced to learn far more about these topics than I'd have liked, so it's worth writing the details down before I forget them.

Terminology

"Static initialization" is an ambiguous term because "static" is very overloaded in C++. In our context, it is supposed to mean "initialization of objects that have static storage duration", i.e. objects that live through the lifetime of a program. The word "static" actually talks about the object lifetime, not about initialization.

Meanwhile, initialization of such objects can have two steps:

  • Zero/Constant initialization (it's also confusingly called "static initialization" 😱): allocate memory and fill it with zero or constant bytes. For objects with static storage duration, the values are determined at compile time.
  • Dynamic initialization: call the object's constructor (if it is non-trivial and not constexpr).
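To make the two steps concrete, here is a minimal sketch. All names in it (Point, kOrigin, zero_counter, compute_answer, dynamic_answer) are made up for illustration and do not come from PyTorch.

```cpp
// Hypothetical type with a constexpr constructor.
struct Point {
  int x;
  int y;
  constexpr Point(int x_in, int y_in) : x(x_in), y(y_in) {}
};

// Constant initialization: the constructor is constexpr and its arguments are
// constants, so the bytes of kOrigin are fixed at compile time.
constexpr Point kOrigin{1, 2};
static_assert(kOrigin.x == 1 && kOrigin.y == 2, "value is known at compile time");

// Zero initialization: there is no initializer, so the storage starts as zero.
int zero_counter;

// Not constexpr, so a call to it is not a constant expression.
int compute_answer() { return 42; }

// Dynamic initialization: formally, compute_answer() has to run at startup.
// (A compiler is allowed to precompute the value as an optimization.)
int dynamic_answer = compute_answer();
```

Only dynamic_answer is exposed to initialization-order problems, because it is the only object here whose initialization requires running code at startup.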

Objects with static storage duration fall into the following two categories, based on when their "dynamic initialization" happens:

  • Global: these objects are dynamically initialized at program launch (before main()).
    Object a1;
    static Object a2;
    class A {
      static Object a3;  // declaration; the definition lives in one translation unit
    };
  • Function-local static:
    void func() {
      static Object a4;
    }
    These objects are dynamically initialized the first time control reaches their declaration. Since C++11, they are guaranteed to be initialized exactly once, even when multiple threads reach the declaration concurrently.
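The "first time it's reached" behavior can be observed with a small sketch. The names construction_count, Tracker, and get_tracker are hypothetical and exist only for this illustration.

```cpp
// Counts how many times the Tracker constructor has run.
int construction_count = 0;

struct Tracker {
  Tracker() { ++construction_count; }
};

Tracker& get_tracker() {
  // Constructed the first time control reaches this line, and never again.
  static Tracker tracker;
  return tracker;
}
```

Before the first call to get_tracker(), construction_count is still 0. After any number of calls it is 1, and every call returns a reference to the same object.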

Static Initialization Order Fiasco

SIOF typically refers to the problem that the dynamic initialization order of objects defined in different translation units is unspecified, e.g.:

// a.cpp:
Object a;
// b.cpp:
AnotherObject b;

If a and b have non-trivial constructors, and the constructor of b somehow needs to access a, the program may crash or behave unexpectedly because a may be initialized after b.

PyTorch heavily uses registrations, which all have static storage duration. A few SIOF bugs were found when I tried to build PyTorch in Google. As an example, when an ATen operator has many overloads, initialization order affects which overload is called, because an overload that's initialized earlier will be preferred over those initialized later.

Standard ways to avoid SIOF problems are:

  • Avoid dynamic initialization: change the object's type to something that can be zero/constant-initialized. totw/140 shows a few examples of how to replace std::string with non-dynamic counterparts.

  • Use well-defined initialization order: there is a guarantee that objects within the same translation unit are dynamically initialized according to the well-defined program order. So we can sometimes just move code into the same translation unit. In another PyTorch bug where one global depends on another, I simply merged two files so that their constructors are properly sequenced.

  • Construct on first use: it's often not practical to merge files. A better solution is the "construct on first use" idiom:

    ❌ Don't use globals:
    Object a;

    ✅ Use function-local static:
    Object& get_a() {
      static Object* a = new Object();
      return *a;
    }
    // or:
    Object& get_a() {
      static Object a;
      return a;
    }

    By doing this, anyone that needs to access a will have to call get_a(). Because function-local static is guaranteed to initialize on first use, we can rest assured that a will not be used before initialization.

    The "construct on first use" idiom may look different in practice, because sometimes we don't need to use a directly but do need to observe the side effects of its constructor. In such cases we just manually call get_a to make sure a is constructed. I used this to fix another PyTorch bug.
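As a rough sketch of how the idiom applies to registrations, consider the toy registry below. This is not PyTorch's actual registration code; registry, Registrar, register_add, and register_mul are hypothetical names chosen for the example.

```cpp
#include <string>
#include <vector>

// Construct on first use: the registry is created the first time anyone asks
// for it, so it is always ready before the first registration touches it.
std::vector<std::string>& registry() {
  static std::vector<std::string>* names = new std::vector<std::string>();
  return *names;
}

// Hypothetical helper whose constructor has a registration side effect.
struct Registrar {
  explicit Registrar(const std::string& name) { registry().push_back(name); }
};

// Globals like these could live in any translation unit. Each one reaches the
// registry through registry(), never through a global variable directly.
static Registrar register_add("add");
static Registrar register_mul("mul");
```

If the registry were a plain global std::vector defined in another file, a Registrar might run before the vector's constructor. Going through registry() removes that possibility. Note that the relative order of registrations from different translation units is still unspecified, so code that depends on that order needs a separate fix.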

Safe Destructions

There are more ways things can go wrong in the destruction of objects with static storage duration.

In general, we have to carefully avoid use-after-free, i.e. accessing a global/function-local variable after it has been destroyed. The language typically protects against this with the following rule:

Non-local objects with static storage duration are destroyed in the reverse order of the completion of their constructor.
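A minimal sketch of the rule in action is shown here, using function-local statics, which follow the same ordering. The names event_log, Dependency, User, get_dependency, and get_user are hypothetical.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Leaked on purpose so that the log stays valid for the whole program.
std::vector<std::string>& event_log() {
  static std::vector<std::string>* log = new std::vector<std::string>();
  return *log;
}

struct Dependency {
  Dependency() { event_log().push_back("Dependency()"); }
  ~Dependency() { std::puts("~Dependency()"); }
};

Dependency& get_dependency() {
  static Dependency dependency;
  return dependency;
}

struct User {
  // The constructor touches the dependency first, so Dependency finishes
  // construction before User does.
  User() {
    get_dependency();
    event_log().push_back("User()");
  }
  // Dependency is destroyed after User, so it is still alive here.
  ~User() {
    get_dependency();
    std::puts("~User()");
  }
};

User& get_user() {
  static User user;
  return user;
}
```

Calling get_user() once records "Dependency()" followed by "User()" in the log. At normal program exit the destructors run in the opposite order, ~User() first and ~Dependency() second.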

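Given this rule, we can deduce that if the constructor of an object a accesses another static object b (for example through a "construct on first use" getter), then b finishes construction before a does, so b is destroyed after a, and the destructor of a can safely access b.

The above result sounds nice and is often enough protection, but people tend to overlook a few ways things can still go wrong: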

  • The program can, in theory, pass around the address of b. This should be discouraged, but it means that technically ANY object could access b in its destructor. If any of these objects is destroyed after b, we're doomed.
  • When one thread crashes and destructs objects, it is possible that another thread is still running at that point and still has access to those objects. We're doomed again.
  • The above "reverse-ordering" rule, though stated on cppreference.com, does not always hold under certain build options!
    // bar.cpp:
    #include <stdio.h>
    #include <stdlib.h>
    extern void register_B();
    struct A {
      A() { register_B(); puts("Finishing A()"); }
      ~A() { puts("~A()"); }
    };
    A a;

    // main.cpp:
    #include <stdio.h>
    #include <stdlib.h>
    struct B {
      B() { puts("Finishing B()"); }
      ~B() { puts("~B()"); }
    };
    void register_B() { static B b; }
    int main(void) { puts("main"); }

    $ g++ -fPIC -shared bar.cpp -o bar.so
    $ g++ -pie -Wl,--no-as-needed main.cpp ./bar.so -o main-pie && ./main-pie
    Finishing B()
    Finishing A()
    main
    ~B()
    ~A()
    $ g++ -no-pie -Wl,--no-as-needed main.cpp ./bar.so -o main-no-pie && ./main-no-pie
    Finishing B()
    Finishing A()
    main
    ~A()
    ~B()
    In the PIE build, b is destroyed before a even though b finished construction first, which violates the rule. I discovered this the hard way when debugging another PyTorch issue. I'm still trying to understand whether this is considered a compiler bug.

Given the above issues, the Google C++ style guide bluntly forbids such destructions:

Objects with static storage duration are forbidden unless they are trivially destructible.

This "no destruction" rule implies that the following code is illegal

Object& get_a() { static Object a; return a;}

if Object is not trivially destructible. The C++ FAQ advises the same.

Writing static Object* a = new Object; return *a; is safe as long as we never call delete, but it incurs a heap allocation. A final trick is to use a NoDestructor wrapper class to bypass RAII (it relies on placement new):

Safe, but has heap allocation overhead:
Object& get_a() {
  static Object* a = new Object();
  return *a;
}

Safe and low overhead:
Object& get_a() {
  static NoDestructor<Object> a;
  return *a;
}
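For readers who have not seen such a wrapper, the following is a minimal sketch of how a NoDestructor could be written, assuming C++17 (for std::launder). It is a simplified illustration, not the implementation used at Google or in any particular library, and get_names is a hypothetical accessor added to show usage.

```cpp
#include <new>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

template <typename T>
class NoDestructor {
 public:
  template <typename... Args>
  explicit NoDestructor(Args&&... args) {
    // Placement new: construct T inside storage_ without a heap allocation.
    ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
  }
  NoDestructor(const NoDestructor&) = delete;
  NoDestructor& operator=(const NoDestructor&) = delete;
  // No destructor is declared, so ~T() is never run on the stored object.

  T* get() { return std::launder(reinterpret_cast<T*>(storage_)); }
  T& operator*() { return *get(); }
  T* operator->() { return get(); }

 private:
  // Raw bytes with the size and alignment of T; the wrapper only holds bytes.
  alignas(T) unsigned char storage_[sizeof(T)];
};

// The wrapper itself is trivially destructible even though the vector is not.
static_assert(
    std::is_trivially_destructible<NoDestructor<std::vector<std::string>>>::value,
    "NoDestructor must be trivially destructible");

// Hypothetical accessor using the wrapper with construct on first use.
std::vector<std::string>& get_names() {
  static NoDestructor<std::vector<std::string>> names;
  return *names;
}
```

The object lives in static storage rather than on the heap, and since the wrapper is trivially destructible, nothing runs for it at program exit. The trade-off is the same as with the leaked pointer: any resources owned by the wrapped object are never released by the program itself.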

Finally, as an alternative to "no destruction", another way to safely run destructors is to ref-count all such objects, but it's perhaps not worth the complexity. "No destruction" is usually a good enough solution.

Summary

In conclusion, to safely construct and destruct objects with static storage duration + dynamic initialization, follow these rules of thumb:

  • Safe Initialization: use the "construct on first use" idiom
  • No Destruction: don't run any non-trivial destructors
