Safe Static Initialization, No Destruction

Since joining Google Brain, I have brought PyTorch to Google's internal infra and owned its maintenance there. Google is famously a "tech island": almost everything inside it works differently from the outside world, and that creates many challenges when building a massive library like PyTorch.

Among those challenges are a few tricky bugs related to the static initialization order fiasco (SIOF) and to the destruction of statics. This time I was forced to learn far more about these topics than I'd like to know, so it's good to write the details down before I forget.

Read more

Demystify RAM Usage in Multi-Process Data Loaders

A typical PyTorch training program on 8 GPUs with 4 dataloader workers per GPU creates at least 8 × (4 + 1) = 40 processes. Naive use of PyTorch datasets and dataloaders can easily replicate your dataset's RAM usage by 40 times. This issue has probably affected everyone who has done anything nontrivial with PyTorch. In this post, we explain why it happens and how to avoid the 40x RAM usage.
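
As a rough sketch of the naive pattern in question (NaiveDataset and the sample list below are hypothetical, not code from the post), this is the shape of dataset that ends up replicated in every worker:

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader

class NaiveDataset(Dataset):
    """Keeps every sample as a Python object in a plain list."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Merely reading a Python object bumps its refcount, which
        # dirties the copy-on-write page it lives on -- so every
        # forked worker gradually copies the whole list.
        return self.samples[idx]

# Cheap in one process, but replicated in each of the num_workers
# processes forked by the DataLoader (on top of the main process).
samples = [np.random.rand(256) for _ in range(1000)]
loader = DataLoader(NaiveDataset(samples), batch_size=32, num_workers=4)
```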

Read more

Automatically Flatten & Unflatten Nested Containers

This post is about a small piece of functionality that has proven useful in TensorFlow / JAX / PyTorch.

Low-level components of these systems often use a plain list of values/tensors as inputs & outputs. However, end users who develop models often want to work with more complicated data structures: Dict[str, Any], List[Any], custom classes, and their nested combinations. Therefore, we need a bidirectional conversion between nested structures and a plain list of tensors. I found that different libraries have invented similar approaches to this problem, and it's interesting to list them here.
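
As a hedged illustration of the idea (a minimal sketch handling only dicts and lists, not any particular library's implementation), the bidirectional conversion might look like:

```python
from typing import Any, List, Tuple

def flatten(obj: Any) -> Tuple[List[Any], Any]:
    """Flatten nested dicts/lists into (leaves, spec), where spec
    records the container structure needed to rebuild the nesting."""
    if isinstance(obj, dict):
        keys = sorted(obj)
        leaves, specs = [], []
        for k in keys:
            sub_leaves, sub_spec = flatten(obj[k])
            leaves += sub_leaves
            specs.append(sub_spec)
        return leaves, ("dict", keys, specs)
    if isinstance(obj, list):
        leaves, specs = [], []
        for item in obj:
            sub_leaves, sub_spec = flatten(item)
            leaves += sub_leaves
            specs.append(sub_spec)
        return leaves, ("list", specs)
    return [obj], "leaf"  # any non-container value is a leaf

def unflatten(leaves: List[Any], spec: Any) -> Any:
    """Inverse of flatten(): rebuild the nested structure."""
    obj, rest = _unflatten(list(leaves), spec)
    assert not rest, "unconsumed leaves"
    return obj

def _unflatten(leaves, spec):
    if spec == "leaf":
        return leaves[0], leaves[1:]
    if spec[0] == "dict":
        _, keys, specs = spec
        out = {}
        for k, sub_spec in zip(keys, specs):
            out[k], leaves = _unflatten(leaves, sub_spec)
        return out, leaves
    _, specs = spec  # "list"
    out = []
    for sub_spec in specs:
        item, leaves = _unflatten(leaves, sub_spec)
        out.append(item)
    return out, leaves
```

With this pair, `unflatten(*flatten(x))` round-trips any nesting of dicts and lists, so a low-level component can consume the flat list of leaves while users keep their structured inputs.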

Read more

TorchScript: Tracing vs. Scripting

PyTorch provides two methods to turn an nn.Module into a graph represented in TorchScript format: tracing and scripting. This article will:

  1. Compare their pros and cons, with a focus on useful tips for tracing.
  2. Try to convince you that torch.jit.trace should be preferred over torch.jit.script for deployment of non-trivial models.
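
For a concrete picture (a minimal sketch; ToyModel is made up here, but torch.jit.trace and torch.jit.script are the actual entry points):

```python
import torch
from torch import nn

class ToyModel(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + 1

model = ToyModel().eval()

# Tracing: run the module once on example inputs and record the
# tensor operations that actually executed into a graph.
traced = torch.jit.trace(model, torch.randn(2, 3))

# Scripting: statically compile the module's Python source,
# control flow included, into TorchScript.
scripted = torch.jit.script(model)
```
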
Read more

Where Are Pixels? -- a Deep Learning Perspective

Technically, an image is a function that maps a continuous domain, e.g. a box [0, W] × [0, H], to intensities such as (R, G, B). To store it in computer memory, an image is discretized into an array array[H][W], where each element array[i][j] is a pixel.

How does discretization work? How does a discrete pixel relate to the abstract notion of the underlying continuous image? These basic questions play an important role in computer graphics & computer vision algorithms.

This article discusses these low-level details and how they affect our CNN models and deep learning libraries. If you've ever wondered which resize function to use, or whether you should add or subtract 0.5 or 1 somewhere in pixel coordinates, you may find answers here. Interestingly, these details have contributed to many accuracy improvements in Detectron and Detectron2.
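
As one small, hedged example of the kind of detail involved (assuming the common convention that pixel array[i][j] occupies a unit square whose center sits at coordinate i + 0.5 along each axis):

```python
def pixel_center(i: int) -> float:
    """Continuous coordinate of the center of discrete pixel i,
    under the convention that pixel i covers [i, i + 1)."""
    return i + 0.5

def source_coord(i: int, scale: float) -> float:
    """Where destination pixel i of a resize-by-`scale` operation
    should sample from, in source pixel-index coordinates: map the
    pixel center to continuous space, undo the scaling, and convert
    back -- hence the +0.5 / -0.5 pair."""
    return (i + 0.5) / scale - 0.5
```

Getting this half-pixel bookkeeping wrong is exactly how off-by-0.5 artifacts creep into resize and coordinate-transform code.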

Read more

Patching STB_GNU_UNIQUE of Buggy Binaries

Open-source toolchains are full of ancient little "features": originally implemented for various reasons (e.g., as workarounds), they have survived to this day for compatibility's sake, even when their semantics are murky or their design is questionable.

Read more