Demystify RAM Usage in Multi-Process Data Loaders
A typical PyTorch training program on 8 GPUs with 4 dataloader
workers per GPU would create at least
Not Every Model Has a Separate "Loss Function"
"Loss function" is one of the most basic concepts today in deep learning. Despite that, it is actually not necessarily a good programming abstraction when designing general-purpose systems. A system should not assume that a model always comes together with a "loss function".
How to Maintain Clean Core APIs for Research
Building a library for research and experiments is quite different from building other types of software. A key challenge is that, in research, abstractions and APIs are rarely set in stone: users may want to propose a slight variant or modification to literally ANYWHERE in the whole program, just because they have a new idea.
Automatically Flatten & Unflatten Nested Containers
This post is about a small piece of functionality that has proven useful in TensorFlow, JAX, and PyTorch.
Low-level components of these systems often use a plain list of values/tensors
as inputs & outputs.
However, end-users who develop models often want to work with more
complicated data structures: Dict[str, Any], List[Any], custom classes, and their nested combinations.
Therefore, we need bidirectional conversion between nested structures and a plain list of tensors.
I found that different libraries invent similar approaches to solve this problem, and it's interesting to list them here.
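To make the idea of bidirectional conversion concrete, here is a minimal pure-Python sketch. It is not the implementation used by TensorFlow, JAX, or PyTorch; the function name flatten and the returned rebuild callable are hypothetical, and only plain dict, list, and tuple containers are handled (custom classes would need a registration mechanism that is left out here).

```python
from typing import Any, Callable, List, Tuple


def flatten(obj: Any) -> Tuple[List[Any], Callable[[List[Any]], Any]]:
    """Return (leaves, rebuild): a flat list of leaf values and a function
    that rebuilds the same nested structure from a new list of leaves.

    Hypothetical helper for illustration; handles dict, list, and tuple only.
    """
    if isinstance(obj, dict):
        keys = list(obj.keys())  # keep insertion order
        children = [flatten(obj[k]) for k in keys]

        def rebuild_container(values: List[Any]) -> Any:
            return dict(zip(keys, values))

    elif isinstance(obj, (list, tuple)):
        children = [flatten(v) for v in obj]
        container_type = type(obj)

        def rebuild_container(values: List[Any]) -> Any:
            return container_type(values)

    else:
        # Anything else is treated as a leaf (e.g. a tensor or a number).
        return [obj], lambda leaves: leaves[0]

    flat = [leaf for leaves, _ in children for leaf in leaves]
    sizes = [len(leaves) for leaves, _ in children]
    rebuilders = [rebuilder for _, rebuilder in children]

    def rebuild(leaves: List[Any]) -> Any:
        # Hand each child the slice of leaves it originally produced.
        values, start = [], 0
        for size, rebuilder in zip(sizes, rebuilders):
            values.append(rebuilder(leaves[start:start + size]))
            start += size
        return rebuild_container(values)

    return flat, rebuild


nested = {"a": [1, 2], "b": {"c": (3, 4), "d": 5}}
flat, rebuild = flatten(nested)
scaled = rebuild([x * 10 for x in flat])
```

In this sketch a low-level component would operate on flat only, and rebuild would restore the user-facing structure around its outputs.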
TorchScript: Tracing vs. Scripting
PyTorch provides two methods to turn an nn.Module into a graph represented in TorchScript format: tracing and scripting.
This article argues that torch.jit.trace should be preferred over torch.jit.script for deployment of non-trivial models.
How To Do Ablation Experiments
Following up on the previous post, here is more on how to do ablations in a paper scientifically.
Where Are Pixels? -- a Deep Learning Perspective
Technically, an image is a function that maps a continuous domain, e.g. a box, to intensity values. To store it in memory, the image is discretized into an array array[H][W], where each element array[i][j] is a pixel.
How does discretization work? How does a discrete pixel relate to the abstract notion of the underlying continuous image? These basic questions play an important role in computer graphics & computer vision algorithms.
This article discusses these low-level details, and how they affect our CNN models and deep learning libraries. If you ever wonder which resize function to use or whether you should add/subtract 0.5 or 1 to some pixel coordinates, you may find answers here. Interestingly, these details have contributed to many accuracy improvements in Detectron and Detectron2.
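As a small illustration of where the 0.5 comes from, the sketch below assumes one common convention: pixel i covers the continuous interval [i, i + 1), so its center sits at i + 0.5. This convention and both function names are assumptions made for illustration, not necessarily the exact formulation the article uses.

```python
def pixel_center(index: int) -> float:
    """Continuous coordinate of the center of pixel `index`,
    assuming pixel i covers the interval [i, i + 1)."""
    return index + 0.5


def dst_index_to_src_index(dst_index: int, src_size: int, dst_size: int) -> float:
    """Map a destination pixel index to a fractional source pixel index
    when resizing an axis from src_size to dst_size pixels.

    Steps: index -> continuous center (+0.5), rescale the continuous
    coordinate, then convert back to index space (-0.5).
    """
    scale = src_size / dst_size
    return pixel_center(dst_index) * scale - 0.5


# Downsampling 4 pixels to 2: each output pixel lands halfway between
# two input pixels.
first = dst_index_to_src_index(0, src_size=4, dst_size=2)   # 0.5
second = dst_index_to_src_index(1, src_size=4, dst_size=2)  # 2.5
```

Dropping the two 0.5 terms would map output pixel 0 exactly onto input pixel 0, which shifts the resized image by a fraction of a pixel relative to this convention.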
Deep Learning Experiments and Claims
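Over the past few years, what I have learned most from the senior researchers around me at FAIR is their attitude toward research. So here are some thoughts on writing papers and running experiments.
Experiments exist to prove or strengthen the claims/hypotheses made in the paper.
At the end of his ICCV 2019 tutorial, Ross talked about how to write papers. Page 126 says that, ideally, every claim in a paper should be either a claim already established in the literature or a claim that experiments can support.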
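Can PyTorch Models Use IfElse? An Illusion
A small rant. A long time ago, I praised TensorFlow 1.x in one of my answers on Zhihu, and someone complained that not being able to write IfElse in graph mode was unbearable.
But can PyTorch really write IfElse?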
Fight Against Silent Bugs in Deep Learning Libraries
TL;DR: How to find out if your favorite deep learning library is occasionally giving you wrong results? Such bugs happen from time to time, and are extremely difficult to notice, report, and debug.