Where Are Pixels? -- a Deep Learning Perspective
Technically, an image is a function that maps a continuous domain, e.g.
a box array[H][W]
, where each element
array[i][j]
is a pixel.
How does discretization work? How does a discrete pixel relate to the abstract notion of the underlying continuous image? These basic questions play an important role in computer graphics & computer vision algorithms.
This article discusses these low-level details, and how they affect our CNN models and deep learning libraries. If you ever wonder which resize function to use or whether you should add/subtract 0.5 or 1 to some pixel coordinates, you may find answers here. Interestingly, these details have contributed to many accuracy improvements in Detectron and Detectron2.