Where Are Pixels? -- a Deep Learning Perspective

Technically, an image is a function that maps a continuous domain, e.g. a box in the 2D plane, to intensities such as (R, G, B). To store it in computer memory, an image is discretized to an array array[H][W], where each element array[i][j] is a pixel.

How does discretization work? How does a discrete pixel relate to the abstract notion of the underlying continuous image? These basic questions play an important role in computer graphics & computer vision algorithms.

This article discusses these low-level details, and how they affect our CNN models and deep learning libraries. If you ever wonder which resize function to use or whether you should add/subtract 0.5 or 1 to some pixel coordinates, you may find answers here. Interestingly, these details have contributed to many accuracy improvements in Detectron and Detectron2.
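As a small illustration of the "add/subtract 0.5" question, here is a minimal sketch in Python. It assumes one common convention, which this excerpt does not itself state: pixel i covers the continuous interval [i, i + 1], so its center sits at i + 0.5. The function names pixel_center and resize_index are hypothetical and only serve the example.

```python
def pixel_center(index: int) -> float:
    """Continuous coordinate of the center of pixel `index`.

    Assumed convention: pixel `index` covers [index, index + 1].
    """
    return index + 0.5


def resize_index(src_index: float, src_size: int, dst_size: int) -> float:
    """Map a (possibly fractional) pixel index to the resized grid.

    Convert the index to a continuous coordinate (+0.5), scale it,
    then convert back to an index on the new grid (-0.5).
    """
    scale = dst_size / src_size
    return (src_index + 0.5) * scale - 0.5


if __name__ == "__main__":
    # Upsampling a 4-pixel row to 8 pixels.
    for i in range(4):
        naive = i * (8 / 4)  # scaling the index directly, ignoring pixel centers
        print(i, naive, resize_index(i, 4, 8))
```

Under this assumption, scaling the raw index directly differs from the center-aware mapping by half a source pixel's width on the new grid, which is the kind of offset the article refers to.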


About Research

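In this field, everything moves extremely fast.

Three months ago I came across BinaryConnect from Bengio's group. My colleagues at Face++ all like the topic of model acceleration / compression, so they immediately reproduced the results and started improving on them. Back then we said we would extend it to Binary Activation and build a GPU runtime. Just as my colleagues went home for the Chinese New Year, and while I was daydreaming about writing this runtime as the course project for my parallel class this semester, I saw yesterday that Bengio's new paper had been posted, with all of it already done. Even more striking, another paper with essentially the same method appeared on arXiv the day before yesterday.

Three months of work is not hard if you can focus on it. But I have coursework to deal with, code to write at Oculus, and other fun things distracting me. While you are planning to get to it slowly when you have time, others are not waiting for you.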


SIFT and Image Stitching


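The techniques in this area are of course quite mature. The best-known open-source tool is hugin, which produces very good panoramas. It is no longer a hard problem in academia either: Lowe's IJCV 2007 paper, Automatic Panoramic Image Stitching using Invariant Features, describes a complete pipeline. Szeliski of Microsoft Research wrote a tutorial of several dozen pages, Image Alignment and Stitching: A Tutorial, which also covers the many methods of image stitching in detail. I basically followed the methods in a pile of papers by Lowe and Szeliski.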
