Content-Sensitive Supervoxels via Uniform Tessellations on Video Manifolds
Published in CVPR, 2018
Abstract: Supervoxels are perceptually meaningful atomic regions in videos, obtained by grouping voxels that exhibit coherence in both appearance and motion. In this paper, we propose content-sensitive supervoxels (CSS), which are regularly-shaped 3D primitive volumes that possess the following characteristic: they are typically larger and longer in content-sparse regions (i.e., with homogeneous appearance and motion), and smaller and shorter in content-dense regions (i.e., with high variation of appearance and/or motion). To compute CSS, we map a video Ξ to a 3-dimensional manifold M embedded in R^6, whose volume elements give a good measure of the content density in Ξ. We propose an efficient Lloyd-like method with a splitting-merging scheme to compute a uniform tessellation on M, which induces the CSS in Ξ. Theoretically, our method has a good competitive ratio of O(1). We also present a simple extension of CSS to stream CSS for processing long videos that cannot be loaded into main memory at once. We evaluate CSS, stream CSS and seven representative supervoxel methods on four video datasets. The results show that our method outperforms existing supervoxel methods.
Recommended citation: Ran Yi, Yong-Jin Liu, Yu-Kun Lai. Content-Sensitive Supervoxels via Uniform Tessellations on Video Manifolds. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 18), pages 646-655, 2018. Source code.
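The abstract's idea that volume elements on M measure content density can be illustrated with a minimal sketch. It is not the paper's implementation: the embedding Phi(x, y, t) = (x, y, t, lam*c1, lam*c2, lam*c3), the weight `lam`, the function name `gram_volume_element`, and the use of the Gram determinant sqrt(det(J^T J)) are all assumptions made here for illustration. The Lloyd-like tessellation and the splitting-merging scheme are not shown.

```python
import math

def gram_volume_element(dx, dy, dt, lam=1.0):
    """Volume element sqrt(det(J^T J)) at one voxel for the assumed embedding
    Phi(x, y, t) = (x, y, t, lam*c1, lam*c2, lam*c3).

    dx, dy, dt: partial derivatives of the 3-channel colour along x, y and t
    at that voxel (each a length-3 sequence). All names are hypothetical.
    """
    d = [dx, dy, dt]
    # J is the 6x3 Jacobian of Phi, so J^T J = I + lam^2 * [<d_i, d_j>].
    g = [[(1.0 if i == j else 0.0)
          + lam * lam * sum(a * b for a, b in zip(d[i], d[j]))
          for j in range(3)]
         for i in range(3)]
    # Determinant of the 3x3 Gram matrix by cofactor expansion.
    det = (g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
           - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
           + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0]))
    return math.sqrt(det)

# Homogeneous voxel: no colour change in any direction.
flat = gram_volume_element((0, 0, 0), (0, 0, 0), (0, 0, 0))
# Voxel with colour variation along x only.
edge = gram_volume_element((1, 0, 0), (0, 0, 0), (0, 0, 0))
```

Under these assumptions a homogeneous voxel has volume element 1, and the value grows with the colour gradient. Cells of equal volume on M therefore cover fewer voxels where content is dense, which matches the behaviour the abstract describes for CSS.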