[Update (Aug. 2024)] We have extended compact 3D Gaussian splatting to dynamic scene representation.
We broaden the scope of the compact 3D Gaussian representation to include dynamic scenes with significant enhancements: 1) We successfully extend the learnable masking approach to Gaussians moving over time, demonstrating its wide applicability. For static scenes, several methods have attempted to estimate and remove non-essential Gaussians after training, yielding promising results. However, removing non-essential Gaussians after training is more challenging in dynamic scenes, as it requires measuring the importance of each Gaussian over the entire duration. In contrast, our proposed masking strategy eliminates such complexities by learning the actual rendering impact of each Gaussian across all timestamps during training through gradient descent. 2) To compactly represent the motions of Gaussians, we propose learning representative temporal trajectories by applying the codebook-based approach to temporal attributes. We represent temporal attributes in a parameter-efficient manner and validate that our other compact representations for geometry and color are applicable to dynamic scenes as well as static scenes. 3) Extensive experiments and analyses demonstrate the effectiveness of our approach in dynamic settings. We achieve more than a tenfold increase in parameter efficiency compared to STG, the state-of-the-art method for dynamic scene representation, while maintaining comparable performance.
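To make the masking idea concrete, below is a minimal PyTorch sketch of a learnable per-Gaussian mask trained jointly with rendering; the class name `MaskedGaussians`, the threshold value, and the choice of applying the mask to opacity and scale are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn

class MaskedGaussians(nn.Module):
    """Learnable per-Gaussian binary mask (names and threshold are illustrative)."""

    def __init__(self, num_gaussians: int, mask_threshold: float = 0.01):
        super().__init__()
        # One learnable mask parameter per Gaussian, shared across all timestamps.
        self.mask_param = nn.Parameter(torch.zeros(num_gaussians))
        self.mask_threshold = mask_threshold

    def forward(self, opacity: torch.Tensor, scale: torch.Tensor):
        # opacity: (N, 1), scale: (N, 3)
        soft_mask = torch.sigmoid(self.mask_param)              # (N,)
        hard_mask = (soft_mask > self.mask_threshold).float()   # binarized decision
        # Straight-through estimator: the forward pass uses the hard mask,
        # the backward pass routes gradients through the soft mask.
        mask = (hard_mask - soft_mask.detach() + soft_mask).unsqueeze(-1)
        # Masked-out Gaussians contribute nothing to rendering at any timestamp.
        return opacity * mask, scale * mask

    def sparsity_loss(self) -> torch.Tensor:
        # Regularizer encouraging Gaussians with negligible rendering impact to be masked out.
        return torch.sigmoid(self.mask_param).mean()
```

In a training loop, `sparsity_loss()` would typically be added to the rendering loss with a small weight, so the mask learns each Gaussian's rendering impact over all timestamps via gradient descent; the weight and removal schedule here are assumptions.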
3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and adopts a rasterization pipeline to render images instead of volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback is that 3DGS requires a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which demands a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussians via vector quantization. With model compression techniques such as quantization and entropy coding, we consistently show over 25x reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering.
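As an illustration of the codebook-based idea, here is a hedged PyTorch sketch of residual vector quantization (R-VQ) applied to per-Gaussian geometric attributes; the number of stages, codebook size, and straight-through gradient trick are assumptions made for this sketch rather than the exact configuration.

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    """Sketch of residual vector quantization for geometric attributes (sizes illustrative)."""

    def __init__(self, dim: int, num_stages: int = 4, codebook_size: int = 64):
        super().__init__()
        # One codebook per stage; each stage quantizes the residual left by the previous one.
        self.codebooks = nn.Parameter(torch.randn(num_stages, codebook_size, dim) * 0.01)

    def forward(self, x: torch.Tensor):
        # x: (N, dim) attributes, e.g. per-Gaussian scale or rotation.
        residual = x
        quantized = torch.zeros_like(x)
        indices = []
        for codebook in self.codebooks:
            dist = torch.cdist(residual, codebook)   # (N, codebook_size) distances
            idx = dist.argmin(dim=1)                  # nearest codeword per Gaussian
            selected = codebook[idx]                  # (N, dim)
            quantized = quantized + selected
            residual = residual - selected
            indices.append(idx)
        # Straight-through estimator so gradients reach the unquantized attributes.
        quantized = x + (quantized - x).detach()
        # Only the per-stage indices and the codebooks need to be stored.
        return quantized, torch.stack(indices, dim=1)
```

Storing per-Gaussian codeword indices plus a few small codebooks is what makes this representation compact relative to keeping full-precision attributes per Gaussian.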
In addition to the end-to-end trained models (Ours), we applied straightforward post-processing techniques to the model attributes, a variant we denote as Ours+PP. These post-processing steps include: 1) applying 8-bit min-max quantization to hash grid parameters and scalar attributes; 2) pruning hash grid parameters with values below 0.1; 3) sorting Gaussians in Morton order; and 4) applying Huffman encoding to the 8-bit quantized values (hash parameters and scalar attributes) and R-VQ indices, and compressing the results using DEFLATE.
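The sketch below illustrates the flavor of these post-processing steps on a single attribute array, using `zlib` for DEFLATE; the helper name, pruning threshold, and the exact combination of steps are illustrative rather than the precise Ours+PP pipeline.

```python
import numpy as np
import zlib

def quantize_and_compress(values: np.ndarray, prune_below: float = 0.1):
    """Prune small values, apply 8-bit min-max quantization, and DEFLATE-compress
    the result (illustrative sketch, not the exact Ours+PP procedure)."""
    # Prune near-zero parameters (e.g. hash grid weights below the threshold).
    pruned = np.where(np.abs(values) < prune_below, 0.0, values)
    # 8-bit min-max quantization: linearly map [min, max] to [0, 255].
    vmin, vmax = pruned.min(), pruned.max()
    scale = float(vmax - vmin) or 1.0
    quantized = np.round((pruned - vmin) / scale * 255.0).astype(np.uint8)
    # Entropy coding: DEFLATE (zlib) over the quantized bytes; the same idea
    # applies to the Huffman-coded attributes and R-VQ indices.
    compressed = zlib.compress(quantized.tobytes(), level=9)
    # Keep (vmin, vmax) so the attributes can be dequantized at load time.
    return compressed, (float(vmin), float(vmax))
```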
We used the project page of Masked Wavelet NeRF as a template.