3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that models a scene with 3D Gaussians and adopts a rasterization pipeline to render images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, 3DGS has a significant drawback: it entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks that compactly represent the geometric attributes of the Gaussians via residual vector quantization (R-VQ). With model compression techniques such as quantization and entropy coding, we consistently show over 25x reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering.
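To illustrate the codebook idea, the following is a minimal numpy sketch of residual vector quantization: each stage quantizes the residual left by the previous stage against its own codebook, and the reconstructions accumulate. The function name, stage count, and codebook sizes are illustrative assumptions, not the paper's implementation (which learns the codebooks during training).

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual VQ: each stage quantizes the residual left by the
    previous stage and accumulates the reconstruction."""
    recon = np.zeros_like(x)
    residual = x.copy()
    indices = []
    for cb in codebooks:                                   # one codebook per stage
        d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)                                  # nearest code per vector
        recon += cb[idx]
        residual = x - recon                               # residual for next stage
        indices.append(idx)
    return np.stack(indices, axis=1), recon

# toy example: 2 stages, 16 codes each, 4-D geometric attribute vectors
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
codebooks = [rng.normal(size=(16, 4)) for _ in range(2)]
idx, recon = rvq_encode(x, codebooks)
```

Storing per-Gaussian stage indices (a few bits each) instead of raw attribute vectors is what makes the codebook representation compact.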
Masking can significantly reduce the number of Gaussians while retaining high quality.
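The pruning step can be sketched as follows, assuming a per-Gaussian mask parameter passed through a sigmoid and hard-thresholded; the threshold value `tau=0.01` and the function name are illustrative assumptions (in training, the binary mask would be made differentiable, e.g. via a straight-through estimator).

```python
import numpy as np

def prune_gaussians(mask_param, opacity, tau=0.01):
    """Sigmoid of the learned mask parameter gives a soft keep score;
    Gaussians whose score falls below tau are removed."""
    score = 1.0 / (1.0 + np.exp(-mask_param))  # soft mask in (0, 1)
    keep = score > tau                          # binary keep/prune decision
    return opacity[keep], keep

# three Gaussians: strongly masked, borderline, strongly kept
mask_param = np.array([-10.0, 0.0, 10.0])
opacity = np.array([0.9, 0.5, 0.8])
kept_opacity, keep = prune_gaussians(mask_param, opacity)
```

Because the mask parameters are learned jointly with the rendering loss, Gaussians that contribute little to image quality are driven below the threshold and discarded.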
In addition to the proposed method (Ours), we implemented straightforward post-processing techniques on the model attributes, a variant we denote as Ours+PP. These post-processing steps include:

1. Applying 8-bit min-max quantization to the opacity and hash grid parameters.
2. Pruning hash grid parameters with values below 0.1.
3. Applying Huffman encoding to the quantized opacity and hash grid parameters, and the R-VQ indices.
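Steps 1 and 3 above can be sketched as follows — a minimal numpy implementation of 8-bit min-max quantization plus Huffman code-length computation. The function names are illustrative, and a real codec would also serialize the actual bitstream; this only shows where the size reduction comes from.

```python
import heapq
import numpy as np
from collections import Counter

def minmax_quantize_u8(x):
    """8-bit min-max quantization: map [min, max] linearly to 0..255."""
    lo, hi = float(x.min()), float(x.max())
    q = np.round((x - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8)
    return q, lo, hi

def dequantize_u8(q, lo, hi):
    """Invert min-max quantization up to the 8-bit rounding error."""
    return q.astype(np.float64) / 255 * (hi - lo) + lo

def huffman_code_lengths(symbols):
    """Return {symbol: code length} for a Huffman code over the stream."""
    freq = Counter(symbols)
    if len(freq) == 1:                     # degenerate single-symbol stream
        return {next(iter(freq)): 1}
    # heap entries: (frequency, tiebreaker, {symbol: depth-so-far})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)    # two least-frequent subtrees
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]
```

Quantization caps each value at 8 bits, and Huffman coding then shortens the codes of frequent quantized values further, which is why Ours+PP roughly halves the storage of Ours in the tables below.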
Dataset | Mip-NeRF 360 |  |  |  |  |  | Tanks&Temples |  |  |  |  |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Method | PSNR | SSIM | LPIPS | Train | FPS | Storage | PSNR | SSIM | LPIPS | Train | FPS | Storage |
Plenoxels | 23.08 | 0.626 | 0.463 | 25m 49s | 6.79 | 2.1 GB | 21.08 | 0.719 | 0.379 | 25m 05s | 13.0 | 2.3 GB |
INGP-base | 25.30 | 0.671 | 0.371 | 05m 37s | 11.7 | 13 MB | 21.72 | 0.723 | 0.330 | 05m 26s | 17.1 | 13 MB |
INGP-big | 25.59 | 0.699 | 0.331 | 07m 30s | 9.43 | 48 MB | 21.92 | 0.745 | 0.305 | 06m 59s | 14.4 | 48 MB |
Mip-NeRF 360 | 27.69 | 0.792 | 0.237 | 48h | 0.06 | 8.6 MB | 22.22 | 0.759 | 0.257 | 48h | 0.14 | 8.6 MB |
3DGS | 27.21 | 0.815 | 0.214 | 41m 33s | 134 | 734 MB | 23.14 | 0.841 | 0.183 | 26m 54s | 154 | 411 MB |
3DGS* | 27.46 | 0.812 | 0.222 | 24m 07s | 120 | 746 MB | 23.71 | 0.845 | 0.178 | 13m 51s | 160 | 432 MB |
Ours | 27.08 | 0.798 | 0.247 | 33m 06s | 128 | 48.8 MB | 23.32 | 0.831 | 0.201 | 18m 20s | 185 | 39.4 MB |
Ours+PP | 27.03 | 0.797 | 0.247 | - | - | 29.1 MB | 23.32 | 0.831 | 0.202 | - | - | 20.9 MB |
Dataset | Deep Blending |  |  |  |  |
---|---|---|---|---|---|---|
Method | PSNR | SSIM | LPIPS | Train | FPS | Storage |
Plenoxels | 23.06 | 0.795 | 0.510 | 27m 49s | 11.2 | 2.7 GB |
INGP-base | 23.62 | 0.797 | 0.423 | 06m 31s | 3.26 | 13 MB |
INGP-big | 24.96 | 0.817 | 0.390 | 08m 00s | 2.79 | 48 MB |
Mip-NeRF 360 | 29.40 | 0.901 | 0.245 | 48h | 0.09 | 8.6 MB |
3DGS | 29.41 | 0.903 | 0.243 | 36m 02s | 137 | 676 MB |
3DGS* | 29.46 | 0.900 | 0.247 | 21m 52s | 132 | 663 MB |
Ours | 29.79 | 0.901 | 0.258 | 27m 33s | 181 | 43.2 MB |
Ours+PP | 29.73 | 0.900 | 0.258 | - | - | 23.8 MB |
We used the project page of Masked Wavelet NeRF as a template.