G3Splat: Geometrically Consistent Generalizable Gaussian Splatting

Mehdi Hosseinzadeh1*,   Shin-Fang Chng2*,   Yi Xu2,   Simon Lucey1,   Ian Reid1,3,   Ravi Garg1
1 Australian Institute for Machine Learning, The University of Adelaide     2 Goertek Alpha Labs     3 MBZUAI
G3Splat teaser
G3Splat enables geometrically consistent, pose-free generalizable Gaussian splatting across backbones. Left: our VGGT-based adaptation without / with the proposed priors. Right: our DUSt3R-based adaptation without / with the proposed priors. We visualize reconstructions on a Sora-generated video (150 input views) and RealEstate10K (2 input views). Our priors encourage geometrically consistent Gaussians and markedly reduce floating artifacts. Sora prompt: “Generate a video inside the Louvre Museum, including the paintings.”
Datasets: RealEstate10K · ACID · ScanNet · NYU Depth V2 · Tanks and Temples · In-the-Wild (Sora-generated) Backbones: VGGT · DUSt3R Outputs: Depth (novel and source views) · Relative Pose · Gaussian Splats · Meshes (from novel views) · Novel View Synthesis
Abstract

3D Gaussians have recently emerged as an effective scene representation for real-time splatting and accurate novel-view synthesis, motivating several works to adapt multi-view structure prediction networks to regress per-pixel 3D Gaussians from images. However, most prior work extends these networks to predict additional Gaussian parameters—orientation, scale, opacity, and appearance—while relying almost exclusively on view-synthesis supervision. We show that a view-synthesis loss alone is insufficient to recover geometrically meaningful splats in this setting.

We analyze and address the ambiguities of learning 3D Gaussian splats under self-supervision for pose-free generalizable splatting, and introduce G3Splat, which enforces geometric priors to obtain geometrically consistent 3D scene representations. Trained on RE10K, our approach achieves state-of-the-art performance in (i) geometrically consistent reconstruction, (ii) relative pose estimation, and (iii) novel-view synthesis. We further demonstrate strong zero-shot generalization on ScanNet, substantially outperforming prior work in both geometry recovery and relative pose estimation. Code and pretrained models are released on our project page.

Method Overview
Why priors matter: without them, self-supervised Gaussian parameters are ill-posed.

G3Splat targets generalizable Gaussian splatting: instead of per-scene optimization, a feed-forward model predicts a dense set of per-pixel 3D Gaussians (means, scales, orientations, opacity, appearance) from a few input views.
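To make "per-pixel 3D Gaussians" concrete, here is a minimal sketch of decoding a raw dense network output into Gaussian parameters, using the activation choices common in 3DGS-style feed-forward models. The channel layout, function name, and activations are illustrative assumptions, not the paper's exact head design.

```python
import numpy as np

# Illustrative channel layout for a per-pixel Gaussian head:
# 3 (mean) + 3 (log-scale) + 4 (quaternion) + 1 (opacity) + 3 (RGB) = 14
C = 14

def decode_gaussians(raw):
    """Decode a raw (H, W, 14) network output into per-pixel Gaussian params.

    Activations are the usual 3DGS choices: exp for positive scales,
    sigmoid for opacity, unit-normalization for the rotation quaternion.
    """
    means   = raw[..., 0:3]                       # 3D means (e.g. backprojected depth + offset)
    scales  = np.exp(raw[..., 3:6])               # positive anisotropic scales
    quats   = raw[..., 6:10]
    quats   = quats / np.linalg.norm(quats, axis=-1, keepdims=True)  # unit quaternions
    opacity = 1.0 / (1.0 + np.exp(-raw[..., 10:11]))                 # sigmoid -> (0, 1)
    colors  = 1.0 / (1.0 + np.exp(-raw[..., 11:14]))
    return means, scales, quats, opacity, colors

raw = np.random.default_rng(0).normal(size=(4, 5, C))
means, scales, quats, opacity, colors = decode_gaussians(raw)
```

Every pixel thus contributes one Gaussian, which is exactly why the priors below matter: nothing in this decoding forces the predicted orientations or scales to respect scene geometry.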

Why priors matter: view-synthesis supervision alone can match target images while still producing geometrically degenerate Gaussians—means can look plausible, yet orientations and scales drift (and opacities may become unstable). This comes from three coupled issues highlighted in the paper:
  • Overparameterization: Gaussians are richer than depth/points, but generalizable training often lacks corresponding geometric constraints.
  • Geometric ambiguity: many different Gaussian configurations can render similarly, so photometric supervision can “explain images” without recovering geometry.
  • Lack of heuristics: per-scene 3DGS relies on split/duplicate/prune heuristics; feed-forward predictors keep everything “alive”, amplifying failure modes.

We address this by adding a small set of geometry-minded priors that are lightweight, backbone-agnostic, and compatible with both full-rank 3DGS and surfel-like 2DGS. The goal is to turn self-supervised learning into a problem that explicitly favors geometrically consistent splats.

  • Orientation prior: encourages each Gaussian to behave like a tiny surface element—so orientations track local surface geometry rather than texture or view-synthesis quirks.
  • Pixel-alignment prior: ties each predicted Gaussian back to its originating pixel, reducing structure–pose ambiguity and stabilizing pose-free reconstruction.
  • Scale regularization (3DGS): discourages near-isotropic “blobs” in the full 3DGS setting, biasing Gaussians toward more surfel-like anisotropy when that improves geometric stability.
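The three priors above can each be sketched as a simple loss term. The following is a hedged, minimal numpy sketch of how such terms could look; the function names and exact formulations are illustrative assumptions, not the paper's definitions (see the paper for those).

```python
import numpy as np

def orientation_prior(normal_axes, surface_normals):
    """1 - |cos| alignment between each Gaussian's shortest axis and a
    surface normal estimated from its neighborhood (both unit, (N, 3))."""
    cos = np.abs(np.sum(normal_axes * surface_normals, axis=-1))
    return np.mean(1.0 - cos)

def pixel_alignment_prior(means_cam, K, pix):
    """Reproject each Gaussian mean (camera frame, (N, 3)) through the
    intrinsics K and penalize its distance to the originating pixel (N, 2)."""
    proj = means_cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return np.mean(np.linalg.norm(uv - pix, axis=-1))

def scale_flatness_reg(scales, eps=1e-8):
    """Penalize near-isotropic blobs (full 3DGS) by pushing the smallest of
    the three scales toward zero relative to the next one (surfel-like)."""
    s = np.sort(scales, axis=-1)          # ascending: s0 <= s1 <= s2
    return np.mean(s[..., 0] / (s[..., 1] + eps))
```

Each term is zero (or near-zero) exactly when the Gaussians behave like pixel-anchored surface elements, which is the regime the method is designed to favor.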
What changes “without priors” vs “with priors”?

In short: without priors, models can render well while learning unstable orientations/scales; with priors, the learned Gaussians become more surface-consistent, which supports reliable depth rendering, TSDF-fused meshes, and stronger relative pose estimation. The paper has the full breakdown, ablations, and comparisons.

Qualitative comparison of predicted Gaussian parameters
Qualitative comparison of predicted Gaussian parameters. A visual sanity check of what Gaussians “mean” in practice: baselines often learn orientations/scales that look plausible for rendering but are weak as geometry, while our priors yield more coherent surface structure and fewer floating artifacts. (See the paper for definitions and details.)
Reconstructed Gaussians (Interactive Demo)
Datasets - Input Views
Reconstructed Gaussians Without Priors
Reconstructed Gaussians With Our Priors
RealEstate10K
• In-domain.
• 2 input views.
RE10K input image 1 RE10K input image 2
DUSt3R Adaptation (Without Priors)
DUSt3R Adaptation (With Our Priors)
RealEstate10K
• In-domain.
• 2 input views.
RE10K input image 1 RE10K input image 2
DUSt3R Adaptation (Without Priors)
DUSt3R Adaptation (With Our Priors)
Sora-generated
• Prompt used to generate the video: "A single unbroken orbital camera move through a vast, empty gothic library, with static architecture, medium-wide framing, warm steady lighting, and crisp sharp geometric details".
• Cross-dataset zero-shot generalization.
• 24 input views.
VGGT Adaptation (Without Priors)
VGGT Adaptation (With Our Priors)
ScanNet
• Cross-dataset zero-shot generalization.
• 2 input views.
ScanNet input image 1 ScanNet input image 2
DUSt3R Adaptation (Without Priors)
DUSt3R Adaptation (With Our Priors)
ScanNet
• Cross-dataset zero-shot generalization.
• 2 input views.
ScanNet input image 1 ScanNet input image 2
DUSt3R Adaptation (Without Priors)
DUSt3R Adaptation (With Our Priors)
Tanks and Temples
• Cross-dataset zero-shot generalization.
• Francis Scene.
• 20 input views.
VGGT Adaptation (Without Priors)
TnT Gaussians without priors
VGGT Adaptation (With Our Priors)
TnT Gaussians with our priors
Evaluations
Depth & meshes, relative pose, and NVS.

Geometry (Depth & Mesh)

We evaluate geometric consistency by rendering virtual depth maps from the predicted Gaussians, and by fusing those depth maps into meshes via TSDF-Fusion.
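For readers unfamiliar with TSDF fusion, the following is a minimal, self-contained numpy sketch of integrating one depth map into a truncated signed distance voxel grid via a running weighted average; it is an illustrative toy (real pipelines typically use a library implementation such as Open3D), not the exact code behind these results.

```python
import numpy as np

def tsdf_integrate(tsdf, weight, depth, K, T_wc, origin, voxel, trunc):
    """Fuse one depth map into an axis-aligned TSDF voxel grid.

    tsdf, weight  : (X, Y, Z) running SDF values and integration weights
    depth         : (H, W) depth map (e.g. rendered from predicted Gaussians)
    K             : (3, 3) intrinsics, T_wc: (4, 4) world-to-camera transform
    origin, voxel : grid origin (3,) and voxel size; trunc: truncation distance
    """
    X, Y, Z = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    pts_w = origin + voxel * np.stack([ii, jj, kk], axis=-1).reshape(-1, 3).astype(float)
    pts_c = pts_w @ T_wc[:3, :3].T + T_wc[:3, 3]          # world -> camera
    z = pts_c[:, 2]
    zs = np.where(z > 1e-6, z, 1.0)                        # safe divisor
    uv = pts_c @ K.T
    u = np.round(uv[:, 0] / zs).astype(int)
    v = np.round(uv[:, 1] / zs).astype(int)
    H, W = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.where(valid, depth[v.clip(0, H - 1), u.clip(0, W - 1)], 0.0)
    sdf = d - z                                            # signed distance along the ray
    keep = valid & (d > 0) & (sdf > -trunc)                # skip far-behind-surface voxels
    new_t = np.clip(sdf / trunc, -1.0, 1.0)
    t, w = tsdf.reshape(-1).copy(), weight.reshape(-1).copy()
    t[keep] = (t[keep] * w[keep] + new_t[keep]) / (w[keep] + 1.0)  # running average
    w[keep] += 1.0
    return t.reshape(X, Y, Z), w.reshape(X, Y, Z)
```

A mesh is then extracted from the fused grid's zero level set (e.g. via marching cubes), which is why geometrically consistent depth across views is a prerequisite for clean meshes.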

ScanNet Depth (Novel-view)
Scatter plot: AbsRel vs delta
ScanNet Depth (Source-view)
Scatter plot: AbsRel vs delta
Novel-view depth evaluation figure
Novel-view depth on RE10K (first row), ACID (second row), and ScanNet (last row).
ScanNet Mesh — Accuracy
Dumbbell chart: mesh accuracy
ScanNet Mesh — Completeness
Dumbbell chart: mesh completeness
ScanNet Mesh — Chamfer
Dumbbell chart: mesh chamfer
ScanNet
• Cross-dataset zero-shot generalization.
• 2 input views.
ScanNet input image 1 ScanNet input image 2
Ground Truth
No Prior (VGGT)
With Our Priors (VGGT)
No Prior (DUSt3R)
With Our Priors (DUSt3R)
ScanNet
• Cross-dataset zero-shot generalization.
• 2 input views.
ScanNet input image 1 ScanNet input image 2
Ground Truth
No Prior (VGGT)
With Our Priors (VGGT)
No Prior (DUSt3R)
With Our Priors (DUSt3R)
Mesh reconstruction figure
Meshes Reconstructed from TSDF-Fusion of virtual depths (ScanNet, 2 source views).

Relative Pose (AUC)

We report AUC of the cumulative pose error curve at three thresholds, across in-domain RE10K and zero-shot ScanNet/ACID.
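This AUC is the standard recall-vs-error area used in relative pose benchmarks. As a reference for how it is typically computed (a common formulation, not necessarily the paper's exact evaluation script):

```python
import numpy as np

def pose_auc(errors, thresholds):
    """AUC of the cumulative (recall vs. pose error) curve, normalized per
    threshold so a method with zero error on every pair scores 1.0."""
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = np.arange(1, len(errors) + 1) / len(errors)
    e = np.concatenate(([0.0], errors))        # curve starts at (0, 0)
    r = np.concatenate(([0.0], recall))
    aucs = []
    for t in thresholds:
        last = np.searchsorted(e, t)           # curve points below the threshold
        x = np.concatenate((e[:last], [t]))    # close the curve exactly at t
        y = np.concatenate((r[:last], [r[last - 1]]))
        area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule
        aucs.append(area / t)
    return aucs
```

Reporting AUC at several thresholds captures both fine-grained accuracy (tight threshold) and robustness (loose threshold) in a single pair of heatmaps.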

Pose AUC heatmap (PnP+RANSAC)
Pose AUC heatmap (refinement)

RE10K Novel-view Rendered RGB and Depth

ScanNet (cross-dataset, zero-shot) Novel-view Rendered RGB and Depth

Citation
@misc{g3splat,
      title={G3Splat: Geometrically Consistent Generalizable Gaussian Splatting},
      author={Mehdi Hosseinzadeh and Shin-Fang Chng and Yi Xu and Simon Lucey and Ian Reid and Ravi Garg},
      year={2025},
      eprint={2512.17547},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2512.17547},
}