G3Splat: Geometrically Consistent Generalizable Gaussian Splatting

Mehdi Hosseinzadeh1*,   Shin-Fang Chng2*,   Yi Xu2,   Simon Lucey1,   Ian Reid1,3,   Ravi Garg1
1 Australian Institute for Machine Learning, The University of Adelaide     2 Goertek Alpha Labs     3 MBZUAI
G3Splat teaser
G3Splat enables geometrically consistent, pose-free generalizable Gaussian splatting across backbones. Left: our VGGT-based adaptation without / with the proposed priors. Right: our DUSt3R-based adaptation without / with the proposed priors. We visualize reconstructions on a Sora-generated video (150 input views) and RealEstate10K (2 input views). Our priors encourage geometrically consistent Gaussians and markedly reduce floating artifacts. Sora prompt: “Generate a video inside the Louvre Museum, including the paintings.”
Datasets: RealEstate10K · ACID · ScanNet · NYU Depth V2 · Tanks and Temples · In-the-Wild (Sora-generated) Backbones: VGGT · DUSt3R Outputs: Depth (novel and source views) · Relative Pose · Gaussian Splats · Meshes (from novel views) · Novel View Synthesis
Abstract

3D Gaussians have recently emerged as an effective scene representation for real-time splatting and accurate novel-view synthesis, motivating several works to adapt multi-view structure prediction networks to regress per-pixel 3D Gaussians from images. However, most prior work extends these networks to predict additional Gaussian parameters—orientation, scale, opacity, and appearance—while relying almost exclusively on view-synthesis supervision. We show that a view-synthesis loss alone is insufficient to recover geometrically meaningful splats in this setting.

We analyze and address the ambiguities of learning 3D Gaussian splats under self-supervision for pose-free generalizable splatting, and introduce G3Splat, which enforces geometric priors to obtain geometrically consistent 3D scene representations. Trained on RE10K, our approach achieves state-of-the-art performance in (i) geometrically consistent reconstruction, (ii) relative pose estimation, and (iii) novel-view synthesis. We further demonstrate strong zero-shot generalization on ScanNet, substantially outperforming prior work in both geometry recovery and relative pose estimation. Code and pretrained models are released on our project page.

Method Overview
Why priors matter: without them, self-supervised Gaussian parameters are ill-posed.

G3Splat targets generalizable Gaussian splatting: instead of per-scene optimization, a feed-forward model predicts a dense set of per-pixel 3D Gaussians (means, scales, orientations, opacity, appearance) from a few input views.
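To make "per-pixel 3D Gaussians" concrete, here is a minimal sketch of decoding a raw dense network output into Gaussian parameters, using the activation choices common in 3DGS-style feed-forward models. The channel layout, function name, and activations are illustrative assumptions, not the paper's exact head design.

```python
import numpy as np

# Illustrative channel layout for a per-pixel Gaussian head:
# 3 (mean) + 3 (log-scale) + 4 (quaternion) + 1 (opacity) + 3 (RGB) = 14
C = 14

def decode_gaussians(raw):
    """Decode a raw (H, W, 14) network output into per-pixel Gaussian params.

    Activations are the usual 3DGS choices: exp for positive scales,
    sigmoid for opacity, unit-normalization for the rotation quaternion.
    """
    means   = raw[..., 0:3]                       # 3D means (e.g. backprojected depth + offset)
    scales  = np.exp(raw[..., 3:6])               # positive anisotropic scales
    quats   = raw[..., 6:10]
    quats   = quats / np.linalg.norm(quats, axis=-1, keepdims=True)  # unit quaternions
    opacity = 1.0 / (1.0 + np.exp(-raw[..., 10:11]))                 # sigmoid -> (0, 1)
    colors  = 1.0 / (1.0 + np.exp(-raw[..., 11:14]))
    return means, scales, quats, opacity, colors

raw = np.random.default_rng(0).normal(size=(4, 5, C))
means, scales, quats, opacity, colors = decode_gaussians(raw)
```

Every pixel thus contributes one Gaussian, which is exactly why the priors below matter: nothing in this decoding forces the predicted orientations or scales to respect scene geometry.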

Why priors matter: view-synthesis supervision alone can match target images while still producing geometrically degenerate Gaussians—means can look plausible, yet orientations and scales drift (and opacities may become unstable). This comes from three coupled issues highlighted in the paper:
  • Overparameterization: Gaussians are richer than depth/points, but generalizable training often lacks corresponding geometric constraints.
  • Geometric ambiguity: many different Gaussian configurations can render similarly, so photometric supervision can “explain images” without recovering geometry.
  • Lack of heuristics: per-scene 3DGS relies on split/duplicate/prune heuristics; feed-forward predictors keep everything “alive”, amplifying failure modes.

We address this by adding a small set of geometry-minded priors that are lightweight, backbone-agnostic, and compatible with both full-rank 3DGS and surfel-like 2DGS. The goal is to turn self-supervised learning into a problem that explicitly favors geometrically consistent splats.

  • Orientation prior: encourages each Gaussian to behave like a tiny surface element—so orientations track local surface geometry rather than texture or view-synthesis quirks.
  • Pixel-alignment prior: ties each predicted Gaussian back to its originating pixel, reducing structure–pose ambiguity and stabilizing pose-free reconstruction.
  • Scale regularization (3DGS): discourages near-isotropic “blobs” in the full 3DGS setting, biasing Gaussians toward more surfel-like anisotropy when that improves geometric stability.
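The three priors above can each be sketched as a simple loss term. The following is a hedged, minimal numpy sketch of how such terms could look; the function names and exact formulations are illustrative assumptions, not the paper's definitions (see the paper for those).

```python
import numpy as np

def orientation_prior(normal_axes, surface_normals):
    """1 - |cos| alignment between each Gaussian's shortest axis and a
    surface normal estimated from its neighborhood (both unit, (N, 3))."""
    cos = np.abs(np.sum(normal_axes * surface_normals, axis=-1))
    return np.mean(1.0 - cos)

def pixel_alignment_prior(means_cam, K, pix):
    """Reproject each Gaussian mean (camera frame, (N, 3)) through the
    intrinsics K and penalize its distance to the originating pixel (N, 2)."""
    proj = means_cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return np.mean(np.linalg.norm(uv - pix, axis=-1))

def scale_flatness_reg(scales, eps=1e-8):
    """Penalize near-isotropic blobs (full 3DGS) by pushing the smallest of
    the three scales toward zero relative to the next one (surfel-like)."""
    s = np.sort(scales, axis=-1)          # ascending: s0 <= s1 <= s2
    return np.mean(s[..., 0] / (s[..., 1] + eps))
```

Each term is zero (or near-zero) exactly when the Gaussians behave like pixel-anchored surface elements, which is the regime the method is designed to favor.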
What changes “without priors” vs “with priors”?

In short: without priors, models can render well while learning unstable orientations/scales; with priors, the learned Gaussians become more surface-consistent, which supports reliable depth rendering, TSDF-fused meshes, and stronger relative pose estimation. The paper has the full breakdown, ablations, and comparisons.

Qualitative comparison of predicted Gaussian parameters
Qualitative comparison of predicted Gaussian parameters. A visual sanity check of what Gaussians “mean” in practice: baselines often learn orientations/scales that look plausible for rendering but are weak as geometry, while our priors yield more coherent surface structure and fewer floating artifacts. (See the paper for definitions and details.)
Reconstructed Gaussians (Interactive Demo)
Datasets - Input Views
Reconstructed Gaussians Without Priors
Reconstructed Gaussians With Our Priors
RealEstate10K
• In-domain.
• 2 input views.
RE10K input image 1 RE10K input image 2
DUSt3R Adaptation (Without Priors)
DUSt3R Adaptation (With Our Priors)
RealEstate10K
• In-domain.
• 2 input views.
RE10K input image 1 RE10K input image 2
DUSt3R Adaptation (Without Priors)
DUSt3R Adaptation (With Our Priors)
Sora-generated
• Prompt used to generate the video: "A single unbroken orbital camera move through a vast, empty gothic library, with static architecture, medium-wide framing, warm steady lighting, and crisp sharp geometric details".
• Cross-dataset zero-shot generalization.
• 24 input views.
VGGT Adaptation (Without Priors)
VGGT Adaptation (With Our Priors)
ScanNet
• Cross-dataset zero-shot generalization.
• 2 input views.
ScanNet input image 1 ScanNet input image 2
DUSt3R Adaptation (Without Priors)
DUSt3R Adaptation (With Our Priors)
ScanNet
• Cross-dataset zero-shot generalization.
• 2 input views.
ScanNet input image 1 ScanNet input image 2
DUSt3R Adaptation (Without Priors)
DUSt3R Adaptation (With Our Priors)
Tanks and Temples
• Cross-dataset zero-shot generalization.
• Francis Scene.
• 20 input views.
VGGT Adaptation (Without Priors)
TnT Gaussians without priors
VGGT Adaptation (With Our Priors)
TnT Gaussians with our priors
Evaluations
Depth & meshes, relative pose, and NVS.

Geometry (Depth & Mesh)

We evaluate geometric consistency by rendering virtual depth maps from the predicted Gaussians, and by fusing those depth maps into meshes via TSDF-Fusion.
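For readers unfamiliar with TSDF fusion, the following is a minimal, self-contained numpy sketch of integrating one depth map into a truncated signed distance voxel grid via a running weighted average; it is an illustrative toy (real pipelines typically use a library implementation such as Open3D), not the exact code behind these results.

```python
import numpy as np

def tsdf_integrate(tsdf, weight, depth, K, T_wc, origin, voxel, trunc):
    """Fuse one depth map into an axis-aligned TSDF voxel grid.

    tsdf, weight  : (X, Y, Z) running SDF values and integration weights
    depth         : (H, W) depth map (e.g. rendered from predicted Gaussians)
    K             : (3, 3) intrinsics, T_wc: (4, 4) world-to-camera transform
    origin, voxel : grid origin (3,) and voxel size; trunc: truncation distance
    """
    X, Y, Z = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    pts_w = origin + voxel * np.stack([ii, jj, kk], axis=-1).reshape(-1, 3).astype(float)
    pts_c = pts_w @ T_wc[:3, :3].T + T_wc[:3, 3]          # world -> camera
    z = pts_c[:, 2]
    zs = np.where(z > 1e-6, z, 1.0)                        # safe divisor
    uv = pts_c @ K.T
    u = np.round(uv[:, 0] / zs).astype(int)
    v = np.round(uv[:, 1] / zs).astype(int)
    H, W = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.where(valid, depth[v.clip(0, H - 1), u.clip(0, W - 1)], 0.0)
    sdf = d - z                                            # signed distance along the ray
    keep = valid & (d > 0) & (sdf > -trunc)                # skip far-behind-surface voxels
    new_t = np.clip(sdf / trunc, -1.0, 1.0)
    t, w = tsdf.reshape(-1).copy(), weight.reshape(-1).copy()
    t[keep] = (t[keep] * w[keep] + new_t[keep]) / (w[keep] + 1.0)  # running average
    w[keep] += 1.0
    return t.reshape(X, Y, Z), w.reshape(X, Y, Z)
```

A mesh is then extracted from the fused grid's zero level set (e.g. via marching cubes), which is why geometrically consistent depth across views is a prerequisite for clean meshes.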

ScanNet Depth (Novel-view)
Scatter plot: AbsRel vs delta
ScanNet Depth (Source-view)
Scatter plot: AbsRel vs delta
Novel-view depth evaluation figure
Novel-view depth on RE10K (first row), ACID (second row), and ScanNet (last row).
ScanNet Mesh — Accuracy
Dumbbell chart: mesh accuracy
ScanNet Mesh — Completeness
Dumbbell chart: mesh completeness
ScanNet Mesh — Chamfer
Dumbbell chart: mesh chamfer
ScanNet
• Cross-dataset zero-shot generalization.
• 2 input views.
ScanNet input image 1 ScanNet input image 2
Ground Truth
No Prior (VGGT)
With Our Priors (VGGT)
No Prior (DUSt3R)
With Our Priors (DUSt3R)
ScanNet
• Cross-dataset zero-shot generalization.
• 2 input views.
ScanNet input image 1 ScanNet input image 2
Ground Truth
No Prior (VGGT)
With Our Priors (VGGT)
No Prior (DUSt3R)
With Our Priors (DUSt3R)
Mesh reconstruction figure
Meshes Reconstructed from TSDF-Fusion of virtual depths (ScanNet, 2 source views).

Relative Pose (AUC)

We report AUC of the cumulative pose error curve at three thresholds, across in-domain RE10K and zero-shot ScanNet/ACID.
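This AUC is the standard recall-vs-error area used in relative pose benchmarks. As a reference for how it is typically computed (a common formulation, not necessarily the paper's exact evaluation script):

```python
import numpy as np

def pose_auc(errors, thresholds):
    """AUC of the cumulative (recall vs. pose error) curve, normalized per
    threshold so a method with zero error on every pair scores 1.0."""
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = np.arange(1, len(errors) + 1) / len(errors)
    e = np.concatenate(([0.0], errors))        # curve starts at (0, 0)
    r = np.concatenate(([0.0], recall))
    aucs = []
    for t in thresholds:
        last = np.searchsorted(e, t)           # curve points below the threshold
        x = np.concatenate((e[:last], [t]))    # close the curve exactly at t
        y = np.concatenate((r[:last], [r[last - 1]]))
        area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule
        aucs.append(area / t)
    return aucs
```

Reporting AUC at several thresholds captures both fine-grained accuracy (tight threshold) and robustness (loose threshold) in a single pair of heatmaps.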

Pose AUC heatmap (PnP+RANSAC)
Pose AUC heatmap (refinement)

RE10K Novel-view Rendered RGB and Depth

ScanNet (cross-dataset, zero-shot) Novel-view Rendered RGB and Depth

Citation
@misc{g3splat,
      title={G3Splat: Geometrically Consistent Generalizable Gaussian Splatting},
      author={Mehdi Hosseinzadeh and Shin-Fang Chng and Yi Xu and Simon Lucey and Ian Reid and Ravi Garg},
      year={2025},
      eprint={2512.17547},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2512.17547},
}