RVizSplat — Real-time 3D Gaussian Splatting in RViz 2

Demo

A 30-second screen capture of the plugin running live: a trained scene loaded from a .ply, rendered into RViz 2 alongside the standard tooling.

Why we built it

3D Gaussian Splatting has become the leading photorealistic scene representation — used in dense visual SLAM (Gaussian-SLAM, MonoGS, SplaTAM), in sim-to-real environment capture, and as a backdrop for teleoperation. But every GS viewer out there (the Inria reference, splatviz, antimatter15's WebGL splat) is a standalone tool, disconnected from a robot's TF tree, sensor data and markers.

That's the gap RVizSplat fills. Trained splats reach RViz 2 either as a .ply or as a splat_msgs/SplatArray on a ROS topic, then render with custom Ogre-side shaders at interactive framerates alongside TFs, markers and the rest of the scene. Every knob is exposed as a real RViz Property.

What it does

Two source modes

Load a .ply file once, or subscribe to a splat_msgs/SplatArray topic for live updates. Switch at runtime.

Two transparency modes

Exact back-to-front Sorted blending (default), or four-pass Weighted Blended OIT — toggleable per scene from the property tree.

Two sort backends

CPU pdqsort on a worker thread, or CUDA radix sort. Hot-swappable; the render thread never blocks on either.
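
The non-blocking hand-off described above can be sketched in a few lines. This is an illustration only, not the plugin's C++ code: `BackgroundSorter`, `cpu_sort`, `request`, `latest` and `close` are hypothetical names, and a Python `Future` stands in for the lock-free pickup.

```python
from concurrent.futures import ThreadPoolExecutor


def cpu_sort(depths):
    """Stand-in backend: splat indices ordered far-to-near."""
    return sorted(range(len(depths)), key=depths.__getitem__, reverse=True)


class BackgroundSorter:
    """Runs a sort backend on one worker thread; latest() never blocks."""

    def __init__(self, backend):
        self.backend = backend      # any callable: depths -> index order
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None        # Future for the sort currently in flight
        self._order = []            # last finished result

    def request(self, depths):
        # Start a new sort only if none is in flight. The input is copied
        # so the caller may keep mutating its own list.
        if self._pending is None:
            self._pending = self._pool.submit(self.backend, list(depths))

    def latest(self):
        # Pick up a finished sort if there is one, else reuse the old order.
        if self._pending is not None and self._pending.done():
            self._order = self._pending.result()
            self._pending = None
        return self._order

    def close(self):
        self._pool.shutdown(wait=True)


sorter = BackgroundSorter(cpu_sort)
sorter.request([2.0, 5.0, 1.0])
sorter.close()            # demo only: wait for the worker to finish
order = sorter.latest()   # [1, 0, 2]
```

Because `backend` is just a callable, assigning a different one between requests is the sketch's equivalent of swapping backends at runtime; a frame drawn while a sort is still running simply reuses the previous order.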

Marker coexistence

Splats live in render queue 95 and opaque RViz primitives in queue 94, so the primitives are drawn before the splats. The compositor picks them up automatically.

Clip-box ROI

Axis-aligned crop to trim floaters, hide ceilings, isolate a workspace — without touching the source .ply.
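
As a rough sketch of what an axis-aligned crop amounts to, the snippet below keeps a splat when its centre lies inside the box. Testing only the centre is an assumption made here for illustration, and `clip_box_mask` is a hypothetical helper, not part of the plugin.

```python
def clip_box_mask(centers, box_min, box_max):
    """Return one bool per splat: True if its centre is inside the box."""
    return [
        all(lo <= c <= hi for c, lo, hi in zip(center, box_min, box_max))
        for center in centers
    ]


# Three splat centres (x, y, z); the box cuts off everything above z = 2.5.
centers = [(0.0, 0.0, 0.5), (0.2, -0.1, 3.0), (5.0, 0.0, 1.0)]
mask = clip_box_mask(centers, (-1.0, -1.0, 0.0), (1.0, 1.0, 2.5))
visible = [i for i, keep in enumerate(mask) if keep]
print(visible)  # [0]
```

The input list is never modified, which mirrors the point that the crop leaves the source data alone.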

Built-in PerfMonitor

Sliding-window FPS, per-stage timings (cpu_sort, cuda_sort, render), TBO byte stats. The stats are logged, so they persist in the robot's logs.
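
A sliding-window FPS counter can be sketched as below. The class name, the one-second default window and the explicit timestamps are assumptions for illustration; the plugin's PerfMonitor may be implemented differently.

```python
from collections import deque


class SlidingWindowFps:
    """Frames per second over the most recent `window_s` seconds."""

    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self._stamps = deque()

    def tick(self, now_s):
        # Record this frame, then drop frames older than the window.
        self._stamps.append(now_s)
        while now_s - self._stamps[0] > self.window_s:
            self._stamps.popleft()

    def fps(self):
        if len(self._stamps) < 2:
            return 0.0
        span = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / span if span > 0 else 0.0


# Simulate two seconds of frames arriving every 1/64 s.
meter = SlidingWindowFps(window_s=1.0)
for i in range(129):
    meter.tick(i / 64)
print(meter.fps())  # 64.0
```

In a real render loop the timestamp would come from a monotonic clock such as `time.monotonic()`.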

Architecture

A typed message and two source loaders feed an Ogre MovableObject that owns a packed GPU layout, a worker-thread sorter and an EWA + spherical-harmonics shader. From there the splats take one of two parallel transparency paths.

Sorted mode runs the depth sort on a worker thread (CPU pdqsort or CUDA CUB radix behind one ISplatSorter interface), reads the result from a lock-free pickup, and the GPU draws back-to-front with standard pre-multiplied alpha. Exact, predictable, fastest on small scenes.
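
The two steps of Sorted mode, ordering by view depth and blending back-to-front with pre-multiplied alpha, can be sketched for a single pixel as follows. The function names are hypothetical, and the real work happens in the sorter and the shader rather than in Python.

```python
def view_depth(center, cam_pos, cam_forward):
    """Distance of a splat centre along the camera's viewing direction."""
    return sum((c - p) * f for c, p, f in zip(center, cam_pos, cam_forward))


def back_to_front_order(centers, cam_pos, cam_forward):
    """Splat indices sorted from farthest to nearest."""
    depths = [view_depth(c, cam_pos, cam_forward) for c in centers]
    return sorted(range(len(centers)), key=depths.__getitem__, reverse=True)


def composite_premultiplied(fragments, background):
    """Blend (premultiplied_rgb, alpha) fragments given back-to-front."""
    out = tuple(background)
    for rgb, alpha in fragments:
        # "Over" operator for pre-multiplied colour: src + (1 - a) * dst.
        out = tuple(s + (1.0 - alpha) * d for s, d in zip(rgb, out))
    return out


# Camera at the origin looking down +z.
centers = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (0.0, 0.0, 1.0)]
order = back_to_front_order(centers, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(order)  # [1, 0, 2]

# Half-transparent red behind half-transparent blue, on black.
red, blue = ((0.5, 0.0, 0.0), 0.5), ((0.0, 0.0, 0.5), 0.5)
pixel = composite_premultiplied([red, blue], (0.0, 0.0, 0.0))
print(pixel)  # (0.25, 0.0, 0.5)
```

Swapping the two fragments changes the result, which is why this mode needs the sort in the first place.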

WBOIT mode skips the per-frame sort entirely. A four-pass Ogre compositor (opaque scene, weighted accumulation, transmittance product, full-screen resolve) approximates correct transparency without ordering the splats. It coexists with native RViz primitives without a global sort, and is the better choice when the CPU is contended (Sorted remains the default).
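
For one pixel, the accumulation, transmittance and resolve steps can be sketched like this. It follows the general weighted blended OIT formulation; the weight function `depth_weight` is a placeholder chosen for the example, not the one the plugin's shaders use, and all names are hypothetical.

```python
def depth_weight(depth):
    """Illustrative weight: nearer fragments count more. An assumption."""
    return 1.0 / (1.0 + depth)


def wboit_resolve(fragments, background, weight=depth_weight):
    """Resolve (rgb, alpha, depth) fragments given in ANY order."""
    accum_rgb = [0.0, 0.0, 0.0]   # sum of colour * alpha * weight
    accum_w = 0.0                 # sum of alpha * weight
    transmittance = 1.0           # product of (1 - alpha)
    for rgb, alpha, depth in fragments:
        w = alpha * weight(depth)
        accum_rgb = [acc + c * w for acc, c in zip(accum_rgb, rgb)]
        accum_w += w
        transmittance *= 1.0 - alpha
    if accum_w == 0.0:
        return tuple(background)
    # Weighted average colour, then mix with what shows through.
    coverage = 1.0 - transmittance
    return tuple(
        (acc / accum_w) * coverage + bg * transmittance
        for acc, bg in zip(accum_rgb, background)
    )


red = ((1.0, 0.0, 0.0), 0.5, 1.0)    # near fragment
blue = ((0.0, 0.0, 1.0), 0.5, 3.0)   # far fragment
front_first = wboit_resolve([red, blue], (0.0, 0.0, 0.0))
back_first = wboit_resolve([blue, red], (0.0, 0.0, 0.0))
```

Both calls give the same colour because sums and products do not depend on order. That is what removes the sort, and the weighted average is also where the approximation comes from.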

Render quality

We compare RVizSplat against the two most-used standalone GS viewers using the canonical novel-view metrics — PSNR, SSIM, LPIPS — on identical viewport coordinates per scene.

Renderer	PSNR ↑	SSIM ↑	LPIPS ↓
splatviz	23.025	0.823	0.126
antimatter15/splat (WebGL)	21.985	0.765	0.163
RVizSplat	22.568	0.771	0.148

RVizSplat sits between the two reference viewers on every metric: roughly 0.46 dB below splatviz and 0.58 dB above antimatter15/splat in PSNR, a gap consistent with rasterisation differences between implementations.

How we compare

We hold the global viewport pose fixed across all three viewers, render each scene, and compute PSNR / SSIM / LPIPS against the held-out reference views. The full evaluation script is in the repo at evaluation/eval.py and produces the image-quality report directly.
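
For readers unfamiliar with the metrics, PSNR is the simplest of the three and can be sketched without any dependencies. This is not taken from evaluation/eval.py; the function below is a hypothetical illustration on flat lists of pixel values in [0, 1]. SSIM and LPIPS need dedicated libraries and are not shown.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length images."""
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of values")
    mse = sum((r - t) ** 2 for r, t in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf   # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.0, 0.0, 0.0]
rendered = [0.1, 0.1, 0.1, 0.1]
score = psnr(reference, rendered)   # about 20 dB
```

A uniform error of 0.1 on a [0, 1] scale gives a mean squared error of 0.01, hence roughly 20 dB. Higher is better.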

Compared to other viewers

	Sort backend	Lives inside RViz 2	ROS integration
splatviz	GPU only	—	—
antimatter15/splat (WebGL)	CPU only	—	—
RVizSplat	CPU + CUDA	Yes	Yes (splat_msgs)

Standalone GS viewers are useful for inspecting a trained scene; none of them can sit inside a robot's debug pipeline alongside TFs, markers and the rest of the scene graph.

Recommendations for your system

Which sort backend and transparency mode to pick depends on your GPU, scene size, and how much CPU you can spare for rendering.

Hardware	Scene size	Recommendation	Why
Discrete GPU	Large > 7–8M splats	Sorted + CUDA radix	Beyond roughly 7–8M splats CPU pdqsort starts to lag; the GPU radix sort takes over and eliminates the bottleneck.
Discrete GPU	Small / medium ≲ 6M splats	Sorted + CPU pdqsort	pdqsort holds up well into several-million-splat territory in our tests; CUDA setup overhead isn't worth it at this scale.
Discrete GPU	any (CPU contended)	WBOIT	When other robot modules are competing for cores, skip the sort entirely.
Integrated GPU	Small / medium	Sorted + CPU pdqsort	CPU sort holds up at this scale; visual fidelity stays exact.
Integrated GPU	Large or CPU-contended	WBOIT	CPU sort becomes the bottleneck on integrated systems; WBOIT trades a small fidelity hit for steady framerate.
CPU only	Small	Sorted + CPU pdqsort	Medium and large scenes are unusable without a GPU; keep splats under ~500k.
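
To see which row applies, the splat count can be read from the PLY header before loading the scene. The sketch below assumes the file stores one vertex per splat, as the Inria reference exporter does; `read_ply_vertex_count` is a hypothetical helper, not part of RVizSplat.

```python
import io


def read_ply_vertex_count(stream):
    """Parse a PLY header from a binary stream and return the vertex count."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    count = None
    for raw in iter(stream.readline, b""):
        fields = raw.split()
        if fields[:2] == [b"element", b"vertex"]:
            count = int(fields[2])
        elif fields == [b"end_header"]:
            break
    else:
        raise ValueError("PLY header has no end_header line")
    if count is None:
        raise ValueError("PLY header has no vertex element")
    return count


# In-memory stand-in for a real file opened with open(path, "rb").
demo = io.BytesIO(
    b"ply\n"
    b"format binary_little_endian 1.0\n"
    b"element vertex 250000\n"
    b"property float x\n"
    b"end_header\n"
)
count = read_ply_vertex_count(demo)
print(count)  # 250000
```

Only the text header is read, so the check stays cheap even for multi-gigabyte scenes.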

Visual quality caveat. WBOIT is an approximation — a depth-weighted average instead of true back-to-front blending. With dense, high-opacity stacks the colors drift from the Sorted result. For demos where photorealism trumps CPU cost, prefer Sorted regardless of which row you fall in.

Team

Videh Patel
Suchetan Saravanan
Akash Chikhalikar
Aditya Mathur

Install

git clone --recurse-submodules https://github.com/RVizSplat/RVizSplat
cd RVizSplat
colcon build --symlink-install
source install/setup.bash

Option A — load a PLY file directly

The fastest path. No publisher node needed.

rviz2
# Add → Gaussian Splats
# Source:     PLY File
# Splat File: /path/to/scene.ply

Option B — stream over a ROS topic

For live splat updates from another node, or when the scene gets republished.

ros2 run splat_publisher ply_splat_publisher \
  --ros-args -p ply_path:=/path/to/scene.ply
# In a second terminal, after sourcing install/setup.bash:
rviz2
# Add → Gaussian Splats
# Source: ROS Topic
# Topic:  /gaussian_splats

Requires ROS 2 Rolling. The CUDA Toolkit is optional: it enables the CUDA radix-sort path, and the plugin falls back to CPU pdqsort if CUDA isn't present.
