ViserDex

Abstract

In-hand object reorientation requires precise estimation of the object pose to handle complex task dynamics. While RGB sensing offers rich semantic cues for pose tracking, existing solutions rely on multi-camera setups or costly ray tracing. We present a sim-to-real framework for monocular RGB in-hand reorientation that integrates 3D Gaussian Splatting (3DGS) to bridge the visual sim-to-real gap. Our key insight is performing domain randomization in the Gaussian representation space: by applying physically consistent, pre-rendering augmentations to 3D Gaussians, we generate photorealistic, randomized visual data for object pose estimation. The manipulation policy is trained using curriculum-based reinforcement learning with teacher–student distillation, enabling efficient learning of complex behaviors. Importantly, both perception and control models can be trained independently on consumer-grade hardware, eliminating the need for large compute clusters. Experiments show that the pose estimator trained with 3DGS data outperforms those trained using conventional rendering data in challenging visual environments. We validate the system on a physical multi-fingered hand equipped with an RGB camera, demonstrating robust reorientation of five diverse objects even under challenging lighting conditions. Our results highlight Gaussian splatting as a practical path for RGB-only dexterous manipulation.

Video

Pipeline Overview

We first train a teacher policy in simulation with full state access, then distill it into a recurrent student policy that operates from noisy observations. A monocular RGB pose estimator trained with 3D Gaussian Splatting data provides object poses to the student policy, enabling goal-conditioned dexterous manipulation on a real multi-fingered hand.
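The distillation step can be illustrated with a deliberately simplified toy. This is a sketch under stated assumptions, not the method used in this work: the teacher here is a fixed linear map rather than a policy trained with reinforcement learning, the student is linear rather than recurrent, and the names `distill_linear_student`, `teacher_gain`, and `obs_noise_std` are hypothetical. It shows only the core idea that the student is fit to reproduce teacher actions while seeing noisy observations instead of the true state.

```python
import numpy as np

def distill_linear_student(teacher_gain, states, obs_noise_std, rng):
    """Fit a linear student to imitate a linear teacher (hypothetical toy setup)."""
    # The teacher has full state access and produces the target actions.
    teacher_actions = states @ teacher_gain.T
    # The student only sees a noisy copy of the state.
    noisy_obs = states + rng.normal(0.0, obs_noise_std, size=states.shape)
    # Supervised distillation: least-squares regression of teacher actions
    # on the student's noisy observations.
    solution, *_ = np.linalg.lstsq(noisy_obs, teacher_actions, rcond=None)
    return solution.T  # shape (action_dim, obs_dim)

rng = np.random.default_rng(0)
teacher_gain = rng.normal(size=(4, 6))   # assumed: 6-D state, 4-D action
states = rng.normal(size=(2000, 6))      # assumed: states visited in simulation
student_gain = distill_linear_student(teacher_gain, states, 0.05, rng)
```

With small observation noise the fitted student stays close to the teacher. A recurrent student, as described above, would additionally be able to use observation history to compensate for the noise.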

Pre-Rasterization Gaussian Augmentations

Structured perturbations across Gaussian clusters (Spatial, Color, and Global) make it possible to simulate shadows, marks, spectral effects, and environmental shifts from a single static 3D representation.

Comparison of augmentation types on two objects: Tablet Bottle and Cube.
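The three perturbation families can be sketched on a toy Gaussian set. This is a minimal illustration under assumptions, not the augmentation code behind the results shown here: each Gaussian is assumed to carry a plain RGB color in [0, 1] (real 3DGS models usually store spherical-harmonic coefficients), and the function name `augment_gaussian_colors` and all numeric ranges are hypothetical. The perturbations are applied to Gaussian attributes before any rendering, and the positions are left untouched.

```python
import numpy as np

def augment_gaussian_colors(positions, colors, rng):
    """Return a perturbed copy of per-Gaussian RGB colors (values in [0, 1])."""
    out = colors.copy()

    # Spatial cluster: darken Gaussians near a random seed point (shadow-like).
    seed = positions[rng.integers(len(positions))]
    extent = (positions.max(axis=0) - positions.min(axis=0)).max()
    radius = rng.uniform(0.1, 0.3) * extent
    near = np.linalg.norm(positions - seed, axis=1) < radius
    out[near] *= rng.uniform(0.3, 0.8)

    # Color cluster: shift Gaussians whose color resembles a random reference.
    ref = colors[rng.integers(len(colors))]
    similar = np.linalg.norm(colors - ref, axis=1) < 0.2
    out[similar] += rng.uniform(-0.15, 0.15, size=3)

    # Global: one brightness gain and tint applied to every Gaussian.
    out = out * rng.uniform(0.6, 1.4) + rng.uniform(-0.05, 0.05, size=3)

    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
positions = rng.uniform(-1.0, 1.0, size=(500, 3))  # assumed Gaussian centers
colors = rng.uniform(0.0, 1.0, size=(500, 3))      # assumed per-Gaussian RGB
augmented = augment_gaussian_colors(positions, colors, rng)
```

Because only the color attributes change, the same static representation can be re-augmented with a fresh random draw for every training image.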

Real-World Deployment

Zero-shot transfer to an Allegro hand manipulating five diverse objects under varied lighting conditions.

Nominal Laboratory Lighting

Cube

Rubber Duck

Dynamic Lighting & Reflections

Globe

3D Printed Toy

Our system achieves over 25 consecutive successful reorientations on average under extreme visual conditions that cause traditional estimators to fail.
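A count of consecutive successes could be computed along the following lines. This is a sketch under assumptions: the page does not state how a successful reorientation is defined, so the geodesic rotation error, the 0.4 rad threshold, and the function names `rotation_error` and `consecutive_successes` are all hypothetical.

```python
import math

def rotation_error(q_est, q_goal):
    """Geodesic angle in radians between two unit quaternions (w, x, y, z)."""
    # The absolute value treats q and -q as the same rotation.
    dot = abs(sum(a * b for a, b in zip(q_est, q_goal)))
    return 2.0 * math.acos(min(1.0, dot))

def consecutive_successes(errors, threshold):
    """Count goals reached in a row before the first failure."""
    count = 0
    for err in errors:
        if err > threshold:
            break
        count += 1
    return count

identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
err = rotation_error(identity, quarter_turn_z)  # a 90 degree rotation about z

# Assumed per-goal final errors (radians) and an assumed success threshold.
streak = consecutive_successes([0.1, 0.2, 0.5, 0.1], threshold=0.4)
```

Here the third goal exceeds the threshold, so the streak ends after two successes.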

BibTeX

@article{bhardwaj2026viserdex,
  title={ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation},
  author={Bhardwaj, Arjun and Wilder-Smith, Maximum and Mittal, Mayank and Patil, Vaishakh and Hutter, Marco},
  journal={arXiv preprint arXiv:2604.11138},
  year={2026}
}

ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation

Robust, monocular RGB-based in-hand reorientation of complex objects using 3D Gaussian Splatting for visually diverse Sim-to-Real transfer.
