Cross-embodiment robot learning is bottlenecked by fragmented infrastructure, not just limited data.

Despite recent progress in training Vision-Language-Action models (VLAs), deploying them across different robots remains a major engineering challenge. Most robot code is highly specific to an exact hardware setup. Reproducing results on a different platform usually means rewriting the entire control stack. VLAs cannot be deployed out-of-the-box on new embodiments, and existing cross-embodiment datasets are aggregations of disjointed collection efforts across fragmented infrastructure.

RIO (Robot I/O) is an open-source Python framework that provides flexible, lightweight components for robot control, teleoperation, data formatting, sensor configuration, and policy deployment across diverse hardware platforms and morphologies. Users can freely choose robots, sensors, teleoperation interfaces, middlewares, data formats, and policies at every layer of the stack — and switch between them with minimal reconfiguration. We validate RIO on VLA deployment workflows across three morphologies (single-arm, bimanual, humanoid) and four hardware platforms, showcasing fine-tuned rollouts of π0.5 and GR00T N1.5 on household tasks.

RIO Overview

RIO is built on a node-middleware architecture. Nodes for teleoperation interfaces, sensors, robots, and policies are implemented from the same template, requiring minimal boilerplate. Each Node dynamically inherits from a given Middleware that handles message passing. Factory functions produce matched server-client Node pairs, supporting three execution patterns: publish-only (pub()), request-only (req()), and combined (pubreq()). Published data flows through ring buffers that continuously stream state at a fixed frequency, while requests flow through request queues that enable asynchronous command communication from multiple clients at arbitrary rates.
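To make this concrete, here is a hedged sketch of the node template in this style; the class names and bodies are illustrative, not RIO's exact code:

import numpy as np

# Illustrative sketch of the node template; not RIO's exact code.
class CameraNode:
    """Publish-only node (pub): the server loop calls pub() at a fixed
    frequency and writes the result into a ring buffer, where the
    matched client can always read the latest frame without blocking."""

    def pub(self) -> dict:
        return {"rgb": np.zeros((480, 640, 3), dtype=np.uint8)}

class GripperNode:
    """Request-only node (req): req() runs once per command popped off
    the request queue, so multiple clients can send commands
    asynchronously at arbitrary rates."""

    def req(self, cmd: dict) -> dict:
        return {"ok": True, "width": float(cmd.get("width", 0.0))}

# A robot node would combine both patterns (pubreq): it streams
# proprioceptive state continuously while accepting action requests.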

Because Nodes are middleware-agnostic, they can be paired with different backends depending on deployment requirements. Shared memory enables zero-copy data exchange for high-throughput local communication. Zenoh or ZeroRpc handle serialization and transport over TCP/IPC for distributed multi-machine deployments. Thread middleware simplifies debugging by running everything within a single process.
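Swapping backends is then a configuration change rather than a code change. A minimal sketch, assuming hypothetical string keys for the middleware field consumed later by ServerManager:

from types import SimpleNamespace

# Hypothetical key names; consult the repository for the exact values.
cfg = SimpleNamespace(mw="shm")  # zero-copy shared memory, single machine
cfg.mw = "zenoh"                 # TCP/IPC transport across machines
cfg.mw = "thread"                # one process, simplest to debug
# The rest of the stack is unchanged: ServerManager(cfg.mw, ...) starts
# the same nodes on whichever backend is selected.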

RIO is designed around five tenets:

Flexible — agnostic to components, with no locked-in choices.
Reusable — lightweight building blocks that are quick to combine and modify.
Accessible — Python-based, a single config file, quick to install.
Performant — real-time control with asynchronous policy inference.
Consistent — scalable, reproducible data collection.

Supported Hardware

RIO provides flexibility across robot hardware, teleoperation interfaces, cameras, and middlewares. These can be combined in any configuration depending on your requirements.

Humanoid Robots: Unitree G1, Booster T1
Robot Arms: UFactory (xArm5/6/7, 850, Lite6), UR (UR5e, UR7e), Franka (FR3, Panda), Kinova (Gen3), SO-100/SO-101
Robot Grippers: UFactory, Franka, Robotiq (2F-85/2F-140), DH-Robotics (AG-105-145)
Teleop Interfaces: Spacemouse, Gamepad, Keyboard, VR (Apple Vision Pro, Meta Quest), Leader-Follower (GELLO), Phone
Cameras: RealSense, ZED, UVC (Webcams, USB), iPhone (Record3D)
Middlewares: Shared Memory, Thread, Portal, Zenoh, ZeroRpc

Framework Comparison

Framework      Middleware(s)      Data Format(s)      Policies
Ark            ✘ LCM              ✘ Pickle
LeRobot        ✘ Threads/gRPC     ✘ LeRobotDataset
ManiUniCon     ✘ Shm              ✘ Zarr
PAPRLE         ✘ ROS              ✘ Pickle            n/a
PyRobot        ✘ ROS              ✘ Pickle            n/a
RCS            ✘ RPC              ✘ Parquet
RoBits         ✘ ZMQ              ✘ NPZ/JSON          n/a
UMI, DP        ✘ Shm              ✘ Zarr              ✘ DP
RIO (ours)     ✔ any              ✔ any               ✔ any

✘ = locked to a single choice; ✔ = freely selectable; n/a = no policy deployment.

A Minimal Main Loop

RIO's API is designed for simplicity. A complete teleoperation loop fits in a few lines:

from rio import time
from rio.envs.factory import make_env
from rio.middleware import ServerManager

# Factory function creates the servers, clients, and environment
# from `cfg`, the station's composable dataclass configuration
servers, clients, env = make_env(cfg)

# Start servers with the desired middleware
with ServerManager(cfg.mw, list(servers.values())):
    # Start clients
    with env, clients["teleop"]() as teleop:

        while True:
            # Query client APIs, all non-blocking
            cmd = teleop.poll()
            action = env.build_action(cmd)
            obs = env.step(action)
            time.precise_wait()  # hold the loop at a fixed control frequency

Robot Stations

A composable dataclass configuration specifies the hardware topology for each station. The same application logic operates over arbitrary station configurations without modification.
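A minimal sketch of what such a configuration can look like, with hypothetical field names rather than RIO's exact schema:

from dataclasses import dataclass, field

# Hypothetical field names for illustration; not RIO's exact schema.
@dataclass
class StationConfig:
    robot: str = "xarm7"              # robot driver to launch
    gripper: str = "robotiq_2f85"     # end-effector
    cameras: dict[str, str] = field(default_factory=lambda: {
        "wrist": "realsense_d405",
        "front": "realsense_d415",
    })
    teleop: str = "spacemouse"        # teleoperation interface
    mw: str = "shm"                   # middleware backend
    freq_hz: float = 30.0             # control loop frequency

# Retarget the station by swapping fields; application logic is unchanged.
cfg = StationConfig(robot="so100", teleop="gello")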

Example robot station configurations combining different hardware, sensors, and teleoperation interfaces.

Observation Schema

Each robot morphology defines a dedicated observation structure extending a common base schema, ensuring standardized data representation across platforms regardless of the underlying hardware.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Camera:
    rgb: np.ndarray | None = None
    depth: np.ndarray | None = None
    meta: dict = field(default_factory=dict)

@dataclass
class Observation:
    proprio: np.ndarray  # Defaults to policy action space
    cameras: dict[str, Camera] = field(default_factory=dict)

@dataclass
class Step:
    timestep: int | None
    observation: Observation
    instruction: str | None
    action: np.ndarray | None
    meta: dict | None = field(default_factory=dict)
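As a usage sketch, a single recorded step for a 7-DoF arm with one wrist camera could be assembled as follows (all values are placeholders):

import numpy as np

# Placeholder values for a single-arm station with one wrist camera.
obs = Observation(
    proprio=np.zeros(7, dtype=np.float32),  # e.g. 7-DoF joint positions
    cameras={"wrist": Camera(rgb=np.zeros((480, 640, 3), dtype=np.uint8))},
)
step = Step(
    timestep=0,
    observation=obs,
    instruction="fold the shirt",
    action=np.zeros(7, dtype=np.float32),  # action in the policy's space
)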

Results

We deploy state-of-the-art VLAs (π0.5, GR00T N1.5) across 3 morphologies and 4 hardware platforms, achieving ≥60% success on all tasks with just 50 teleoperated demonstrations.

VLAs (π0.5 & GR00T N1.5)

xArm7 — Place Can
SO-100 — Fold Cloth
SO-100 — Scrub Bowl
Unitree G1 — Pick Box

Diffusion Policy

xArm7 — Throw Ball
xArm7 — Flip Tortilla

RL Navigation (PPO)

Unitree G1 — Walk
Booster T1 — Walk

Policy Deployment

Robot Type Policy Task Success Rate Task Time (s) Demo Time (s) GPU Util (%)
xArm7 BC π0.5 Fold Shirt 92.5% 41.96 ± 14.58 41.57 ± 9.25 56.7 ± 1.7
xArm7 BC π0.5 Place Can 95.0% 16.08 ± 3.41 14.46 ± 2.00 54.6 ± 3.1
SO-100 BC π0.5 Fold Cloth 60.0% 27.50 ± 5.51 22.43 ± 3.30 46.3 ± 10.0
SO-100 BC π0.5 Scrub Bowl 64.0% 40.33 ± 13.68 27.66 ± 5.22 52.0 ± 4.8
Unitree G1 BC GR00T N1.5 Pick Box 95.0% 9.07 ± 6.10 10.38 ± 4.04 61.7 ± 4.7
Unitree G1 RL PPO Navigate 100% 31.27 ± 6.56 n/a 5.1 ± 0.1
Booster T1 RL PPO Navigate 100% 29.73 ± 4.49 n/a 5.3 ± 0.2

System Profiling

RIO achieves 130.3 ms end-to-end observation-to-action latency versus 581.2 ms for LeRobot — a 4.46× reduction for π0.5 inference — with sub-millisecond middleware round-trip times over Zenoh and shared memory.

RIO vs. LeRobot Latency

Observation-Action latency: RIO 130.3 ms vs LeRobot 581.2 ms

We profile end-to-end observation-to-action latency during π0.5 rollouts with an SO-100 in the loop, using three Intel RealSense cameras (two D415s and one D405) at 640×480 resolution. On identical hardware, RIO reaches 130.3 ms versus 581.2 ms for LeRobot — a 4.46× reduction. The gain stems from RIO's streamlined architecture: LeRobot passes observations through a separate thread before sending them over the network to an asynchronous policy server, whereas RIO uses the middleware directly for asynchronous inference, cutting both observation-fetching and framework overhead. Lower pipeline latency translates to a higher effective control frequency, which is critical for dynamic, contact-rich tasks such as ball throwing and tortilla flipping.

Middleware Round-Trip Latency

Middleware Latency (ms)
Zenoh 0.43 ± 0.13
Shared Memory 0.54 ± 0.62
Thread 0.99 ± 0.30
ZeroRpc 1.05 ± 0.17
Portal 1.97 ± 0.34

Latency is half the median round-trip time (1st/99th percentiles trimmed, following the Open Messaging Benchmark), measured over 1,000 passes with a 2048-byte payload.
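A minimal sketch of this measurement, assuming an echo callable that sends a payload through the middleware under test and returns when the reply arrives:

import time
import numpy as np

def one_way_latency_ms(echo, n=1000, payload_bytes=2048):
    """Half of the trimmed-median round-trip time, in milliseconds."""
    payload = bytes(payload_bytes)
    rtts = []
    for _ in range(n):
        t0 = time.perf_counter()
        echo(payload)  # round trip through the middleware under test
        rtts.append(time.perf_counter() - t0)
    rtts = np.sort(rtts)
    lo, hi = int(0.01 * n), int(0.99 * n)  # trim 1st/99th percentiles
    return float(np.median(rtts[lo:hi]) / 2 * 1e3)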

Node Profiling During Policy Deployment

Timeline of π0.5 rollout on xArm7 with three RealSense cameras (two D415s, one D405) at 640×480. The main loop remains non-blocking; asynchronous inference (~85.8 ms forward pass) allows continuous control.

Get Started with RIO

RIO is designed to be quick to install and easy to use. Check out the repository to get started.

Team

Pablo Ortega-Kral*,1, Eliot Xing*,1, Arthur Fender Coelho Bucker1, Vernon Luk1, Jason Kim2, Owen Kwon1, Angchen Xie1, Nikhil Sobanbabu1, Yifu Yuan1, Megan Lee1, Deepam Ameria1, Bhaswanth Ayapilla1, Jaycie Bussell3, Guanya Shi1, Jonathan Francis1,3, Jean Oh†,1,4
1Carnegie Mellon University,  2Delft University of Technology,  3Bosch Center for AI,  4Lavoro AI Research
*Equal contribution  †Corresponding author