The Unitree G1 is a humanoid robot platform with full-body locomotion, arm gesture control, and agentic capabilities. No ROS is required to get started: only the hardware navigation stack uses ROS 2, and simulation runs on a native A* planner.
- Unitree G1 (stock firmware)
- Ubuntu 22.04/24.04 with CUDA GPU (recommended), or macOS (experimental)
- Python 3.12
- ZED camera (mounted at chest height) for perception blueprints
- ROS 2 for navigation (the G1 navigation stack uses ROS nav)
First, install system dependencies for your platform:
Then install DimOS:
```bash
uv venv --python "3.12"
source .venv/bin/activate
uv pip install 'dimos[base,unitree]'
```

No hardware? Start with simulation:

```bash
uv pip install 'dimos[base,unitree,sim]'
dimos --simulation run unitree-g1-basic-sim
```

This runs the G1 in MuJoCo with the native A* navigation stack: same blueprint structure, simulated robot. It opens the command center at localhost:7779 with Rerun 3D visualization.
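The A* planner used in simulation can be illustrated with a minimal grid version. This is a standard textbook A* over a 2D occupancy grid, not DimOS's actual implementation:

```python
import heapq

def astar(grid, start, goal):
    """A* over a 2D occupancy grid (0 = free, 1 = blocked), 4-connected."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, None)]                  # (f, g, cell, parent)
    came_from, g_best = {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came_from:                # already expanded with a better g
            continue
        came_from[cur] = parent
        if cur == goal:                     # walk parents back to the start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_best.get((nr, nc), float("inf")):
                    g_best[(nr, nc)] = ng
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc), cur))
    return None  # no route to the goal
```

The real planner works on the costmap produced from the voxel map, but the search structure is the same.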
To run on the real robot:

```bash
export ROBOT_IP=<YOUR_G1_IP>
dimos run unitree-g1-basic
```

DimOS connects via WebRTC, starts the ROS navigation stack, and opens the command center.
| Module | What It Does |
|---|---|
| G1Connection | WebRTC connection to the robot — streams video, odometry |
| Webcam | ZED camera capture (stereo left, 15 fps) |
| VoxelGridMapper | Builds a 3D voxel map using column-carving (CUDA accelerated) |
| CostMapper | Converts 3D map → 2D costmap via terrain slope analysis |
| WavefrontFrontierExplorer | Autonomous exploration of unmapped areas |
| ROSNav | ROS 2 navigation integration for path planning |
| RerunBridge | 3D visualization in browser |
| WebsocketVis | Command center at localhost:7779 |
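The VoxelGridMapper → CostMapper step can be illustrated with a simplified slope analysis over a 2.5D heightmap. The threshold and the gradient formulation here are assumptions for the sketch, not DimOS's actual terrain analysis:

```python
import numpy as np

def slope_costmap(height, cell_size=0.05, max_slope_deg=20.0):
    """Convert a 2.5D heightmap (meters) into a 2D costmap.

    Cells whose local slope exceeds max_slope_deg become lethal (cost 100);
    gentler slopes scale linearly from 0 to 99.
    """
    dz_dy, dz_dx = np.gradient(height, cell_size)        # height change per meter
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    cost = np.clip(slope / max_slope_deg * 99.0, 0, 99).astype(np.uint8)
    cost[slope > max_slope_deg] = 100                    # lethal obstacle
    return cost
```

Flat ground maps to zero cost; a sharp step produces lethal cells around its edge, which is what keeps the planner on traversable terrain.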
From the command center (localhost:7779):
- Click on the map to set navigation goals
- Toggle autonomous exploration
- Monitor robot pose, costmap, and planned path
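The exploration toggle drives the WavefrontFrontierExplorer. Its core idea, frontier detection, reduces to finding free cells that border unknown space; a minimal sketch (the wavefront variant typically grows these regions via BFS from the robot pose, which is omitted here):

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, -1

def find_frontiers(grid):
    """Return frontier cells: free cells with at least one unknown 4-neighbor."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers
```

An explorer picks one frontier (e.g. the nearest reachable one), navigates there, remaps, and repeats until no frontiers remain.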
Natural language control with an LLM agent that understands physical space and can command arm gestures:
```bash
export OPENAI_API_KEY=<YOUR_KEY>
export ROBOT_IP=<YOUR_G1_IP>
dimos run unitree-g1-agentic
```

Then use the human CLI:

```bash
humancli
> wave hello
> explore the room
> give me a high five
```

The agent subscribes to camera and spatial memory streams and has access to G1-specific skills, including arm gestures and movement modes.
The G1 agent can perform expressive arm gestures:
| Gesture | Description |
|---|---|
| Handshake | Perform a handshake gesture with the right hand |
| HighFive | Give a high five with the right hand |
| Hug | Perform a hugging gesture with both arms |
| HighWave | Wave with the hand raised high |
| Clap | Clap hands together |
| FaceWave | Wave near the face level |
| LeftKiss | Blow a kiss with the left hand |
| ArmHeart | Make a heart shape with both arms overhead |
| RightHeart | Make a heart gesture with the right hand |
| HandsUp | Raise both hands up in the air |
| RightHandUp | Raise only the right hand up |
| Reject | Make a rejection or "no" gesture |
| CancelAction | Cancel any current arm action and return to neutral |
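A toy sketch of how such gestures could be exposed as agent-callable skills. The registry pattern and function bodies here are purely illustrative, not the DimOS skill API; only the skill names come from the table above:

```python
# Hypothetical skill registry: the agent resolves an LLM tool call to a
# registered function by name. Real skills would command the robot's arms.
SKILLS = {}

def skill(fn):
    """Register a function as an agent-callable skill."""
    SKILLS[fn.__name__] = fn
    return fn

@skill
def HighFive():
    return "raising right hand for a high five"

@skill
def HighWave():
    return "waving with the hand raised high"

def dispatch(tool_name):
    """Resolve an LLM tool call to a skill and run it."""
    if tool_name not in SKILLS:
        raise KeyError(f"unknown skill: {tool_name}")
    return SKILLS[tool_name]()
```

This is why "give me a high five" works from the CLI: the agent maps the request onto a named skill and invokes it.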
It can also switch between full-body movement modes:

| Mode | Description |
|---|---|
| WalkMode | Normal walking |
| WalkControlWaist | Walking with waist control |
| RunMode | Running |
Direct keyboard control via a pygame-based joystick:
```bash
export ROBOT_IP=<YOUR_G1_IP>
dimos run unitree-g1-joystick
```

| Blueprint | Description |
|---|---|
| unitree-g1-basic | Connection + ROS navigation + visualization |
| unitree-g1-basic-sim | Simulation with A* navigation |
| unitree-g1 | Navigation + perception + spatial memory |
| unitree-g1-sim | Simulation with perception + spatial memory |
| unitree-g1-agentic | Full stack with LLM agent and G1 skills |
| unitree-g1-agentic-sim | Agentic stack in simulation |
| unitree-g1-full | Agentic + SHM image transport + keyboard teleop |
| unitree-g1-joystick | Navigation + keyboard teleop |
| unitree-g1-detection | Navigation + YOLO person detection and tracking |
| unitree-g1-shm | Navigation + perception with shared memory image transport |
| unitree-g1-primitive-no-nav | Sensors + visualization only (no navigation; base for custom blueprints) |
Blueprints compose incrementally:
```
primitive (sensors + vis)
├── basic (+ connection + navigation)
│   ├── basic-sim (sim connection + A* nav)
│   ├── joystick (+ keyboard teleop)
│   └── detection (+ YOLO person tracking)
├── perceptive (+ spatial memory + object tracking)
│   ├── sim (sim variant)
│   └── shm (+ shared memory transport)
└── agentic (+ LLM agent + G1 skills)
    ├── agentic-sim (sim variant)
    └── full (+ SHM + keyboard teleop)
```
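The incremental composition above can be pictured as lists of module names, where each variant extends a base. The variable names and the list-of-strings representation are illustrative only, not DimOS's blueprint mechanism:

```python
# Each blueprint = a base blueprint plus extra modules (names from the
# module table; KeyboardTeleop and YoloPersonTracker are stand-in names).
primitive = ["Webcam", "VoxelGridMapper", "CostMapper", "RerunBridge", "WebsocketVis"]
basic     = primitive + ["G1Connection", "ROSNav"]
joystick  = basic + ["KeyboardTeleop"]
detection = basic + ["YoloPersonTracker"]
```

Custom blueprints follow the same pattern: start from primitive (or basic) and add only the modules you need.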
- Navigation Stack — path planning and autonomous exploration
- Visualization — Rerun, Foxglove, performance tuning
- Data Streams — RxPY streams, backpressure, quality filtering
- Transports — LCM, SHM, DDS
- Blueprints — composing modules
- Agents — LLM agent framework