NVIDIA Details Open-Source GR00T N1 Foundation Model and Hover Controller for Humanoids

TL;DR: At miniCON, NVIDIA’s Yuke Zhu unveiled Project GR00T, the company’s bold push to build open, generalist AI models for humanoid robots. Centered around the open-source GR00T N1 model and Hover whole-body controller, the platform combines vision, language, and real-time control. Inspired by the success of LLMs, NVIDIA aims to create “specialized generalists” — robots capable of learning and adapting across domains — and to provide the tools, not the bots, to power the future of embodied AI.

Image credit: Yuke Zhu, NVIDIA

NVIDIA Details Project GR00T and Open Models for Humanoid Robots

NVIDIA is pushing forward with its ambition to create foundational AI models for general-purpose humanoid robots, aiming to replicate the transformative impact large language models (LLMs) have had on natural language processing. At the recent miniCON Open Source AI conference, Yuke Zhu, Principal Research Scientist at NVIDIA Research, outlined the company's strategy and introduced key components of its robotics platform, including the open-source GR00T N1 model and the Hover whole-body controller.

From Specialist Systems to Generalist Foundations

Zhu framed NVIDIA's efforts, collectively known as Project GR00T (Generalist Robot 00 Technology), as a move beyond traditional robotics research, which has historically focused on creating highly specialized systems for narrow tasks (e.g., solving a Rubik's Cube, navigating specific terrain). Inspired by the success of LLMs, Project GR00T aims to build generalist foundation models trained on vast datasets.

Image credit: Yuke Zhu, NVIDIA

The idea, Zhu explained, is not to create a 'jack of all trades, master of none,' but rather a 'specialized generalist' – a robot with core competencies capable of continuous learning and adaptation to new domains. This generalist model can then serve as a base for developing more capable specialist applications, mirroring how LLMs underpin various NLP tools.

Simulation-First and the "Three Computer Problem"

NVIDIA leverages a simulation-first approach, emphasizing the critical role of synthetic data generated within its Omniverse and Isaac Sim platforms. This involves what Zhu termed the "three computer problem":

  1. OVX: Systems for generating high-quality synthetic data in simulation.
  2. DGX: Supercomputing infrastructure for large-scale model training.
  3. AGX: Edge computing platforms for deploying trained models onto robots.

Image credit: Yuke Zhu, NVIDIA

This pipeline underpins the development and deployment of NVIDIA's robotic foundation models.

GR00T N1: An Open Humanoid Foundation Model

Zhu introduced GR00T N1, described as the world's first open humanoid foundation model. It takes multimodal inputs (vision and natural language instructions) and outputs robot actions.

Image credit: Yuke Zhu, NVIDIA

The model employs a dual-system architecture inspired by human cognition ('Thinking, Fast and Slow'):

  • System 2 (Slow Brain): A Vision Language Model (VLM) processes images and language instructions, handling high-level reasoning, object identification, and spatial understanding.
  • System 1 (Fast Brain): A diffusion transformer model takes input from System 2 and robot state encodings to generate low-level, closed-loop control commands at 120 Hz.
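To make the division of labor concrete, here is a minimal, hypothetical sketch of such a dual-system control loop. Everything below is illustrative: the function names, the replanning interval, and the stub "models" are assumptions for exposition, not NVIDIA's released GR00T N1 API. Only the 120 Hz closed-loop rate comes from the talk.

```python
CONTROL_HZ = 120     # System 1 runs closed-loop at 120 Hz (stated in the talk)
REPLAN_EVERY = 60    # hypothetical: System 2 replans at a much lower rate

def system2_plan(image, instruction):
    """Stub for the slow brain: a VLM mapping observation + language to a plan."""
    # Hypothetical placeholder; a real VLM would return a latent plan embedding.
    return {"goal": instruction}

def system1_act(plan, robot_state):
    """Stub for the fast brain: plan + robot state -> low-level action."""
    # Hypothetical placeholder: step each joint a fraction toward a fixed target,
    # standing in for a diffusion-transformer policy's denoised action output.
    target = [1.0] * len(robot_state)
    return [0.1 * (t - s) for t, s in zip(target, robot_state)]

def control_loop(image, instruction, robot_state, steps=240):
    """Interleave slow replanning (System 2) with fast control ticks (System 1)."""
    plan = None
    trajectory = []
    for t in range(steps):
        if t % REPLAN_EVERY == 0:               # slow brain: occasional replanning
            plan = system2_plan(image, instruction)
        action = system1_act(plan, robot_state)  # fast brain: every control tick
        robot_state = [s + a for s, a in zip(robot_state, action)]
        trajectory.append(list(robot_state))
    return trajectory

traj = control_loop(image=None, instruction="pick up the apple",
                    robot_state=[0.0, 0.0], steps=240)
```

The key design point this sketch captures is the frequency split: the expensive vision-language reasoning runs only occasionally, while the lightweight policy closes the loop at every tick.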

Demonstrations showed GR00T N1 enabling robots to perform tasks like picking unseen fruits, collaborating with humans (handing over objects), and coordinating between two robots in a simulated factory setting – all autonomously and in real-time. Notably, the model exhibits cross-embodiment capabilities, having been deployed on hardware from partners like 1X Technologies for household chores.

NVIDIA has open-sourced GR00T N1, making it accessible via GitHub. Zhu highlighted that researchers can experiment with it even using low-cost (<$100) robot arms.

Image credit: Yuke Zhu, NVIDIA

Hover: A Unified Whole-Body Controller

Complementing the foundation model, NVIDIA also presented Hover, a neural whole-body controller designed to manage the diverse physical capabilities required by humanoids (e.g., locomotion, manipulation). Trained using reinforcement learning in Isaac Lab on retargeted human motion capture data, Hover aims to provide a single policy that supports multiple control modes. Isaac Lab, which NVIDIA describes as "an open-source, unified framework for robot learning" built around high-fidelity physics simulation, is itself available on GitHub.

Interestingly, Zhu noted that this generalist controller outperformed previous specialist controllers designed for individual modes. Hover is also available on GitHub.

Enabling the Ecosystem

Throughout the presentation, Zhu emphasized that NVIDIA does not manufacture its own humanoid robots. Instead, the company focuses on building the enabling tools, computing platforms, and foundation models for its partners in the robotics ecosystem. The open-sourcing of GR00T N1 and Hover reflects this strategy, aiming to accelerate development across the field.

NVIDIA's push into generalist models and open platforms signals a significant effort to establish the foundational infrastructure for the next generation of humanoid robots.

Yuke Zhu, Principal Research Scientist at NVIDIA Research

Watch the talk here: