59.7 TFLOPS
Tom-Neverwinter
41
1
8
5 followers · 22 following
Tom-Neverwinter
AI & ML interests
Making improvements to help the world.
Recent Activity
reacted to ginigen-ai's post about 5 hours ago
🍳 The RoboCasa Kitchen Leaderboard

What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action), and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks (picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more) inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score (success rate, SR) is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so results aren't down to luck.

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison. This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable.

It's split into three tables:
Kitchen 24-task (matched): head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
Other protocols: self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
GR1-Tabletop: a different, humanoid-based variant suite, separated to avoid confusion.

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

https://huggingface.co/spaces/ginigen-ai/robocasa-kitchen-leaderboard
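The success-rate definition in the post can be illustrated with a minimal sketch. The post does not say exactly how seeds and tasks are aggregated, so this assumes a per-task success fraction is averaged over tasks within each seed, then averaged across seeds. The function name `success_rate`, the input layout, and the sample numbers are all hypothetical, not taken from the leaderboard.

```python
from statistics import mean


def success_rate(results):
    """Average success rate over seeds (assumed aggregation, see lead-in).

    results maps seed -> {task_name: fraction of episodes succeeded, in [0, 1]}.
    """
    # Mean over tasks within each seed, then mean over seeds.
    per_seed = [mean(task_scores.values()) for task_scores in results.values()]
    return mean(per_seed)


# Hypothetical numbers for two seeds and two tasks (the real suite has 24 tasks).
example = {
    0: {"open_drawer": 1.0, "turn_on_faucet": 0.5},
    1: {"open_drawer": 0.5, "turn_on_faucet": 0.0},
}
print(success_rate(example))
```

Because each seed here covers the same set of tasks, averaging per seed first gives the same result as pooling all task scores. The two would differ only if seeds covered different numbers of tasks.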
reacted to ginigen-ai's post with 🔥 about 5 hours ago
new activity about 1 month ago
lunahr/Marlin-2B-ungated: ungated?
Organizations
None yet
models
4
Tom-Neverwinter/ew-lora • Updated Aug 16, 2024 • 3
Tom-Neverwinter/ts-lora • Updated Aug 16, 2024 • 4
Tom-Neverwinter/cr-lora • Updated Aug 16, 2024 • 1
Tom-Neverwinter/sw-lora • Updated Aug 16, 2024
datasets
0
None public yet