ACT — ALOHA Single-Arm (Left) — 13.4k steps

Action Chunking Transformer (ACT) policy for a single-arm (LEFT) Trossen ALOHA manipulation task — autonomous O2 mask placement on a human surrogate (MEDEVAC-inspired).

This is the initial 13.4k-step baseline run (S001). For the production-shipped 40k retrain, see JHeisler/aloha_solo_left_4_6_26_act_left_40k.

Training Config

Field	Value
Architecture	ACT (ResNet18 backbone + 4-layer Transformer encoder + VAE chunking head)
Dataset	JHeisler/aloha_solo_left_4_6_26 — 50 episodes, 29,785 samples, 30 fps
State / action dim	9 / 9
Cameras	`cam_high`, `cam_left_wrist` (3×480×640 each)
Steps	13,400
Batch size	48
Learning rate	6e-5 (linear warmup 500 → cosine)
Total samples seen	~640K (~21 epochs)
AMP	enabled
torch.compile	enabled
Final loss	0.029
Final grad norm	0.80
Wall clock	~2h 3min on RTX A4500
LeRobot pin	`96c7052777aca85d4e55dfba8f81586103ba8f61`
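
The learning-rate row above (6e-5 peak, 500-step linear warmup, then cosine decay over 13,400 steps) can be sketched as a plain function of the step index. This is an illustrative sketch, not the scheduler used by the training script: the function name `lr_at`, the warmup starting from zero, and the decay reaching zero at the final step are all assumptions that the table does not confirm.

```python
import math

# Values taken from the Training Config table.
PEAK_LR = 6e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 13_400


def lr_at(step: int,
          peak_lr: float = PEAK_LR,
          warmup_steps: int = WARMUP_STEPS,
          total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumptions: warmup ramps linearly from 0 to peak_lr, and the cosine
    phase decays from peak_lr to 0 at total_steps.
    """
    if step < warmup_steps:
        # Linear warmup: 0 at step 0, peak_lr at step == warmup_steps.
        return peak_lr * step / warmup_steps
    # Fraction of the cosine phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 250, 500, 6_950, 13_400):
        print(f"step {s:>6}: lr = {lr_at(s):.3e}")
```

If the actual run used a nonzero LR floor or a different warmup shape, only the two return expressions would need to change.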

Hardware

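Trained locally on an NVIDIA RTX A4500 (20 GB VRAM). The pipeline was ported from the original Colab notebook (NVIDIA A100, batch size 8, 80,000 steps). The local A4500 config keeps the total number of samples seen roughly constant (8 × 80,000 = 640,000 on Colab vs. 48 × 13,400 = 643,200 locally) while running with about a 2.5× wall-clock speedup.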
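
The sample-budget equivalence between the two configs is simple arithmetic, shown here as a short check. The batch sizes, step counts, and dataset size come from this card; the helper name `samples_seen` is hypothetical.

```python
# Dataset size from the Training Config table.
DATASET_SAMPLES = 29_785


def samples_seen(batch_size: int, steps: int) -> int:
    """Total training samples consumed: one batch per optimizer step."""
    return batch_size * steps


colab = samples_seen(batch_size=8, steps=80_000)    # original Colab config
local = samples_seen(batch_size=48, steps=13_400)   # local A4500 config
epochs = local / DATASET_SAMPLES                    # passes over the dataset

if __name__ == "__main__":
    print(f"Colab samples seen: {colab:,}")
    print(f"Local samples seen: {local:,}")
    print(f"Local / Colab ratio: {local / colab:.3f}")
    print(f"Local epochs: {epochs:.1f}")
```

The two budgets differ by about 0.5%, and the local run corresponds to roughly 21.6 passes over the 29,785-sample dataset.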

Project Lineage

This is part of a 4-policy comparison study on the same dataset:

Workstream	Model	Steps	Samples	HF
S001	ACT	13,400	640K	this repo
S002	Hybrid ACT+Diffusion	13,400	321K	act_diffusion
S003	ACT (shipped)	40,000	1.92M	act_left_40k
S004	Hybrid ACT+Diffusion	40,000	1.12M	act_diffusion_40k

Usage

from lerobot.common.policies.act.modeling_act import ACTPolicy
policy = ACTPolicy.from_pretrained("JHeisler/aloha_solo_left_4_6_26_act_left")

Citation / Course

EN.525.681 school project — JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.

Code reference: HuggingFace LeRobot at commit 96c7052.
