ACT β€” ALOHA Single-Arm (Left) β€” 13.4k steps

Action Chunking Transformer (ACT) policy for a single-arm (LEFT) Trossen ALOHA manipulation task β€” autonomous O2 mask placement on a human surrogate (MEDEVAC-inspired).

This is the initial 13.4k-step baseline run (S001). For the production-shipped 40k retrain, see JHeisler/aloha_solo_left_4_6_26_act_left_40k.

Training Config

Field Value
Architecture ACT (ResNet18 backbone + 4-layer Transformer encoder + VAE chunking head)
Dataset JHeisler/aloha_solo_left_4_6_26 β€” 50 episodes, 29,785 samples, 30 fps
State / action dim 9 / 9
Cameras cam_high, cam_left_wrist (3Γ—480Γ—640 each)
Steps 13,400
Batch size 48
Learning rate 6e-5 (linear warmup 500 β†’ cosine)
Total samples seen 640K (21 epochs)
AMP enabled
torch.compile enabled
Final loss 0.029
Final grad norm 0.80
Wall clock ~2h 3min on RTX A4500
LeRobot pin 96c7052777aca85d4e55dfba8f81586103ba8f61

Hardware

Trained locally on NVIDIA RTX A4500 (20 GB VRAM). Pipeline ported from the original Colab notebook (Tesla A100, batch=8, 80,000 steps); local-A4500 config preserves total samples seen at 2.5Γ— the wall-clock speedup.

Project Lineage

This is part of a 4-policy comparison study on the same dataset:

Workstream Model Steps Samples HF
S001 ACT 13,400 640K this repo
S002 Hybrid ACT+Diffusion 13,400 321K act_diffusion
S003 ACT (shipped) 40,000 1.92M act_left_40k
S004 Hybrid ACT+Diffusion 40,000 1.12M act_diffusion_40k

Usage

from lerobot.common.policies.act.modeling_act import ACTPolicy
policy = ACTPolicy.from_pretrained("JHeisler/aloha_solo_left_4_6_26_act_left")

Citation / Course

EN.525.681 school project β€” JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.

Code reference: HuggingFace LeRobot at commit 96c7052.

Downloads last month
61
Video Preview
loading

Dataset used to train JHeisler/aloha_solo_left_4_6_26_act_left