<p align="center">
<img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanPortrait/refs/heads/main/assets/pics/logo.png" height=100>
</p>

<div align="center">
<h2><font color="red">HunyuanPortrait</font><br>Implicit Condition Control for Enhanced Portrait Animation</h2>

<a href='https://arxiv.org/abs/2503.18860'><img src='https://img.shields.io/badge/ArXiv-2503.18860-red'></a>
<a href='https://kkakkkka.github.io/HunyuanPortrait/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://github.com/Tencent-Hunyuan/HunyuanPortrait'><img src='https://img.shields.io/badge/GitHub-Code-blue'></a>
</div>

## Requirements
* An NVIDIA RTX 3090 GPU with CUDA support is required.
* The model has been tested on a single 24 GB GPU.
* Tested operating system: Linux

## Installation

```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanPortrait
cd HunyuanPortrait
pip3 install torch torchvision torchaudio
pip3 install -r requirements.txt
```

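After installing, you can verify that the GPU is visible before moving on. This is a minimal, optional sanity check; it assumes the NVIDIA driver is installed and that the `torch` build above shipped with CUDA support:

```bash
# Report the GPU model and total VRAM via the NVIDIA driver.
nvidia-smi --query-gpu=name,memory.total --format=csv
# Confirm that PyTorch can see the CUDA device (prints the device name, or False).
python3 -c "import torch; print(torch.cuda.is_available() and torch.cuda.get_device_name(0))"
```

If the second command prints `False`, reinstall PyTorch with a CUDA-enabled build before running inference.
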
## Download

All models are stored in `pretrained_weights` by default:
```bash
pip3 install "huggingface_hub[cli]"
cd pretrained_weights
huggingface-cli download --resume-download stabilityai/stable-video-diffusion-img2vid-xt --local-dir . --include "*.json"
wget -c https://huggingface.co/LeonJoe13/Sonic/resolve/main/yoloface_v5m.pt
wget -c https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/vae/diffusion_pytorch_model.fp16.safetensors -P vae
wget -c https://huggingface.co/FoivosPar/Arc2Face/resolve/da2f1e9aa3954dad093213acfc9ae75a68da6ffd/arcface.onnx
huggingface-cli download --resume-download tencent/HunyuanPortrait --local-dir hyportrait
```

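If the Hugging Face endpoint is slow or unreachable from your network, `huggingface_hub` honors the `HF_ENDPOINT` environment variable, so the same commands can be pointed at a mirror. The mirror URL below is only an example, not an endorsement:

```bash
# Route huggingface-cli downloads through an alternative endpoint (example mirror).
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download tencent/HunyuanPortrait --local-dir hyportrait
```
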
After downloading, the file structure is as follows:
```bash
.
├── arcface.onnx
├── hyportrait
│   ├── dino.pth
│   ├── expression.pth
│   ├── headpose.pth
│   ├── image_proj.pth
│   ├── motion_proj.pth
│   ├── pose_guider.pth
│   └── unet.pth
├── scheduler
│   └── scheduler_config.json
├── unet
│   └── config.json
├── vae
│   ├── config.json
│   └── diffusion_pytorch_model.fp16.safetensors
└── yoloface_v5m.pt
```

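Before running inference, a quick existence check can confirm nothing is missing. This is a minimal sketch; the file list simply mirrors the tree above:

```bash
# Report any expected weight file that is absent under pretrained_weights/.
cd pretrained_weights
for f in arcface.onnx yoloface_v5m.pt \
         hyportrait/{dino,expression,headpose,image_proj,motion_proj,pose_guider,unet}.pth \
         scheduler/scheduler_config.json \
         unet/config.json \
         vae/config.json vae/diffusion_pytorch_model.fp16.safetensors; do
  [ -f "$f" ] || echo "missing: $f"
done
cd ..
```
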
## Run

🔥 Bring your portrait to life by executing `bash demo.sh`:

```bash
video_path="your_video.mp4"
image_path="your_image.png"

python inference.py \
    --config config/hunyuan-portrait.yaml \
    --video_path "$video_path" \
    --image_path "$image_path"
```

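The same entry point extends naturally to batches. For example, to animate one reference image with every driving video in a folder (a hypothetical layout; only the flags shown above are used):

```bash
# Animate a single reference image with each driving video in ./driving_videos.
image_path="your_image.png"
for video_path in driving_videos/*.mp4; do
  python inference.py \
    --config config/hunyuan-portrait.yaml \
    --video_path "$video_path" \
    --image_path "$image_path"
done
```
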
## Framework
<img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanPortrait/refs/heads/main/assets/pics/pipeline.png">

## TL;DR
HunyuanPortrait is a diffusion-based framework for generating lifelike, temporally consistent portrait animations by decoupling identity and motion with pre-trained encoders. It encodes the driving video's expressions and poses into implicit control signals and injects them via attention-based adapters into a stabilized diffusion backbone, enabling detailed, style-flexible animation from a single reference image. The method outperforms existing approaches in controllability and coherence.

# 🖼 Gallery

Some results of portrait animation using HunyuanPortrait.

More results can be found on our [Project page](https://kkakkkka.github.io/HunyuanPortrait/).

## Cases

<table>
<tr>
<td width="25%">

https://github.com/user-attachments/assets/b234ab88-efd2-44dd-ae12-a160bdeab57e

</td>
<td width="25%">

https://github.com/user-attachments/assets/93631379-f3a1-4f5d-acd4-623a6287c39f

</td>
<td width="25%">

https://github.com/user-attachments/assets/95142e1c-b10f-4b88-9295-12df5090cc54

</td>
<td width="25%">

https://github.com/user-attachments/assets/bea095c7-9668-4cfd-a22d-36bf3689cd8a

</td>
</tr>
</table>

## Portrait Singing

https://github.com/user-attachments/assets/4b963f42-48b2-4190-8d8f-bbbe38f97ac6

## Portrait Acting

https://github.com/user-attachments/assets/48c8c412-7ff9-48e3-ac02-48d4c5a0633a

## Portrait Making Faces

https://github.com/user-attachments/assets/bdd4c1db-ed90-4a24-a3c6-3ea0b436c227

## Acknowledgements

The code is based on [SVD](https://github.com/Stability-AI/generative-models), [DINOv2](https://github.com/facebookresearch/dinov2), [Arc2Face](https://github.com/foivospar/Arc2Face), and [YoloFace](https://github.com/deepcam-cn/yolov5-face). We thank the authors for their open-sourced code and encourage users to cite their works when applicable.

Stable Video Diffusion is licensed under the Stable Video Diffusion Research License, Copyright (c) Stability AI Ltd. All Rights Reserved.

This codebase is intended solely for academic purposes.

# Citation
If you find this project helpful, please feel free to leave a star ⭐⭐⭐ and cite our paper:
```bibtex
@article{xu2025hunyuanportrait,
  title={HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation},
  author={Xu, Zunnan and Yu, Zhentao and Zhou, Zixiang and Zhou, Jun and Jin, Xiaoyu and Hong, Fa-Ting and Ji, Xiaozhong and Zhu, Junwei and Cai, Chengfei and Tang, Shiyu and Lin, Qin and Li, Xiu and Lu, Qinglin},
  journal={arXiv preprint arXiv:2503.18860},
  year={2025}
}
```