TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

¹The University of Tokyo, ²CyberAgent AI Lab

Show-Oliver
60s Reference Video + Unseen Target Speech
Reference Image
Co-Speech Gesture Video Generated from TANGO