
EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer

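Current video generation models pursue appearance fidelity at the expense of learning basic kinematic principles, so they fail to generate complex human motion. EchoMotion makes three contributions: 1) it extends DiT to a dual-branch architecture that supports different modalities; 2) MVS-RoPE; 3) a Motion-Video Two-Stage Training Strategy. The paper also introduces a dataset, HuMoVe, with 80,000 video-motion pairs.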
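To make the dual-branch idea concrete, here is a minimal NumPy sketch of one common way to let two modalities share a transformer block: each modality keeps its own projection weights, and attention runs jointly over the concatenated token sequence. This is an assumption for illustration only, since the note does not describe EchoMotion's actual layer design. The names `Branch` and `dual_branch_attention` are hypothetical, and MVS-RoPE and the two-stage training are not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class Branch:
    """Per-modality projection weights (hypothetical, randomly initialized)."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        self.wq = rng.standard_normal((dim, dim)) * scale
        self.wk = rng.standard_normal((dim, dim)) * scale
        self.wv = rng.standard_normal((dim, dim)) * scale
        self.wo = rng.standard_normal((dim, dim)) * scale


def dual_branch_attention(video_tokens, motion_tokens, video_branch, motion_branch):
    """Single-head joint attention over video and motion tokens.

    Assumption: each modality is projected with its own weights, then all
    tokens attend to each other in one shared attention operation.
    """
    n_video = video_tokens.shape[0]

    # Modality-specific projections, concatenated along the token axis.
    q = np.concatenate([video_tokens @ video_branch.wq, motion_tokens @ motion_branch.wq])
    k = np.concatenate([video_tokens @ video_branch.wk, motion_tokens @ motion_branch.wk])
    v = np.concatenate([video_tokens @ video_branch.wv, motion_tokens @ motion_branch.wv])

    # Shared attention: every token sees both modalities.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    mixed = attn @ v

    # Split back per modality, apply per-branch output projection and residual.
    video_out = video_tokens + mixed[:n_video] @ video_branch.wo
    motion_out = motion_tokens + mixed[n_video:] @ motion_branch.wo
    return video_out, motion_out


rng = np.random.default_rng(0)
dim = 8
video = rng.standard_normal((6, dim))   # 6 video tokens
motion = rng.standard_normal((4, dim))  # 4 motion tokens
video_out, motion_out = dual_branch_attention(
    video, motion, Branch(dim, rng), Branch(dim, rng)
)
print(video_out.shape, motion_out.shape)
```

Each branch returns the same number of tokens it received, so the two modalities can be denoised side by side while still exchanging information through the shared attention step.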
