[CVPR2024] OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Video Statistics and Information

Captions

We present OMG, a novel framework that synthesizes high-quality motions from open-vocabulary text prompts. We carefully tailor the pre-train-then-fine-tune paradigm to text-to-motion generation. First, we leverage large-scale unlabeled motion data to pre-train an unconditional diffusion model with up to 1 billion parameters. Then we freeze the pre-trained model and adopt a conditional fine-tuning scheme called motion ControlNet to condition it on the text embeddings of the CLIP text encoder. During inference, the pre-trained unconditional denoiser and the fine-tuned conditional denoiser are combined with classifier-free guidance, generating realistic motions from zero-shot text inputs.

Here we show the results generated by our OMG model given various text prompts. Our method enables fine-grained control of complicated and abstract motion trait descriptions. Here we show more results. Our model effectively handles either a single phrase or longer natural sentences, and even generalizes to zero-shot open-vocabulary text prompts.

We compare our method with previous state-of-the-art methods. Our method can generate high-quality human motions that better align with text prompts.

We further conduct several ablation studies. First, we compare the four variants of our OMG model with different model sizes. The largest, OMG giant, with 1 billion parameters, significantly outperforms the others. Then we study the effect of our model designs. The worse performance of the ablated variants highlights our technical contributions.

Thanks for watching.
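
The inference step mentioned in the captions, where an unconditional and a conditional denoiser are combined with classifier-free guidance, can be illustrated with a small sketch. This is the standard classifier-free guidance formula applied to plain Python lists, not the authors' implementation; the function name, the guidance scale, and the toy noise vectors are all assumptions made for illustration.

```python
from typing import List, Sequence


def classifier_free_guidance(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend two noise predictions with standard classifier-free guidance.

    eps_uncond: noise predicted without text (hypothetical unconditional denoiser output)
    eps_cond:   noise predicted with text (hypothetical conditional denoiser output)
    guidance_scale: assumed hyperparameter; 0 gives the unconditional
                    prediction, 1 gives the conditional one, and values
                    above 1 push further toward the text condition.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    # eps = eps_uncond + w * (eps_cond - eps_uncond)
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Toy stand-ins for the two denoiser outputs at one diffusion step.
    eps_u = [0.0, 1.0, 2.0]
    eps_c = [1.0, 1.0, 0.0]
    print(classifier_free_guidance(eps_u, eps_c, guidance_scale=2.0))
```

In a real sampler this blend would be computed at every denoising step and the guided noise estimate passed to the diffusion update. The scale of 2.0 here is arbitrary and is not a value stated in the video.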

Info

Channel: ShanghaiTech Digital Human

Views: 1,057

Id: 1M4c2eZFTk0

Length: 4min 19sec (259 seconds)

Published: Fri Dec 01 2023