AI companies are racing to master the art of video generation. Over the last few months, several players in the space, including Stability AI and Pika Labs, have released models capable of producing various types of video from text and image prompts. Building on that work, Microsoft AI has released a model that aims to deliver more granular control over the production of a video.
Dubbed DragNUWA, the project supplements the established approaches of text- and image-based prompting with trajectory-based generation. Users can manipulate objects or entire video frames along specific trajectories, which offers an easy way to achieve highly controllable video generation across semantic, spatial and temporal aspects while maintaining high-quality output.
Microsoft has open-sourced the model weights and demo for the project, allowing the community to try it out. However, it is important to note that this is still a research effort and remains far from perfect.