Action Recognition and SynthesisThe overarching goal of this work is to recognize human actions in video with enough precision to drive a synthesis algorithm for video of human actions. Underlying this goal are difficult problems such as building priors on human actions, smooth interpolation between video of articulated figures, video matting, and figure tracking with occlusion and clutter. The two papers described below cover some of this work. Recognizing Action at a Distance presents a descriptor for motion similarity that is used in a simple nearest neighbor framework for action recognition. In addition the paper presents simple techniques for using this machinery to produce novel video by splicing together sections of video with desired motions. Video Based Motion Synthesis by Splicing and Morphing goes into more detail about how to find good sections of video to splice together, and how to smooth the transitions between splices when figures are at high enough resolution to resolve limbs. Recognizing Action at a Distance [pdf] [ps]Alexei A. Efros, Alexander C. Berg, Gregory P. Mori, Jitendra Malik International Conference on Computer Vision (ICCV) 2003, Nice, pp 726-733.
Videos: Best Matching Motions Football Action Classification Tennis Action Classification Do as I do Tennis Do as I say Tennis 1 Do as I say Tennis 2 Best match to video of a single player, no smoothing. Greg in the World Cup Video Based Motion Synthesis by Splicing and Morphing [pdf] [ps] Gregory P. Mori, Alexander C. Berg, Alexei A. Efros, Ashley Eden, Jitendra Malik U.C. Berkeley Technical Report UCB/CSD-4-1337 In this paper we present a method for synthesizing videos of human motion by splicing together clips of input video. There are two main contributions in this work. The first is developing a method for kinematically correct morphing of images of human figure, which is used to splice together the clips in input video in a manner that produces smooth output sequences. The second contribution of this work is the application of activity recognition algorithms to our input data in order to automatically extract action labels, which allow us to control the synthesized video by issuing high-level action commands. We present results of synthetic sequences on two domains: ballet and tennis. |