Alex Berg Home : Human Pose Estimation

Human Pose Estimation from Images

The goal of this work is to find the pose of humans in still images. This is a difficult problem because people vary widely in appearance due to clothing, lighting, and, of course, pose. Dealing with this variation while exploiting high-level knowledge about what people do and how they appear is an unsolved problem in computer vision.

So far we have explored two main directions: techniques based on exemplars, and techniques based on a bottom-up approach.

*Skeletons taken from the nearest matching exemplar using geometric blur and warping for better alignment.*

The exemplar-based approach relies on modern computational power to store many examples of the appearance of humans in various poses. It also requires a robust similarity measure (e.g., descriptors based on geometric blur) to compare novel images with the stored exemplars. The skeleton associated with the best-matching exemplar is then used for the novel image, and applying an aligning transformation between the exemplar and the novel image to the skeleton improves accuracy. Video or motion information can be used when available. This approach has the advantage that arbitrarily complex priors on pose and appearance can be handled simply by sampling the possibilities. The disadvantage is that the number of exemplars required may grow beyond a manageable level as the range of possible appearances increases.
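A minimal Python sketch of exemplar lookup and skeleton transfer, under simplifying assumptions. Each exemplar is a hypothetical record holding a fixed-length descriptor vector (standing in for a geometric blur descriptor), a set of sample points whose correspondences to points in the novel image are assumed to be known already, and an annotated skeleton. Matching is plain Euclidean nearest neighbor, and the aligning transformation is a least-squares scale-plus-translation fit. That fit is a much simpler substitute for the warping described above, not the method used in the papers.

```python
import math
from typing import List, Sequence, Tuple, TypedDict

Point = Tuple[float, float]


class Exemplar(TypedDict):
    descriptor: List[float]  # appearance descriptor (hypothetical stand-in for geometric blur)
    points: List[Point]      # sample points; assumed to correspond one-to-one with query points
    skeleton: List[Point]    # annotated joint positions in exemplar coordinates


def nearest_exemplar(query: Sequence[float], exemplars: List[Exemplar]) -> int:
    """Return the index of the exemplar whose descriptor is closest in Euclidean distance."""
    return min(range(len(exemplars)),
               key=lambda i: math.dist(query, exemplars[i]["descriptor"]))


def fit_scale_translation(src: List[Point], dst: List[Point]) -> Tuple[float, Point]:
    """Least-squares s, t minimizing sum_i |s * src_i + t - dst_i|^2 (no rotation)."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Optimal scale: covariance of centered points divided by spread of the source.
    num = sum((p[0] - sx) * (q[0] - dx) + (p[1] - sy) * (q[1] - dy)
              for p, q in zip(src, dst))
    den = sum((p[0] - sx) ** 2 + (p[1] - sy) ** 2 for p in src)
    s = num / den if den > 0 else 1.0
    # Optimal translation maps the scaled source centroid onto the target centroid.
    return s, (dx - s * sx, dy - s * sy)


def transfer_skeleton(query_desc: Sequence[float], query_points: List[Point],
                      exemplars: List[Exemplar]) -> Tuple[int, List[Point]]:
    """Copy the best exemplar's skeleton onto the query image after alignment."""
    best = nearest_exemplar(query_desc, exemplars)
    ex = exemplars[best]
    s, (tx, ty) = fit_scale_translation(ex["points"], query_points)
    joints = [(s * x + tx, s * y + ty) for x, y in ex["skeleton"]]
    return best, joints
```

Restricting the alignment to scale and translation keeps the fit in closed form; a fuller implementation might fit an affine or non-rigid warp instead, at the cost of needing more reliable correspondences.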
*Skeleton taken from the nearest matching exemplar using an optical-flow-based descriptor (middle). Matching to a rendered 3D figure (right).*

*Finding the correspondence from parts of a model to locations in an image. The locations are selected using a bottom-up approach starting with edge detection.*

Example pose estimations.

Our bottom-up approach uses low-level, edge-based detectors to find parts of an image that might be limbs. These candidates are then assembled into an estimate of the whole-body configuration using pairwise constraints and approximate integer quadratic programming (IQP), as in our work on shape matching and object recognition. The IQP framework allows more general constraints between body parts than traditional tree-structured models can express. The bottom-up approach can provide better localization than exemplar-based approaches and can handle a wide range of poses with lower computational and storage requirements, but it suffers when the low-level limb detectors fail. (More details are on Xiaofeng Ren's page.)
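To make the assembly step concrete, here is a minimal Python sketch of the objective, assuming each part already has a short list of candidate detections with a unary score, and that a hypothetical `pairwise` function scores every pair of candidates. It is not the approximate IQP solver from the paper: it substitutes brute-force enumeration, which is feasible only for a handful of parts and candidates. It does show that the score couples every pair of parts rather than only the edges of a tree.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# Hypothetical signature: pairwise(p, cp, q, cq) scores candidate cp for part p
# together with candidate cq for part q (higher means more compatible).
Pairwise = Callable[[int, int, int, int], float]


def assemble_body(unary: Sequence[Sequence[float]],
                  pairwise: Pairwise) -> Tuple[Tuple[int, ...], float]:
    """Choose one candidate per part to maximize
        sum_p unary[p][c_p] + sum_{p<q} pairwise(p, c_p, q, c_q).

    With x a 0/1 vector selecting exactly one candidate per part, this equals
    the integer quadratic program max x^T Q x, with unary scores on the diagonal
    of Q and pairwise scores off the diagonal. Every pair of parts may interact,
    so the constraint graph need not be a tree. Exhaustive search is exponential
    in the number of parts, so this is for toy problem sizes only.
    """
    n_parts = len(unary)
    best: Tuple[int, ...] = ()
    best_score = float("-inf")
    # Enumerate every joint assignment of one candidate index per part.
    for choice in product(*(range(len(u)) for u in unary)):
        score = sum(unary[p][c] for p, c in enumerate(choice))
        score += sum(pairwise(p, choice[p], q, choice[q])
                     for p in range(n_parts) for q in range(p + 1, n_parts))
        if score > best_score:
            best, best_score = choice, score
    return best, best_score
```

For example, with `unary = [[1.0, 0.5], [0.25, 2.0]]` and a pairwise term that adds 1.0 when two parts pick the same candidate index and subtracts 1.0 otherwise, part 0 on its own prefers candidate 0, but the joint optimum is `(1, 1)` with score 3.5. In a real system the pairwise term would instead encode geometric relations such as joint proximity or relative limb scale.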


We are working to build a hybrid of these two approaches and to apply it to finding the poses of humans in still images and video sequences.

Papers

Finding human poses in still frames using exemplars and warping for fine alignment: Video Based Motion Synthesis by Splicing and Morphing. Gregory P. Mori, Alexander C. Berg, Alexei A. Efros, Ashley Eden, Jitendra Malik. U.C. Berkeley Technical Report UCB/CSD-4-1337, June 2004.
Finding human poses in video using exemplars based on optical flow: Recognizing Action at a Distance. Alexei A. Efros, Alexander C. Berg, Gregory P. Mori, Jitendra Malik. International Conference on Computer Vision (ICCV) 2003, Nice, pp. 726-733.
Finding human poses in still frames from the bottom up using approximate integer quadratic programming: Recovering Human Body Configurations using Pairwise Constraints Between Parts. Xiaofeng Ren, Alexander C. Berg, Jitendra Malik. International Conference on Computer Vision (ICCV) 2005, Beijing, pp. 824-831.