ICCV 2009 Tutorials

September 27th, Morning (9:30AM)

MAP Inference in Discrete Models, Part I
Lecturers:Pushmeet Kohli, M Pawan Kumar, Carsten Rother
Location:The Clock Tower, 1F, Centennial Hall
URL:http://research.microsoft.com/en-us/um/cambridge/projects/tutorial/
Abstract:Many problems in computer vision involve computing the most probable values of certain random variables. This problem, known as maximum a posteriori (MAP) estimation, has been widely studied in computer science, and the resulting algorithms have led to accurate and reliable solutions for many problems in computer vision and information engineering. This tutorial is aimed at researchers who wish to use and understand these algorithms. The tutorial will answer the following questions: How can known vision problems be formalized and solved using MAP inference in a random field? What are the different genres of MAP inference algorithms, and how do they work? Which algorithm is suitable for which problem? And, lastly, what are the recent developments and open questions in this field?
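As a concrete illustration of MAP inference (not taken from the tutorial materials), the sketch below computes the exact MAP labeling of a chain-structured random field by min-sum dynamic programming; the cost arrays and function name are hypothetical placeholders:

```python
import numpy as np

def chain_map(unary, pairwise):
    """Exact MAP labeling of a chain-structured random field via
    min-sum dynamic programming (the Viterbi recursion).

    unary:    (n, k) array, unary[i, l] = cost of label l at node i
    pairwise: (k, k) array, pairwise[a, b] = cost of labels (a, b)
              on any edge (i, i+1)
    Returns the cost-minimising label sequence as a list of ints.
    """
    n, k = unary.shape
    cost = unary[0].copy()              # best cost of each label at node 0
    back = np.zeros((n, k), dtype=int)  # back-pointers for the traceback
    for i in range(1, n):
        # total[a, b]: best cost with label a at node i-1 and b at node i
        total = cost[:, None] + pairwise + unary[i][None, :]
        back[i] = total.argmin(axis=0)
        cost = total.min(axis=0)
    labels = [int(cost.argmin())]
    for i in range(n - 1, 0, -1):       # walk the back-pointers
        labels.append(int(back[i, labels[-1]]))
    return labels[::-1]
```

For tree- and loop-structured fields the tutorial's algorithms (graph cuts, message passing) take over; on a chain this dynamic program already gives the exact optimum in O(n k^2).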
Variational Optical Flow Estimation
Lecturers:Thomas Brox, Andrés Bruhn
Location:The Clock Tower, 2F, International Conference Hall II
URL:http://www.mia.uni-saarland.de/bruhn/iccv2009/index.shtml
Abstract:Combining dense flow fields and subpixel accuracy within a sound optimization framework, variational methods are appealing for many computer vision tasks that require optical flow information. This course will provide a comprehensible introduction to the theoretical background of recent state-of-the-art methods and show the modelling options that allow participants to adapt the model to their specific needs. The course will also cover the numerics behind the models, which are decisive for implementation. The focus will be on efficiently solving the very large linear or nonlinear systems with multigrid methods, which make it possible to compute dense flow fields in real time.
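To make the variational idea concrete, here is a minimal Horn-Schunck solver, the classical baseline from which the methods in this course descend. It is an illustrative sketch, not the lecturers' implementation: it uses plain Jacobi iterations rather than multigrid, and wrap-around borders via np.roll are a simplifying assumption:

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=200):
    """Minimal Horn-Schunck optical flow: minimise the sum of a
    brightness-constancy data term and alpha^2 times a smoothness
    term, using fixed-point (Jacobi) iterations."""
    I1 = I1.astype(float); I2 = I2.astype(float)
    Im = 0.5 * (I1 + I2)
    Ix = 0.5 * (np.roll(Im, -1, axis=1) - np.roll(Im, 1, axis=1))
    Iy = 0.5 * (np.roll(Im, -1, axis=0) - np.roll(Im, 1, axis=0))
    It = I2 - I1
    u = np.zeros_like(I1); v = np.zeros_like(I1)
    denom = alpha**2 + Ix**2 + Iy**2
    for _ in range(n_iter):
        # four-neighbour averages realise the smoothness (Laplacian) term
        ub = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                     np.roll(u, 1, 1) + np.roll(u, -1, 1))
        vb = 0.25 * (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                     np.roll(v, 1, 1) + np.roll(v, -1, 1))
        t = (Ix * ub + Iy * vb + It) / denom
        u = ub - Ix * t
        v = vb - Iy * t
    return u, v
```

The very large linear system this iteration solves is exactly the kind of system the course's multigrid methods accelerate.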
Local Texture Descriptors in Computer Vision
Lecturers:Matti Pietikäinen, Guoying Zhao
Location:The Clock Tower, 2F, International Conference Hall III
URL:http://www.ee.oulu.fi/~gyzhao/ICCVTutorial/index.htm
Abstract:This tutorial shows how local texture descriptors can be used to solve various computer vision problems. Local binary patterns (LBP) are used as example descriptors. Part I overviews the milestones of texture research since the 1960s. Part II deals with LBP operators in the spatial domain, with applications in recognizing 3D textured surfaces, interest region description, face recognition, and background subtraction. Part III deals with local spatiotemporal operators. A simple spatiotemporal LBP-TOP operator is introduced and applied to dynamic texture recognition and segmentation, facial expression recognition, visual speech recognition, recognition of actions and gait, and video texture synthesis. Finally, Part IV concludes the tutorial and presents some challenges for future research.
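The basic 3x3 LBP operator is simple enough to sketch directly; the following is an illustrative NumPy version (not the lecturers' code), which thresholds each pixel's eight neighbours at the centre value and histograms the resulting 8-bit codes:

```python
import numpy as np

def lbp8(img):
    """Basic 3x3 local binary pattern: threshold the 8 neighbours of
    each pixel at the centre value and read the bits as an integer in
    [0, 255]. Border pixels are dropped, so the output is 2 pixels
    smaller per side."""
    img = img.astype(int)
    c = img[1:-1, 1:-1]
    # neighbour offsets, clockwise from top-left; bit i has weight 2**i
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy: img.shape[0] - 1 + dy,
                 1 + dx: img.shape[1] - 1 + dx]
        code += (nb >= c) << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes: the texture descriptor."""
    return np.bincount(lbp8(img).ravel(), minlength=256)
```

The histogram of codes over a region is the actual descriptor compared between textures; the spatiotemporal LBP-TOP operator of Part III applies the same idea on three orthogonal planes of a video volume.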
Computer Vision in the Analysis of Master Drawings and Paintings
Lecturer:David G. Stork
Location:Faculty of Engineering Bldg.#3, 2F, Room W2
URL:http://www.diatrope.com/stork/CourseDescriptions.html
Abstract:This course is an introduction to the application of computer vision and image analysis to problems in art and art history, specifically realist art. Realist paintings are a rich source of information, both about the scene portrayed and about the techniques the artist used to render that scene. Students will learn the principles of perspective and how to apply perspective analysis to paintings to infer vanishing points, locate perspective inconsistencies, and determine whether the artist used perspective constructions or tools. Students will learn how to infer the number, color, and position of light sources based on the position, color, and blur of cast shadows and highlights along occluding boundaries. Students will learn how to estimate the sizes of depicted objects based on perspective and fiducial or reference objects or relationships. Students will learn how to estimate the "camera parameters" of the artist (or imaging system), such as the effective magnification, focal length, and in some cases aberrations. Some of these methods require no more than a ruler and pencil, others require commercial software (e.g., Adobe Illustrator), others were adapted from their use in the forensic analysis of digital photographs and require powerful commercial image processing packages (including ones based on C++, Matlab, Mathematica), and yet others require researchers to write special code.
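The vanishing-point inference mentioned above can be sketched in a few lines. As an illustration (not course material), the snippet below intersects two imaged parallel lines using homogeneous coordinates, where the cross product of two points gives the line through them and the cross product of two lines gives their intersection:

```python
import numpy as np

def vanishing_point(seg1, seg2):
    """Vanishing point as the intersection of two imaged parallel
    lines. Each segment is ((x1, y1), (x2, y2)) in image coordinates.
    Returns (x, y), or None if the lines are parallel in the image."""
    def line(seg):
        (x1, y1), (x2, y2) = seg
        # cross product of two homogeneous points = line through them
        return np.cross([x1, y1, 1.0], [x2, y2, 1.0])
    # cross product of two lines = their homogeneous intersection
    p = np.cross(line(seg1), line(seg2))
    if abs(p[2]) < 1e-12:
        return None
    return p[0] / p[2], p[1] / p[2]
```

In practice one would measure several segments along painted floor tiles or beams and check whether their pairwise intersections cluster at a single point, which is what distinguishes a rigorous perspective construction from freehand work.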

September 27th, Afternoon (2:00PM)

MAP Inference in Discrete Models, Part II
Lecturers:Pushmeet Kohli, M Pawan Kumar, Carsten Rother
Location:The Clock Tower, 1F, Centennial Hall
URL:http://research.microsoft.com/en-us/um/cambridge/projects/tutorial/
Abstract:Continuation of Part I (see the abstract above).
Human-centered Vision Systems
Lecturers:Hamid Aghajan, Nicu Sebe
Location:Faculty of Engineering Bldg.#3, 2F, Room W2
URL:http://www.science.uva.nl/~nicu/iccv09-tutorial.html
Abstract:We take a holistic approach to the problem of human-centered vision systems. We aim to identify opportunities for addressing novel applications and the potential for fruitful future research directions. In particular, we introduce key concepts and discuss technical approaches and open issues in three areas: multimodal interaction, i.e., visual (body, gaze, gesture) and audio (emotion) analysis; smart environments; and distributed and collaborative fusion of visual information. The tutorial sets forth application design examples in which a user-centric methodology is adopted across the different stages, from feature and pose estimation in early vision to user behavior modeling in high-level reasoning. The role of querying the user for feedback will be discussed with examples from smart home applications. The course will motivate the use of multiple sensors in the environment, as well as contextual information, for effective data and decision fusion, and will focus on user interaction techniques formulated from the perspective of key human factors such as adaptation to user preferences and behavior models. Several applications based on the notion of user-centric design will be introduced and discussed.
Modeling Natural Image Statistics for Computer Vision
Lecturers:Siwei Lyu, Stefan Roth
Location:Faculty of Engineering Bldg.#3, 1F, Room N1
URL:http://www.gris.informatik.tu-darmstadt.de/teaching/iccv2009/index.en.htm
Abstract:As David Marr put it, vision is a process that produces from images of the external world a description that is useful to the viewer. But, as the basic input to any computer vision or biological vision system, the total number of all possible images is enormous. As a simple example, there are about 10^1,000 different 8-bit gray-scale images of a size as small as 20 by 20 pixels, while the current estimated total number of atoms in the universe is only about 10^100. However, most of these images, such as those that can be obtained by picking values for each pixel at random, look like noise and lack any interesting structure. Moreover, they are very unlikely to be encountered by an imaging device (an eye or a camera) in the real world. Those that are encountered, on the other hand, are loosely tagged as *natural images*.
Though occupying only a tiny fraction of the image space, natural images stand out with particular statistical properties, which play an essential role in low-level computer vision tasks, where corruptions such as noise, blur, damage, and low resolution are reduced or removed before they can affect higher-level vision tasks. Similar challenges exist for a variety of other dense scene representations and their applications, including scene depth and image motion. Recently, we have witnessed a surge of interest in modeling the statistics of natural images in the computer vision community, with applications to problems ranging from low-level (e.g., denoising, super-resolution, inpainting, deblurring), through mid-level (e.g., segmentation, color constancy, scene categorization), to high-level vision (e.g., object recognition). This short course will give an introduction to the basic aspects of natural image statistics, focusing on basic representations and statistical regularities. It will also describe recent developments in modeling natural image statistics and their applications to computer vision tasks.
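One of the best-known statistical regularities is easy to demonstrate: derivative-filter responses on natural images have heavy-tailed (high-kurtosis) marginals, because most of an image is smooth while a sparse set of edges produces large responses. The snippet below is an illustrative sketch of this observation, not course material:

```python
import numpy as np

def derivative_kurtosis(img):
    """Kurtosis of horizontal-derivative responses. Structured images
    give heavy-tailed (high-kurtosis) marginals; noise does not.
    (A Gaussian has kurtosis 3; larger values mean heavier tails.)"""
    d = np.diff(img.astype(float), axis=1).ravel()
    d = d - d.mean()
    return (d**4).mean() / (d**2).mean()**2
```

Comparing a piecewise-smooth "cartoon" image with pixelwise noise makes the gap obvious: the cartoon's derivatives are mostly zero with a few large edge responses, so its kurtosis is far above that of the noise image, whose derivatives fill out a short-tailed distribution.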
Coloring Visual Search
Lecturers:Cees G. M. Snoek, Theo Gevers, Arnold W. M. Smeulders
Location:Faculty of Engineering Bldg.#3, 2F, Room N2
URL:http://staff.science.uva.nl/~cgmsnoek/coloringvisualsearch/
Abstract:We focus on the scientific challenges in visual search using color, present methods for achieving state-of-the-art performance, and indicate how to obtain improvements in the near future. Moreover, we give an overview of the latest developments and future trends in the field of visual search based on the Pascal VOC and TRECVID benchmarks -- the leading benchmarks for image and video retrieval.

September 28th, Morning (9:30AM)

Sparse Coding and Dictionary Learning for Image Analysis
Lecturers:Francis Bach, Julien Mairal, Jean Ponce, Guillermo Sapiro
Location:The Clock Tower, 2F, International Conference Hall II and III
URL:http://www.di.ens.fr/~mairal/tutorial_iccv09/
Abstract:Sparse coding, that is, modelling data vectors as sparse linear combinations of basis elements, is widely used in machine learning, neuroscience, signal processing, and statistics. This tutorial focuses on learning the basis set, also called a dictionary, to adapt it to specific data, an approach that has recently proven to be very effective for signal reconstruction and classification in the audio and image processing domains. The course will provide an intuitive view of classical sparse decomposition and dictionary learning techniques and present a unique perspective that combines learning theory, optimization, image analysis, and computer vision.
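For the sparse decomposition step with a fixed dictionary, one standard algorithm is ISTA (iterative shrinkage-thresholding), which alternates a gradient step on the reconstruction error with a soft-thresholding step for the L1 penalty. The sketch below is illustrative, not the tutorial's code, and omits the dictionary learning loop:

```python
import numpy as np

def sparse_code(D, x, lam=0.1, n_iter=500):
    """Sparse coding by ISTA: approximately solve
        min_a 0.5 * ||x - D a||^2 + lam * ||a||_1
    for a fixed dictionary D (columns = basis elements / atoms)."""
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)        # gradient of the quadratic term
        z = a - grad / L
        # soft-thresholding: the proximal operator of the L1 norm
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a
```

Dictionary learning then alternates this step over a batch of signals with an update of D itself, which is the part the tutorial develops in depth.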
Physics-Based Human Motion Modelling for People Tracking
Lecturers:Marcus A. Brubaker, Leonid Sigal, David J. Fleet
Location:Faculty of Engineering Bldg.#3, 2F, Room W2
URL:http://www.cs.toronto.edu/~ls/iccv2009tutorial
Abstract:Physics-based models have proved to be effective in modeling how people move in, and interact with, their environment. In areas such as computer graphics, robotics, and biomechanics, physics-based models play a central role in modelling human motion. Recently, physics-based prior models have been shown to successfully address issues in human pose tracking such as out-of-plane rotations and foot skate. We posit that physics-based prior models are among the next important steps in developing more robust methods to track human motion over time. However, the models involved are conceptually challenging and carry a high overhead for those unfamiliar with Newtonian mechanics. This tutorial will cover the motivation for using physics-based models for tracking articulated objects (e.g., people), as well as the formalism required for someone unfamiliar with these models to get started. We will provide the slides, notes, and Matlab code that will allow a capable novice to proceed along this innovative research path.
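To give a flavour of the Newtonian formalism involved, here is the simplest possible physics-based motion model: a single rigid pendulum (a crude stand-in for a swinging limb) integrated with semi-implicit (symplectic) Euler. This is an illustrative Python sketch under simplified assumptions, not the tutorial's Matlab code:

```python
import numpy as np

def simulate_pendulum(theta0, dt=0.001, t_end=10.0, g=9.81, L=1.0):
    """Semi-implicit (symplectic) Euler integration of a rigid
    pendulum: theta'' = -(g/L) sin(theta). Returns the joint angle
    at each time step."""
    theta, omega = theta0, 0.0
    angles = []
    for _ in range(int(t_end / dt)):
        omega += -(g / L) * np.sin(theta) * dt  # velocity update first
        theta += omega * dt                     # then position: symplectic
        angles.append(theta)
    return np.array(angles)
```

Used as a prior in a tracker, such a model constrains the pose estimates to physically plausible trajectories instead of letting each frame's estimate drift independently; the tutorial's models extend this idea to articulated bodies with contact.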
Structured Prediction in Computer Vision
Lecturers:Tibério Caetano, Richard Hartley
Location:Faculty of Engineering Bldg.#3, 3F, Room W3
URL:http://tiberiocaetano.com/iccv_tutorial/
Abstract:This tutorial will review basic methods of structured prediction, i.e., supervised learning of discriminative models whose output domain is extremely high-dimensional and whose output variables are interdependent. This is the case for many fundamental vision problems such as image labeling and image matching. As learning engines, we cover max-margin and maximum-likelihood estimators, including structured SVMs and CRFs. As inference engines, we cover graph cuts, variable elimination, and junction trees. The effectiveness of learning structured prediction models will be illustrated on real vision problems from several domains, including graph and point-pattern matching, image segmentation, joint object categorization, and stereo matching.
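A simpler relative of the max-margin estimators above is the structured perceptron, which already shows the characteristic loop of structured learning: run exact inference with the current weights, then update towards the gold output and away from the prediction. The sketch below (illustrative, not from the tutorial; the feature design is a made-up toy) learns emission and transition weights for chain-structured sequence labeling:

```python
import numpy as np

def viterbi(emit, trans):
    """Best labeling of a chain: emit (n, k) and trans (k, k) scores."""
    n, k = emit.shape
    score = emit[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        total = score[:, None] + trans + emit[i][None, :]
        back[i] = total.argmax(axis=0)
        score = total.max(axis=0)
    seq = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        seq.append(int(back[i, seq[-1]]))
    return seq[::-1]

def structured_perceptron(X, Y, k, n_epochs=10):
    """Structured perceptron for sequence labeling. W scores each
    observation id against each label; T scores label transitions.
    X, Y: lists of equal-length integer sequences (observations /
    gold labels with labels in 0..k-1)."""
    n_obs = max(max(x) for x in X) + 1
    W = np.zeros((n_obs, k)); T = np.zeros((k, k))
    for _ in range(n_epochs):
        for x, y in zip(X, Y):
            pred = viterbi(W[x], T)     # inference with current weights
            for i, (p, g) in enumerate(zip(pred, y)):
                W[x[i], g] += 1; W[x[i], p] -= 1   # cancels when p == g
            for i in range(len(y) - 1):
                T[y[i], y[i + 1]] += 1
                T[pred[i], pred[i + 1]] -= 1
    return W, T
```

Structured SVMs and CRFs replace this perceptron update with max-margin and maximum-likelihood objectives respectively, but keep the same inference-in-the-loop structure.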

September 28th, Afternoon (2:00PM)

Boosting and Random Forest for Visual Recognition
Lecturers:Tae-Kyun Kim, Jamie Shotton, Björn Stenger
Location:Faculty of Engineering Bldg.#3, 3F, Room W3
URL:http://mi.eng.cam.ac.uk/~tkk22/iccv09_tutorial
Abstract:Classification speed is not just a matter of time efficiency; it is often crucial to achieving good accuracy in many visual recognition tasks. In this tutorial, we review Boosting and Random Forests and present comparative studies with insightful discussions. A boosting classifier, a standard method in related fields, can be seen as a flat tree structure, which ensures reasonably smooth decision regions. A Random Forest, an ensemble of random trees, has many short paths to reach the decision regions, enabling fast classification. We compare the two methods on object detection and segmentation problems and highlight online learning of the methods for adaptation and tracking.
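The flat structure of a boosting classifier is easiest to see in code: the final score is a weighted sum of many one-split "stumps", each evaluated in constant time. The following discrete AdaBoost sketch is illustrative (not the lecturers' code) and uses an exhaustive stump search that only suits tiny datasets:

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=20):
    """Discrete AdaBoost with threshold 'decision stump' weak learners.
    X: (n, d) features; y: labels in {-1, +1}.
    Returns a list of (feature, threshold, polarity, weight) stumps."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)             # example weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):              # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        stumps.append((j, thr, pol, alpha))
        w *= np.exp(-alpha * y * pred)  # re-weight towards mistakes
        w /= w.sum()
    return stumps

def predict(stumps, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * pol * np.where(X[:, j] >= thr, 1, -1)
                for j, thr, pol, alpha in stumps)
    return np.where(score >= 0, 1, -1)
```

A random tree in a forest instead stacks such splits along a path, so a test sample reaches a decision after only depth-many comparisons rather than evaluating every weak learner, which is the speed contrast the tutorial examines.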
Numerical Geometry of Non-Rigid Objects
Lecturers:Michael Bronstein, Alexander Bronstein
Location:Faculty of Engineering Bldg.#3, 2F, Room W2
URL:http://tosca.cs.technion.ac.il/book/course_iccv09.html
Abstract:Non-rigid shapes are ubiquitous in the world surrounding us, at all levels from micro to macro. The need to study such shapes and model their behavior arises in the fields of computer vision, pattern recognition, and graphics, in a wide spectrum of applications ranging from medical imaging to security. The course is a self-contained, comprehensive introduction to the analysis and synthesis of non-rigid shapes, with a good balance between theory, numerical methods, and applications. One of the main emphases will be on practical methods. Examples of applications from computer vision and pattern recognition, computer graphics, and geometry processing will be shown.
Recognizing and Learning Object Categories: Year 2009
Lecturers:Li Fei-Fei, Rob Fergus, Antonio Torralba
Location:The Clock Tower, 2F, International Conference Hall II and III
URL:http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html
Abstract:The learning and recognition of object categories has been one of the most important research topics in computer vision over the past decade. This tutorial is the third offered by the same group of researchers since 2005. We will discuss classical papers in object recognition, as well as the most recent advances in the topic.