This page explains how to install and use the software accompanying [1, 2, 3] for human activity recognition. Please contact me if you have any questions.


  1. Problem Overview
  2. Our Approach
  3. Download
  4. Installation
  5. How To Run
  6. Using Your Own Dataset

Problem Overview

Training a system to recognize human activities usually requires a large, correctly labeled dataset. However, labeling can be very challenging for the following reasons:

  • Choosing the exact transition point between consecutive activities is hard (see Figure 1).
  • Some labels are hard to discern, for example when the subject is occluded.
  • Labeling tasks are usually distributed across multiple annotators, whose labels may contradict each other.

Figure 1: The subject performs two activities, moving (green) and placing (blue). Labels of the first two and last two frames are easy to annotate. However, assigning labels for the frames between the two activities is purely based on personal preferences. Image source: CAD-120 dataset, Saxena’s lab, Cornell University. [4]

Our Approach

We address the problem in three ways:

  • We propose a novel graphical model [2] that uses latent variables to model the sub-level semantics of human activities (see Figure 2).
  • We introduce the idea of soft labeling [1], a method that allows labeling a single video segment with multiple choices. The name is defined in contrast to the hard assignment of a single label for each video segment.
  • We propose a novel loss function [1] that incorporates soft labeling into Max-Margin learning.

The proposed model was evaluated on the CAD-60 and CAD-120 datasets and compared with state-of-the-art approaches.

Figure 2: Visualization of the latent components. The columns correspond to activities and the rows to the four latent components. Due to limited space, only six activities are shown. Image source: CAD-120 dataset, Saxena’s lab, Cornell University. [4]


Download

The software can be downloaded from

$ git clone

Alternatively, you can download the zip file from


Installation

Install the software dependencies (for libDAI):

$ sudo apt-get install g++ make graphviz libboost-dev libboost-graph-dev libboost-program-options-dev libboost-test-dev libgmp-dev
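
On a Debian-based system, a quick way to see which of the packages above are still missing is sketched below. The package names are copied from the apt-get line; the function name missing_packages is only illustrative and is not part of the software.

```shell
# Package names copied from the apt-get line above.
PACKAGES="g++ make graphviz libboost-dev libboost-graph-dev libboost-program-options-dev libboost-test-dev libgmp-dev"

# Print the name of every package that dpkg does not report as fully installed.
missing_packages() {
    for pkg in $PACKAGES; do
        if ! dpkg -s "$pkg" 2>/dev/null | grep -q "^Status: install ok installed"; then
            echo "$pkg"
        fi
    done
}

# Only run the check where dpkg is available (Debian, Ubuntu and derivatives).
if command -v dpkg >/dev/null 2>&1; then
    missing_packages
fi
```

If nothing is printed, all listed packages are already installed.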

Before compiling, make sure MATLAB_ROOT_FOLDER/bin is on the system path.
To compile the software, go to the activity_recognition folder and run

$ make

This compiles the required packages, libDAI and SVMStruct, into two MEX functions:

  • libDAI generates inference/libdai/doinference.mexa64. The file is used as the inference engine, which predicts the states of the nodes based on a given factor graph.
  • SVMStruct generates svm-struct-matlab-1.2/svm_struct_learn.mexa64. The file is used to learn parameters of the graphical model using Structured SVM.
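
The path setup and a post-build check can be sketched in shell as follows. The default MATLAB location is a hypothetical placeholder that you should replace with your own installation folder, and check_mex_outputs is an illustrative helper name, not something shipped with the software.

```shell
# Hypothetical default; set MATLAB_ROOT_FOLDER to your actual MATLAB installation.
MATLAB_ROOT_FOLDER="${MATLAB_ROOT_FOLDER:-/usr/local/MATLAB/R2014a}"

# Put MATLAB's bin directory first on PATH so the build can find it.
export PATH="$MATLAB_ROOT_FOLDER/bin:$PATH"

# Report whether the MATLAB launcher is now visible.
if command -v matlab >/dev/null 2>&1; then
    echo "matlab found at: $(command -v matlab)"
else
    echo "matlab not found; check MATLAB_ROOT_FOLDER" >&2
fi

# After running make, report whether both MEX files listed above exist.
check_mex_outputs() {
    missing=0
    for f in inference/libdai/doinference.mexa64 \
             svm-struct-matlab-1.2/svm_struct_learn.mexa64; do
        if [ -f "$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Call check_mex_outputs from the activity_recognition folder after make has finished; a non-zero return value means at least one of the two MEX files was not produced.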

How To Run

For a quick demo with the pre-trained model, open MATLAB and run

>> example_test

To learn a new model, run

>> example_training

The arguments are described in more detail in example_training.m.
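
If you prefer to run the scripts without the MATLAB desktop, for example on a remote machine, the sketch below wraps the call in a shell function. It assumes that matlab is on the PATH and that the current directory is the one containing example_test.m and example_training.m; run_matlab_script is an illustrative name, not part of the software.

```shell
# Run a MATLAB script by name without opening the desktop, then quit MATLAB.
run_matlab_script() {
    script_name="$1"
    if ! command -v matlab >/dev/null 2>&1; then
        echo "matlab not found on PATH" >&2
        return 1
    fi
    # The try/catch makes MATLAB exit with a non-zero code if the script fails,
    # instead of stopping at the MATLAB prompt.
    matlab -nodisplay -nosplash -r "try, $script_name; catch err, disp(getReport(err)); exit(1); end; exit(0);"
}
```

For example, run_matlab_script example_test runs the demo, and run_matlab_script example_training starts the learning.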

Using Your Own Dataset

If you would like to apply the software to other datasets, you need to modify two files:

  1. Modify the data loading function CAD120/load_CAD120.m. The function loads the CAD-120 dataset and returns it in the format expected by the learning framework. It is called inside the main loop of activity_recognition_demo.m. You can replace CAD120/load_CAD120.m with any custom function that loads your own data. CAD120/load_CAD120.m contains a more detailed description of how to format the data.
  2. Modify the constant numStateY in the script learning_CAD120.m. The constant specifies the total number of activities to be recognized. For the CAD-120 dataset it is set to 10.
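
One way to get started is to copy the existing loader as a template and locate the constant to change. This is a sketch only: the target path mydata/load_mydata.m and both helper names are hypothetical, and the commands assume they are run from the folder that contains CAD120/ and learning_CAD120.m.

```shell
# Hypothetical location for your own loader; change as needed.
NEW_LOADER="${NEW_LOADER:-mydata/load_mydata.m}"

# Copy the CAD-120 loader as a starting point, without overwriting existing work.
start_custom_loader() {
    src="CAD120/load_CAD120.m"
    if [ ! -f "$src" ]; then
        echo "missing $src; run this from the source folder" >&2
        return 1
    fi
    if [ -e "$NEW_LOADER" ]; then
        echo "$NEW_LOADER already exists; leaving it untouched"
        return 0
    fi
    mkdir -p "$(dirname "$NEW_LOADER")"
    cp "$src" "$NEW_LOADER"
    echo "created $NEW_LOADER"
}

# Show the lines where numStateY appears so it can be set to your number of activities.
show_num_states() {
    grep -n "numStateY" learning_CAD120.m
}
```

After copying, rename the function inside the new file so that it matches the file name, since MATLAB looks functions up by file name, and update the call site accordingly.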


This work is funded by the European project ACCOMPANY under grant agreement No. 287624.


V0.3 – 18/06/2014

  • Made Y and Z separate nodes in the graphical model, which makes other extensions easier.

V0.2 – 14/06/2014

  • First public release.


  1. Ninghang Hu, Zhongyu Lou, Gwenn Englebienne, Ben Kröse. Learning to Recognize Human Activities from Soft Labeled Data, in Robotics: Science and Systems (RSS), 2014
  2. Ninghang Hu, Gwenn Englebienne, Zhongyu Lou, Ben Kröse. Learning Latent Structure for Activity Recognition, in IEEE International Conference on Robotics and Automation (ICRA), 2014
  3. Ninghang Hu, Gwenn Englebienne, Ben Kröse. A Two-layered Approach to Recognize High-level Human Activities, in IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2014
  4. Hema S Koppula, Rudhir Gupta, Ashutosh Saxena. Learning Human Activities and Object Affordances from RGB-D Videos, in International Journal of Robotics Research (IJRR), 2013