Training a system to recognize human activities usually requires a large, correctly labeled dataset. However, labeling can be very challenging for the following reasons:
- Choosing the exact transition point between consecutive activities is hard (see Figure 1).
- Some labels are hard to discern, for example when the subject is occluded.
- Labeling tasks are usually distributed among multiple annotators, whose labels may contradict each other.
Figure 1: The subject performs two activities, moving (green) and placing (blue). The first two and last two frames are easy to label. However, assigning labels to the frames between the two activities is purely a matter of personal preference. Image source: CAD-120 dataset, Saxena’s lab, Cornell University.
We solve the problem in three ways:
- We propose a novel graphical model that uses latent variables to model the sub-level semantics of human activities (see Figure 2).
- We introduce the idea of soft labeling, a method that allows a single video segment to be labeled with multiple choices. The name contrasts with the hard assignment of a single label to each video segment.
- We propose a novel loss function that incorporates soft labeling into max-margin learning.
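To make the contrast between hard and soft labeling concrete, the following minimal sketch compares two per-segment losses. It is an illustration under stated assumptions, not the loss function from the paper: the function names hard_loss and soft_loss, the set-based encoding of soft labels, and the example data are all hypothetical.

```python
from typing import Sequence, Set


def hard_loss(predicted: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of segments whose prediction differs from the single hard label."""
    if not labels or len(predicted) != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    mismatches = sum(p != y for p, y in zip(predicted, labels))
    return mismatches / len(labels)


def soft_loss(predicted: Sequence[str], soft_labels: Sequence[Set[str]]) -> float:
    """Fraction of segments whose prediction falls outside the accepted label set.

    Assumption: a soft label is encoded as the set of labels that annotators
    consider acceptable for the segment.
    """
    if not soft_labels or len(predicted) != len(soft_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    mismatches = sum(p not in accepted for p, accepted in zip(predicted, soft_labels))
    return mismatches / len(soft_labels)


# Four video segments; the middle two lie on the transition between activities.
predicted = ["moving", "moving", "placing", "placing"]
hard = ["moving", "placing", "placing", "placing"]
soft = [{"moving"}, {"moving", "placing"}, {"moving", "placing"}, {"placing"}]

print(hard_loss(predicted, hard))  # 0.25: the second segment counts as an error
print(soft_loss(predicted, soft))  # 0.0: both transition segments accept either label
```

With hard labels, the prediction is penalized for an arbitrary choice of transition point. With soft labels, any label in the accepted set is treated as correct.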
Figure 2: Visualization of the latent components. Columns correspond to activities and rows to the four latent components. Due to limited space, only six activities are shown. Image source: CAD-120 dataset, Saxena’s lab, Cornell University.
The software can be downloaded from
$ git clone https://firstname.lastname@example.org/ninghang/activity_recognition_public.git
Alternatively, you can download the zip file from
Install software dependencies (for libDAI)
$ sudo apt-get install g++ make graphviz libboost-dev libboost-graph-dev libboost-program-options-dev libboost-test-dev libgmp-dev
To compile the software, make sure MATLAB_ROOT_FOLDER/bin is added to the system path.
Compilation of the software is rather simple. Go to the activity_recognition folder and run
- libDAI generates inference/libdai/doinference.mexa64. The file is used as the inference engine, which predicts the states of the nodes based on a given factor graph.
- SVMStruct generates svm-struct-matlab-1.2/svm_struct_learn.mexa64. The file is used to learn the parameters of the graphical model using Structured SVM.
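One way to confirm that compilation succeeded is to check that both MEX files exist. The sketch below is a hypothetical helper and is not part of the software. It assumes the two paths listed above are relative to the activity_recognition folder and that the script is run from there.

```python
from pathlib import Path
from typing import List

# Relative paths of the two MEX files named in the list above.
EXPECTED_MEX_FILES = [
    "inference/libdai/doinference.mexa64",
    "svm-struct-matlab-1.2/svm_struct_learn.mexa64",
]


def missing_build_outputs(root: str) -> List[str]:
    """Return the expected MEX files that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_MEX_FILES if not (base / rel).is_file()]


# Assumption: the current directory is the activity_recognition folder.
missing = missing_build_outputs(".")
if missing:
    print("Missing build outputs:", ", ".join(missing))
else:
    print("Both MEX files were found.")
```

If a file is reported missing, the corresponding component (libDAI or SVMStruct) most likely failed to build, so its compiler output is the first place to look.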
For a quick demo with the pre-trained model, open Matlab and run
For learning a new model, run
More descriptions about the arguments can be found in
If you would like to apply the software to other datasets, you need to modify two files:
- Modify the data loading function CAD120/load_CAD120.m. The function loads the CAD-120 dataset and returns it in the proper format for the learning framework. The function is called inside the main loop of activity_recognition_demo.m. You can replace CAD120/load_CAD120.m with any customized function that loads your own data. CAD120/load_CAD120.m contains a more detailed description of how to format the data.
- Modify the constant numStateY in the script learning_CAD120.m. The constant specifies the total number of activities to be recognized. For the CAD-120 dataset, the value is set to 10.
The work is funded by the European project ACCOMPANY under grant agreement No. 287624.
V0.3 – 18/06/2014
- Make Y and Z separate nodes in the graphical model, which makes other extensions easier.
V0.2 – 14/06/2014
- First public release.
- Ninghang Hu, Zhongyu Lou, Gwenn Englebienne, Ben Kröse. Learning to Recognize Human Activities from Soft Labeled Data, in Robotics: Science and Systems (RSS), 2014
- Ninghang Hu, Gwenn Englebienne, Zhongyu Lou, Ben Kröse. Learning Latent Structure for Activity Recognition, in IEEE International Conference on Robotics and Automation (ICRA), 2014
- Ninghang Hu, Gwenn Englebienne, Ben Kröse. A Two-layered Approach to Recognize High-level Human Activities, in IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2014
- Hema S Koppula, Rudhir Gupta, Ashutosh Saxena. Learning Human Activities and Object Affordances from RGB-D Videos, in International Journal of Robotics Research (IJRR), 2013