This is an implementation of very deep two-stream CNNs for action recognition. The implementation is inspired by Wang et al. Improvements over Wang's implementation include reading videos from an LMDB database, faster

This is a fork used for video action recognition, mainly with two-stream CNNs. Some unofficial layers developed in or merged into this repo:

  1. FlowData layer: uses a FlowData Reader to read flow data from an LMDB database.
  2. Modified DataTransformer methods: these read images that were stored at a resized resolution, rescale them back, and then apply the transformations.
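
The "rescale back, then transform" order in item 2 can be illustrated with a minimal Python sketch. This is not the repo's C++ DataTransformer: the function names `rescale_nearest` and `transform` are hypothetical, nearest-neighbour resizing stands in for whatever interpolation the fork actually uses, and random crop plus mirror are assumed as the transformations.

```python
import random


def rescale_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image given as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def transform(stored, orig_h, orig_w, crop, mirror=None, rng=random):
    """Rescale a stored (downsized) image back, then crop and optionally mirror."""
    # Step 1: bring the stored image back to its assumed original size.
    full = rescale_nearest(stored, orig_h, orig_w)
    # Step 2: random square crop of side `crop` (assumed augmentation).
    top = rng.randint(0, orig_h - crop)
    left = rng.randint(0, orig_w - crop)
    patch = [row[left:left + crop] for row in full[top:top + crop]]
    # Step 3: horizontal mirror, chosen at random unless given explicitly.
    if mirror is None:
        mirror = rng.random() < 0.5
    if mirror:
        patch = [row[::-1] for row in patch]
    return patch


# A 2x2 stored image rescaled to 4x4, then cropped to 3x3 without mirroring.
stored = [[1, 2], [3, 4]]
patch = transform(stored, orig_h=4, orig_w=4, crop=3, mirror=False,
                  rng=random.Random(0))
```

Storing frames at a reduced size keeps the database small, at the cost of one extra resize per sample when reading.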


Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

  @article{jia2014caffe,
    Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
    Journal = {arXiv preprint arXiv:1408.5093},
    Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
    Year = {2014}
  }

