Video Representation Learning


Several deep learning architectures have been proposed to classify videos. These architectures build spatio-temporal feature spaces hierarchically on the way to the final classification. This project provides a framework that lets the user compare the feature space learnt by each architecture by running experiments on three versions of the BouncingMNIST dataset. More in-depth information will soon be available in a preprint of a scientific paper (currently being written).

This project was part of my PhD.

Figure: placeholder image (photo by Arseny Togulev on Unsplash)


The main goal of this project is to set up a framework that lets the user compare the spatio-temporal feature spaces learnt by different deep learning architectures, supporting a more informed decision about which architecture to use in a given circumstance.
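The project's own comparison code is not reproduced here, but one common way to quantify how similar two learned feature spaces are is linear centered kernel alignment (CKA), which compares activation matrices collected from two models on the same inputs. The sketch below is illustrative only: the matrices `A`, `B`, and `C` are random stand-ins for pooled activations of two video models on the same BouncingMNIST clips, not output from the actual framework.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity between two activation matrices.

    X, Y: arrays of shape (n_samples, n_features), e.g. pooled
    activations of two video models on the same set of clips.
    Returns a value in [0, 1]; 1 means the representations are
    identical up to rotation and isotropic scaling.
    """
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 64))            # stand-in for model 1 features
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
B = A @ Q                                     # same features, rotated basis
C = rng.standard_normal((100, 32))            # unrelated "model 2" features

print(linear_cka(A, B))  # close to 1.0: same information, different basis
print(linear_cka(A, C))  # much lower: unrelated representations
```

A useful property of this measure is that the two feature spaces need not have the same dimensionality, which is exactly the situation when comparing layers from different architectures.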

Current State

The code is mostly done; however, it lacks documentation, which will be added soon. The results obtained from this project have been compiled into a paper that will also be available in the near future.

