Re-implemented and enhanced the Hierarchical Deep Temporal Model for group activity recognition (CVPR 2016), achieving significant performance improvements.
● Developed and evaluated three baselines:
○ Frame-level classifier (ResNet-50): 85.1% accuracy.
○ Person-level features (ResNet-50 + pooling): 75.2% accuracy.
○ Temporal model (ResNet-50 + LSTM, 9-frame sequences): 86.6% accuracy.
● Trained and evaluated on the Volleyball dataset, which provides both frame-level (group activity) and person-level (individual action) annotations.
● Tech stack: PyTorch, torchvision, ResNet-50, LSTM, scikit-learn.