https://piazza.com/uoit.ca/fall2024/csci4440u/home
Lab times and locations are available here.
Labs and in-class exercises will be submitted through the course Canvas site.
This course explores the techniques and methodologies for the recognition and analysis of human actions in images and video sequences. Designed for senior-year computer science students, the course covers both classical and contemporary approaches to action recognition, integrating theory with hands-on application. This course is particularly well-suited for students who are interested in pursuing research or careers in computer vision, artificial intelligence, or related fields.
The students will be exposed to a curated selection of influential papers that cover a range of methodologies for action recognition. Each week, students will read assigned papers, critically analyze them, and participate in in-depth class discussions to explore the strengths, weaknesses, and potential applications of the proposed methods.
In addition to the discussions, students will choose one or more papers to implement. This hands-on project will allow them to reproduce key results, experiment with variations of the methods, and possibly propose and test their own improvements. Through this process, students will gain a deeper understanding of the technical challenges and considerations involved in human action recognition, as well as experience in implementing and evaluating research ideas.
By the end of the course, students will have not only gained knowledge of the latest trends and techniques in action recognition but also developed practical skills in reading, critiquing, and implementing complex research papers.
A working knowledge of machine learning and deep learning (CSCI 4050, CSCI 4052, or equivalent) and familiarity with basic computer vision techniques (CSCI 3240U, preferably CSCI 4220U, or equivalent).
Fundamentals of Action Recognition: Introduction to the field, including definitions, challenges, and applications in areas such as surveillance, human-computer interaction, and sports analytics.
Feature Extraction and Representation: Study of feature extraction methods, such as optical flow, spatio-temporal interest points, and deep feature representations using Convolutional Neural Networks (CNNs) and Transformers (see the optical-flow sketch after this list).
Machine Learning and Deep Learning Techniques: Application of machine learning models, including Support Vector Machines (SVMs) and Random Forests, as well as deep learning architectures like RNNs, LSTMs, and 3D CNNs for action recognition tasks.
Spatio-Temporal Modeling: Techniques for capturing and modeling the spatial and temporal dimensions of action in videos, including the use of spatio-temporal graphs, attention mechanisms, and multi-stream networks.
Datasets and Evaluation: Overview of popular datasets used in the field, such as UCF101, Kinetics, and AVA, and discussion of evaluation metrics like accuracy, F1 score, and mean Average Precision (mAP); a short metric-computation sketch follows this list.
Applications and Case Studies: Exploration of real-world applications and case studies in action recognition, from autonomous vehicles to entertainment and beyond.
Emerging Trends: Discussion of the latest trends in the field, such as zero-shot learning, self-supervised learning, and the integration of action recognition with other computer vision tasks like object detection and scene understanding.
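As a concrete example of a classical motion feature from the feature-extraction topic above, the following sketch uses OpenCV's Farneback dense optical flow to turn a short clip into a simple direction-histogram descriptor. It is a minimal illustration, not course-provided code; the file name and the histogram design are assumptions.

    # Minimal sketch: dense optical flow as a hand-crafted motion feature.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("example_clip.mp4")  # hypothetical input clip
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    flow_features = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback dense optical flow: an (H, W, 2) field of per-pixel motion vectors.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Summarize each frame pair by a histogram of flow directions weighted by magnitude.
        hist, _ = np.histogram(ang, bins=8, range=(0, 2 * np.pi), weights=mag)
        flow_features.append(hist / (hist.sum() + 1e-8))
        prev_gray = gray

    cap.release()
    clip_descriptor = np.mean(flow_features, axis=0)  # crude clip-level motion descriptor
    print(clip_descriptor.shape)  # (8,)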
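The evaluation metrics listed under Datasets and Evaluation can be computed with scikit-learn. The labels and scores below are made up, and the mAP shown is the classification-style mean of per-class average precision; detection benchmarks such as AVA instead compute mAP over spatio-temporal detections.

    # Minimal sketch of clip-level metrics: accuracy, macro F1, and mAP.
    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score, average_precision_score

    # Hypothetical 3-class problem: ground-truth labels and per-class scores for 5 clips.
    y_true = np.array([0, 2, 1, 2, 0])
    scores = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.3, 0.6],
                       [0.2, 0.5, 0.3],
                       [0.5, 0.3, 0.2],
                       [0.4, 0.4, 0.2]])
    y_pred = scores.argmax(axis=1)

    print("accuracy:", accuracy_score(y_true, y_pred))
    print("macro F1:", f1_score(y_true, y_pred, average="macro"))

    # Classification-style mAP: mean per-class average precision over one-hot labels.
    y_onehot = np.eye(3)[y_true]
    print("mAP:", average_precision_score(y_onehot, scores, average="macro"))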
By the end of this course, students will be able to:
A student must earn at least 50% on the course project to pass the course. Furthermore, a student must earn at least 50% on the two midterms to pass the course. Class attendance is not optional.
Ontario Tech University’s academic calendar, which lists important dates and deadlines, is available here.
The list presented below is by no means complete.
Each week, a paper will be assigned to one or more students, who will lead the discussion on that paper.
The course project is an independent exploration of a specific problem within the context of this course.
The topic of the project will be decided in consultation with the instructor.
The project grade will depend on your ideas, how well you present them in the report, how well you position your work in the related literature, how thorough your experiments are, and how thoughtful your conclusions are.
Teams of up to two students are allowed.
You are required to prepare a three-minute video that provides an overview of your project. You may frame this video as a pitch to investors, with a broad understanding of the computer science, information technology, and artificial intelligence landscape, who are considering investing in a business built around the technology you developed in your project.
For your final project write-up you must use the ACM SIG Proceedings template (available on the ACM website). The project report may be at most 12 pages long, plus extra pages for references.
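A minimal LaTeX skeleton, assuming the current acmart class distributed with the ACM template; the title, author, section names, and bibliography file are placeholders.

    % Minimal skeleton assuming the ACM "acmart" class; the sigconf option
    % gives the two-column SIG proceedings layout.
    \documentclass[sigconf]{acmart}

    \begin{document}

    \title{Your Project Title}
    \author{Student One}
    \affiliation{\institution{Ontario Tech University}\country{Canada}}

    \begin{abstract}
    One paragraph summarizing the problem, method, and findings.
    \end{abstract}

    \maketitle

    \section{Introduction}
    \section{Related Work}
    \section{Method}
    \section{Experiments}
    \section{Conclusion}

    \bibliographystyle{ACM-Reference-Format}
    \bibliography{references}

    \end{document}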
Alternatively, you can use the following template (from “Tech Report à la MIT AI Lab (1981)”):
We will use the following textbook to cover the fundamentals needed to understand the material in this course.