55:145 Pattern Recognition
This is the web site of the Pattern Recognition Course Fall 2007. Click here to go to the old 2005 web page. A one-page overview of the course can be found here.
Teachers
Teacher: Bram van Ginneken (bramvanginneken@gmail.com). If you send e-mail to me, please include PR in the subject. I get a lot of e-mail and this makes it easier for me to filter your mail. Office: 4114 SC. Office hours: Tuesdays and Thursdays whenever there is a lecture given by me (see schedule below), from 2:30P - 4:30P.
TA: Richard Downe (richard-downe@uiowa.edu).
Book and literature
The course is structured around the book "Pattern Recognition and Machine Learning" by Christopher Bishop. See the website of the book. You can download solutions to exercises from that site. The book is referred to as PRML; PRML7 refers to Chapter 7, and PRMLp154 refers to page 154 of the book.
Articles:
- Jain, A.; Duin, R. & Mao, J. Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22, 4-37. A concise and excellent overview of the field (limited, as the title states, to statistical PR, but Bishop is also mainly about that). It is good to read this as background material. The paper includes an experiment with a range of classifiers and classification techniques for one application. The project you will focus on in the second part of the course is aimed at doing a similar type of experiment with a given data set or problem.
- Herlocker, J.; Konstan, J. & Riedl, J. An algorithmic framework for performing collaborative filtering. In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, ACM Press New York, NY, USA, 1999, 230-237. A good introductory article about collaborative filtering (the term used to describe the problem of the Netflix Competition).
- Herlocker, J.; Konstan, J.; Terveen, L. & Riedl, J. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 2004, 22, 5-53. Review article about how to evaluate collaborative filtering systems. Contains a part on ROC analysis; this will be covered in Lecture 7 and is not presented in PRML.
- Mitra, P.; Murthy, C. A. & Pal, S. K. Density Based Multiscale Data Condensation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24, 734-747. Simple, neat condensation algorithm. Will be discussed in the lecture on kNN classifiers.
- Decoste, D. & Schölkopf, B. Training Invariant Support Vector Machines. Machine Learning, 2002, 46, 161-190. This article describes how certain invariances can be built into support vector machines through the magical art of kernel engineering. A system is described for handwritten digit recognition using such tweaked support vector machines. This approach will be contrasted with the method described in the next paper.
- Belongie, S.; Malik, J. & Puzicha, J. Shape Matching and Object Recognition Using Shape Contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24, 509-522. Describes a method to compute a distance between two 'shapes' or 'images'. Based on the distances to some prototype shapes (images), classification and other tasks can be performed. This method is applied to the same handwritten digit database as the method described in the previous paper.
- Duin, R. Superlearning and neural network magic. Pattern Recognition Letters, 1994, 15, 215-217. Interesting critique of the neural network hype 13 years ago. If you use pattern recognition in your own research in 2007 and the years to come, it may be interesting to read this article and mentally replace words like 'neural network' and 'fuzzy', with 'support vector machine', 'kernel' and 'Bayesian' before you turn to the recent literature.
- Duin, R. & Pekalska, E. Open Issues in Pattern Recognition. In: Proc. Fourth International Conference on Computer Recognition Systems, 2005, 27-42. What are the future challenges for pattern recognition? To be discussed during the last lecture.
- Krizek, P.; Kittler, J. & Hlavac, V. Feature selection based on the training set manipulation. In: ICPR 2006. 18th International Conference on Pattern Recognition, vol 2. Boosting-like feature selection algorithm.
- Jain, A. & Zongker, D. Feature selection: evaluation, application and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19, 153-158. Study that compares some feature selection strategies.
- Pudil, P.; Novovicova, J. & Kittler, J. Floating search methods in feature selection. Pattern Recognition Letters, 1994, 15, 1119-1125. Classical, often used feature selection method.
Other books
The following books are not officially used for the course, but they may be useful for those interested in further study of pattern recognition:
- Cristianini, N. & Shawe-Taylor, J., 'An introduction to support vector machines and other kernel-based learning methods', Cambridge University Press, 2000
- Heijden, F. v.d.; Duin, R.P.W.; Ridder, D. d. & Tax, D.M.J., 'Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB', John Wiley and Sons, 2004
- Duda, R. O.; Hart, P. E. & Stork, D. G., 'Pattern Classification', John Wiley and Sons, 2001
- Shawe-Taylor, J. & Cristianini, N., 'Kernel Methods for Pattern Analysis', Cambridge University Press, 2004
- Schoelkopf, B. & Smola, A., 'Learning with kernels: support vector machines, regularization, optimization, and beyond', MIT Press, 2000
Resources
- The Gaussian Processes Web Site is an excellent resource for more information on these techniques, which are partly covered in PRML. The web site contains an on-line version of the book Gaussian Processes for Machine Learning by Carl Edward Rasmussen and Chris Williams.
- The Kernel Machines web site hosts a large set of tutorials, literature pointers and links to code for kernel methods, which are discussed in PRML.
- Impact factors
Schedule and lectures
Lectures are Tuesday and Thursday 4:30P - 5:45P in 1245 SC. The schedule below is tentative and subject to change, so please check the website regularly. I'll try to e-mail changes around as well. Note also that the linked pdf documents of lecture slides may change frequently during the course, but typically not after a lecture has been given.
- August 28: no lecture
- August 30: Lecture 1. Introduction to the course (pdf) and introduction of the Netflix Prize (pdf)
- Sep 4: Lecture 2. Introduction to Pattern Recognition. A worked out example (pdf).
- Sep 6: Lecture 3. Introduction continued
- Sep 11: Brainstorm about Netflix Prize. Homework 1: each team will present the result of a simple system that uses overall statistics per movie and per customer (mean score and/or histogram of scores) to construct a simple prediction system and report results of this system on the probe set. Moreover, ideas for more compound systems will be presented and discussed.
- Sep 13: Lecture 4. Regression (PRML1) (pdf)
- Sep 18: Lecture 5. Linear models for regression (PRML3) (pdf)
- Sep 20: Lecture 6. Linear and quadratic classifiers 1 (PRML4) (pdf)
- Sep 25: Lecture 7. Linear and quadratic classifiers 2 (PRML4) (pdf) Homework 2: see slide 25.
- Sep 27: Lecture 8. Non-parametric methods (partly covered in PRML2.5) (pdf)
- Oct 2: Lecture 9. Evaluation of PR systems (pdf)
- Oct 4: Lecture 10. Neural networks (PRML 5) (pdf) (Hand in Homework 2 before lecture starts)
- Oct 9: Netflix progress reports. Every team gives a ten-minute overview of what they've done so far.
- Oct 11: Lecture 11. Kernels and Gaussian Processes (PRML 6) (pdf)
- Oct 16: Lecture 12. Support Vector Machines (PRML 7) (pdf)
- Oct 18: Lecture 13. Handwritten digit recognition (pdf)
- Oct 21, 11:59PM: Deadline for handing in Netflix papers
- Oct 23: Netflix papers presentations & discussion
- Oct 25: no lecture
- Oct 30: Guest Lecture
- Nov 1: no lecture
- Nov 6: Lecture 14. Clustering (PRML 9) (pdf)
- Nov 8: Lecture 15. Dimension reduction (PRML 12) (pdf)
- Nov 13: Lecture 16. Classifier combination (PRML 14) (pdf)
- Nov 15: Lecture 17. Feature selection (pdf)
- Nov 20: no lecture (Thanksgiving)
- Nov 22: no lecture (Thanksgiving)
- Nov 27: no lecture
- Nov 29: no lecture
- Dec 4: Lecture 18. Open issues in Pattern Recognition (pdf)
- Dec 5: Deadline for handing in papers of the final project
- Dec 6: Presentations
- Dec 11: Presentations
- Dec 13: Presentations
- Dec 19: Exam: SC1245, 7:00PM - 9:00PM
Course Grade Determination
- Homework: 10%
- Netflix Project: 30%
- Final Project: 30%
- Exam: 30%
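As an illustration of how these weights combine, here is a minimal Python sketch. It assumes that each component is scored on a 0-100 scale, which this page does not state; the function name and the example scores are hypothetical.

```python
# Weights in percent, taken from the list above.
WEIGHTS = {"homework": 10, "netflix": 30, "final": 30, "exam": 30}

def course_grade(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# Hypothetical scores, for illustration only.
example = {"homework": 90, "netflix": 80, "final": 70, "exam": 60}
print(course_grade(example))
```

Note that the two projects together account for 60% of the grade, twice the weight of the exam.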
Projects
The practical work of the course is centered around two larger projects.
Netflix project
In the first half of the course, you'll be working in teams to build a system for predicting movie ratings. Each team will sign up for the Netflix Prize competition. You'll compete not only with your fellow students' teams but with over 20,000 other teams from around the world.
Here are the teams for the Netflix project:
1 Faisal Amer Goussous
1 Ahmed Fathi Halaweish
1 Senthil Kumar Premraj
1 Joo Hyun Song
2 Zhiyun Gao
2 Yinxiao Liu
2 Lucas Dale Van Tol
2 Ziyue Xu
3 Michael Joseph Anderson
3 Atulya Srisudarshan Ram Iyengar
3 Josiah Michael Service
4 Bhavna Josephine Antony
4 Steffen Christian Herbort
4 Thomas Nguyen Hornbeck
4 Patrick M Kellen
5 Kunlin Cao
5 Mingqing Chen
5 Kai Ding
5 Yin Yin
6 David Quackenbush
6 Jeffrey Robert Yager
6 Alexandru Dorin Iuga
Some C++ code to process the data can be found here. Note: there is no guarantee this code is all correct. You can download processed binary files here (215MB). With these binary data files it takes around 70 seconds to read the data (on a fast 2GB RAM PC) and a few seconds to do simple experiments with mean per movie and customer on the complete probe set.
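To make the "mean per movie and customer" experiment concrete, here is a minimal Python sketch of one possible baseline. It is not the C++ code linked above. It assumes the ratings are available as (customer, movie, score) triples, and it combines the two means in one particular way (movie mean plus the customer's offset from the global mean); other combinations are equally reasonable. The toy data and all function names are hypothetical. The error measure is the root mean squared error (RMSE), which is the measure used by the Netflix Prize.

```python
from collections import defaultdict
from math import sqrt

def fit_means(ratings):
    """Compute the global, per-movie and per-customer mean scores.

    ratings: iterable of (customer_id, movie_id, score) triples.
    """
    total, count = 0.0, 0
    movie_sum, movie_n = defaultdict(float), defaultdict(int)
    cust_sum, cust_n = defaultdict(float), defaultdict(int)
    for customer, movie, score in ratings:
        total += score
        count += 1
        movie_sum[movie] += score
        movie_n[movie] += 1
        cust_sum[customer] += score
        cust_n[customer] += 1
    global_mean = total / count
    movie_mean = {m: movie_sum[m] / movie_n[m] for m in movie_sum}
    cust_mean = {c: cust_sum[c] / cust_n[c] for c in cust_sum}
    return global_mean, movie_mean, cust_mean

def predict(model, customer, movie):
    """Movie mean shifted by how far this customer rates above or below average."""
    global_mean, movie_mean, cust_mean = model
    # Unseen movies or customers fall back to the global mean.
    m = movie_mean.get(movie, global_mean)
    c = cust_mean.get(customer, global_mean)
    prediction = m + (c - global_mean)
    # Netflix scores are between 1 and 5 stars, so clip the prediction.
    return min(5.0, max(1.0, prediction))

def rmse(model, probe):
    """Root mean squared error of the predictions on a held-out probe set."""
    squared_error = sum((predict(model, c, m) - s) ** 2 for c, m, s in probe)
    return sqrt(squared_error / len(probe))

# Hypothetical toy data: (customer, movie, score).
train = [(1, 10, 5), (1, 11, 3), (2, 10, 4), (2, 12, 2), (3, 11, 1)]
probe = [(3, 10, 3), (1, 12, 4)]

model = fit_means(train)
print("probe RMSE:", rmse(model, probe))
```

On the real data set the same three passes (accumulate, average, evaluate) apply unchanged; only the loading of the triples differs.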
PR project
The final project is an individual project in which you'll experiment with data from a real-world pattern recognition task. Preferably you'll think of your own project, one that suits your research. You can also choose a project together with your teacher. For all projects you are free to pick the programming environment of your choice.
Prepare an 8-minute presentation. As a guideline for the presentation, spend 30% of your time on introducing the problem, 40% on the method, and 30% on results, and end with 1 slide with a one- or two-sentence conclusion.
Here is the list of projects, with names, titles and dates of presentation:
Thursday December 6:
Thomas Nguyen Hornbeck: Automatic Detection of Microcalcifications in Digital Mammograms Using Wavelet Transform
Faisal Amer Goussous: Automatic Calculation of the Number of Clusters in K-means
Senthil Kumar Premraj: Analysis of Motion Features in Predicting Connective Tissue Disorder in Aorta
Alexandru Dorin Iuga: Predicting if the Hawk Eyes will win
Lucas Dale Van Tol: Off-Line Digit Recognition with Linear and Decision Approximation Methods
Zhiyun Gao: Handwritten Digit Recognition based on Convolutional Neural Network
Jeffrey Robert Yager: no title given
Tuesday December 11:
Michael Joseph Anderson: Handwritten Digit Recognition Using Binary Classification Tree
Ziyue Xu: Influence of Pre-processing on Handwritten Digit Recognition
Bhavna Josephine Antony: Rotation invariant object recognition
Kai Ding: Artificial Neural Network Based Weather Forecasting System
Ahmed Fathi Halaweish: Pattern Recognition Based Secure Login Protocol
Mingqing Chen: Comparison With Different Methods In Handwritten Digit Recognition
Josiah Michael Service: Face Recognition by Boosting a Strong Linear-Discriminant Learner
Patrick M Kellen: Segmentation of Bolus from Videofluoroscopic Swallowing Studies using Classifiers
Thursday December 13:
Atulya Srisudarshan Ram Iyengar: Information Processing in Olfactory Receptor Neurons and a Method of Classifying Responses to Odorants
Yinxiao Liu: Digit Recognition using kNN
Kunlin Cao: Digital Recognition Based on Component Analysis and Discriminants
Steffen Christian Herbort: Face recognition using Eigenfaces
Joo Hyun Song: Virtual Weatherman: A pattern recognition approach to weather prediction
David Quackenbush: A Speaker Verification System Using MFCC Coefficients, Dynamic Time Warping, and Gaussian Mixture Models
Yin Yin: Knee Cartilage Area Identification
Reports
For both projects, you need to write a report (one per team for the Netflix project). Reports are to be handed in as a conference paper: 4 pages maximum, preferably in LaTeX, 2-column, font Times Roman 11, line spacing 0.9, with figures and postscript images. An example of this type of LaTeX document can be found in ~image/Public/LaTeX under the name IAU-sample.ps.
Your report should consist of the following sections:
Abstract
1. Introduction - what is the problem, motivation, previous work of others, your original approach
2. Materials - data description
3. Methods - detailed description of the new approach
4. Results
5. Discussion of Results - comparison to results of others, comparison of results to your primary approach
6. Conclusions
7. References - in addition to the 4-page limit, add 5-10 references (page 5)
Exam
The exam will focus on the theory of the course (the projects cover the practical aspects of doing pattern recognition). In the exam I will try to ask a large number of small questions regarding the classifiers and other algorithms discussed in the course.
The exam will be Wednesday December 19, 7-9PM (so: in the evening!) in SC1245 (where the lectures are).
Logo on top of page was taken from Flickr.