Machine Learning
CMSC 422 Spring 2013
Machine learning is all about finding patterns in data. The whole idea is to replace the "human writing code" with a "human supplying data" and then let the system figure out what it is that the person wants to do by looking at the examples. The most central concept in machine learning is generalization: how to generalize beyond the examples that have been provided at "training time" to new examples that you see at "test time." A very large fraction of what we'll talk about has to do with figuring out what generalization means. We'll look at it from lots of different perspectives and hopefully gain some understanding of what's going on.
There are a few cool things about machine learning that I hope to get across in class. The first is that it's broadly applicable. These techniques have led to significant advances in many fields, including stock trading, robotics, machine translation, computer vision, medicine, etc. The second is that there is a very close connection between theory and practice. While this course is more on the "practical" side of things, almost everything we will talk about has a huge amount of accompanying theory. The third is that once you understand the basics of machine learning technology, it's a very open field and lots of progress can be made quickly, essentially by finding ways to formalize what we know about the world.
Prerequisites: I take prerequisites seriously. There will be a lot of math in this class and if you do not come prepared, life will be rough. You must be able to take derivatives by hand (preferably of multivariate functions). You must know what the chain rule of probability is, and Bayes' rule. More background is not necessary but is helpful: for instance, dot products and their relationship to projections onto subspaces, and what a Gaussian is and why it's okay if its density is greater than one. I've provided some reading material to refresh these topics in your head, but if you haven't at least seen these things before, you should beef up your math background before class begins. On the programming side, projects will be in Python; you should understand basic computer science concepts (like recursion), basic data structures (trees, graphs), and basic algorithms (search, sorting, etc.). (If you know MATLAB, here's a nice cheat sheet.)
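One of the background items above, a Gaussian whose density exceeds one, can be checked numerically. The sketch below is only an illustration and not course material: the helper names `gaussian_pdf` and `integrate`, and the choice of sigma = 0.1, are assumptions made for this example. The point is that a density is not a probability, so its peak may exceed one as long as the area under the curve equals one.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a univariate Gaussian N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

sigma = 0.1  # hypothetical narrow Gaussian, chosen so the peak exceeds one
peak = gaussian_pdf(0.0, sigma=sigma)
# Integrate over [-1, 1], i.e. ten standard deviations on each side of the mean.
area = integrate(lambda x: gaussian_pdf(x, sigma=sigma), -1.0, 1.0)

print(f"peak density: {peak:.3f}")
print(f"area under curve: {area:.6f}")
```

With sigma = 0.1 the peak density is roughly 3.99, yet the numerically integrated area stays at one. Shrinking sigma further makes the peak taller and the curve narrower while the area is unchanged.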
At most half of class time will be "lecture-ish," which means you must read. We will spend the rest of class time doing exercises, working interactively on projects (mostly on Wednesdays and Fridays), and having you present homework solutions (on Mondays). Students are encouraged to bring laptops to class: in fact, if you do not have one and would like one (during class), let me know and I'll arrange to have some supplied by the CS department.
Your responsibilities are as follows:
Given that this is a three-credit class, I expect you to spend nine hours per week working on ML stuff. 2h15m will be in class. Of the remaining 6h45m, I expect about three hours to be spent reading/watching videos (one hour per class period), including written homeworks, and three hours to be spent on projects (with the remaining 45m spread elsewhere). If things are taking significantly more time than this, you should talk to us to see if we need to adjust or if there's some key background piece we've incorrectly assumed.
The purpose of grading (in my mind) is to provide extra incentive for you to keep up with the material and to ensure that you exit the class as a machine learning genius. If everyone gets an A, that would make me happy (sadly, it hasn't happened yet). The components of grading are:
Weight | Component | Details |
15% | Written homeworks | There are fifteen written homeworks (roughly one per week). Each is worth 1% of your final grade. They are graded on a high-pass (100%), low-pass (50%) or fail (0%) basis. These are to be completed individually. (The initial homework, HW00, is not graded, but required if you do not want to fail.) |
30% | Programming projects | There are three programming projects, each worth 10% of your final grade. You will be graded on both code correctness and your analysis of the results. These must be completed in teams of two or three students. |
15% | Midterm exam | There will be an in-class midterm exam, obviously to be completed individually. You may bring one sheet of notes. |
30% | Final exam | There will be a (cumulative) final exam, during the official slot, to be completed individually. You may bring one sheet of notes. |
10% | Participation | Both in-class and Piazza-based participation. You can get participation points by volunteering for stuff in class or answering or asking questions on Piazza. |
Late homeworks are not allowed. Period. No exceptions. The time deadlines are automatic and unforgiving. Late projects are allowed: you get two extra days. However, once the project is 1 minute late, you lose 25% (absolute). We will post notes on Piazza when assignments have been graded. If you handed something in and do not get a score for an assignment, you have a one-week moratorium on complaints.
Your overall grade in the class will be based on the following scale: 90+ (A), 80+ (B), 70+ (C), 60+ (D). If you're in the "012" range (e.g., 90-92) then you'll get a "minus"; if you're in the "789" range (e.g., 87-89) you'll get a "plus." These letter grades are lower bounds: I may adjust them up, but will not adjust them down. You can view your grades on grades.cs for individual assignments.
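The weighting and letter scale described above can be sketched as a small calculation. This is an illustration under stated assumptions, not the official grading script: the function names and the example scores are hypothetical, component scores are assumed to be percentages in [0, 100], and plus/minus is decided on the truncated whole-number score. The syllabus does not say whether an A+ exists or what the grade below 60 is called, so the sketch returns a plain A for 97 and above and the label "below D" under 60. The result is only a lower bound, since grades may be adjusted upward.

```python
# Component weights from the grading table (fractions of the final grade).
WEIGHTS = {
    "homework": 0.15,
    "projects": 0.30,
    "midterm": 0.15,
    "final": 0.30,
    "participation": 0.10,
}

def overall_score(scores):
    """Weighted average of per-component percentages (each in [0, 100])."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def letter_grade(score):
    """Map an overall percentage to a letter using the 90/80/70/60 cutoffs."""
    whole = int(score)  # assumption: fractional points are truncated
    if whole < 60:
        return "below D"  # the scale does not name this grade
    if whole >= 100:
        return "A"  # assumption: no A+ is awarded
    letter = "DCBA"[whole // 10 - 6]
    last_digit = whole % 10
    if last_digit <= 2:
        return letter + "-"  # the "012" range
    if last_digit >= 7:
        # the "789" range; assumption: 97-99 stays a plain A
        return letter if letter == "A" else letter + "+"
    return letter

# Hypothetical student, for illustration only.
example = {"homework": 90, "projects": 85, "midterm": 80,
           "final": 88, "participation": 100}
score = overall_score(example)
print(round(score, 2), letter_grade(score))
```

For the hypothetical scores shown, the weighted total is 87.4, which falls in the "789" range of the B band and therefore maps to a B+.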
There are no official books for this course.
Our primary source will be a collection of notes (aka CIML) I have been writing.
Other recommended (but not required) books:
The following schedule is subject to change, but likely not by very much. The readings listed are readings that you should have finished by that date. One thing that students have pointed out in the past that I'll point out to you is that Wikipedia has a bunch of good articles related to machine learning and statistics. Basic statistics topics (various distributions, rules of probability, etc.) are especially well explained there. I highly recommend it as an alternative source of information.
Date | Topics | Required Readings | Optional Readings | Due |
W 23 Jan | Welcome to machine learning | - | - | - |
F 25 Jan | Decision trees and inductive bias | CIML 1-1.6 | - | HW00 |
M 28 Jan | Dealing with data | CIML 1.7-1.10 | - | HW01 |
W 30 Jan | Lab: exploring trees, interpreting results | None | - | - |
F 01 Feb | Geometry and nearest neighbors | CIML 2-2.3 | - | - |
M 04 Feb | K-means clustering | CIML 2.4-2.6 | - | HW02 |
W 06 Feb | Lab: geometric models | None | - | - |
F 08 Feb | Perceptrons | CIML 3-3.5 | - | - |
M 11 Feb | Perceptrons II | CIML 3.5-3.7 | - | HW03 |
W 13 Feb | Lab: perceptrons and linear models | None | - | - |
F 15 Feb | The importance of good features | CIML 4-4.4 | - | - |
M 18 Feb | Catch-up | - | - | HW04 |
W 20 Feb | Evaluation and debugging | CIML 4.5-4.8 | - | P1 |
F 22 Feb | Lab: which algorithm is best? | None | - | - |
M 25 Feb | Imbalanced and multiclass classification | CIML 5-5.2 | - | HW05 |
W 27 Feb | Ranking and collective classification | CIML 5.3-5.5 | - | - |
F 01 Mar | Lab: multiclass classification | None | - | - |
M 04 Mar | Linear models and gradient descent | CIML 6-6.4 | - | HW06 |
W 06 Mar | Lab: gradient descent | None | - | - |
F 08 Mar | Class cancelled for visit day | - | - | - |
M 11 Mar | Subgradient descent and support vector machines | CIML 6.5-6.7 | - | HW07 |
W 13 Mar | Midterm review | None | - | - |
F 15 Mar | Midterm | None | - | - |
M 25 Mar | Naive Bayes models | CIML 7-7.5 | - | HW08 |
W 27 Mar | Lab: Naive Bayes | None | - | P2 |
F 29 Mar | Conditional probabilistic models | CIML 7.6-7.7 | - | - |
M 01 Apr | Neural networks | CIML 8-8.3 | - | HW09 |
W 03 Apr | Deep neural networks | CIML 8.4-8.6 | - | - |
F 05 Apr | Kernels | CIML 9-9.3 | - | - |
M 08 Apr | No class | - | - | HW10 |
W 10 Apr | Lab: kernels | None | - | - |
F 12 Apr | No class | - | - | - |
M 15 Apr | Support vector machines II | CIML 9.4-9.6 | - | HW11 |
W 17 Apr | K-means revisited | CIML 13-13.1 | - | - |
F 19 Apr | PCA and kPCA | CIML 13.2 | - | - |
M 22 Apr | Digging into Data I | Unix4Poets | - | HW12 |
W 24 Apr | Lab: digging into data | None | - | - |
F 26 Apr | Digging into Data II | - | - | - |
M 29 Apr | Online learning | TBD | - | HW13 |
W 01 May | Markov decision processes | Chapter 3 | - | - |
F 03 May | Imitation learning: DAgger | TBD | - | - |
M 06 May | Lab: imitation learning | None | - | P3,HW14 |
W 08 May | Review for final exam | None | - | - |
F 17 May | Final Exam, 1:30-3:30pm | - | - | - |
All written homeworks are due before class on the date listed (i.e., by 3:55pm). See the schedule above for due dates. All projects are due on Wednesdays by 10pm. Everything should go through our online handin system.
Although you won't need to use any of this software for your homeworks/projects, there are a large number of open-source machine learning toolkits out there. A small sample:
Cheating: Any assignment or exam that is handed in must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an E in the course and referred to the University Student Behavior Committee. Please don't take that chance - if you're having trouble understanding the material, please let us know and we will be more than happy to help.
ADA: Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first two weeks of the semester. You may reach them at 301-314-7682 or by visiting Susquehanna Hall on the 4th Floor.
College guidelines: Document concerning adding, dropping, etc. here.
Original source: http://www.cnblogs.com/daleloogn/p/4235044.html