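This course is designed to run for 8 weeks and is organized around 4 core questions, each taking about 2 weeks to explore. Each recording is about 2 hours long, with each hour devoted to one topic; each topic is divided into 4 to 5 short segments, and each segment contains one or more in-class exercises. In the second week of exploring each core question ... Based on the above, the course is planned as follows: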
When Can Machines Learn? [when machine learning can be used]
Why Can Machines Learn? [why machines are able to learn]
How Can Machines Learn? [how machines can go about learning]
How Can Machines Learn Better? [how machines can learn better]
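Further Reading
Prerequisites
Homework 0 (basic knowledge of probability and statistics, linear algebra, and differentiation)
Reference Book
Learning from Data: A Short Course, Abu-Mostafa, Magdon-Ismail, Lin, 2013.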
Classic Papers
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386-408, 1958. (Lecture 2: origin of the Perceptron)
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963. (Lecture 4: Hoeffding's Inequality)
Y. S. Abu-Mostafa, X. Song, A. Nicholson, M. Magdon-Ismail. The bin model, 1995. (Lecture 4: origin of the bin model)
V. Vapnik. The nature of statistical learning theory, 2nd edition, 2000. (Lectures 5 to 8: complete mathematical derivation and extensions of the VC dimension and the VC bound)
Y. S. Abu-Mostafa. The Vapnik-Chervonenkis dimension: information versus complexity in learning. Neural Computation, 1(3):312-317, 1989. (Lecture 7: the concept and importance of the VC Dimension)
References
A. Sadilek, S. Brennan, H. Kautz, and V. Silenzio. nEmesis: Which restaurants should you avoid today? First AAAI Conference on Human Computation and Crowdsourcing, 2013. (Lecture 1: ML applications in food)
Y. S. Abu-Mostafa. Machines that think for themselves. Scientific American, 289(7):78-81, 2012. (Lecture 1: ML applications in clothing)
A. Tsanas, A. Xifara. Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy and Buildings, 49:560-567, 2012. (Lecture 1: ML applications in housing)
J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel. Introduction to the special issue on machine learning for traffic sign recognition. IEEE Transactions on Intelligent Transportation Systems, 13(4):1481-1483, 2012. (Lecture 1: ML applications in transportation)
R. Bell, J. Bennett, Y. Koren, and C. Volinsky. The million dollar programming prize. IEEE Spectrum, 46(5):29-33, 2009. (Lecture 1: the Netflix competition)
S. I. Gallant. Perceptron-based learning algorithms. IEEE Transactions on Neural Networks, 1(2):179-191, 1990. (Lecture 2: origin of pocket; note that the actual pocket algorithm is more complex than the one we introduce)
R. Xu, D. Wunsch II. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645-678, 2005. (Lecture 3: Clustering)
X. Zhu. Semi-supervised learning literature survey. University of Wisconsin Madison, 2008. (Lecture 3: Semi-supervised)
Z. Ghahramani. Unsupervised learning. In Advanced Lectures in Machine Learning (MLSS ’03), pages 72-112, 2004. (Lecture 3: Unsupervised)
L. Kaelbling, M. Littman, A. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996. (Lecture 3: Reinforcement)
A. Blum. On-Line algorithms in machine learning. Carnegie Mellon University, 1998. (Lecture 3: Online)
B. Settles. Active learning literature survey. University of Wisconsin Madison, 2010. (Lecture 3: Active)
D. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341-1390, 1996. (Lecture 4: the formal version of No Free Lunch)
T. M. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, 14(3):326-334, 1965. (Lectures 5 to 6: Growth Function)
B. Zadrozny, J. Langford, N. Abe. Cost sensitive learning by cost-proportionate example weighting. IEEE International Conference on Data Mining, 2003. (Lecture 8: Weighted Classification)
G. Seber, A. Lee. Linear Regression Analysis, 2nd Edition, Wiley, 2003. (Lecture 9: Linear Regression analyzed from a statistical perspective; Lectures 12 to 13: Linear Regression after a Polynomial Transform)
D. C. Hoaglin, R. E. Welsch. The hat matrix in regression and ANOVA. American Statistician, 32:17-22, 1978. (Lecture 9: the Hat Matrix of Linear Regression)
D. W. Hosmer, Jr., S. Lemeshow, R. X. Sturdivant. Applied Logistic Regression, 3rd Edition, Wiley, 2013. (Lecture 10: Logistic Regression analyzed from a statistical perspective)
T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. International Conference on Machine Learning, 2004. (Lecture 11: theoretical analysis of Stochastic Gradient Descent applied to linear models)
R. Rifkin, A. Klautau. In Defense of One-Vs-All Classification. Journal of Machine Learning Research, 5:101-141, 2004. (Lecture 11: One-versus-all)
J. Fürnkranz. Round Robin Classification. Journal of Machine Learning Research, 2:721-747, 2002. (Lecture 11: One-versus-one)
L. Li, H.-T. Lin. Optimizing 0/1 loss for perceptrons by random coordinate descent. In Proceedings of the 2007 International Joint Conference on Neural Networks (IJCNN ’07), pages 749-754, 2007. (Lecture 11: a Perceptron Algorithm derived from an optimization perspective)
G.-X. Yuan, C.-H. Ho, C.-J. Lin. Recent advances of large-scale linear classification. Proceedings of the IEEE, 2012. (Lecture 11: more advanced linear classification methods)
Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, C.-J. Lin. Training and testing low-degree polynomial data mappings via linear SVM. Journal of Machine Learning Research, 11:1471-1490, 2010. (Lecture 12: a method that combines a polynomial transform with a linear classification model)
M. Magdon-Ismail, A. Nicholson, Y. S. Abu-Mostafa. Learning in the presence of noise. In Intelligent Signal Processing. IEEE Press, 2001. (Lecture 13: Noise and Learning)
A. Neumaier. Solving ill-conditioned and singular linear systems: A tutorial on regularization. SIAM Review, 40:636-666, 1998. (Lecture 14: Regularization)
T. Poggio, S. Smale. The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, 50(5):537-544, 2003. (Lecture 14: Regularization)
P. Burman. A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 76(3):503-514, 1989. (Lecture 15: Cross Validation)
R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI ’95), volume 2, pages 1137-1143, 1995. (Lecture 15: Cross Validation)
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Occam’s razor. Information Processing Letters, 24(6):377-380, 1987. (Lecture 16: Occam’s Razor)
For more discussion and exchange on Machine Learning, follow this blog and the Sina Weibo account songzi_tea.
NTU-Coursera Machine Learning: Machine Learning Foundations
Original article: http://blog.csdn.net/songzitea/article/details/43922131