Big Data (Fall 2013)
- Lecture 0 (Aug 21) Syllabus. (See readings 1 & 2 below for next class.)
- Lecture 1 (Aug 26) Went over the bounds in the Alon et al. paper (1). Exercise for next class: implement an online algorithm for estimating the mean (see the sketch after the schedule). Play with the distribution of the input and the design choices for your online algorithm; bring some plots and comments to class on Wednesday.
- Lecture 2 (Aug 28) Went over the Manku & Motwani paper (2) and the exercise. Two more papers to read for next class (3 & 4 below); remember to write or print a short one-paragraph summary of each and bring it to class.
- Lecture 3 (Sep 4) Went over the exercise; how fast the estimate of the mean converges depends on the variance of the input. Discussed streaming algorithms. Read the LSH paper (5) for next class and bring your written summaries.
- Lecture 4 (Sep 9) Went over distributions of distances between points in high-dimensional spaces. Began approximate nearest neighbor and the LSH paper (5), including min-hashing for set similarity (see the sketch after the schedule). Optional reading: Broder et al. on shingling (min hash).
- Lecture 5 (Sep 11) Went over more detail on the distributions of distances between points, and alternatives for locality-sensitive hash functions (bit selection from a unary encoding). No reading for next week.
- Lecture 6 (Sep 16) Will cover computing means on high-dimensional data and preview the readings on classification. Prepare a summary of this reading for Wednesday's class. Also, an exercise question: does covariance between dimensions matter for the sample complexity of estimating the mean of high-dimensional data? (A small simulation appears after the schedule.)
- Lecture 7 (Sep 18) Went over PAC learning and began SVMs. Reading for Monday here.
- Lecture 8 (Sep 23) Went over the reading, introduced SVMs, quick project discussion.
- Lecture 9 (Sep 25) Went over some of the math behind SVMs; talked about the exercise.
- Lecture 10 (Sep 30) Went over the exercise steps in more detail. Read chapter 4 of Hastie et al. (7) and write a one-paragraph summary for next class. Optionally, for more on SVMs, look at chapter 12; no summary needed.
- Lecture 11 (Oct 2) Talked about fitting models and the homework exercise.
- Lecture 12 (Oct 7) Went over visualizations of the loss/regularization/objective for the learning homework (homework exercise due tonight). A sketch of such an objective appears after the schedule.
- Lecture 13 (Oct 9) The difference between regression and classification. (No reading for next class.)
- Lecture 14 (Oct 14) Second-order methods and averaged stochastic gradient descent (see the sketch after the schedule).
- Lecture 15 (Oct 16) Introduction to neural networks and the perceptron (see the sketch after the schedule).
- Lecture 16 (Oct 21) Discuss projects.
- Lecture 17 (Oct 23) Discuss projects.
- Lecture 18 (Oct 28) Mark Levoy talk on Google Glass.
- Lecture 19 (Oct 30) Parallel algorithms for learning (see readings 11, 12, 13).
- Lecture 20 (Nov 4) Map-reduce (a toy word-count sketch appears after the schedule).
- Lecture 21 (Nov 6) Boosting (an AdaBoost sketch appears after the schedule).
- Lecture 23 (Nov 11) kd-trees and rp-trees (a random-projection split sketch appears after the schedule). Hand in your summary of reading 14 for next class.
- Lecture 24 (Nov 13) More trees.
- Lecture 25 (Nov 18) Project presentations (~5 minute presentation of your project progress, 3-5 slides -- problem, data, initial experiments/plots)
- Lecture 26 (Nov 20) Project presentations (~5 minute presentation of your project progress, 3-5 slides -- problem, data, initial experiments/plots)
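- Code sketches for the lectures above
The sketches below are small illustrative Python programs, not course-provided code; datasets, function names, and parameters are assumptions chosen for demonstration.

A minimal take on the Lecture 1 exercise: estimate the mean of a stream online, in O(1) memory, with the update m_t = m_{t-1} + (x_t - m_{t-1}) / t.

```python
import random

def online_mean(stream):
    """Yield the running-mean estimate after each item, using the
    constant-memory update m_t = m_{t-1} + (x_t - m_{t-1}) / t."""
    mean = 0.0
    for t, x in enumerate(stream, start=1):
        mean += (x - mean) / t
        yield mean

if __name__ == "__main__":
    random.seed(0)
    # Vary the input distribution, as the exercise asks; higher variance
    # makes the estimate converge more slowly (the Lecture 3 point).
    stream = (random.gauss(0.0, 5.0) for _ in range(10000))
    for t, m in enumerate(online_mean(stream), start=1):
        if t % 2000 == 0:
            print(t, m)
```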
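A sketch of min-hashing for set similarity (Lecture 4), relying on the standard property that two sets agree on a min-hash coordinate with probability equal to their Jaccard similarity. The salted built-in hash here is only a stand-in for a proper hash family.

```python
import random

def minhash_signature(items, num_hashes=100, seed=0):
    # One salt per hash function; hash((salt, x)) simulates a hash family.
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, x)) for x in items) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    # Fraction of signature coordinates on which the two sets agree.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

A = set(range(0, 80))
B = set(range(40, 120))
print("true Jaccard:", len(A & B) / len(A | B))  # 40/120 = 1/3
print("estimated:   ",
      estimated_jaccard(minhash_signature(A), minhash_signature(B)))
```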
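One way to probe the Lecture 6 exercise question with a simulation (assumes numpy). Since E||sample mean - true mean||^2 = trace(cov)/n, the expected squared error depends only on the per-coordinate variances, not on the off-diagonal covariances, and the two estimates below should come out close.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 10, 200, 500

def mean_sq_error(cov):
    # Average squared L2 error of the sample mean over many trials.
    errs = []
    for _ in range(trials):
        X = rng.multivariate_normal(np.zeros(d), cov, size=n)
        errs.append(np.sum(X.mean(axis=0) ** 2))  # true mean is 0
    return np.mean(errs)

uncorrelated = np.eye(d)
# Same unit diagonal, but strong positive correlation off the diagonal.
correlated = 0.9 * np.ones((d, d)) + 0.1 * np.eye(d)
print("uncorrelated:", mean_sq_error(uncorrelated))  # both near
print("correlated:  ", mean_sq_error(correlated))    # trace/n = d/n = 0.05
```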
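A sketch of the kind of objective discussed around Lectures 9-12: stochastic subgradient descent on hinge loss plus L2 regularization, using a Pegasos-style step size. Data and hyperparameters are illustrative.

```python
import random

random.seed(0)
# Toy 2-D data: class-conditional Gaussians centered at (y, y), labels in {-1, +1}.
data = [((random.gauss(y, 1.0), random.gauss(y, 1.0)), y)
        for y in (random.choice((-1, 1)) for _ in range(1000))]

lam = 0.01          # regularization strength
w = [0.0, 0.0]
for t, ((x1, x2), y) in enumerate(data, start=1):
    eta = 1.0 / (lam * t)                      # Pegasos-style step size
    margin = y * (w[0] * x1 + w[1] * x2)
    # Subgradient step on lam/2 * ||w||^2 + max(0, 1 - margin):
    w[0] -= eta * lam * w[0]
    w[1] -= eta * lam * w[1]
    if margin < 1:                             # hinge loss is active
        w[0] += eta * y * x1
        w[1] += eta * y * x2

errors = sum(1 for (x1, x2), y in data if y * (w[0] * x1 + w[1] * x2) <= 0)
print("w =", w, "training errors:", errors)
```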
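A sketch of averaged SGD (Lecture 14, reading 9) on one-dimensional least squares: run plain SGD and keep a running average of the iterates, which is typically a less noisy estimate than the last iterate.

```python
import random

random.seed(0)
# Noisy observations of y = 3x on x in [-1, 1].
data = [(x, 3.0 * x + random.gauss(0.0, 1.0))
        for x in (random.uniform(-1.0, 1.0) for _ in range(5000))]

w, w_bar = 0.0, 0.0
for t, (x, y) in enumerate(data, start=1):
    grad = 2.0 * (w * x - y) * x        # gradient of (w*x - y)^2 in w
    w -= (0.1 / t ** 0.5) * grad        # plain SGD iterate
    w_bar += (w - w_bar) / t            # running average of the iterates
print("last iterate:", w, "averaged iterate:", w_bar)
```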
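A bare-bones perceptron (Lecture 15) on a linearly separable toy dataset: update the weights only on mistakes.

```python
import random

random.seed(0)
# Labels come from the separator x1 + x2 > 0.
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
data = [(p, 1 if p[0] + p[1] > 0 else -1) for p in points]

w, b = [0.0, 0.0], 0.0
for epoch in range(20):                  # passes over the data
    mistakes = 0
    for (x1, x2), y in data:
        if y * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified
            w[0] += y * x1               # perceptron update
            w[1] += y * x2
            b += y
            mistakes += 1
    if mistakes == 0:                    # converged on separable data
        break
print("weights:", w, "bias:", b, "epochs:", epoch + 1)
```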
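A toy, single-process map-reduce (Lecture 20) for word counting, with explicit map, shuffle, and reduce phases; a real system distributes each phase across machines.

```python
from collections import defaultdict

def map_phase(doc):
    # Emit (key, value) pairs: one (word, 1) per occurrence.
    for word in doc.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return (key, sum(values))

docs = ["big data big ideas", "streaming data"]
pairs = [kv for doc in docs for kv in map_phase(doc)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'streaming': 1}
```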
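A compact AdaBoost sketch (Lecture 21) with decision stumps on a one-dimensional toy dataset; the thresholds and data are illustrative, chosen so that no single stump classifies everything correctly.

```python
import math

# Toy 1-D dataset (x, label); not separable by any single stump.
data = [(0.1, 1), (0.2, 1), (0.3, -1), (0.6, -1), (0.7, 1), (0.9, 1)]
n = len(data)
weights = [1.0 / n] * n
thresholds = [0.05, 0.25, 0.45, 0.65, 0.8]
ensemble = []  # (threshold, polarity, alpha) per round

for _ in range(5):  # boosting rounds
    # Weak learner: stump h(x) = polarity * sign(x - t) minimizing weighted error.
    err, t, pol = min(
        (sum(w for (x, y), w in zip(data, weights)
             if p * (1 if x > th else -1) != y), th, p)
        for th in thresholds for p in (1, -1))
    err = max(err, 1e-9)
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((t, pol, alpha))
    # Up-weight mistakes, down-weight correct points, then renormalize.
    weights = [w * math.exp(-alpha * y * pol * (1 if x > t else -1))
               for (x, y), w in zip(data, weights)]
    total = sum(weights)
    weights = [w / total for w in weights]

def predict(x):
    s = sum(a * p * (1 if x > t else -1) for t, p, a in ensemble)
    return 1 if s > 0 else -1

print([predict(x) == y for x, y in data])
```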
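One split of a random-projection tree (Lecture 23, reading 14): project the points onto a random direction and split at the median, in contrast to a kd-tree's axis-aligned split.

```python
import random

def rp_split(points):
    # Draw a random direction, project every point onto it,
    # and split the set at the median projection.
    d = len(points[0])
    direction = [random.gauss(0, 1) for _ in range(d)]
    projections = [sum(p[i] * direction[i] for i in range(d)) for p in points]
    median = sorted(projections)[len(projections) // 2]
    left = [p for p, proj in zip(points, projections) if proj <= median]
    right = [p for p, proj in zip(points, projections) if proj > median]
    return left, right

random.seed(0)
pts = [tuple(random.gauss(0, 1) for _ in range(5)) for _ in range(100)]
left, right = rp_split(pts)
print(len(left), len(right))  # roughly balanced split
```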
- Readings for streaming approximation of statistics over one dimension
- (1) The space complexity of approximating the frequency moments
- N. Alon, Y. Matias, M. Szegedy in STOC 1996 and JCSS 1999
- (2) Approximate frequency counts over data streams
- G.S. Manku, R. Motwani in VLDB 2002
- (3) Space/Time Trade-offs in Hash Coding with Allowable Errors
- B.H. Bloom in Communications of the ACM 13(7), pp. 422-426, 1970
- (4) Lossy Conservative Update (LCU) Sketch: Succinct Approximate Count Storage
- A. Goyal and H. Daumé III in AAAI 2011
- Reading for hashing
- (5) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
- A. Andoni and P. Indyk in CACM 2008
- Readings for learning
- (6) An introduction to computational learning theory
- M. Kearns and U. Vazirani, 1994
- (7) The elements of statistical learning
- T. Hastie, R. Tibshirani, and J. Friedman, 2001
- (8) Large-Scale Machine Learning with Stochastic Gradient Descent
- L. Bottou in Compstat 2010
- (9) Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent
- W. Xu, arXiv 2011
- (10) A Stochastic Quasi-Newton Method for Online Convex Optimization
- N.N. Schraudolph, J. Yu, and S. Günter in AISTATS 2007
- (11) A Reliable Effective Terascale Linear Learning System
- A. Agarwal, O. Chapelle, M. Dudik, and J. Langford, arXiv 2011-2013
- (12) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein in Foundations and Trends in Machine Learning 2011
- (13) HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- F. Niu, B. Recht, C. Ré, and S.J. Wright in NIPS 2011
- Readings for trees
- (14) Random projection trees and low dimensional manifolds
- S. Dasgupta and Y. Freund in STOC 2008