Course Schedule (Evolving)

Lecture recordings from Echo360 can be accessed here.

Date Day Topic Materials
9/3 Tue Course overview. Probability review, Markov's inequality. Estimating set size by counting duplicates. Slides.
Randomized Methods, Sketching & Streaming
9/5 Thu Chebyshev's inequality. Random hashing for efficient lookup and load balancing. 2-universal and pairwise independent hashing. Slides. Some notes (Arora and Kothari at Princeton) proving that the ax+b mod p hash function described in class in 2-universal.
9/10 Tue Union bound. Exponential tail bounds (Bernstein and Chernoff). Example applications. Slides. Some notes (Goemans at MIT) showing how to prove the Chernoff bound using the moment generating function + Markov's inequality approach discussed in class.
9/12 Thu Hashing continued. Bloom filters and their applications. Hashing for distinct elements. Slides. I've added a sketch of the correct Bloom filter analysis. Also see here. See here for some explaination of why a version of a Bloom filter with no false negatives cannot be achieved without using a lot of space.
9/17 Tue Distinct elements continued. Flajolet-Martin and HyperLogLog. Jaccard similarity for audio fingerprinting, document comparision, etc. The median trick. Slides. The 2007 paper introducing the popular HyperLogLog distinct elements algorithm. Chapter 4 of Mining of Massive Datasets, with content on bloom filters, distinct item counting.
9/19 Thu Jaccard similarity search with MinHash. Locality sensitive hashing and nearest neighbor search. Slides. Reading: Chapter 3 of Mining of Massive Datasets, with content on Jaccard similarity, MinHash, and locality sensitive hashing.
9/24 Tue Finish up MinHash and LSH. SimHash for consine similarity. Slides. Reading: Chapter 3 of Mining of Massive Datasets, with content on Jaccard similarity, MinHash, and locality sensitive hashing.
9/26 Thu The frequent elements problem. Misra-Gries summaries. Count-min sketch. Slides. Reading: Notes (Amit Chakrabarti at Dartmouth) on streaming algorithms. See Chapters 2 and 4 for frequent elements. Some more notes on the frequent elements problem. A website with lots of resources, implementations, and example applications of count-min sketch.
10/1 Tue Randomized dimensionality reduction and the Johnson-Lindenstrauss lemma. Applications to regression, clustering. Slides: Compressed/cleaned up, Raw from class. Reading: Chapter 2.7 of Foundations of Data Science on the Johnson-Lindenstrauss lemma. Notes on the JL-Lemma (Anupam Gupta CMU). Linear Algebra Review: Khan academy.
10/3 Thu Finish up JL Lemma. Slides. The Fast JL transform: speeding up random projection with the Fast Fourier transform. Sparse random projections which can be multiplied by more quickly. JL type random projections for the l1 norm using Cauchy instead of Gaussian random matrices.
Spectral Methods
10/8 Tue Principal component analysis, low-rank approximation, dimensionality reduction. Slides: Compressed/cleaned up, Raw from class. Reading: Chapter 3 of Foundations of Data Science and Chapter 11 of Mining of Massive Datasets on low-rank approximation and the SVD.
10/10 Thu Eigencomposition and application to PCA and low-rank approximation. Slides: Compressed/cleaned up, Raw from class.
10/15 Tue No Class, Monday Schedule.
10/17 Thu Midterm (In Class) Study guide and review questions.
10/22 Tue The singular value decomposition and its connection to eigendecomposition/PCA/low-rank approximation.
10/24 Thu Linear algebraic view of graphs. Applications to spectral clustering, community detection, network visualization.
10/29 Tue Computing the SVD: power method, Krylov methods. Problem Set 3 Released. On spectral methods. Due 11/11 (tentative).
Optimization
10/31 Thu Gradient descent. Analysis for convex functions, example applications.
11/5 Tue Stochastic and online gradient descent. Application to neural networks, non-convex analysis.
11/7 Thu Optimization for least squares regression. Linear algebraic and polynomial viewpoint. Iterative methods, connections to gradient descent and its variants.
11/12 Tue Optimization approaches for hard problems: alternating minimization and the EM algorithm. k-means clustering.
11/14 Thu TBD. Probem Set 4 Released. On optimization. Due 12/6 (tentative)
Assorted Topics
11/19 Tue Compressed sensing, Restricted isometry property, basis pursuit.
11/21 Thu Discrete Fourier transform, fast Fourier transform.
11/26 Tue No Class, Thanksgiving Recess.
11/28 Thu No Class, Thanksgiving Recess.
12/3 Tue Fourier methods continued.
12/5 Thu High-dimensional geometry, isoperimetric inequality.
12/10 Tue Differential privacy, algorithmic fairness.
12/19 Thu Final (10:30am-12:30pm in Thompson 104)