### March 18

**Speaker:** Yatong Chen

**Title:** On Making Stochastic Classifiers Deterministic [paper]

**Abstract:** Stochastic classifiers arise in a number of machine learning problems and have become especially prominent of late, as they often result from constrained optimization problems, e.g. for fairness, churn, or custom losses. Despite their utility, the inherent randomness of stochastic classifiers can make them problematic to use in practice for a variety of reasons. In this paper, we attempt to answer the theoretical question of how well a stochastic classifier can be approximated by a deterministic one, and compare several different approaches, proving lower and upper bounds. We also experimentally investigate the pros and cons of these methods, not only in regard to how successfully each deterministic classifier approximates the original stochastic classifier, but also in terms of how well each addresses the other issues that can make stochastic classifiers undesirable.
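As a concrete point of reference, the simplest way to derandomize a binary stochastic classifier is to threshold its acceptance probability. The toy sketch below is my own illustration of that baseline (not one of the paper's constructions); it also shows the pointwise disagreement such rounding incurs:

```python
import random

def stochastic_predict(p, rng):
    """A binary stochastic classifier: output 1 with probability p."""
    return 1 if rng.random() < p else 0

def deterministic_predict(p):
    """Naive derandomization: threshold the acceptance probability."""
    return 1 if p >= 0.5 else 0

def expected_disagreement(p):
    """On a point with acceptance probability p, the stochastic classifier
    disagrees with the thresholded one with probability min(p, 1 - p)."""
    return min(p, 1.0 - p)

rng = random.Random(0)
sample = stochastic_predict(0.9, rng)

# Average disagreement over a few example points.
probs = [0.1, 0.45, 0.5, 0.9]
avg_gap = sum(expected_disagreement(p) for p in probs) / len(probs)
```

Points where the stochastic classifier is nearly fair (p close to 1/2) are exactly where any deterministic rounding must disagree often.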

### March 11

**Speaker:** Manuj Mukherjee

**Title:** Approximating Probability Distributions by ReLU Networks

**Abstract:** How many neurons are needed to approximate a target probability distribution using a neural network with a given input distribution and approximation error? This paper examines this question for the case when the input distribution is uniform, and the target distribution belongs to the class of histogram distributions. We obtain a new upper bound on the number of required neurons, which is strictly better than previously existing upper bounds. The key ingredient in this improvement is an efficient construction of the neural nets representing piecewise linear functions. We also obtain a lower bound on the minimum number of neurons needed to approximate the histogram distributions.
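The key ingredient mentioned above rests on expressing piecewise linear functions with ReLUs. As a minimal illustration (my own sketch, not the paper's construction), any continuous piecewise linear function on an interval can be written as a constant plus one ReLU term per kink:

```python
def relu(x):
    return max(0.0, x)

def pwl_from_relus(kinks, slope_changes, intercept):
    """Build f(x) = intercept + sum_i c_i * relu(x - t_i): a continuous
    piecewise linear function using one ReLU 'neuron' per kink t_i, where
    c_i is the change in slope at t_i."""
    def f(x):
        return intercept + sum(c * relu(x - t)
                               for t, c in zip(kinks, slope_changes))
    return f

# Example: f has slope 2 on [0, 0.5] and slope 0 on [0.5, 1], with f(0) = 0.
# Slope changes: +2 at t = 0 and -2 at t = 0.5, so two ReLU neurons suffice.
f = pwl_from_relus([0.0, 0.5], [2.0, -2.0], 0.0)
```

Pushing a uniform input through such a function is how a network can reshape the input distribution, which is why counting the ReLUs needed per piece drives the neuron bounds.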

**Bio:** Manuj Mukherjee is a postdoc at Bar Ilan University, Ramat Gan, Israel. He completed his PhD at the Indian Institute of Science, Bangalore. Before joining Bar Ilan University, he spent a couple of years as a postdoc at Telecom Paris (formerly Telecom ParisTech) in Paris, France.

### March 4 (rescheduled to 2pm)

**Speaker:** Andrew Stolman

**Title:** Challenges in geometric graph embeddings

**Abstract:** Many machine learning methods rely on exploiting the geometry of real-valued data; network-structured data, however, is fundamentally discrete and combinatorial. To bridge this gap, there is a growing body of work on learning real-valued embeddings of graph vertices that can then be fed to downstream ML tasks, including DeepWalk, Node2Vec, and Graph Convolutional Networks. This line of work appears to take for granted that an embedding into a low-dimensional geometry which preserves desirable properties exists. In this talk, I will cover some recent theoretical and practical results which cast doubt on this assumption.

### February 25

**Speaker:** Zhaowei Zhu

**Title:** Learning with instance-dependent label noise: A sample sieve approach

**Abstract:** In this talk, I will continue the introduction of peer loss and present the confidence regularizer it inspired, which helps mitigate the effects of noisy labels. Building on the confidence regularizer, we further design a dynamic sample sieve to safely filter out corrupted instances. This work has been accepted at ICLR 2021. Both theoretical and empirical results will be presented.

### February 18

**Speaker:** C. Seshadhri

**Title:** Random walks and forbidden minors III: poly(d/ε)-time partition oracles for minor-free graph classes [paper]

**Abstract:** Consider the family of bounded degree graphs in any minor-closed family (such as planar graphs). Let d be the degree bound and n be the number of vertices of such a graph. Graphs in these classes have hyperfinite decompositions, where, for a sufficiently small ε > 0, one removes εdn edges to get connected components of size independent of n. An important tool for sublinear algorithms and property testing for such classes is the partition oracle, introduced by the seminal work of Hassidim-Kelner-Nguyen-Onak (FOCS 2009). A partition oracle is a local procedure that gives consistent access to a hyperfinite decomposition, without any preprocessing. Given a query vertex v, the partition oracle outputs the component containing v in time independent of n. All the answers are consistent with a single hyperfinite decomposition.

An important open problem in property testing is constructing partition oracles that run in poly(d/ε) time per query. We resolve this open problem, obtaining poly(d/ε)-query algorithms for additive εn-approximations for problems such as maximum matching, minimum vertex cover, maximum independent set, and minimum dominating set for these graph families.

Joint work with Akash Kumar and Andrew Stolman.

### February 11

**Speaker:** Sabyasachi Basu

**Title:** Property testing in hyperfinite graphs [paper]

**Abstract:** We briefly go over the general idea behind property testing, and discuss property testing in the bounded degree model and in some special graph classes. More specifically, we cover some major results due to Newman and Sohler pertaining to a special class of graphs called hyperfinite graphs. These can broadly be seen as an extension of the class of planar graphs, and the results use a characterization of such graphs in terms of structures called k-discs. The results also provide a useful characterization of testable properties.
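For intuition, a k-disc is the rooted k-hop neighborhood of a vertex. The minimal sketch below (my own illustration, not from the paper) computes the vertex set of a k-disc by breadth-first search; the Newman-Sohler characterization looks at the distribution of such discs over a random root:

```python
from collections import deque

def k_disc(adj, root, k):
    """Return a dict mapping each vertex within distance k of root to its
    distance from root: the vertex set of the rooted k-disc, via BFS."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # don't expand past the k-th level
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# On a 6-cycle, the 1-disc of any vertex is a path on 3 vertices.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
disc = k_disc(cycle6, 0, 1)
```

In a bounded degree graph the k-disc has size independent of n, which is what makes sampling k-discs a sublinear-time primitive.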

### February 4

**Speaker:** Reilly Raab

**Title:** Bayesian Truth Serum: Overview, Extensions and Limitations

**Abstract:** The Bayesian Truth Serum, proposed by Drazen Prelec in 2004, is an interesting and seminal mechanism for eliciting honest information from independent agents in Nash equilibrium without requiring the mechanism to have access to ground truth, though it makes several limiting assumptions. Several attempts have been made to generalize the mechanism to more robust settings, with mixed success. I will present the mechanism, discuss how and why it works, and detail my own work generalizing it to new settings.
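For reference, here is a sketch of the BTS scoring rule as I understand it from Prelec's paper (the function and example are mine): each respondent earns an information score, rewarding answers that are more common than collectively predicted, plus an α-weighted prediction score for accurately predicting the empirical answer distribution:

```python
import math

def bts_scores(answers, predictions, alpha=1.0):
    """Bayesian Truth Serum scores (sketch of Prelec 2004).
    answers[i]     : index of the option endorsed by respondent i
    predictions[i] : respondent i's predicted fraction endorsing each option
                     (all entries must be strictly positive)
    score_i = log(xbar[k_i] / ybar[k_i])                       # information
            + alpha * sum_k xbar[k] * log(y_i[k] / xbar[k])    # prediction
    where xbar is the empirical answer distribution and ybar the geometric
    mean of the respondents' predictions."""
    n, m = len(answers), len(predictions[0])
    xbar = [answers.count(k) / n for k in range(m)]
    ybar = [math.exp(sum(math.log(p[k]) for p in predictions) / n)
            for k in range(m)]
    scores = []
    for i in range(n):
        info = math.log(xbar[answers[i]] / ybar[answers[i]])
        pred = alpha * sum(xbar[k] * math.log(predictions[i][k] / xbar[k])
                           for k in range(m) if xbar[k] > 0)
        scores.append(info + pred)
    return scores

# Sanity check: if every respondent predicts the empirical distribution
# exactly, both score components vanish.
s = bts_scores([0, 0, 1, 1], [[0.5, 0.5]] * 4)
```

The "surprisingly common" intuition lives in the information score: an answer endorsed more often than the (geometric-mean) predictions anticipated scores positively.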

### January 21

**Speaker:** Raghuvansh R. Saxena (graduate student, Princeton University)

**Title:** Interactive Error Resilience and the Surprising Power of Feedback

**Abstract:** Interactive error correcting codes are codes that encode a two-party communication protocol into an error-resilient protocol that succeeds even if a constant fraction of the communicated symbols are adversarially corrupted, at the cost of increasing the communication by a constant factor. The fraction of corruptions that such codes can protect against is called the error resilience. Several recent results have shown that drastic gains in error resilience can be achieved by using interactive codes that implement "feedback". I shall review (at least) two of these works in this talk.

Based on joint works with Klim Efremenko and Gillat Kol.

### January 14

**Speaker:** Yang Liu

**Title:** Learning when label noise presents

**Abstract:** In this talk, I will first explain the potential harms of training with corrupted labels, including the drop in training accuracy and the violation of fairness guarantees. I will then discuss popular solutions in the literature and their caveats. If time permits, I will introduce a series of new results from my group on a family of new robust loss functions, which I call peer loss, that help mitigate the effects of noisy labels. Both theoretical and empirical results will be presented.
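As a rough illustration of the peer loss idea (my own sketch, not the exact formulation from the papers): each sample's loss is offset by the loss of a randomly paired prediction evaluated against an independently drawn label, which cancels the incentive to fit class-independent noise:

```python
import random

def zero_one(pred, label):
    return 0.0 if pred == label else 1.0

def peer_loss(preds, labels, loss=zero_one, rng=None):
    """Peer-loss-style empirical risk: for each sample i, subtract the loss
    of the prediction on a random peer sample j scored against the label of
    an independently drawn peer sample k."""
    rng = rng or random.Random(0)
    n = len(preds)
    total = 0.0
    for i in range(n):
        j = rng.randrange(n)   # peer index for the prediction
        k = rng.randrange(n)   # independent peer index for the label
        total += loss(preds[i], labels[i]) - loss(preds[j], labels[k])
    return total / n

# A perfect classifier: the first term is zero on every sample, and the
# peer term can only subtract, so the peer loss is non-positive.
labels = [0, 1, 0, 1]
val = peer_loss(labels, labels)
```

A constant (or random) classifier gains nothing from the peer term, since its loss on matched and mismatched pairs is identically distributed; that is the mechanism behind the noise robustness.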

### January 7

**Speaker:** Noujan Pashanasangi

**Title:** Near-Linear Time Homomorphism Counting in Bounded Degeneracy Graphs: The Barrier of Long Induced Cycles [paper]

**Abstract:** Counting homomorphisms of a constant sized pattern graph H in an input graph G is a fundamental computational problem. There is a rich history of studying the complexity of this problem, under various constraints on the input G and the pattern H. Given the significance of this problem and the large sizes of modern inputs, we investigate when near-linear time algorithms are possible. We focus on the case when the input graph has bounded degeneracy, a commonly studied and practically relevant class for homomorphism counting. It is known from previous work that for certain classes of H, H-homomorphisms can be counted exactly in near-linear time in bounded degeneracy graphs. Can we precisely characterize the patterns H for which near-linear time algorithms are possible? We completely resolve this problem, discovering a clean dichotomy using fine-grained complexity. Let m denote the number of edges in G. We prove the following: if the largest induced cycle in H has length at most 5, then there is an O(*m* log *m*) algorithm for counting H-homomorphisms in bounded degeneracy graphs. If the largest induced cycle in H has length at least 6, then (assuming standard fine-grained complexity conjectures) there is a constant *y* > 0, such that there is no o(*m*^{1+*y*}) time algorithm for counting H-homomorphisms.
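To make the counting problem itself concrete, here is the naive brute-force homomorphism counter (a toy sketch of mine, taking |V(G)|^|V(H)| time, in contrast with the near-linear algorithms in the paper):

```python
from itertools import product

def count_homomorphisms(H, G):
    """Count graph homomorphisms from pattern H to host G by brute force:
    enumerate all maps V(H) -> V(G) and keep those sending every edge of H
    to an edge of G. H and G are dicts: vertex -> set of neighbours."""
    hv = list(H)
    count = 0
    for image in product(list(G), repeat=len(hv)):
        phi = dict(zip(hv, image))
        if all(phi[v] in G[phi[u]] for u in H for v in H[u]):
            count += 1
    return count

# Homomorphisms from a single edge (K2) into a triangle (K3): any ordered
# pair of distinct triangle vertices works, so 3 * 2 = 6.
K2 = {0: {1}, 1: {0}}
K3 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

Since adjacent pattern vertices must map to distinct, adjacent host vertices, homomorphisms from K3 to K3 are exactly its 6 vertex permutations.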

### April 27

**Speaker:** Sabyasachi Basu

**Title:** Semi-supervised learning with graphs and harmonic functions [paper]

**Abstract:** This is based on a paper from 2003 by Zhu, Ghahramani and Lafferty, that commonly goes by the name ZGL. Their work focuses on semi-supervised learning on graphs, where labelled data is scarce. The authors pose the learning problem as a Gaussian random field on the graph, and the natural choice for the objective is then minimised by a harmonic function. Such a harmonic function can be interpreted in terms of random walks, which implies connections between the learning problem and topics in electrical networks and spectral graph theory.
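A minimal sketch of the harmonic-function solution (my own illustration of the ZGL idea, not their implementation): clamp the labeled vertices and repeatedly replace each unlabeled value by the mean of its neighbours' values, which converges to the harmonic minimiser on connected graphs:

```python
def harmonic_labels(adj, labeled, iters=1000):
    """ZGL-style harmonic labels by iterative neighbourhood averaging.
    adj     : dict vertex -> list of neighbours
    labeled : dict vertex -> clamped label value in [0, 1]
    Unlabeled vertices start at 0.5 and are relaxed to the harmonic
    solution: f(v) = mean of f over the neighbours of v."""
    f = {v: labeled.get(v, 0.5) for v in adj}
    for _ in range(iters):
        for v in adj:
            if v not in labeled:
                f[v] = sum(f[u] for u in adj[v]) / len(adj[v])
    return f

# Path 0-1-2-3 with f(0) = 0 and f(3) = 1: the harmonic values at the
# interior vertices are 1/3 and 2/3, matching the random-walk
# interpretation (probability of absorption at the vertex labeled 1).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
f = harmonic_labels(path, {0: 0.0, 3: 1.0})
```

The random-walk reading is exactly the connection to electrical networks the abstract mentions: f(v) is the voltage at v when the labeled vertices are held at their label values.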

### April 20

**Speaker:** Andrew Stolman

**Title:** Inductive Representation Learning on Large Graphs [paper]

**Abstract:** Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
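Stripped of the learned weight matrices and nonlinearity, the core mean-aggregation step can be sketched as follows (a simplification for illustration, not the actual GraphSAGE layer): each node's new representation combines its own features with an aggregate of its neighbours' features, so the same function applies to nodes unseen at training time:

```python
import math

def sage_mean_layer(adj, feats):
    """One GraphSAGE-style mean-aggregation step, without learned weights:
    new h_v = normalize(concat(h_v, mean of neighbour features))."""
    out = {}
    for v, nbrs in adj.items():
        dim = len(feats[v])
        mean = [sum(feats[u][i] for u in nbrs) / len(nbrs)
                for i in range(dim)]
        h = feats[v] + mean                      # concatenation
        norm = math.sqrt(sum(x * x for x in h)) or 1.0
        out[v] = [x / norm for x in h]           # L2-normalize
    return out

# A 3-node path with 2-dimensional node features.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}
h = sage_mean_layer(adj, feats)
```

In the real model, the concatenated vector is multiplied by a learned matrix and passed through a nonlinearity, and neighbours are sampled rather than fully aggregated; the inductive property comes from learning that function rather than per-node embeddings.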

### April 6

**Speaker:** C. Seshadhri

**Title:** The impossibility of low-rank representations for triangle-rich complex networks, part 2 [video][paper]

**Abstract:** A common and increasingly prevalent technique for machine learning in graphs is "low-dimensional graph representation". Given a graph (like a social network), one converts the nodes to vectors in a low-dimensional space. These vectors become "features" for the nodes, which are then used for various ML tasks. Implicitly, these conversions are based on models of social networks, where proximity in low-dimensional space corresponds to edges of a graph. We argue that this model, contrary to popular belief, is fundamentally flawed. We formally prove that low-dimensional embeddings fail to capture the triangle structure of real social networks, thereby failing to model the social structure of the graph. We do a series of empirical studies that validate our theory.

### March 30

**Speaker:** C. Seshadhri

**Title:** The impossibility of low-rank representations for triangle-rich complex networks, part 1 [video][paper]

**Abstract:** A common and increasingly prevalent technique for machine learning in graphs is "low-dimensional graph representation". Given a graph (like a social network), one converts the nodes to vectors in a low-dimensional space. These vectors become "features" for the nodes, which are then used for various ML tasks. Implicitly, these conversions are based on models of social networks, where proximity in low-dimensional space corresponds to edges of a graph. We argue that this model, contrary to popular belief, is fundamentally flawed. We formally prove that low-dimensional embeddings fail to capture the triangle structure of real social networks, thereby failing to model the social structure of the graph. We do a series of empirical studies that validate our theory.

### February 24

**Speaker:** Yatong Chen

**Title:** Decoupled smoothing on graphs

**Abstract:** Graph smoothing methods are an extremely popular family of approaches for semi-supervised learning. The choice of graph used to represent relationships in these learning problems is often a more important decision than the particular algorithm or loss function used, yet this choice is less well-studied in the literature. In this work, we demonstrate that for social networks, the basic friendship graph itself may often not be the appropriate graph for predicting node attributes using graph smoothing. More specifically, standard graph smoothing is designed to harness the social phenomenon of homophily whereby individuals are similar to “the company they keep.” We present a decoupled approach to graph smoothing that decouples notions of “identity” and “preference,” resulting in an alternative social phenomenon of monophily whereby individuals are similar to “the company they're kept in,” as observed in recent empirical work. Our model results in a rigorous extension of the Gaussian Markov Random Field (GMRF) models that underlie graph smoothing, interpretable as smoothing on an appropriate auxiliary graph of weighted or unweighted two-hop relationships.
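The auxiliary two-hop graph mentioned at the end can be sketched directly (my own minimal illustration): weight each ordered pair of vertices by their number of common neighbours, i.e. the entries of A² with the diagonal dropped, and then smooth on that graph instead of the friendship graph:

```python
def two_hop_graph(adj):
    """Weighted two-hop graph: the weight of (u, v) is the number of
    common neighbours of u and v, i.e. (A^2)_{uv} with the diagonal
    removed. adj is a dict: vertex -> list of neighbours."""
    w = {}
    for u, nbrs in adj.items():
        for x in nbrs:          # walk u -> x -> v of length two
            for v in adj[x]:
                if v != u:
                    w[(u, v)] = w.get((u, v), 0) + 1
    return w

# Star with centre 0 and leaves 1..3: every pair of leaves shares the
# centre, so each ordered leaf pair gets weight 1; the centre has no
# two-hop neighbours at all.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
w = two_hop_graph(star)
```

The star example is the monophily picture in miniature: the leaves may share nothing with the centre they link to, yet they resemble one another, and only the two-hop graph exposes that.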

### February 10

**Speaker:** Konstantinos Zampetakis

**Title:** Coloring triangle-free graphs via semi-random methods

**Abstract:** The chromatic number of a graph is the least number of colors needed to color its vertices such that adjacent vertices receive different colors. For general graphs the chromatic number can be as large as the maximum degree plus one (e.g. cliques); however, we should be able to improve on this bound for graphs satisfying certain sparsity conditions. Specifically, we will consider graphs that contain no triangles (triangle-free graphs) and prove that their chromatic number is O(\Delta/\log \Delta), where \Delta is the maximum degree, using a seminal technique of probabilistic combinatorics known as the semi-random method.
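For comparison, the maximum-degree-plus-one bound is achieved by the elementary greedy algorithm; this sketch (mine, for context) shows the baseline that the semi-random method improves on for triangle-free graphs:

```python
def greedy_coloring(adj):
    """Greedy colouring: scan the vertices in order and give each the
    smallest colour not already used by a coloured neighbour. Since a
    vertex has at most (max degree) coloured neighbours, this never needs
    more than max degree + 1 colours."""
    color = {}
    for v in adj:
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color

# A 5-cycle has maximum degree 2, so greedy uses at most 3 colours
# (and the odd cycle actually needs 3).
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
col = greedy_coloring(cycle5)
```

The semi-random method replaces this one-shot scan with many rounds of randomized partial colouring plus corrections, which is where the logarithmic saving for triangle-free graphs comes from.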

### February 3

**Speaker:** C. Seshadhri

**Title:** Self-improving algorithms

**Abstract:** How does one beat worst-case analysis for, say, a simple problem like sorting? Typically, in the real world, inputs come from some (ill-defined) "source", and our aim is to process inputs from that source. If one could exploit structure in the inputs, one may be able to design faster algorithms. This talk is about a theoretical framework for such results, called self-improving algorithms. We assume an unknown but fixed distribution of inputs. As it processes more inputs, the self-improving algorithm learns about the distribution and eventually converges to the optimal algorithm for that distribution. We will specifically look at self-improving sorting algorithms.
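A toy instantiation of the learn-then-exploit idea (my own sketch, not the algorithm from the talk): learn bucket boundaries from sample inputs drawn from the unknown distribution, then sort fresh inputs from the same distribution by routing elements into those buckets:

```python
import bisect
import random

def learn_buckets(samples, k):
    """Training phase: pool the sample inputs and keep k - 1 evenly spaced
    order statistics as bucket boundaries, so that under the source
    distribution each bucket receives roughly an equal share of elements."""
    pool = sorted(x for s in samples for x in s)
    return [pool[(i * len(pool)) // k] for i in range(1, k)]

def bucket_sort(xs, boundaries):
    """Operation phase: route each element to its bucket by binary search,
    then sort the (hopefully small) buckets and concatenate. Always
    correct; fast when the learned boundaries match the distribution."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for x in xs:
        buckets[bisect.bisect_left(boundaries, x)].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out

rng = random.Random(0)
samples = [[rng.gauss(0, 1) for _ in range(100)] for _ in range(5)]
bounds = learn_buckets(samples, 8)
xs = [rng.gauss(0, 1) for _ in range(100)]
```

When the boundaries approximate the distribution's quantiles, each bucket holds O(n/k) elements in expectation, so the operation phase approaches the distribution's optimal comparison cost while worst-case correctness is never at risk.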

### January 27

**Speaker:** Shweta Jain

**Title:** The power of pivoting for exact clique counting

### January 13

**Speaker:** Andrew Stolman

**Title:** Learning-Based Frequency Estimation Algorithms

### January 6

**Speaker:** Akash Kumar

**Title:** Online Convex Optimization and Regularity Lemmas (based on lecture notes by Luca Trevisan)