Nathan Kallus



Assistant Professor
Cornell University and Cornell Tech
Field member: ORIE, CS, Stats, CAM


   Google Scholar

Bloomberg Center #363
2 West Loop Road
New York, NY 10044


Research group

I'm always recruiting motivated and talented PhD students for my research group at the Cornell Tech campus in NYC. See here for more information.

Publications and working papers

  • Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes, with Y. Hu and X. Mao.

    • arXiv September 2019.

    • Abstract: We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter β. We show how this interpolates between two extremes that were previously studied in isolation: non-differentiable bandits (β≤1), where rate-optimal regret is achieved by running separate non-contextual bandits in different context regions, and parametric-response bandits (β=∞), where rate-optimal regret can be achieved with minimal or no exploration due to infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and non-differentiable contextual bandit problems and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.
  • Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes, with M. Uehara.

    • arXiv September 2019.

    • Abstract: Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian, time-invariant, and ergodic structure in efficient OPE. We first derive the efficiency limits for OPE when one assumes each of these structures. This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar. But, in ergodic time-invariant Markov decision processes, our bounds show that truly-off-policy evaluation is feasible, even with just one dependent trajectory, and provide the limits of how well we could hope to do. We develop a new estimator based on Double Reinforcement Learning (DRL) that leverages this structure for OPE. Our DRL estimator simultaneously uses estimated stationary density ratios and q-functions and remains efficient when both are estimated at slow, nonparametric rates and remains consistent when either is estimated consistently. We investigate these properties and the performance benefits of leveraging the problem structure for more efficient OPE.
  • Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, with M. Uehara.

    • arXiv August 2019.

    • Abstract: Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of q-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness efficiently.
  • Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination, with X. Mao and A. Zhou.

    • arXiv June 2019.

    • Abstract: The increasing impact of algorithmic decisions on people's lives compels us to scrutinize their fairness and, in particular, the disparate impacts that ostensibly-color-blind algorithms can have on different groups. Examples include credit decisioning, hiring, advertising, criminal justice, personalized medicine, and targeted policymaking, where in some cases legislative or regulatory frameworks for fairness exist and define specific protected classes. In this paper we study a fundamental challenge to assessing disparate impacts in practice: protected class membership is often not observed in the data. This is particularly a problem in lending and healthcare. We consider the use of an auxiliary dataset, such as the US census, that includes class labels but not decisions or outcomes. We show that a variety of common disparity measures are generally unidentifiable aside from some unrealistic cases, providing a new perspective on the documented biases of popular proxy-based methods. We provide exact characterizations of the sharpest-possible partial identification set of disparities either under no assumptions or when we incorporate mild smoothness constraints. We further provide optimization-based algorithms for computing and visualizing these sets, which enables reliable and robust assessments -- an important tool when disparity assessment can have far-reaching policy implications. We demonstrate this in two case studies with real data: mortgage lending and personalized medicine dosing.
  • Data Pooling in Stochastic Optimization, with V. Gupta.

    • Major revision in Management Science Special Issue on Data-Driven Prescriptive Analytics.

    • arXiv June 2019.

    • GitHub.

    • Abstract: Managing large-scale systems often involves simultaneously solving thousands of unrelated stochastic optimization problems, each with limited data. Intuition suggests one can decouple these unrelated problems and solve them separately without loss of generality. We propose a novel data-pooling algorithm called Shrunken-SAA that disproves this intuition. In particular, we prove that combining data across problems can outperform decoupling, even when there is no a priori structure linking the problems and data are drawn independently. Our approach does not require strong distributional assumptions and applies to constrained, possibly non-convex, non-smooth optimization problems such as vehicle-routing, economic lot-sizing or facility location. We compare and contrast our results to a similar phenomenon in statistics (Stein's Phenomenon), highlighting unique features that arise in the optimization setting that are not present in estimation. We further prove that as the number of problems grows large, Shrunken-SAA learns if pooling can improve upon decoupling and the optimal amount to pool, even if the average amount of data per problem is fixed and bounded. Importantly, we highlight a simple intuition based on stability that highlights when and why data-pooling offers a benefit, elucidating this perhaps surprising phenomenon. This intuition further suggests that data-pooling offers the most benefits when there are many problems, each of which has a small amount of relevant data. Finally, we demonstrate the practical benefits of data-pooling using real data from a chain of retail drug stores in the context of inventory management.
  • More Efficient Policy Learning via Optimal Retargeting.

    • arXiv June 2019.

    • GitHub.

    • Abstract: Policy learning can be used to extract individualized treatment regimes from observational data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a commonplace lack of overlap in the data for different actions, which can lead to unwieldy policy evaluation and poorly performing learned policies. We study a solution to this problem based on retargeting, that is, changing the population on which policies are optimized. We first argue that at the population level, retargeting may induce little to no bias. We then characterize the optimal reference policy centering and retargeting weights in both binary-action and multi-action settings. We do this in terms of the asymptotic efficient estimation variance of the new learning objective. We further consider bias regularization. Extensive empirical results in a simulation study and a case study of targeted job counseling demonstrate that retargeting is a fairly easy way to significantly improve any policy learning procedure.
  • Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning, with M. Uehara.

    • Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), to appear, 2019.

    • arXiv June 2019.

    • GitHub.

    • Abstract: Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if Q-functions are well-specified, but if they are not they can be worse than both IS and SNIS. DR also does not enjoy SNIS's inherent stability and boundedness. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS. On the way, we categorize various properties and classify existing estimators by them. Besides the theoretical guarantees, empirical studies suggest the new estimators provide advantages.
  • The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric, with A. Zhou.

    • Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), to appear, 2019.

    • arXiv February 2019.

    • Abstract: Where machine-learned predictive risk scores inform high-stakes decisions, such as bail and sentencing in criminal justice, fairness has been a serious concern. Recent work has characterized the disparate impact that such risk scores can have when used for a binary classification task. This may not account, however, for the more diverse downstream uses of risk scores and their non-binary nature. To better account for this, in this paper, we investigate the fairness of predictive risk scores from the point of view of a bipartite ranking task, where one seeks to rank positive examples higher than negative ones. We introduce the xAUC disparity as a metric to assess the disparate impact of risk scores and define it as the difference in the probabilities of ranking a random positive example from one protected group above a negative one from another group and vice versa. We provide a decomposition of bipartite ranking loss into components that involve the discrepancy and components that involve pure predictive ability within each group. We use xAUC analysis to audit predictive risk scores for recidivism prediction, income prediction, and cardiac arrest prediction, where it describes disparities that are not evident from simply comparing within-group predictive performance.
  • Deep Generalized Method of Moments for Instrumental Variable Analysis, with A. Bennett and T. Schnabel.

    • Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), to appear, 2019.

    • arXiv May 2019.

    • GitHub.

    • Abstract: Instrumental variable analysis is a powerful tool for estimating causal effects when randomization or full control of confounders is not possible. The application of standard methods such as 2SLS, GMM, and more recent variants is significantly impeded when the causal effects are complex, the instruments are high-dimensional, and/or the treatment is high-dimensional. In this paper, we propose the DeepGMM algorithm to overcome this. Our algorithm is based on a new variational reformulation of GMM with optimal inverse-covariance weighting that allows us to efficiently control very many moment conditions. We further develop practical techniques for optimization and model selection that make it particularly successful in practice. Our algorithm is also computationally tractable and can handle large-scale datasets. Numerical results show our algorithm matches the performance of the best tuned methods in standard settings and continues to work in high-dimensional settings where even recent methods break.
  • Assessing Disparate Impacts of Personalized Interventions: Identifiability and Bounds, with A. Zhou.

    • Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), to appear, 2019.

    • arXiv June 2019.

    • Abstract: Personalized interventions in social services, education, and healthcare leverage individual-level causal effect predictions in order to give the best treatment to each individual or to prioritize program interventions for the individuals most likely to benefit. While the sensitivity of these domains compels us to evaluate the fairness of such policies, we show that actually auditing their disparate impacts per standard observational metrics, such as true positive rates, is impossible since ground truths are unknown. Whether our data is experimental or observational, an individual's actual outcome under an intervention different than that received can never be known, only predicted based on features. We prove how we can nonetheless point-identify these quantities under the additional assumption of monotone treatment response, which may be reasonable in many applications. We further provide a sensitivity analysis for this assumption by means of sharp partial-identification bounds under violations of monotonicity of varying strengths. We show how to use our results to audit personalized interventions using partially-identified ROC and xROC curves and demonstrate this in a case study of a French job training dataset.
  • Policy Evaluation with Latent Confounders via Optimal Balance, with A. Bennett.

    • Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), to appear, 2019.

    • arXiv August 2019.

    • GitHub.

    • Abstract: Evaluating novel contextual bandit policies using logged data is crucial in applications where exploration is costly, such as medicine. But it usually relies on the assumption of no unobserved confounders, which is bound to fail in practice. We study the question of policy evaluation when we instead have proxies for the latent confounders and develop an importance weighting method that avoids fitting a latent outcome regression model. We show that unlike the unconfounded case no single set of weights can give unbiased evaluation for all outcome models, yet we propose a new algorithm that can still provably guarantee consistency by instead minimizing an adversarial balance objective. We further develop tractable algorithms for optimizing this objective and demonstrate empirically the power of our method when confounders are latent.
  • Comment: Entropy Learning for Dynamic Treatment Regimes.

  • Classifying Treatment Responders Under Causal Effect Monotonicity.

    • Proceedings of the 36th International Conference on Machine Learning (ICML), 97:3201--3210, 2019.

    • arXiv February 2019.

    • GitHub.

    • Abstract: In the context of individual-level causal inference, we study the problem of predicting whether someone will respond or not to a treatment based on their features and past examples of features, treatment indicator (e.g., drug/no drug), and a binary outcome (e.g., recovery from disease). As a classification task, the problem is made difficult by not knowing the example outcomes under the opposite treatment indicators. We assume the effect is monotonic, as in advertising's effect on a purchase or bail-setting's effect on reappearance in court: either it would have happened regardless of treatment, not happened regardless, or happened only depending on exposure to treatment. Predicting whether the latter is latently the case is our focus. While previous work focuses on conditional average treatment effect estimation, formulating the problem as a classification task rather than an estimation task allows us to develop new tools more suited to this problem. By leveraging monotonicity, we develop new discriminative and generative algorithms for the responder-classification problem. We explore and discuss connections to corrupted data and policy learning. We provide an empirical study with both synthetic and real datasets to compare these specialized algorithms to standard benchmarks.
  • Confounding-Robust Policy Improvement, with A. Zhou.

    • Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), 9289--9299, 2018.

    • Journal version: Major revision in Management Science.

    • arXiv May 2018.

    • Winner, INFORMS 2018 Data Mining Best Paper Award; 2nd place, INFORMS 2018 Junior Faculty Interest Group (JFIG) Paper Competition

    • Abstract: We study the problem of learning personalized decision policies from observational data while accounting for possible unobserved confounding in the data-generating process. Unlike previous approaches which assume unconfoundedness, i.e., no unobserved confounders affected treatment assignment as well as outcome, we calibrate policy learning for realistic violations of this unverifiable assumption with uncertainty sets motivated by sensitivity analysis in causal inference. Our framework for confounding-robust policy improvement optimizes the minimax regret of a candidate policy against a baseline or reference "status quo" policy, over an uncertainty set around nominal propensity weights. We prove that if the uncertainty set is well-specified, robust policy learning can do no worse than the baseline, and only improve if the data supports it. We characterize the adversarial subproblem and use efficient algorithmic solutions to optimize over parametrized spaces of decision policies such as logistic treatment assignment. We assess our methods on synthetic data and a large clinical trial, demonstrating that confounded selection can hinder policy learning and lead to unwarranted harm, while our robust approach guarantees safety and focuses on well-evidenced improvement.
  • Removing Hidden Confounding by Experimental Grounding, with A. M. Puli and U. Shalit.

    • Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), 10911--10920, 2018.

    • arXiv October 2018.

    • GitHub.

    • Spotlight at NeurIPS

    • Abstract: Observational data is being increasingly used as a means for making individual-level causal predictions and intervention recommendations. The foremost challenge of causal inference from observational data is hidden confounding, whose presence cannot be tested in data and can invalidate any causal conclusion. Experimental data does not suffer from confounding but is usually limited in both scope and scale. We introduce a novel method of using limited experimental data to correct the hidden confounding in causal effect models trained on larger observational data, even if the observational data does not fully overlap with the experimental data. Our method makes strictly weaker assumptions than existing approaches, and we prove conditions under which our method yields a consistent estimator. We demonstrate our method's efficacy using real-world data from a large educational experiment.
  • Balanced Policy Evaluation and Learning.

    • Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), 8909--8920, 2018.

    • arXiv May 2017.

    • Abstract: We present a new approach to the problems of evaluating and learning personalized decision policies from observational data of past contexts, decisions, and outcomes. Only the outcome of the enacted decision is available and the historical policy is unknown. These problems arise in personalized medicine using electronic health records and in internet advertising. Existing approaches use inverse propensity weighting (or, doubly robust versions) to make historical outcome (or, residual) data look like it were generated by a new policy being evaluated or learned. But this relies on a plug-in approach that rejects data points with a decision that disagrees with the new policy, leading to high variance estimates and ineffective learning. We propose a new, balance-based approach that too makes the data look like the new policy but does so directly by finding weights that optimize for balance between the weighted data and the target policy in the given, finite sample, which is equivalent to minimizing worst-case or posterior conditional mean square error. Our policy learner proceeds as a two-level optimization problem over policies and weights. We demonstrate that this approach markedly outperforms existing ones both in evaluation and learning, which is unsurprising given the wider support of balance-based weights. We establish extensive theoretical consistency guarantees and regret bounds that support this empirical success.
  • Causal Inference with Noisy and Missing Covariates via Matrix Factorization, with X. Mao and M. Udell.

    • Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), 6921--6932, 2018.

    • arXiv June 2018.

    • GitHub.

    • Abstract: Valid causal inference in observational studies often requires controlling for confounders. However, in practice measurements of confounders may be noisy, and can lead to biased estimates of causal effects. We show that we can reduce the bias caused by measurement noise using a large number of noisy measurements of the underlying confounders. We propose the use of matrix factorization to infer the confounders from noisy covariates, a flexible and principled framework that adapts to missing values, accommodates a wide variety of data types, and can augment many causal inference methods. We bound the error for the induced average treatment effect estimator and show it is consistent in a linear regression setting, using Exponential Family Matrix Completion preprocessing. We demonstrate the effectiveness of the proposed procedure in numerical experiments with both synthetic data and real clinical data.
  • Fairness under unawareness: assessing disparity when protected class is unobserved, with J. Chen, X. Mao, G. Svacha, and M. Udell.

    • Proceedings of the 2nd ACM Conference on Fairness, Accountability, and Transparency (FAT*), 339--348, 2019.

    • arXiv November 2018.

    • GitHub.

    • Abstract: Assessing the fairness of a decision making system with respect to a protected class, such as gender or race, is complicated by not being able to observe class membership labels due to legal or operational reasons. Probabilistic models for predicting the protected class based on observed proxy variables, such as surname and geolocation for race, are sometimes used to impute these missing labels. Such methods are known to be used by government regulators and have been observed to exaggerate disparities. The reason why is unknown, as is whether overestimation is always the case. We decompose the bias of estimating outcome disparity via an existing threshold-based imputation method into multiple interpretable bias sources, which explains when over- or underestimation occurs. We also propose an alternative weighted estimator that uses soft classification rules rather than hard imputation, and show that its bias arises simply from the conditional covariance of the outcome with the true class membership. We illustrate our results with numerical simulations as well as an application to a dataset of mortgage applications, using geolocation as a proxy for race. We confirm that the bias of threshold-based imputation is generally upward; however, its magnitude varies strongly with the threshold chosen due to the complex interplay of multiple sources of bias uncovered by our theoretical analysis. Our new weighted estimator tends to have a negative bias that is much simpler to analyze and reason about.
  • Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding, with X. Mao and A. Zhou.

    • Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 89:2281--2290, 2019.

    • arXiv October 2018.

    • Abstract: We study the problem of learning conditional average treatment effects (CATE) from observational data with unobserved confounders. The CATE function maps baseline covariates to individual causal effect predictions and is key for personalized assessments. Recent work has focused on how to learn CATE under unconfoundedness, i.e., when there are no unobserved confounders. Since CATE may not be identified when unconfoundedness is violated, we develop a functional interval estimator that predicts bounds on the individual causal effects under realistic violations of unconfoundedness. Our estimator takes the form of a weighted kernel estimator with weights that vary adversarially. We prove that our estimator is sharp in that it converges exactly to the tightest bounds possible on CATE when there may be unobserved confounders. Further, we study personalized decision rules derived from our estimator and prove that they achieve optimal minimax regret asymptotically. We assess our approach in a simulation study as well as demonstrate its application in the case of hormone replacement therapy by comparing conclusions from a real observational study and clinical trial.
  • Residual Unfairness in Fair Machine Learning from Prejudiced Data, with A. Zhou.

    • Proceedings of the 35th International Conference on Machine Learning (ICML), 80:2439--2448, 2018.

    • arXiv June 2018.

    • Abstract: Recent work in fairness in machine learning has proposed adjusting for fairness by equalizing accuracy metrics across groups and has also studied how datasets affected by historical prejudices may lead to unfair decision policies. We connect these lines of work and study the residual unfairness that arises when a fairness-adjusted predictor is not actually fair on the target population due to systematic censoring of training data by existing biased policies. This scenario is particularly common in the same applications where fairness is a concern. We characterize theoretically the impact of such censoring on standard fairness metrics for binary classifiers and provide criteria for when residual unfairness may or may not appear. We prove that, under certain conditions, fairness-adjusted classifiers will in fact induce residual unfairness that perpetuates the same injustices, against the same groups, that biased the data to begin with, thus showing that even state-of-the-art fair machine learning can have a "bias in, bias out" property. When certain benchmark data is available, we show how sample reweighting can estimate and adjust fairness metrics while accounting for censoring. We use this to study the case of Stop, Question, and Frisk (SQF) and demonstrate that attempting to adjust for fairness perpetuates the same injustices that the policy is infamous for.
  • Policy Evaluation and Optimization with Continuous Treatments, with A. Zhou.

    • Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 84:1243--1251, 2018.

    • arXiv February 2018.

    • GitHub.

    • Finalist, Best Paper of INFORMS 2017 Data Mining and Decision Analytics Workshop

    • Abstract: We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments. Previous work for discrete treatment/action spaces focuses on inverse probability weighting (IPW) and doubly robust (DR) methods that use a rejection sampling approach for evaluation and the equivalent weighted classification problem for learning. In the continuous setting, this reduction fails as we would almost surely reject all observations. To tackle the case of continuous treatments, we extend the IPW and DR approaches to the continuous setting using a kernel function that leverages treatment proximity to attenuate discrete rejection. Our policy estimator is consistent and we characterize the optimal bandwidth. The resulting continuous policy optimizer (CPO) approach using our estimator achieves convergent regret and approaches the best-in-class policy for learnable policy classes. We demonstrate that the estimator performs well and, in particular, outperforms a discretization-based benchmark. We further study the performance of our policy optimizer in a case study on personalized dosing based on a dataset of Warfarin patients, their covariates, and final therapeutic doses. Our learned policy outperforms benchmarks and nears the oracle-best linear policy.
  • Instrument-Armed Bandits.

    • Proceedings of the 29th International Conference on Algorithmic Learning Theory (ALT), 2018.

    • arXiv May 2017.

    • Abstract: We extend the classic multi-armed bandit (MAB) model to the setting of noncompliance, where the arm pull is a mere instrument and the treatment applied may differ from it, which gives rise to the instrument-armed bandit (IAB) problem. The IAB setting is relevant whenever the experimental units are human since free will, ethics, and the law may prohibit unrestricted or forced application of treatment. In particular, the setting is relevant in bandit models of dynamic clinical trials and other controlled trials on human interventions. Nonetheless, the setting has not been fully investigated in the bandit literature. We show that there are various and divergent notions of regret in this setting, all of which coincide only in the classic MAB setting. We characterize the behavior of these regrets and analyze standard MAB algorithms. We argue for a particular kind of regret that captures the causal effect of treatments but show that standard MAB algorithms cannot achieve sublinear control on this regret. Instead, we develop new algorithms for the IAB problem, prove new regret bounds for them, and compare them to standard MAB algorithms in numerical examples.
  • More robust estimation of sample average treatment effects using Kernel Optimal Matching in an observational study of spine surgical interventions, with B. Pennicooke and M. Santacatterina.

    • arXiv November 2018.

    • GitHub.

    • Abstract: Inverse probability of treatment weighting (IPTW), which has been used to estimate sample average treatment effects (SATE) using observational data, tenuously relies on the positivity assumption and the correct specification of the treatment assignment model, both of which are problematic assumptions in many observational studies. Various methods have been proposed to overcome these challenges, including truncation, covariate-balancing propensity scores, and stable balancing weights. Motivated by an observational study in spine surgery, in which positivity is violated and the true treatment assignment model is unknown, we present the use of optimal balancing by Kernel Optimal Matching (KOM) to estimate SATE. By uniformly controlling the conditional mean squared error of a weighted estimator over a class of models, KOM simultaneously mitigates issues of possible misspecification of the treatment assignment model and is able to handle practical violations of the positivity assumption, as shown in our simulation study. Using data from a clinical registry, we apply KOM to compare two spine surgical interventions and demonstrate how the result matches the conclusions of clinical trials that IPTW estimates spuriously refute.
  • Optimal Balancing of Time-Dependent Confounders for Marginal Structural Models, with M. Santacatterina.

    • arXiv June 2018.

    • GitHub.

    • Abstract: Marginal structural models (MSMs) estimate the causal effect of a time-varying treatment in the presence of time-dependent confounding via weighted regression. The standard approach of using inverse probability of treatment weighting (IPTW) can lead to high-variance estimates due to extreme weights and be sensitive to model misspecification. Various methods have been proposed to partially address this, including truncation and stabilized-IPTW to temper extreme weights and covariate balancing propensity score (CBPS) to address treatment model misspecification. In this paper, we present Kernel Optimal Weighting (KOW), a convex-optimization-based approach that finds weights for fitting the MSM that optimally balance time-dependent confounders while simultaneously controlling for precision, directly addressing the above limitations. KOW directly minimizes the error in estimation due to time-dependent confounding via a new decomposition as a functional. We further extend KOW to control for informative censoring. We evaluate the performance of KOW in a simulation study, comparing it with IPTW, stabilized-IPTW, and CBPS. We demonstrate the use of KOW in studying the effect of treatment initiation on time-to-death among people living with HIV and the effect of negative advertising on elections in the United States.
  • DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training.

    • arXiv February 2018.

    • Abstract: We study optimal covariate balance for causal inferences from observational data when rich covariates and complex relationships necessitate flexible modeling with neural networks. Standard approaches such as propensity weighting and matching/balancing fail in such settings due to miscalibrated propensity nets and inappropriate covariate representations, respectively. We propose a new method based on adversarial training of a weighting and a discriminator network that effectively addresses this methodological gap. This is demonstrated through new theoretical characterizations of the method as well as empirical results using both fully connected architectures to learn complex relationships and convolutional architectures to handle image confounders, showing how this new method can enable strong causal analyses in these challenging settings.
  • Learning Weighted Representations for Generalization Across Designs, with F. Johansson, U. Shalit, and D. Sontag.

    • arXiv February 2018.

    • Abstract: Predictive models that generalize well under distributional shift are often desirable and sometimes crucial to building robust and reliable machine learning applications. We focus on distributional shift that arises in causal inference from observational data and in unsupervised domain adaptation. We pose both of these problems as prediction under a shift in design. Popular methods for overcoming distributional shift make unrealistic assumptions such as having a well-specified model or knowing the policy that gave rise to the observed data. Other methods are hindered by their need for a pre-specified metric for comparing observations, or by poor asymptotic properties. We devise a bound on the generalization error under design shift, incorporating both representation learning and sample re-weighting. Based on the bound, we propose an algorithmic framework that does not require any of the above assumptions and which is asymptotically consistent. We empirically study the new framework using two synthetic datasets, and demonstrate its effectiveness compared to previous methods.
  • Recursive Partitioning for Personalization using Observational Data.

    • Proceedings of the 34th International Conference on Machine Learning (ICML), 70:1789--1798, 2017.

    • arXiv August 2016.

    • Winner, Best Paper of INFORMS 2016 Data Mining and Decision Analytics Workshop

    • Abstract: We study the problem of learning to choose from m discrete treatment options (e.g., news item or medical drug) the one with best causal effect for a particular instance (e.g., user or patient) where the training data consists of passive observations of covariates, treatment, and the outcome of the treatment. The standard approach to this problem is regress and compare: split the training data by treatment, fit a regression model in each split, and, for a new instance, predict all m outcomes and pick the best. By reformulating the problem as a single learning task rather than m separate ones, we propose a new approach based on recursively partitioning the data into regimes where different treatments are optimal. We extend this approach to an optimal partitioning approach that finds a globally optimal partition, achieving a compact, interpretable, and impactful personalization model. We develop new tools for validating and evaluating personalization models on observational data and use these to demonstrate the power of our novel approaches in a personalized medicine and a job training application.
  • Generalized Optimal Matching Methods for Causal Inference.

    • To appear, Journal of Machine Learning Research (JMLR), Accepted, 2019.

    • arXiv December 2016.

    • Abstract: We develop an encompassing framework and theory for matching and related methods for causal inference that reveal the connections and motivations behind various existing methods and give rise to new and improved ones. The framework is given by generalizing a new functional analytical characterization of optimal matching as minimizing worst-case conditional mean squared error given the observed data based on specific restrictions and assumptions. By generalizing these, we obtain a new class of generalized optimal matching (GOM) methods, for which we provide a single theory for tractability and consistency that applies generally to GOM. Many commonly used existing methods are included in GOM and using their GOM interpretation we extend these to new methods that judiciously and automatically trade off balance for variance and outperform their standard counterparts. As a subclass of GOM, we develop kernel optimal matching, which, as supported by new theory, is notable for combining the interpretability of matching methods, the non-parametric model-free consistency of optimal matching, the efficiency of well-specified regression, the judicious sample size selection of monotonic imbalance bounding methods, the double robustness of augmented inverse propensity weight estimators, and the model-selection flexibility of Gaussian-process regression. We discuss connections to and non-linear generalizations of equal percent bias reduction and its ramifications.
  • Dynamic Assortment Personalization in High Dimensions, with M. Udell.

    • To appear, Operations Research, Accepted, 2019.

    • arXiv October 2016.

    • Abstract: We demonstrate the importance of structural priors for effective, efficient large-scale dynamic assortment personalization. Assortment personalization is the problem of choosing, for each individual or consumer segment (type), a best assortment of products, ads, or other offerings (items) so as to maximize revenue. This problem is central to revenue management in e-commerce, online advertising, and multi-location brick-and-mortar retail, where both items and types can number in the thousands-to-millions. Data efficiency is paramount in this large-scale setting. A good personalization strategy must dynamically balance the need to learn consumer preferences and to maximize revenue. We formulate the dynamic assortment personalization problem as a discrete-contextual bandit with m contexts (customer types) and many arms (assortments of the n items). We assume that each type's preferences follow a simple parametric model with n parameters. In all, there are mn parameters, and existing literature suggests that order optimal regret scales as mn. However, this figure is orders of magnitude larger than the data available in large-scale applications, and imposes unacceptably high regret. In this paper, we impose natural structure on the problem: a small latent dimension, or low rank. In the static setting, we show that this model can be efficiently learned from surprisingly few interactions, using a time- and memory-efficient optimization algorithm that converges globally whenever the model is learnable. In the dynamic setting, we show that structure-aware dynamic assortment personalization can have regret that is an order of magnitude smaller than structure-ignorant approaches. We validate our theoretical results empirically.
  • Optimal A Priori Balance in the Design of Controlled Experiments.

    • Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(1):85--112, 2018.

    • arXiv December 2013.

    • Code.

    • Abstract: We develop a unified theory of designs for controlled experiments that balance baseline covariates a priori (before treatment and before randomization) using the framework of minimax variance and a new method called kernel allocation. We show that any notion of a priori balance must go hand in hand with a notion of structure, since with no structure on the dependence of outcomes on baseline covariates complete randomization (no special covariate balance) is always minimax optimal. Restricting the structure of dependence, either parametrically or non-parametrically, gives rise to certain covariate imbalance metrics and optimal designs. This recovers many popular imbalance metrics and designs previously developed ad hoc, including randomized block designs, pairwise-matched allocation and rerandomization. We develop a new design method called kernel allocation based on the optimal design when structure is expressed by using kernels, which can be parametric or non-parametric. Relying on modern optimization methods, kernel allocation, which ensures nearly perfect covariate balance without biasing estimates under model misspecification, offers sizable advantages in precision and power as demonstrated in a range of real and synthetic examples. We provide strong theoretical guarantees on variance, consistency and rates of convergence and develop special algorithms for design and hypothesis testing.
  • The Power and Limits of Predictive Approaches to Observational-Data-Driven Optimization, with D. Bertsimas.

    • Major revision in Operations Research.

    • arXiv May 2016.

    • Abstract: While data-driven decision-making is transforming modern operations, most large-scale data is of an observational nature, such as transactional records. These data pose unique challenges in a variety of operational problems posed as stochastic optimization problems, including pricing and inventory management, where one must evaluate the effect of a decision, such as price or order quantity, on an uncertain cost/reward variable, such as demand, based on historical data where decision and outcome may be confounded. Often, the data lacks the features necessary to enable sound assessment of causal effects and/or the strong assumptions necessary may be dubious. Nonetheless, common practice is to assign a decision an objective value equal to the best prediction of cost/reward given the observation of the decision in the data. While in general settings this identification is spurious, for optimization purposes it is only the objective value of the final decision that matters, rather than the validity of any model used to arrive at it. In this paper, we formalize this statement in the case of observational-data-driven optimization and study both the power and limits of predictive approaches to observational-data-driven optimization with a particular focus on pricing. We provide rigorous bounds on optimality gaps of such approaches even when optimal decisions cannot be identified from data. To study potential limits of predictive approaches in real datasets, we develop a new hypothesis test for causal-effect objective optimality. Applying it to interest-rate-setting data, we empirically demonstrate that predictive approaches can be powerful in practice but with some critical limitations.
  • From Predictive to Prescriptive Analytics, with D. Bertsimas.

    • Management Science, 2018.

    • arXiv February 2014.

    • Finalist, POMS Applied Research Challenge 2016

    • Abstract: In this paper, we combine ideas from machine learning (ML) and operations research and management science (OR/MS) in developing a framework, along with specific methods, for using data to prescribe decisions in OR/MS problems. In a departure from other work on data-driven optimization and reflecting our practical experience with the data available in applications of OR/MS, we consider data consisting, not only of observations of quantities with direct effect on costs/revenues, such as demand or returns, but predominantly of observations of associated auxiliary quantities. The main problem of interest is a conditional stochastic optimization problem, given imperfect observations, where the joint probability distributions that specify the problem are unknown. We demonstrate that our proposed solution methods are generally applicable to a wide range of decision problems. We prove that they are computationally tractable and asymptotically optimal under mild conditions even when data is not independent and identically distributed (iid) and even for censored observations. As an analogue to the coefficient of determination R², we develop a metric P termed the coefficient of prescriptiveness to measure the prescriptive content of data and the efficacy of a policy from an operations perspective. To demonstrate the power of our approach in a real-world setting we study an inventory management problem faced by the distribution arm of an international media conglomerate, which ships an average of 1 billion units per year. We leverage both internal data and public online data harvested from IMDb, Rotten Tomatoes, and Google to prescribe operational decisions that outperform baseline measures. Specifically, the data we collect, leveraged by our methods, accounts for an 88% improvement as measured by our coefficient of prescriptiveness.
  • Robust Sample Average Approximation, with D. Bertsimas and V. Gupta.

    • Mathematical Programming, 171(1--2):217--282, 2018.

    • arXiv August 2014.

    • Winner, Best Student Paper Award, MIT Operations Research Center 2013

    • Abstract: Sample average approximation (SAA) is a widely popular approach to data-driven decision-making under uncertainty. Under mild assumptions, SAA is both tractable and enjoys strong asymptotic performance guarantees. Similar guarantees, however, do not typically hold in finite samples. In this paper, we propose a modification of SAA, which we term Robust SAA, which retains SAA's tractability and asymptotic properties and, additionally, enjoys strong finite-sample performance guarantees. The key to our method is linking SAA, distributionally robust optimization, and hypothesis testing of goodness-of-fit. Beyond Robust SAA, this connection provides a unified perspective enabling us to characterize the finite sample and asymptotic guarantees of various other data-driven procedures that are based upon distributionally robust optimization. This analysis provides insight into the practical performance of these various methods in real applications. We present examples from inventory management and portfolio allocation, and demonstrate numerically that our approach outperforms other data-driven approaches in these applications.
  • Data-Driven Robust Optimization, with D. Bertsimas and V. Gupta.

    • Mathematical Programming, 167(2):235--292, 2018.

    • arXiv January 2014.

    • Finalist, INFORMS Nicholson Paper Competition 2013

    • Abstract: The last decade has seen an explosion in the availability of data for operations research applications as part of the Big Data revolution. Motivated by this data rich paradigm, we propose a novel schema for utilizing data to design uncertainty sets for robust optimization using statistical hypothesis tests. The approach is flexible and widely applicable, and robust optimization problems built from our new sets are computationally tractable, both theoretically and practically. Furthermore, optimal solutions to these problems enjoy a strong, finite-sample probabilistic guarantee. We also propose concrete guidelines for practitioners and illustrate our approach with applications in portfolio management and queueing. Computational evidence confirms that our data-driven sets significantly outperform conventional robust optimization techniques whenever data is available.
  • A Framework for Optimal Matching for Causal Inference.

    • Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 54:372--381, 2017.

    • arXiv June 2016.

    • Abstract: We propose a novel framework for matching estimators for causal effect from observational data that is based on minimizing the dual norm of estimation error when expressed as an operator. We show that many popular matching estimators can be expressed as optimal in this framework, including nearest-neighbor matching, coarsened exact matching, and mean-matched sampling. This reveals their motivation and aptness as structural priors formulated by embedding the effect in a particular functional space. This also gives rise to a range of new, kernel-based matching estimators that arise when one embeds the effect in a reproducing kernel Hilbert space. Depending on the case, these estimators can be found using either quadratic optimization or integer optimization. We show that estimators based on universal kernels are universally consistent without model specification. In empirical results using both synthetic and real data, the new, kernel-based estimators outperform all standard causal estimators in estimation error.
  • Personalized Diabetes Management Using Electronic Medical Records, with D. Bertsimas, A. Weinstein, and D. Zhuo.

    • Diabetes Care, 40(2):210--217, 2017.

    • Abstract: Objective: Current clinical guidelines for managing type 2 diabetes do not differentiate based on patient-specific factors. We present a data-driven algorithm for personalized diabetes management that improves health outcomes relative to the standard of care. Research Design and Methods: We modeled outcomes under 13 pharmacological therapies based on electronic medical records from 1999 to 2014 for 10,806 patients with type 2 diabetes from Boston Medical Center. For each patient visit, we analyzed the range of outcomes under alternative care using a k-nearest neighbor approach. The neighbors were chosen to maximize similarity on individual patient characteristics and medical history that were most predictive of health outcomes. The recommendation algorithm prescribes the regimen with best predicted outcome if the expected improvement from switching regimens exceeds a threshold. We evaluated the effect of recommendations on matched patient outcomes from unseen data. Results: Among the 48,140 patient visits in the test set, the algorithm's recommendation mirrored the observed standard of care in 68.2% of visits. For patient visits in which the algorithmic recommendation differed from the standard of care, the mean posttreatment glycated hemoglobin A1c (HbA1c) under the algorithm was lower than standard of care by 0.44 ± 0.03% (4.8 ± 0.3 mmol/mol) (P < 0.001), from 8.37% under the standard of care to 7.93% under our algorithm (68.0 to 63.2 mmol/mol). Conclusion: A personalized approach to diabetes management yielded substantial improvements in HbA1c outcomes relative to the standard of care. Our prototyped dashboard visualizing the recommendation algorithm can be used by providers to inform diabetes care and improve outcomes.
  • Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choice, with M. Udell.

    • Proceedings of the 17th ACM Conference on Economics and Computation (EC), 17:821--837, 2016.

    • arXiv September 2015.

    • Abstract: We consider the problem of learning the preferences of a heterogeneous population by observing choices from an assortment of products, ads, or other offerings. Our observation model takes a form common in assortment planning applications: each arriving customer is offered an assortment consisting of a subset of all possible offerings; we observe only the assortment and the customer's single choice. In this paper we propose a mixture choice model with a natural underlying low-dimensional structure, and show how to estimate its parameters. In our model, the preferences of each customer or segment follow a separate parametric choice model, but the underlying structure of these parameters over all the models has low dimension. We show that a nuclear-norm regularized maximum likelihood estimator can learn the preferences of all customers using a number of observations much smaller than the number of item-customer combinations. This result shows the potential for structural assumptions to speed up learning and improve revenues in assortment planning and customization. We provide a specialized factored gradient descent algorithm and study the success of the approach empirically.
  • Inventory Management in the Era of Big Data, with D. Bertsimas and A. Hussain.

  • On the Predictive Power of Web Intelligence and Social Media.

    • Chapter in Big Data Analytics in the Social and Ubiquitous Context, Springer, 2016.

    • Abstract: With more information becoming widely accessible and new content created and shared on today's web, more are turning to harvesting such data and analyzing it to extract insights. But the relevance of such data to see beyond the present is not clear. We present efforts to predict future events based on web intelligence -- data harvested from the web -- with specific emphasis on social media data and on timed event mentions, thereby quantifying the predictive power of such data. We focus on predicting crowd actions such as large protests and coordinated acts of cyber activism -- predicting their occurrence, specific timeframe, and location. Using natural language processing, statements about events are extracted from content collected from hundreds of thousands of open-content web sources. Attributes extracted include event type, entities involved and their role, sentiment and tone, and -- most crucially -- the reported timeframe for the occurrence of the event discussed -- whether it be in the past, present, or future. Tweets (Twitter posts) that mention an event to occur reportedly in the future prove to be important predictors. These signals are enhanced by cross referencing with the fragility of the situation as inferred from more traditional media, allowing us to sift out the social media trends that fizzle out before materializing as crowds on the ground.
  • The Power of Optimization Over Randomization in Designing Experiments Involving Small Samples, with D. Bertsimas and M. Johnson.

    • Operations Research, 63(4):868--876, 2015.

    • Code. Editorial in World Bank's Development Impact.

    • Abstract: Random assignment, typically seen as the standard in controlled trials, aims to make experimental groups statistically equivalent before treatment. However, with a small sample, which is a practical reality in many disciplines, randomized groups are often too dissimilar to be useful. We propose an approach based on discrete linear optimization to create groups whose discrepancy in their means and variances is several orders of magnitude smaller than with randomization. We provide theoretical and computational evidence that groups created by optimization have exponentially lower discrepancy than those created by randomization.
  • Predicting Crowd Behavior with Big Public Data.

    • Proceedings of the 23rd international conference on World Wide Web (WWW) companion, 23:625--630, 2014.

    • arXiv February 2014.

    • Slides. Media coverage.

    • Winner, INFORMS Social Media Analytics Best Paper Competition 2015

    • Abstract: With public information becoming widely accessible and shared on today's web, greater insights are possible into crowd actions by citizens and non-state actors such as large protests and cyber activism. Turning public data into Big Data, company Recorded Future continually scans over 300,000 open content web sources in 7 languages from all over the world, ranging from mainstream news to government publications to blogs and social media. We study the predictive power of this massive public data in forecasting crowd actions such as large protests and cyber campaigns before they occur. Using natural language processing, event information is extracted from content such as type of event, what entities are involved and in what role, sentiment and tone, and the occurrence time range of the event discussed. The amount of information is staggering and trends can be seen clearly in sheer numbers. In the first half of this paper we show how we use this data to predict large protests in a selection of 19 countries and 37 cities in Asia, Africa, and Europe with high accuracy using standard learning machines. In the second half we delve into predicting the perpetrators and targets of political cyber attacks with a novel application of the naïve Bayes classifier to high-dimensional sequence mining in massive datasets.
  • Scheduling, Revenue Management, and Fairness in an Academic-Hospital Division: An Optimization Approach, with D. Bertsimas and R. Baum.

    • Academic Radiology, 21(10):1322--1330, 2014.

    • Abstract: Physician staff of academic hospitals today practice in several geographic locations including their main hospital, referred to as the extended campus. With extended campuses expanding, the growing complexity of a single division's schedule means that a naïve approach to scheduling compromises revenue and can fail to consider physician over-exertion. Moreover, it may provide an unfair allocation of individual revenue, desirable or burdensome assignments, and the extent to which the preferences of each individual are met. This has adverse consequences on incentivization and employee satisfaction and is simply against business policy. We identify the daily scheduling of physicians in this context as an operational problem that incorporates scheduling, revenue management, and fairness. Noting previous success of operations management and optimization in each of these disciplines, we propose a simple, unified optimization formulation of this scheduling problem using mixed integer optimization (MIO). Through a study of implementing the approach at the Division of Angiography and Interventional Radiology at the Brigham and Women's Hospital, which is directed by one of the authors, we exemplify the flexibility of the model to adapt to specific applications, the tractability of solving the model in practical settings, and the significant impact of the approach, most notably in increasing revenue significantly while being only more fair and objective.


  • CAREER: Robust Policy Learning for Safe and Reliable Algorithmic Decision Making from Observational Data in Sensitive Applications. NSF IIS 1846210. Sole PI. $500k. 2019–2023.

  • Robustness and Fairness in Policy Learning from Observational Data. JP Morgan AI Faculty Award. Sole PI. $150k annually. 2019–.

  • Fair and Explainable AI with Applications to Financial Services. Capital One. With M. Udell.

  • CRII: RI: New Methods for Learning to Personalize from Observational Data with Applications to Precision Medicine and Policymaking. NSF IIS 1656996. Sole PI. $175k. 2017–2019.

  • City Logistics: Challenges and Opportunities in the Information Age. The Eric And Wendy Schmidt Fund for Strategic Innovation. With H. Topaloglu. $200k. 2017–2019.


  • Area Chair: ICML 2019.

  • Area Chair: AISTATS 2019.


  • Spring 2019: Learning and Decision Making From Data (ORIE 5751 / CS 5726)

    • Master's-level class

    • Description: This course covers the analysis of data for making decisions with applications to electronic commerce, AI and intelligent agents, business analytics, and personalized medicine. The focus of the class is on how to make sense of data and use it to make better decisions using summarization, visualization, statistical inference, interaction, and supervised and reinforcement learning; on a framework for both conceptually understanding and practically assessing generalization, causality, and decision making using statistical principles and machine learning methods; and on how to effectively design intelligent decision-making systems. Topics include summarizing, visualizing, and comparing data distributions; drawing inferences and generalizing conclusions from data; making inferences about causal effects; A/B testing; instrumental variable analysis; sequential decision making and bandits; Markov decision processes; reinforcement learning; and ethics of data-driven decisions. Students are expected to have working knowledge of calculus, probability, and linear algebra as well as a modern scripting language such as Python or R.

  • Fall 2018: Applied Machine Learning (ORIE 5750 / CS 5785)

    • Master's-level class

    • Description: Learn and apply key concepts of modeling, analysis and validation from Machine Learning, Data Mining and Signal Processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, association rule mining and dimensionality reduction.

  • Spring 2018: Learning and Decision Making From Data (ORIE 5751 / CS 5726)

    • Master's-level class

    • Description: This course covers the analysis of data for making decisions with applications to electronic commerce, AI and intelligent agents, business analytics, and personalized medicine. The focus of the class is on how to make sense of data and use it to make better decisions using summarization, visualization, statistical inference, interaction, and supervised and reinforcement learning; on a framework for both conceptually understanding and practically assessing generalization, causality, and decision making using statistical principles and machine learning methods; and on how to effectively design intelligent decision-making systems. Topics include summarizing, visualizing, and comparing data distributions; drawing inferences and generalizing conclusions from data; making inferences about causal effects; A/B testing; instrumental variable analysis; sequential decision making and bandits; Markov decision processes; reinforcement learning; and ethics of data-driven decisions. Students are expected to have working knowledge of calculus, probability, and linear algebra as well as a modern scripting language such as Python or R.

  • Fall 2017: Causality and Learning for Intelligent Decision Making (ORIE 6745)

    • PhD-level class

    • Description: The course introduces students to fundamental principles in causality and machine learning for decision making. Some of the most impactful applications of machine learning, whether in online marketing and commerce, personalized medicine, or data-driven policymaking, are not just about prediction but are rather about taking the right action directed at the right target at the right time. Actions and decisions, unlike predictions, have consequences and so, in seeking to take the right action, one must seek to understand the causal effects of any action or action policy, whether through active experimentation or analysis of observational data. In this course, we will study the interaction of causality and machine learning for the purpose of making decisions. In the case of known causal effects, we will briefly review the theory of generalization as it applies to designing action policies and systems. We will then study causal inference and estimation of unknown causal effects using both classical methods and modern machine learning and optimization methods, considering a variety of settings including controlled experiments (A/B testing), regression discontinuity, instrumental variables, and general observational studies. We will then study the direct design of action policies and systems when causal effects are not known, looking closely both at the online (contextual bandit) and offline (off-policy learning) cases. Finally, we will study ancillary consequences of intelligent systems’ actions, such as algorithmic fairness. The course will culminate in a final project.

  • Spring 2017: Applied Machine Learning (ORIE 5750 / CS 5785)

    • Master's-level class

    • Description: Learn and apply key concepts of modeling, analysis and validation from Machine Learning, Data Mining and Signal Processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, association rule mining and dimensionality reduction.

Education and Academic Positions

  • Assistant Professor
    School of Operations Research and Information Engineering and Cornell Tech
    Cornell University
    New York, New York
    July 2016–

  • Post-Doctoral Associate
    Operations Research and Statistics, Sloan School of Management
    Massachusetts Institute of Technology
    Cambridge, Massachusetts
    July 2015–June 2016

  • Visiting Scholar
    Data Sciences and Operations, Marshall School of Business
    University of Southern California
    Los Angeles, California
    July 2015–June 2016

  • Ph.D., Operations Research
    Massachusetts Institute of Technology, Cambridge, Massachusetts

  • B.A., Mathematics
    University of California, Berkeley, California

  • B.S., Computer Science and Engineering
    University of California, Berkeley, California