Tuesday, March 6 |
07:00 - 09:00 |
Breakfast (Vistas Dining Room) |
09:00 - 09:35 |
Dimitris Bertsimas: From Data to Decisions ↓ In this talk, I review some developments in my research group at MIT on making decisions directly from data.
Starting with my paper with Nathan Kallus, ``From Predictive to Prescriptive Analytics'', first written in 2014, which presents a framework of considerable generality for extending predictive machine learning algorithms to prescriptive ones for two-stage problems, we discuss a number of exciting new developments:
With Chris McCord, in the paper ``From Predictions to Prescriptions in Multistage Optimization Problems'', written in 2017, we extend the earlier framework to multistage problems and also prove rates of convergence.
With Bart van Parys, in the paper ``Bootstrap Robust Prescriptive Analytics'', written in 2017, we provide an approach that makes our prescriptive methods immune to overfitting phenomena.
With Nihal Koduri, in the paper ``Data-Driven Optimization: A kernel regression approach'', written in 2018, we provide non-parametric methods that outperform earlier approaches for two-stage problems. (TCPL 201) |
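The weighted sample-average idea behind these prescriptive methods can be sketched in a few lines. The toy below is our construction, not code from the papers, and all data and parameter values are hypothetical: Nadaraya-Watson kernel weights condition historical demand samples on an observed covariate, and the prescribed newsvendor order is the kernel-weighted critical fractile of demand.

```python
import numpy as np

def nw_weights(X, x0, bandwidth=1.0):
    """Nadaraya-Watson (Gaussian kernel) weights of samples X at covariate x0."""
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def prescriptive_newsvendor(X, D, x0, price=5.0, cost=3.0, bandwidth=1.0):
    """Order quantity minimizing the kernel-weighted empirical newsvendor cost.

    The minimizer is the weighted (price - cost) / price quantile of the
    historical demands D, with weights conditioned on the covariate x0.
    """
    w = nw_weights(X, x0, bandwidth)
    order = np.argsort(D)
    cum = np.cumsum(w[order])
    fractile = (price - cost) / price
    return D[order][np.searchsorted(cum, fractile)]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
D = 10.0 + 4.0 * X[:, 0] + rng.normal(size=500)  # demand depends on covariates
q = prescriptive_newsvendor(X, D, np.array([1.0, 0.0]))
```

Conditioning on the covariate shifts the prescribed order toward demands observed in similar contexts, which is the key difference from a static sample-average decision.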
09:35 - 10:10 |
Anton Kleywegt: Distributional Robustness and Regularization in Statistical Learning ↓ A central problem in statistical learning is to design prediction algorithms that not only perform well on training data, but also perform well on new and unseen, but similar, data. We approach this problem by formulating a distributionally robust stochastic optimization (DRSO) problem, which seeks a solution that minimizes the worst-case expected loss over a family of distributions that are close to the empirical distribution as measured by Wasserstein distance. We establish a connection between such Wasserstein DRSO and regularization. Specifically, we identify a broad class of loss functions for which the Wasserstein DRSO is asymptotically equivalent to a regularization problem with a gradient-norm penalty. This relation provides a new interpretation for approaches that use regularization, including a variety of statistical learning problems and discrete choice models. The connection also suggests a principled way to regularize high-dimensional, non-convex problems, which is demonstrated with the training of Wasserstein generative adversarial networks (WGANs) in deep learning.
This is joint work with Rui Gao and Xi Chen. (TCPL 201) |
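A first-order sketch of the equivalence described in the abstract, in our own notation (a schematic, not the speakers' precise theorem): for a Wasserstein ball of radius rho around the empirical distribution and a suitable loss,

```latex
% Worst-case expected loss over a Wasserstein ball of radius \rho
% around the empirical distribution \hat{P}_n (schematic):
\sup_{Q:\, W(Q,\hat{P}_n)\le \rho} \mathbb{E}_{Q}\big[\ell_\theta(\xi)\big]
  \;=\; \mathbb{E}_{\hat{P}_n}\big[\ell_\theta(\xi)\big]
  \;+\; \rho\,\big\lVert \nabla_\xi \ell_\theta \big\rVert
  \;+\; o(\rho)
```

where the gradient-norm term is a suitable (dual) norm of the loss gradient aggregated over the data; this is the regularization penalty the abstract refers to.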
10:10 - 10:40 |
Coffee Break (TCPL Foyer) |
10:40 - 11:15 |
Andrew Lim: Calibration of Distributionally Robust Empirical Optimization Problems ↓ We study the out-of-sample properties of robust empirical optimization and develop a theory for data-driven calibration of the ``robustness parameter'' for worst-case maximization problems with concave reward functions. Building on the intuition that robust optimization reduces the sensitivity of the expected reward to errors in the model by controlling the spread of the reward distribution, we show that the first-order benefit of ``a little bit of robustness'' is a significant reduction in the variance of the out-of-sample reward, while the corresponding impact on the mean is almost an order of magnitude smaller. One implication is that a substantial reduction in the variance of the out-of-sample reward (i.e. sensitivity of the expected reward to model misspecification) is possible at little cost if the robustness parameter is properly calibrated. To this end, we introduce the notion of a robust mean-variance frontier to select the robustness parameter and show that it can be approximated using resampling methods like the bootstrap. Our examples show that robust solutions resulting from ``open loop'' calibration methods (e.g. selecting a 90% confidence level regardless of the data and objective function) can be very conservative out-of-sample, while those corresponding to the ambiguity parameter that optimizes an estimate of the out-of-sample expected reward (e.g. via the bootstrap) with no regard for the variance are often insufficiently robust.
This is joint work with Jun-ya Gotoh and Michael J. Kim. (TCPL 201) |
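The resampling idea behind the robust mean-variance frontier can be illustrated with a toy example (our sketch, not the authors' procedure): for each candidate robustness parameter, refit a robust decision on bootstrap resamples and record the mean and variance of the resulting reward. Here the worst-case problem is replaced by a simple mean-minus-standard-deviation surrogate, in the spirit of the first-order mean-variance effect described in the abstract; the data and reward function are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(1.0, 2.0, size=50)  # observed samples of the uncertain parameter

def robust_decision(xi, delta):
    # Surrogate worst-case expectation: penalize the empirical mean by
    # delta standard deviations; delta plays the role of the robustness
    # parameter being calibrated.
    return max(xi.mean() - delta * xi.std(), 0.0)

def reward(x, xi):
    # Concave reward r(x, xi) = x*xi - x^2/2
    return x * xi - 0.5 * x ** 2

def bootstrap_frontier(data, deltas, B=200):
    """Estimate the (mean, variance) of the realized reward for each
    robustness parameter by refitting the decision on bootstrap resamples."""
    points = []
    for delta in deltas:
        rewards = []
        for _ in range(B):
            resample = rng.choice(data, size=len(data), replace=True)
            x = robust_decision(resample, delta)
            rewards.append(reward(x, data).mean())
        rewards = np.array(rewards)
        points.append((delta, rewards.mean(), rewards.var()))
    return points

frontier = bootstrap_frontier(data, deltas=[0.0, 0.1, 0.3])
```

Plotting mean against variance over a grid of delta values traces the frontier from which the robustness parameter can then be selected.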
11:15 - 11:50 |
Vishal Gupta: Small-Data, Large-Scale Linear Optimization ↓ Optimization applications often depend upon a huge number of uncertain parameters. In many contexts, however, the amount of relevant data per parameter is small, and hence, we may have only imprecise estimates. We term this setting -- where the number of uncertainties is large, but all estimates have fixed and low precision -- the ``small-data, large-scale regime.'' We formalize a model for this regime, focusing on linear programs with uncertain objective coefficients, and prove that the small-data, large-scale regime is distinct from the traditional large-sample regime. Consequently, methods like sample average approximation, data-driven robust optimization, regularization, and ``estimate-then-optimize'' policies can perform poorly. We propose a novel framework that, given a policy class, identifies an asymptotically best-in-class policy, where the asymptotics hold as the number of uncertain parameters grows large, but the amount of data per uncertainty (and hence the estimate's precision) remains small. We apply our approach to two natural policy classes for this problem: the first inspired by the empirical Bayes literature in statistics and the second by the regularization literature in optimization and machine learning. In both cases, the sub-optimality gap between our proposed method and the best-in-class policy decays exponentially fast in the number of uncertain parameters, even for a fixed amount of data. We also show that in the usual large-sample regime our policies are comparable to the sample average approximation. Thus, our policies retain the strong large-sample performance of traditional methods, and additionally enjoy provably strong performance in the small-data, large-scale regime. Numerical experiments confirm the significant benefits of our methods.
Joint work with Prof. Paat Rusmevichientong, USC Marshall. (TCPL 201) |
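A minimal sketch of an empirical-Bayes-inspired policy of the kind the abstract mentions (our toy, not the paper's actual policy class or feasible set): shrink each noisy objective estimate toward the grand mean with a method-of-moments shrinkage factor, then optimize over a box-constrained feasible region, for which the optimum has a closed form.

```python
import numpy as np

def shrink_then_optimize(c_hat, nu):
    """Empirical-Bayes-style policy for the toy problem max c^T x, 0 <= x <= 1.

    c_hat holds noisy estimates of the objective coefficients, each with
    (assumed known) estimation variance nu. We shrink every estimate toward
    the grand mean with a method-of-moments factor, then optimize; for a
    box feasible set the optimal x_j is 1 exactly when the shrunk
    coefficient is positive.
    """
    prior_mean = c_hat.mean()
    prior_var = max(c_hat.var() - nu, 0.0)   # method-of-moments prior variance
    w = prior_var / (prior_var + nu)         # shrinkage toward the grand mean
    c_shrunk = prior_mean + w * (c_hat - prior_mean)
    return (c_shrunk > 0).astype(float)

rng = np.random.default_rng(4)
c_hat = rng.normal(0.1, 0.6, size=1000)      # many coefficients, low precision each
x = shrink_then_optimize(c_hat, nu=0.25)
```

The point of the shrinkage step is that with many parameters and little data per parameter, pooling information across coefficients stabilizes the downstream optimization.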
12:10 - 12:45 |
Karthyek R. A. Murthy: Distributionally Robust Optimization with Optimal Transport Distances: Some Statistical and Algorithmic Advances ↓ The objective of this talk is to showcase optimal transport distances, which include the well-known Wasserstein distances as a special case, as an attractive tool to model distributional ambiguities when performing optimization under uncertainty. Unlike traditional stochastic optimization methods, where we look for optimal decision choices assuming a fixed probability model, here we look for decision choices that perform uniformly well for any choice of probability distribution within a fixed radius (quantified by optimal transport distance) of a baseline model. Interestingly, we show that various popular machine learning algorithms that employ regularization can be recovered as particular examples of this optimal transport based DRO approach.
Drawing inspiration from empirical likelihood theory in Statistics, we develop a systematic approach to choose the radius of ambiguity sets in data-driven optimization settings. Along with beating the curse of dimensionality that is associated with the traditional approaches based on concentration inequalities, the proposed choice of radius also allows us to systematically choose regularization parameters in large scale machine learning problems without having to use the brute-force tuning approach of cross-validation.
We also develop a stochastic gradient descent based iterative scheme to solve the proposed DRO formulation for a large class of problems involving affine decision rules. Interestingly, the proposed computational algorithm is at least ``as fast'' as the non-robust sample average approximation, thus making optimal transport based DRO scalable and attractive for computational purposes. (TCPL 201) |
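When the worst-case objective reduces to empirical risk plus a norm penalty (the regularization connection noted in the abstract), a plain stochastic gradient scheme handles it at essentially the per-step cost of sample average approximation. The sketch below is our own toy with hypothetical data, not the speaker's algorithm: SGD on the reduced, penalized form for least squares.

```python
import numpy as np

def dro_sgd(X, y, rho=0.01, lr=0.01, epochs=200, seed=0):
    """SGD on a regularized surrogate of Wasserstein DRO for least squares.

    For suitable losses the worst-case objective reduces to the empirical
    loss plus rho times a norm of the model parameters; we run plain SGD
    on that reduced form, so each step costs the same as in the
    non-robust sample average approximation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            resid = X[i] @ theta - y[i]
            grad = resid * X[i]                   # squared-loss gradient
            norm = np.linalg.norm(theta)
            if norm > 0:
                grad = grad + rho * theta / norm  # subgradient of rho*||theta||
            theta -= lr * grad
        lr *= 0.99                                # decaying step size
    return theta

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, 2.0]) + 0.01 * rng.normal(size=200)
theta = dro_sgd(X, y)
```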
12:25 - 13:45 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
14:00 - 14:35 |
Bart Paul Gerard Van Parys: Bootstrap Robust Prescriptive Analytics ↓ We discuss prescribing optimal decisions in a framework where their cost depends on uncertain problem parameters that need to be learned from supervised data. Prescriptive analytics consists in making optimal decisions specific to a particular covariate context. Prescriptive methods need to factor in additional observed contextual information on a potentially large number of covariates, as opposed to static decision methods that use only sample data. Any naive use of training data may, however, lead to gullible decisions over-calibrated to one particular data set. In this presentation, we use ideas from distributionally robust optimization and the statistical bootstrap to propose two novel prescriptive methods based on (nw) Nadaraya-Watson and (nn) nearest neighbors learning which safeguard against overfitting and lead to improved out-of-sample performance. Both resulting robust prescriptive methods reduce to tractable convex optimization problems and enjoy a limited disappointment on bootstrap data. We illustrate the data-driven decision-making framework and our novel robustness notion on a small newsvendor problem as well as a small portfolio allocation problem. (TCPL 201) |
14:35 - 15:10 |
Henry Lam: Parameter Calibration for Optimization under Uncertainty ↓ Optimization formulations to handle decision-making under uncertainty often contain parameters needed to be calibrated from data. Examples include uncertainty set sizes in robust optimization, and Monte Carlo sample sizes in constraint sampling or scenario generation. We investigate strategies to select good parameter values based on data splitting and the validation of their performances in terms of feasibility and optimality. We analyze the effectiveness of these strategies in relation to the complexity of the optimization class and problem dimension. (TCPL 201) |
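A toy version of the data-splitting strategy described above (our illustration, not the speaker's method): calibrate a one-dimensional uncertainty-set radius by choosing the smallest candidate radius whose coverage on a held-out validation split meets a target level. The set shape, data, and target are hypothetical.

```python
import numpy as np

def calibrate_radius(data, radii, target=0.95, seed=0):
    """Data-splitting calibration of a toy 1-d uncertainty set.

    Build the interval [mean - r*std, mean + r*std] on a training split and
    return the smallest candidate radius whose coverage on the held-out
    validation split meets the target level.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    half = len(data) // 2
    train, val = data[idx[:half]], data[idx[half:]]
    m, s = train.mean(), train.std()
    for r in sorted(radii):
        coverage = np.mean((val >= m - r * s) & (val <= m + r * s))
        if coverage >= target:
            return r
    return max(radii)  # fall back to the most conservative candidate

data = np.random.default_rng(2).normal(size=1000)
r = calibrate_radius(data, radii=[1.0, 1.5, 2.0, 2.5, 3.0])
```

Validating on held-out data guards against the optimistic bias of sizing the set on the same sample used to build it.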
15:10 - 15:45 |
Nathan Kallus: Distributionally Robust Optimization for Learning Causal-Effect-Maximizing Policies ↓ Policy learning from observational data seeks to extract personalized interventions from passive interaction data to maximize causal effects. The aim is to transform electronic health records to personalized treatment regimes, transactional records to personalized pricing strategies, and click-streams to personalized advertising campaigns. The task is made difficult by the observational nature of the data: only outcomes of the interventions performed are available, and the distributions of units exposed to one intervention or another differ systematically. In such a purely observational setting, existing methods adapted from experimental settings tenuously rely on unstable plug-in approaches and heuristic stopgaps to address ensuing complications. In this talk I will describe a new approach based on distributionally robust optimization that overcomes these failures and its application to personalized medicine. By showing that estimation error reduces to the discrepancy in a moment of a particular unknown function, the approach relies on protecting against any possible realization thereof. On the one hand, this leads to unparalleled finite-sample performance, as demonstrated by experiments. On the other hand, theoretical results show that the asymptotic optimality and convergence rates of plug-in approaches are preserved. Time permitting, I will also outline advances in handling continuous treatments and in representation learning for causal inference using deep neural networks. (TCPL 201) |
15:30 - 16:00 |
Coffee Break (TCPL Foyer) |
16:20 - 16:55 |
Sanjay Mehrotra: Decomposition Methods For Solving Distributionally Robust Two-Stage Stochastic Integer Programs ↓ We introduce and study a two-stage distributionally robust mixed binary problem (TSDR-MBP) where the random parameters follow the worst-case distribution belonging to an uncertainty set of probability distributions. We present a decomposition algorithm, which utilizes a distribution separation procedure and parametric cuts within Benders' algorithm, or the L-shaped method, to solve TSDR-MBPs with binary variables in the first stage and mixed binary programs in the second stage. We refer to this algorithm as the distributionally robust integer (DRI) L-shaped algorithm. Using a similar decomposition framework, we provide another algorithm to solve the TSDR linear problem where both stages have only continuous variables. We investigate conditions and families of ambiguity sets for which our algorithms are finitely convergent. We present two examples of ambiguity sets, defined using moment matching or the Kantorovich-Rubinstein distance (Wasserstein metric), which satisfy the foregoing conditions. We also present a cutting surface algorithm to solve TSDR-MBPs. We computationally evaluate the performance of the DRI L-shaped algorithm and the cutting surface algorithm in solving distributionally robust versions of a few instances from the Stochastic Integer Programming Library, in particular stochastic server location and stochastic multiple binary knapsack problem instances. We also discuss the usefulness of incorporating partial distribution information in two-stage stochastic optimization problems. (TCPL 201) |
16:55 - 17:30 |
John Gunnar Carlsson: Wasserstein Distance and the Distributionally Robust TSP ↓ Motivated by a districting problem in multi-vehicle routing, we consider a distributionally robust version of the Euclidean travelling salesman problem in which we compute the worst-case spatial distribution of demand against all distributions whose Wasserstein distance to an observed demand distribution is bounded from above. This constraint allows us to circumvent common overestimation that arises when other procedures are used, such as fixing the center of mass and the covariance matrix of the distribution. Numerical experiments confirm that our new approach is useful when used in a decision support tool for dividing a territory into service districts for a fleet of vehicles when limited data is available. (TCPL 201) |
17:30 - 18:05 |
Dan Iancu: Discrete Convexity and Dynamic Robust Optimization ↓ We discuss necessary and sufficient conditions for the optimality of specific classes of policies in dynamic robust optimization. We then focus on the specific case of affine policies, and show how our conditions can be used to recover and generalize several existing results in the literature. Our treatment draws interesting connections with the theory of discrete convexity (L-natural / M-natural convexity and multimodularity) and global concave envelopes, which may be of independent interest. Time permitting, we also discuss some related applications of the results in the context of a learning and stopping problem.
This is joint work with Yehua Wei. (TCPL 201) |
17:45 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |