Tuesday, May 14 |
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
09:00 - 10:00 |
Matteo Barigozzi: General Spatio-Temporal Factor Models for High-Dimensional Random Fields on a Lattice (Online) |
10:00 - 10:30 |
Coffee Break (TCPL Foyer) |
10:30 - 11:30 |
George Michailidis: Regularized high-dimensional low tubal-rank tensor regression ↓ Tensor regression models are of interest in diverse fields of the social and behavioral sciences, including neuroimaging analysis and image processing. Recent theoretical advances in tensor decomposition have facilitated significant development of various tensor regression models. This talk discusses a tensor regression model wherein the coefficient tensor is decomposed into two components: a low tubal-rank tensor and a structured sparse one. We first address the identifiability of the two components of the coefficient tensor and subsequently develop a fast and scalable alternating minimization algorithm to solve the convex regularized program. Further, finite-sample error bounds for the model parameters under high-dimensional scaling are provided. The performance of the model is assessed on synthetic data and in an application involving data from an intelligent tutoring platform. Extensions to multivariate time series data are also briefly discussed. (TCPL 201) |
11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
13:00 - 13:40 |
Chencheng Cai: Design of Experiments for Network Data ↓ Optimal experimental designs on network data are challenging because of interference between network units: one unit's treatment status usually affects its neighbors' outcomes, a phenomenon known as spillover effects or network effects. In this talk, we focus on experimental designs for three types of networks. For a well-clustered network, we consider the optimal design equipped with the Horvitz-Thompson estimator, taking the sampling procedure into account. We establish a minimum sample curve, a combination of the number of clusters and the cluster size to be sampled for any given required power. For imperfectly clustered networks, we investigate the optimal randomized saturation design for difference-in-means estimators and re-evaluate two widely used designs, which are believed to be optimal but turn out to be sub-optimal. Lastly, we introduce a novel partitioning method for an arbitrary sparse network without cluster structure, in which direct treatments and interference can be controlled separately within a sub-network. By trading data quantity for quality, the proposed method outperforms existing designs on sparse networks. (TCPL 201) |
13:40 - 14:20 |
Dan Yang: Network Regression and Supervised Centrality Estimation ↓ Centrality in a network is often used to measure nodes' importance and to model network effects on a certain outcome. Empirical studies widely adopt a two-stage procedure that first estimates centrality from the observed noisy network and then infers the network effect from the estimated centrality, even though this procedure lacks theoretical justification. We propose a unified modeling framework under which we first prove the shortcomings of the two-stage procedure, including the inconsistency of the centrality estimates and the invalidity of the network-effect inference. We then propose a supervised centrality estimation methodology that estimates the centrality and the network effect simultaneously. Its advantages in both regards are proved theoretically and demonstrated numerically via extensive simulations and a case study predicting currency risk premiums from the global trade network. (TCPL 201) |
14:20 - 15:00 |
Aseem Baranwal: Locally optimal message-passing on feature-decorated sparse graphs ↓ We study the node classification problem on feature-decorated graphs in the sparse setting, i.e., when the expected degree of a node is O(1) in the number of nodes, in the fixed-dimensional asymptotic regime, i.e., the dimension of the feature data is fixed while the number of nodes is large. Such graphs are typically known to be locally tree-like. We introduce a notion of Bayes optimality for node classification tasks, called asymptotic local Bayes optimality, and compute the optimal classifier according to this criterion for a fairly general statistical data model with arbitrary distributions of the node features and edge connectivity. The optimal classifier is implementable using a message-passing graph neural network architecture. We then compute the generalization error of this classifier and compare its performance against existing learning methods theoretically on a well-studied statistical model with naturally identifiable signal-to-noise ratios (SNRs) in the data. We find that the optimal message-passing architecture interpolates between a standard MLP in the regime of low graph signal and a typical convolution in the regime of high graph signal. Furthermore, we prove a corresponding non-asymptotic result. (TCPL 201) |
15:00 - 15:30 |
Coffee Break (TCPL Foyer) |
15:30 - 16:10 |
Anru Zhang: High-order Singular Value Decomposition in Tensor Analysis ↓ The analysis of tensor data, i.e., arrays with multiple directions, is motivated by a wide range of scientific applications and has become an important interdisciplinary topic in data science. In this talk, we discuss the fundamental task of performing singular value decomposition (SVD) on tensors, exploring both the general case and settings with specific structure, such as smoothness and longitudinal structure. Through the developed frameworks, we achieve accurate denoising of 4D scanning transmission electron microscopy images; in longitudinal microbiome studies, we extract key components of the trajectories of bacterial abundance, identify representative bacterial taxa for these key trajectories, and group subjects based on changes in bacterial abundance over time. We also showcase the development of statistically optimal methods and computationally efficient algorithms that harness valuable insights from high-dimensional tensor data, grounded in theories of computation and non-convex optimization. (TCPL 201) |
16:10 - 16:50 |
Simone Giannerini: Inference in matrix-valued time series with common stochastic trends and multi-factor error structure ↓ We study inference in the context of a (large-dimensional) factor model for matrix-valued time series, with (possibly) common stochastic trends and a stationary factor structure in the error term. As a preliminary, negative result, we show that both a "flattened" and a projection/sketching-based estimation technique offer super-consistent estimation of the row and column loadings spaces, with no improvement in the rate of convergence from projection-based estimation. However, the common stochastic trends cannot be estimated consistently in the presence of a factor structure in the error term: under strong cross-sectional dependence, even sketching does not help. In turn, this precludes estimation of the stationary idiosyncratic component associated with the common factors. Hence, we propose an alternative way of consistently estimating the common (stationary and nonstationary) factors and the row and column loadings spaces associated with both the stationary and nonstationary common factors. Our technique proceeds in four steps: first, consistently estimating the row and column loadings spaces associated with both the stationary and nonstationary common factors; second, "unsketching", i.e., removing the common nonstationary component by projecting the data onto the orthogonal complement of the estimated loadings space associated with the common stochastic trends; third, using the "unsketched" data to recover the whole factor structure (loadings and common factors) associated with the stationary factors; fourth, removing the estimated stationary common component from the data and estimating the nonstationary common component once again.
In this case, we show that full-blown consistent estimation is possible and that projection-based estimation improves the rates of convergence of the estimated loadings spaces for both the stationary and nonstationary factor structures. Ancillary results, such as the limiting distribution of the estimated common factors and loadings and sequential procedures for estimating the number of common factors, are also provided. (TCPL 201) |
16:50 - 17:30 |
Yuefeng Han: Estimation and Inference for CP Tensor Factor Model ↓ High-dimensional tensor-valued data have recently gained attention from researchers in economics and statistics. We consider estimation and inference for high-dimensional tensor factor models in which each dimension of the tensor diverges. Specifically, we focus on the factor model that admits a CP-type tensor decomposition, allowing for loading vectors that are not necessarily orthogonal. Based on the contemporaneous covariance matrix, we propose an iterative higher-order projection estimation method. Our estimator is robust to weak dependence among factors and to weak correlation across different dimensions in the idiosyncratic shocks. We develop an inferential theory, establishing consistency and asymptotic normality under relaxed assumptions. Through a simulation study and an empirical application with sorted portfolios, we illustrate the advantages of our proposed estimator over existing methodologies in the literature. (TCPL 201) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |