Schedule for: 25w5333 - Novel Mathematical Paradigm for Phylogenomics
Beginning on Sunday, August 24 and ending Friday August 29, 2025
All times in Banff, Alberta time, MDT (UTC-6).
Sunday, August 24 | |
---|---|
16:00 - 17:30 | Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |
20:00 - 22:00 | Informal gathering (TCPL Foyer) |
Monday, August 25 | |
---|---|
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
08:45 - 09:00 |
Introduction and Welcome by BIRS Staff ↓ A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions. (TCPL 201) |
09:00 - 09:40 |
Peter Lockhart: - REVIEW TALK - Environmental Network analyses ↓ Robust network inference can enhance our understanding of complex microbial ecosystems. In this introductory talk, I discuss the motivations for pursuing network-based approaches and the challenges presented by different biological systems. These systems include soils from diverse and conventional pastures, dioxin-contaminated land, waterways containing chemical pollutants, and oligotrophic limestone caves that host bioactive, antimicrobial-producing bacteria. Despite the biological complexity of microbial interactions in each of these environments, analyzing network properties has the potential for valuable biological insights and practical outcomes. This talk reviews key network properties and analytical approaches relevant to these systems. (Online) |
09:40 - 10:00 |
Caroline Colijn: Exchangeability in phylogenetics, with an application to birds ↓ Coalescent theory is widely used as a framework for understanding patterns of ancestry and for interpreting sequence data. For example, coalescent theory underlies the likelihoods used in inference of past population dynamics, commonly applied to infectious disease phylodynamics. Exchangeability, in coalescent theory, refers to the property that each pair of lineages is equally likely to coalesce, as we move back in time from the present to the past, given a set of observed taxa. Forward in time, exchangeability means that each lineage is equally likely to branch next. In this talk I will introduce a statistical test that measures departures from exchangeability in timed phylogenies. The test works by asking how often, in the tree, a node's child lineage is chronologically next to branch, after the node itself. We derive a test statistic and its distribution, as well as two limiting distributions. I will explore how this test performs on simulated genealogies and on phylogenies from data. I will discuss some mechanisms that can cause non-exchangeable patterns, and how to identify which are most plausible. (TCPL 201) |
10:00 - 10:30 | Coffee Break (TCPL Foyer) |
10:30 - 11:10 |
Amaury Lambert: Stochastic Models Coupling the Evolution of Genomes and Species ↓ Due to recombination and to hybridization between species, the genealogies of genes, even sampled from distantly related species, are usually different at different genes, and (so) distinct from the species tree. We review models coupling gene trees and species tree including the popular multispecies coalescent as well as three alternative models that our team has devised and studied: the nested coalescent, Kingman's coalescent with erosion and the gene-based diversification model, acknowledging the importance of gene flow and where gene histories shape the species tree rather than the opposite. These models are meant to pave the way for approaches of diversification using the richer signal contained in genomic evolutionary histories rather than in the mere species tree. (Online) |
11:10 - 11:30 |
Dannie Durand: Sorting out Phylogenetic Discordance in Population Phylogenomics ↓ Phylogenetic population modeling, combined with sequencing of large collections of closely related taxa, has enabled unprecedented exploration of ancestral population processes in evolutionary and ecological contexts. The discovery of Neanderthal DNA in human genomes and the role of adaptive introgression in Heliconius butterflies are just two of many intriguing examples. Given a three-species phylogeny (A|BC), the history of a gene sampled from these species may display concordant (A|BC) or discordant (B|AC, C|AB) gene tree topologies. The probability of a given discordant topology depends on the specific population processes at work. Thus, given a set of gene trees associated with a species tree, the distribution of gene tree topologies provides a wealth of information for testing alternate hypotheses and estimating population parameters.
Approaches that exploit this signal have revolutionized our understanding of introgression but also pose substantial methodological challenges. First, correct interpretation requires distinguishing gene duplication from discordance due to Incomplete Lineage Sorting (ILS) and introgression. Typically, discordance due to gene duplication is screened out by restricting analyses to single-copy orthologs, but sequence-based ortholog identification is error-prone, especially in duplication-rich and poorly assembled genomes. In addition, categorizing discordance does not scale naturally to larger species trees, because the number of different topologies grows super-exponentially. A common approach is to sub-sample sets of three species (rooted triplets), but the number of triplets also grows exponentially, and combining information from multiple triplets is labor intensive.
Here, we present a novel algorithm that estimates the portion of the topological distribution arising from ILS in each species internode, while excluding discordance due to introgression between distant descendants and accounting for uncertainty due to gene loss and missing data. Our approach also screens out incongruence due to gene duplication and can be applied to both multigene families and single copy orthologs. Topology statistics are obtained for all internal branches in a single execution, supporting automated, large-scale phylogenomic analyses of entire clades. Our algorithm is polynomial in tree size and hence, applicable to large species trees. We demonstrate our approach through the reanalysis of several phylogenomic datasets from the literature. Characterizing ILS can help to resolve phylogenetic uncertainty and is important for understanding the relative contributions of ILS, introgression, and convergent evolution to trait evolution and present-day genetic variation. (TCPL 201) |
11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
13:00 - 13:20 | Group tour of the Banff Centre (Vistas Dining Room) |
13:20 - 13:40 |
Laura Kubatko: A composite likelihood framework for phylogenomic inference ↓ Species-level phylogenetic inference under the multispecies coalescent model remains challenging in the typical inference frameworks (e.g., the likelihood and Bayesian frameworks) due to the dimensionality of the space of both gene trees and species trees. Algebraic approaches intended to establish identifiability of species tree parameters have suggested computationally efficient inference procedures that have been widely used by empiricists and that have good theoretical properties, such as statistical consistency. However, such approaches are less powerful than approaches based on the full likelihood. In this talk, I will describe how the use of a composite likelihood approach enables computationally tractable statistical inference of the species-level phylogenetic relationships for genome-scale data. In particular, asymptotic properties of estimators obtained in the composite-likelihood framework will be derived, and the utility of the methods developed will be demonstrated with both simulated and empirical data. (TCPL 201) |
13:40 - 14:00 |
Blerina Sinaimeri: Reconstructing Evolutionary Histories: Algorithms and Challenges in Cophylogeny ↓ Cophylogeny studies the joint evolutionary history of interacting species, such as hosts and parasites or mutualistic partners. Reconstructing these histories involves inferring evolutionary events, such as co-divergence, host switches, duplications, and losses, that relate the two phylogenies. This poses algorithmic challenges due to the vast space of possible reconciliations and the complexity of identifying biologically plausible scenarios. In this work, we review some key computational approaches to cophylogeny reconstruction and emphasize recent efforts aimed at exploring and visualizing the space of possible evolutionary histories. We also discuss open problems related to scalability, ambiguity in solutions, and the need for better interpretability tools to support biological insight. (TCPL 201) |
14:00 - 14:20 |
Maribel Hernandez-Rosales: REvolutionh-tl: A Scalable and Interpretable Framework for Inference of Gene Family Evolution ↓ REvolutionH-tl is a graph-based tool for reconstructing evolutionary histories of gene families without requiring precomputed gene or species trees. It begins by computing sequence similarity across species, then constructs best match graphs (BMGs) to identify orthogroups. From these, the tool infers event-labeled gene trees based on consistent triples and assigns speciation or duplication labels using species overlap. Species trees are then reconstructed from informative triples derived from the gene trees. Finally, REvolutionH-tl reconciles gene and species trees to infer evolutionary events such as duplications and losses. Each step is modular and designed for scalability, allowing efficient and biologically consistent reconstruction of gene family evolution from raw sequence data. (TCPL 201) |
14:20 - 14:40 |
Yao-ban Chan: Dealing with gene dependence in reconciliations ↓ The phylogenetic trees of species can be distinct from the trees of their genes, due to various evolutionary processes that affect genes but do not create new species. Reconciliations map the gene trees into the species tree, explaining the discrepancies by events including gene duplications and losses, lateral genetic transfer, and incomplete lineage sorting. We study the most parsimonious reconciliation problem, where we seek to optimise a weighted sum of these events. Some reconciliation models are solvable in polynomial time, typically with dynamic programming on gene subtrees. However, many models are NP-hard, and are not amenable to dynamic programming; these are often distinguished by some form of dependence between gene trees or subtrees. In this talk, we explore some strategies for finding most parsimonious reconciliations for these harder models: (1) a probabilistic algorithm for sampling reconciliations from an imposed Boltzmann distribution, which recovers optimal reconciliations through simulated annealing; and (2) a dynamic programming approach on species subtrees, extending to a branch-and-bound algorithm. We illustrate the application of these methods to two models, the segmental duplication model and the gene network model. This lays the foundations for new approaches to difficult (but more realistic) reconciliation problems. (TCPL 201) |
14:40 - 15:00 |
Kristina Wicke: Revisiting the standard LGT model ↓ Processes such as incomplete lineage sorting (ILS) and reticulate evolution such as hybridization and lateral gene transfer (LGT) are known to cause discordance between gene trees and species trees, complicating phylogenetic inference. While the multi-species coalescent model has led to a rich theoretical and understanding of ILS, probabilistic models for LGT have received comparatively less attention. In this talk, we revisit the standard LGT model of Linz et al. (2007), in which random LGT events occur according to a Poisson process with a constant transfer rate. Focusing on the simplest case of three species, we derive gene tree and site pattern probabilities and discuss their implications for identifiability. We also address the question of whether LGT and ILS can be distinguished from one another and outline directions we will pursue in future work. This talk is based on joint work with Simone Linz and Laura Kubatko. (TCPL 201) |
15:00 - 15:30 | Coffee Break (TCPL Foyer) |
15:30 - 16:30 |
Hands-on session ↓ 1) PRESENTER : Mattéo Delabre
TITLE : A Unifying Approach to Algebraic Dynamic Programming
ABSTRACT : Dynamic programming is an algorithm design technique for solving combinatorial optimization problems that can be recursively decomposed into subproblems so that any optimal solution can be constructed from optimal solutions to these subproblems. All dynamic programming algorithms present two aspects: (1) a recursive decomposition of the problem space, and (2) a scoring and selection scheme for the elements of this space. Such algorithms are usually presented using recurrences between table entries, interleaving these two aspects in a way that complicates analysis, adaptation, and generalization.
Algebraic dynamic programming is an approach that promotes a complete separation between the two aspects, enabling the construction of algorithms in a more declarative way. Because the recursive problem decomposition is made independent of any particular scoring scheme, it may be reused to answer new questions by simply swapping the scoring scheme. Scoring schemes, which are shown to correspond to algebraic structures called semirings, can be constructed by composing building blocks reusable across various problems.
Many principles of algebraic dynamic programming have been independently discovered by multiple researchers over the last decades. The main goal of this work is to present these results in a unified and approachable way. It is accompanied by a Python library named padapto that implements the concepts described herein. We also report new results and highlight open problems in the field. (TCPL 201) |
16:30 - 17:30 | Collaborative work (TCPL 201) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |
Tuesday, August 26 | |
---|---|
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
09:00 - 09:40 |
Barbara Holland: - REVIEW LECTURE - Phylogenetic Networks: What Have Mathematicians Enjoyed Working On, What Do Biologists Really Want, and Is There Any Overlap? ↓ Phylogenetic networks have emerged as essential tools for modeling reticulate evolution (e.g. hybridization, recombination, and horizontal gene transfer). They are also fascinating objects from a mathematical standpoint. Over the last decades a sizable cohort of mathematicians have been attracted to work in the field and there has been a proliferation of network classes as well as novel algorithms.
Some natural questions from a mathematician’s viewpoint are to ask if a particular class of networks (e.g. normal, tree-child, k-level) is identifiable from a particular type of data (e.g. trinets, quartets, distances) or to ask if an algorithm exists that will find it in polynomial time. The set of network classes where you get happy answers to these questions is rather small and not necessarily biologically realistic (there is after all no particular reason for evolution to be identifiable or computable).
Some questions that evolutionary biologists might care about (this is my guess) are more focused on process, i.e. how prevalent is hybridization compared to speciation? Does the rate of reticulate evolution decline with increasing evolutionary distance? Are hybrid species more or less likely to go extinct than there non-hybrid counterparts?
In this talk I will try to take a birds-eye view of the field of phylogenetic networks and ask if we are focusing on the right questions. (TCPL 201) |
09:40 - 10:00 |
Louxin Zhang: Counting Phylogenetic Networks ↓ Phylogenetic networks are powerful models for representing evolutionary histories that involve reticulation events, such as hybridization, horizontal gene transfer, and recombination. The topological complexity of these networks poses significant challenges for their inference, especially as the number of taxa increases. To gain a deeper mathematical understanding of the space of phylogenetic networks, we investigate the counting and enumeration of restricted classes of binary phylogenetic networks, as well as unrestricted binary phylogenetic networks. In this talk, I will present recent results on this topic. (TCPL 201) |
10:00 - 10:30 | Coffee Break (TCPL Foyer) |
10:30 - 10:50 |
Leo van Iersel: Identifiability of level-2 phylogenetic networks ↓ Which evolutionary histories can potentially be reconstructed from sufficiently long DNA sequences? This question has previously been studied for phylogenetic trees and for phylogenetic networks that are level-1, which means that reticulate evolutionary events were restricted to be independent in the sense that the corresponding cycles in the network are non-overlapping. In this talk, I will discuss the identifiability of phylogenetic networks from DNA sequence data under Markov models of DNA evolution for more general classes of networks that may contain pairs of tangled reticulations (level-2).
This is joint work with Aviva Englander, Martin Frohn, Elizabeth Gross, Niels Holtgrefe, Mark Jones and Seth Sullivant. (TCPL 201) |
10:50 - 11:10 |
John Rhodes: Identifiability of species relationships from data: beyond planar networks ↓ Inferring species relationships when hybridization or other gene flow results in a network history
remains difficult, in part because phylogenetic signal may be confounded with incomplete lineage
sorting. Several current strategies under various models first infer quartet relationships displayed on
the network, and then use these as data to infer a larger network. Most work has been limited to
level-1 networks, with the first level-2 results very recent. The extent to which statistically consistent
inference of more complex network classes is possible remains open.
We show that a large class of networks is identifiable and thus in theory inferable, using displayed
quartet relationships under several models, including ones addressing incomplete lineage sorting.
While the class of networks' we consider still restricts network structure to be galled and tree-child, it
allows for networks of arbitrary level, including non-planar ones. We structure our proofs so that once
a small number of basic identifiability results for any model and data summary are established, purely
combinatorial theorems give identifiability of the larger network structure.
This is joint work with E. Allman, C. An\'e, and H. Ba\~nos. (TCPL 201) |
11:10 - 11:29 |
Andrew Francis: Phylogenetic encodings ↓ A number of encodings are in common use for phylogenetic trees and networks. These include those of mathematical simplicity, such as the encoding as a set of vertices and a set of edges, as well as some such as Newick format, that captures important information such as clade structures. In this talk I will describe these as well as more recently developed encodings such as in partitions, covers, or "folios", and discuss how their different underlying paradigms can indicate the nature of their specific advantages. (TCPL 201) |
11:29 - 11:30 |
Group Photo ↓ Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo! (TCPL Foyer) |
11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
13:20 - 13:40 |
Pengyu Liu: Polynomial encoding of phylogenetic trees and networks ↓ Encoding phylogenies plays a central role in many areas of phylogenetics, including evolutionary inference, consensus tree construction, and large-scale comparative genomics. Complete polynomial invariants are important mathematical objects for characterizing discrete structures such as rooted trees and networks. These polynomials can provide interpretable and computationally efficient encoding of phylogenies. When written as coefficient matrices, they are compatible with modern data analytics tools. In this talk, we introduce complete polynomial invariants for phylogenetic trees and a class of phylogenetic networks. (TCPL 201) |
13:40 - 14:00 |
Norbert Zeh: Sometimes only fools get it right: When cluster reduction isn't safe ↓ Cluster reduction is a technique that can be used to speed up the construction of phylogenetic networks from sets of trees, or so we thought. Baroni, Semple, and Steel proved that cluster reduction is safe for 2 trees, that is, it preserves the optimality of the constructed network when the input consists of 2 trees. While there are inputs where cluster reduction isn’t applicable, Li and Zeh verified experimentally that on 2-tree inputs, it is the single most important technique for constructing optimal networks efficiently in practice, that not using cluster reduction would be folly. This talk shows cluster reduction isn’t safe for 8 or more trees. It will mention but not prove a number of other results in our paper: cluster reduction isn’t safe for computing an optimal orchard that displays 8 or more trees, cluster reduction is safe for 3 trees, and cluster reduction is safe for computing optimal tree-child networks. What happens with 4-7 trees is an open problem, but our intuition is that cluster reduction stops being safe already for 4 trees. Proving this would require different techniques than the ones we use. In particular, we prove that if there exists a set of t trees that cannot be displayed by a network with at most 2n reticulations, then there exists a set of 2t trees for which cluster reduction fails, but every set of 3 trees can be displayed by a network with at most 2n reticulations. (TCPL 201) |
14:00 - 14:20 |
Md Shamsuzzoha Bayzid: Improved quartet-based species tree inference despite gene tree estimation error ↓ Species tree estimation from genes sampled from throughout the whole genome is challenging because of gene tree discordance, often caused by incomplete lineage sorting (ILS). Quartet-based summary methods for estimating species trees from a collection of gene trees are becoming popular due to their high accuracy and theoretical guarantees of robustness to arbitrarily high amounts of ILS. ASTRAL, the most widely used quartet-based method, aims to infer species trees by maximizing the number of quartets in the gene trees consistent with the species tree. An alternative approach is inferring quartets for all subsets of four species and amalgamating them into a coherent species tree. While summary methods can be sensitive to gene tree estimation error (GTEE) arising from a combination of reasons (ranging from analytical factors to more biological causes, as in short gene sequences and short internal branches due to rapid radiations), quartet amalgamation offers an advantage by potentially bypassing gene tree estimation. In this talk, I will demonstrate that careful generation and amalgamation of weighted quartets, as implemented in methods like wQFM and wQMC, can lead to significantly more accurate trees than popular methods like ASTRAL, especially in the face of gene tree estimation errors. I will also present our recent algorithm, QT-WEAVER, which effectively accounts for GTEE by correcting the weighted quartet distribution induced by a set of estimated gene trees. (TCPL 201) |
14:20 - 14:40 |
Riccardo Dondi: Multi-Label Tree Pruning for Reconciliation ↓ Multi-labeled trees (MUL- trees) are rooted leaf labeled trees, where each label may be associated with more than one leaf. Notable examples of multi-labeled trees in phylogenetics are gene trees. In this talk, we consider combinatorial problems based on leaf-pruning (that is leaf removal) of multi-labeled trees that leads to single-labeled trees.
One variant (MULSETPC), given a set of MUL-trees, asks whether it is possible to define a pruning that leads to a set of consistent trees.
Another variant (MUL-tree Pruning for Reconciliation), given a set of MUL-trees and a species tree, asks for a pruning that minimizes the reconciliation cost (number of duplications).
We discuss the complexity of these problems and some algorithmic strategies to solve them. (TCPL 201) |
14:40 - 15:00 |
Mathieu Gascon: Gene repertoires evolution minimizing episodes of gains and losses ↓ Given a leaf labeled tree topology, the small phylogeny problem consist in finding an internal node labeling according to a certain optimization function. The small phylogeny problem has been widely considered for inferring ancestral sequences of a gene family given the phylogenetic tree of the gene family, or in the genome rearrangement field for inferring ancestral gene contents. In this latter case, given a species tree with leaves labeled by the gene content and order of the corresponding genome, the problem is to infer the gene content and order at the internal nodes of the tree minimizing a certain rearrangement distance metric. Almost all those genome rearrangement models lead to intractable problems, especially those involving duplications and losses, and even on two genomes. Here, we consider a simpler version of the small phylogeny problem on gene repertoires with an evolutionary model restricted to gains and losses. More precisely, given a tree topology leaf labeled with gene sets, the problem is to infer gene sets of the internal nodes of the tree minimizing the number of gain and loss episodes, i.e. where a gain or loss can add or remove one gene or more. We present a polynomial-time algorithm for solving the problem. For the moment, the algorithm is a heuristic, but our conjecture is that it leads to an optimal solution. We present this work in progress to stimulate discussion and possibly motivate collaborations on this subject. (TCPL 201) |
15:00 - 15:30 | Coffee Break (TCPL Foyer) |
15:30 - 16:30 | Open problems - Cédric Chauve, Peter Stadler, Mattéo Delabre and Marc Hellmuth (TCPL 201) |
16:30 - 17:30 | Poster session (TCPL 201) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |
Wednesday, August 27 | |
---|---|
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
08:45 - 09:20 |
Ben Raphael: - KEYNOTE TALK - Models and Algorithms for Cancer Evolution ↓ Cancer arises through the accumulation of genetic and epigenetic alterations in the cells of a tissue.This talk will describe models and algorithms to address some of the challenges that arise in inferring cancer evolution from high-throughput DNA/RNA sequencing data. These challenges include: deconvolving mixtures of mutations from sequencing of bulk tumor samplings, addressing high rates of error and missing data in single-cell sequencing, modeling copy number aberrations that alter large genomic regions, and leveraging information from spatial and longitudinal sampling. (TCPL 201) |
09:20 - 09:40 |
Mohamed El-Kebir: Emerging Topics in Cancer Phylogenetics ↓ Cancer is a complex genetic disease driven by evolutionary process, leading to heterogeneous tumors composed of multiple subpopulations of cells with different sets of mutations. A tumor’s evolutionary history can be modeled by a phylogenetic tree. In this talk, I will survey combinatorial algorithms for reconstructing, comparing, and interpreting tumor phylogenies from DNA sequencing data. (TCPL 201) |
09:40 - 10:00 |
Mukul Bansal: Fast and Accurate Distance-Based Reconstruction of Single-Cell Tumor Phylogenies ↓ Accurate reconstruction of single-cell tumor phylogenies, or tumor cell lineage trees, based on somatic copy number alterations (sCNAs) has important implications for cancer prognosis and treatment. However, existing computational methods for this problem are often too slow to be applied to large datasets. In addition, many existing methods are highly sensitive to the kinds of errors and noise present in real datasets. In this talk, we will discuss two recently developed distance-based methods, DICE-bar and DICE-star, for reconstructing single-cell tumor phylogenies from sCNA data. DICE-bar and DICE-star are orders of magnitude faster than many existing methods and, despite their simplicity, match or exceed the accuracies of existing methods based on far more complicated frameworks. We will also discuss the results of applying DICE-star, the most accurate method on error-prone datasets, to several real single-cell breast and ovarian cancer datasets. (TCPL 201) |
10:00 - 10:30 | Coffee Break (TCPL Foyer) |
10:30 - 10:50 |
Marc Hellmuth: Orthology in the Context of Phylogenetic Networks ↓ Orthologous genes, which arise through speciation, play a key role in comparative genomics and functional inference. In particular, graph-based methods allow for the inference of orthology estimates without prior knowledge of the underlying gene or species trees. This results in orthology graphs, where each vertex represents a gene, and an edge exists between two vertices if the corresponding genes are estimated to be orthologs. Orthology graphs inferred under a tree-like evolutionary model must be cographs. However, real-world data often deviate from this property, either due to noise in the data, errors in inference methods or, simply, because evolution follows a network-like rather than a tree-like process. The latter, in particular, raises the question of whether and how orthology graphs can be derived from or, equivalently, are explained by phylogenetic networks.
In this work, we study the constraints imposed on orthology graphs when the underlying evolutionary history follows a phylogenetic network instead of a tree. We show that any orthology graph can be represented by a sufficiently complex level-k network. However, such networks lack biologically meaningful constraints. In contrast, level-1 networks provide a simpler explanation, and we establish characterizations for level-1 explainable orthology graphs, i.e., those derived from level-1 evolutionary histories. To this end, we employ modular decomposition, a classical technique for studying graph structures. Specifically, an arbitrary graph is level-1 explainable if and only if each primitive subgraph is a near-cograph (a graph in which the removal of a single vertex results in a cograph). Additionally, we present a linear-time algorithm to recognize level-1 explainable orthology graphs and to construct a level-1 network that explains them, if such a network exists. (TCPL 201) |
10:50 - 11:10 |
Katharina Huber: Reconstructing Arboreal Networks ↓ The development of powerful strategies to help protect the planet's biodiversity will undoubtedly be informed by our understanding of the evolutionary history of plants and animals, amongst others. Usually expressed in the form of a phylogenetic tree, such a structure does however not always appropriately capture the complex evolutionary picture of a set of organisms, in that their past could have been influenced by non-treelike evolutionary processes such as horizontal gene transfer, hybridization, or introgression. In such cases, more general structures called phylogenetic networks have proven useful. Originally introduced as certain single-rooted directed acyclic graphs, phylogenetic networks have recently also attracted attention in multi-rooted form due to their relevance in introgression studies and also studies concerned with horizontal gene transfer between bacteria living in distinct ecological niches.
A major challenge in the context of phylogenetic networks is the development of methodologies and software tools for reconstructing them from biological data. In this talk, we address the question of what can be said in this context for arboreal networks that is, multi-rooted phylogenetic networks such that when ignoring their roots and directions the resulting graph is an unrooted phylogenetic tree.
Joint work with Darren Overman. (TCPL 201) |
11:10 - 11:30 |
Annachiara Korchmaros: Integrating Horizontal Gene Transfer Estimation in REvolutionH-tl ↓ Horizontal Gene Transfer (HGT), or lateral gene transfer, refers to the transfer of genetic material between unrelated organisms through mechanisms other than vertical descent. HGT plays a significant role in evolution, particularly in the microbial world, yet inferring HGT events remains challenging. Conventional methods to detect HGTs typically rely on comparing gene tree and species tree topologies, often using tree distance-based approaches to quantify discrepancies between them. Hellmuth and Stadler [1] proposed a pipeline to detect orthologous genes and generate topologies supporting gene and species trees without assuming any specific evolutionary model in duplication-loss-transfer scenarios. This pipeline has been incorporated into the tool REvolutionH-tl [3], effective when
evolutionary scenarios are free of HGT events. However, this raises the question of how to extend REvolutionH-tl to include the estimation of transfer events. Recent theoretical frameworks [2, 4] have been developed to refine the gene tree inferred by REvolutionH-tl without comparing it to a species tree, as is common in tree distance-based methods. In this talk, I will present these theoretical methods and discuss the current progress in integrating them into REvolutionH-tl to enhance its capability to estimate HGT events. (TCPL 201) |
11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
13:30 - 17:30 | Free Afternoon (Banff National Park) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |
Thursday, August 28 | |
---|---|
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
09:00 - 09:40 |
Olivier Gascuel: - MINI-COURSE - Deep Learning from Phylogenies ↓ Our goal is to analyze the evolutionary dynamics that led to phylogenies obtained from sequences. We use a parametric framework and assume that the studied phylogeny is derived from a birth-death model. We then seek to estimate the parameters of this model. This approach is increasingly used in epidemiology to estimate critical parameters, such as R₀ (the average number of people infected by one person), and in macroevolution studies to understand how species have diversified over time. Depending on the model, some parameters may be unidentifiable. For most models, the likelihood of the input phylogeny is the solution to a complex, recursive system of ordinary differential equations (ODEs). In this context, likelihood-based methods are prone to numerical errors and convergence issues. In this presentation, I will explain an alternative approach in which we simulate phylogenies within a given model and learn to predict their parameters using deep learning. The challenge lies in finding a compact numerical representation of phylogenies that works best with the available neural network architectures. I will describe the two approaches* we have proposed in this context, both of which are based on one-dimensional convolutional networks. When the studied model admits a closed-form solution for the likelihood function (or an efficient solution for the ODEs that define the likelihood), the deep learning approach produces results similar to those obtained using likelihood-based methods. For other models, which constitute the vast majority, the deep learning approach avoids numerical errors and a lack of convergence, yielding superior results. This work demonstrates the strengths and pitfalls of deep learning methods, which, as in many other fields, offer an alternative to traditional statistical inference methods in phylogenetics.
* Voznica J., Zhukova A., Boskova V., Saulnier E., Lemoine F., Moslonka-Lefebvre M., Gascuel O. 2022. Deep learning from phylogenies to uncover the epidemiological dynamics of outbreaks. Nature Communications 13:3896. https://www.nature.com/articles/s41467-022-31511-0
* Perez M., Gascuel 0. 2024. PhyloCNN: Improving tree representation and neural network architecture for deep learning from trees in phylodynamics and diversification studies. BioRxiv https://doi.org/10.1101/2024.12.13.628187 (TCPL 201) |
09:40 - 10:00 |
Alessandra Carbone: Phylogenic spaces are not functional spaces ↓ The functional classification of proteins based solely on their sequences has become a major challenge in the face of the massive data accumulation in our databases. The wide diversity of homologous sequences often conceals a broad range of functional activities that are difficult to predict. Yet, identifying these functions is crucial for understanding the fundamental mechanisms driving the evolution of life and for advancing biotechnological applications. In this talk, I will present how a novel approach using AI to explore sequence space enables fine-grained functional classification of proteins. Several biological questions will be addressed through concrete examples, ranging from protein classification and functional annotation to the discovery of new functions, and large-scale mapping of functions in natural environments. I will also illustrate how functional space can diverge significantly from phylogenetic space, highlighting the importance of developing approaches that help bridge these two complementary dimensions of protein evolution. (TCPL 201) |
10:00 - 10:30 | Coffee Break (TCPL Foyer) |
10:30 - 10:50 |
Katherine St. John: Understanding Phylogenetic Search Spaces ↓ Trees are a canonical structure for representing evolutionary histories. Many popular criteria used to infer optimal trees are computationally hard, and the number of possible tree shapes grows super-exponentially in the number of taxa. The underlying structure of the spaces of trees yields rich insights that can improve the search for optimal trees, both in accuracy and running time, and the analysis and visualization of results. This talk will highlight some of the elegant questions that arise in these highly structured spaces. This talk assumes no background in biology and all are welcome. (TCPL 201) |
10:50 - 11:10 |
Megan Owen: Algorithms and Applications in BHV Treespace ↓ The Billera-Holmes-Vogtmann (BHV) treespace is a continuous metric space containing all phylogenetic trees with $n$ leaves and branch lengths. Furthermore, the BHV treespace is CAT(0), or non-positively curved, and thus shortest paths and means are unique, as in Euclidean space. We look at how these properties lead to fast algorithms for analyzing sets of trees, including detecting bias in phylogenetic inference. (TCPL 201) |
11:10 - 11:30 |
Vincent Moulton: Spaces of ranked tree-child networks ↓ Ranked tree-child networks are a recently introduced class of rooted phylogenetic networks in which the evolutionary events represented by the network are ordered so as to respect the flow of time. This class includes the well-studied ranked phylogenetic trees (also known as ranked genealogies). An important problem in phylogenetic analysis is to define distances between phylogenetic trees and networks in order to systematically compare them. Various distances have been defined on ranked binary phylogenetic trees, but very little is known about comparing ranked tree-child networks. In this talk, we introduce an approach to comparing binary ranked tree-child networks on the same leaf set that is based on a new encoding of such networks that is given in terms of a certain partially ordered set. This allows us to define two new spaces of ranked binary tree-child networks. The first space can be considered as a generalization of the recently introduced space of ranked binary phylogenetic trees whose distance is defined in terms of ranked nearest neighbor interchange moves. The second space is a continuous space that captures all equidistant tree-child networks and generalizes the space of ultrametric trees. As well as explaining how this allows us to define distances between ranked tree-child networks, we present some interesting open problems concerning the structure of these new network spaces.
This is joint work with Andreas Spillner, Merseburg University of Applied Sciences, Germany (TCPL 201) |
11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
13:20 - 13:40 |
Bertrand Marchand: Methods for comparing phylogenetic networks ↓ When evaluating reconstruction methods for evolutionary histories, a standard test consists in comparing the reconstruction to a simulated ground truth. One may also want to be able to extract a consensus, or dismiss outliers among several predictions on a given set of data. In any case, this requires the ability to compare evolutionary histories. While several well-established comparison methods exist for phylogenetic trees (e.g. triplet distance, Robinson-Foulds disance...), the situation is not as clear for phylogenetic networks. Specifically, the quest for metrics on phylogenetic networks that are (1) valid on as large a family of networks as possible, (2) fast to compute and (3) provide an intuitive understanding of how networks differ is still open. After giving an overview of existing methods for comparing phylogenetic networks, this talk will present recent developments towards a metric based on edge contractions and expansions, generalizing the original formulation of the Robinson-Foulds distance on trees to any pair of phylogenetic networks. The specific case of tree-based networks will also be discussed. (TCPL 201) |
13:40 - 14:00 |
Olivier Tremblay-Savard: Cherry-picking distances between phylogenetic networks ↓ Phylogenetic networks are increasingly being used to represent evolutionary relationships between species, as the reticulations they contain can represent more complex evolutionary events such as lateral gene transfer, hybridization and recombination. Being able to efficiently calculate distances between two phylogenetic networks is useful for the development of network construction methods, as it allows to measure the level of discrepancy between a network constructed using a new method and some other reference network (manually curated or built using different techniques). Inspired by the fact that cherry-picking sequences (sequences of cherry-picking operations which reduce simple or reticulated cherries by removing a leaf or a reticulation edge respectively) can be used to efficiently identify isomorphism between certain classes of networks that are called \textit{reconstructible} (i.e. they can be reconstructed from the same cherry-picking sequence used to reduce them), we recently defined four new distances based on cherry-picking. Three of these distance formulations were shown to be equivalent to the problem of finding a maximum agreement cherry-reduced subnetwork (MACRS), and we call the corresponding distance the \textit{cherry distance}. This talk will focus on briefly presenting the four different cherry-picking distance formulations, the MACRS problem, and the algorithms recently developed to solve the MACRS problem on rooted level-1 binary networks. The presentation will conclude with experimental results on simulated and real datasets highlighting some characteristics of the cherry distance. (TCPL 201) |
14:00 - 14:20 |
Michael Fuchs: Distributional results for the k-Robinson-Foulds distance of random Cayley trees ↓ Motivated by applications in bioinformatics, Khayatian et al. (2024) recently defined $k$-Robinson-Foulds distances for rooted and unrooted Cayley trees. Moreover, they presented simulation results for these distances for a randomly chosen pair of trees. In this talk, we present our results on limit laws of the $k$-Robinson-Foulds distance which explain these simulations. We mainly concentrate on the two extreme cases, namely, $k=0$ and $k=n-2$ which exhibit very different behavior. The talk is based on joint work with Mike Steel (University of Canterbury) and Cheng-Kai Yeh (undergraduate student at National Chengchi University). (TCPL 201) |
14:20 - 14:40 |
Simone Linz: Bounding the SNPR distance between two tree-child networks ↓ Agreement forests continue to play a central role in the comparison of phylogenetic trees since their introduction more than 25 years ago. In this talk, we introduce agreement digraphs, a concept that generalises agreement forests for two rooted phylogenetic trees to two rooted phylogenetic networks. Analogous to the way in which agreement forests compute the rooted subtree prune and regraft distance between two phylogenetic trees, we then discuss how agreement digraphs can be used to tightly bound the subnet prune and regraft distance between two tree-child networks from above and below. (Joint work with Steven Kelk and Charles Semple.) (TCPL 201) |
14:40 - 15:00 |
Joan Carles Pons: Describing encodings and computing consensus for phylogenetic networks ↓ An important and extensively studied problem in phylogenetics is the computation of consensus trees, which aims to summarize the common features shared by a collection of rooted phylogenetic trees. In contrast, consensus methods for phylogenetic networks remain far less explored. One way to address the consensus problem is by employing encodings of the input trees (or networks, in our case). In this work, we present encoding strategies for phylogenetic networks: for tree-child networks, we use their mu-representation; for normal networks, we rely on their collection of clusters. Based on these encodings, we propose a method to compute a consensus network for normal networks. In contrast, we discuss the limitations encountered when attempting to extend the corresponding approach to tree-child networks.
This work is part of a collaborative effort with the student Toni Fuentes, Katherina Huber, Vincent Moulton, and Joan Carles Pons. (TCPL 201) |
15:00 - 15:30 | Coffee Break (TCPL Foyer) |
15:30 - 16:30 | Open problems and collaborative work (TCPL 201) |
16:30 - 17:30 | Poster session (TCPL 201) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in Vistas Dining Room, top floor of the Sally Borden Building. (Vistas Dining Room) |
Friday, August 29 | |
---|---|
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
09:00 - 09:20 |
Celine Scornavacca: Rehabilitating the benefits of gene tree correction in presence of incomplete lineage sorting ↓ Gene trees are essential components in many downstream phylogenomics analyses. However, their reconstruction often relies on limited-length sequences and may not account for complex evolutionary events, such as gene duplications and losses or incomplete lineage sorting (ILS), which are not modeled by standard phylogenetic methods. To address these challenges, a common approach involves first estimating gene trees using conventional models, then refining them through correction methods that incorporate information about evolutionary events—an approach known as species tree-aware phylogenetic reconstruction. In Yan et al. (2021), the authors argue against applying gene tree correction methods in the presence of ILS, suggesting that such corrections can lead to overfitting and force gene trees to resemble the species tree, thereby obscuring genuine gene-level variation caused by ILS. We reprocessed the same dataset and demonstrated that, when applied carefully, correction methods can offer significant benefits, even in the presence of ILS. (TCPL 201) |
09:20 - 09:40 |
Cédric Chauve: Gene tree/species tree reconciliation in the reconstruction of ancestral gene orders ↓ The Small Parsimony Problem (SPP) aims to reconstruct ancestral gene orders given exant gene orders for species whose phylogeny is given. Most methods to solve the SPP in the context where the full gene content of the considered genomes, as opposed to considering only single-copy orthologous genes, require an intermediate step of gene trees/species tree reconciliation to infer the gene content of ancestral species. In this talk, I will describe such a method, developed with Daniel Doerr, describe its application on real data and discuss the impact of the reconciled gene trees on the outcome. (TCPL 201) |
09:40 - 10:00 |
Peter Stadler: Unbiased anchors for reliable genome-wide synteny detection ↓ Orthology inference lies at the foundation of comparative genomics research. The conservation of gene order, i.e. synteny, can be used in conjunction with sequence similarity to disambuiguate multiple matches
arising from duplications and help identify genome rearrangements. Current approaches for synteny detection, however, rely on genome annotations and are therefore limited. An annotation-free approach relies on identifying "sufficiently unique" sequence intervals which serve as anchor candidates. Matches of such anchors are guaranteed to be 1-1 orthologs. We find that our approach works better in closely related genomes whereas there is a better performance with annotations for more distantly related genomes. Overall, the presented algorithm offers a useful alternative to annotation-based methods and can outperform them in many cases. Syntenyanchors can also be utilized for homology-based scaffolding in sequence assembly, and outperforms competing methods. (TCPL 201) |
10:00 - 10:30 | Coffee Break (TCPL Foyer) |
10:30 - 11:00 |
Checkout by 11AM ↓ 5-day workshop participants are welcome to use BIRS facilities (TCPL ) until 3 pm on Friday, although participants are still required to checkout of the guest rooms by 11AM. (Front Desk - Professional Development Centre) |
11:00 - 12:00 | Summary of the workshop and open discussion. (TCPL 201) |
12:00 - 13:30 | Lunch from 11:30 to 13:30 (Vistas Dining Room) |