Overview
In this PhD project, you will pioneer neuro‑symbolic methods that retain the mechanistic grounding of classical phylogenetics while integrating the representational richness of genomic large language models (gLLMs).
Your role
Genomic sequences are modeled as evolving along binary phylogenetic trees through stochastic string‑valued substitution and insertion‑deletion (indel) processes. Given a set of present‑day sequences, the classical inference problems in phylogenetics are: (i) homology inference, (ii) tree inference, and (iii) ancestral sequence reconstruction. A central focus of our recent work has been to develop fast frequentist indel‑aware approaches to these problems. For tractability, most of these models must assume that residues evolve independently across sites. In reality, mutation probabilities are influenced by sequence context, including position‑specific structural and functional constraints. Recent years have seen a convergence of computational biology and data‑driven methods, leading to genomic large language models (gLLMs) that model sequence context dependencies.
Building on our previous work, the aim is to develop neuro‑symbolic methods that retain the mechanistic grounding of classical phylogenetics and integrate the representational richness of gLLMs. As a PhD student, you will devise mutation models, develop inference algorithms, implement them in our Rust code‑base, and evaluate the methods by simulation and on real data.
Selected relevant articles
Maiolo M, Zhang X, Gil M, Anisimova M. "Progressive multiple sequence alignment with indel evolution" BMC Bioinformatics. 2018. 19(1):331. doi:10.1186/s12859-018-2357-1.
Peerska J, Gil M, Anisimova M. "Joint alignment and tree inference" bioRxiv. 2021. doi:10.1101/2021.09.28.462230.
Jowkar G, Peerska J, Maiolo M, Gil M, Anisimova M. "ARPIP: Ancestral sequence Reconstruction with insertions and deletions under the Poisson Indel Process" Systematic Biology. 2022. syac050. doi:10.1093/sysbio/syac050.
Iglhaut C, Peerska J, Gil M, Anisimova M. "Please Mind the Gap: Indel‑Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment Including Long Indels" Molecular Biology and Evolution. 2024. 41(7):msae109. doi:10.1093/molbev/msae109.
Your profile
You should have an MSc in Computer Science, Computational Science, Computational Biology, Statistics / Applied Mathematics, or a related quantitative field, with a strong background in:
Algorithms, particularly combinatorial optimization
Stochastic modelling
Computational inferential statistics
Programming, ideally in Rust and/or C++
Knowledge of phylogenetics and/or an understanding of neural networks is an advantage.
What you can expect
We offer working conditions and terms of employment commensurate with higher education institutions and actively promote personal development for staff in leadership and non‑leadership positions. A detailed description of advantages and benefits can be found on the "Working at the ZHAW" page.
Legal and diversity statement
ZHAW is committed to gender‑mixed and diverse teams in order to promote equality, diversity and innovation.