# publications

(*) denotes equal contribution

## 2023

- Studying stochastic systems biology of the cell with single-cell genomics dataGennady Gorin,
*John J. Vastola*, and Lior Pachter*Cell Systems*Oct 2023Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.

- Causal inference during closed-loop navigation: parsing of self- and object-motionJean-Paul Noel, Johannes Bill, Haoran Ding,
*John J. Vastola*, and 3 more authors*Philosophical Transactions of the Royal Society B: Biological Sciences*Oct 2023A key computation in building adaptive internal models of the external world is to ascribe sensory signals to their likely cause(s), a process of causal inference (CI). CI is well studied within the framework of two-alternative forced-choice tasks, but less well understood within the cadre of naturalistic action–perception loops. Here, we examine the process of disambiguating retinal motion caused by self- and/or object-motion during closed-loop navigation. First, we derive a normative account specifying how observers ought to intercept hidden and moving targets given their belief about (i) whether retinal motion was caused by the target moving, and (ii) if so, with what velocity. Next, in line with the modelling results, we show that humans report targets as stationary and steer towards their initial rather than final position more often when they are themselves moving, suggesting a putative misattribution of object-motion to the self. Further, we predict that observers should misattribute retinal motion more often: (i) during passive rather than active self-motion (given the lack of an efference copy informing self-motion estimates in the former), and (ii) when targets are presented eccentrically rather than centrally (given that lateral self-motion flow vectors are larger at eccentric locations during forward self-motion). Results support both of these predictions. Lastly, analysis of eye movements show that, while initial saccades toward targets were largely accurate regardless of the self-motion condition, subsequent gaze pursuit was modulated by target velocity during object-only motion, but not during concurrent object- and self-motion. These results demonstrate CI within action–perception loops, and suggest a protracted temporal unfolding of the computations characterizing CI. This article is part of the theme issue ‘Decision and control processes in multisensory perception’.

## 2022

- Interpretable and tractable models of transcriptional noise for the rational design of single-molecule quantification experimentsGennady Gorin*,
*John J. Vastola**, Meichen Fang, and Lior Pachter*Nature Communications*Dec 2022The question of how cell-to-cell differences in transcription rate affect RNA count distributions is fundamental for understanding biological processes underlying transcription. Answering this question requires quantitative models that are both interpretable (describing concrete biophysical phenomena) and tractable (amenable to mathematical analysis). This enables the identification of experiments which best discriminate between competing hypotheses. As a proof of principle, we introduce a simple but flexible class of models involving a continuous stochastic transcription rate driving a discrete RNA transcription and splicing process, and compare and contrast two biologically plausible hypotheses about transcription rate variation. One assumes variation is due to DNA experiencing mechanical strain, while the other assumes it is due to regulator number fluctuations. We introduce a framework for numerically and analytically studying such models, and apply Bayesian model selection to identify candidate genes that show signatures of each model in single-cell transcriptomic data from mouse glutamatergic neurons.

- Is the information geometry of probabilistic population codes learnable?
*John J. Vastola*, Zach Cohen, and Jan Drugowitsch*In NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations*Dec 2022One reason learning the geometry of latent neural manifolds from neural activity data is difficult is that the ground truth is generally not known, which can make manifold learning methods hard to evaluate. Probabilistic population codes (PPCs), a class of biologically plausible and self-consistent models of neural populations that encode parametric probability distributions, may offer a theoretical setting where it is possible to rigorously study manifold learning. It is natural to define the neural manifold of a PPC as the statistical manifold of the encoded distribution, and we derive a mathematical result that the information geometry of the statistical manifold is directly related to measurable covariance matrices. This suggests a simple but rigorously justified decoding strategy based on principal component analysis, which we illustrate using an analytically tractable PPC.

## 2021

- Solving the chemical master equation for monomolecular reaction systems and beyond: a Doi-Peliti path integral view
*John J. Vastola**Journal of Mathematical Biology*Oct 2021The chemical master equation (CME) is a fundamental description of interacting molecules commonly used to model chemical kinetics and noisy gene regulatory networks. Exact time-dependent solutions of the CME—which typically consists of infinitely many coupled differential equations—are rare, and are valuable for numerical benchmarking and getting intuition for the behavior of more complicated systems. Jahnke and Huisinga’s landmark calculation of the exact time-dependent solution of the CME for monomolecular reaction systems is one of the most general analytic results known; however, it is hard to generalize, because it relies crucially on special properties of monomolecular reactions. In this paper, we rederive Jahnke and Huisinga’s result on the time-dependent probability distribution and moments of monomolecular reaction systems using the Doi-Peliti path integral approach, which reduces solving the CME to evaluating many integrals. While the Doi-Peliti approach is less intuitive, it is also more mechanical, and hence easier to generalize. To illustrate how the Doi-Peliti approach can go beyond the method of Jahnke and Huisinga, we also find an explicit and exact time-dependent solution to a problem involving an autocatalytic reaction that Jahnke and Huisinga identified as not solvable using their method. Most interestingly, we are able to find a formal exact time-dependent solution for any CME whose list of reactions involves only zero and first order reactions, which may be the most general result currently known. This formal solution also yields a useful algorithm for efficiently computing numerical solutions to CMEs of this type.

## 2020

- Chemical Langevin equation: A path-integral view of Gillespie’s derivation
*John J. Vastola*, and William R. Holmes*Phys. Rev. E*Mar 2020In 2000, Gillespie rehabilitated the chemical Langevin equation (CLE) by describing two conditions that must be satisfied for it to yield a valid approximation of the chemical master equation (CME). In this work, we construct an original path-integral description of the CME and show how applying Gillespie’s two conditions to it directly leads to a path-integral equivalent to the CLE. We compare this approach to the path-integral equivalent of a large system size derivation and show that they are qualitatively different. In particular, both approaches involve converting many sums into many integrals, and the difference between the two methods is essentially the difference between using the Euler-Maclaurin formula and using Riemann sums. Our results shed light on how path integrals can be used to conceptualize coarse-graining biochemical systems and are readily generalizable.