publications | John J. Vastola

2025

Generalization through variance: how noise shapes inductive biases in diffusion models

John J. Vastola

In The Thirteenth International Conference on Learning Representations 2025

How diffusion models generalize beyond their training set is not known, and is somewhat mysterious given two facts: the optimum of the denoising score matching (DSM) objective usually used to train diffusion models is the score function of the training distribution; and the networks usually used to learn the score function are expressive enough to learn this score to high accuracy. We claim that a certain feature of the DSM objective (the fact that its target is not the training distribution’s score, but a noisy quantity only equal to it in expectation) strongly impacts whether and to what extent diffusion models generalize. In this paper, we develop a mathematical theory that partly explains this ‘generalization through variance’ phenomenon. Our theoretical analysis exploits a physics-inspired path integral approach to compute the distributions typically learned by a few paradigmatic under- and overparameterized diffusion models. We find that the distributions diffusion models effectively learn to sample from resemble their training distributions, but with ‘gaps’ filled in, and that this inductive bias is due to the covariance structure of the noisy target used during training. We also characterize how this inductive bias interacts with feature-related inductive biases.

Dynamical symmetries in the fluctuation-driven regime: an application of Noether’s theorem to noisy dynamical systems

John J. Vastola

In Proceedings of the 3rd NeurIPS Workshop on Symmetry and Geometry in Neural Representations 2025

Noether’s theorem provides a powerful link between continuous symmetries and conserved quantities for systems governed by some variational principle. Perhaps unfortunately, most dynamical systems of interest in neuroscience and artificial intelligence cannot be described by any such principle. On the other hand, nonequilibrium physics provides a variational principle that describes how fairly generic noisy dynamical systems are most likely to transition between two states; in this work, we exploit this principle to apply Noether’s theorem, and hence learn about how the continuous symmetries of dynamical systems constrain their most likely trajectories. We identify analogues of the conservation of energy, momentum, and angular momentum, and briefly discuss examples of each in the context of models of decision-making, recurrent neural networks, and diffusion generative models.

2024

The Unreasonable Effectiveness of Gaussian Score Approximation for Diffusion Models and its Applications

Binxu Wang and John J. Vastola

Transactions on Machine Learning Research 2024

Diffusion models have achieved remarkable results in multiple domains of generative modeling. By learning the gradient of smoothed data distributions, they can iteratively generate samples from complex distributions, e.g., of natural images. The learned score function enables their generalization capabilities, but how the learned score relates to the score of the underlying data manifold remains largely unclear. Here, we aim to elucidate this relationship by comparing the learned scores of neural-network-based models to the scores of two kinds of analytically tractable distributions: Gaussians and Gaussian mixtures. The simplicity of the Gaussian model makes it particularly attractive from a theoretical point of view, and we show that it admits a closed-form solution and predicts many qualitative aspects of sample generation dynamics. We claim that the learned neural score is dominated by its linear (Gaussian) approximation for moderate to high noise scales, and supply both theoretical and empirical arguments to support this claim. Moreover, the Gaussian approximation empirically works for a larger range of noise scales than naive theory suggests it should, and is preferentially learned by networks early in training. At smaller noise scales, we observe that learned scores are better described by a coarse-grained (Gaussian mixture) approximation of training data than by the score of the training distribution, a finding consistent with generalization. Our findings enable us to precisely predict the initial phase of trained models’ sampling trajectories through their Gaussian approximations. We show that this allows one to leverage the Gaussian analytical solution to skip the first 15-30% of sampling steps while maintaining high sample quality (with a near state-of-the-art FID score of 1.93 on CIFAR-10 unconditional generation). 
This forms the foundation of a novel hybrid sampling method, termed analytical teleportation, which can seamlessly integrate with and accelerate existing samplers, including DPM-Solver-v3 and UniPC. Our findings strengthen the field’s theoretical understanding of how diffusion models work and suggest ways to improve the design and training of diffusion models.

Optimal packing of attractor states in neural representations

John J. Vastola

In Proceedings of the 2nd NeurIPS Workshop on Symmetry and Geometry in Neural Representations 2024

Animals’ internal states reflect variables like their position in space, orientation, decisions, and motor actions—but how should these internal states be arranged? Internal states which frequently transition between one another should be close enough that transitions can happen quickly, but not so close that neural noise significantly impacts the stability of those states, and how reliably they can be encoded and decoded. In this paper, we study the problem of striking a balance between these two concerns, which we call an ‘optimal packing’ problem since it resembles mathematical problems like sphere packing. While this problem is generally extremely difficult, we show that symmetries in environmental transition statistics imply certain symmetries of the optimal neural representations, which allows us in some cases to exactly solve for the optimal state arrangement. We focus on two toy cases: uniform transition statistics, and cyclic transition statistics.

2023

Studying stochastic systems biology of the cell with single-cell genomics data

Gennady Gorin, John J. Vastola, and Lior Pachter

Cell Systems Oct 2023

Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.

Causal inference during closed-loop navigation: parsing of self- and object-motion

Jean-Paul Noel, Johannes Bill, Haoran Ding, John J. Vastola, and 3 more authors

Philosophical Transactions of the Royal Society B: Biological Sciences Oct 2023

A key computation in building adaptive internal models of the external world is to ascribe sensory signals to their likely cause(s), a process of causal inference (CI). CI is well studied within the framework of two-alternative forced-choice tasks, but less well understood within the cadre of naturalistic action–perception loops. Here, we examine the process of disambiguating retinal motion caused by self- and/or object-motion during closed-loop navigation. First, we derive a normative account specifying how observers ought to intercept hidden and moving targets given their belief about (i) whether retinal motion was caused by the target moving, and (ii) if so, with what velocity. Next, in line with the modelling results, we show that humans report targets as stationary and steer towards their initial rather than final position more often when they are themselves moving, suggesting a putative misattribution of object-motion to the self. Further, we predict that observers should misattribute retinal motion more often: (i) during passive rather than active self-motion (given the lack of an efference copy informing self-motion estimates in the former), and (ii) when targets are presented eccentrically rather than centrally (given that lateral self-motion flow vectors are larger at eccentric locations during forward self-motion). Results support both of these predictions. Lastly, analysis of eye movements shows that, while initial saccades toward targets were largely accurate regardless of the self-motion condition, subsequent gaze pursuit was modulated by target velocity during object-only motion, but not during concurrent object- and self-motion. These results demonstrate CI within action–perception loops, and suggest a protracted temporal unfolding of the computations characterizing CI. This article is part of the theme issue ‘Decision and control processes in multisensory perception’.

2022

Interpretable and tractable models of transcriptional noise for the rational design of single-molecule quantification experiments

Gennady Gorin*, John J. Vastola*, Meichen Fang, and Lior Pachter

Nature Communications Dec 2022

The question of how cell-to-cell differences in transcription rate affect RNA count distributions is fundamental for understanding biological processes underlying transcription. Answering this question requires quantitative models that are both interpretable (describing concrete biophysical phenomena) and tractable (amenable to mathematical analysis). This enables the identification of experiments which best discriminate between competing hypotheses. As a proof of principle, we introduce a simple but flexible class of models involving a continuous stochastic transcription rate driving a discrete RNA transcription and splicing process, and compare and contrast two biologically plausible hypotheses about transcription rate variation. One assumes variation is due to DNA experiencing mechanical strain, while the other assumes it is due to regulator number fluctuations. We introduce a framework for numerically and analytically studying such models, and apply Bayesian model selection to identify candidate genes that show signatures of each model in single-cell transcriptomic data from mouse glutamatergic neurons.

Is the information geometry of probabilistic population codes learnable?

John J. Vastola, Zach Cohen, and Jan Drugowitsch

In NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations Dec 2022

One reason learning the geometry of latent neural manifolds from neural activity data is difficult is that the ground truth is generally not known, which can make manifold learning methods hard to evaluate. Probabilistic population codes (PPCs), a class of biologically plausible and self-consistent models of neural populations that encode parametric probability distributions, may offer a theoretical setting where it is possible to rigorously study manifold learning. It is natural to define the neural manifold of a PPC as the statistical manifold of the encoded distribution, and we derive a mathematical result that the information geometry of the statistical manifold is directly related to measurable covariance matrices. This suggests a simple but rigorously justified decoding strategy based on principal component analysis, which we illustrate using an analytically tractable PPC.

2021

Solving the chemical master equation for monomolecular reaction systems and beyond: a Doi-Peliti path integral view

John J. Vastola

Journal of Mathematical Biology Oct 2021

The chemical master equation (CME) is a fundamental description of interacting molecules commonly used to model chemical kinetics and noisy gene regulatory networks. Exact time-dependent solutions of the CME—which typically consists of infinitely many coupled differential equations—are rare, and are valuable for numerical benchmarking and getting intuition for the behavior of more complicated systems. Jahnke and Huisinga’s landmark calculation of the exact time-dependent solution of the CME for monomolecular reaction systems is one of the most general analytic results known; however, it is hard to generalize, because it relies crucially on special properties of monomolecular reactions. In this paper, we rederive Jahnke and Huisinga’s result on the time-dependent probability distribution and moments of monomolecular reaction systems using the Doi-Peliti path integral approach, which reduces solving the CME to evaluating many integrals. While the Doi-Peliti approach is less intuitive, it is also more mechanical, and hence easier to generalize. To illustrate how the Doi-Peliti approach can go beyond the method of Jahnke and Huisinga, we also find an explicit and exact time-dependent solution to a problem involving an autocatalytic reaction that Jahnke and Huisinga identified as not solvable using their method. Most interestingly, we are able to find a formal exact time-dependent solution for any CME whose list of reactions involves only zero and first order reactions, which may be the most general result currently known. This formal solution also yields a useful algorithm for efficiently computing numerical solutions to CMEs of this type.

2020

Chemical Langevin equation: A path-integral view of Gillespie’s derivation

John J. Vastola and William R. Holmes

Phys. Rev. E Mar 2020

In 2000, Gillespie rehabilitated the chemical Langevin equation (CLE) by describing two conditions that must be satisfied for it to yield a valid approximation of the chemical master equation (CME). In this work, we construct an original path-integral description of the CME and show how applying Gillespie’s two conditions to it directly leads to a path-integral equivalent to the CLE. We compare this approach to the path-integral equivalent of a large system size derivation and show that they are qualitatively different. In particular, both approaches involve converting many sums into many integrals, and the difference between the two methods is essentially the difference between using the Euler-Maclaurin formula and using Riemann sums. Our results shed light on how path integrals can be used to conceptualize coarse-graining biochemical systems and are readily generalizable.