An AI-Accelerated Experimental Physics Loop

With an actual demo on real quantum hardware

May 18, 2026

Out of all the potential applications for AI, I am most interested in how it could be used to accelerate physics. Most of what I have seen so far has been focused on either new materials synthesis or theoretical calculations. Both of these are very cool, but they are primarily in the realms of simulation and theory. Ultimately, the arbiter of physical truth is experiment, so a truly useful AI physicist must be able to interact with experimental reality, not just simulate it.

This is obviously a huge question, but in this post, I’ll outline a concrete proposal for how AI could meaningfully accelerate experimental physics using my own field — quantum measurement — as a template. This is a first pass idea based on my own PhD work, so comments are welcome.

Why Quantum Measurement

Quantum measurement is the field concerned with making precision measurements at the smallest scales. Examples range from detecting faint gravitational waves from the depths of the universe to using diamonds to image single-cell dynamics. Quantum mechanics dictates the fundamental and inescapable limits to our measurement precision, so the best possible measurements will always be quantum measurements. Improve quantum measurement, and you improve our ability to sense the world and generate new physical data. That new data can in turn feed back into AI models to improve their performance.

An AI-accelerated Discovery Loop

A natural target for AI-driven discovery is the search for new optimal quantum measurement protocols, which would significantly accelerate precision measurement in contexts like dark matter searches or next-generation quantum sensor development.

A given measurement sensor, such as a superconducting quantum interference device (SQUID) or a Josephson parametric amplifier (JPA), must be operated with a particular measurement protocol consisting of parameters like bias points, coupling schemes, readout chains, and feedback strategies. This protocol space is combinatorially vast, and is currently navigated by intuition based on known principles like the standard quantum limit on added amplifier noise. However, with the advent of next-generation quantum technology, it is increasingly clear that protocols which use quantum effects such as backaction evasion and squeezed states can outperform traditionally optimal strategies in sensitivity. While the space of protocols is vast, it is richly constrained (by quantum mechanics, thermodynamics, and information theory) and, crucially, formally specifiable. This makes it a good target for AI-based discovery.
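For reference, the standard quantum limit on added amplifier noise mentioned above is usually stated as follows. This is the textbook form for a phase-preserving linear amplifier; the symbols $A$ (added noise in quanta, referred to the input) and $G$ (power gain) are introduced here purely for illustration.

```latex
A \;\ge\; \frac{1}{2}\left|\,1 - \frac{1}{G}\,\right|
\;\xrightarrow{\;G \gg 1\;}\; \frac{1}{2}
```

A phase-sensitive amplifier, which amplifies one quadrature while deamplifying the other, is not bound by this limit on the amplified quadrature. That is the loophole that squeezing and backaction-evading protocols exploit.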

Current methods are limited because “optimal protocols” are derived for idealised models that neglect real-world imperfections like loss, noise, and nonlinearities. Simulations help to some extent but even they become prohibitively expensive when considering all possible real-world non-idealities. These protocols are then tuned by hand in the lab following the above intuition-based process. While the theorist works to find the optimum for a system that doesn’t exist, the experimentalist uses that optimum as a first-order guess to perform a highly limited search of the vast parameter space of the actual system. The result is that the optimal protocol for the real system is usually not found.

My own PhD work is an example of this. A seemingly simple choice like SQUID bias point changes the gain, imprecision, backaction, dynamic resistance, and stability of the measurement, profoundly affecting experiments like the Dark Matter Radio. The “best” operating point is therefore not simply the maximum-gain point, but a compromise across several experimentally measured quantities, which is a priori quite difficult to reason about — especially when you have to consider real-world effects. Measuring those quantities, and understanding their tradeoffs, is what my PhD is about.

Verification is the hard part. Simulation is an unreliable check on a proposed protocol because it often leaves out real-world imperfections. Real experimental runs, on the other hand, are a scarce resource and cannot be spent on every proposed measurement protocol. The only ground truth is hardware performance, yet hardware time is orders of magnitude too scarce to validate the number of candidates a meaningful search requires.

With AI, however, the proposal-validation loop can be made tractable at scale. AI can be used to propose candidate measurement protocols, searching not only over continuous parameters like bias points, drive powers, and coupling strengths within known measurement schemes, but also over qualitatively distinct measurement strategies that a physicist would be unlikely to explore by hand (e.g. different readout architectures, feedback topologies, and drive schemes). These proposals are then validated through three levels:

Level 1: Fast analytical validation against simple linearised input-output models and physical constraints, such as respecting the Heisenberg uncertainty principle and avoiding unstable bias points. This validation is fast and will remove the vast majority of proposed candidates that are unphysical.
Level 2: Simulation of the top 1% of proposals from Level 1. Simulated candidates are then ranked for promotion to Level 3 on the key figures of merit, such as signal-to-noise ratio, but also on:
1. Robustness to parameter uncertainty: if a protocol is outstanding at nominal values but collapses with a realistic 1% deviation, that is suboptimal compared to one that has a slightly lower headline figure of merit but is stable under deviation.
2. Robustness to model uncertainty: if a loss source is 2x worse in real life than anticipated, and this destroys performance for protocol A but leaves protocol B unchanged, then all else being equal we favour protocol B. In general, protocols less sensitive to model uncertainty are more likely to survive implementation in real hardware.
3. Operational feasibility: operating parameters must be realistic. For example, the drive current cannot be so high that the dilution refrigerator itself heats up, nor can the feedback run at a frequency that the lab electronics cannot support.
4. Distinctiveness: The first three criteria score individual protocols. The final selection of which to promote to Level 3 additionally considers distinctiveness across the selected set, using Bayesian experimental design to ensure the portfolio of hardware tests maximally discriminates between competing device models.
Level 3: Experimental validation on hardware for the candidates that survived Level 2. This step is the most intensive and expensive because it requires detailed diagnostic studies involving real data, like power spectra or S-parameters, to determine which protocols actually perform best. The results of these experiments are also used to perform Bayesian updates on the Level 2 models, improving simulation fidelity, and to refine the fast analytical scoring at Level 1. For instance, if hardware reveals a systematic loss channel that shifts optimal operating points, this can be incorporated as a correction to the linearised models so that Level 1 passes fewer false positives and kills fewer good candidates in future iterations.
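To make the funnel concrete, here is a minimal sketch of Levels 1 and 2 on a toy problem. Everything in it is an assumption made for illustration: the `Protocol` fields, the stability and drive limits in `level1_ok`, and especially `toy_snr`, which is a made-up stand-in for a real simulator with one tall, narrow optimum and one slightly lower, broad optimum. The point is only to show how worst-case scoring under a 1% parameter deviation and a 2x loss error can reorder candidates relative to the headline figure of merit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    bias: float   # normalised flux bias (hypothetical units)
    drive: float  # normalised drive amplitude (hypothetical units)


def toy_snr(p, loss=0.1):
    """Made-up figure of merit: a narrow peak at 0.25 and a broad one at 0.70."""
    narrow = 1.0 * math.exp(-((p.bias - 0.25) / 0.002) ** 2)
    broad = 0.9 * math.exp(-((p.bias - 0.70) / 0.10) ** 2)
    return (narrow + broad) * p.drive / (1.0 + loss)


def level1_ok(p):
    """Level 1: cheap feasibility checks (both limits are assumed values)."""
    stable = 0.05 <= p.bias <= 0.95   # assumed stable bias window
    feasible = 0.0 < p.drive <= 1.0   # assumed drive limit before the fridge heats
    return stable and feasible


def robust_snr(p, loss=0.1, rel_dev=0.01, loss_factor=2.0):
    """Level 2: worst case over a 1% bias deviation and a 2x worse loss."""
    scores = []
    for scale in (1.0 - rel_dev, 1.0, 1.0 + rel_dev):
        for trial_loss in (loss, loss_factor * loss):
            perturbed = Protocol(p.bias * scale, p.drive)
            scores.append(toy_snr(perturbed, trial_loss))
    return min(scores)


def screen(candidates, top_fraction=0.01):
    """Filter at Level 1, keep the top fraction, then rank by robust score."""
    survivors = [p for p in candidates if level1_ok(p)]
    survivors.sort(key=toy_snr, reverse=True)
    n_keep = max(1, math.ceil(top_fraction * len(survivors)))
    return sorted(survivors[:n_keep], key=robust_snr, reverse=True)


candidates = [
    Protocol(0.25, 1.0),  # best headline score, but on the narrow peak
    Protocol(0.70, 1.0),  # slightly lower headline score, broad peak
    Protocol(0.99, 1.0),  # outside the assumed stable bias window
    Protocol(0.50, 2.0),  # above the assumed drive limit
]
ranked = screen(candidates, top_fraction=1.0)
```

In this toy case the candidate on the narrow peak wins on the nominal score, but the one on the broad peak is promoted first once the worst case is considered. A real pipeline would replace `toy_snr` with the linearised model at Level 1 and a full simulation at Level 2.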

The crucial AI contribution, beyond just proposing candidates, is to learn which proposals are able to get through the levels of screening. By comparing how far the actual hardware performance of the best candidates deviates from the proposed/simulated performance, AI can be used to infer the effects of actual non-idealities that were not anticipated initially. This, combined with Bayesian design to update the simulation models, leads to an AI-accelerated discovery loop where better models lead to better proposals which in turn lead to fewer wasted experiments and convergence on superior measurement protocols. The result is a route to optimal protocols for real-world hardware: enhanced precision measurements and more sensitive probes of weak physical signals.
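As a rough illustration of the model-update step, the sketch below infers a single unknown loss parameter from the gap between predicted and measured performance, using a grid-based Bayesian update. The forward model `predicted_snr`, the Gaussian noise level `sigma`, the uniform prior, and the "measured" values are all assumptions invented for this example; the measurements are synthetic, generated from a chosen `true_loss`, not real data.

```python
import math


def predicted_snr(ideal_snr, loss):
    """Hypothetical simulation model: one loss channel degrades the ideal SNR."""
    return ideal_snr / (1.0 + loss)


def update_loss_posterior(grid, prior, ideal_snr, measured, sigma):
    """Return the posterior over `grid` given measurements with Gaussian noise."""
    log_post = []
    for loss, p in zip(grid, prior):
        model = predicted_snr(ideal_snr, loss)
        log_like = -0.5 * sum(((y - model) / sigma) ** 2 for y in measured)
        log_post.append(math.log(p) + log_like)
    peak = max(log_post)  # subtract the peak before exponentiating for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Uniform prior over loss values between 0 and 1 (assumed range).
grid = [i / 100 for i in range(101)]
prior = [1.0 / len(grid)] * len(grid)

# Synthetic "hardware" results: the model at an assumed true loss, plus offsets.
true_loss = 0.30
measured = [predicted_snr(10.0, true_loss) + d for d in (-0.05, 0.0, 0.05)]

posterior = update_loss_posterior(grid, prior, 10.0, measured, sigma=0.1)
loss_mean = sum(l * w for l, w in zip(grid, posterior))
loss_map = grid[posterior.index(max(posterior))]
```

The updated posterior would then replace the prior the next time the simulation is run, so that later proposals are scored against a model that already knows about the extra loss. A realistic version would have many parameters and would need sampling or variational methods rather than a grid.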

A schematic of the proposed AI-hardware loop for quantum devices/measurement

A First Demo

This is a fairly ambitious vision, but I managed to get a first-pass version working in the lab. I used Bayesian optimisation and transfer learning to tune a real DC SQUID amplifier coupled to a superconducting resonator for a dark-matter haloscope-style measurement. This optimisation searched over both continuous flux bias and discrete amplifier input-impedance settings, aiming to find the configuration that maximised a key metric for dark matter searches called sensitivity bandwidth ∆ν_s.

Sensitivity bandwidth is a nontrivial function of many amplifier quantities, including bias point, intrinsic amplifier noise, input impedance, and feedback. This makes the optimum difficult to find manually, which in turn makes it exactly the type of problem where AI can find useful patterns in existing experimental data to improve performance.

Number of evaluations required to find the best operating point in the demo study.

In a retrospective study on already-acquired data, Bayesian optimisation (BO) and transfer learning found the best operating points substantially faster than random search, and showed how prior measurement campaigns can be reused to accelerate future tuning. You can read the technical write-up for full details.
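For readers who have not seen Bayesian optimisation over a mixed search space, here is a minimal, dependency-free sketch of the idea. It is not the code used in the study. The objective `toy_bandwidth` is a made-up function, and the kernel form and the constants `LENGTH`, `RHO`, `NOISE`, and `kappa` are assumed values. It uses a small Gaussian-process surrogate with an upper-confidence-bound rule over a fixed pool of (flux bias, impedance setting) points, which mirrors a retrospective study where only already-measured points can be "evaluated". Seeding the `initial` observations from an earlier campaign is one very simple form of transfer, again an assumption rather than the method in the write-up.

```python
import math

LENGTH = 0.1   # assumed RBF length scale in normalised flux units
RHO = 0.5      # assumed correlation between different impedance settings
NOISE = 1e-4   # assumed noise variance; also keeps the solve well conditioned


def kernel(a, b):
    """Product kernel: RBF in flux times a fixed correlation across settings."""
    (fa, sa), (fb, sb) = a, b
    rbf = math.exp(-0.5 * ((fa - fb) / LENGTH) ** 2)
    return rbf * (1.0 if sa == sb else RHO)


def cholesky(a):
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def back_solve(low, z):
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit(xs, ys):
    """Factorise the noisy Gram matrix once per iteration."""
    gram = [[kernel(p, q) + (NOISE if i == j else 0.0)
             for j, q in enumerate(xs)] for i, p in enumerate(xs)]
    low = cholesky(gram)
    return low, back_solve(low, forward_solve(low, ys))


def predict(xs, low, alpha, x_new):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    k_star = [kernel(p, x_new) for p in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_solve(low, k_star)
    var = kernel(x_new, x_new) - sum(t * t for t in v)
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(objective, pool, initial, budget, kappa=2.0):
    """Pick `budget` new pool points by maximising mean + kappa * std."""
    xs = [x for x, _ in initial]
    ys = [y for _, y in initial]
    for _ in range(budget):
        remaining = [x for x in pool if x not in xs]
        if not remaining:
            break
        low, alpha = fit(xs, ys)

        def ucb(x):
            mean, std = predict(xs, low, alpha, x)
            return mean + kappa * std

        x_next = max(remaining, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], xs


def toy_bandwidth(x):
    """Made-up objective: each impedance setting peaks at a different flux."""
    flux, setting = x
    centre = (0.30, 0.55, 0.80)[setting]
    height = (0.6, 1.0, 0.8)[setting]
    return height * math.exp(-((flux - centre) / 0.15) ** 2)


# Pool: 21 flux values for each of 3 discrete settings (assumed discretisation).
pool = [(i / 20, s) for i in range(21) for s in range(3)]
# Seed observations, e.g. carried over from a previous measurement campaign.
initial = [((0.5, s), toy_bandwidth((0.5, s))) for s in range(3)]
best_x, best_y, visited = bayes_opt(toy_bandwidth, pool, initial, budget=15)
```

The comparison with random search then amounts to counting how many evaluations each strategy needs before it first visits the best point in the pool. For real work a maintained library would be a better choice than this hand-rolled surrogate.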

Although this was retrospective, it was a nice demo that isolated and validated the search efficiency. The natural next step is the live version, with the optimiser proposing points in real time while the fridge is cold.

Beyond Quantum Sensors

While the proposal laid out above is nominally for superconducting quantum sensors, it can be extended in principle to any experimental platform. For example, it could accelerate qubit design, calibration, and control not only in superconducting hardware but in other modalities too, such as neutral atoms and trapped ions. The key principle is the creation of a tight AI-experiment feedback loop that gives AI access to real data generated in hardware and allows it to learn from that data. Ultimately, the key bottleneck is access to domain experts who are comfortable enough with AI infrastructure to set up learning pipelines and competent enough with the hardware to generate data to train on. I think this is one of the most interesting frontiers for AI: not just helping us simulate the world, but helping us build better experiments to interrogate it.

Trapped Flux
