Quantum deep reinforcement learning for clinical decision support in oncology: application to adaptive radiotherapy | Scientific Reports – Nature.com
Quantum deep reinforcement learning
Quantum deep reinforcement learning (qDRL) is a novel action-value-based decision-making framework derived from QRL [23] and the deep Q-learning framework [10]. Like conventional RL [9,31], our qDRL-based CDSS framework comprises five main elements: a clinical AI agent, the ARTE, a radiation dose decision-making policy, a reward, and a q-value function. Here, the AI agent is a clinical decision-maker that learns to make dose decisions that achieve clinically desirable outcomes within the ARTE. Learning takes place through agent-environment interaction, which proceeds in sequence: the AI decides on a dose and executes it, and in response a patient (part of the ARTE) transitions from one state to the next. Each transition gives the AI feedback on its decision in the form of an RT outcome and an associated reward value. The goal of RL is for the AI to learn a decision-making policy that maximizes the reward in the long run. This long-run reward is defined by a q-value function, which assigns to every state-dose-decision pair a value obtained from the accumulation of rewards over time (the return).
Assuming the Markov property (i.e., an environment's response at time \(t + 1\) depends only on the state and dose decision at time \(t\)), the qDRL task can be described mathematically as a 5-tuple \(\left( S, \left| D \right\rangle, TF, P, R \right)\), where \(S\) is a finite set of patient states, \(\left| D \right\rangle\) is a superimposed quantum state representing the finite set of eigen-dose decisions, \(TF: S \times D \to S^{\prime}\) is the transition function that maps the patient's state \(s_{t}\) and eigen-dose \(\left| d \right\rangle_{t}\) to the next state \(s_{t + 1}\), \(P_{LC|RP2}: S^{\prime} \to \left[ 0,1 \right]\) is the RT outcome estimator that assigns probability values \(p_{LC}\) and \(p_{RP2}\) to the state \(s_{t + 1}\), and \(R: \left[ 0,1 \right] \times \left[ 0,1 \right] \to \mathbb{R}\) is the reward function that assigns a reward \(r_{t + 1}\) to the state-decision pair \(\left( s_{t}, \left| d \right\rangle_{t} \right)\) based on the outcome probability estimates.
An eigen-dose \(\left| d \right\rangle\) is a physically performable decision that is selected via quantum methods from the superimposed quantum state \(\left| D \right\rangle\), which represents all possible eigen-doses simultaneously. In simple terms, \(\left| D \right\rangle\) is the collection of all possible dose options and \(\left| d \right\rangle\) is the one option that is selected once a decision is made. Selecting the dose decision \(\left| d \right\rangle\) is carried out in two steps: (1) amplifying the optimal eigen-dose \(\left| d \right\rangle^{*}\) from the superimposed state \(\left| D \right\rangle\) (i.e., \(\left| D \right\rangle^{\prime} = \widehat{Amp}_{\left| d \right\rangle^{*}} \left| D \right\rangle\)) and (2) measuring the amplified state (i.e., \(\left| d \right\rangle = \widehat{Measure}\left( \left| D^{\prime} \right\rangle \right)\)).
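The two-step selection can be illustrated with a small classical simulation. This is only a sketch: `amplify` is a hypothetical stand-in that rescales one amplitude and renormalizes (it is not a unitary operation and not the authors' amplification operator), `gain` is an illustrative parameter, and `measure` samples an index with probability equal to the squared amplitude.

```python
import math
import random


def amplify(amplitudes, optimal_index, gain=5.0):
    """Classical stand-in for the Amp operator (assumption, not unitary):
    scale the optimal eigen-dose amplitude by `gain`, then renormalize."""
    boosted = [a * gain if i == optimal_index else a
               for i, a in enumerate(amplitudes)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in boosted))
    return [a / norm for a in boosted]


def measure(amplitudes, rng=random):
    """Sample one eigen-dose index with probability |amplitude|^2."""
    probs = [abs(a) ** 2 for a in amplitudes]
    return rng.choices(range(len(amplitudes)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # |D>: uniform superposition over four hypothetical eigen-doses.
    superposed = [0.5, 0.5, 0.5, 0.5]
    amplified = amplify(superposed, optimal_index=2)
    print([round(abs(a) ** 2, 3) for a in amplified])
    print(measure(amplified, random.Random(0)))
```

Because the measurement is probabilistic, the selected dose is usually, but not always, the amplified one.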
The optimal eigen-dose \(\left| d \right\rangle^{*}\) is obtained from the deep Q-net, which is the AI's memory. The deep Q-net, \(DQN: S \to \mathbb{R}^{d}\), is a neural network that takes the patient's state as input and outputs a q-value for each eigen-dose (\(\left\{ q_{\left| d \right\rangle} \right\}\)). The optimal dose is then selected following a greedy policy, in which the dose with the maximum q-value is selected (i.e., \(\left| d \right\rangle^{*} = \underset{\left| d^{\prime} \right\rangle}{\operatorname{argmax}} \left\{ q_{\left| d^{\prime} \right\rangle} \right\}\)). We applied a double Q-learning algorithm [32] in training the deep Q-net. The schematic of a training cycle is presented in Fig. 2 and additional technical details are presented in the Supplementary Material.
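A minimal sketch of greedy selection and of a double Q-learning style target may help here. The discount factor `gamma`, the split into an "online" and a "target" list of q-values, and all numbers are assumptions for illustration; the authors' actual training procedure is described in their Supplementary Material.

```python
def greedy_dose(q_values):
    """Index of the eigen-dose with the largest q-value (greedy policy)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def double_q_target(reward, q_online_next, q_target_next,
                    gamma=0.9, terminal=False):
    """Double Q-learning style target (sketch).

    The online estimate picks the next dose; the separate target
    estimate scores that pick, which reduces overestimation bias.
    """
    if terminal:
        return reward
    best = greedy_dose(q_online_next)
    return reward + gamma * q_target_next[best]


if __name__ == "__main__":
    q_online = [0.2, 0.9, 0.4]   # hypothetical q-values for three doses
    q_target = [0.5, 0.4, 0.8]
    print(greedy_dose(q_online))
    print(double_q_target(1.0, q_online, q_target, gamma=0.5))
```

Note that the target uses `q_target[1]`, the value of the dose chosen by the online estimate, rather than the maximum of `q_target`.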
We initially employed Grover's amplification procedure [33,34] for the decision selection mechanism. While Grover's procedure works on a quantum simulator, it fails to work correctly on a quantum computer. The quantum circuit depth of Grover's procedure (for 4 or more qubits) is much greater than the coherence length of the current quantum processor [35]. Whenever the circuit length exceeds the coherence length, the quantum state is significantly affected by system noise and loses vital information. We therefore designed a quantum controller circuit that is shorter than the coherence length and is suitable for the task of decision selection. The merit of our design is its fixed length: because the length is the same for any number of qubits, the design is suitable for higher-qubit systems, up to the limit permitted by the circuit width. Technical details regarding its implementation on a quantum processor are presented in the Supplementary Materials.
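For reference, the ideal behavior of Grover's amplification can be checked with a small noiseless state-vector simulation. This sketch models only the textbook oracle and diffusion steps on a list of real amplitudes; it does not model gate decomposition, circuit depth, or hardware noise, which are the issues discussed above. The function name and the marked index are illustrative.

```python
import math


def grover_success_probability(n_qubits, marked, iterations):
    """Probability of measuring `marked` after ideal Grover iterations."""
    size = 2 ** n_qubits
    amps = [1.0 / math.sqrt(size)] * size        # uniform superposition
    for _ in range(iterations):
        amps[marked] = -amps[marked]             # oracle: phase-flip the marked state
        mean = sum(amps) / size
        amps = [2.0 * mean - a for a in amps]    # diffusion: inversion about the mean
    return amps[marked] ** 2


if __name__ == "__main__":
    marked = 0b10101                             # state |10101> as an integer index
    for k in range(1, 5):
        print(k, round(grover_success_probability(5, marked, k), 4))
```

Even without noise, a single iteration on 5 qubits leaves the marked state with a probability of only about 0.26, and several iterations are needed to approach 1. Each extra iteration lengthens the circuit.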
An example of a controller circuit is given in Fig. 5. A controller circuit uses \(2n\) qubits, twice the number \(n\) needed to encode a decision, divided into a control set and a main set. The optimal eigen-state obtained from the deep Q-net is created in the control by selecting the appropriate pre-control gates. The control is then entangled with the qubits of the main via controlled-NOT (CNOT) gates, each connecting a control qubit from the control to a target qubit from the main. A CNOT gate flips its target qubit (here from \(\left| 1 \right\rangle\) to \(\left| 0 \right\rangle\)) only when its control qubit is in the \(\left| 1 \right\rangle\) state and performs no operation otherwise. Because all the main qubits are prepared in the \(\left| 0 \right\rangle\) state, we introduced reverse gates (\(n\) X-gates in parallel) to flip them to \(\left| 1 \right\rangle\) before the CNOT gates act; an X-gate flips \(\left| 0 \right\rangle\) to \(\left| 1 \right\rangle\) and vice versa. The CNOT gates then flip all the main qubits whose controls are in the \(\left| 1 \right\rangle\) state, creating a state that is element-wise opposite to the marked state. Finally, another set of reverse gates is applied to the main before a measurement is made.
Quantum controller circuit for a 5-qubit (32 bit) system. (a) Quantum controller circuit for the selection of the state \(\left| 10101 \right\rangle\). The probability distribution corresponding to (b) failed Grover's amplification procedure for one iteration run in the 5-qubit IBMQ Santiago quantum processor and (c) successful quantum controller selection run in the 15-qubit IBMQ Melbourne quantum processor.
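The gate sequence of the controller circuit can be traced at the bit level. The sketch below is a classical trace, valid only because every qubit stays in a computational basis state throughout the ideal circuit; it ignores noise and is not the authors' hardware implementation. The function name is hypothetical.

```python
def controller_select(marked_bits):
    """Trace the ideal controller circuit for a marked bit string.

    Returns (state after the CNOT layer, final measured state of the main).
    """
    # Pre-control gates prepare the marked state in the control.
    control = [int(b) for b in marked_bits]
    # Main starts in |0...0>; the first reverse layer (X-gates) sets it to |1...1>.
    main = [1 - q for q in [0] * len(control)]
    # CNOT layer: each main qubit flips only if its control qubit is 1.
    main = [q ^ c for q, c in zip(main, control)]
    after_cnot = "".join(str(q) for q in main)
    # Second reverse layer (X-gates) before measurement.
    main = [1 - q for q in main]
    return after_cnot, "".join(str(q) for q in main)


if __name__ == "__main__":
    print(controller_select("10101"))  # ('01010', '10101')
```

The intermediate string is the element-wise opposite of the marked state, and the final reverse layer recovers the marked state itself. The number of gate layers does not depend on the number of qubits.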
Another advantage of the controller circuit is its controllable level of uncertainty. The controller circuit has additional degrees of freedom that can set the level of uncertainty that might be needed to model a highly uncertain clinical situation. By replacing the CNOT gate with a more general \(CU3\left( \theta, \phi, \lambda \right)\) gate, we can control the level of additional stochasticity through the rotation angles \(\theta\), \(\phi\), and \(\lambda\), which correspond to angles on the Bloch sphere. The angles can either be fixed or, for additional control, changed with the training episode.
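To see how the rotation angle could set the amount of stochasticity, the following sketch computes the probability that the main qubits reproduce the marked state when each CNOT is replaced by a controlled-U3. It assumes the common U3 matrix convention (the one used by Qiskit), an ideal noiseless circuit, and a control that is in a basis state; the helper names are hypothetical.

```python
import cmath
import math


def u3(theta, phi, lam):
    """2x2 matrix of the single-qubit U3 gate (common convention)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]


def match_probability(control_bit, theta, phi=0.0, lam=0.0):
    """Probability that one main qubit ends up equal to its control bit."""
    state = [0.0, 1.0]                       # main qubit after the first X layer
    if control_bit == 1:                     # controlled-U3 acts only if control is 1
        m = u3(theta, phi, lam)
        state = [m[0][0] * state[0] + m[0][1] * state[1],
                 m[1][0] * state[0] + m[1][1] * state[1]]
    state = [state[1], state[0]]             # second X layer
    return abs(state[control_bit]) ** 2


def selection_probability(marked_bits, theta):
    """Probability that the whole main register reads the marked state."""
    p = 1.0
    for b in marked_bits:
        p *= match_probability(int(b), theta)
    return p


if __name__ == "__main__":
    for theta in (math.pi, 0.75 * math.pi, 0.5 * math.pi):
        print(round(theta, 3), round(selection_probability("10101", theta), 4))
```

Under these assumptions, \(\theta = \pi\) reproduces the deterministic CNOT behavior, while smaller \(\theta\) lowers the per-qubit match probability to \(\sin^{2}(\theta/2)\) for every bit of the marked state that is 1. The phases \(\phi\) and \(\lambda\) do not change these measurement probabilities in this simplified setting.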
The patient's state in the ARTE is defined by 5 biological features: cytokine (IP10), PET imaging feature (GLSZM-ZSV), radiation doses (tumor gEUD and lung gEUD), and genetics (cxcr1-Rs2234671). Their descriptions are presented in Table 2. These 5 variables were selected from a multi-objective Bayesian Network study [13], which considered over 297 biological features and found the best features for predicting the joint LC and RP2 RT outcomes.
The training data analyzed in this study were obtained from the University of Michigan study UMCC 2007.123 (NCI clinical trial NCT01190527) and the validation data were obtained from the RTOG-0617 study (NCI clinical trial NCT00533949). Both trials were conducted in accordance with relevant guidelines and regulations, and informed consent was obtained from all subjects and/or legal guardians. Details on the training and validation datasets, and on the model imputation carried out to accommodate the differences between the datasets, are presented in the Supplementary Materials.
Deep Neural Networks (DNN) were applied as transition functions for IP10 and GLSZM-ZSV features. They were trained with a longitudinal (time-series) dataset, with the pre-irradiation patient state and corresponding radiation dose as input features and post-irradiation state as output. For lung and tumor gEUD, we utilized prior knowledge and applied a monotonic relationship for the transition function since we know that gEUD should increase with increasing radiation dose. We assumed that the change in gEUD is proportional to the dose fractionation and tissue radiosensitivity,
$$\frac{g\left( t_{n} \right) - g\left( t_{n - 1} \right)}{t_{n} - t_{n - 1}} \propto d_{n} \left( 1 + \frac{d_{n}}{\alpha / \beta} \right).$$
(1)
Here \(g\left( t_{n} \right)\) is the gEUD at time point \(t_{n}\), \(d_{n}\) is the radiation dose fractionation given during the \(n\)th time period, and the \(\alpha / \beta\) ratio is the radiosensitivity parameter, which differs between tissue types. Note that we first applied constrained training [42] to maintain monotonicity with a DNN model; however, the gEUD trend over time was flatter than anticipated, so we opted for a process-driven approach in the final implementation. The technical details on the NNs and their training are presented in the Supplementary Material.
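Equation (1) can be turned into a simple update rule once a proportionality constant is chosen. In the sketch below, the constant `k`, the unit time step `dt`, and all numeric values are hypothetical; the paper states only the proportionality.

```python
def geud_step(g_prev, d_n, alpha_beta, dt=1.0, k=1.0):
    """One gEUD update following Eq. (1): the increment over a period of
    length `dt` is k * dt * d_n * (1 + d_n / (alpha/beta)).
    `k` is an assumed proportionality constant."""
    return g_prev + k * dt * d_n * (1.0 + d_n / alpha_beta)


def geud_rollout(g0, doses, alpha_beta, dt=1.0, k=1.0):
    """Apply geud_step over a sequence of doses; returns every gEUD value."""
    values = [g0]
    for d_n in doses:
        values.append(geud_step(values[-1], d_n, alpha_beta, dt, k))
    return values


if __name__ == "__main__":
    # Illustrative values only: three periods, alpha/beta of 10.
    print(geud_rollout(0.0, [2.0, 2.0, 3.0], alpha_beta=10.0))
```

For non-negative doses and positive `k`, the increment is never negative, so the rollout is monotonic in both time and dose, which is the property the process-driven transition function is meant to guarantee.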
DNN classifiers were applied as the RT outcome estimators for the LC and RP2 treatment outcomes. They were trained with post-irradiation patient states as input and binary LC and RP2 outcomes as labels.
The RT outcome estimator must also satisfy a monotonic condition: the probability of local control and the probability of radiation-induced pneumonitis must both increase with increasing radiation dose. To maintain this monotonic relationship, we used a generic logistic function,
$$p_{LC|RP2} = \frac{1}{1 + \exp \left( \frac{g\left( t_{6} \right) - \mu}{T} \right)},$$
(2)
where \(g\left( t_{6} \right)\) is the gEUD at week 6, and \(\mu\) and \(T\) are two patient-specific parameters that are learned from training the DNN. Here, \(\mu\) and \(T\) are the outputs of two neural networks that are fed into the logistic function and tuned one after the other, with the other held fixed. The training details are presented in the Supplementary Materials.
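The logistic form of Eq. (2) is easy to evaluate directly. The sketch below implements the expression exactly as printed; the parameter values are hypothetical, and in the paper \(\mu\) and \(T\) come from neural networks rather than being fixed numbers. With the sign convention as printed, the probability increases with gEUD only when \(T\) is negative.

```python
import math


def outcome_probability(geud_week6, mu, temperature):
    """Eq. (2) as printed: 1 / (1 + exp((g(t_6) - mu) / T))."""
    return 1.0 / (1.0 + math.exp((geud_week6 - mu) / temperature))


if __name__ == "__main__":
    # Hypothetical parameters: mu = 60, T = -5 (negative T gives a rising curve).
    for g in (50.0, 60.0, 70.0):
        print(g, round(outcome_probability(g, mu=60.0, temperature=-5.0), 3))
```

At \(g(t_{6}) = \mu\) the probability is 0.5 regardless of \(T\), and \(|T|\) controls how steeply the curve rises around that point.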
The task of the agent is to determine the optimal dose that maximizes \(p_{LC}\) while minimizing \(p_{RP2}\). Accordingly, we built a reward function on the base function \(P^{+} = P_{LC} \left( 1 - P_{RP2} \right)\), as shown in Fig. 6. The algebraic form is as follows,
$$R = \left\{ \begin{array}{ll} P^{+} + 10 & \text{if } 70\% < p_{LC} < 100\% \text{ and } 0\% < p_{RP2} < 17.2\% \\ P^{+} + 5 & \text{if } 50\% < p_{LC} < 70\% \text{ and } 17.2\% < p_{RP2} < 50\% \\ P^{+} - 1 & \text{if } 0\% < p_{LC} < 50\% \text{ and } 50\% < p_{RP2} < 100\% \end{array} \right.$$
(3)
Reward function for reinforcement learning. Contour plot of the reward function as a function of the probability of local control (\(P_{LC}\)) and of radiation-induced pneumonitis of grade 2 or higher (\(P_{RP2}\)). The area enclosed by the blue line corresponds to the clinically desirable outcome, i.e., \(P_{LC} > 70\%\) and \(P_{RP2} < 17.2\%\). Similarly, the area enclosed by the green lines corresponds to the computationally desirable outcome, i.e., \(P_{LC} > 50\%\) and \(P_{RP2} < 50\%\). Along with \(P_{LC} \times (1 - P_{RP2})\), the AI agent receives +10 reward for achieving a clinically desirable outcome, +5 for achieving a computationally desirable outcome, and -1 when unable to achieve a desirable outcome.
Here the AI agent receives an additional 10 points for achieving a clinically desirable outcome (i.e., \(p_{LC} > 70\%\) and \(p_{RP2} < 17.2\%\)), 5 points for achieving a computationally desirable outcome (i.e., \(p_{LC} > 50\%\) and \(p_{RP2} < 50\%\)), and -1 point for failing to achieve a desirable outcome altogether. The negative point motivates the AI agent to search for the optimal dose as soon as possible.
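The reward can be sketched as a short function. This version follows the prose description (nested regions, checked from the most to the least desirable) rather than the literal bounds in Eq. (3), which leave some combinations of \(p_{LC}\) and \(p_{RP2}\) unassigned; treating every remaining case as the -1 branch is an assumption.

```python
def reward(p_lc, p_rp2):
    """Reward built on the base function P+ = p_LC * (1 - p_RP2)."""
    base = p_lc * (1.0 - p_rp2)
    if p_lc > 0.70 and p_rp2 < 0.172:    # clinically desirable outcome
        return base + 10.0
    if p_lc > 0.50 and p_rp2 < 0.50:     # computationally desirable outcome
        return base + 5.0
    return base - 1.0                    # no desirable outcome (assumed catch-all)


if __name__ == "__main__":
    for p_lc, p_rp2 in [(0.8, 0.1), (0.6, 0.3), (0.4, 0.6)]:
        print(p_lc, p_rp2, round(reward(p_lc, p_rp2), 2))
```

The three example inputs fall in the clinically desirable, computationally desirable, and undesirable regions, giving rewards of 10.72, 5.42, and -0.84 respectively.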
To compensate for the low number of data points, we employed WGAN-GP [43], which learns the underlying data distribution and generates additional data points. We generated 4000 additional data points for training the qDRL models. A larger training dataset helps the reinforcement learning algorithm represent the state space more accurately. The training details are presented in the Supplementary Material.