Quantum deep reinforcement learning for clinical decision support in oncology: application to adaptive radiotherapy | Scientific Reports – Nature.com
Quantum deep reinforcement learning
Quantum deep reinforcement learning (qDRL) is a novel action-value-based decision-making framework derived from the QRL23 and deep Q-learning10 frameworks. Like conventional RL9,31, our qDRL-based CDSS framework comprises 5 main elements: the clinical AI agent, the ARTE, the radiation dose decision-making policy, the reward, and the q-value function. Here, the AI agent is a clinical decision-maker that learns to make dose decisions that achieve clinically desirable outcomes within the ARTE. Learning takes place through agent-environment interaction, which proceeds in sequence: the AI decides on a dose and executes it, and in response, a patient (part of the ARTE) transitions from one state to the next. Each transition provides the AI with feedback on its decision in terms of the RT outcome and the associated reward value. The goal of RL is for the AI to learn a decision-making policy that maximizes the reward in the long run. This objective is defined by a specified q-value function, which assigns to every state-dose-decision pair a value obtained from the accumulation of rewards over time (returns).
Assuming the Markov property (i.e., an environment's response at time \(t + 1\) depends only on the state and dose decision at time \(t\)), the qDRL task can be described mathematically as a 5-tuple \((S, \left| D \right\rangle, TF, P, R)\), where \(S\) is a finite set of patient states, \(\left| D \right\rangle\) is a superimposed quantum state representing the finite set of eigen-dose decisions, \(TF: S \times D \to S^{\prime}\) is the transition function that maps the patient's state \(s_{t}\) and eigen-dose \(\left| d \right\rangle_{t}\) to the next state \(s_{t + 1}\), \(P_{LC|RP2}: S^{\prime} \to \left[ 0,1 \right]\) is the RT outcome estimator that assigns probability values \(p_{LC}\) and \(p_{RP2}\) to the state \(s_{t + 1}\), and \(R: \left[ 0,1 \right] \times \left[ 0,1 \right] \to \mathbb{R}\) is the reward function that assigns a reward \(r_{t + 1}\) to the state-decision pair \(\left( s_{t}, \left| d \right\rangle_{t} \right)\) based on the outcome probability estimates.
An eigen-dose \(\left| d \right\rangle\) is a physically performable decision that is selected via quantum methods from the superimposed quantum state \(\left| D \right\rangle\), which represents all possible eigen-doses simultaneously. In simple terms, \(\left| D \right\rangle\) is the collection of all possible dose options and \(\left| d \right\rangle\) is the one option that is selected once a decision is made. Selecting the dose decision \(\left| d \right\rangle\) is carried out in two steps: (1) amplifying the optimal eigen-dose \(\left| d \right\rangle^{*}\) from the superimposed state \(\left| D \right\rangle\) (i.e., \(\left| D \right\rangle^{\prime} = \widehat{Amp}_{\left| d \right\rangle^{*}} \left| D \right\rangle\)) and (2) measuring the amplified state (i.e., \(\left| d \right\rangle = \widehat{Measure}\left( \left| D^{\prime} \right\rangle \right)\)).
The optimal eigen-dose \(\left| d \right\rangle^{*}\) is obtained from the deep Q-net, which serves as the AI's memory. The deep Q-net, \(DQN: S \to \mathbb{R}^{d}\), is a neural network that takes the patient's state as input and outputs a q-value for each eigen-dose (\(\left\{ q_{\left| d \right\rangle} \right\}\)). The optimal dose is then selected following a greedy policy, in which the dose with the maximum q-value is chosen (i.e., \(\left| d \right\rangle^{*} = \mathop{\arg\max}\limits_{\left| d^{\prime} \right\rangle} q_{\left| d^{\prime} \right\rangle}\)). We applied a double Q-learning32 algorithm to train the deep Q-net. The schematic of a training cycle is presented in Fig. 2 and additional technical details are presented in the Supplementary Material.
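The greedy selection and the double Q-learning update can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the list-based q-values, and the discount factor `gamma` are assumptions made for illustration, and the real deep Q-net is a neural network described in the Supplementary Material.

```python
from typing import Sequence


def greedy_dose_index(q_values: Sequence[float]) -> int:
    """Return the index of the eigen-dose with the largest q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def double_q_target(
    reward: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not given in the text
    terminal: bool = False,
) -> float:
    """Double Q-learning target for one transition.

    The online network picks the best next dose; the target network
    supplies the value of that dose. Decoupling the two reduces the
    overestimation bias of plain Q-learning.
    """
    if terminal:
        return reward
    best_next = greedy_dose_index(q_online_next)
    return reward + gamma * q_target_next[best_next]
```

For example, with q-values `[0.1, 0.7, 0.3]` the greedy policy picks the dose at index 1.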
We initially employed Grover's amplification procedure33,34 for the decision selection mechanism. While Grover's procedure works on a quantum simulator, it fails to work correctly on a quantum computer. The quantum circuit depth of Grover's procedure (for 4 or more qubits) is much greater than the coherence length of the current quantum processor35. Whenever the quantum circuit length exceeds the coherence length, the quantum state becomes significantly affected by system noise and loses vital information. Therefore, we designed a quantum controller circuit that is shorter than the coherence length and is suitable for the task of decision selection. The merit of our design is its fixed length: because the length does not grow with the number of qubits, the circuit remains suitable for larger qubit systems, up to the limit set by the circuit width. Technical details regarding its implementation on a quantum processor are presented in the Supplementary Materials.
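To make the amplification step concrete, the following sketch simulates ideal, noiseless Grover iterations directly on a list of amplitudes. It is an assumption-laden illustration only: it models neither gate decomposition, circuit depth, nor hardware noise, which are exactly the factors that cause the procedure to fail on a real processor. The function name and arguments are hypothetical.

```python
import math
from typing import List


def grover_probabilities(n_qubits: int, marked: int, iterations: int = 1) -> List[float]:
    """Measurement probabilities after ideal Grover iterations.

    Starts from the uniform superposition over 2**n_qubits basis states,
    then repeats: (1) oracle, which flips the sign of the marked amplitude,
    (2) diffusion, which reflects every amplitude about the mean.
    """
    size = 2 ** n_qubits
    amps = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        amps[marked] = -amps[marked]          # oracle
        mean = sum(amps) / size
        amps = [2.0 * mean - a for a in amps]  # diffusion (inversion about the mean)
    return [a * a for a in amps]


# One iteration on 5 qubits with the state |10101> marked.
probs = grover_probabilities(5, 0b10101, iterations=1)
print(round(probs[0b10101], 4))  # 0.2583
```

In this idealized setting a single iteration raises the marked state from a probability of 1/32 to roughly 0.26, and further iterations push it close to 1.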
An example of a controller circuit is given in Fig. 5. Controller circuits use twice the number of qubits (\(n\)), which are divided into two groups, the control and the main. The optimal eigen-state obtained from the deep Q-net is created in the control by selecting the appropriate pre-control gates. The control is then entangled with the qubits of the main via controlled NOT (CNOT) gates. Each CNOT gate connects a control qubit from the control to a target qubit from the main. A CNOT gate flips its target qubit (here from \(\left| 1 \right\rangle\) to \(\left| 0 \right\rangle\)) only when its control qubit is in the \(\left| 1 \right\rangle\) state and performs no operation otherwise. Because all the main qubits are prepared in the \(\left| 0 \right\rangle\) state, we introduced the reverse gates (\(n\) X-gates in parallel) to flip them to \(\left| 1 \right\rangle\). X-gates flip \(\left| 0 \right\rangle\) to \(\left| 1 \right\rangle\), and vice versa. The CNOT gates flip all the main qubits whose controls are in the \(\left| 1 \right\rangle\) state, creating a state that is element-wise opposite to the marked state. Finally, another set of reverse gates is applied to the main before a measurement is made.
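Because every gate in the controller circuit maps computational basis states to basis states, its ideal behavior can be traced with ordinary bits. The sketch below follows the steps described above under that noiseless assumption; the function name and the bit-string interface are hypothetical, and no quantum SDK is used.

```python
def controller_select(target_bits: str) -> str:
    """Trace the ideal controller circuit on basis states.

    target_bits is the optimal eigen-state as a bit string, e.g. "10101".
    Returns the bit string that would be measured on the main qubits.
    """
    n = len(target_bits)

    # Pre-control gates: prepare the control in the marked state.
    control = [int(b) for b in target_bits]

    # Main qubits start in |0>.
    main = [0] * n

    # First set of reverse gates: n parallel X-gates flip |0> to |1>.
    main = [bit ^ 1 for bit in main]

    # CNOT gates: flip a main qubit only where its control is |1>.
    # The main now holds the element-wise opposite of the marked state.
    main = [m ^ c for m, c in zip(main, control)]

    # Second set of reverse gates, applied before measurement.
    main = [bit ^ 1 for bit in main]

    return "".join(str(bit) for bit in main)


print(controller_select("10101"))  # 10101
```

The number of gate layers is the same for any `n`, which mirrors the fixed-length property of the design.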
Quantum controller circuit for a 5-qubit (32 bit) system. (a) Quantum controller circuit for the selection of the state \(\left| 10101 \right\rangle\). The probability distributions corresponding to (b) the failed Grover's amplification procedure for one iteration, run on the 5-qubit IBMQ Santiago quantum processor, and (c) the successful quantum controller selection, run on the 15-qubit IBMQ Melbourne quantum processor.
Another advantage of the controller circuit is its controllable level of uncertainty. The controller circuit has additional degrees of freedom that can set the level of uncertainty needed to model a highly uncertain clinical situation. By replacing the CNOT gate with a more general \(CU3\left( \theta, \phi, \lambda \right)\) gate, we can control the level of additional stochasticity through the rotation angles \(\theta\), \(\phi\), and \(\lambda\), which correspond to angles on the Bloch sphere. The angles can either be fixed or, for additional control, changed with the training episode.
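As a rough illustration of how the rotation angle introduces stochasticity, the sketch below uses the standard U3 matrix, for which the probability of flipping a basis state is \(\sin^{2}(\theta/2)\). It assumes an ideal circuit in which each CNOT is replaced by a CU3 with the same \(\theta\); in that setting \(\phi\) and \(\lambda\) only change phases and do not affect the computational-basis measurement. The function names and the sampling scheme are hypothetical, not the authors' implementation.

```python
import math
import random


def cu3_flip_probability(theta: float) -> float:
    """Probability that U3(theta, phi, lambda) flips |1> to |0> (or |0> to |1>)."""
    return math.sin(theta / 2.0) ** 2


def sample_selection(target_bits: str, theta: float, rng: random.Random) -> str:
    """Sample one measurement of the main qubits when CNOT is replaced by CU3.

    Where the control bit is 1, the main qubit ends in 1 with probability
    sin^2(theta / 2); where the control bit is 0, it always ends in 0.
    theta = pi recovers the deterministic CNOT behavior.
    """
    p_flip = cu3_flip_probability(theta)
    out = []
    for bit in target_bits:
        if bit == "1" and rng.random() < p_flip:
            out.append("1")
        else:
            out.append("0")
    return "".join(out)


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
print(sample_selection("10101", math.pi, rng))  # 10101
```

Smaller values of `theta` make the marked state less likely to be reproduced exactly, which is one way to read the "additional stochasticity" described above.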
The patient's state in the ARTE is defined by 5 biological features: cytokine (IP10), PET imaging feature (GLSZM-ZSV), radiation doses (tumor gEUD and lung gEUD), and genetics (cxcr1-Rs2234671). Their descriptions are presented in Table 2. These 5 variables were selected from a multi-objective Bayesian Network study13, which considered over 297 biological features and identified the best features for predicting the joint LC and RP2 RT outcomes.
The training data analyzed in this study are obtained from the University of Michigan study UMCC 2007.123 (NCI clinical trial NCT01190527) and the validation data analyzed in this study are obtained from the RTOG-0617 study (NCI clinical trial NCT00533949). Both trials were conducted in accordance with relevant guidelines and regulations and informed consent was obtained from all subjects and/or legal guardians. Details on training and validation datasets, and necessary model imputation carried out to accommodate the differences in the datasets are presented in the Supplementary Materials.
Deep Neural Networks (DNN) were applied as transition functions for IP10 and GLSZM-ZSV features. They were trained with a longitudinal (time-series) dataset, with the pre-irradiation patient state and corresponding radiation dose as input features and post-irradiation state as output. For lung and tumor gEUD, we utilized prior knowledge and applied a monotonic relationship for the transition function since we know that gEUD should increase with increasing radiation dose. We assumed that the change in gEUD is proportional to the dose fractionation and tissue radiosensitivity,
$$\frac{g\left( t_{n} \right) - g\left( t_{n - 1} \right)}{t_{n} - t_{n - 1}} \propto d_{n} \left( 1 + \frac{d_{n}}{\alpha / \beta} \right).$$
(1)
Here \(g\left( t_{n} \right)\) is the gEUD at time point \(t_{n}\), \(d_{n}\) is the radiation dose fractionation given during the \(n\)th time period, and the \(\alpha / \beta\) ratio is the radiosensitivity parameter, which differs between tissue types. Note that we first applied constrained training42 to maintain monotonicity with a DNN model; however, the gEUD trend over time was flatter than anticipated, so we opted for a process-driven approach in the final implementation. The technical details of the NNs and their training are presented in the Supplementary Material.
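Equation (1) can be turned into a one-step update once a proportionality constant is chosen. In the sketch below, the constant `k`, the time step `dt`, and the example \(\alpha / \beta\) value are illustrative assumptions; the text does not specify them.

```python
def geud_step(
    g_prev: float,
    dose_per_fraction: float,
    alpha_beta: float,
    dt: float = 1.0,  # assumed length of the time period t_n - t_{n-1}
    k: float = 1.0,   # assumed proportionality constant for Eq. (1)
) -> float:
    """Advance gEUD by one time period using the relation in Eq. (1).

    g(t_n) = g(t_{n-1}) + k * dt * d_n * (1 + d_n / (alpha/beta))
    For positive k, dt and alpha/beta the increment grows with d_n,
    so gEUD is monotonic in the delivered dose.
    """
    increment = k * dt * dose_per_fraction * (1.0 + dose_per_fraction / alpha_beta)
    return g_prev + increment


# Illustrative values only: 2 Gy per fraction, alpha/beta = 10 Gy.
print(round(geud_step(10.0, 2.0, 10.0), 6))  # 12.4
```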
DNN classifiers were applied as the RT outcome estimators for the LC and RP2 treatment outcomes. They were trained with post-irradiation patient states as input and binary LC and RP2 outcomes as labels.
The RT outcome estimator must also satisfy a monotonicity condition: increasing radiation dose must increase both the probability of local control and the probability of radiation-induced pneumonitis. To maintain this monotonic relationship, we used a generic logistic function,
$$p_{LC|RP2} = \frac{1}{1 + \exp \left( \frac{g\left( t_{6} \right) - \mu}{T} \right)},$$
(2)
where \(g\left( t_{6} \right)\) is the gEUD at week 6, and \(\mu\) and \(T\) are two patient-specific parameters that are learned from training the DNN. Here, \(\mu\) and \(T\) are the outputs of two neural networks that are fed into the logistic function and tuned one after the other, each while the other is held fixed. The training details are presented in the Supplementary Materials.
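A direct transcription of Eq. (2) is sketched below. The function name and the numeric values are hypothetical. One observation follows from the algebra alone: with the sign convention as printed, the probability rises with gEUD only when \(T\) is negative. The text does not state the sign of \(T\), so treating it as negative here is an assumption.

```python
import math


def outcome_probability(geud_week6: float, mu: float, T: float) -> float:
    """Logistic outcome estimate of Eq. (2).

    p = 1 / (1 + exp((g(t6) - mu) / T))
    mu is the gEUD at which p = 0.5; |T| sets how sharply p changes.
    """
    return 1.0 / (1.0 + math.exp((geud_week6 - mu) / T))


# Illustrative parameters only (assumed, not from the paper).
mu, T = 60.0, -5.0
print(outcome_probability(60.0, mu, T))  # 0.5
```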
The task of the agent is to determine the optimal dose that maximizes \(p_{LC}\) while minimizing \(p_{RP2}\). Accordingly, we built a reward function on the base function \(P^{+} = P_{LC} \left( 1 - P_{RP2} \right)\), as shown in Fig. 6. The algebraic form is as follows,
$$R = \left\{ \begin{array}{ll} P^{+} + 10 & \text{if } 70\% < p_{LC} < 100\% \text{ and } 0\% < p_{RP2} < 17.2\% \\ P^{+} + 5 & \text{if } 50\% < p_{LC} < 70\% \text{ and } 17.2\% < p_{RP2} < 50\% \\ P^{+} - 1 & \text{if } 0\% < p_{LC} < 50\% \text{ and } 50\% < p_{RP2} < 100\% \end{array} \right.$$
(3)
Reward function for reinforcement learning. Contour plot of the reward function as a function of the probability of local control (\(P_{LC}\)) and of radiation-induced pneumonitis of grade 2 or higher (\(P_{RP2}\)). The area enclosed by the blue line corresponds to the clinically desirable outcome, i.e., \(P_{LC} > 70\%\) and \(P_{RP2} < 17.2\%\). Similarly, the area enclosed by the green lines corresponds to the computationally desirable outcome, i.e., \(P_{LC} > 50\%\) and \(P_{RP2} < 50\%\). Along with \(P_{LC} \times (1 - P_{RP2})\), the AI agent receives a +10 reward for achieving the clinically desirable outcome, +5 for achieving the computationally desirable outcome, and -1 when unable to achieve a desirable outcome.
Here the AI agent receives an additional 10 points for achieving the clinically desirable outcome (i.e., \(p_{LC} > 70\%\) and \(p_{RP2} < 17.2\%\)), 5 points for achieving the computationally desirable outcome (i.e., \(p_{LC} > 50\%\) and \(p_{RP2} < 50\%\)), and -1 point for failing to achieve a desirable outcome altogether. The negative point motivates the AI agent to search for the optimal dose as soon as possible.
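The reward can be written as a short function. This sketch follows the prose description (clinically desirable region first, then the computationally desirable region, then everything else) rather than the box-shaped ranges printed in Eq. (3), which do not cover every combination of probabilities; that reading is an assumption. Probabilities are expressed as fractions in [0, 1], and the function name is hypothetical.

```python
def reward(p_lc: float, p_rp2: float) -> float:
    """Reward built on the base function P+ = p_lc * (1 - p_rp2)."""
    base = p_lc * (1.0 - p_rp2)
    if p_lc > 0.70 and p_rp2 < 0.172:
        return base + 10.0  # clinically desirable outcome
    if p_lc > 0.50 and p_rp2 < 0.50:
        return base + 5.0   # computationally desirable outcome
    return base - 1.0       # no desirable outcome reached


print(round(reward(0.8, 0.1), 2))  # 10.72
print(round(reward(0.6, 0.3), 2))  # 5.42
print(round(reward(0.4, 0.6), 2))  # -0.84
```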
To compensate for the low number of data points, we employed WGAN-GP43, which learns the underlying data distribution and generates additional data points. We generated 4000 additional data points for training the qDRL models. A larger training dataset helps the reinforcement learning algorithm represent the state space accurately. The training details are presented in the Supplementary Material.