Quantum deep reinforcement learning for clinical decision support in oncology: application to adaptive radiotherapy | Scientific Reports – Nature.com
Quantum deep reinforcement learning
Quantum deep reinforcement learning (qDRL) is a novel action-value-based decision-making framework derived from quantum reinforcement learning (QRL) [23] and the deep Q-learning [10] framework. Like conventional RL [9,31], our qDRL-based CDSS framework consists of five main elements: the clinical AI agent, the ARTE, the radiation dose decision-making policy, the reward, and the q-value function. Here, the AI agent is a clinical decision-maker that learns, within the ARTE, to make dose decisions that achieve clinically desirable outcomes. Learning takes place through agent-environment interaction, which proceeds sequentially: the AI decides on a dose and executes it, and in response a patient (part of the ARTE) transitions from one state to the next. Each transition gives the AI feedback on its decision in the form of an RT outcome and an associated reward value. The goal of RL is for the AI to learn a decision-making policy that maximizes reward in the long run, as measured by a q-value function that assigns to every state-dose-decision pair the expected accumulation of rewards over time (the return).
Assuming the Markov property (i.e., the environment's response at time $t+1$ depends only on the state and dose decision at time $t$), the qDRL task can be described mathematically as a 5-tuple $(S, |D\rangle, TF, P, R)$, where $S$ is a finite set of patient states; $|D\rangle$ is a superimposed quantum state representing the finite set of eigen-dose decisions; $TF: S \times D \to S'$ is the transition function that maps the patient state $s_t$ and eigen-dose $|d\rangle_t$ to the next state $s_{t+1}$; $P_{LC|RP2}: S' \to [0,1]$ is the RT outcome estimator that assigns probability values $p_{LC}$ and $p_{RP2}$ to the state $s_{t+1}$; and $R: [0,1] \times [0,1] \to \mathbb{R}$ is the reward function that assigns a reward $r_{t+1}$ to the state-decision pair $(s_t, |d\rangle_t)$ based on the outcome probability estimates.
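To make the interaction loop and the 5-tuple concrete, the following minimal Python sketch rolls out one short episode and computes the discounted returns on which a q-value function is built. The functions `toy_transition`, `toy_outcome` and `toy_reward` are hypothetical stand-ins for $TF$, $P_{LC|RP2}$ and $R$ (the actual components are trained DNNs and Eqs. (1)-(3)), and the discount factor `gamma` is an assumption that this section does not specify.

```python
import numpy as np

# Hypothetical stand-ins for TF, P_{LC|RP2} and R (not the paper's trained models).
def toy_transition(state: np.ndarray, dose: float) -> np.ndarray:
    """TF: S x D -> S'. Toy rule: every feature grows linearly with the dose."""
    return state + 0.1 * dose

def toy_outcome(state: np.ndarray) -> tuple[float, float]:
    """P_{LC|RP2}: S' -> [0, 1]. Toy logistic curves for p_LC and p_RP2."""
    p_lc = 1.0 / (1.0 + np.exp(-(state[0] - 5.0)))
    p_rp2 = 1.0 / (1.0 + np.exp(-(state[1] - 8.0)))
    return float(p_lc), float(p_rp2)

def toy_reward(p_lc: float, p_rp2: float) -> float:
    """R: [0, 1] x [0, 1] -> R. Only the base term p_LC * (1 - p_RP2)."""
    return p_lc * (1.0 - p_rp2)

def discounted_returns(rewards, gamma: float = 0.9) -> np.ndarray:
    """Return G_t = r_{t+1} + gamma * G_{t+1} for every step, computed backwards."""
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

state = np.array([4.0, 6.0])             # hypothetical two-feature patient state s_0
rewards = []
for dose in [2.0, 2.0, 2.0]:             # hypothetical sequence of dose decisions
    state = toy_transition(state, dose)  # s_t -> s_{t+1}
    rewards.append(toy_reward(*toy_outcome(state)))  # r_{t+1}
print(discounted_returns(rewards))
```

Computing the returns backwards reuses $G_{t+1}$ at each step, so the cost is linear in the episode length.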
An eigen-dose $|d\rangle$ is a physically performable decision selected via quantum methods from the superimposed quantum state $|D\rangle$, which represents all possible eigen-doses simultaneously. Put simply, $|D\rangle$ is the collection of all possible dose options, and $|d\rangle$ is the one option that is selected once a decision is made. The dose decision $|d\rangle$ is selected in two steps: (1) amplifying the optimal eigen-dose $|d\rangle^{*}$ within the superimposed state $|D\rangle$, i.e., $|D'\rangle = \widehat{Amp}_{|d\rangle^{*}}|D\rangle$, and (2) measuring the amplified state, i.e., $|d\rangle = \widehat{Measure}(|D'\rangle)$.
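The measurement step can be illustrated with a small, noise-free statevector sketch in Python: measuring samples a basis index with probability equal to the squared magnitude of its amplitude (the Born rule). The perfectly amplified state `D_amp` below is an idealized assumption standing in for $\widehat{Amp}_{|d\rangle^{*}}|D\rangle$, and the 3-qubit size and the marked index $|101\rangle$ are hypothetical.

```python
import numpy as np

def measure(amplitudes: np.ndarray, rng: np.random.Generator) -> int:
    """Sample a computational-basis index with probability |amplitude|^2 (Born rule)."""
    probs = np.abs(amplitudes) ** 2
    probs = probs / probs.sum()                     # guard against rounding drift
    return int(rng.choice(len(probs), p=probs))

n_qubits = 3
N = 2 ** n_qubits                                   # number of eigen-doses
D = np.full(N, 1 / np.sqrt(N), dtype=complex)       # |D>: equal superposition of all eigen-doses

D_amp = np.zeros(N, dtype=complex)                  # idealized |D'>: all amplitude on |101>
D_amp[0b101] = 1.0

rng = np.random.default_rng(0)
print(format(measure(D_amp, rng), "03b"))           # prints '101'
counts = np.bincount([measure(D, rng) for _ in range(8000)], minlength=N)
print(counts / 8000)                                # each entry close to 1/8 without amplification
```

Without amplification every eigen-dose is equally likely, which is why step (1) is needed before the measurement can return the optimal decision reliably.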
The optimal eigen-dose $|d\rangle^{*}$ is obtained from the deep Q-net, which serves as the AI's memory. The deep Q-net, $DQN: S \to \mathbb{R}^{d}$, is a neural network that takes the patient state as input and outputs a q-value for each eigen-dose, $\{q_{|d\rangle}\}$. The optimal dose is then chosen by a greedy policy, i.e., the dose with the maximum q-value is selected: $|d\rangle^{*} = \operatorname{argmax}_{|d'\rangle} q_{|d'\rangle}$. We used the double Q-learning algorithm [32] to train the deep Q-net. The schematic of a training cycle is presented in Fig. 2, and additional technical details are given in the Supplementary Material.
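The greedy selection and the double Q-learning update can be sketched as follows. This sketch assumes the Double DQN form of double Q-learning, in which the online network chooses the next eigen-dose and a separate target network evaluates it; the discount factor and the example q-values are hypothetical, and the networks themselves are replaced by plain arrays of q-values.

```python
import numpy as np

def greedy_dose(q_values: np.ndarray) -> int:
    """|d>* = argmax over |d'> of q_{|d'>}: index of the eigen-dose with the largest q-value."""
    return int(np.argmax(q_values))

def double_q_target(reward: float, q_online_next: np.ndarray,
                    q_target_next: np.ndarray, gamma: float = 0.9,
                    terminal: bool = False) -> float:
    """Double-DQN regression target for the q-value of (s_t, |d>_t)."""
    if terminal:
        return reward                                # no bootstrap after the final step
    best_next = greedy_dose(q_online_next)           # selection: online network
    return reward + gamma * float(q_target_next[best_next])  # evaluation: target network

q_online_next = np.array([1.0, 3.0, 2.0])   # hypothetical online-net q-values at s_{t+1}
q_target_next = np.array([0.5, 1.0, 4.0])   # hypothetical target-net q-values at s_{t+1}
y = double_q_target(0.5, q_online_next, q_target_next)
print(y)   # 0.5 + 0.9 * 1.0, versus 0.5 + 0.9 * 4.0 for a plain max over the target net
```

Decoupling selection from evaluation is what reduces the overestimation bias of taking a single max over noisy q-value estimates.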
We initially employed Grover's amplification procedure [33,34] as the decision selection mechanism. Although Grover's procedure works on a quantum simulator, it fails to work correctly on a real quantum computer: for 4 or more qubits, its circuit depth is much greater than the coherence length of current quantum processors [35]. Whenever the circuit length exceeds the coherence length, the quantum state is significantly corrupted by system noise and loses vital information. We therefore designed a quantum controller circuit that is shorter than the coherence length and suited to the task of decision selection. The merit of this design is that its length is fixed regardless of the number of qubits, so it extends to larger qubit systems, as far as the circuit width permits. Technical details of its implementation on a quantum processor are presented in the Supplementary Materials.
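The scaling problem with Grover's procedure can be seen in a noise-free statevector simulation: reaching a high success probability needs roughly $\lfloor \frac{\pi}{4}\sqrt{2^n} \rfloor$ oracle-plus-diffusion iterations, so the number of iterations (and hence the circuit depth) grows with the number of qubits $n$. This Python sketch models only the ideal mathematics of amplitude amplification; it does not model hardware gate decompositions or decoherence, and the marked index is arbitrary.

```python
import numpy as np

def grover_success_probability(n_qubits: int, marked: int, iterations: int) -> float:
    """Ideal statevector Grover search; returns P(measuring the marked basis state)."""
    N = 2 ** n_qubits
    psi = np.full(N, 1 / np.sqrt(N))        # uniform superposition, as after H on every qubit
    for _ in range(iterations):
        psi[marked] *= -1                   # oracle: phase-flip the marked state
        psi = 2 * psi.mean() - psi          # diffusion: inversion about the mean amplitude
    return float(psi[marked] ** 2)

for n in range(2, 8):
    k = int(np.floor(np.pi / 4 * np.sqrt(2 ** n)))   # near-optimal iteration count
    p = grover_success_probability(n, marked=1, iterations=k)
    print(f"n={n} qubits: {k} iterations, P(success)={p:.3f}")
```

On hardware each iteration also needs a multi-controlled oracle and diffusion operator whose gate count grows with $n$, so the real depth grows faster than the iteration count alone suggests.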
An example of a controller circuit is given in Fig. 5. For an $n$-qubit decision, the controller circuit uses $2n$ qubits, divided into a control register and a main register. The optimal eigen-state obtained from the deep Q-net is prepared in the control register by selecting the appropriate pre-control gates. The control register is then entangled with the main register via controlled-NOT (CNOT) gates, each connecting a control qubit from the control register to a target qubit in the main register. A CNOT gate flips its target qubit ($|0\rangle \leftrightarrow |1\rangle$) only when its control qubit is in the $|1\rangle$ state and does nothing otherwise. Because all main qubits are initialized in $|0\rangle$, we first apply reverse gates ($n$ X gates in parallel) to flip them to $|1\rangle$; an X gate flips $|0\rangle$ to $|1\rangle$ and vice versa. The CNOTs then flip back to $|0\rangle$ every main qubit whose control is in $|1\rangle$, creating a state that is the element-wise complement of the marked state. Finally, another set of reverse gates is applied to the main register, turning this complement into the marked state, before the measurement is made.
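The controller logic described above can be checked with a small, noise-free statevector simulation in Python. Assumptions not specified in this section: the pre-control gates are X gates on the control qubits whose marked bit is 1, control qubit $i$ drives main qubit $n+i$, and qubits use little-endian ordering; the helper names are hypothetical. Because the controller uses only X and CNOT gates, the state stays a single computational basis state, and each stage is one layer of gates acting on disjoint qubits, which is why its depth does not grow with $n$ in this idealized picture (hardware connectivity is ignored).

```python
import numpy as np

def apply_x(psi: np.ndarray, q: int) -> np.ndarray:
    """Pauli-X on qubit q: permute amplitudes between basis states differing in bit q."""
    idx = np.arange(psi.size)
    return psi[idx ^ (1 << q)]

def apply_cnot(psi: np.ndarray, c: int, t: int) -> np.ndarray:
    """CNOT: flip target bit t in every basis state whose control bit c is 1."""
    idx = np.arange(psi.size)
    src = np.where((idx >> c) & 1, idx ^ (1 << t), idx)
    return psi[src]

def controller_circuit(marked: str) -> np.ndarray:
    """Return the outcome distribution of the main register for a marked bit string."""
    n = len(marked)
    psi = np.zeros(2 ** (2 * n))
    psi[0] = 1.0                                  # all 2n qubits start in |0>
    bits = [int(b) for b in reversed(marked)]     # bits[i] is bit i of the marked integer
    for i, b in enumerate(bits):                  # pre-control gates (assumed X gates)
        if b:
            psi = apply_x(psi, i)
    for i in range(n):                            # reverse gates: main qubits |0> -> |1>
        psi = apply_x(psi, n + i)
    for i in range(n):                            # CNOTs: main becomes the complement of marked
        psi = apply_cnot(psi, i, n + i)
    for i in range(n):                            # reverse gates again: complement -> marked
        psi = apply_x(psi, n + i)
    probs = np.abs(psi) ** 2
    main_value = np.arange(psi.size) >> n         # marginalize out the control register
    return np.bincount(main_value, weights=probs, minlength=2 ** n)

probs = controller_circuit("10101")
print(format(int(np.argmax(probs)), "05b"), probs.max())   # prints: 10101 1.0
```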
Quantum controller circuit for a 5-qubit (32-state) system. (a) Quantum controller circuit for the selection of the state $|10101\rangle$. Probability distributions for (b) a failed run of Grover's amplification procedure (one iteration) on the 5-qubit IBMQ Santiago quantum processor and (c) a successful quantum controller selection run on the 15-qubit IBMQ Melbourne quantum processor.
Another advantage of the controller circuit is that its level of uncertainty can be controlled. The circuit has additional degrees of freedom that can tune the level of uncertainty, which may be needed to model a highly uncertain clinical situation. By replacing each CNOT gate with a more general $CU3(\theta, \phi, \lambda)$ gate, we can control the level of additional stochasticity through the rotation angles $\theta$, $\phi$, and $\lambda$, which correspond to angles on the Bloch sphere. The angles can either be fixed or, for additional control, varied with the training episode.
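The effect of swapping a CNOT for a $CU3(\theta, \phi, \lambda)$ gate can be sketched in Python using the standard U3 matrix convention (the one used by Qiskit's U3 gate). With $\theta = \pi$, $\phi = 0$, $\lambda = \pi$ the gate reduces to a CNOT; a smaller $\theta$ leaves each main qubit whose control is $|1\rangle$ unflipped with probability $\cos^2(\theta/2)$, so the marked state is read out with probability $\sin^2(\theta/2)$ raised to the number of 1s in the marked string. This assumes the controller layout described above with every CNOT replaced, and the function name is hypothetical. In this idealized circuit $\phi$ and $\lambda$ contribute only phases that a computational-basis measurement does not see, so the outcome probabilities depend on $\theta$ alone.

```python
import numpy as np

def u3(theta: float, phi: float, lam: float) -> np.ndarray:
    """Single-qubit U3(theta, phi, lambda) matrix in the standard (Qiskit) convention."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -np.exp(1j * lam) * s],
                     [np.exp(1j * phi) * s, np.exp(1j * (phi + lam)) * c]])

# U3(pi, 0, pi) equals Pauli-X, so CU3(pi, 0, pi) is exactly a CNOT.
assert np.allclose(u3(np.pi, 0.0, np.pi), np.array([[0, 1], [1, 0]]))

def marked_state_probability(marked: str, theta: float, phi: float = 0.0,
                             lam: float = 0.0) -> float:
    """P(main register reads the marked string) when every CNOT becomes CU3(theta, phi, lam).

    Main qubits whose control is |0> are untouched and are always read correctly.
    Each main qubit whose control is |1> starts in |1> (after the reverse gates) and
    must be flipped to |0> by U3, which happens with probability |<0|U3|1>|^2.
    """
    p_flip = abs((u3(theta, phi, lam) @ np.array([0.0, 1.0]))[0]) ** 2
    return float(p_flip ** marked.count("1"))

for theta in (np.pi, 3 * np.pi / 4, np.pi / 2):
    print(round(theta, 3), round(marked_state_probability("10101", theta), 3))
```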
The patient's state in the ARTE is defined by five biological features: a cytokine (IP10), a PET imaging feature (GLSZM-ZSV), radiation doses (tumor gEUD and lung gEUD), and a genetic feature (cxcr1-Rs2234671). Their descriptions are presented in Table 2. These five variables were selected in a multi-objective Bayesian network study [13], which considered over 297 biological features and identified the best features for jointly predicting the LC and RP2 RT outcomes.
The training data analyzed in this study were obtained from the University of Michigan study UMCC 2007.123 (NCI clinical trial NCT01190527), and the validation data were obtained from the RTOG-0617 study (NCI clinical trial NCT00533949). Both trials were conducted in accordance with relevant guidelines and regulations, and informed consent was obtained from all subjects and/or their legal guardians. Details of the training and validation datasets, and of the model imputation needed to accommodate differences between the datasets, are presented in the Supplementary Materials.
Deep neural networks (DNNs) were used as the transition functions for the IP10 and GLSZM-ZSV features. They were trained on a longitudinal (time-series) dataset, with the pre-irradiation patient state and the corresponding radiation dose as input features and the post-irradiation state as output. For lung and tumor gEUD, we used prior knowledge and imposed a monotonic relationship in the transition function, since gEUD should increase with increasing radiation dose. We assumed that the rate of change in gEUD is proportional to the dose fractionation and to tissue radiosensitivity,
$$\frac{g(t_{n}) - g(t_{n-1})}{t_{n} - t_{n-1}} \propto d_{n}\left(1 + \frac{d_{n}}{\alpha/\beta}\right). \tag{1}$$
Here $g(t_n)$ is the gEUD at time point $t_n$, $d_n$ is the radiation dose fractionation given during the $n$th time period, and the $\alpha/\beta$ ratio is the radiosensitivity parameter, which differs between tissue types. We first applied constrained training [42] to enforce monotonicity in a DNN model; however, the resulting gEUD-over-time trend was flatter than anticipated, so we opted for this process-driven approach in the final implementation. Technical details of the NNs and their training are presented in the Supplementary Material.
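Equation (1) can be turned into a simple forward update. The Python sketch below assumes a unit proportionality constant `k`, one-week time steps, and a constant hypothetical 2 Gy per fraction in each of six weeks, and it uses commonly quoted $\alpha/\beta$ values of about 10 Gy for tumor and 3 Gy for lung as illustrative assumptions; the fitted constants are not given in this section. Because every increment $d_n(1 + d_n/(\alpha/\beta))$ is non-negative, the trajectory is monotonically non-decreasing by construction, which is the property the process-driven approach is meant to guarantee.

```python
import numpy as np

def geud_trajectory(g0: float, doses, alpha_beta: float,
                    k: float = 1.0, dt: float = 1.0) -> np.ndarray:
    """Forward update from Eq. (1):
    g(t_n) = g(t_{n-1}) + k * dt * d_n * (1 + d_n / (alpha/beta)).
    k (proportionality constant) and dt (t_n - t_{n-1}) are assumed values."""
    g = [g0]
    for d in doses:
        g.append(g[-1] + k * dt * d * (1.0 + d / alpha_beta))
    return np.array(g)

weekly_doses = [2.0] * 6                                     # hypothetical d_1..d_6 (Gy)
tumor = geud_trajectory(0.0, weekly_doses, alpha_beta=10.0)  # assumed tumor alpha/beta
lung = geud_trajectory(0.0, weekly_doses, alpha_beta=3.0)    # assumed lung alpha/beta
print(tumor[-1], lung[-1])   # g(t_6) in arbitrary units, the quantity that enters Eq. (2)
```

The lower $\alpha/\beta$ of lung makes each fraction count for more, so the lung trajectory rises faster than the tumor trajectory for the same doses.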
DNN classifiers were used as the RT outcome estimators for the LC and RP2 treatment outcomes. They were trained with post-irradiation patient states as input and binary LC and RP2 outcomes as labels.
The RT outcome estimator must also satisfy a monotonicity condition: both the probability of local control and the probability of radiation-induced pneumonitis should increase with increasing radiation dose. To maintain this monotonic relationship, we used a generic logistic function,
$$p_{LC|RP2} = \frac{1}{1 + \exp\left(\frac{g(t_{6}) - \mu}{T}\right)}, \tag{2}$$
where $g(t_6)$ is the gEUD at week 6, and $\mu$ and $T$ are two patient-specific parameters learned by training the DNN. Specifically, $\mu$ and $T$ are the outputs of two neural networks that feed into the logistic function; the two networks are tuned alternately, one at a time with the other held fixed. The training details are presented in the Supplementary Materials.
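A minimal Python sketch of Eq. (2) follows, with $\mu$ and $T$ passed in directly rather than produced by the two networks (their alternating training is not shown). One point worth checking when implementing it: with the sign as written, $p$ increases with $g(t_6)$ only if $T < 0$ (equivalently, writing the exponent as $-(g(t_6) - \mu)/|T|$), and at $g(t_6) = \mu$ the probability is exactly 0.5. The values $\mu = 60$ Gy and $T = -5$ Gy are hypothetical.

```python
import numpy as np

def outcome_probability(g6: float, mu: float, T: float) -> float:
    """Eq. (2): p = 1 / (1 + exp((g(t_6) - mu) / T))."""
    return float(1.0 / (1.0 + np.exp((g6 - mu) / T)))

g6_values = np.linspace(40.0, 80.0, 5)                                # hypothetical week-6 gEUD (Gy)
p_lc = [outcome_probability(g, mu=60.0, T=-5.0) for g in g6_values]  # hypothetical mu and T
assert all(np.diff(p_lc) > 0)      # monotonically increasing in gEUD when T < 0
print(np.round(p_lc, 3))
```

Because the dose enters only through the single monotone argument $g(t_6)$, the network outputs $\mu$ and $T$ can shift and stretch the curve per patient without ever breaking monotonicity.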
The agent's task is to determine the optimal dose that maximizes $p_{LC}$ while minimizing $p_{RP2}$. Accordingly, we built the reward function on the base function $P^{+} = p_{LC}(1 - p_{RP2})$, as shown in Fig. 6. Its algebraic form is
$$R = \begin{cases} P^{+} + 10 & \text{if } 70\% < p_{LC} < 100\% \text{ and } 0\% < p_{RP2} < 17.2\% \\ P^{+} + 5 & \text{if } 50\% < p_{LC} < 70\% \text{ and } 17.2\% < p_{RP2} < 50\% \\ P^{+} - 1 & \text{if } 0\% < p_{LC} < 50\% \text{ and } 50\% < p_{RP2} < 100\% \end{cases} \tag{3}$$
Reward function for reinforcement learning. Contour plot of the reward function as a function of the probability of local control ($P_{LC}$) and the probability of grade 2 or higher radiation-induced pneumonitis ($P_{RP2}$). The area enclosed by the blue line corresponds to the clinically desirable outcome, i.e., $P_{LC} > 70\%$ and $P_{RP2} < 17.2\%$. Similarly, the area enclosed by the green lines corresponds to the computationally desirable outcome, i.e., $P_{LC} > 50\%$ and $P_{RP2} < 50\%$. In addition to $P_{LC} \times (1 - P_{RP2})$, the AI agent receives a +10 reward for achieving a clinically desirable outcome, +5 for achieving a computationally desirable outcome, and -1 when it is unable to achieve a desirable outcome.
Here the AI agent receives an additional 10 points for achieving a clinically desirable outcome (i.e., $p_{LC} > 70\%$ and $p_{RP2} < 17.2\%$), 5 points for achieving a computationally desirable outcome (i.e., $p_{LC} > 50\%$ and $p_{RP2} < 50\%$), and -1 point for failing to achieve a desirable outcome altogether. The negative reward motivates the AI agent to find the optimal dose as quickly as possible.
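The reward rule can be sketched in Python as below. One assumption is needed: Eq. (3) as printed leaves some combinations unassigned (for example, $p_{LC} = 80\%$ with $p_{RP2} = 30\%$ satisfies none of its three conditions), so this sketch instead follows the region description given in the text and in the Fig. 6 caption, treating the clinically desirable region as a subset of the computationally desirable one and assigning -1 everywhere else. Probabilities are expressed as fractions rather than percentages.

```python
def reward(p_lc: float, p_rp2: float) -> float:
    """Reward built on P+ = p_LC * (1 - p_RP2) plus the bonus or penalty from the text.

    Assumption: the cases are exhaustive, so any outcome that is neither clinically
    nor computationally desirable receives the -1 penalty.
    """
    base = p_lc * (1.0 - p_rp2)
    if p_lc > 0.70 and p_rp2 < 0.172:     # clinically desirable outcome
        return base + 10.0
    if p_lc > 0.50 and p_rp2 < 0.50:      # computationally desirable outcome
        return base + 5.0
    return base - 1.0                     # no desirable outcome

for p_lc, p_rp2 in [(0.80, 0.10), (0.60, 0.30), (0.80, 0.30), (0.40, 0.60)]:
    print(p_lc, p_rp2, round(reward(p_lc, p_rp2), 3))
```

Checking the clinically desirable region first matters: because it lies inside the computationally desirable region, reversing the two tests would never award the +10 bonus.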
To compensate for the small number of data points, we employed WGAN-GP [43] (a Wasserstein generative adversarial network with gradient penalty), which learns the underlying data distribution and generates additional data points. We generated 4000 additional data points for training the qDRL models. A larger training dataset helps the reinforcement learning algorithm represent the state space accurately. The training details are presented in the Supplementary Material.
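What distinguishes WGAN-GP from a plain Wasserstein GAN is the gradient penalty on the critic, evaluated at random interpolates between real and generated samples. The numpy sketch below computes that penalty and the resulting critic loss for a toy critic $f(x) = \tanh(x \cdot w)$ whose input gradient is available in closed form. The critic, the synthetic 5-feature data, and the penalty weight $\lambda = 10$ (the default suggested in the original WGAN-GP paper) are assumptions for illustration; a real implementation would use neural critic and generator networks with automatic differentiation, and the training details for this study are in the Supplementary Material.

```python
import numpy as np

def critic(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy critic f(x) = tanh(x @ w); a stand-in for a real critic network."""
    return np.tanh(x @ w)

def critic_grad(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Analytic input gradient of the toy critic: (1 - tanh^2(x @ w)) * w, one row per sample."""
    return (1.0 - np.tanh(x @ w) ** 2)[:, None] * w[None, :]

def gradient_penalty(real, fake, w, rng, lam=10.0):
    """WGAN-GP term: lam * E[(||grad_x f(x_hat)||_2 - 1)^2] at random interpolates x_hat."""
    eps = rng.uniform(size=(real.shape[0], 1))       # one mixing weight per sample
    x_hat = eps * real + (1.0 - eps) * fake          # points between real and fake samples
    norms = np.linalg.norm(critic_grad(x_hat, w), axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

rng = np.random.default_rng(0)
real = rng.normal(size=(64, 5))            # hypothetical 5-feature patient states
fake = rng.normal(loc=1.0, size=(64, 5))   # hypothetical generator samples
w = rng.normal(size=5)
# Critic loss minimized during critic updates: E[f(fake)] - E[f(real)] + gradient penalty.
critic_loss = critic(fake, w).mean() - critic(real, w).mean() + gradient_penalty(real, fake, w, rng)
print(critic_loss)
```

Penalizing the gradient norm toward 1 softly enforces the 1-Lipschitz constraint that the Wasserstein formulation requires, replacing the weight clipping of the original WGAN.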