Quantum deep reinforcement learning for clinical decision support in oncology: application to adaptive radiotherapy | Scientific Reports – Nature.com
Quantum deep reinforcement learning
Quantum deep reinforcement learning (qDRL) is a novel action-value-based decision-making framework derived from QRL [23] and the deep Q-learning [10] framework. Like conventional RL [9,31], our qDRL-based CDSS framework comprises five main elements: a clinical AI agent, the ARTE, a radiation dose decision-making policy, a reward, and a q-value function. Here, the AI agent is a clinical decision-maker that learns to make dose decisions that achieve clinically desirable outcomes within the ARTE. Learning takes place through agent-environment interaction, which proceeds in sequence: the AI decides on a dose and executes it, and in response, a patient (part of the ARTE) transitions from one state to the next. Each transition gives the AI feedback on its decision in the form of an RT outcome and an associated reward value. The goal of RL is for the AI to learn a decision-making policy that maximizes the reward in the long run. The long-run reward is defined by a specified q-value function, which assigns to every state-dose-decision pair a value obtained from the accumulation of rewards over time (returns).
Assuming the Markov property (i.e., the environment's response at time \(t + 1\) depends only on the state and dose decision at time \(t\)), the qDRL task can be described mathematically as a 5-tuple \((S, \left| D \right\rangle, TF, P, R)\), where \(S\) is a finite set of patient states, \(\left| D \right\rangle\) is a superimposed quantum state representing the finite set of eigen-dose decisions, \(TF: S \times D \to S^{\prime}\) is the transition function that maps the patient's state \(s_{t}\) and eigen-dose \(\left| d \right\rangle_{t}\) to the next state \(s_{t + 1}\), \(P_{LC|RP2}: S^{\prime} \to \left[ 0,1 \right]\) is the RT outcome estimator that assigns probability values \(p_{LC}\) and \(p_{RP2}\) to the state \(s_{t + 1}\), and \(R: \left[ 0,1 \right] \times \left[ 0,1 \right] \to \mathbb{R}\) is the reward function that assigns a reward \(r_{t + 1}\) to the state-decision pair \(\left( s_{t}, \left| d \right\rangle_{t} \right)\) based on the outcome probability estimates.
An eigen-dose \(\left| d \right\rangle\) is a physically performable decision that is selected via quantum methods from the superimposed quantum state \(\left| D \right\rangle\), which represents all possible eigen-doses at once. In simple words, \(\left| D \right\rangle\) is the collection of all possible dose options and \(\left| d \right\rangle\) is the one option that is selected after a decision is made. Selecting the dose decision \(\left| d \right\rangle\) is carried out in two steps: (1) amplifying the optimal eigen-dose \(\left| d \right\rangle^{*}\) from the superimposed state \(\left| D \right\rangle\) (i.e., \(\left| D \right\rangle^{\prime} = \widehat{Amp}_{\left| d \right\rangle^{*}} \left| D \right\rangle\)) and (2) measuring the amplified state (i.e., \(\left| d \right\rangle = \widehat{Measure}\left( \left| D^{\prime} \right\rangle \right)\)).
The optimal eigen-dose \(\left| d \right\rangle^{*}\) is obtained from the deep Q-net, which is the AI's memory. The deep Q-net, \(DQN: S \to \mathbb{R}^{d}\), is a neural network that takes the patient's state as input and outputs a q-value for each eigen-dose (\(\left\{ q_{\left| d \right\rangle} \right\}\)). The optimal dose is then selected following a greedy policy, in which the dose with the maximum q-value is chosen (i.e., \(\left| d \right\rangle^{*} = \underset{\left| d^{\prime} \right\rangle}{\arg\max} \left\{ q_{\left| d^{\prime} \right\rangle} \right\}\)). We applied a double Q-learning [32] algorithm in training the deep Q-net. The schematic of a training cycle is presented in Fig. 2, and additional technical details are presented in the Supplementary Material.
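The greedy selection and the double Q-learning update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the q-values are hypothetical numbers standing in for deep Q-net outputs, and the discount factor `gamma` is an assumed value that does not come from the text.

```python
def greedy_dose(q_values):
    """Return the index of the eigen-dose with the largest q-value (greedy policy)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double Q-learning target for a single transition.

    The online network chooses the next dose and the target network scores it,
    which is the decoupling that distinguishes double Q-learning from plain
    Q-learning. gamma is an assumed discount factor.
    """
    if done:
        return float(reward)
    best_next = greedy_dose(next_q_online)
    return float(reward + gamma * next_q_target[best_next])


# Hypothetical q-values for three eigen-doses in the next state.
q_online = [0.2, 1.5, 0.7]
q_target = [0.1, 1.0, 2.0]

dose_index = greedy_dose(q_online)
target = double_q_target(1.0, q_online, q_target, gamma=0.9)
```

Note that the target uses the target network's value for the dose chosen by the online network (index 1 here), not the target network's own maximum.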
We initially employed Grover's amplification procedure [33,34] for the decision selection mechanism. While Grover's procedure works on a quantum simulator, it fails to work correctly on a quantum computer. The quantum circuit depth of Grover's procedure (for 4 or more qubits) is much greater than the coherence length of the current quantum processor [35]. Whenever the quantum circuit length exceeds the coherence length, the quantum state is significantly affected by system noise and loses vital information. Therefore, we designed a quantum controller circuit that is shorter than the coherence length and is suitable for the task of decision selection. The merit of our design is its fixed length: because the length is the same for any number of qubits, the design is suitable for higher-qubit systems, up to the limit permitted by the circuit width. Technical details regarding its implementation on a quantum processor are presented in the Supplementary Materials.
An example of a controller circuit is given in Fig. 5. A controller circuit uses twice the number of qubits \(n\), divided into a control register and a main register. The optimal eigen-state obtained from the deep Q-net is created in the control register by selecting the appropriate pre-control gates. The control register is then entangled with the qubits of the main register via controlled-NOT (CNOT) gates, each connecting a control qubit from the control register to a target qubit from the main register. A CNOT gate flips its target qubit (here from \(\left| 1 \right\rangle\) to \(\left| 0 \right\rangle\)) only when the control qubit is in the \(\left| 1 \right\rangle\) state and performs no operation otherwise. Because all the main qubits are prepared in the \(\left| 0 \right\rangle\) state, we introduced the reverse gates (\(n\) X-gates in parallel) to flip them to \(\left| 1 \right\rangle\) before the CNOT gates are applied. X-gates flip \(\left| 0 \right\rangle\) to \(\left| 1 \right\rangle\), and vice versa. The CNOT gates flip all the main qubits whose controls are in the \(\left| 1 \right\rangle\) state, creating a state that is element-wise opposite to the marked state. Finally, another set of reverse gates is applied to the main register before making a measurement.
Quantum controller circuit for a 5-qubit (32-state) system. (a) Quantum controller circuit for the selection of the state \(\left| 10101 \right\rangle\). The probability distribution corresponding to (b) a failed Grover's amplification procedure for one iteration run on the 5-qubit IBMQ Santiago quantum processor and (c) a successful quantum controller selection run on the 15-qubit IBMQ Melbourne quantum processor.
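The gate sequence of the controller circuit can be traced by hand on computational basis states. The sketch below is a classical bit-level walk-through under the assumption of noiseless gates and a control register prepared exactly in the marked basis state. It is not a quantum simulation, and the function name `controller_select` is hypothetical.

```python
def controller_select(marked_state: str) -> str:
    """Trace basis states through the controller circuit for a marked bit string."""
    # Pre-control gates prepare the marked state in the control register.
    control = [int(bit) for bit in marked_state]
    # The main register starts in |0...0>.
    main = [0] * len(control)
    # First layer of reverse gates: X on every main qubit (0 -> 1).
    main = [1 - q for q in main]
    # CNOT layer: each main qubit is flipped when its control qubit is 1.
    main = [q ^ c for q, c in zip(main, control)]
    # Second layer of reverse gates: X on every main qubit.
    main = [1 - q for q in main]
    return "".join(str(q) for q in main)


selected = controller_select("10101")
```

After the CNOT layer the main register holds the element-wise opposite of the marked state (`01010` for `10101`), and the final reverse gates recover the marked state. The sequence has three gate layers on the main register regardless of the number of qubits, which is consistent with the fixed-length property described above.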
Another advantage of the controller circuit is its controllable uncertainty level. The controller circuit has additional degrees of freedom that can set the level of uncertainty that might be needed to model a highly uncertain clinical situation. By replacing the CNOT gate with a more general \(CU3\left( \theta, \phi, \lambda \right)\) gate, we can control the level of additional stochasticity with the rotation angles \(\theta\), \(\phi\), and \(\lambda\), which correspond to angles on the Bloch sphere. The angles can either be fixed or, for additional control, changed with the training episode.
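To illustrate how a rotation angle can tune stochasticity, the sketch below computes the probability that a target qubit in \(\left| 1 \right\rangle\) is measured as \(\left| 0 \right\rangle\) after a single-qubit U3 rotation, as would happen when the control qubit is \(\left| 1 \right\rangle\). It assumes the commonly used U3 parametrization, in which \(U3(\pi, 0, \pi)\) equals the X gate. The convention used in the paper's Supplementary Materials may differ, and `flip_probability` is a hypothetical helper.

```python
import cmath
import math


def u3(theta, phi, lam):
    """2x2 matrix of a U3 rotation in the assumed standard parametrization."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [c, -cmath.exp(1j * lam) * s],
        [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c],
    ]


def flip_probability(theta, phi=0.0, lam=0.0):
    """Probability that a target in |1> is measured as |0> after U3."""
    # U3 acting on |1> yields the second column; its first entry is the |0> amplitude.
    amplitude_zero = u3(theta, phi, lam)[0][1]
    return abs(amplitude_zero) ** 2


p_full = flip_probability(math.pi)      # behaves like a CNOT flip
p_half = flip_probability(math.pi / 2)  # flips half the time
p_none = flip_probability(0.0)          # never flips
```

Under this parametrization the flip probability is \(\sin^{2}(\theta/2)\), so \(\theta\) alone sets the measurement statistics of a single target qubit, while \(\phi\) and \(\lambda\) affect only phases.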
The patient's state in the ARTE is defined by 5 biological features: a cytokine (IP10), a PET imaging feature (GLSZM-ZSV), radiation doses (tumor gEUD and lung gEUD), and a genetic feature (cxcr1-Rs2234671). Their descriptions are presented in Table 2. These 5 variables were selected from a multi-objective Bayesian Network study [13], which considered over 297 biological features and identified the best features for predicting the joint LC and RP2 RT outcomes.
The training data analyzed in this study are obtained from the University of Michigan study UMCC 2007.123 (NCI clinical trial NCT01190527) and the validation data analyzed in this study are obtained from the RTOG-0617 study (NCI clinical trial NCT00533949). Both trials were conducted in accordance with relevant guidelines and regulations and informed consent was obtained from all subjects and/or legal guardians. Details on training and validation datasets, and necessary model imputation carried out to accommodate the differences in the datasets are presented in the Supplementary Materials.
Deep Neural Networks (DNN) were applied as transition functions for IP10 and GLSZM-ZSV features. They were trained with a longitudinal (time-series) dataset, with the pre-irradiation patient state and corresponding radiation dose as input features and post-irradiation state as output. For lung and tumor gEUD, we utilized prior knowledge and applied a monotonic relationship for the transition function since we know that gEUD should increase with increasing radiation dose. We assumed that the change in gEUD is proportional to the dose fractionation and tissue radiosensitivity,
$$\frac{g\left( t_{n} \right) - g\left( t_{n - 1} \right)}{t_{n} - t_{n - 1}} \propto d_{n} \left( 1 + \frac{d_{n}}{\alpha / \beta} \right). \tag{1}$$
Here \(g\left( t_{n} \right)\) is the gEUD at time point \(t_{n}\), \(d_{n}\) is the radiation dose fractionation given during the \(n\)th time period, and the \(\alpha / \beta\) ratio is the radiosensitivity parameter, which differs between tissue types. Note that we first applied constrained training [42] to maintain monotonicity with a DNN model; however, the trend of gEUD over time was flatter than anticipated, so we opted for a process-driven approach in the final implementation. The technical details on the NNs and their training are presented in the Supplementary Material.
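A discrete update consistent with Eq. (1) can be sketched as follows. The proportionality constant `k`, the time step `dt`, and the numeric values are hypothetical, since the text gives only the proportionality and not the constant.

```python
def geud_step(g_prev, dose_per_fraction, alpha_beta, dt=1.0, k=1.0):
    """Advance gEUD by one time period following the proportionality in Eq. (1).

    g_prev            -- gEUD at the previous time point
    dose_per_fraction -- dose fractionation d_n for this period
    alpha_beta        -- tissue-specific alpha/beta ratio
    dt, k             -- assumed time step and proportionality constant
    """
    return g_prev + k * dt * dose_per_fraction * (1.0 + dose_per_fraction / alpha_beta)


# Illustrative numbers only: the same starting gEUD with two dose levels.
g_low = geud_step(10.0, 2.0, alpha_beta=10.0)
g_high = geud_step(10.0, 3.0, alpha_beta=10.0)
```

Because the increment is positive and grows with `dose_per_fraction` for positive `k` and `alpha_beta`, the update is monotonic in dose by construction, which is the property the process-driven approach was chosen to guarantee.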
DNN classifiers were applied as the RT outcome estimators for the LC and RP2 treatment outcomes. They were trained with post-irradiation patient states as input and binary LC and RP2 outcomes as labels.
The RT outcome estimator must also satisfy a monotone condition: increasing radiation dose must increase both the probability of local control and the probability of radiation-induced pneumonitis. To maintain this monotonic relationship, we used a generic logistic function,
$$p_{LC|RP2} = \frac{1}{1 + \exp \left( \frac{g\left( t_{6} \right) - \mu}{T} \right)}, \tag{2}$$
where \(g\left( t_{6} \right)\) is the gEUD at week 6, and \(\mu\) and \(T\) are two patient-specific parameters that are learned by training the DNN. Here, \(\mu\) and \(T\) are the outputs of two neural networks that are fed into the logistic function and tuned one after the other, with the other held fixed. The training details are presented in the Supplementary Materials.
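Eq. (2) can be evaluated directly once \(\mu\) and \(T\) are available. In the sketch below they are passed as plain numbers standing in for the outputs of the two neural networks; the values are hypothetical.

```python
import math


def outcome_probability(geud_week6, mu, temperature):
    """Logistic outcome estimate of Eq. (2) for given mu and T."""
    return 1.0 / (1.0 + math.exp((geud_week6 - mu) / temperature))


# Hypothetical parameter values; a negative T makes p rise with gEUD.
p_mid = outcome_probability(60.0, mu=60.0, temperature=-5.0)
p_low = outcome_probability(50.0, mu=60.0, temperature=-5.0)
p_high = outcome_probability(70.0, mu=60.0, temperature=-5.0)
```

With the sign convention as written in Eq. (2), the probability increases with gEUD when \(T\) is negative and decreases when \(T\) is positive; \(\mu\) is the gEUD at which the probability equals 0.5.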
The task of the agent is to determine the optimal dose that maximizes \(p_{LC}\) while minimizing \(p_{RP2}\). Accordingly, we built a reward function on the base function \(P^{+} = P_{LC} \left( 1 - P_{RP2} \right)\), as shown in Fig. 6. The algebraic form is as follows,
$$R = \begin{cases} P^{+} + 10 & \text{if } 70\% < p_{LC} < 100\% \text{ and } 0\% < p_{RP2} < 17.2\% \\ P^{+} + 5 & \text{if } 50\% < p_{LC} < 70\% \text{ and } 17.2\% < p_{RP2} < 50\% \\ P^{+} - 1 & \text{if } 0\% < p_{LC} < 50\% \text{ and } 50\% < p_{RP2} < 100\% \end{cases} \tag{3}$$
Reward function for reinforcement learning. Contour plot of the reward function as a function of the probability of local control (\(P_{LC}\)) and the probability of radiation-induced pneumonitis of grade 2 or higher (\(P_{RP2}\)). The area enclosed by the blue line corresponds to the clinically desirable outcome, i.e., \(P_{LC} > 70\%\) and \(P_{RP2} < 17.2\%\). Similarly, the area enclosed by the green lines corresponds to the computationally desirable outcome, i.e., \(P_{LC} > 50\%\) and \(P_{RP2} < 50\%\). Along with \(P_{LC} \times (1 - P_{RP2})\), the AI agent receives a +10 reward for achieving the clinically desirable outcome, +5 for achieving the computationally desirable outcome, and -1 when unable to achieve a desirable outcome.
Here the AI agent receives an additional 10 points for achieving the clinically desirable outcome (i.e., \(p_{LC} > 70\%\) and \(p_{RP2} < 17.2\%\)), 5 points for achieving the computationally desirable outcome (i.e., \(p_{LC} > 50\%\) and \(p_{RP2} < 50\%\)), and -1 point for failing to achieve a desirable outcome altogether. The negative point motivates the AI agent to search for the optimal dose as soon as possible.
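The reward can be sketched as a small function. This sketch follows the nested-region description in the paragraph above (clinically desirable first, then computationally desirable, otherwise the penalty), which is an interpretive assumption: the piecewise form in Eq. (3) lists three disjoint ranges and leaves some combinations of \(p_{LC}\) and \(p_{RP2}\) unspecified.

```python
def reward(p_lc, p_rp2):
    """Reward built on P+ = p_LC * (1 - p_RP2) with region bonuses.

    Probabilities are given as fractions in [0, 1].
    """
    base = p_lc * (1.0 - p_rp2)
    if p_lc > 0.70 and p_rp2 < 0.172:
        return base + 10.0  # clinically desirable outcome
    if p_lc > 0.50 and p_rp2 < 0.50:
        return base + 5.0   # computationally desirable outcome
    return base - 1.0       # no desirable outcome reached


r_clinical = reward(0.80, 0.10)
r_computational = reward(0.60, 0.30)
r_undesirable = reward(0.40, 0.60)
```

Checking the stricter clinical region first means a point that satisfies both conditions receives the larger bonus.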
To compensate for the low number of data points, we employed WGAN-GP [43], which learns the underlying data distribution and generates additional data points. We generated 4000 additional data points for training the qDRL models. A larger training dataset helps the reinforcement learning algorithm represent the state space accurately. The training details are presented in the Supplementary Material.