Archive for the ‘Quantum Computer’ Category

MIT’s Superconducting Qubit Breakthrough Boosts Quantum Performance – Tom’s Hardware

Science (like us) isn't always sure where the best possible future lies, and computing is no exception. Whether in classical semiconductor systems or in the forward-looking reality of quantum computing, there are sometimes multiple paths forward (and here's our primer on quantum computing if you want a refresher). Transmon superconducting qubits (such as those used by IBM, Google, and Alice & Bob) have gained traction as one of the most promising qubit types. But new MIT research could open the door to another type of superconducting qubit that is more stable and could enable more complex computation circuits: the fluxonium qubit.

Qubits are the quantum computing equivalent of transistors - get increasing numbers of them together, and you get increased computing performance (in theory). But while transistors are deterministic and can only represent a binary system (think of the result being either side of a coin, mapped to either 0 or 1), qubits are probabilistic and can represent the different positions of the coin while it's spinning in the air. This lets you explore a bigger space of possible solutions than can easily be represented in a binary language (which is why quantum computing can offer much faster processing of certain problems).
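That coin-in-the-air picture can be made concrete with a two-component state vector. The sketch below uses plain NumPy (not a quantum SDK) to show an equal superposition and the measurement probabilities it implies:

```python
import numpy as np

# A qubit state is a normalized 2-component vector of amplitudes
# over the basis states |0> and |1>.
ket0 = np.array([1.0, 0.0])   # the coin landed "heads": bit 0
ket1 = np.array([0.0, 1.0])   # the coin landed "tails": bit 1

# Equal superposition: the "spinning coin" - both outcomes latent at once.
psi = (ket0 + ket1) / np.sqrt(2)

# On measurement the state collapses; each outcome's probability is the
# squared magnitude of its amplitude.
probs = np.abs(psi) ** 2
print(probs)  # [0.5 0.5]
```

Any normalized pair of amplitudes is a valid qubit state, so the space of states is continuous, unlike a bit's two discrete values.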

One current limitation of quantum computing is the accuracy of the computed results - if you're looking for, say, new healthcare drug designs, it'd be an understatement to say you need the results to be correct, replicable, and demonstrable. But qubits are sensitive to external stressors such as temperature, magnetism, vibrations, and even fundamental particle collisions, all of which can introduce errors into the computation or collapse entangled states entirely. The fact that qubits are much more prone to external interference than transistors is one of the obstacles on the road to quantum advantage; part of the solution lies in improving the accuracy of the computed results.

It's also not just a matter of applying error-correcting codes to low-accuracy results and magically turning them into the correct results we want. IBM's recent breakthrough in this area (which applies to transmon qubits) showed the effects of an error-correction code that predicted the environmental interference within a qubit system. Being able to predict interference means you can account for its effects within the skewed results and compensate for them accordingly - arriving at the desired ground truth.

But for error-correction codes to be applicable at all, the system has to have already passed a "fidelity threshold" - a minimum operating accuracy at which error-correcting codes become sufficient to extract predictably useful, accurate results from a quantum computer.

Some qubit architectures - such as fluxonium qubits, on which this research is based - possess higher base stability against external interference. This lets them stay coherent for longer periods of time; coherence time measures how long a qubit system can be effectively used before its information is lost. Researchers are interested in fluxonium qubits because they have already demonstrated coherence times of more than a millisecond - around ten times longer than what can be achieved with transmon superconducting qubits.

The novel qubit architecture enables operations between fluxonium qubits to be performed with high accuracy. Within it, the research team demonstrated fluxonium-based two-qubit gates running at 99.9% accuracy and single-qubit gates at a record 99.99% accuracy. The architecture and design were published under the title "High-Fidelity, Frequency-Flexible Two-Qubit Fluxonium Gates with a Transmon Coupler" in Physical Review X.

You could think of fluxonium qubits as an alternative qubit architecture with its own strengths and weaknesses, not as an evolution of the quantum computing that came before. Transmon qubits are made of a single Josephson junction shunted by a large capacitor, while fluxonium qubits are made of a small Josephson junction in series with an array of larger junctions or a high-kinetic-inductance material. This is partly why fluxonium qubits are harder to scale: they require more sophisticated coupling schemes between qubits, sometimes even using transmon qubits for this purpose. The fluxonium architecture described in the paper does just that, in what's called a Fluxonium-Transmon-Fluxonium (FTF) architecture.

Transmon qubits such as the ones used by IBM and Google are relatively easier to assemble into bigger qubit arrays (IBM's Osprey is already at 433 qubits) and have faster operation times, performing fast and simple gate operations mediated by microwave pulses. Fluxonium qubits, for their part, offer slower but more accurate gate operations through shaped pulses than a transmon-only approach would enable.

There's no promise of an easy road to quantum advantage through any qubit architecture; that's why so many companies are pursuing their differing approaches. In this scenario, it may be useful to think of this Noisy Intermediate-Scale Quantum (NISQ) era as the age where multiple quantum architectures flourish. From topological superconductors (as per Microsoft) through diamond vacancies, superconducting transmons (IBM, Google, others), ion traps, and a myriad of other approaches, this is the age where we will settle into certain patterns within quantum computing. All architectures may flourish, but it's perhaps more likely that only some will - which also explains why states and corporations aren't pursuing a single qubit architecture as their main focus.

The numerous, apparently viable approaches to quantum computing we're witnessing put us at a branching path much like the one binary computing faced before x86 gained dominance as its premier architecture. It remains to be seen whether the quantum computing field will readily (and peacefully) agree on a particular technology, and what a heterogeneous quantum future will look like.

See the original post:
MIT's Superconducting Qubit Breakthrough Boosts Quantum Performance - Tom's Hardware

Quantum computing enters the fluxonium era: Breakthrough sends … – Study Finds

CAMBRIDGE, Mass. - Researchers at MIT have achieved a significant breakthrough in quantum computing, bringing the potential of these incredible thinking machines closer to realization. Quantum computers promise to handle calculations far too complex for current supercomputers, but many hurdles remain. A primary challenge is addressing computational errors faster than they arise.

In a nutshell, quantum computers find better and quicker ways to solve problems. Scientists believe quantum technology could solve extremely complex problems in seconds, while the traditional supercomputers you see today could need months or even years to crack certain codes.

What makes these next generation supercomputers different from your everyday smartphone and laptop is how they process data. Quantum computers harness the properties of quantum physics to store data and perform their functions. While traditional computers use bits (either a 1 or a 0) to encode information on your devices, quantum technology uses qubits.

These qubits can be in a state of 1, 0, or both at once, enabling more complex computations. However, they are highly susceptible to errors.

To reduce these errors, the MIT team developed a new type of superconducting qubit named fluxonium, which has a longer lifespan than the traditional kind. The team crafted a unique architecture involving these fluxonium qubits that can perform operations (known as gates) more accurately. Their design enabled two-qubit gates that exceeded 99.9 percent accuracy and single-qubit gates with 99.99 percent accuracy.

"Building a large-scale quantum computer starts with robust qubits and gates. We showed a highly promising two-qubit system and laid out its many advantages for scaling. Our next step is to increase the number of qubits," says study lead author Leon Ding, PhD '23, who was a physics graduate student in the Engineering Quantum Systems (EQuS) group, in a university release.

To give a comparison: in classical computing, a gate is an operation performed on bits. In quantum computing, a gate is a logical operation on one or two qubits. Achieving higher accuracy in these operations is essential, as errors in quantum systems can multiply quickly, leading to system failures.
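As an illustration (plain NumPy, not tied to the study): a single-qubit gate is a 2x2 matrix applied to the state vector, and per-gate error compounds multiplicatively, which is why the gap between 99% and 99.9% fidelity matters so much at circuit depth.

```python
import numpy as np

# The quantum X gate is the matrix analogue of a classical NOT.
X = np.array([[0.0, 1.0],
              [1.0, 0.0]])

ket0 = np.array([1.0, 0.0])
ket1 = X @ ket0               # applying the gate flips |0> into |1>
print(ket1)                   # [0. 1.]

# Errors multiply quickly: after n gates of fidelity f, roughly f**n
# of the computation survives intact.
print(0.99 ** 1000)           # a 99%-fidelity circuit of depth 1000 is mostly noise
print(0.999 ** 1000)          # at 99.9%, a usable fraction remains
```

The two printed fidelities differ by four orders of magnitude even though the per-gate numbers look close, which is the intuition behind the "threshold" language used below.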

For years, the primary focus in quantum research was on a type of qubit known as the transmon. The newer fluxonium qubits boast a longer working lifespan, which means they can run algorithms for extended periods without losing data. This longer lifespan enabled the MIT team's development of high-accuracy gates.

Dr. Ding explained that their novel architecture connects two fluxonium qubits using a system that prevents unwanted background noise, which can introduce errors. This system has shown promise in keeping background interactions to a minimum.

"The longer a qubit lives, the higher fidelity the operations it tends to promote," says Dr. Ding. "These two numbers are tied together. But it has been unclear, even when fluxonium qubits themselves perform quite well, if you can perform good gates on them."

Drawing an analogy, senior researcher William Oliver likened working with low-quality qubits to trying to perform a task with a room full of kindergartners.

"That's a lot of chaos, and adding more kindergartners won't make it better," notes Oliver. "However, several mature graduate students working together leads to performance that exceeds any one of the individuals - that's the threshold concept. While there is still much to do to build an extensible quantum computer, it starts with having high-quality quantum operations that are well above threshold."

Following these positive results, a group from MIT has founded Atlantic Quantum, a startup aiming to use fluxonium qubits to construct a practical quantum computer for commercial use.

"These results are immediately applicable and could change the state of the entire field," says Dr. Bharath Kannan, CEO of Atlantic Quantum. "This shows the community that there is an alternate path forward. We strongly believe that this architecture, or something like this using fluxonium qubits, shows great promise in terms of actually building a useful, fault-tolerant quantum computer."

Experts in the field, such as Chunqing Deng from Alibaba's global research institution, have hailed the MIT team's work as a pivotal milestone.

"This work pioneers a new architecture for coupling two fluxonium qubits. The achieved gate fidelities are not only the best on record for fluxonium, but also on par with those of transmons, the currently dominating qubit. More importantly, the architecture also offers a high degree of flexibility in parameter selection, a feature essential for scaling up to a multi-qubit fluxonium processor," says Deng.

"For those of us who believe that fluxonium is a fundamentally better qubit than the transmon, this work is an exciting and affirming milestone. It will galvanize not just the development of fluxonium processors but also, more generally, that for qubits alternative to transmons."

The study is published in the journal Physical Review X.

Read the original here:
Quantum computing enters the fluxonium era: Breakthrough sends ... - Study Finds

Putting AI on the Fast Track to Sure-Fire Success – InvestorPlace

Editor's note: "Putting AI on the Fast Track to Sure-Fire Success" was previously published in September 2023. It has since been updated to include the most relevant information available.

Artificial intelligence is not just a buzzword - it is a reality that will transform every aspect of our daily lives in the coming years. It will revitalize industries from healthcare to education, from entertainment to cybersecurity, and offer new possibilities currently unheard of.

One possibility comes from an area hardly anyone is talking about right now - a top-secret technology that will fuel the AI Stock Boom.

Before I reveal that technology, you must first understand what makes AI models run. At their core, AI models are like cars. They have an engine - the computer the models run on. And they have fuel - the volume of data the model is trained on.

Obviously, the better the engine in a car and the more fuel it has, the better and farther that car will drive.

It's the same with AI.

The better the engine of an AI model (computing power) and the more fuel it has (data), the better that model will perform.

The top-secret tech I'm referring to is all about radically upgrading the computing power AI models have.

And Bank of America's head of global thematic investing, Haim Israel, has said this technology could create a revolution for humanity bigger than fire, bigger than the wheel.

That's because this tech will essentially drive everything in the emerging Age of AI.

What on Earth am I talking about?

Two words: quantum computing.

I'll start by saying that the underlying physics of this breakthrough - quantum mechanics - is highly complex. It would likely require over 500 pages to fully understand.

But, alas, here's my best job at making a CliffsNotes version in 500 words instead.

For centuries, scientists have developed, tested, and validated the laws of the physical world, known as classical mechanics. These scientifically explain how and why things work, where they come from, so on and so forth.

But in 1897, J.J. Thomson discovered the electron. And he unveiled a new, subatomic world of super-small things that didn't obey the laws of classical mechanics at all. Instead, they obeyed their own set of rules, which have since become known as quantum mechanics.

The rules of quantum mechanics differ from those of classical mechanics in two very weird, almost-magical ways.

First, in classical mechanics, objects are in one place at one time. You are either at the store or at home, not both.

But in quantum mechanics, subatomic particles can theoretically exist in multiple places at once before they're observed. A single subatomic particle can exist at point A and point B at the same time until we observe it. And at that point, it only exists at either point A or point B.

So, the true location of a subatomic particle is some combination of all its possible positions.

This is called quantum superposition.

Second, in classical mechanics, objects can only work with things that are also real. You can't use an imaginary friend to help move the couch. You need a real friend instead.

But in quantum mechanics, all of those probabilistic states of subatomic particles are not independent. They're entangled. That is, if we know something about the probabilistic positioning of one subatomic particle, then we know something about the probabilistic positioning of another subatomic particle - meaning that these already super-complex particles can actually work together to create a super-complex ecosystem.

This is called quantum entanglement.

So in short, subatomic particles can theoretically have multiple probabilistic states at once, and all those probabilistic states can work together - again, all at once - to accomplish their task.

And that, in a nutshell, is the scientific breakthrough that stumped Einstein back in the early 1900s.

It goes against everything classical mechanics had taught us about the world. It goes against common sense. But it's true. It's real. And now, for the first time ever, we are learning how to harness this unique phenomenon to change everything about everything.

This is why the U.S. government is pushing forward on developing a National Quantum Internet in southwest Chicago. It understands that this tech could be more revolutionary than the discovery of fire or the invention of the wheel.

I couldnt agree more.

Mark my words. Everything will change over the next few years because of quantum mechanics - and some investors will make a lot of money.

The study of quantum theory has led to huge advancements over the past century. That's especially true over the past decade. Scientists at leading tech companies have started to figure out how to harness the power of quantum mechanics to make a new generation of super quantum computers. And they're infinitely faster and more powerful than even today's fastest supercomputers.

And in fact, Haim Israel, managing director of research at Bank of America, believes that: By the end of this decade, the amount of calculations that we can make [on a quantum computer] will be more than the atoms in the visible universe.

Again, the physics behind quantum computers is highly complex, but here's my shortened version.

Todays computers are built on top of the laws of classical mechanics. That is, they store information on what are called bits, which can store data binarily as either 1 or 0.

But what if you could turn those classical bits into quantum bits - qubits - that leverage superposition to store both 1 and 0 at once?

Further, what if you could leverage entanglement and have all multi-state qubits work together to solve computationally taxing problems?

Theoretically, you'd create a machine with so much computational power that it would make today's most advanced supercomputers seem ancient.

Thats exactly whats happening today.

Google has built a quantum computer that is about 158 million times faster than the world's fastest supercomputer.

That's not hyperbole. That's a real number.

Imagine the possibilities if we could broadly create a new set of quantum computers that are 158 million times faster than even today's fastest computers.

Imagine what AI could do.

Today, AI is already being used to discover and develop new drugs and automate manual labor tasks like cooking, cleaning, and packaging products. It is already being used to write legal briefs, craft ads, create movie scripts, and more.

And that's with AI built on top of classical computers.

But built upon quantum computers - computers that are 158 million times faster than classical computers - AI will be able to do nearly everything.

The economic opportunities at the convergence of artificial intelligence and quantum computing are truly endless.

Quantum computing is a game-changer that's flying under the radar.

It's not just another breakthrough - it's the seismic shift we've been waiting for, rivaling the impact of the internet and the discovery of fire itself.

We think the top stocks at the convergence of AI and QC have a realistic opportunity to soar 1,000% over the next few years alone.

And at the epicenter of this boom is one stock that stands out from the pack.

It is the unrivaled technical and commercial leader in quantum computing. It could be a true millionaire-maker opportunity.

And today, that stock is trading for less than $15.

Learn all about this front-runner and its top-secret technology before the stock soars to $100-plus.

On the date of publication, Luke Lango did not have (either directly or indirectly) any positions in the securities mentioned in this article.

Excerpt from:
Putting AI on the Fast Track to Sure-Fire Success - InvestorPlace

Researchers Secure Prestigious Federal Grants | News | New York … – New York Institute of Technology

Pictured from left: Weikang Cai, Jerry Cheng, Sophia Domokos, Eve Armstrong, and Yusui Chen

In recent weeks, five research projects led by New York Tech faculty have collectively secured more than $1.6 million in federal funding from the National Science Foundation (NSF) and the National Institutes of Health (NIH).

The funding will support projects spanning physics, computer science, and biomedical science, captained by faculty from the College of Arts and Sciences, College of Osteopathic Medicine (NYITCOM), and College of Engineering and Computing Sciences. Findings from these studies could help to advance quantum computing, lead to new Alzheimer's disease treatments, explain how heavy elements first formed, enable mobile devices to detect cardiovascular disease, and offer insight that could revolutionize magnetic resonance imaging (MRI) and magnetic levitation (maglev) technologies.

The research projects will also engage undergraduate, graduate, and medical students, providing excellent opportunities for them to gain a deeper understanding of the scientific process and mentorship from some of the university's brightest minds.

A research team led by Assistant Professor of Physics Yusui Chen, Ph.D., has secured an NSF grant totaling $650,000¹ for a three-year project that could enhance understanding of quantum physics within real environments - a necessary step to advancing the field of quantum computing.

Many scientists and experts believe that quantum computing could provide the necessary insight to help solve some of society's biggest issues, including climate change and deadly diseases. However, much remains unknown about how these systems operate, and uncovering their full potential first requires an advanced understanding of the physics principles that provide their theoretical framework.

Quantum computers, which are made of information storage units called qubits, are inherently subject to environmental influences. Some multi-qubit systems are influenced by a memory of past interactions with the environment, which affects their future behavior (non-Markovian systems). However, few mathematical tools exist to study these dynamics, and as systems grow larger and more complex, modeling them on classical, binary computers is infeasible.

Chen and his research team, which includes undergraduate and graduate physics, computer science, and engineering students, as well as a researcher from Rutgers University, will establish a comprehensive method to investigate these dynamics while improving the accuracy of existing quantum simulation algorithms. Their insights could deepen understanding of the fundamental physics by which quantum computers operate.

The project also includes efforts to build a pipeline of diverse talent and researchers, a critical factor in helping to advance the field of quantum information science engineering (QISE). As such, Chen will mentor undergraduate New York Tech students, particularly female students and those from traditionally underrepresented backgrounds. He will also conduct outreach to K-12 schools with the aim of introducing STEM concepts and sparking younger students' interest in QISE.

A project led by Assistant Professor of Physics Eve Armstrong, Ph.D., has received a three-year NSF grant totaling $360,000² in support of her continued research into one of science's greatest mysteries: how the universe formed from stardust.

The research will build on Armstrong's earlier NSF-funded project, which received a two-year, $299,998 NSF EAGER grant in 2021.

While the Big Bang created the first and lightest elements (hydrogen and helium), the next and heavier elements (up to iron on the periodic table) formed later inside ancient, massive stars. When these stars exploded, their matter catapulted into space, seeding that space with elements. Eventually, their stardust matter formed the sun and planets, and over billions of years, Earth's matter coalesced into the first life forms. However, the origins of elements heavier than iron, like gold and copper, remain unknown. While they may have formed during a supernova explosion, current computational techniques make it difficult to comprehensively study the physics of these events. In addition, supernovae are rare, occurring about once every 50 years, and the only existing data is from the last explosion in 1987.

Armstrong posits that a weather prediction technique called data assimilation may enhance understanding of these events. The technique relies on limited information to sequentially estimate weather changes over time, which may make it conducive to modeling supernovae conditions. With simulated data, in preparation for the next supernova event, Armstrong and undergraduate New York Tech students will use data assimilation to predict whether the supernova environment could have given rise to some heavy elements. If successful, these forecasts may allow scientists to determine which elements formed from supernova stardust.

Since receiving her EAGER grant in 2021, Armstrong and her students have begun using the technique for the first time with real data from the sun's neutrinos (tiny, nearly massless particles that travel at near-light speeds). This is an important test to assess the technique's performance with real data, which is significantly more challenging than simulation. Their most recent paper, published in the journal Physical Review D, is promising.

Armstrong's NSF-funded project will also support her broader-impacts work on science communication. Since 2021, she has led workshops for young scientists at New York Tech and the American Museum of Natural History, where participants use techniques from standup comedy, storytelling, and improvisation to create original performances. In addition, for the first time, Armstrong is teaching a formal course on improvisation for New York Tech students this semester.

Assistant Professor of Biomedical Sciences Weikang Cai, Ph.D., has received a $306,000 NIH grant³ to lead a two-year research project that will investigate how certain molecules may play a role in the progression of Alzheimer's disease.

Adenosine triphosphate (ATP) is a small molecule within cells that fuels nearly all biochemical and cellular processes in living organisms. Under specific scenarios, both neurons and non-neuronal cells in the brain can release ATP outside of cells. Consequently, ATP can serve as a signaling molecule to communicate with nearby brain cells and regulate their functions. In addition, growing evidence demonstrates that astrocytes, the most abundant non-neuronal cells in the brain, may contribute to the development of Alzheimer's disease.

Using a mouse model, the researchers will assess how ATP released from astrocytes is regulated in Alzheimer's disease and whether eliminating astrocyte-released ATP could alter disease progression. Their findings may lead to the development of new strategies to treat or alleviate Alzheimer's disease and its related symptoms.

Other members of the research team include Biomedical Sciences Instructor Qian Huang, Ph.D., and Senior Research Associate Hiu Ham Lee, M.S., who initially spearheaded the project, as well as NYITCOM students Zoya Ramzan, Lucy Yu, David Shi, Alexandra Abrams, Sky Lee, and Yash Patel, and undergraduate Life Sciences students Addison Li and Priyal Gajera. In addition, several other NYITCOM students contributed to preliminary studies leading up to the current project, including Marisa Wong, Shan Jin, Min Seong Kim, and Matthew Jiang.

In 2021, Cai also received an NIH grant to research how chronic stress inhibits ATP release, thereby reducing dopamine activity and potentially contributing to clinical depression.

Assistant Professor of Computer Science Jerry Cheng, Ph.D., has received an NSF grant totaling $159,979⁴ for a three-year project to establish a data analytics and machine learning (artificial intelligence) framework that could allow at-home mobile devices like smartphones to detect biomarkers for early symptoms of cardiovascular disease.

Mobile devices usually have restrictions in memory, computing power, and battery capacity for complex computations. To address this, Cheng and his research team will develop software deep learning accelerators, which will allow mobile devices to perform AI modeling. They will also develop security measures to mitigate attacks on cloud systems (computationally efficient trusted execution environment), as well as time-dependent models to analyze sensing data, such as respiratory rate, blood pressure, heart rate, etc. Graduate and undergraduate students from the College of Engineering and Computing Sciences will be recruited to assist in the project, which will also focus on promoting female engineering student participation.

Cheng has secured multiple NSF awards since arriving at New York Tech in 2019. In 2021, he received funding for mobile edge research to help ensure that smart device computing advancements do not outpace experiments in the field; in 2020, he received an award to design more efficient and secure deep learning processing machines that can reliably process and interpret extremely large-scale sets of data with little delay.

Associate Professor of Physics Sophia Domokos, Ph.D., has secured an NSF grant totaling $135,000⁵ for a three-year research project to explore the inner workings of matter. Domokos seeks to uncover how tiny elementary particles (quarks and gluons) interact to create new orders, like clumping together to form protons and neutrons in an atom's nucleus.

While scientists have a relatively useful mathematical explanation regarding how these tiny elementary particles behave, these models do not account for particles interacting frequently and forcefully. To address this, Domokos and her research team will use holographic duality, a string theory concept, and a mathematical structure called supersymmetry to categorize and classify the clumps of elementary particles that emerge in strongly interacting systems.

The insights they gain could shed light on the inner workings of protons and neutrons, as well as other strongly coupled systems such as high-Tc superconductors, special materials that could revolutionize key technologies like MRIs and maglev trains.

Domokos, who has recruited undergraduate students to assist in her previous NSF grant-funded research, will continue to do so for this latest study. Students will gain a deeper understanding of theoretical physics, as well as skills like solving differential equations and using scientific computation software, and first-hand experience drafting physics research papers.

¹This project is funded by NSF Award ID No. 2328948 and will be completed in partnership with researcher Hang Liu, Ph.D., of Rutgers University. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF.

²This project is funded by NSF Award ID No. 2310066 and will be completed in partnership with University of Wisconsin-Madison physicist Akif Baha Balantekin, Ph.D. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF.

³This grant was supported by the National Institute on Aging of the National Institutes of Health under Award Number 1R03AG083363. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

⁴This project is funded by NSF Award ID No. 2311598. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF.

⁵This project is funded by NSF Award ID No. 2310305. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF.

View post:
Researchers Secure Prestigious Federal Grants | News | New York ... - New York Institute of Technology

Nonnegative/Binary matrix factorization for image classification … – Nature.com

NBMF⁹ extracts features by decomposing data into basis and coefficient matrices. The dataset was converted into a positive matrix to prepare the input for the NBMF method. If the dataset contains m data points of dimension n, then the input is an \(n \times m\) matrix V. The input matrix is decomposed into a basis matrix W of size \(n \times k\), representing the dataset features, and a coefficient matrix H of size \(k \times m\), representing the combination of features selected to reconstruct the original matrix. Then,

$$\begin{aligned} V \approx WH, \end{aligned}$$

(1)

where W and H are positive and binary matrices, respectively. The number of columns k of W corresponds to the number of features extracted from the data and can be set to any value. To minimize the difference between the left- and right-hand sides of Eq. (1), W and H are updated alternately as

$$ W := \mathop{\text{arg}\,\text{min}}\limits_{X \in \mathbb{R}_{+}^{n \times k}} \parallel V - XH \parallel_{F} + \alpha \parallel X \parallel_{F}, $$

(2)

$$ H := \mathop{\text{arg}\,\text{min}}\limits_{X \in \{0,1\}^{k \times m}} \parallel V - WX \parallel_{F}, $$

(3)

where \(\parallel \cdot \parallel_F\) denotes the Frobenius norm. The components of W and H are initialized randomly. The hyperparameter \(\alpha\) is a positive real value that prevents overfitting and is set to \(\alpha = 1.0 \times 10^{-4}\).
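To make the shapes concrete, here is a minimal NumPy sketch (with hypothetical small dimensions of our own choosing) of the decomposition in Eq. (1) and the Frobenius-norm objective that the alternating updates minimize:

```python
import numpy as np

# Hypothetical small sizes: n-dimensional data, m samples, k features
n, m, k = 6, 4, 3
rng = np.random.default_rng(0)
W = rng.random((n, k))                          # nonnegative basis matrix, (n, k)
H = rng.integers(0, 2, (k, m)).astype(float)    # binary coefficient matrix, (k, m)
V = W @ H                                       # Eq. (1): V ~ WH (exact by construction here)

# The quantity driven down by the alternating updates of Eqs. (2)-(3)
frobenius_error = np.linalg.norm(V - W @ H, ord="fro")
assert frobenius_error < 1e-12
```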

In previous studies, the Projected Gradient Method (PGM) was used for the update in Eq. (2) [16]. The loss function for updating Eq. (2) is defined as

$$\begin{aligned} f_{W}(\varvec{x}) = \parallel \varvec{v} - H^{\text{T}} \varvec{x} \parallel^{2} + \alpha \parallel \varvec{x} \parallel^{2}, \end{aligned}$$

(4)

where \(\varvec{x}^{\text{T}}\) and \(\varvec{v}^{\text{T}}\) are the row vectors of W and V, respectively. The gradient of Eq. (4) is expressed as

$$\begin{aligned} \nabla f_{W} = -H (\varvec{v} - H^{\text{T}} \varvec{x}) + \alpha \varvec{x}. \end{aligned}$$

(5)

The PGM minimizes the loss function in Eq. (4) by updating \(\varvec{x}\):

$$\begin{aligned} \varvec{x}^{t+1} = P\left[\varvec{x}^t - \gamma_t \nabla f_W (\varvec{x}^t)\right], \end{aligned}$$

(6)

where \(\gamma_t\) is the learning rate and

$$\begin{aligned} P[x_i] = \left\{ \begin{array}{ll} 0 & (x_i \le 0), \\ x_i & (0 < x_i < x_{\mathrm{max}}), \\ x_{\mathrm{max}} & (x_{\mathrm{max}} \le x_i), \end{array}\right. \end{aligned}$$

(7)

where \(x_{\mathrm{max}}\) is the upper bound and is set to \(x_{\mathrm{max}}=1\). Eq. (7) is a projection that keeps the components of \(\varvec{x}\) within the range \([0, x_{\mathrm{max}}]\).
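As a sketch of how the PGM update of Eqs. (5)-(7) might look in code (a minimal NumPy illustration, not the authors' implementation; the function and parameter names are our own):

```python
import numpy as np

def project(x, x_max=1.0):
    # Eq. (7): clip each component to the interval [0, x_max]
    return np.clip(x, 0.0, x_max)

def pgm_update_row(v, H, x, alpha=1e-4, gamma=0.01, steps=100):
    """Update one row vector x of W by projected gradient descent.

    v : (m,) corresponding row of V
    H : (k, m) binary coefficient matrix
    x : (k,) current row of W
    """
    for _ in range(steps):
        # Eq. (5): gradient of the regularized least-squares loss of Eq. (4)
        grad = -H @ (v - H.T @ x) + alpha * x
        # Eq. (6): gradient step followed by the projection of Eq. (7)
        x = project(x - gamma * grad)
    return x
```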

However, because H is a binary matrix, Eq. (3) can be regarded as a combinatorial optimization problem that can be minimized using an annealing method. To solve Eq. (3) on a D-Wave machine, a quantum annealing computer, we formulate the loss function as a quadratic unconstrained binary optimization (QUBO) model:

$$\begin{aligned} f_{H}(\varvec{q}) = \sum_i \sum_r W_{ri}\left(W_{ri} - 2 v_{r}\right) q_i + 2 \sum_{i < j} \sum_r W_{ri} W_{rj} q_i q_j, \end{aligned}$$

(8)

where \(\varvec{q}\) and \(\varvec{v}\) are the column vectors of H and V, respectively.
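The QUBO coefficients of Eq. (8) can be assembled into a matrix Q such that \(f_H(\varvec{q}) = \varvec{q}^{\text{T}} Q \varvec{q}\) for binary \(\varvec{q}\). The following sketch builds Q and, standing in for the D-Wave annealer, minimizes it by brute-force enumeration (practical only for small k; the function names are our own):

```python
import itertools
import numpy as np

def build_qubo(W, v):
    """Build the QUBO matrix Q for Eq. (8): f_H(q) = q^T Q q for binary q.

    W : (n, k) nonnegative basis matrix
    v : (n,) one column of V
    Minimizing f_H over binary q is equivalent, up to the constant ||v||^2,
    to minimizing ||v - Wq||^2.
    """
    Q = W.T @ W                                            # off-diagonal: sum_r W_ri W_rj
    np.fill_diagonal(Q, np.sum(W * (W - 2 * v[:, None]), axis=0))  # sum_r W_ri (W_ri - 2 v_r)
    return Q

def solve_brute_force(Q):
    # Stand-in for the annealer: enumerate all binary vectors (small k only)
    k = Q.shape[0]
    best_q, best_e = None, np.inf
    for bits in itertools.product([0, 1], repeat=k):
        q = np.array(bits, dtype=float)
        e = q @ Q @ q
        if e < best_e:
            best_q, best_e = q, e
    return best_q
```

Because the diagonal of Q absorbs the linear terms (using \(q_i^2 = q_i\) for binary variables), \(q^{\text{T}} Q q\) reproduces Eq. (8) exactly.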

After the alternating updates converge, we obtain the W and H that minimize the difference between the left- and right-hand sides of Eq. (1). W consists of representative features extracted from the input data, and H represents, in binary values, the combination of features in W used to reconstruct V. Therefore, V can be approximated as the product of W and H.

Previous studies used NBMF to extract features from facial images [9]. When the number of annealing steps is small, the computation time is shorter than that of a classical combinatorial optimization solver. However, the D-Wave machine has the disadvantage that its computing time increases linearly with the number of annealing runs, whereas the classical solver's computing time does not change significantly. The results were compared with NMF [14]. Unlike in NBMF, the matrix H in NMF is positive rather than binary. While the matrix H produced by NBMF was sparser than that of NMF, the difference between V and WH was approximately 2.17 times larger for NBMF than for NMF. Thus, although NBMF can process data faster than the classical method, it is less accurate than NMF as a machine-learning method. Moreover, because previous studies did not demonstrate tasks beyond data reconstruction, the usefulness of NBMF as a machine-learning model has remained uncertain.

In this study, we propose the application of NBMF to a multiclass classification model. Inspired by the structure of a fully connected neural network (FCNN), we define an image classification model using NBMF. In an FCNN, image data are fed into the network as input, as shown in Fig.1, and the predicted classes are obtained as the output of the network through the hidden layers.

An overview of a fully-connected neural network.

To perform fully connected network learning using NBMF, we interpret the structure shown in Fig. 1 as a single matrix decomposition. When the input and output layers of the FCNN are combined into one input layer, the network becomes a two-layer network with the same structure as NBMF. As the input to the network trained by NBMF, we use a matrix consisting of image data and the corresponding class information. The class information is represented by a one-hot vector multiplied by an arbitrary real number g. The image-data and class-information vectors are concatenated row-wise and transformed into an input matrix V. We use NBMF to decompose V into the basis matrix W and the coefficient matrix H, as shown in Fig. 2. The column vectors of H correspond to the nodes in the hidden layer of the FCNN, and the components of W correspond to the weights of the edges. The number of feature dimensions k in NBMF corresponds to the number of nodes in the hidden layer of the FCNN.

An overview of training by NBMF.
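A minimal sketch of how such an input matrix V might be assembled (our own illustration, not the authors' code; g is the scaling factor described above):

```python
import numpy as np

def build_training_matrix(images, labels, num_classes, g=1.0):
    """Stack image vectors and scaled one-hot class vectors row-wise.

    images : (d, m) array, one flattened image per column
    labels : (m,) integer class labels
    g      : scaling factor applied to the one-hot class rows
    Returns the (d + num_classes, m) input matrix V.
    """
    m = images.shape[1]
    one_hot = np.zeros((num_classes, m))
    one_hot[labels, np.arange(m)] = g     # one-hot column per sample, scaled by g
    return np.vstack([images, one_hot])   # concatenate row-wise into V
```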

To obtain H, we minimize Eq. (8) using an annealing solver, as in the previous study. However, to obtain W by minimizing Eq. (4), we propose using a projected Root Mean Square Propagation (RMSProp) method instead of the PGM used previously. RMSProp is a gradient descent method that adjusts the learning and decay rates to help the solution escape local minima [17]. RMSProp updates the vector \(\varvec{h}\), whose components are denoted by \(h_i\), as

$$\begin{aligned} h^{t+1}_{i} = \beta h^{t}_{i} + (1-\beta) g^{2}_{i}, \end{aligned}$$

(9)

where \(\beta\) is the decay rate and \(\varvec{g} = \nabla f_W\); the vector \(\varvec{x}\) is then updated as

$$\begin{aligned} \varvec{x}^{t+1} = \varvec{x}^{t} - \eta \frac{1}{\sqrt{\varvec{h}^{t} + \epsilon}} \nabla f_{W}, \end{aligned}$$

(10)

where \(\eta\) is the learning rate and \(\epsilon\) is a small value that prevents division by zero. After updating \(\varvec{x}\) using Eq. (10), we apply the projection described in Eq. (7) to ensure that the solution does not exceed the bounds. We call this method projected RMSProp.
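A sketch of the projected RMSProp update, combining Eqs. (9)-(10) with the projection of Eq. (7) (a minimal NumPy illustration with assumed default hyperparameters, not the authors' code):

```python
import numpy as np

def projected_rmsprop(v, H, x, alpha=1e-4, eta=0.01, beta=0.9,
                      eps=1e-8, x_max=1.0, steps=100):
    """Projected RMSProp update for one row x of W.

    v : (m,) corresponding row of V
    H : (k, m) binary coefficient matrix
    x : (k,) current row of W
    """
    h = np.zeros_like(x)
    for _ in range(steps):
        g = -H @ (v - H.T @ x) + alpha * x   # gradient, Eq. (5)
        h = beta * h + (1 - beta) * g ** 2   # Eq. (9): running mean of squared gradients
        x = x - eta * g / np.sqrt(h + eps)   # Eq. (10): adaptively scaled step
        x = np.clip(x, 0.0, x_max)           # Eq. (7): projection onto [0, x_max]
    return x
```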

In Fig. 3, we illustrate the information contained in W. Because the row vectors of W correspond to those of V, W consists of \(W_1\), corresponding to the image data, and \(W_2\), corresponding to the class information. We plot four column vectors selected from a W trained on MNIST handwritten digit images under the conditions \(m = 300\) and \(k = 40\), as shown in Fig. 3. The images in Fig. 3 show the column vectors of \(W_1\). The blue histograms show the frequencies at which the column vectors were selected to reconstruct the training images with each label. The orange bar graphs show the component values of the corresponding column vectors of \(W_2\). For example, the image in Fig. 3a resembles the digit 0. From the histogram next to the image, we see that it is often used to reconstruct training data labeled 0. In the bar graph on the right, the corresponding column vector of \(W_2\) has its largest component at index 0. This indicates that the column vector corresponding to the image captures a feature of the digit 0. Similarly, the image in Fig. 3b has a label of 9. The image in Fig. 3c, however, appears to have curved features. From the histogram and bar graph next to it, this image is often used to represent labels 2 and 3. This result is consistent with the fact that both digits contain a curve, which explains why this column vector of \(W_1\) was used to reconstruct images with labels 2 and 3. The image in Fig. 3d has the shape of a straight line, and the corresponding histogram shows that it is mainly used to express label 1 and is also frequently used for label 6. Because the digit 6 has a straight-line segment, this result is reasonable.

The figure shows four sets of images, (a), (b), (c), and (d), corresponding to column vectors selected from W. Each set contains an image, a histogram, and a bar graph. The image represents a column vector of \(W_1\), and the histogram shows how often the column vector was selected to reconstruct the training images with each label. The orange bar graph plots the component values of the corresponding column vector of \(W_2\).

In our multiclass classification model using NBMF, we use the trained matrices \(W_1\) and \(W_2\) to classify the test data in the workflow shown in Fig. 4.

An overview of testing by NBMF.

First, we decompose the test data matrix \(V_{\text{test}}\) to obtain \(H_{\text{test}}\) using \(W_1\). Here, M denotes the amount of test data, which corresponds to the number of column vectors of \(V_{\text{test}}\). We use Eq. (3) for the decomposition. Each column vector of \(H_{\text{test}}\) represents the features selected from the trained \(W_1\) to approximate the corresponding column vector of \(V_{\text{test}}\). Second, we multiply \(W_2\) by \(H_{\text{test}}\) to obtain \(U_{\text{test}}\), which expresses the predicted class vector corresponding to each column vector of \(V_{\text{test}}\). Finally, we apply the softmax function to the components of \(U_{\text{test}}\) and take the index with the largest component value in each column vector as the predicted class.
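The prediction step, multiplying \(W_2\) by \(H_{\text{test}}\) and applying softmax and argmax, might be sketched as follows (our own illustration, assuming \(H_{\text{test}}\) has already been obtained by minimizing Eq. (3) with the trained \(W_1\) fixed):

```python
import numpy as np

def softmax(u):
    # Numerically stabilized column-wise softmax
    e = np.exp(u - u.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def classify(W2, H_test):
    """Predict a class for each column of H_test.

    W2     : (num_classes, k) trained class-information block of W
    H_test : (k, M) binary coefficients of the test data
    """
    U_test = W2 @ H_test                 # predicted class-score vector per sample
    probs = softmax(U_test)
    return np.argmax(probs, axis=0)      # index of the largest component = predicted class
```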

View original post here:
Nonnegative/Binary matrix factorization for image classification ... - Nature.com