Archive for the ‘Machine Learning’ Category

Machine learning-based risk factor analysis of adverse birth outcomes in very low birth weight infants | Scientific Reports – Nature.com

Participants and variables

Data consisted of 10,423 very low birth weight (VLBW) infants from the Korean Neonatal Network (KNN) database registered between January 2013 and December 2017. The KNN started in April 2013 as a national prospective cohort registry of VLBW infants admitted or transferred to neonatal intensive care units across South Korea (it now covers 74 neonatal intensive care units). It collects the perinatal and neonatal data of VLBW infants based on a standardized operating procedure [37].

Five adverse birth outcomes were considered as binary dependent variables (no, yes): gestational age less than 28 weeks (GA<28), GA less than 26 weeks (GA<26), birth weight less than 1,000 g (BW<1000), BW less than 750 g (BW<750) and small for gestational age (SGA). Thirty-three predictors were included: sex-male (no, yes), birth-year (2013, 2014, 2015, 2016, 2017), birth-month (1, 2, …, 12), birth-season-spring (no, yes), birth-season-summer (no, yes), birth-season-autumn (no, yes), birth-season-winter (no, yes), number of fetuses (1, 2, 3, 4 or more), in vitro fertilization (no, yes), gestational diabetes mellitus (no, yes), overt diabetes mellitus (no, yes), pregnancy-induced hypertension (no, yes), chronic hypertension (no, yes), chorioamnionitis (no, yes), prelabor rupture of membranes (no, yes), prelabor rupture of membranes > 18 h (no, yes), antenatal steroid (no, yes), cesarean section (no, yes), oligohydramnios (no, yes), polyhydramnios (no, yes), maternal age (years), primipara (no, yes), maternal education (elementary, junior high, senior high, college or higher), maternal citizenship (Korea, Vietnam, China, Philippines, Japan, Cambodia, United States, Thailand, Mongolia, Other), paternal education (elementary, junior high, senior high, college or higher), paternal citizenship (Korea, Vietnam, China, Philippines, Japan, Cambodia, United States, Thailand, Mongolia, Other), unmarried (no, yes), congenital infection (no, yes), PM10 year (PM10 for each year), PM10 month (PM10 for each birth-month), temperature average (for each year), temperature min (for each year) and temperature max (for each year). PM10 and temperature data came from the Korea Meteorological Administration (PM10: https://data.kma.go.kr/data/climate/selectDustRltmList.do?pgmNo=68; temperature: https://web.kma.go.kr/weather/climate/past_cal.jsp). The definition of each variable is given in Text S1, supplementary text.
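
To make the outcome definitions above concrete, here is a minimal pandas sketch, for illustration only; the column names ga_weeks, bw_grams and sga are hypothetical placeholders, not the KNN's actual field names.

```python
# Hypothetical illustration of deriving the five binary outcomes; not the authors' code.
import pandas as pd

def add_outcomes(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["GA<28"] = (out["ga_weeks"] < 28).astype(int)      # gestational age < 28 weeks
    out["GA<26"] = (out["ga_weeks"] < 26).astype(int)      # gestational age < 26 weeks
    out["BW<1000"] = (out["bw_grams"] < 1000).astype(int)  # birth weight < 1,000 g
    out["BW<750"] = (out["bw_grams"] < 750).astype(int)    # birth weight < 750 g
    out["SGA"] = out["sga"].astype(int)                    # small for gestational age (recorded directly)
    return out
```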

The artificial neural network, the decision tree, the logistic regression, the naïve Bayes, the random forest and the support vector machine were used for predicting preterm birth [38,39,40,41,42,43]. A decision tree includes three elements, i.e., a test on an independent variable (intermediate node), an outcome of the test (branch) and a value of the dependent variable (terminal node). A naïve Bayes classifier performs classification on the basis of Bayes' theorem. Here, the theorem states that the probability of the dependent variable given certain values of the independent variables can be calculated from the probabilities of the independent variables given a certain value of the dependent variable. A random forest is a collection of many decision trees, which make majority votes on the dependent variable (bootstrap aggregation). Let us take a random forest with 1,000 decision trees as an example, and assume that the original data include 10,000 participants. The training and testing of this random forest then take two steps. Firstly, a new data set with 10,000 participants is created by random sampling with replacement, and a decision tree is built from this new data set. Here, some participants in the original data will be excluded from the new data set; these leftovers are called out-of-bag data. This process is repeated 1,000 times, i.e., 1,000 new data sets are created, 1,000 decision trees are created and 1,000 out-of-bag data sets are created. Secondly, the 1,000 decision trees make predictions on the dependent variable of every participant in the out-of-bag data, their majority vote is taken as the final prediction for this participant, and the out-of-bag error is calculated as the proportion of wrong votes over all participants in the out-of-bag data [38,39].
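
The bootstrap-aggregation and out-of-bag procedure described above is available off the shelf in standard libraries. The sketch below is a Python illustration with scikit-learn (the study itself used R), with a synthetic data set of the same size standing in for the KNN data.

```python
# Illustrative sketch of a 1,000-tree random forest with out-of-bag (OOB) error.
# Synthetic data stands in for the 33 predictors and a binary outcome such as GA<28.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=10_000, n_features=33, random_state=0)

forest = RandomForestClassifier(
    n_estimators=1000,   # 1,000 trees, each grown on a bootstrap resample of the data
    oob_score=True,      # score each tree on the participants it never saw (its OOB data)
    n_jobs=-1,
    random_state=0,
)
forest.fit(X, y)

print("out-of-bag error:", 1.0 - forest.oob_score_)  # proportion of wrong OOB predictions
```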

A support vector machine estimates a group of support vectors that define a separating line, plane or higher-dimensional surface called a hyperplane. The hyperplane separates the data into sub-groups with the greatest possible gap between them. An artificial neural network consists of neurons, information units combined through weights. In general, the artificial neural network includes one input layer, one, two or three intermediate layers and one output layer. Neurons in one layer link to neurons in the next layer through weights (these weights denote the strengths of the linkages between neurons in a layer and their next-layer counterparts). This feedforward operation begins at the input layer, runs through the intermediate layers and ends at the output layer. The process is then followed by learning: the weights are updated according to their contributions to the gap between the actual and predicted final outputs. This backpropagation operation begins at the output layer, runs through the intermediate layers and ends at the input layer. The two processes are repeated until the performance measure reaches a certain limit [38,39]. Data on 10,423 observations with full information were divided into training and validation sets at a 70:30 ratio (7,296 vs. 3,127). Accuracy, the proportion of correct predictions among the 3,127 validation observations, was employed as the standard for validating the models. Random forest variable importance, the contribution of a given variable to the performance (Gini impurity) of the random forest, was used to examine major predictors of adverse birth outcomes in VLBW infants, including PM10. The random split and analysis were repeated 50 times, and their averages were taken for external validation [44,45]. R-Studio 1.3.959 (R-Studio Inc.: Boston, United States) was employed for the analysis between August 1 and September 30, 2021.
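
The validation scheme just described, with 50 random 70:30 splits, accuracy on the held-out observations and Gini-based variable importance, can be sketched as follows. This is an illustrative Python analogue, not the original R analysis; the synthetic X, y and predictor names are placeholders for the KNN data.

```python
# Illustrative Python analogue of the repeated 70:30 split validation described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_423, n_features=33, random_state=0)
feature_names = [f"predictor_{i}" for i in range(X.shape[1])]  # placeholder names

accuracies, importances = [], []
for seed in range(50):                        # 50 random splits and analyses
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.30, random_state=seed)
    rf = RandomForestClassifier(n_estimators=500, random_state=seed, n_jobs=-1)
    rf.fit(X_tr, y_tr)
    accuracies.append(accuracy_score(y_va, rf.predict(X_va)))
    importances.append(rf.feature_importances_)       # Gini-based variable importance

print("mean validation accuracy:", np.mean(accuracies))
mean_imp = np.mean(importances, axis=0)
for name, imp in sorted(zip(feature_names, mean_imp), key=lambda t: -t[1])[:5]:
    print(f"{name}: {imp:.3f}")                        # top predictors on average
```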

The KNN registry was approved by the institutional review board (IRB) at each participating hospital (IRB No. of Korea University Anam Hospital: 2013AN0115). Informed consent was obtained from the parent(s) of each infant registered in the KNN. All methods were carried out in accordance with the IRB-approved protocol and in compliance with relevant guidelines and regulations.

The names of the institutional review boards of the KNN participating hospitals were as follows: the institutional review boards of Gachon University Gil Medical Center, The Catholic University of Korea Bucheon St. Mary's Hospital, The Catholic University of Korea Seoul St. Mary's Hospital, The Catholic University of Korea St. Vincent's Hospital, The Catholic University of Korea Yeouido St. Mary's Hospital, The Catholic University of Korea Uijeongbu St. Mary's Hospital, Gangnam Severance Hospital, Kyung Hee University Hospital at Gangdong, GangNeung Asan Hospital, Kangbuk Samsung Hospital, Kangwon National University Hospital, Konkuk University Medical Center, Konyang University Hospital, Kyungpook National University Hospital, Gyeongsang National University Hospital, Kyung Hee University Medical Center, Keimyung University Dongsan Medical Center, Korea University Guro Hospital, Korea University Ansan Hospital, Korea University Anam Hospital, Kosin University Gospel Hospital, National Health Insurance Service Ilsan Hospital, Daegu Catholic University Medical Center, Dongguk University Ilsan Hospital, Dong-A University Hospital, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Pusan National University Hospital, Busan St. Mary's Hospital, Seoul National University Bundang Hospital, Samsung Medical Center, Samsung Changwon Medical Center, Seoul National University Hospital, Asan Medical Center, Sungae Hospital, Severance Hospital, Soonchunhyang University Hospital Bucheon, Soonchunhyang University Hospital Seoul, Soonchunhyang University Hospital Cheonan, Ajou University Hospital, Pusan National University Children's Hospital, Yeungnam University Hospital, Ulsan University Hospital, Wonkwang University School of Medicine & Hospital, Wonju Severance Christian Hospital, Eulji University Hospital, Eulji General Hospital, Ewha Womans University Medical Center, Inje University Busan Paik Hospital, Inje University Sanggye Paik Hospital, Inje University Ilsan Paik Hospital, Inje University Haeundae Paik Hospital, Inha University Hospital, Chonnam National University Hospital, Chonbuk National University Hospital, Cheil General Hospital & Women's Healthcare Center, Jeju National University Hospital, Chosun University Hospital, Chung-Ang University Hospital, CHA Gangnam Medical Center, CHA University, CHA Bundang Medical Center, CHA University, Chungnam National University Hospital, Chungbuk National University, Kyungpook National University Chilgok Hospital, Kangnam Sacred Heart Hospital, Kangdong Sacred Heart Hospital, Hanyang University Guri Hospital, and Hanyang University Medical Center.

ART-ificial Intelligence: Leveraging the Creative Power of Machine Learning – Little Black Book – LBBonline

Above: Chago's AI self-portrait, generated in Midjourney.

I have learnt to embrace and explore the creative possibilities of computer-generated imagery. It all started with the introduction of Photoshop thirty years ago, and more recently, I became interested in the AI software program, Midjourney, a wonderful tool that allows creatives to explore ideas more efficiently than ever before. The best description for Midjourney that I've found is "an AI-driven tool for the exploration of creative ideas."

If I was talking to somebody who was unfamiliar with AI-generated art, I would show them some examples, as this feels like a great place to start. Midjourney is syntax-driven; users must break down the language and learn the key phrases and special order of the words in order to take full advantage of the program. As well as using syntax, users can upload reference imagery to help bring their idea to life. An art director could upload a photo of Mars and use that as a reference to create new imagery. I think this is a fantastic tool.

I'm a producer, with an extensive background as a production artist, mostly in retouching and leading post-production teams. I also have a background in CGI: I took some postgraduate classes at NYU for a couple of semesters, and I went to college for architecture, so I can draw a little bit, but I'm not going to pretend that I could ever do a CGI project. A lot of art directors and creative directors are in the same boat; they direct and creative-direct a lot of CGI projects, especially on the client side, but don't necessarily know CGI. Programs like Midjourney let people like us dip our toes into the creative waters, by giving us access to an inventive and artistic toolset.

Last week, the Steelworks team was putting together a treatment deck for a possible new project. We had some great ideas to send to the client, but sourcing certain specific references felt like finding a needle in a haystack. If we were looking for a black rose with gold dust powder on the petals, it would be hard to find exactly what we wanted. It's at times like these that a program like Midjourney can boost the creative. By entering similar references into the software and developing a syntax that is as close to what you're looking for as possible, you are given imagery that provides more relevant references for a treatment deck. For this reason, in the future I see us utilising Midjourney more often for these tasks, as it can facilitate the creative ideation for treatments and briefs for clients.

I'm optimistic about Midjourney because, as technology evolves, humans in the creative industries continue to find ways to stay relevant. I was working as a retoucher when Photoshop first came out with the Healing Brush. Prior to that, all retouching was done manually by manipulating and blending pixels. All of a sudden, the introduction of the Healing Brush meant that with one swipe, three hours of work was removed. I remember we were sitting in our post-production studio when someone showed it to us and we thought, "Oh my God, we're gonna be out of a job." Twenty years later, retouching still has relevance, as do the creatives who are valued for their unique skill sets.

I don't do much retouching anymore, but I was on a photo shoot recently and I had to get my hands in the sauce and put comps together for people. There are plenty of new selection tools that have come out in Photoshop in the last three years, and I had no idea about most of them. I discovered that using these tools cut out roughly an hour's worth of work, which was great. As a result, it opened up time for me to talk to clients and be more present at work and at home. It's less time in front of the computer at the end of the day.

While these advancements in technology may seem daunting at first, I try not to think of them as a threat to human creativity, but rather as a tool which grants us more time to immerse ourselves in the activities that boost our creative thinking. Using AI programs like Midjourney helps to speed up the creative process which, in turn, frees up more time to do things like sit outside and enjoy our lunch in the sun, or go to the beach or the park with our kids: things that feed our frontal cortex and inspire us creatively. It took me a long time to be comfortable with taking my nose off the grindstone and relearning how to be inspired creatively.

Machine learning can reduce risk of antibiotic resistance in chickens and humans as well – Innovation Origins

Why we write about this topic:

The rapid increase in poultry production has resulted in extensive and indiscriminate use of antibiotics. This has led to a worrying increase in cases of antimicrobial resistance which could potentially spread to humans.

Scientists have used machine learning to find new ways to identify and pinpoint disease in poultry farms, which will help to reduce the need for antibiotic treatment, lowering the risk of antibiotic resistance transferring to human populations, says the University of Nottingham in a press release.

The study, published in Springer Nature, was led by Dr. Tania Dottorini at the University of Nottingham. The research is part of the FARMWATCH project, a 1.5m partnership between the University and the China National Center for Food Safety Risk Assessment.

With antibiotic resistance now one of the most threatening issues worldwide, effective and rapid diagnostics of bacterial infection in chicken farming can reduce the need for antibiotics, which will reduce the risk of epidemics and antibiotic resistance.

In this project, researchers in Nottingham collected samples from the animals, humans and environment in a Chinese farm and a connected slaughterhouse. This complex big data set has now been analysed for new diagnostics that will predict and detect bacterial infection, the insurgence of antimicrobial resistance (AMR), and transfer to humans. This data will then allow early intervention and treatment, reducing spread and the need for antibiotics.

The study produced three key findings. Firstly, several similar clinically relevant antimicrobial resistance genes (ARGs) and associated mobile genetic elements (elements that allow antibiotic resistance genes to move within genomes and between bacteria) were found in both human and broiler chicken samples.

Secondly, by developing a machine learning-powered approach, the team found the existence of a core chicken gut resistome that is correlated with the AMR circulating in the farms. Finally, using sensing technology and machine learning, the team uncovered that this AMR-related core resistome is itself associated with various external factors such as temperature and humidity.

Dr Dottorini said: "The food production industry represents a major consumer of antibiotics, but the AMR risks within these environments are still not fully understood. It is therefore critical to set out studies and improved methods optimised to these environments where animals and humans may be in close contact. Precision farming, cost-effective DNA sequencing and the increased adoption of machine learning technologies offer the opportunity to develop methods giving a better understanding and quantification of AMR risks in farming environments."

Locus granted patent for machine learning models to ‘accurately predict traffic time’ for last-mile deliveries – Robotics and Automation News

Locus, a logistics software technology company (not to be confused with Locus Robotics), has been awarded a patent for "Machine Learning Models for Predicting Time in Traffic", the second patent awarded to the company in 2022.

Predictability of last-mile deliveries is a logistical challenge that continues to affect businesses across various sectors such as e-commerce, 3PLs, and so on.

Locus' new patent will enable enterprises to achieve better precision in their last-mile deliveries by factoring in traffic patterns, which have historically been considered too dynamic to map.

The patent covers unique technology that analyzes the historical data of traffic and predicts the travel time between origin and destination locations.

It also factors in sub-variables such as day of the week and time of day to provide hyper-accurate estimated travel times for logistics providers to get ahead.
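
The patent text itself is not reproduced here, so the following is only a generic sketch of the idea described above, not Locus's patented method: learn travel times from historical trip records, with day of the week and time of day included as features alongside origin and destination. The column names and input file are hypothetical.

```python
# Generic illustration of traffic-time prediction from historical trips (not Locus's patented model).
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

trips = pd.read_csv("historical_trips.csv", parse_dates=["departure_time"])  # hypothetical data set
trips["day_of_week"] = trips["departure_time"].dt.dayofweek   # sub-variable: day of the week
trips["hour_of_day"] = trips["departure_time"].dt.hour        # sub-variable: time of day

features = ["origin_lat", "origin_lng", "dest_lat", "dest_lng", "day_of_week", "hour_of_day"]
model = HistGradientBoostingRegressor()
model.fit(trips[features], trips["travel_minutes"])           # target: observed travel time

# Estimated travel time for a new origin-destination pair on a Friday at 5 p.m.
query = pd.DataFrame([{"origin_lat": 12.97, "origin_lng": 77.59,
                       "dest_lat": 12.93, "dest_lng": 77.62,
                       "day_of_week": 4, "hour_of_day": 17}])
print("predicted minutes in traffic:", model.predict(query)[0])
```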

Geet Garg, founder and CTO, Locus, says: "To constantly add value to our growing customer base across 30+ geographies, we at Locus are pushing our engineering boundaries each day.

"The latest patent highlights our relentless effort to bring out new levels of innovation in our product suite, enabling enterprises across industries to drive high-precision logistics operations in any geography, reduce costs and achieve business success."

To foster a culture rooted in innovation, Locus rolled out a company-wide program in 2018 that empowers all employees to apply for patents and includes regular training sessions, end-to-end support for IPR filings, and much more.

Through this initiative, Locus has secured four patents in the last four years, with additional filings already in the pipeline that collectively will continue to bring advancements in the company's unique product suite to serve its customers better.

Nishith Rastogi, founder and CEO of Locus, says: "Innovation lives within our DNA and we believe that every employee should have an equal opportunity to contribute to the next best idea.

"Thus, with the goal to cultivate an environment of entrepreneurship and creativity, we introduced this policy to empower and support our team in their commitment to solving the challenges of last-mile logistics."

Deep Learning Could Bring the Concert Experience Home – IEEE Spectrum

Now that recorded sound has become ubiquitous, we hardly think about it. From our smartphones, smart speakers, TVs, radios, disc players, and car sound systems, it's an enduring and enjoyable presence in our lives. In 2017, a survey by the polling firm Nielsen suggested that some 90 percent of the U.S. population listens to music regularly and that, on average, they do so 32 hours per week.

Behind this free-flowing pleasure are enormous industries applying technology to the long-standing goal of reproducing sound with the greatest possible realism. From Edison's phonograph and the horn speakers of the 1880s, successive generations of engineers in pursuit of this ideal invented and exploited countless technologies: triode vacuum tubes, dynamic loudspeakers, magnetic phonograph cartridges, solid-state amplifier circuits in scores of different topologies, electrostatic speakers, optical discs, stereo, and surround sound. And over the past five decades, digital technologies, like audio compression and streaming, have transformed the music industry.

And yet even now, after 150 years of development, the sound we hear from even a high-end audio system falls far short of what we hear when we are physically present at a live music performance. At such an event, we are in a natural sound field and can readily perceive that the sounds of different instruments come from different locations, even when the sound field is criss-crossed with mixed sound from multiple instruments. There's a reason why people pay considerable sums to hear live music: it is more enjoyable and more exciting, and it can generate a bigger emotional impact.

Today, researchers, companies, and entrepreneurs, including ourselves, are closing in at last on recorded audio that truly re-creates a natural sound field. The group includes big companies, such as Apple and Sony, as well as smaller firms, such as Creative. Netflix recently disclosed a partnership with Sennheiser under which the network has begun using a new system, Ambeo 2-Channel Spatial Audio, to heighten the sonic realism of such TV shows as Stranger Things and The Witcher.

There are now at least half a dozen different approaches to producing highly realistic audio. We use the term soundstage to distinguish our work from other audio formats, such as the ones referred to as spatial audio or immersive audio. These can represent sound with more spatial effect than ordinary stereo, but they do not typically include the detailed sound-source location cues that are needed to reproduce a truly convincing sound field.

We believe that soundstage is the future of music recording and reproduction. But before such a sweeping revolution can occur, it will be necessary to overcome an enormous obstacle: that of conveniently and inexpensively converting the countless hours of existing recordings, regardless of whether they're mono, stereo, or multichannel surround sound (5.1, 7.1, and so on). No one knows exactly how many songs have been recorded, but according to the entertainment-metadata concern Gracenote, more than 200 million recorded songs are available now on planet Earth. Given that the average duration of a song is about 3 minutes, this is the equivalent of about 1,100 years of music.

That is a lot of music. Any attempt to popularize a new audio format, no matter how promising, is doomed to fail unless it includes technology that makes it possible for us to listen to all this existing audio with the same ease and convenience with which we now enjoy stereo music: in our homes, at the beach, on a train, or in a car.

We have developed such a technology. Our system, which we call 3D Soundstage, permits music playback in soundstage on smartphones, ordinary or smart speakers, headphones, earphones, laptops, TVs, soundbars, and in vehicles. Not only can it convert mono and stereo recordings to soundstage, it also allows a listener with no special training to reconfigure a sound field according to their own preference, using a graphical user interface. For example, a listener can assign the locations of each instrument and vocal sound source and adjust the volume of each, changing the relative volume of, say, vocals in comparison with the instrumental accompaniment. The system does this by leveraging artificial intelligence (AI), virtual reality, and digital signal processing (more on that shortly).

To re-create convincingly the sound coming from, say, a string quartet in two small speakers, such as the ones available in a pair of headphones, requires a great deal of technical finesse. To understand how this is done, let's start with the way we perceive sound.

When sound travels to your ears, unique characteristics of your head (its physical shape, the shape of your outer and inner ears, even the shape of your nasal cavities) change the audio spectrum of the original sound. Also, there is a very slight difference in the arrival time from a sound source to your two ears. From this spectral change and the time difference, your brain perceives the location of the sound source. The spectral changes and time difference can be modeled mathematically as head-related transfer functions (HRTFs). For each point in three-dimensional space around your head, there is a pair of HRTFs, one for your left ear and the other for the right.

So, given a piece of audio, we can process that audio using a pair of HRTFs, one for the right ear, and one for the left. To re-create the original experience, we would need to take into account the location of the sound sources relative to the microphones that recorded them. If we then played that processed audio back, for example through a pair of headphones, the listener would hear the audio with the original cues, and perceive that the sound is coming from the directions from which it was originally recorded.
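
As a rough illustration of that processing step (an assumption-laden sketch, not the authors' implementation), a mono source can be rendered binaurally by convolving it with the left- and right-ear head-related impulse responses, the time-domain form of the HRTFs, for the desired direction.

```python
# Minimal binaural rendering of one mono source with an HRTF/HRIR pair.
# The impulse responses are assumed to come from a measured HRTF data set;
# here they are placeholders, as is the choice of direction.
import numpy as np
from scipy.signal import fftconvolve

def render_source(mono: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    """Filter a mono signal with the impulse-response pair for one direction."""
    left = fftconvolve(mono, hrir_left, mode="full")
    right = fftconvolve(mono, hrir_right, mode="full")
    return np.stack([left, right])   # shape (2, samples): left-ear and right-ear signals
```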

If we don't have the original location information, we can simply assign locations for the individual sound sources and get essentially the same experience. The listener is unlikely to notice minor shifts in performer placement; indeed, they might prefer their own configuration.

There are many commercial apps that use HRTFs to create spatial sound for listeners using headphones and earphones. One example is Apple's Spatialize Stereo. This technology applies HRTFs to playback audio so you can perceive a spatial sound effect: a deeper sound field that is more realistic than ordinary stereo. Apple also offers a head-tracker version that uses sensors on the iPhone and AirPods to track the relative direction between your head, as indicated by the AirPods in your ears, and your iPhone. It then applies the HRTFs associated with the direction of your iPhone to generate spatial sounds, so you perceive that the sound is coming from your iPhone. This isn't what we would call soundstage audio, because instrument sounds are still mixed together. You can't perceive that, for example, the violin player is to the left of the viola player.

Apple does, however, have a product that attempts to provide soundstage audio: Apple Spatial Audio. It is a significant improvement over ordinary stereo, but it still has a couple of difficulties, in our view. One, it incorporates Dolby Atmos, a surround-sound technology developed by Dolby Laboratories. Spatial Audio applies a set of HRTFs to create spatial audio for headphones and earphones. However, the use of Dolby Atmos means that all existing stereophonic music would have to be remastered for this technology. Remastering the millions of songs already recorded in mono and stereo would be basically impossible. Another problem with Spatial Audio is that it can only support headphones or earphones, not speakers, so it has no benefit for people who tend to listen to music in their homes and cars.

So how does our system achieve realistic soundstage audio? We start by using machine-learning software to separate the audio into multiple isolated tracks, each representing one instrument or singer or one group of instruments or singers. This separation process is called upmixing. A producer or even a listener with no special training can then recombine the multiple tracks to re-create and personalize a desired sound field.

Consider a song featuring a quartet consisting of guitar, bass, drums, and vocals. The listener can decide where to locate the performers and can adjust the volume of each, according to his or her personal preference. Using a touch screen, the listener can virtually arrange the sound-source locations and the listener's position in the sound field, to achieve a pleasing configuration. The graphical user interface displays a shape representing the stage, upon which are overlaid icons indicating the sound sources: vocals, drums, bass, guitars, and so on. There is a head icon at the center, indicating the listener's position. The listener can touch and drag the head icon around to change the sound field according to their own preference.

Moving the head icon closer to the drums makes the sound of the drums more prominent. If the listener moves the head icon onto an icon representing an instrument or a singer, the listener will hear that performer as a solo. The point is that by allowing the listener to reconfigure the sound field, 3D Soundstage adds new dimensions (if you'll pardon the pun) to the enjoyment of music.
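
As a toy illustration of that interaction (our assumption about one plausible mapping, not the app's actual algorithm), the gain applied to each separated track can simply grow as the head icon moves toward that source's icon.

```python
# Toy inverse-distance mapping from stage-icon positions to per-source gains.
import numpy as np

def source_gains(listener_xy, source_xys, min_dist=0.1):
    """Return one gain per source; sources nearer the listener icon get louder."""
    gains = np.array([
        1.0 / max(np.linalg.norm(np.asarray(listener_xy) - np.asarray(xy)), min_dist)
        for xy in source_xys
    ])
    return gains / gains.max()   # normalize so the nearest source has gain 1.0

# Example: the head icon has been dragged next to the drums.
positions = {"vocals": (0.0, 1.0), "drums": (0.1, 0.3), "bass": (-0.5, 0.5)}
print(dict(zip(positions, source_gains((0.1, 0.2), positions.values()))))
```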

The converted soundstage audio can be in two channels, if it is meant to be heard through headphones or an ordinary left- and right-channel system. Or it can be multichannel, if it is destined for playback on a multiple-speaker system. In this latter case, a soundstage audio field can be created by two, four, or more speakers. The number of distinct sound sources in the re-created sound field can even be greater than the number of speakers.

This multichannel approach should not be confused with ordinary 5.1 and 7.1 surround sound. These typically have five or seven separate channels and a speaker for each, plus a subwoofer (the .1). The multiple loudspeakers create a sound field that is more immersive than a standard two-speaker stereo setup, but they still fall short of the realism possible with a true soundstage recording. When played through such a multichannel setup, our 3D Soundstage recordings bypass the 5.1, 7.1, or any other special audio formats, including multitrack audio-compression standards.

A word about these standards. In order to better handle the data for improved surround-sound and immersive-audio applications, new standards have been developed recently. These include the MPEG-H 3D audio standard for immersive spatial audio with Spatial Audio Object Coding (SAOC). These new standards succeed various multichannel audio formats and their corresponding coding algorithms, such as Dolby Digital AC-3 and DTS, which were developed decades ago.

While developing the new standards, the experts had to take into account many different requirements and desired features. People want to interact with the music, for example by altering the relative volumes of different instrument groups. They want to stream different kinds of multimedia, over different kinds of networks, and through different speaker configurations. SAOC was designed with these features in mind, allowing audio files to be efficiently stored and transported, while preserving the possibility for a listener to adjust the mix based on their personal taste.

To do so, however, it depends on a variety of standardized coding techniques. To create the files, SAOC uses an encoder. The inputs to the encoder are data files containing sound tracks; each track is a file representing one or more instruments. The encoder essentially compresses the data files, using standardized techniques. During playback, a decoder in your audio system decodes the files, which are then converted back to the multichannel analog sound signals by digital-to-analog converters.

Our 3D Soundstage technology bypasses this. We use mono or stereo or multichannel audio data files as input. We separate those files or data streams into multiple tracks of isolated sound sources, and then convert those tracks to two-channel or multichannel output, based on the listener's preferred configurations, to drive headphones or multiple loudspeakers. We use AI technology to avoid multitrack rerecording, encoding, and decoding.

In fact, one of the biggest technical challenges we faced in creating the 3D Soundstage system was writing that machine-learning software that separates (or upmixes) a conventional mono, stereo, or multichannel recording into multiple isolated tracks in real time. The software runs on a neural network. We developed this approach for music separation in 2012 and described it in patents that were awarded in 2022 and 2015 (the U.S. patent numbers are 11,240,621 B2 and 9,131,305 B2).

A typical session has two components: training and upmixing. In the training session, a large collection of mixed songs, along with their isolated instrument and vocal tracks, are used as the input and target output, respectively, for the neural network. The training uses machine learning to optimize the neural-network parameters so that the output of the neural network (the collection of individual tracks of isolated instrument and vocal data) matches the target output.

A neural network is very loosely modeled on the brain. It has an input layer of nodes, which represent biological neurons, and then many intermediate layers, called hidden layers. Finally, after the hidden layers there is an output layer, where the final results emerge. In our system, the data fed to the input nodes is the data of a mixed audio track. As this data proceeds through layers of hidden nodes, each node performs computations that produce a sum of weighted values. Then a nonlinear mathematical operation is performed on this sum. This calculation determines whether and how the audio data from that node is passed on to the nodes in the next layer.

There are dozens of these layers. As the audio data goes from layer to layer, the individual instruments are gradually separated from one another. At the end, in the output layer, each separated audio track is output on a node in the output layer.

That's the idea, anyway. While the neural network is being trained, the output may be off the mark. It might not be an isolated instrumental track; it might contain audio elements of two instruments, for example. In that case, the individual weights in the weighting scheme used to determine how the data passes from hidden node to hidden node are tweaked and the training is run again. This iterative training and tweaking goes on until the output matches, more or less perfectly, the target output.
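
To make that training loop concrete, here is a deliberately tiny sketch in PyTorch. It is not the authors' network, whose architecture is not detailed here; it only shows the pattern described above: a network maps frames of the mixture to per-source estimates, the estimates are compared with the isolated target tracks, and the weights are tweaked iteratively to shrink the gap. The spectrogram-frame representation, the shapes, and the random stand-in data are assumptions.

```python
# Toy mask-based source-separation training loop (illustrative only).
import torch
import torch.nn as nn

N_BINS, N_SOURCES = 513, 4   # spectrum bins per frame; e.g. vocals, drums, bass, other

class MaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_BINS, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, N_BINS * N_SOURCES), nn.Sigmoid(),  # masks between 0 and 1
        )

    def forward(self, mix_mag):                        # mix_mag: (batch, N_BINS)
        masks = self.net(mix_mag).view(-1, N_SOURCES, N_BINS)
        return masks * mix_mag.unsqueeze(1)            # estimated per-source magnitudes

model = MaskNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

# Random tensors stand in for (mixture, isolated-track) spectrogram frames.
mix_mag = torch.rand(32, N_BINS)
targets = torch.rand(32, N_SOURCES, N_BINS) * mix_mag.unsqueeze(1)

for step in range(200):                    # iterative training and tweaking
    estimates = model(mix_mag)
    loss = loss_fn(estimates, targets)     # gap between the output and the target output
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```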

As with any training data set for machine learning, the greater the number of available training samples, the more effective the training will ultimately be. In our case, we needed tens of thousands of songs and their separated instrumental tracks for training; thus, the total training music data sets were in the thousands of hours.

After the neural network is trained, the system takes a song with mixed sounds as input and outputs the multiple separated tracks by running it through the network established during training.

After separating a recording into its component tracks, the next step is to remix them into a soundstage recording. This is accomplished by a soundstage signal processor. This soundstage processor performs a complex computational function to generate the output signals that drive the speakers and produce the soundstage audio. The inputs to the generator include the isolated tracks, the physical locations of the speakers, and the desired locations of the listener and sound sources in the re-created sound field. The outputs of the soundstage processor are multitrack signals, one for each channel, to drive the multiple speakers.

The sound field can be in a physical space, if it is generated by speakers, or in a virtual space, if it is generated by headphones or earphones. The function performed within the soundstage processor is based on computational acoustics and psychoacoustics, and it takes into account sound-wave propagation and interference in the desired sound field and the HRTFs for the listener and the desired sound field.

For example, if the listener is going to use earphones, the generator selects a set of HRTFs based on the configuration of desired sound-source locations, then uses the selected HRTFs to filter the isolated sound-source tracks. Finally, the soundstage processor combines all the HRTF outputs to generate the left and right tracks for earphones. If the music is going to be played back on speakers, at least two are needed, but the more speakers, the better the sound field. The number of sound sources in the re-created sound field can be more or less than the number of speakers.
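
A bare-bones sketch of that earphone path, under our own assumptions (a hypothetical hrir_for lookup standing in for a real HRTF database, and no room modeling), looks like this:

```python
# Sum per-source binaural renderings into left/right earphone signals.
import numpy as np
from scipy.signal import fftconvolve

def soundstage_mix(tracks, directions, hrir_for):
    """tracks: equal-length mono arrays (the isolated sources);
    directions: one (azimuth, elevation) per track, chosen by the listener;
    hrir_for: callable returning the (left, right) impulse-response pair for a direction."""
    rendered = []
    for source, direction in zip(tracks, directions):
        hl, hr = hrir_for(direction)       # impulse responses assumed to share one length
        rendered.append(np.stack([fftconvolve(source, hl), fftconvolve(source, hr)]))
    return np.sum(rendered, axis=0)        # shape (2, samples): left and right outputs
```

With loudspeakers instead of earphones, the same structure applies, but the per-source filters would come from computational-acoustics modeling of the speaker layout rather than from a single HRTF pair.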

We released our first soundstage app, for the iPhone, in 2020. It lets listeners configure, listen to, and save soundstage music in real time; the processing causes no discernible time delay. The app, called 3D Musica, converts stereo music from a listener's personal music library, the cloud, or even streaming music to soundstage in real time. (For karaoke, the app can remove vocals, or output any isolated instrument.)

Earlier this year, we opened a Web portal, 3dsoundstage.com, that provides all the features of the 3D Musica app in the cloud plus an application programming interface (API) making the features available to streaming music providers and even to users of any popular Web browser. Anyone can now listen to music in soundstage audio on essentially any device.

We also developed separate versions of the 3D Soundstage software for vehicles and home audio systems and devices to re-create a 3D sound field using two, four, or more speakers. Beyond music playback, we have high hopes for this technology in videoconferencing. Many of us have had the fatiguing experience of attending videoconferences in which we had trouble hearing other participants clearly or being confused about who was speaking. With soundstage, the audio can be configured so that each person is heard coming from a distinct location in a virtual room. Or the location can simply be assigned depending on the person's position in the grid typical of Zoom and other videoconferencing applications. For some, at least, videoconferencing will be less fatiguing and speech will be more intelligible.

Just as audio moved from mono to stereo, and from stereo to surround and spatial audio, it is now starting to move to soundstage. In those earlier eras, audiophiles evaluated a sound system by its fidelity, based on such parameters as bandwidth, harmonic distortion, data resolution, response time, lossless or lossy data compression, and other signal-related factors. Now, soundstage can be added as another dimension to sound fidelity (and, we dare say, the most fundamental one). To human ears, the impact of soundstage, with its spatial cues and gripping immediacy, is much more significant than incremental improvements in fidelity. This extraordinary feature offers capabilities previously beyond the experience of even the most deep-pocketed audiophiles.

Technology has fueled previous revolutions in the audio industry, and it is now launching another one. Artificial intelligence, virtual reality, and digital signal processing are tapping into psychoacoustics to give audio enthusiasts capabilities they've never had. At the same time, these technologies are giving recording companies and artists new tools that will breathe new life into old recordings and open up new avenues for creativity. At last, the century-old goal of convincingly re-creating the sounds of the concert hall has been achieved.
