Archive for the ‘Machine Learning’ Category

What Is the Role of a Machine Learning Engineer? – TechSpective

Machine learning seems to be picking up steam as one of the buzzwords to look out for this decade.

Among the U.S. and Japan-based I.T. professionals surveyed in 2017, three-fourths said they were already using machine learning for cybersecurity. Most were also confident that the cyberattacks on their businesses within the past year had used machine learning. Despite its increasing use, machine learning remains an ambiguous concept to more than half of the respondents.

Regardless, data has become the new black gold in recent years, according to some experts. Entrepreneurs in this data-driven economy rely on information derived from collected data to make more informed decisions. It wouldn't be surprising for a business to invest heavily in software and other solutions built on sophisticated neural networks.

Creating such networks is no easy task. Whether feed-forward or recurrent, a neural network must be capable of learning as it feeds on more data. It also has to learn new things in a period measured in days, if not seconds. By contrast, the human brain takes years for something to become second nature to a person.

Central to this effort is the machine learning engineer. The role has become the most in-demand profession in the U.S., with related job opportunities spiking by 344% in 2019. Here's an in-depth look into the role of a machine learning engineer and the reasons for the job's increase in demand.

To say that a machine learning engineer's job is the same as a computer programmer's is an oversimplification. While a machine learning engineer does write code, the core task is to develop systems that perform tasks without being explicitly programmed to do so.

Computer programming takes rules and data and turns them into solutions. Machine learning, by contrast, takes solutions and data and turns them into rules. Furthermore, conventional programming can produce a general-use calculator, while machine learning can produce one tuned to a specific niche.
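That contrast is easy to see in a toy spam-filtering sketch. The keyword rule, training messages, and labels below are purely illustrative, not taken from the article:

```python
# Traditional programming: rules + data -> answers
SPAM_KEYWORDS = {"winner", "free", "prize"}  # hand-written rule

def rule_based_is_spam(message: str) -> bool:
    return any(word in message.lower() for word in SPAM_KEYWORDS)

# Machine learning: answers (labels) + data -> rules (a learned model)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = ["You are a winner, claim your free prize", "Meeting moved to 3pm",
            "Free prize inside!!!", "Lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (the "solutions")

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(messages, labels)  # the "rules" are inferred from the examples

print(rule_based_is_spam("Claim your free prize now"))  # True, by explicit rule
print(model.predict(["Claim your free prize now"]))     # [1], by learned rule
```

In the first case, a developer wrote the rule; in the second, the rule was induced from labeled examples, which is the distinction the article draws.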

Machine learning engineers work closely with data scientists and software engineers. They create control models using data that are derived from the models defined by data scientists, allowing the machine to understand commands. From there, the software engineer designs the user interface from which the machine will operate.

The final product is software, like cnvrg MLOps, that combines machine learning engineering with best practices from DevOps, software development, and I.T. operations. Organizations tend to spend more on infrastructure development than necessary, when machine learning-ready software can provide a precise estimate of how much they actually need.

Machine learning engineers have a diverse skill set, with some skills overlapping those of data scientists and software engineers. It's common for someone to graduate from college and begin working with some of these skills missing, since they'll pick them up as they move up the career ladder anyway.

The necessary skills for machine learning engineering fall into four broad categories.

As mentioned earlier, the end product of machine learning engineering is software. Still, its applications reach far and wide, well beyond predicting business trends and auto-filling search terms.

For instance, Stanford University's Autonomous Helicopter Program demonstrated the feasibility of teaching an aircraft to fly. Researchers installed a system that uses reinforcement learning on a Yamaha R-50 helicopter. It managed to perform stunts that a human-piloted helicopter would find difficult, if not impossible, continually correcting its course with each pass.

Similar autonomous technology found its way into the driver's seat of Google's self-driving vehicle. Described as being on the bleeding edge of artificial intelligence research, the car learns to drive from human behavior on the road. While the technology won't replace human drivers anytime soon, it shows the possibilities machine learning engineering is turning into reality.

It's safe to say that machine learning engineers fill capability gaps between software engineers and data scientists. When these disciplines work together, they create technologies previously thought impractical or impossible. No doubt they're paving the way to the future.

Go here to see the original:
What Is the Role of a Machine Learning Engineer? - TechSpective

Machine learning helps cancer center with targeted COVID-19 outreach – Healthcare IT News

Regional Cancer Care Associates, based in New Jersey, has more than 20 locations throughout New Jersey, Connecticut, Maryland, Pennsylvania and the Washington area. Staff realized they needed a risk-stratified list of patients for COVID-19 vulnerability that nurses could manage through phone calls and by coordinating services with other providers.

THE PROBLEM

Because of staffing challenges, the list had to identify only the high-risk patients who staff needed to manage first, not the entire population or those patients who could wait a bit longer for nurse outreach.

"Even though we already had an indigenous and independent scoring logic/mechanism for patient risk, this was mainly based on a combination of comorbidities that differentiated it from the usual scoring techniques," explained Lani M. Alison, vice president of quality and value transformation at RCCA.

"Thus," she said, "there was a need to further stratify the risk patients for COVID-19 vulnerability and to establish a patient-centered assessment and outreach."

On another note, staff observed challenges in assigning these patients and a defined patient roster to care coordination executives or support staff, which was hindering a patient-centric outreach approach, Alison added.

PROPOSAL

RCCA turned to artificial intelligence-based health IT vendor HealthEC to help address the challenges.

"HealthEC was able to run their machine learning algorithms to identify the patients at highest risk for COVID-19 and therefore focus our care coordination resources," Alison said. "Algorithms re-stratified these patients and assigned a ranking to each patient with an associated risk score."

The result was a defined patient list that enabled the RCCA team to reach the highest of the high-risk population. The list proved very helpful, and it became an essential part of RCCA's care management documentation platform. It helped focus initial care management calls and increase the effectiveness of the team.

"RCCA also used the list to streamline the COVID-19 huddles and provide this information to practice administrators at each of our sites to help manage patient outreach, mitigate the risk and provide educational information," she said.

MEETING THE CHALLENGE

Data was aggregated from claims, clinical, labs and HIE data sources into the universal data warehouse used by HealthEC. This created a longitudinal, 360-degree view of the patient.

"This single longitudinal view gave us easy access to all the patients' care records and pooled data, including demographics, vitals, diagnosis, etc., from different sources, like the EHR, claim files, CCDAs and ADTs," Alison explained.

"Users were able to have access to patient clinical information without jumping around into different modules. It created a one-stop shop."

HealthEC's Care Connect Pro empowered RCCA staff to stratify high-risk patients (10% of its entire population), not only for COVID-19 risk management, but also for better care management overall, she said.
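HealthEC's actual algorithms are proprietary, but the general pattern described here, merging claims, clinical, and lab feeds on a patient identifier, scoring each patient, and keeping only the top decile for nurse outreach, can be sketched in a few lines of pandas. All column names, weights, and values below are hypothetical:

```python
import pandas as pd

# Hypothetical feeds keyed by a shared patient_id; real field names will differ.
claims = pd.DataFrame({"patient_id": [1, 2, 3, 4], "comorbidity_count": [4, 1, 3, 0]})
clinical = pd.DataFrame({"patient_id": [1, 2, 3, 4], "age": [72, 45, 81, 38]})
labs = pd.DataFrame({"patient_id": [1, 2, 3, 4], "abnormal_lab_flags": [2, 0, 1, 0]})

# Build a single longitudinal view of each patient by joining the sources.
patients = claims.merge(clinical, on="patient_id").merge(labs, on="patient_id")

# Placeholder risk score; a production system would use a trained model,
# not hand-picked weights.
patients["risk_score"] = (
    2.0 * patients["comorbidity_count"]
    + 0.05 * patients["age"]
    + 1.5 * patients["abnormal_lab_flags"]
)

# Keep only the highest-risk decile for immediate outreach.
cutoff = patients["risk_score"].quantile(0.90)
outreach_list = patients[patients["risk_score"] >= cutoff].sort_values(
    "risk_score", ascending=False
)
print(outreach_list[["patient_id", "risk_score"]])
```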

"Care coordinators, nurses and staff used the CCPro tool to document patient outreach, education material and medication management," she said. "Each patient was assigned a dedicated care coordinator to help mitigate the risk of hospitalization."

Along with the aforementioned clinical data, diagnostic information was added for integrated patient care plans with LabCorp data. This ensured a real-time dynamic flow of information that proved crucial for physicians to design a care pathway or to decide the next milestones of a care plan, she added.

Data received from CRISP (the Chesapeake Regional Information System for our Patients, the area's HIE) was also processed and synchronized into the system to ensure real-time availability of admissions and discharge information.

That is all part of phase one: patient identification. Phase two is interventions and outcomes. This phase requires RCCA staff to:

RESULTS

RCCA reports success with three key metrics.

First, billable transitional care management and chronic care management services are now live in some of the practices.

"With targeted patient outreach, patient-specific CCM and TCM, and customized COVID-19 assessments, services were made available to patients after running rigorous risk-stratification protocols to filter out high-risk patients; 10% of the identified entire high-risk population for COVID-19 was validated by the practice by outreach and tele-connections," Alison explained.

Second, improvement in pain and advance care planning measures.

"We had timely interventions to close care gaps," Alison said. "The ACP measure requires patients to report the status of pain within 48 hours. The real-time pain assessments and scores help to close care gaps and ensure the patients are contacted within a specific time interval, 48 hours, to ensure patients' pain was brought to comfortable levels and satisfy the measure compliance."

And third, access to CRISP (Maryland's health information exchange) proved to be a game changer for the provider organization.

"Ease of integration was key," Alison said. "Embedding and onboarding of data from multiple sources, like EHRs, HIEs, claims, CCDAs, etc.,was a big plus to provide caregivers easy access to all types of data in one single place."

ADVICE FOR OTHERS

"Targeted patient outreach using preprocessed and intuitive data sets formed as a result of the summary of various clinical and nonclinical information can help optimize the utilization of staff or resources and thereby ensure better care outcomes and patient satisfaction," Alison advised.

"Inferences from data analytical tools work best in scenarios where data flow is not intermittent but continuous, real-time and unbiased, or deduplicated," she said. "In order to derive definitive insights that can help in decision-making and planning for the organization, the quality and quantity of data inputs is very critical."

Read more from the original source:
Machine learning helps cancer center with targeted COVID-19 outreach - Healthcare IT News

This Biotech Company Combines Single Cell Genomics with Machine Learning (ML) Algorithms To Enable High Resolution Profiling of the Immune System -…

Immunai is a biotech company using machine learning algorithms combined with single-cell genomics to enable high-resolution profiling of the human immune system. Based out of New York, the company was established merely three years ago, but it is growing at a breakneck pace and holds the world's largest dataset of single-cell immunity characteristics. Recently, the startup raised a whopping $60 million in Series A funding, bringing total funding to $80 million. With its machine learning algorithms, Immunai has already improved the performance of existing immunotherapies by improving the analysis of an individual's immunity. It is now ready for a new dawn: with the new funding, Immunai will delve into creating new therapies altogether, drawing on its vast expanse of data and advanced machine learning algorithms.

The human immune system has been a heavily researched topic, and with the onset of the pandemic, the reprogramming of immunity has been under the limelight. To get an in-depth analysis of it, Immunai uses a multiomic approach, which layers analysis across the various types of biological data available. What makes Immunai stand out from the crowd is that it uses and combines the richest data sets. These data sets are procured from the best immunological research organizations across the globe, with machine learning algorithms designed to deliver analytics at an unprecedented pace.

Immunai has two great co-founders: Noam Solomon and Luis Voloch. Both have deep knowledge of computer science as well as artificial intelligence, and their efforts from the beginning were aligned toward applying machine learning technology to the field of immunology. Prior to the funding, the main work at Immunai was the observation of cells; now, the team will both observe cells and perturb them to see the aftermath. The machine learning algorithms used at Immunai allow them to evaluate an approach practically, which makes their model more feasible and influential in the real world.

After successfully understanding the human immune profile, the next step will be to administer new drugs to help fight potential diseases. Think of Google Maps: initially, it takes years just to map the roads. Similarly, Immunai is currently working on mapping the different pathways in the immune system with the help of machine learning. Once that is done, the underdeveloped roads and paths, or those that haven't been built yet, can be given a helping hand. This will eventually lead to a healthier world, better armed to fight any disease, including pandemics like the one we currently face.

One major milestone that any immunotherapy effort needs to achieve is finding the right immunotherapy for the right patient. This is a herculean task given the complex structure of the human immune system, but with the advancement of machine learning models, one can expect this roadblock to be overcome soon.

Source: https://www.immunai.com/

Link:
This Biotech Company Combines Single Cell Genomics with Machine Learning (ML) Algorithms To Enable High Resolution Profiling of the Immune System -...

Immunai Raises $60M to Decode the Immune System with Machine Learning and AI – AlleyWatch

The immune system at its core is a complex system of cells, organs, and tissues. These components work in unison to fight infection in the form of microbes. Developing an understanding of how this intricate system works is critical to ensuring that society as a whole has adequate immune health to combat disease and infection. Immunai has built the largest database for immunology in the world, using machine learning and AI to map the entire immune system at a granular and specific level. This data can be leveraged by the healthcare industry to provide better therapeutics that get to market faster. This understanding will also allow biotech companies and pharmaceutical manufacturers to radically personalize therapeutics in the future. Immunai is initially focused on the oncology market, but the offering is versatile and can be applied to areas like autoimmune disorders and infectious diseases like COVID-19.

AlleyWatch caught up with CEO and Cofounder Noam Solomon to learn more about the impact that Immunai is having on the understanding of the immune system, the company's partnerships, the experience of fundraising during the pandemic, the latest funding round, and much, much more.

Who were your investors and how much did you raise?

This $60M Series A round was led by Schusterman Family Investments, Duquesne Family Office, Catalio Capital Management, and Dexcel Pharma, with additional participation from existing investors Viola Ventures and TLV Partners.

Tell us about the product or service that Immunai offers.

Immunai is on a mission to reprogram the immune system to advance personalized medicine to better detect, diagnose, and treat disease. To do so, Immunai has generated the largest proprietary database for immunology in the world, known as the Annotated Multi-omic Immune Cell Atlas (AMICA). The platform incorporates variables such as clinical lab metadata (e.g., processing wait time) and batch data (e.g., hospital), among others; it then leverages machine learning and artificial intelligence to complete the annotation and characterization of immune cells. Immunai's team of computational biologists and immunologists works with our partners at pharmaceutical companies to figure out the implications of what Immunai has found, whether it's a new therapy, a drug combination, or a diagnostic.
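AMICA itself is proprietary, but the pattern described, combining per-cell expression features with batch metadata and then using machine learning to annotate cell types, can be illustrated with a small scikit-learn sketch. The marker genes, sites, and labels below are invented purely for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data: each row is one cell with a few gene-expression values plus the
# kind of batch metadata mentioned above (hospital, processing wait time).
rng = np.random.default_rng(0)
cells = pd.DataFrame({
    "CD3E": rng.random(200), "CD19": rng.random(200), "NKG7": rng.random(200),
    "hospital": rng.choice(["site_a", "site_b"], 200),
    "wait_hours": rng.integers(1, 24, 200),
})
cell_type = np.where(cells["CD3E"] > 0.5, "T_cell",
                     np.where(cells["CD19"] > 0.5, "B_cell", "other"))

# Encode the categorical batch covariate alongside the numeric features.
features = ColumnTransformer(
    [("batch", OneHotEncoder(), ["hospital"])], remainder="passthrough"
)
annotator = Pipeline([
    ("features", features),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
annotator.fit(cells, cell_type)
print(annotator.predict(cells.head(3)))  # predicted cell-type annotations
```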

What inspired the start of Immunai?

When I met my cofounder Luis, I was a math postdoc at MIT and Luis was working to apply machine learning to biology. Together, we wanted to bring transfer-learning AI methods to what we believe is the biggest problem in society today: disease.

All disease can be traced back to the immune system. But what we realized is that pharmaceutical companies don't have access to any comprehensive, granular insight into how the immune system works, how it responds to the drugs or therapies they're developing, and which patients are most likely to benefit. With our scientific cofounders, Ansu Satpathy (assistant professor at Stanford for cancer immunology), Danny Wells (researcher at the Parker Institute for cancer immunotherapy) and Dan Littman (professor at NYU and HHMI investigator), we realized that with single-cell technologies we would be able to measure and map the immune system with granularity and specificity never available before.

At Immunai, we've combined the brightest minds across single-cell genomics, data science, and engineering to build the largest proprietary database on immunology in the world. We hope our work will lead to a better understanding of how to overcome the key unsolved problems and bottlenecks in immunotherapy discovery and development. We want to enable the development of more effective therapies and combinations for each patient, accelerate the ability to bring these therapies to market, and ultimately provide better options for patients at a faster pace than ever before.

How is Immunai different?

No one is doing exactly what we're doing. Companies have been trying to understand the immune system for years but have been limited by traditional bulk-sequencing technologies, which don't provide nearly enough data. By analyzing gene expression levels, protein markers, TCR and BCR fragments, and other single-cell omics, we've compiled 10,000 times more data for each immune cell than others before us, giving partners a view of the immune system with a full spectrum of color and dimensionality.

Further, our proprietary machine learning and single-cell analysis that we apply to mine AMICA, the world's largest proprietary multiomic immune cell atlas, allow us to understand the immune system at scale with unprecedented granularity and consistency. This provides a solution to the prohibitive batch-effect problem that our competitors have not been able to solve.

What market does Immunai target and how big is it?

Immunai's offering can be applied to multiple disease areas, from cancer to autoimmune disorders to infectious diseases like COVID-19. The company is primarily focusing on the oncology market, which is currently set to surpass $469.5 billion by 2026.

What's your business model?

Immunai partners with biopharmaceutical and biotech companies to answer critical questions like what makes T-cells expand, persist, and penetrate a tumor, which cells are cytotoxic, which cells in a cell therapy drive response, what are the immunological signatures that are more likely to lead to clinical response to different therapies, and more. These partnerships are usually structured as milestone-based collaborations, ranging from prospective clinical trial design and biomarker discovery to earlier target discovery and target validation.

How has COVID-19 impacted your business?

COVID-19 has impacted the way we work and the pace at which we work. We've asked our employees who are not working in the lab to work from home and have implemented strict social distancing protocols within the lab. In the biopharma world, business is bigger than ever before, so we have many new partnerships in a variety of disease areas, including immuno-oncology, autoimmunity, neurodegenerative diseases, and infectious diseases.

What was the funding process like?

Fast but complex. It happened over a few very eventful months, with many important partnerships forged and multiple parties involved in the financing round, which all took place during a worldwide pandemic, of course.

What are the biggest challenges that you faced while raising capital?

The financing round happened as we were closing a few important partnerships, so running both responsibilities as CEO was non-trivial. In the middle of it all, life happened, and we had to deal with family health issues, including the fact that my wife and I had caught COVID, but we were both fine, luckily.

But what I didn't expect from the pandemic was being able to raise $60M without meeting the lead investors face to face. This is something that, frankly, I didn't expect to happen, and definitely didn't expect to happen so fast.

What factors about your business led your investors to write the check?

Our investors have witnessed the accelerated growth of our platform and are aligned with our vision to reprogram immunity. Machine learning crossed with genomics will unlock the mysteries of the immune system and lead to improved therapies. To actually execute on this vision, a world-class team is required, and we've put it together.

What are the milestones you plan to achieve in the next six months?

We're going to use this new financing round to build and improve our platform. With our expansion into functional genomics, we'll be funding collaborations with partners to answer the most pressing questions in immuno-oncology, cell therapy, infectious disease, and autoimmunity, including key biology driving clinical endpoints and target discovery.

We also plan to invest heavily in growth and double our team of 70 by year-end. We currently have a large lab in New York with 50 scientists working on sequencing and tech development. We're looking to add more people to the team to develop new assets and IP.

What advice can you offer companies in New York that do not have a fresh injection of capital in the bank?

Understand the essence of what you're building and bring it to market quickly. Lean Startup is one of the most important business books I've read; it's critical for any business, but particularly for one with a limited runway. What's the most expeditious experiment you can run to see if your customers actually care about your product?

Where do you see the company going now over the near term?

We're transitioning from observational genomics to functional genomics. We're concentrating on two major projects: improving the ability to target new checkpoints and validating targets for cell therapies. Just in the last year, we've been able to identify new mechanisms of resistance with partners in record time. At this pace, we hope the work we'll be able to do in the next couple of years will be groundbreaking and life-saving, but it's too early to say specifically where we'll be.

What's your favorite outdoor dining restaurant in NYC?

Cafe Mogador on St Marks.

Go here to see the original:
Immunai Raises $60M to Decode the Immune System with Machine Learning and AI - AlleyWatch

Carin Meier Using Machine Learning to Combat Major Illness, such as the Coronavirus – InfoQ.com

00:22 Introduction

00:22 Wes Reisz: Worldwide, there have been 96 million cases of the coronavirus, with over 2 million deaths attributed to the disease. In particular, places like the US, India, and Brazil have been some of the hardest-hit areas. In the US alone, 400,000 deaths have been attributed to the disease, roughly the same number of American soldiers that died in World War II. Today, I thought we'd talk about how tech is combating major diseases, such as the coronavirus. While the coronavirus certainly has our attention, it won't be the sole focus of what we talk about today. We'll talk about things like cancer and heart disease. We'll also talk about some of the challenges when working with private health care data and some of the techniques and things that still need to be solved when dealing with this type of data, things like safety and ethics. We'll be talking about ways of using this data in a responsible and effective way.

01:08 Wes Reisz: Hello and welcome to the InfoQ podcast. I'm Wes Reisz, one of the hosts of the podcast. Today's guest is Carin Meier. Carin is a data engineer at Reify Health. Reify Health develops software that accelerates the development of new and lifesaving therapies. Carin is an avid functional developer. She's a committer and PPMC member for Apache MXNet, and you've seen her keynote at places like OSCon, Strangeloop, and most recently at QCon Plus (held towards the end of last year).

01:34 Wes Reisz: The next QCon plus, which is an online version of the QCon you know and love, will be taking place over two weeks between May 17th and 28 of 2021. QCon Plus focuses on emerging software trends and practices from the world's most innovative software shops. All 16 tracks are curated by domain experts to help you focus on the topics that matter the most in software today. Tracks include leading full-cycle engineering teams, modern data pipelines, and continuous delivery: workflows and platforms. You'll learn new ideas. You'll learn new insights from over 80 software practitioners and innovator/early adopter companies, all across software. Spaced over two weeks, just a few hours a day, these expert-level technical talks provide real time interactive sessions, regular sessions, async learning, and additional workshops to help you validate your software roadmap. If you're a senior software engineer, architect or team lead and want to take your technical learning and personal development to a whole new level this year, join us at QCon plus this May 17th to 28th. You can visit qcon.plus for more info.

02:34 Wes Reisz: With that, let's jump in. Carin, thank you for joining us on the podcast.

02:37 Carin Meier: Thank you for having me. I'm excited to be here.

02:40 Wes Reisz: Yeah. I'm excited to work this out. I thought we'd jump right in and start with Apache MXNet. It seems like a good way to bridge right into this topic. By way of an introduction, you're a committer and PPMC member on the project. What is it? What is MXNet?

02:53 Carin Meier: Yeah, so Apache MXNet is a machine learning library, and we're all very familiar with that. The thing that I really enjoy about it is that it's an Apache model. I was able to come there as just an interested party wanting to use this and realizing there was a gap. There were no Clojure bindings as a language for the library. I was able to get involved and commit and contribute that binding so I could bring the Clojure community to it. Then also help cross-pollinate ideas between the functional communities and the regular Python developers. Just generally, I think that the Apache model is a great one for openness across not only different programming languages but across different cultures and different nations in the world. I think it's a great place.

03:47 Wes Reisz: There's a bunch of deep learning libraries, for example, out there. What is Apache MXNet's focus?

03:52 Carin Meier: It's incubating, so it's not fully graduated. I'll put that in there. That's something that you've always got to say until you graduate that you're incubating Apache. The focus is ... It's a full-fledged machine learning library. You can do deep neural networks with it, but it really focuses on being efficient and fast, as opposed to some of the other ones.

04:13 Wes Reisz: On this podcast, I wanted to see how tech in particular ML and AI is affecting and just being involved with the fight against the coronavirus. That was the original premise. What are you seeing in the machine learning space as ways that the disease is being combated?

04:30 Carin Meier: Yeah, everybody knows where we are in the pandemic. It's like a big spotlight has been shown on this area. I think it's to great effect that we've seen great strides with the Google AlphaFold and just many people using machine learning to generate possible solutions that we've come up with with our vaccines, which is fantastic. Also, in all the little supporting ways. There's been machine learning applied to just about every other way that you can accelerate, looking at the results of the trials and results, using machine learning to sift through papers, to find possible correlations between symptoms and bring stuff to the forefront that we couldn't discover in any sort of timely fashion. Then of course, you think about the machine learning and just every other supporting way. Amazon could still ship my things to me, even though the whole supply chain had been disrupted. There's ways that we can definitely point to we have a vaccine now, but just everything that's supporting that and accelerating us. How Zoom was able to scale, how schools were able to be able to move to online learning.

05:47 Carin Meier: All of that has been accelerated in ways that we just can't count by these techniques and our technology today.

05:55 Wes Reisz: Talking about Zoom, I was at a friend's, and they have a son. I think he's six years old, and (I was there for just a few minutes, keeping my social distancing, of course) off at the table, I could see him with eight other six-year-olds on the Zoom meeting. It was just the craziest thing to watch a group of six-year-olds doing a reading exercise or a writing exercise on Zoom. It's just amazing how the whole human experience had to change with this pandemic.

06:21 Carin Meier: Yeah. We're in it. Machine learning is around us so much now that it's like water and air.

06:27 Wes Reisz: Yeah, totally. Are there any specific cases? I know you can't mention very specifics, but are there any specific cases that you can point to, that we can talk about?

06:36 Carin Meier: There is the CORD-19 Open Research Dataset from the Semantic Scholar team at the Allen Institute for AI. They developed it in partnership with the global research community to accelerate finding insights in all the related published papers. That was an interesting one. The one that I'm interested in right now is... We can talk about it later, but it was from Google AI. It's a paper that came out where they're talking about bringing concept-model explanations to Electronic Health Records. Actually, there's been all sorts of ... We'll get into this later about how to make machine learning more trustworthy and reliable, but there's been exciting breakthroughs in that area as well.

07:21 Wes Reisz: I looked at when we were collaborating on some notes, this was one of the ones that was down there, but what is a concept-based model explanation? I went through there and checked a little bit at it, but I guess I didn't quite follow what is exactly meant by concept-based model explanations?

07:36 Carin Meier: I guess they always have abbreviations from this. That's TCAV, and this is out from Google, of course, doing a lot of great research in this area. It's bridging the gap between interpretability. In your traditional models, you'd have this person who has high blood pressure. We could point to all the little factors and then follow them through in a big decision tree. If/then. Here you get the answer at the end. You could really point the way and follow it like a ball through a maze to the end.

08:11 Carin Meier: In these deep learning models, of course, you've just got this huge black box full of billions or trillions of connections. You ask when you get the model out at the end, how could you possibly get to this answer? This approach, as I understand it, has these concepts, like high blood pressure, being an additional concept vector that's added to the input, which then makes it easier to interpret and follow through those decisions. It's an approach to interpretability that vectorizes the concept and blends it in, almost like a symbolic blend, but people would probably argue with that.
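To make that concrete, here is a rough sketch of the TCAV computation: learn a direction in activation space that separates examples of a concept from random examples, then check how often the class score moves in that direction. The activations and gradients below are random stand-ins for a real model's internals, so this only illustrates the mechanics, not Google's implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
dim = 16  # size of the chosen hidden layer (stand-in)

# In practice these would be hidden-layer activations for examples of the
# concept (e.g. "high blood pressure") and for random counterexamples.
concept_acts = rng.normal(1.0, 1.0, size=(50, dim))
random_acts = rng.normal(0.0, 1.0, size=(50, dim))

# 1. Learn a linear boundary between concept and random activations;
#    its normal vector is the concept activation vector (CAV).
X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 50 + [0] * 50)
cav = LogisticRegression().fit(X, y).coef_[0]
cav /= np.linalg.norm(cav)

# 2. For each input of the class of interest, take the directional derivative
#    of the class score along the CAV; the TCAV score is the fraction positive.
grads = rng.normal(0.2, 1.0, size=(200, dim))  # stand-in for real gradients
tcav_score = float(np.mean(grads @ cav > 0))
print(f"TCAV score (sensitivity of the prediction to the concept): {tcav_score:.2f}")
```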

08:55 Wes Reisz: It's using the domain to actually explain the model itself right?

08:59 Carin Meier: Right.

09:01 Wes Reisz: I mentioned in the intro that you work at Reify Health. What are some of the things that you all are doing there?

09:05 Carin Meier: Yeah. Reify Health, we focus on a particular bottleneck to the clinical trial industry. We're all very interested in how fast things get through clinical trials and not only the COVID vaccines but lifesaving cancer therapies for breast cancer, all sorts of horrible diseases. There's potential lifesaving treatments out there. The faster we can get it through clinical trials and understand if they're going to work or not, the better for everybody. Our company works on the particular bottleneck of enrollment. Before you get the trial and try the drugs on the patients, this is actually getting enough people enrolled in the trial.

09:52 Carin Meier: There's a lot of opportunity in speeding up that whole process and making it more effective so you can get the trial actually going. That's where we put all our resources. Right now, our team is building out a data pipeline, which is interesting in itself and the healthcare domain, because you have a lot of data privacy and sensitive information. Then you have of course, different countries involved that have different rules about things. Being able to use that data and route it and protect it and being able to leverage it in an analytical fashion with machine learning ... There's a lot of interesting technical challenges. That's where we are. We're working with accelerating enrollment in this area.

10:39 Wes Reisz: As you were talking a bit about the concept-based model explanations and some of the challenges like regions with data, particularly in cases like things like GDPR, there's a lot of challenges with using data in machine learning--accuracy, safety, ethics, all these kinds of things. I thought we'd shift a bit and talk about some of the challenges that exist in working with this data. Let's start off. You mentioned already explainability. The ability in simple English, rather than just weighted numbers through thousands of ... This Plinko board going through, building a machine learning model, but in a way that you can explain it, maybe not simple English, but the domain of the business, be able to explain how a decision is made. Why is that a problem? Why has that traditionally been a problem with deep learning machine learning?

11:26 Carin Meier: I think it's just the scale. We've got random forests too, that might have this problem as you get to scale as well. I think it's anything where you get beyond somebody being able to sit down and look at a computer programming model or a flow sheet, or however, you want to describe it. Being not able to fully understand how a computer program got to the answer. Certainly with the deep learning models where you've got everything vectorized, you've got nonlinearity flowing through huge parameters and you get to the end, and it says, hey, that's a cat.

12:06 Wes Reisz: The way that I like to always envision it is back in my software experience, I remember building rules engines. Rules engines, you could retrace the path and be able to say because of this decision, we had the next decision, we had the next decision. Those were great and we built them larger and larger and larger. Then all of a sudden convolutional neural networks came along and we could replace this massive rules engine with all these different, again, Plinko boards on how things bounce through the system with something like the convolutional neural network, which was great. It was a lot less code and it was a lot easier to manage from the rules engine, but how it got to that result was lost. The things like what you talked about with explainability with that concept-based model explanation seemed like a way of addressing that. It's not just a nice to have anymore. It's legally required by things like GDPR in the European Union.

13:00 Carin Meier: There's a great conference that goes on every year called NeurIPS. They just had a really great tutorial on interpretability and on these machine learning models. That's actually free out there. I encourage everybody to go out there, especially if you're using machine learning models and interested in this. They went into ... Basically with simple models ... Like you said, with the rules engine, you can trace it through, but once something gets big enough that you can't, you have to move to a post-hoc explainability. You can't trace it from the beginning. You can only look afterward with a percentage. This is why it did what it did. You can see this, they have some nice tools out there, especially with text-based models. When you have a snippet of text and you ask it a question based on that, like who was the President in year X, then it'll light up the highlighted words of how relevant each word was to the answer that it derives. That's post-hoc explainability.

14:03 Carin Meier: You can look afterward and say, this word doesn't look like it's quite right. Then of course you'd have to go through the whole bother of trying to debug it. That's a whole different thing, if you didn't like the answer that it got. It's interesting. If you have that insight into seeing how the model is working, then you can start to address other balances like accuracy and safety. How accurate do you need it to solve your problem? Maybe a machine learning model isn't even worth it to you, if you don't need to be that accurate that you don't need that trade-off. If you do need that accuracy, how can you safely use it? If you have an explanation, can you insert human into the process and have them double-check the answer? I think going down to the core of this, we have wonderful tool machine learning, but it definitely doesn't replace thinking. Thinking just pushes to a broader picture of how can you incorporate this in this process? Do you need to incorporate this in the process? Do you understand your problem? What is your problem? That's the hard stuff.
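One simple way to get the kind of post-hoc word relevance Carin describes is occlusion: drop each word in turn and measure how much the model's confidence changes. The toy sentiment classifier below is a placeholder, not the tooling from the NeurIPS tutorial, but it shows the pattern:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny stand-in model trained on made-up examples.
texts = ["great movie, loved it", "terrible plot, hated it",
         "loved the acting", "hated the ending"]
labels = [1, 0, 1, 0]  # 1 = positive sentiment
model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

def word_relevance(sentence: str) -> dict:
    """Post-hoc relevance: how much does removing each word change P(positive)?"""
    words = sentence.split()
    base = model.predict_proba([sentence])[0, 1]
    scores = {}
    for i, word in enumerate(words):
        occluded = " ".join(words[:i] + words[i + 1:])
        scores[word] = base - model.predict_proba([occluded])[0, 1]
    return scores

# A positive score means the word pushed the prediction toward "positive".
print(word_relevance("loved the terrible ending"))
```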

15:19 Wes Reisz: I like that bit about human and the AI loop because I think a lot of times people think about AI and machine learning is just making all these decisions. Certainly, they do, but in many cases, it's augmenting a human's ability to, I guess, react on data more appropriately. I can remember talking about Stitch Fix, for example. Stitch Fix, it's not in the healthcare space, but it does clothing recommendations for people. There's still an individual there. They use machine learning extensively to give recommended sets of clothes and patterns of things to a person who then makes that final recommendation to the subscriber and the person. I think that's a really good way of thinking about how machine learning and AI is being used. It helps, it augments the person's ability to get to a set of data where the real decision can be made faster, I think.

16:11 Carin Meier: Exactly. I think the analogy earlier on was, we want machine learning to be like an Iron Man suit.

16:19 Wes Reisz: I like that. I like that. Yeah. Let's talk about the Iron Man suit. What are some of the challenges with creating this Iron Man suit? Things like you mentioned, accuracy, safety, you've already talked about explainability. What are some of the core challenges on being able to leverage machine learning, deep learning in this healthcare space?

16:36 Carin Meier: I think those are the key things that are holding us back: trust, basically. Healthcare and the medical environment is a high-trust environment. Whatever tools we leverage, we need to understand them and be able to trust them because they make a deep impact on people's lives. The amount of trust that you need to pick a sweater for a person is not the amount of trust that you need to decide whether a person should get a life-saving treatment or not. Google and other big companies are tackling this problem. We need to find ways to make sure that privacy at the individual level is being preserved in these models, that they're explainable in some way that we can trust, and that we can find best practices (I don't like to say best practices) for ways that we can incorporate them into our businesses and our models.

17:41 Carin Meier: I'll just expand on that. The reason I don't like to say best practices is because people use that as an excuse not to think. They're just like, I don't need to think about this. The best practice is the way we do it. This is our purpose. Our purpose is to be here and to think about our problems and to think about the trade-offs of every solution, and come to the best possible solution. Just taking an out-of-the-box answer and saying, we can use this, and not thinking about it is doing a disservice to everyone.

18:09 Wes Reisz: Yeah. It leads us into some of the problems we've heard where ML models have gone wrong, for example. That reminds me of a cartoon I remember seeing years ago about design patterns. It was before a developer hears about design patterns, after a developer hears about it, and then after they have more experience leveraging design patterns. The first one, their code's going all over the place. Then the second one everything's a design pattern. Every single design pattern that they could possibly imagine is implemented end to end. Then at the end of it, it's like, here's just some simpler code that may happen to use a pattern. Once you learn about these things... Oh, I have to try to put them everywhere. It's not always the best approach.

18:45 Carin Meier: Right. I think that's led to some of the problems that we've had with machine learning models lately.

18:51 Wes Reisz: Let's talk about safety. In particular, one in the healthcare space that I think seems like a real challenge. I remember a few years back, there was a book by Cathy O'Neil, Weapons of Math Destruction, that talked about systemic bias in data and reinforcing pre-existing inequity with machine learning models. That led to things like removing race, for example, from data sets when decisions are being made. In healthcare, race can be very important. People with certain ethnic backgrounds may be more inclined to certain diseases like heart disease, for example, or high blood pressure or things like that. How do you balance privacy and safety with things like race when it comes to machine learning models, when it may be important to the decision but has been used to reinforce pre-existing inequity? How do you balance?

19:44 Carin Meier: I think the first step is recognizing that there is a problem with this, and then you have to approach it carefully. Luckily now, it's been circulated that there is a problem in datasets. Just because you put it all in a model doesn't mean that the answer is perfectly free of human bias, because we fed it this data. Data in, data out. It doesn't go away just because it's a machine learning model, that's our core truth. Making sure that your data is a good representative set to begin with is your fundamental thing. Of course, with sensitive data like race and ethnicity, you have the additional concern that this is individuals' very sensitive data that you need to protect. This is where differential privacy comes in, if people haven't heard of it. It's a technique with which you can protect an individual person's information while still gathering statistical insights on the whole. You can still get the core learnings that you need, but without compromising the individual's privacy.

20:53 Wes Reisz: The way that I understand differential privacy, and correct me if I'm wrong 'cause I'm sure it's not accurate, but it's like rather than showing someone's individual data, you show it in an aggregated set? That way, the privacy of the individual is respected, but the data is still presented. Is that accurate?

21:08 Carin Meier: It's got more math behind it. I'm not a math expert, but it's a statistical fuzzing method, I guess, is an appropriate way to think about it. There's also ways that you can use this in training the deep learning models in a distributed fashion as well. That way, the machine learning model is trained on that fuzzed data itself. The individual data never actually reaches the final model, which is an important thing as well. I don't want to get too far down into differential privacy, but that's another technique that's used to be able to safely extract insights into race and ethnicity. That is an important component to making sure that it is not biased. Again, then there's another process at the end, evaluating your model. Does your model have any bias in either direction? It's all throughout the process. It's at the beginning, looking at your data coming in, how you actually train the model, how you evaluate the model, and then a circular feedback loop. Let's get humans in it and make sure that it's doing the things that we want it to do in a safe manner.
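For readers new to differential privacy, the simplest flavor adds calibrated noise to an aggregate query so that no single person's record can be inferred from the released number. Here is a minimal sketch with made-up data and an illustrative epsilon; it is not what any particular vendor or the speakers do in production:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative record-level data we want to query without exposing any individual.
has_condition = rng.integers(0, 2, size=1000)  # 0/1 flag per patient

def dp_count(values, epsilon=1.0):
    """Release a count with Laplace noise scaled to sensitivity 1 / epsilon.

    Adding or removing any one person changes the true count by at most 1,
    so Laplace(scale=1/epsilon) noise gives epsilon-differential privacy
    for this single query.
    """
    true_count = int(np.sum(values))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

print("true count:      ", int(has_condition.sum()))
print("privatized count:", round(dp_count(has_condition, epsilon=0.5), 1))
```

Smaller epsilon means more noise and stronger privacy; the aggregate statistic stays useful while individual records are fuzzed, which is the trade-off Carin is describing.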

22:24 Wes Reisz: Tying back to what we were talking about, that human in the AI loop before. I think what I've just heard is it's important for humans to audit the decisions that are coming out to make sure that they make sense. Is that accurate?

22:35 Carin Meier: Yeah. Just like any sort of code, you need to test it to make sure that your code is right. That's at the lower levels: did it actually get the right answer that you wanted? Is the model accurate? Then at the higher level, is it trustworthy? Can you explain how it got this answer? Why did it say that this person should get this treatment? Then how do we make sure that that isn't biased? That's another question. It goes up and up in scope: how do we safely incorporate this into our business practice? What happens if it's wrong? I think that's one of the reasons why so many people are attracted to this area, because the problems are tough and they're important, and they're changing. A lot of people are attracted to computer science in our industry because we like solving problems, and there's a lot of problems to solve in this domain that directly impact everyone.

23:29 Wes Reisz: We seem only to be creating more problems with our society that have to be solved. We talked about privacy, we talked about explainability, we talked about safety, but one we haven't talked about is ethics. Just because we can doesn't mean we should. What are some of the ethical questions that are at the heart of machine learning today, deep learning today?

23:48 Carin Meier: Wow. Yeah.

23:51 Wes Reisz: I don't know how I'd answer that question, so go.

23:55 Carin Meier: I think you ask broad questions, you can get broad answers.

24:00 Wes Reisz: Good response. As soon as I said it, I thought that was an unfair question to ask.

24:04 Carin Meier: The answer is, should we?

24:05 Wes Reisz: It depends. Maybe.

24:09 Carin Meier: That's the thing. There was a good example of it in the news. I think it was in England when the whole pandemic hit and people couldn't take their end exams. I think they just said, let's just put a machine learning model on all your prior test exams, and we'll just predict what you would have gotten on this test.

24:29 Wes Reisz: That's a great idea.

24:32 Carin Meier: It's pretty much the same as what you would have gotten. So what? You can't go to college now.

24:40 Wes Reisz: Yeah. I definitely would not have gone to college under that arrangement.

24:44 Carin Meier: Yes, in that case, you can, and we did, but should we, sort of thing.

24:51 Wes Reisz: Do no harm. I think that's a good answer. That's the best way to end it.

24:55 Carin Meier: I know at various points, people are always like, "Computer science people should be a guild. We should have an ethics statement just like doctors." It's interesting, not even getting into whether we should have a guild, but if we had an ethics statement, like the Hippocratic Oath for doctors, what would it be? What would our oath be?

25:16 Wes Reisz: There's a lot of disciplines. There's a lot of fields involved in machine learning. I think that there's a perception that to be involved with these data pipelines that do the work that you're doing requires a PhD to be able to... Is that true? Does it require a PhD to be able to get involved and contribute in a meaningful way to things like deep learning solutions to the coronavirus?

25:38 Carin Meier: I would say definitely not. PhDs are helpful. If you have a PhD, please come and help us, but it's not required. I think data engineering as a field is one of the fastest-growing fields, just because we need good engineers. We need good engineers to build out our pipelines and to apply engineering practice to building models and maintaining them. That's what the whole of our software industry has trained us to do, and we need it applied to this. Also, we need just curious people generally to innovate. I think you were saying before about design patterns: once you learn about design patterns, everything's design patterns until... I think a lot of that is the case with deep learning and deep learning models right now. We've got one dominant model and one dominant way that we're thinking about intelligence. That's not necessarily the best or the only way. We need more people to come to this with curious minds and bring their backgrounds, whether it's philosophy, whether it's game development, whatever it is, so we, as humanity, can press forward and look at all these different solutions and find the best ones.

26:55 Wes Reisz: Absolutely. I come at it from a web developer background. I'm a Java developer who comes from a web environment. These models still have to run. It still takes someone to be able to take that machine learning model, wrap it into a service and be able to operationalize it into a platform. There's so many roles that are needed to be able to tackle the problems that machine learning can help solve. We're at the very beginning of the year, 2020 is in our rearview mirror, thankfully. What do you hope that we're going to solve in 2021? What do you think are the big things we're set to solve?

27:29 Carin Meier: I'm going to bring it down to the scope of machine learning.

27:32 Wes Reisz: Yeah, there you go. Sorry. Let me qualify that. What are some of the things in the machine learning and deep learning space that you think we're poised to solve in 21?

27:39 Carin Meier: Trust. Building trust in these techniques and in the models so we can use them responsibly and effectively in our healthcare and other areas that we need high trust. That's a big, big gap right now for us.

27:57 Wes Reisz: Carin, thank you so much. I think we've been working on this since the end of last year. Thank you for working on this with me through the holidays and into the New Year. It was fun to sit down and chat finally.

28:07 Carin Meier: Thanks again.

Excerpt from:
Carin Meier Using Machine Learning to Combat Major Illness, such as the Coronavirus - InfoQ.com