Archive for the ‘Machine Learning’ Category

Powerful new Meta AI tool can identify individual items within images – Tech Xplore


by Peter Grad, Tech Xplore

We aim to build a foundation model for segmentation by introducing three interconnected components: a promptable segmentation task, a segmentation model (SAM) that powers data annotation and enables zero-shot transfer to a range of tasks via prompt engineering, and a data engine for collecting SA-1B, our dataset of over 1 billion masks. Credit: arXiv (2023). DOI: 10.48550/arxiv.2304.02643

Meta took a big leap forward this week with the unveiling of a model that can detect and isolate objects in an image even if it never saw them before. The technology is introduced and described in an article on the arXiv pre-print server.

The AI tool represents a major advance in one of technology's tougher challenges: allowing computers to detect and comprehend the elements of a previously unseen image and isolate them for user interaction.

It recalls a concept that Robert O. Work, former chair of the National Security Commission on Artificial Intelligence, once described: "What AI and machine learning allows you to do is find the needle in the haystack."

In this instance, Meta's Segment Anything Model (SAM) hunts for related pixels in an image and identifies the common components that make up all the pieces of the picture.

"SAM has learned a general notion of what objects are, and it can generate masks for any object in any image or any video, even including objects and image types that it had not encountered during training," Meta AI announced in a blog post Wednesday.

The recognition task is called segmentation. We do it daily without a moment's thought. We recognize the items on our office desks: smartphones, cables, a computer screen, a lamp, a melting candy bar, a cup of coffee.

But without prior programming, a computer must strain to distinguish all components down to the last pixel in a two-dimensional image, and it's more complicated when there are overlapping items, shadows or an irregular or partitioned shape.

Prior approaches to segmentation usually required human intervention to define a mask. Earlier automated segmentation permitted detection of objects but, according to Meta AI, that required "thousands or even tens of thousands of examples" of objects along with "computer resources and technical expertise to train the segmentation model."

SAM combines the two approaches in a fully automated system. It was trained on more than 1 billion masks, which allow it to recognize new types of objects.

"This ability to generalize means that, by and large, practitioners will no longer need to collect their own segmentation data and fine-tune a model for their use case," the Meta blog stated.

One reviewer called SAM "Photoshop's 'Magic Wand' tool on steroids."

SAM can be activated by user clicks or text prompts. Meta researchers envision SAM's further utilization in the AR/VR realm. When users focus on an object, it can be delineated, defined and "lifted" into a 3D image and incorporated into a movie, game or presentation.

A free working model is available online. Users can select from an image gallery or upload their own photos. They can then tap anywhere on the screen or draw a rectangle around an item of interest and watch SAM define, for instance, the outline of a nose, face or entire body. Another option directs SAM to identify every object in an image.
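Beyond the demo, Meta has open-sourced the model, so the same interactions can be scripted. Below is a minimal sketch of point-prompted segmentation with the segment-anything Python package; the checkpoint filename, image path, and click coordinates are placeholders, not values from the article.

    import cv2
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    # Load a pretrained SAM checkpoint (downloaded separately from Meta's repository).
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)

    # SAM expects an RGB image; OpenCV loads BGR, so convert.
    image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # A single foreground click at (x, y) is the prompt, like tapping the demo page.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),  # 1 marks a foreground point
        multimask_output=True,       # return several candidate masks
    )
    print("best mask covers", masks[scores.argmax()].sum(), "pixels")

Passing multimask_output=True mirrors the demo's behavior of offering several plausible segmentations (a nose, a face, a whole body) for one ambiguous click.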

Although SAM has not been applied to Facebook yet, similar technology has been applied to familiar processes such as photo tagging, moderation and tagging of disallowed content, and generation of recommended posts on both Facebook and Instagram.


Read more from the original source:
Powerful new Meta AI tool can identify individual items within images - Tech Xplore

Using Machine Learning To Increase Yield And Lower Packaging … – SemiEngineering

Packaging is becoming more and more challenging and costly. Whether the reason is substrate shortages or the increased complexity of packages themselves, outsourced semiconductor assembly and test (OSAT) houses have to spend more money, more time and more resources on assembly and testing. As such, one of the more important challenges facing OSATs today is managing die that pass testing at the fab level but fail during the final package test.

But first, let's take a step back in the process and talk about the front end. A semiconductor fab will produce hundreds of wafers per week, and these wafers are verified by product testing programs. The ones that pass are sent to an OSAT for packaging and final testing. Any units that fail at the final testing stage are discarded, and the money and time spent at the OSAT dicing, packaging and testing the failed units is wasted (figure 1).

Fig. 1: The process from fab to OSAT.

According to one estimate, based on the price of a 5nm wafer for a high-end smartphone, the cost of package assembly and testing is close to 30% of the total chip cost (Table 1). Given this high percentage, it is considerably more cost-effective for an OSAT to receive only wafers that are predicted to pass the final package test. This ensures fewer rejects during the final package testing step, minimized costs, and more product being shipped out. Machine learning could offer manufacturers a way to accomplish this.

Table 1: Estimated breakdown of the cost of a chip for a high-end smartphone.

Using traditional methods, an engineer obtains inline metrology/wafer electrical test results for known good wafers that pass the final package test. The engineer then conducts a correlation analysis using a yield management software statistics package to determine which parameters and factors have the highest correlation to the final test yield. Using these parameters, the engineer then performs a regression fit, and a linear/non-linear model is generated. In addition, the model set forth by the yield management software is validated with new data. However, this is not a hands-off process. A periodic manual review of the model is needed.
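As a rough sketch of that traditional flow, the snippet below ranks parameters by their correlation with final test yield and fits a linear model; the file name and column names are hypothetical stand-ins for real fab data.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # One row per known-good wafer: inline metrology/electrical test parameters
    # plus the final package test yield measured at the OSAT.
    df = pd.read_csv("wafer_history.csv")
    X = df.drop(columns=["final_test_yield"])

    # Correlation analysis: keep the parameters most correlated with yield.
    corr = X.corrwith(df["final_test_yield"]).abs().sort_values(ascending=False)
    top_features = corr.head(10).index.tolist()

    # Regression fit on the selected parameters.
    model = LinearRegression().fit(df[top_features], df["final_test_yield"])
    print("R^2 on training data:", model.score(df[top_features], df["final_test_yield"]))

    # As noted above, this is not hands-off: the model must be re-validated
    # periodically against new production data.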

Machine learning takes a different approach. In contrast to the previously mentioned method, which places greater emphasis on finding the model that best explains the final package test data, an approach utilizing machine learning capabilities emphasizes a model's predictive ability. Given the limited capacity of OSATs, a machine learning model trained with metrology and product testing data at the fab level and final package test data at the OSAT level creates representative results for the final package test.

With the deployment of a machine learning model predicting the final test yield of wafers at the OSAT, bad wafers will be automatically tagged at the fab in a manufacturing execution system and given an assigned wafer grade of last-to-ship (LTS). Fab real-time dispatching will move wafers with the assigned wafer grade to an LTS wafer bank, while wafers that meet the passing criteria of the machine learning model will be shipped to the OSAT, thus ensuring only good parts are sent to the packaging house for dicing and packaging. Moreover, additional production data would be used to validate the machine learning model's predictions, with the end result being increased confidence in the model. A blind test can even examine specific critical parts of a wafer.
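A minimal sketch of that flow might look like the following, under assumed file and column names: a classifier is trained on fab-level data labeled with final package test outcomes, then used to grade new wafers before shipment.

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # Historical wafers with numeric fab metrology/test parameters and a label
    # recording whether the wafer passed the final package test at the OSAT.
    history = pd.read_csv("fab_and_osat_history.csv")
    X = history.drop(columns=["passed_final_test"])
    y = history["passed_final_test"]

    # Hold out data to validate the model's predictions, as described above.
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2)
    clf = GradientBoostingClassifier().fit(X_train, y_train)
    print("validation accuracy:", clf.score(X_valid, y_valid))

    # Grade incoming wafers: predicted failures get the last-to-ship (LTS) grade.
    new_wafers = pd.read_csv("current_lot.csv")
    new_wafers["grade"] = ["SHIP" if ok else "LTS" for ok in clf.predict(new_wafers[X.columns])]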

The machine learning approach also offers several advantages over more traditional approaches. The model is inherently tolerant of out-of-control conditions, trends and patterns are easily identified, the results improve with more data, and, perhaps most significantly, no human intervention is needed.

Unfortunately, there are downsides. A large volume of data is needed for a machine learning model to make accurate predictions, so while more data is always welcome, this approach is not ideal for new products or R&D scenarios where little data exists yet. In addition, this machine learning approach requires significant allocations of time and resources, which means more compute power and more time to process complete datasets.

Furthermore, questions will need to be asked about the quality of the algorithm being used. Perhaps it is not the right model and, as a result, will not deliver the correct results. Or perhaps the reasoning behind the algorithm's predictions is difficult to understand. Simply put: How does the algorithm decide which wafers are, in fact, good and which will be marked last-to-ship? And then there is the matter that incorrect or incomplete data will deliver poor results. Or, as the saying goes, "garbage in, garbage out."

The early detection and prediction of only good products shipping to OSATs has become increasingly critical, in part because the testing of semiconductor parts is the most expensive step in the manufacturing flow. By combining a yield/operations management platform with machine learning so that only parts predicted to be good are tested, OSAT houses can increase capital utilization and return on investment, ensuring cost effectiveness and a continuous supply of finished goods to end customers. This is just one example of the effectiveness of machine learning models; there is much more to learn about how such approaches can increase yield and lower costs for OSATs.

Excerpt from:
Using Machine Learning To Increase Yield And Lower Packaging ... - SemiEngineering

For chatbots and beyond: Improving lives with data starts with … – Virginia Tech Daily

ChatGPT, an AI chatbot launched this fall, allows users to ask for help with things such as writing essays, drafting business plans, generating code, and even composing music. As of Dec. 4, ChatGPT already had over 1 million users.

OpenAI built its auto-generative system on a model called GPT-3, which is trained on billions of tokens. These tokens, used for natural language processing, are similar to words in a paragraph. For comparison's sake, the novel Harry Potter and the Order of the Phoenix has about 250,000 words and 185,000 tokens. Essentially, ChatGPT has been trained on billions of data points, making this kind of intelligent machine possible.
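To make the idea of tokens concrete, here is a small illustration using OpenAI's tiktoken library; the choice of encoding is an assumption, and exact token counts vary by encoding.

    import tiktoken

    # cl100k_base is the encoding used by GPT-3.5/GPT-4-era models.
    enc = tiktoken.get_encoding("cl100k_base")

    tokens = enc.encode("Harry Potter and the Order of the Phoenix")
    print(tokens)              # a list of integer token IDs
    print(len(tokens))         # the token count the article's comparison refers to
    print(enc.decode(tokens))  # decoding round-trips back to the original text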

Jia noted the importance of data quality and how it can impact machine learning results.

"If you have bad data feeding into machine learning, you will get bad results," said Jia. "We call that 'garbage in, garbage out.' We want to get an understanding, especially a quantitative understanding, of which data is more valuable and which is less valuable for the purpose of data selection."

ChatGPT's developers have also recognized the importance of higher-quality data, as shown by their recent announcement of GPT-4. The latest technology is multimodal, meaning images as well as text prompts can spur it to generate content.

A large amount of data is required to develop this type of machine intelligence, but not all data is open source or public. Some data sets are owned by private entities, and privacy concerns are involved. Jia hopes that in the future, monetary incentives can be introduced to help acquire these types of data sets and improve the machine learning algorithms that are needed in all industries.

The University of California-Berkeley grad has had conversations with Google Research and Sony AI Research, among others, who are interested in the research benefits. Jia hopes these companies will adopt the technology developed and serve as advocates for data sharing. Sharing data and adopting improved machine learning algorithms will greatly benefit not only industries but individual consumers as well. For instance, if you've ever had a bad experience with a customer service chatbot, you've experienced low-quality data and poor machine learning algorithm design.

Jia hopes to use her background and area expertise to improve these web-based interactions for all. As a school-aged child, Jia always enjoyed math and science, but her decision to enter the electrical and computer engineering field stemmed from her desire to help people.

"Both of my parents are doctors. It was amazing to grow up seeing them help patients with some kind of medical formula," said Jia. "That's why I chose to study math and science. You can have a concrete impact. I'm using a different kind of formula to help, but I like that pursuing this career has made me feel like I can make a difference in someone's life."

The CAREER award is the National Science Foundation's most prestigious award for early-career faculty with the potential to serve as academic role models in research and education and to lead advances in their organization's mission. Throughout this project, Jia has demonstrated her desire to serve as an academic role model for graduate, undergraduate, and even K-12 students.

She is a core faculty member in the Sanghani Center for Artificial Intelligence and Data Analytics, formerly known as the Discovery Analytics Center. The center has more than 20 faculty members and 120 graduate students, two of whom are working directly with Jia to conduct the planned research.

See the original post:
For chatbots and beyond: Improving lives with data starts with ... - Virginia Tech Daily

Machine learning based prediction for oncologic outcomes of renal … – Nature.com

Using the original KORCC database [9], two recent studies have been reported [28,29]. First, Byun et al. [28] assessed the prognosis of non-metastatic clear cell RCC using a deep learning-based survival prediction model. Harrell's C-indices of DeepSurv for recurrence and cancer-specific survival were 0.802 and 0.834, respectively. More recently, Kim et al. [29] developed an ML-based algorithm predicting the probability of recurrence at 5 and 10 years after surgery. The highest area under the receiver operating characteristic curve (AUROC) was obtained from the naïve Bayes (NB) model, with values of 0.836 and 0.784 at 5 and 10 years, respectively.

In the current study, we used the updated KORCC database, which now contains clinical data on more than 10,000 patients. To the best of our knowledge, this is the largest dataset of an Asian population with RCC. With this dataset, we could develop much more accurate models with very high accuracy (range, 0.77-0.94) and F1-score (range, 0.77-0.97; Table 3). These accuracy values are high compared to previous models, including the Kattan nomogram, the Leibovich model, and the GRANT score, which were around 0.7 [5,6,7,8]. Among them, the Kattan nomogram was developed using a cohort of 601 patients with clinically localized RCC, and the overall C-index was 74% [5]. In a subsequent analysis of the same patient group using additional prognostic variables, including tumor necrosis, vascular invasion, and tumor grade, the C-index was as high as 82% [30]. Still, their prediction accuracies did not match ours.

In addition, we could include short-term (3-year) recurrence and survival data, which should be helpful for developing a more sophisticated surveillance strategy. Another strength of the current study is that most of the algorithms introduced so far [18,19,20,21,22,23,24,25,26] were applied, showing relatively consistent performance with high accuracy. Finally, we also performed an external validation using a separate (SNUBH) cohort and achieved well-maintained high accuracy and F1-scores for both recurrence and survival (Fig. 2). External validation of prediction models is essential, especially when using a multi-institutional dataset, to ensure and correct for differences between institutions.

AUROC has mostly been used as the standard for evaluating the performance of prediction models [5,6,7,8,29]. However, AUROC weighs changes in sensitivity and specificity equally, without considering clinically meaningful information [6]. In addition, the inability to compare the performance of different ML models is another limitation of the AUROC technique [31]. Thus, we adopted accuracy and F1-score instead of AUROC as evaluation metrics. The F1-score, together with SMOTE [17], serves as a better accuracy metric for addressing imbalanced-data problems [27].
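A brief sketch of those two choices, on synthetic data standing in for the clinical dataset: oversample the minority class with SMOTE during training, then report the F1-score rather than AUROC.

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # Imbalanced toy data: roughly 90% of samples fall in the majority class,
    # mimicking the rarity of recurrence events at long follow-up.
    X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Apply SMOTE only to the training split so the test set stays untouched.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

    clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    print("F1-score:", f1_score(y_test, clf.predict(X_test)))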

RCC is not a single disease but multiple histologically defined cancers with different genetic characteristics, clinical courses, and therapeutic responses [32]. With regard to metastatic RCC, the International Metastatic Renal Cell Carcinoma Database Consortium and the Memorial Sloan Kettering Cancer Center risk models have been extensively validated and are widely used to predict survival outcomes of patients receiving systemic therapy [33,34]. However, both risk models were developed without considering histologic subtypes. Thus, their predictive performance was presumed to have been strongly affected by clear cell type RCC, the predominant histologic subtype. Interestingly, in our previous study using the Korean metastatic RCC registry, we found that both risk models reliably predicted progression and survival even in non-clear cell type RCC [35]. In the current study, after performing a subgroup analysis according to histologic type (clear vs. non-clear cell type RCC), we also found very high accuracy and F1-scores in all tested metrics (Supplemental Tables 3 and 4). Taken together, these findings suggest that the prognostic difference between clear and non-clear cell type RCC seems to be offset in both metastatic and non-metastatic RCC. Further effort is needed to develop and validate a sophisticated prediction model for individual subtypes of non-clear cell type RCC.

The current study had several limitations. First, due to the paucity of long-term follow-up cases at 10 years, a data imbalance problem could not be avoided; consequently, the 10-year recurrence-free rate was reported to be only 45.3%. In the majority of patients with no evidence of disease at five years, further long-term follow-up had not been performed. However, we adopted both SMOTE and the F1-score to address these imbalanced-data problems. The retrospective design of this study was also an inherent limitation. Another limitation was that the developed prediction model included only the Korean population; validation of the model using data from other countries and races is needed. With regard to non-clear cell type RCC, the study cohort was still relatively small due to the rarity of the disease, so we could not avoid pooling the subtypes for a combined analysis. Thus, further studies are needed to develop and validate a prediction model for each subtype. In addition, the lack of more rigorous validation procedures such as cross-validation and bootstrapping is another limitation of the current study. Finally, web-embedded deployment of the model should follow to improve accessibility and transportability.

Originally posted here:
Machine learning based prediction for oncologic outcomes of renal ... - Nature.com

Students Use Machine Learning in Lesson Designed to Reveal … – NC State News

In a new study, North Carolina State University researchers had 28 high school students create their own machine-learning artificial intelligence (AI) models for analyzing data. The goals of the project were to help students explore the challenges, limitations and promise of AI, and to ensure a future workforce is prepared to make use of AI tools.

The study was conducted in conjunction with a high school journalism class in the Northeast. Since then, researchers have expanded the program to high school classrooms in multiple states, including North Carolina. NC State researchers are looking to partner with additional schools to collaborate in bringing the curriculum into classrooms.

"We want students, from a very young age, to open up that black box so they aren't afraid of AI," said the study's lead author Shiyan Jiang, assistant professor of learning design and technology at NC State. "We want students to know the potential and challenges of AI, and so they think about how they, the next generation, can respond to the evolving role of AI in society. We want to prepare students for the future workforce."

For the study, researchers developed a computer program called StoryQ that allows students to build their own machine-learning models. Then, researchers hosted a teacher workshop on the machine learning curriculum and technology, meeting in one-and-a-half-hour sessions each week for a month. For teachers who signed up to participate further, researchers recapped the curriculum and worked out logistics.

"We created the StoryQ technology to allow students in high school or undergraduate classrooms to build what we call text classification models," Jiang said. "We wanted to lower the barriers so students can really know what's going on in machine learning, instead of struggling with the coding. So we created StoryQ, a tool that allows students to understand the nuances in building machine-learning and text classification models."

A teacher who decided to participate led a journalism class through a 15-day lesson where they used StoryQ to evaluate a series of Yelp reviews about ice cream stores. Students developed models to predict if reviews were positive or negative based on the language.

"The teacher saw the relevance of the program to journalism," Jiang said. "This was a very diverse class with many students who are under-represented in STEM and in computing. Overall, we found students enjoyed the lessons a lot, and had great discussions about the use and mechanism of machine learning."

Researchers saw that students made hypotheses about specific words in the Yelp reviews that they thought would predict whether a review was positive or negative. For example, they expected reviews containing the word "like" to be positive. Then, the teacher guided the students to analyze whether their models correctly classified reviews. For example, a student who used the word "like" to predict reviews found that more than half of the reviews containing the word were actually negative. Researchers said students then used trial and error to try to improve the accuracy of their models.
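StoryQ itself is a no-code tool, but the kind of model the students built can be sketched in a few lines; the reviews below are invented for illustration, not taken from the study's Yelp data.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny labeled corpus: note that "like" appears in a negative review too,
    # echoing the students' discovery that single words can mislead.
    reviews = [
        "I like the chocolate here, great scoops",
        "Felt like a tourist trap, would not return",
        "Friendly staff and amazing flavors",
        "Slow service and melted ice cream",
    ]
    labels = ["pos", "neg", "pos", "neg"]

    # Bag-of-words features feeding a linear classifier.
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(reviews, labels)
    print(model.predict(["I like the vanilla"]))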

"Students learned how these models make decisions, the role that humans can play in creating these technologies, and the kind of perspectives that can be brought in when they create AI technology," Jiang said.

From their discussions, researchers found that students had mixed reactions to AI technologies. Students were deeply concerned, for example, about the potential to use AI to automate processes for selecting students or candidates for opportunities like scholarships or programs.

For future classes, researchers created a shorter, five-hour program. They've launched the program in two high schools in North Carolina, as well as schools in Georgia, Maryland and Massachusetts. In the next phase of their research, they are looking to study how teachers across disciplines collaborate to launch an AI-focused program and create a community of AI learning.

"We want to expand the implementation in North Carolina," Jiang said. "If there are any schools interested, we are always ready to bring this program to a school. Since we know teachers are super busy, we're offering a shorter professional development course, and we also provide a stipend for teachers. We will go into the classroom to teach if needed, or demonstrate how we would teach the curriculum so teachers can replicate, adapt, and revise it. We will support teachers in all the ways we can."

The study, "High school students' data modeling practices and processes: From modeling unstructured data to evaluating automated decisions," was published online March 13 in the journal Learning, Media and Technology. Co-authors included Hengtao Tang, Cansu Tatar, Carolyn P. Rosé and Jie Chao. The work was supported by the National Science Foundation under grant number 1949110.


Note to Editors: The study abstract follows.

"High school students' data modeling practices and processes: From modeling unstructured data to evaluating automated decisions"

Authors: Shiyan Jiang, Hengtao Tang, Cansu Tatar, Carolyn P. Rosé and Jie Chao

Published: March 13, 2023, Learning, Media and Technology

DOI: 10.1080/17439884.2023.2189735

Abstract: It's critical to foster artificial intelligence (AI) literacy for high school students, the first generation to grow up surrounded by AI, so they understand the working mechanisms of data-driven AI technologies and can critically evaluate automated decisions from predictive models. While efforts have been made to engage youth in understanding AI through developing machine learning models, few have provided in-depth insights into the nuanced learning processes. In this study, we examined high school students' data modeling practices and processes. Twenty-eight students developed machine learning models with text data for classifying negative and positive reviews of ice cream stores. We identified nine data modeling practices that describe students' processes of model exploration, development, and testing, and two themes about evaluating automated decisions from data technologies. The results provide implications for designing accessible data modeling experiences for students to understand data justice as well as the role and responsibility of data modelers in creating AI technologies.

Read more here:
Students Use Machine Learning in Lesson Designed to Reveal ... - NC State News