Archive for the ‘Machine Learning’ Category

There's No Such Thing As The Machine Learning Platform – Forbes

In the past few years, you might have noticed the increasing pace at which vendors are rolling out platforms that serve the AI ecosystem, namely addressing data science and machine learning (ML) needs. The Data Science Platform and Machine Learning Platform are at the front lines of the battle for the mind share and wallets of data scientists, ML project managers, and others who manage AI projects and initiatives. If you're a major technology vendor and you don't have some sort of big play in the AI space, you risk rapidly becoming irrelevant. But what exactly are these platforms, and why is there such an intense market share grab going on?

The core of this insight is the realization that ML and data science projects are nothing like typical application or hardware development projects. Whereas past hardware and software development focused on the functionality of systems or applications, data science and ML projects are really about managing data, continuously evolving the insights gleaned from data, and evolving data models through constant iteration. Typical development processes and platforms simply don't work from a data-centric perspective.

It should be no surprise, then, that technology vendors of all sizes are focused on developing platforms that data scientists and ML project managers will depend on to develop, run, operate, and manage their data models for the enterprise. To these vendors, the ML platform of the future is like the operating system, cloud environment, or mobile development platform of the past and present. If you can dominate market share for data science and ML platforms, you will reap rewards for decades to come. As a result, everyone with a dog in this fight is scrambling to own a piece of this market.

However, what does a Machine Learning platform look like? How is it the same as, or different from, a Data Science platform? What are the core requirements for ML platforms, and how do they differ from those of more general data science platforms? Who are the users of these platforms, and what do they really want? Let's dive deeper.

What is the Data Science Platform?

Data scientists are tasked with wrangling useful information from a sea of data and translating business and operational informational needs into the language of data and math. Data scientists need to be masters of statistics, probability, mathematics, and algorithms that help to glean useful insights from huge piles of information. A data scientist creates data hypotheses, runs tests and analyses of the data, and then translates the results so that others in the organization can easily view and understand them. So it follows that a pure data science platform would meet the needs of helping craft data models, determining the best fit of information to a hypothesis, testing that hypothesis, facilitating collaboration amongst teams of data scientists, and helping to manage and evolve the data model as information continues to change.

Furthermore, data scientists don't focus their work in code-centric Integrated Development Environments (IDEs), but rather in notebooks. First popularized by academically oriented, math-centric platforms like Mathematica and Matlab, and now prominent in the Python, R, and SAS communities, notebooks are used to document data research and simplify reproducibility of results by allowing the notebook to run on different source data. The best notebooks are shared, collaborative environments where groups of data scientists can work together and iterate models over constantly evolving data sets. While notebooks don't make great environments for developing code, they make great environments for collaborating, exploring, and visualizing data. Indeed, the best notebooks are used by data scientists to quickly explore large data sets, assuming sufficient access to clean data.

However, data scientists can't perform their jobs effectively without access to large volumes of clean data. Extracting, cleaning, and moving data is not really the role of the data scientist, but rather that of the data engineer. Data engineers are challenged with taking data from a wide range of systems, in structured and unstructured formats, and the data is usually not clean: missing fields, mismatched data types, and other data-related issues abound. In this sense, a data engineer is an engineer who designs, builds, and arranges data. Good data science platforms also enable data scientists to easily leverage compute power as their needs grow. Instead of copying data sets to a local computer to work on them, platforms let data scientists access compute power and data sets with minimal hassle. A data science platform is therefore challenged with providing these data engineering capabilities as well. As such, a practical data science platform will combine data science capabilities with the necessary data engineering functionality.

What is the Machine Learning Platform?

We just spent several paragraphs talking about data science platforms without once mentioning AI or ML. Of course, the overlap is the use of data science techniques and machine learning algorithms, applied to large sets of data, for the development of machine learning models. The tools that data scientists use on a daily basis overlap significantly with the tools used by ML-focused scientists and engineers. However, these tools aren't the same, because the needs of ML scientists and engineers are not the same as those of more general data scientists and engineers.

Rather than just focusing on notebooks and the ecosystem for managing and collaborating on those notebooks, those tasked with managing ML projects need access to the range of ML-specific algorithms, libraries, and infrastructure to train those algorithms over large and evolving datasets. An ideal ML platform helps ML engineers, data scientists, and engineers discover which machine learning approaches work best, tune hyperparameters, and deploy compute-intensive ML training across on-premise or cloud-based CPU, GPU, and/or TPU clusters, and it provides an ecosystem for managing and monitoring both unsupervised and supervised modes of training.

Clearly, a collaborative, interactive, visual system for developing and managing ML models in a data science platform is necessary, but it's not sufficient for an ML platform. As hinted above, one of the more challenging parts of making ML systems work is setting and tuning hyperparameters. The whole concept of a machine learning model is that various parameters are learned from the data: what machine learning actually learns are the parameters that describe the data, and it then fits new data to that learned model. Hyperparameters, by contrast, are configurable values set prior to training an ML model that can't be learned from data. They control factors such as model complexity, speed of learning, and more. Different ML algorithms require different hyperparameters, and some don't need any at all. ML platforms help with the discovery, setting, and management of hyperparameters, among other things, including the algorithm selection and comparison that non-ML-specific data science platforms don't provide.
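The parameter/hyperparameter distinction above can be made concrete with a short sketch. This assumes scikit-learn (the article names no specific library): the regularization strength `C` is a hyperparameter fixed before training, while the model's coefficients are the parameters learned from the data; a grid search is one simple way a platform might automate hyperparameter discovery.

```python
# Minimal hyperparameter-search sketch with scikit-learn (an assumed
# toolchain, not one named in the article).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# C is a hyperparameter: set before training, not learned from data.
# Each candidate value is trained and cross-validated automatically.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

print(search.best_params_)            # the winning hyperparameter value
print(search.best_estimator_.coef_)   # the parameters learned from data
```

An ML platform wraps exactly this loop, plus bookkeeping across many runs, algorithms, and compute targets.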

The different needs of big data, ML engineering, model management, operationalization

At the end of the day, ML project managers simply want tools to make their jobs more efficient and effective. But not all ML projects are the same. Some are focused on conversational systems, while others are focused on recognition or predictive analytics. Yet others are focused on reinforcement learning or autonomous systems. Furthermore, these models can be deployed (or operationalized) in various different ways. Some models might reside in the cloud or on on-premise servers, while others are deployed to edge devices or run in offline batch modes. These differences in ML application, deployment, and needs between data scientists, engineers, and ML developers make the concept of a single ML platform not particularly feasible. It would be a jack of all trades and master of none.

As such, we see four different platforms emerging: one focused on the needs of data scientists and model builders, another focused on big data management and data engineering, a third focused on model scaffolding and building systems to interact with models, and a fourth focused on managing the model lifecycle: ML Ops. The winners will focus on building out capabilities for each of these parts.

The Four Environments of AI (Source: Cognilytica)

The winners in the data science platform race will be the ones that simplify ML model creation, training, and iteration. They will make it quick and easy for companies to move from unintelligent systems to ones that leverage the power of ML to solve problems that previously could not be addressed by machines. Data science platforms that don't enable ML capabilities will be relegated to non-ML data science tasks. Likewise, big data platforms that inherently enable data engineering capabilities will be winners. Similarly, application development tools will need to treat machine learning models as first-class participants in their lifecycle, just like any other form of technology asset. Finally, the space of ML operations (ML Ops) is just now emerging and will no doubt be big news in the next few years.

When a vendor tells you they have an AI or ML platform, the right response is to ask, "Which one?" As you can see, there isn't just one ML platform, but rather different ones that serve very different needs. Don't get caught up in vendor marketing hype: compare what they say they have with what they actually have.

View original post here:

There's No Such Thing As The Machine Learning Platform - Forbes

Machine learning results: pay attention to what you don’t see – STAT

Even as machine learning and artificial intelligence are drawing substantial attention in health care, overzealousness for these technologies has created an environment in which other critical aspects of the research are often overlooked.

There's no question that the increasing availability of large data sources and off-the-shelf machine learning tools offers tremendous resources to researchers. Yet a lack of understanding about the limitations of both the data and the algorithms can lead to erroneous or unsupported conclusions.

Given that machine learning in the health domain can have a direct impact on people's lives, broad claims emerging from this kind of research should not be embraced without serious vetting. Whether conducting health care research or reading about it, make sure to consider what you don't see in the data and analyses.


One key question to ask is: Whose information is in the data and what do these data reflect?

Common forms of electronic health data, such as billing claims and clinical records, contain information only on individuals who have encounters with the health care system. But many individuals who are sick don't or can't see a doctor or other health care provider and so are invisible in these databases. This may be true for individuals with lower incomes or those who live in rural communities with rising hospital closures, as University of Toronto machine learning professor Marzyeh Ghassemi noted earlier this year.

Even among patients who do visit their doctors, health conditions are not consistently recorded. Health data also reflect structural racism, which has devastating consequences.

Data from randomized trials are not immune to these issues. As a ProPublica report demonstrated, black and Native American patients are drastically underrepresented in cancer clinical trials. This is important to underscore given that randomized trials are frequently highlighted as superior in discussions about machine learning work that leverages nonrandomized electronic health data.

In interpreting results from machine learning research, it's important to be aware that the patients in a study often do not represent the population we wish to draw conclusions about, and that the information collected is far from complete.

It has become commonplace to evaluate machine learning algorithms based on overall measures like accuracy or area under the curve. However, one evaluation metric cannot capture the complexity of performance. Be wary of research that claims to be ready for translation into clinical practice but only presents a leaderboard of tools ranked by a single metric.

As an extreme illustration, an algorithm designed to predict a rare condition found in only 1% of the population can be extremely accurate by labeling all individuals as not having the condition. This tool is 99% accurate, but completely useless. Yet, it may outperform other algorithms if accuracy is considered in isolation.
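The arithmetic behind that illustration fits in a few lines. The numbers below are constructed to match the article's 1%-prevalence scenario, not drawn from any real dataset:

```python
# The accuracy trap: a "classifier" that labels everyone negative is
# 99% accurate on a 1%-prevalence condition, yet finds zero cases.
y_true = [1] * 10 + [0] * 990   # 1,000 people, 1% have the condition
y_pred = [0] * 1000             # trivial model: predict "healthy" for all

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(accuracy)  # 0.99 -- looks excellent on a leaderboard
print(recall)    # 0.0  -- catches none of the actual cases
```

Reporting recall (sensitivity) alongside accuracy immediately exposes the useless model.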

What's more, algorithms are frequently not evaluated on multiple hold-out samples via cross-validation. Using only a single hold-out sample, as many published papers do, often leads to higher variance and misleading performance estimates.

Beyond examining multiple overall metrics of performance for machine learning, we should also assess how tools perform in subgroups as a step toward avoiding bias and discrimination. For example, artificial intelligence-based facial recognition software performed poorly when analyzing darker-skinned women. Many measures of algorithmic fairness center on performance in subgroups.
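Subgroup evaluation is simple to compute once predictions are grouped. The synthetic per-group results below are invented for illustration (they are not from the studies cited) and show how an overall number can hide a stark disparity:

```python
# Per-subgroup accuracy from synthetic results: 1 = prediction correct.
correct = {"group_a": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
           "group_b": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]}

overall = (sum(sum(v) for v in correct.values())
           / sum(len(v) for v in correct.values()))
per_group = {g: sum(v) / len(v) for g, v in correct.items()}

print(overall)    # 0.6 -- a single number that hides the gap
print(per_group)  # {'group_a': 0.9, 'group_b': 0.3}
```

Many formal fairness metrics (equalized odds, demographic parity) are elaborations of exactly this per-group breakdown.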

Bias in algorithms has largely not been a focus in health care research. That needs to change. A new study found substantial racial bias against black patients in a commercial algorithm used by many hospitals and other health care systems. Other work developed algorithms to improve fairness for subgroups in health care spending formulas.

Subjective decision-making pervades research. Who decides what the research question will be, which methods will be applied to answering it, and how the techniques will be assessed all matter. Diverse teams are needed, and not just because they yield better results. As Rediet Abebe, a junior fellow of Harvard's Society of Fellows, has written, "In both private enterprise and the public sector, research must be reflective of the society we're serving."

The influx of so-called digital data that's available through search engines and social media may be one resource for understanding the health of individuals who do not have encounters with the health care system. There have, however, been notable failures with these data. But there are also promising advances using online search queries at scale where traditional approaches like conducting surveys would be infeasible.

Increasingly granular data are now becoming available thanks to wearable technologies such as Fitbit trackers and Apple Watches. Researchers are actively developing and applying techniques to summarize the information gleaned from these devices for prevention efforts.

Much of the published clinical machine learning research, however, focuses on predicting outcomes or discovering patterns. Although machine learning for causal questions in health and biomedicine is a rapidly growing area, we don't see a lot of this work yet because it is new. Recent examples include the comparative effectiveness of feeding interventions in a pediatric intensive care unit and the effectiveness of different types of drug-eluting coronary artery stents.

Understanding how the data were collected and using appropriate evaluation metrics will also be crucial for studies that incorporate novel data sources and those attempting to establish causality.

In our drive to improve health with (and without) machine learning, we must not forget to look for what is missing: What information do we not have about the underlying health care system? Why might an individual or a code be unobserved? What subgroups have not been prioritized? Who is on the research team?

Giving these questions a place at the table will be the only way to see the whole picture.

Sherri Rose, Ph.D., is associate professor of health care policy at Harvard Medical School and co-author of the first book on machine learning for causal inference, Targeted Learning (Springer, 2011).

See the article here:

Machine learning results: pay attention to what you don't see - STAT

AI vs machine learning: What is the difference between them? – ValueWalk

Artificial intelligence and machine learning are two of the hottest buzzwords in the technology world. They have become so pervasive in our lives that we don't even realize we are using AI and machine learning several times a day. We often use the two terms interchangeably, without realizing that they are not exactly the same thing. In this AI vs machine learning comparison, let's delve into what they are and how they differ.


Technology heavyweights such as Google, Facebook, Tesla, Apple, and Microsoft are pouring billions of dollars every year into improving the AI and machine learning capabilities of their products and services. AI is used in all sorts of things, ranging from robots that are stealing our jobs to web searches and more.

Artificial intelligence is an umbrella term, broadly meaning the artificial ability to think. AI algorithms can mimic the cognitive functions of humans to perform a given task in an intelligent manner, and AI can be integrated into a system to give machines the cognitive ability to perform tasks.

AI works on if-then statements, which are rules established by programmers. These if-then statements are often called rules engines or knowledge graphs. Depending on the purpose it's built for, the AI takes the information you provide, processes it based on pre-determined rules, and gives you the output.
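A toy rules engine of the kind described above can be written in a few lines. The loan-approval rules here are invented purely for illustration: every output follows directly from programmer-written conditions, with nothing learned from data.

```python
# A hand-written if-then rules engine (illustrative rules, not a real
# lending policy): the "intelligence" is entirely the programmer's.
def loan_rules(income, credit_score):
    """Apply fixed, pre-determined rules to an application."""
    if credit_score < 600:
        return "reject"
    if income >= 50_000 and credit_score >= 700:
        return "approve"
    return "manual review"

print(loan_rules(80_000, 750))  # approve
print(loan_rules(80_000, 550))  # reject
print(loan_rules(30_000, 650))  # manual review
```

Changing the system's behavior means a programmer editing the rules, which is the key contrast with machine learning below.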

There are a number of ways AI algorithms can simulate human intelligence. Your smartphone, bank, smart speaker, smart TV, and other items use artificial intelligence on a daily basis. It promises to bring major changes to medical diagnosis, entertainment, self-driving cars, and more in the next few years.

Machine learning is a subset of AI, meaning all ML is AI but not all AI is ML. Unlike the knowledge graphs and rules engines of classic AI, machine learning is capable of learning from the experiences and data it's exposed to. It can modify its own algorithms to evolve without requiring any human intervention.

Think of ML as a newborn baby. As it's exposed to different experiences, it begins to form its own understanding of the world and continuously adjusts itself to thrive in that world. Machine learning algorithms try to minimize error and maximize accuracy.
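That "minimize error" idea can be shown with a bare-bones gradient-descent sketch. The data here are made up (generated from y = 3x) just to show the mechanism: the model starts with an uninformed weight and repeatedly nudges it to shrink its prediction error, with no human editing any rules.

```python
# Learning from experience by minimizing error: one-weight gradient descent.
data = [(x, 3.0 * x) for x in range(1, 6)]  # experiences: (input, outcome)

w = 0.0            # the model's initial, uninformed guess
lr = 0.01          # learning rate: how big each adjustment is
for _ in range(200):
    # gradient of mean-squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # adjust the model to reduce its error

print(round(w, 3))  # converges toward 3.0, the pattern in the data
```

Real ML systems do the same thing with millions of weights, but the loop — predict, measure error, adjust — is identical.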

Technology giants such as Nvidia, Google, Amazon, and Microsoft are focusing much of their efforts on machine learning. It will increasingly give machines the ability to think like humans.

Machine learning is currently being used in self-driving cars, web searches, emails, smart speakers, medicine, genetics, and much more. The ML programs are helping marketers better understand consumer behavior. They are also helping scientists discover how the human genome works.

Google's services, such as web search, email, and Maps, use machine learning to offer a personalized experience. Gmail's algorithms can predict and suggest what you'll likely reply to an email. It's not always accurate, but it's getting better with time as Google's machine learning algorithms continue to learn from billions of email communications.

Artificial intelligence makes machines smart, giving them the ability to mimic cognitive functions of humans. Machine learning is the enabler for AI, allowing the programs to constantly learn and tweak their own algorithms to get better over time. Machine learning has become the fastest-growing subset of AI.

Read the original post:

AI vs machine learning: What is the difference between them? - ValueWalk

Israelis develop ‘self-healing’ cars powered by machine learning and AI – The Jerusalem Post

Even before autonomous vehicles become a regular sight on our streets, modern cars are quickly coming to resemble sophisticated computers on wheels.

Increasingly connected vehicles ship with as many as 150 million lines of code, far exceeding the 145,000 lines of code required to land Apollo 11 on the Moon in 1969. Self-driving cars could require up to one billion lines of code.

For manufacturers, passengers, and repair shops alike, vehicles running on software rather than just machines represent an unprecedented world of highly complex mobility. Checking the engine, tires, and brakes to find a fault will certainly no longer suffice.

Seeking to build trust in the new generation of automotive innovation, Tel Aviv-based start-up Aurora Labs has developed software for what it calls the "self-healing car": a proactive and remote system to detect and fix potential vehicle malfunctions, and to update and validate in-car software without any downtime.

(From left) Aurora Labs co-founder & CEO Zohar Fox; co-founder & COO Ori Lederman; and EVP Marketing Roger Ordman (Credit: Aurora Labs)

"The automotive industry is facing its biggest revolution to date," Aurora Labs co-founder and chief operating officer Ori Lederman told The Jerusalem Post. "The most critical aspect of all that sophistication and software coming into the car is whether you can trust it, even before you hand over complete autonomy to the car. It poses a lot of challenges to car-makers."

New challenges, Lederman added, include whether software problems can be detected after selling the vehicle, whether problems can be solved safely and securely, and whether defects can be solved without interrupting car use. In 2018, some eight million vehicles were recalled in the United States due to software-based defects alone.

"The human body can detect when something is not quite right before you pass out," said executive vice president of marketing Roger Ordman. "The auto-immune system indicates something is wrong and what can be done to fix it: raise your temperature or white blood count. Sometimes the body can do a self-fix, and sometimes that's not enough and needs an external intervention.

"Our technology has the same kind of approach: detecting if something has started to go wrong before it causes a catastrophic failure, indicating exactly where that problem is, doing something to fix it, and keeping it running smoothly."

The company's Line-Of-Code Behavior technology, powered by machine learning and artificial intelligence, creates a deep understanding of the software installed on over 100 vehicle Engine Control Units (ECUs), and the relationships between them. In addition to detecting software faults, the technology can enable remote, over-the-air software updates without any downtime.

Similar to the silent updates automatically implemented by smartphone applications, Ordman added, car manufacturers will be able to update and continuously improve software running on connected vehicles. Of course, manufacturers will be required to meet stringent regulations concerning cybersecurity and over-the-air updates, developed by bodies including the UNECE.

"When we joined forces and started developing the idea, we knew our technology was applicable to any connected, smart device or Internet of Things device," said Lederman. "The first vertical we wanted to start with is the one that needs us the most, and the biggest market. The need for detecting, managing, recovering and being transparent about software is by far the largest need in the automotive industry as they move from mechanical parts to virtual systems run by lines of code."

Rather than requiring mass recalls, Aurora Labs' self-healing software will be able to apply short-term fixes to ensure continued functionality and predictability, and subsequently implement comprehensive upgrades to the vehicle's systems.

The company, which has raised $11.5 million in fund-raising rounds since it was founded in 2016 by Lederman and CEO Zohar Fox, is currently working to implement its technology with some of the world's leading automotive industry players, including major car-makers in Germany, the United States, Korea, and Japan. The fast-growing start-up also has offices in Michigan and the North Macedonian capital of Skopje, and owns a subsidiary near Munich.

"Customers ought to start being aware of how sophisticated their cars are," said Lederman. "When they buy a new car, they should want to ask whether the dealership has the ability to detect, fix and recover, so they don't need to go to the dealership. It's something they would want to have."

Just as the safety performance of cars in Europe is ranked according to the five-star NCAP standard, Ordman believes there should be an additional star for software safety and security.

"There should be as many self-healing systems in place as possible to enable that, when inevitably something does go wrong, there are systems in place to detect and fix them and maintain uptime," said Ordman. "Does the software running in the vehicle have the right cybersecurity in place? Does it have the right recovery technologies in place? Can it continuously and safely improve over time?

"With these functionalities, you're not just dealing with five stars of the physical but adding another star for the software safety and security. It is about giving the trust to the consumer: 'I'm getting a car that will safeguard me and my family as I move forward.'"

See original here:

Israelis develop 'self-healing' cars powered by machine learning and AI - The Jerusalem Post

Press the right buttons on machine learning – The Australian Financial Review

Nguyen says AI can be a rather challenging type of technology for people to fully understand because it's a category that contains many different types.

"You can't paint it with a single brush. There are some areas that are going to be quite challenging, and it is those that people tend to associate with the term. However, just because a particular AI has been built to, for example, master a game, doesn't mean it is suddenly going to make decisions to take over the human race."

Nguyen also acknowledges that people have concerns about the potential for AI to cause widespread job losses as it automates tasks that previously have required a human.

"While there will be job losses, there will also be new opportunities created as a result of its use," he says. "When it comes to jobs, it's important to consider which ones will be affected and what alternatives there are for those involved in the change.

"Everything comes with balance and we need to see both sides of the story."

Go here to see the original:

Press the right buttons on machine learning - The Australian Financial Review