Archive for the ‘Machine Learning’ Category

HEAL: A framework for health equity assessment of machine learning performance – Google Research

Posted by Mike Schaekermann, Research Scientist, Google Research, and Ivor Horn, Chief Health Equity Officer & Director, Google Core

Health equity is a major societal concern worldwide with disparities having many causes. These sources include limitations in access to healthcare, differences in clinical treatment, and even fundamental differences in the diagnostic technology. In dermatology for example, skin cancer outcomes are worse for populations such as minorities, those with lower socioeconomic status, or individuals with limited healthcare access. While there is great promise in recent advances in machine learning (ML) and artificial intelligence (AI) to help improve healthcare, this transition from research to bedside must be accompanied by a careful understanding of whether and how they impact health equity.

Health equity is defined by public health organizations as fairness of opportunity for everyone to be as healthy as possible. Importantly, equity may be different from equality. For example, people with greater barriers to improving their health may require more or different effort to experience this fair opportunity. Similarly, equity is not fairness as defined in the AI for healthcare literature. Whereas AI fairness often strives for equal performance of the AI technology across different patient populations, this does not center the goal of prioritizing performance with respect to pre-existing health disparities.

In Health Equity Assessment of machine Learning performance (HEAL): a framework and dermatology AI model case study, published in The Lancet eClinicalMedicine, we propose a methodology to quantitatively assess whether ML-based health technologies perform equitably. In other words, does the ML model perform well for those with the worst health outcomes for the condition(s) the model is meant to address? This goal anchors on the principle that health equity should prioritize and measure model performance with respect to disparate health outcomes, which may be due to a number of factors that include structural inequities (e.g., demographic, social, cultural, political, economic, environmental and geographic).

The HEAL framework proposes a 4-step process to estimate the likelihood that an ML-based health technology performs equitably:

The final steps output is termed the HEAL metric, which quantifies how anticorrelated the ML models performance is with health disparities. In other words, does the model perform better with populations that have the worse health outcomes?

This 4-step process is designed to inform improvements for making ML model performance more equitable, and is meant to be iterative and re-evaluated on a regular basis. For example, the availability of health outcomes data in step (2) can inform the choice of demographic factors and brackets in step (1), and the framework can be applied again with new datasets, models and populations.

With this work, we take a step towards encouraging explicit assessment of the health equity considerations of AI technologies, and encourage prioritization of efforts during model development to reduce health inequities for subpopulations exposed to structural inequities that can precipitate disparate outcomes. We should note that the present framework does not model causal relationships and, therefore, cannot quantify the actual impact a new technology will have on reducing health outcome disparities. However, the HEAL metric may help identify opportunities for improvement, where the current performance is not prioritized with respect to pre-existing health disparities.

As an illustrative case study, we applied the framework to a dermatology model, which utilizes a convolutional neural network similar to that described in prior work. This example dermatology model was trained to classify 288 skin conditions using a development dataset of 29k cases. The input to the model consists of three photos of a skin concern along with demographic information and a brief structured medical history. The output consists of a ranked list of possible matching skin conditions.

Using the HEAL framework, we evaluated this model by assessing whether it prioritized performance with respect to pre-existing health outcomes. The model was designed to predict possible dermatologic conditions (from a list of hundreds) based on photos of a skin concern and patient metadata. Evaluation of the model is done using a top-3 agreement metric, which quantifies how often the top 3 output conditions match the most likely condition as suggested by a dermatologist panel. The HEAL metric is computed via the anticorrelation of this top-3 agreement with health outcome rankings.

We used a dataset of 5,420 teledermatology cases, enriched for diversity in age, sex and race/ethnicity, to retrospectively evaluate the models HEAL metric. The dataset consisted of store-and-forward cases from patients of 20 years or older from primary care providers in the USA and skin cancer clinics in Australia. Based on a review of the literature, we decided to explore race/ethnicity, sex and age as potential factors of inequity, and used sampling techniques to ensure that our evaluation dataset had sufficient representation of all race/ethnicity, sex and age groups. To quantify pre-existing health outcomes for each subgroup we relied on measurements from public databases endorsed by the World Health Organization, such as Years of Life Lost (YLLs) and Disability-Adjusted Life Years (DALYs; years of life lost plus years lived with disability).

However, while the model was likely to perform equitably across age groups for cancer conditions specifically, we discovered that it had room for improvement across age groups for non-cancer conditions. For example, those 70+ have the poorest health outcomes related to non-cancer skin conditions, yet the model didn't prioritize performance for this subgroup.

For holistic evaluation, the HEAL metric cannot be employed in isolation. Instead this metric should be contextualized alongside many other factors ranging from computational efficiency and data privacy to ethical values, and aspects that may influence the results (e.g., selection bias or differences in representativeness of the evaluation data across demographic groups).

As an adversarial example, the HEAL metric can be artificially improved by deliberately reducing model performance for the most advantaged subpopulation until performance for that subpopulation is worse than all others. For illustrative purposes, given subpopulations A and B where A has worse health outcomes than B, consider the choice between two models: Model 1 (M1) performs 5% better for subpopulation A than for subpopulation B. Model 2 (M2) performs 5% worse on subpopulation A than B. The HEAL metric would be higher for M1 because it prioritizes performance on a subpopulation with worse outcomes. However, M1 may have absolute performances of just 75% and 70% for subpopulations A and B respectively, while M2 has absolute performances of 75% and 80% for subpopulations A and B respectively. Choosing M1 over M2 would lead to worse overall performance for all subpopulations because some subpopulations are worse-off while no subpopulation is better-off.

Accordingly, the HEAL metric should be used alongside a Pareto condition (discussed further in the paper), which restricts model changes so that outcomes for each subpopulation are either unchanged or improved compared to the status quo, and performance does not worsen for any subpopulation.

The HEAL framework, in its current form, assesses the likelihood that an ML-based model prioritizes performance for subpopulations with respect to pre-existing health disparities for specific subpopulations. This differs from the goal of understanding whether ML will reduce disparities in outcomes across subpopulations in reality. Specifically, modeling improvements in outcomes requires a causal understanding of steps in the care journey that happen both before and after use of any given model. Future research is needed to address this gap.

The HEAL framework enables a quantitative assessment of the likelihood that health AI technologies prioritize performance with respect to health disparities. The case study demonstrates how to apply the framework in the dermatological domain, indicating a high likelihood that model performance is prioritized with respect to health disparities across sex and race/ethnicity, but also revealing the potential for improvements for non-cancer conditions across age. The case study also illustrates limitations in the ability to apply all recommended aspects of the framework (e.g., mapping societal context, availability of data), thus highlighting the complexity of health equity considerations of ML-based tools.

This work is a proposed approach to address a grand challenge for AI and health equity, and may provide a useful evaluation framework not only during model development, but during pre-implementation and real-world monitoring stages, e.g., in the form of health equity dashboards. We hold that the strength of the HEAL framework is in its future application to various AI tools and use cases and its refinement in the process. Finally, we acknowledge that a successful approach towards understanding the impact of AI technologies on health equity needs to be more than a set of metrics. It will require a set of goals agreed upon by a community that represents those who will be most impacted by a model.

The research described here is joint work across many teams at Google. We are grateful to all our co-authors: Terry Spitz, Malcolm Pyles, Heather Cole-Lewis, Ellery Wulczyn, Stephen R. Pfohl, Donald Martin, Jr., Ronnachai Jaroensri, Geoff Keeling, Yuan Liu, Stephanie Farquhar, Qinghan Xue, Jenna Lester, Can Hughes, Patricia Strachan, Fraser Tan, Peggy Bui, Craig H. Mermel, Lily H. Peng, Yossi Matias, Greg S. Corrado, Dale R. Webster, Sunny Virmani, Christopher Semturs, Yun Liu, and Po-Hsuan Cameron Chen. We also thank Lauren Winer, Sami Lachgar, Ting-An Lin, Aaron Loh, Morgan Du, Jenny Rizk, Renee Wong, Ashley Carrick, Preeti Singh, Annisah Um'rani, Jessica Schrouff, Alexander Brown, and Anna Iurchenko for their support of this project.

Go here to see the original:
HEAL: A framework for health equity assessment of machine learning performance - Google Research

Expert on how machine learning could lead to improved outcomes in urology – Urology Times

In this video, Glenn T. Werneburg, MD, PhD, shares the take-home message from the abstracts "Machine learning algorithms demonstrate accurate prediction of objective and patient-reported response to botulinum toxin for overactive bladder and outperform expert humans in an external cohort and "Machine learning algorithms predict urine culture bacterial resistance to first line antibiotic therapy at the time of sample collection, which were presented at the Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction 2024 Winter Meeting in Fort Lauderdale, Florida. Werneburg is a urology resident at Glickman Urological & Kidney Institute at Cleveland Clinic, Cleveland, Ohio.

We're very much looking forward to being able to clinically implement these algorithms, both on the OAB side and the antibiotic resistance side. For the OAB, if we can identify who would best respond to sacral neuromodulation, and who would best respond to onabotulinumtoxinA injection, then we're helping patients achieve an acceptable outcome faster. We're improving their incontinence or their urgency in a more efficient way. So we're enthusiastic about this. Once we can implement this clinically, we believe it's going to help us in this way. It's the same for the antibiotic resistance algorithms. When we can get these into the hands of clinicians, we'll be able to have a good suggestion in terms of which is the best antibiotic to use for this patient at this time. And in doing so, we hope to be able to improve our antibiotic stewardship. Ideally, we would use an antibiotic with the narrowest spectrum that would still cover the infecting organism, and in doing so, it reduces the risk for resistance. So if that same patient requires an antibiotic later on in his or her lifetime, chances areand we'd have to determine this with data and experimentsif we're implementing a narrower spectrum antibiotic to treat an infection, they're going to be less likely to be resistant to other antibiotics down the line.

This transcription was edited for clarity.

Excerpt from:
Expert on how machine learning could lead to improved outcomes in urology - Urology Times

Unlock the potential of generative AI in industrial operations | Amazon Web Services – AWS Blog

In the evolving landscape of manufacturing, the transformative power of AI and machine learning (ML) is evident, driving a digital revolution that streamlines operations and boosts productivity. However, this progress introduces unique challenges for enterprises navigating data-driven solutions. Industrial facilities grapple with vast volumes of unstructured data, sourced from sensors, telemetry systems, and equipment dispersed across production lines. Real-time data is critical for applications like predictive maintenance and anomaly detection, yet developing custom ML models for each industrial use case with such time series data demands considerable time and resources from data scientists, hindering widespread adoption.

Generative AI using large pre-trained foundation models (FMs) such as Claude can rapidly generate a variety of content from conversational text to computer code based on simple text prompts, known as zero-shot prompting. This eliminates the need for data scientists to manually develop specific ML models for each use case, and therefore democratizes AI access, benefitting even small manufacturers. Workers gain productivity through AI-generated insights, engineers can proactively detect anomalies, supply chain managers optimize inventories, and plant leadership makes informed, data-driven decisions.

Nevertheless, standalone FMs face limitations in handling complex industrial data with context size constraints (typically less than 200,000 tokens), which poses challenges. To address this, you can use the FMs ability to generate code in response to natural language queries (NLQs). Agents like PandasAI come into play, running this code on high-resolution time series data and handling errors using FMs. PandasAI is a Python library that adds generative AI capabilities to pandas, the popular data analysis and manipulation tool.

However, complex NLQs, such as time series data processing, multi-level aggregation, and pivot or joint table operations, may yield inconsistent Python script accuracy with a zero-shot prompt.

To enhance code generation accuracy, we propose dynamically constructing multi-shot prompts for NLQs. Multi-shot prompting provides additional context to the FM by showing it several examples of desired outputs for similar prompts, boosting accuracy and consistency. In this post, multi-shot prompts are retrieved from an embedding containing successful Python code run on a similar data type (for example, high-resolution time series data from Internet of Things devices). The dynamically constructed multi-shot prompt provides the most relevant context to the FM, and boosts the FMs capability in advanced math calculation, time series data processing, and data acronym understanding. This improved response facilitates enterprise workers and operational teams in engaging with data, deriving insights without requiring extensive data science skills.

Beyond time series data analysis, FMs prove valuable in various industrial applications. Maintenance teams assess asset health, capture images for Amazon Rekognition-based functionality summaries, and anomaly root cause analysis using intelligent searches with Retrieval Augmented Generation (RAG). To simplify these workflows, AWS has introduced Amazon Bedrock, enabling you to build and scale generative AI applications with state-of-the-art pre-trained FMs like Claude v2. With Knowledge Bases for Amazon Bedrock, you can simplify the RAG development process to provide more accurate anomaly root cause analysis for plant workers. Our post showcases an intelligent assistant for industrial use cases powered by Amazon Bedrock, addressing NLQ challenges, generating part summaries from images, and enhancing FM responses for equipment diagnosis through the RAG approach.

The following diagram illustrates the solution architecture.

The workflow includes three distinct use cases:

The workflow for NLQ with time series data consists of the following steps:

Our summary generation use case consists of the following steps:

Our root cause diagnosis use case consists of the following steps:

To follow along with this post, you should meet the following prerequisites:

To set up your solution resources, complete the following steps:

Next, you create the knowledge base for the documents in Amazon S3.

The next step is to deploy the app with the required library packages on either your PC or an EC2 instance (Ubuntu Server 22.04 LTS).

Provide the OpenSearch Service collection ARN you created in Amazon Bedrock from the previous step.

After you complete the end-to-end deployment, you can access the app via localhost on port 8501, which opens a browser window with the web interface. If you deployed the app on an EC2 instance, allow port 8501 access via the security group inbound rule. You can navigate to different tabs for various use cases.

To explore the first use case, choose Data Insight and Chart. Begin by uploading your time series data. If you dont have an existing time series data file to use, you can upload the following sample CSV file with anonymous Amazon Monitron project data. If you already have an Amazon Monitron project, refer to Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis to stream your Amazon Monitron data to Amazon S3 and use your data with this application.

When the upload is complete, enter a query to initiate a conversation with your data. The left sidebar offers a range of example questions for your convenience. The following screenshots illustrate the response and Python code generated by the FM when inputting a question such as Tell me the unique number of sensors for each site shown as Warning or Alarm respectively? (a hard-level question) or For sensors shown temperature signal as NOT Healthy, can you calculate the time duration in days for each sensor shown abnormal vibration signal? (a challenge-level question). The app will answer your question, and will also show the Python script of data analysis it performed to generate such results.

If youre satisfied with the answer, you can mark it as Helpful, saving the NLQ and Claude-generated Python code to an OpenSearch Service index.

To explore the second use case, choose the Captured Image Summary tab in the Streamlit app. You can upload an image of your industrial asset, and the application will generate a 200-word summary of its technical specification and operation condition based on the image information. The following screenshot shows the summary generated from an image of a belt motor drive. To test this feature, if you lack a suitable image, you can use the following example image.

Hydraulic elevator motor label by Clarence Risher is licensed underCC BY-SA 2.0.

To explore the third use case, choose the Root cause diagnosis tab. Input a query related to your broken industrial asset, such as, My actuator travels slow, what might be the issue? As depicted in the following screenshot, the application delivers a response with the source document excerpt used to generate the answer.

In this section, we discuss the design details of the application workflow for the first use case.

The users natural language query comes with different difficult levels: easy, hard, and challenge.

Straightforward questions may include the following requests:

For these questions, PandasAI can directly interact with the FM to generate Python scripts for processing.

Hard questions require basic aggregation operation or time series analysis, such as the following:

For hard questions, a prompt template with detailed step-by-step instructions assists FMs in providing accurate responses.

Challenge-level questions need advanced math calculation and time series processing, such as the following:

For these questions, you can use multi-shots in a custom prompt to enhance response accuracy. Such multi-shots show examples of advanced time series processing and math calculation, and will provide context for the FM to perform relevant inference on similar analysis. Dynamically inserting the most relevant examples from an NLQ question bank into the prompt can be a challenge. One solution is to construct embeddings from existing NLQ question samples and save these embeddings in a vector store like OpenSearch Service. When a question is sent to the Streamlit app, the question will be vectorized by BedrockEmbeddings. The top N most-relevant embeddings to that question are retrieved using opensearch_vector_search.similarity_search and inserted into the prompt template as a multi-shot prompt.

The following diagram illustrates this workflow.

The embedding layer is constructed using three key tools:

At the outset of app development, we began with only 23 saved examples in the OpenSearch Service index as embeddings. As the app goes live in the field, users start inputting their NLQs via the app. However, due to the limited examples available in the template, some NLQs may not find similar prompts. To continuously enrich these embeddings and offer more relevant user prompts, you can use the Streamlit app for gathering human-audited examples.

Within the app, the following function serves this purpose. When end-users find the output helpful and select Helpful, the application follows these steps:

In the event that a user selects Not Helpful, no action is taken. This iterative process makes sure that the system continually improves by incorporating user-contributed examples.

By incorporating human auditing, the quantity of examples in OpenSearch Service available for prompt embedding grows as the app gains usage. This expanded embedding dataset results in enhanced search accuracy over time. Specifically, for challenging NLQs, the FMs response accuracy reaches approximately 90% when dynamically inserting similar examples to construct custom prompts for each NLQ question. This represents a notable 28% increase compared to scenarios without multi-shot prompts.

On the Streamlit apps Captured Image Summary tab, you can directly upload an image file. This initiates the Amazon Rekognition API (detect_text API), extracting text from the image label detailing machine specifications. Subsequently, the extracted text data is sent to the Amazon Bedrock Claude model as the context of a prompt, resulting in a 200-word summary.

From a user experience perspective, enabling streaming functionality for a text summarization task is paramount, allowing users to read the FM-generated summary in smaller chunks rather than waiting for the entire output. Amazon Bedrock facilitates streaming via its API (bedrock_runtime.invoke_model_with_response_stream).

In this scenario, weve developed a chatbot application focused on root cause analysis, employing the RAG approach. This chatbot draws from multiple documents related to bearing equipment to facilitate root cause analysis. This RAG-based root cause analysis chatbot uses knowledge bases for generating vector text representations, or embeddings. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows and RAG implementation details.

When youre satisfied with the knowledge base response from Amazon Bedrock, you can integrate the root cause response from the knowledge base to the Streamlit app.

To save costs, delete the resources you created in this post:

Generative AI applications have already transformed various business processes, enhancing worker productivity and skill sets. However, the limitations of FMs in handling time series data analysis have hindered their full utilization by industrial clients. This constraint has impeded the application of generative AI to the predominant data type processed daily.

In this post, we introduced a generative AI Application solution designed to alleviate this challenge for industrial users. This application uses an open source agent, PandasAI, to strengthen an FMs time series analysis capability. Rather than sending time series data directly to FMs, the app employs PandasAI to generate Python code for the analysis of unstructured time series data. To enhance the accuracy of Python code generation, a custom prompt generation workflow with human auditing has been implemented.

Empowered with insights into their asset health, industrial workers can fully harness the potential of generative AI across various use cases, including root cause diagnosis and part replacement planning. With Knowledge Bases for Amazon Bedrock, the RAG solution is straightforward for developers to build and manage.

The trajectory of enterprise data management and operations is unmistakably moving towards deeper integration with generative AI for comprehensive insights into operational health. This shift, spearheaded by Amazon Bedrock, is significantly amplified by the growing robustness and potential of LLMs likeAmazon Bedrock Claude 3to further elevate solutions. To learn more, visit consult theAmazon Bedrock documentation, and get hands-on with theAmazon Bedrock workshop.

Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services. She is specialized in Generative AI, Applied Data Science and IoT architecture. Currently she is part of the Amazon Q team, and an active member/mentor in Machine Learning Technical Field Community. She works with customers, ranging from start-ups to enterprises, to develop AWSome generative AI solutions. She is particularly passionate about leveraging Large Language Models for advanced data analytics and exploring practical applications that address real-world challenges.

Sudeesh Sasidharanis a Senior Solutions Architect at AWS, within the Energy team. Sudeesh loves experimenting with new technologies and building innovative solutions that solve complex business challenges. When he is not designing solutions or tinkering with the latest technologies, you can find him on the tennis court working on his backhand.

Neil Desai is a technology executive with over 20 years of experience in artificial intelligence (AI), data science, software engineering, and enterprise architecture. At AWS, he leads a team of Worldwide AI services specialist solutions architects who help customers build innovative Generative AI-powered solutions, share best practices with customers, and drive product roadmap. In his previous roles at Vestas, Honeywell, and Quest Diagnostics, Neil has held leadership roles in developing and launching innovative products and services that have helped companies improve their operations, reduce costs, and increase revenue. He is passionate about using technology to solve real-world problems and is a strategic thinker with a proven track record of success.

See original here:
Unlock the potential of generative AI in industrial operations | Amazon Web Services - AWS Blog

Optimize price-performance of LLM inference on NVIDIA GPUs using the Amazon SageMaker integration with NVIDIA … – AWS Blog

NVIDIA NIM microservices now integrate with Amazon SageMaker, allowing you to deploy industry-leading large language models (LLMs) and optimize model performance and cost. You can deploy state-of-the-art LLMs in minutes instead of days using technologies such as NVIDIA TensorRT, NVIDIA TensorRT-LLM, and NVIDIA Triton Inference Server on NVIDIA accelerated instances hosted by SageMaker.

NIM, part of the NVIDIA AI Enterprise software platform listed on AWS marketplace, is a set of inference microservices that bring the power of state-of-the-art LLMs to your applications, providing natural language processing (NLP) and understanding capabilities, whether youre developing chatbots, summarizing documents, or implementing other NLP-powered applications. You can use pre-built NVIDIA containers to host popular LLMs that are optimized for specific NVIDIA GPUs for quick deployment or use NIM tools to create your own containers.

In this post, we provide a high-level introduction to NIM and show how you can use it with SageMaker.

NIM provides optimized and pre-generated engines for a variety of popular models for inference. These microservices support a variety of LLMs, such as Llama 2 (7B, 13B, and 70B), Mistral-7B-Instruct, Mixtral-8x7B, NVIDIA Nemotron-3 22B Persona, and Code Llama 70B, out of the box using pre-built NVIDIA TensorRT engines tailored for specific NVIDIA GPUs for maximum performance and utilization. These models are curated with the optimal hyperparameters for model-hosting performance for deploying applications with ease.

If your model is not in NVIDIAs set of curated models, NIM offers essential utilities such as the Model Repo Generator, which facilitates the creation of a TensorRT-LLM-accelerated engine and a NIM-format model directory through a straightforward YAML file. Furthermore, an integrated community backend of vLLM provides support for cutting-edge models and emerging features that may not have been seamlessly integrated into the TensorRT-LLM-optimized stack.

In addition to creating optimized LLMs for inference, NIM provides advanced hosting technologies such as optimized scheduling techniques like in-flight batching, which can break down the overall text generation process for an LLM into multiple iterations on the model. With in-flight batching, rather than waiting for the whole batch to finish before moving on to the next set of requests, the NIM runtime immediately evicts finished sequences from the batch. The runtime then begins running new requests while other requests are still in flight, making the best use of your compute instances and GPUs.

NIM integrates with SageMaker, allowing you to host your LLMs with performance and cost optimization while benefiting from the capabilities of SageMaker. When you use NIM on SageMaker, you can use capabilities such as scaling out the number of instances to host your model, performing blue/green deployments, and evaluating workloads using shadow testingall with best-in-class observability and monitoring with Amazon CloudWatch.

Using NIM to deploy optimized LLMs can be a great option for both performance and cost. It also helps make deploying LLMs effortless. In the future, NIM will also allow for Parameter-Efficient Fine-Tuning (PEFT) customization methods like LoRA and P-tuning. NIM also plans to have LLM support by supporting Triton Inference Server, TensorRT-LLM, and vLLM backends.

We encourage you to learn more about NVIDIA microservices and how to deploy your LLMs using SageMaker and try out the benefits available to you. NIM is available as a paid offering as part of the NVIDIA AI Enterprise software subscription available on AWS Marketplace.

In the near future, we will post an in-depth guide for NIM on SageMaker.

James Parkis a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In h is spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends.You can find him on LinkedIn.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Qing Lan is a Software Development Engineer in AWS. He has been working on several challenging products in Amazon, including high performance ML inference solutions and high performance logging system. Qings team successfully launched the first Billion-parameter model in Amazon Advertising with very low latency required. Qing has in-depth knowledge on the infrastructure optimization and Deep Learning acceleration.

Nikhil Kulkarni is a software developer with AWS Machine Learning, focusing on making machine learning workloads more performant on the cloud, and is a co-creator of AWS Deep Learning Containers for training and inference. Hes passionate about distributed Deep Learning Systems. Outside of work, he enjoys reading books, fiddling with the guitar, and making pizza.

Harish Tummalacherla is Software Engineer with Deep Learning Performance team at SageMaker. He works on performance engineering for serving large language models efficiently on SageMaker. In his spare time, he enjoys running, cycling and ski mountaineering.

Eliuth Triana Isaza is a Developer Relations Manager at NVIDIA empowering Amazons AI MLOps, DevOps, Scientists and AWS technical experts to master the NVIDIA computing stack for accelerating and optimizing Generative AI Foundation models spanning from data curation, GPU training, model inference and production deployment on AWS GPU instances. In addition, Eliuth is a passionate mountain biker, skier, tennis and poker player.

Jiahong Liuis a Solution Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that leverage NVIDIA accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.

Kshitiz Guptais a Solutions Architect at NVIDIA. He enjoys educating cloud customers about the GPU AI technologies NVIDIA has to offer and assisting them with accelerating their machine learning and deep learning applications. Outside of work, he enjoys running, hiking and wildlife watching.

Originally posted here:
Optimize price-performance of LLM inference on NVIDIA GPUs using the Amazon SageMaker integration with NVIDIA ... - AWS Blog

Orange isn’t building its own AI foundation model here’s why – Light Reading

There has been a flurry of interest in generative AI (GenAI) from telcos, each of which has taken its own nuanced approach to the idea of building its own large language models (LLMs). While Vodafone seems todismiss the ideaand Verizon appears content to build on existing foundation models, Deutsche Telekom and SK Telecomannounced last yearthey will develop telco-specific LLMs. Orange, meanwhile, doesn't currently see the need to build a foundation model, its chief AI officer Steve Jarrett has recently told Light Reading.

Jarrett said the company is currently content with using existing models and adapting them to its needs using two main approaches. The first one is retrieval-augmented generation (RAG), where a detailed source of information is passed to the model together with the prompt to augment its response.

He said this allows the company to experiment with different prompts easily, adding that existing methodologies can be used to assess the results. "That is a very, very easy way to dynamically test different models, different styles of structuring the RAG and the prompts. And [] that solves the majority of our needs today," he elaborated.

At the same time, Jarrett admitted that the downside of RAG is that it may require a lot of data to be passed along with the prompt, making more complex tasks slow and expensive. In such cases, he argued, fine-tuning is a more appropriate approach.

Distilling models

In this case, he explained, "you take the information that you would have used in the RAG for [] a huge problem area. And you make a new version of the underlying model that embeds all that information." Another related option is to distill the model.

This involves not just structuring the output of the model, but downsizing it, "like you're distilling fruit into alcohol," Jarrett said, adding "there are techniques to actually chop the model down into a much smaller model that runs much faster."

This approach is, however, highly challenging. "Even my most expert people frequently make mistakes," he admitted, saying: "It's not simple, and the state of the art of the tools to fine tune are changing every single day." At the same time, he noted that these tools are improving constantly and, as a result, he expects fine-tuning to get easier over time.

He pointed out that building a foundation model from scratch would be an even more complex task, which the company currently doesn't see a reason to embark on. Nevertheless, he stressed that it's impossible to predict how things will evolve in the future.

Complexity budget

One possibility is that big foundational models will eventually absorb so much information that the need for RAG and other tools will diminish. In this scenario, Orange may never have to create its own foundation model, Jarrett said, "as long as we have the ability to distill and fine tune models, where we need to, to make the model small enough to run faster and cheaper and so on."

He added: "I think it's a very open question in the industry. In the end, will we have a handful of massive models, and everyone's doing 99% RAG and prompt engineering, or are there going to be millions of distilled and fine-tuned models?"

One factor that may determine where things will go in the future is what Jarrett calls the complexity budget. This is a concept that conveys how much computing was needed from start to finish to produce an answer.

While a very large model may be more intensive to train in the beginning, there may be less computing required for RAG and fine-tuning. "The other approach is you have a large language model that also obviously took a lot of training, but then you do a ton more compute to fine tune and distill the model so that your model is much smaller," he added.

Apart from cost, there is also an environmental concern. While hyperscalers tend to perform relatively well in terms of using clean energy, and Jarrett claimed that Orange is "fairly green as a company," he added that the carbon intensity of the energy used for on-premises GPU clusters tends to vary in the industry.

Right tool for the job

The uncertainty surrounding GenAI's future evolution is one of the reasons why Orange is taking a measured approach to the technology, with Jarrett stressing it is not a tool that's suited to every job. "You don't want to use the large language model sledge hammer to hit every nail," he said.

"I think, fairly uniquely compared to most other telco operators, we actually have the ability, the skill inside of Orange to help make these decisions about what tool to use when. So we prefer to use a statistical method or basic machine learning to solve problems because those results are more [] explainable. They're usually cheaper, and they're usually less impactful on the environment," he added.

In fact, Jarrett says one of the things Orange is investigating at the moment is how to use multiple AI models together to solve problems. The notion, he added, is called agents, and refers to a high-level abstraction of a problem, such as asking how the network in France is working on a given day. This, he said, will enable the company to solve complex problems more dynamically.

In the meantime, the company is making a range of GenAI models available to its employees, including ChatGPT, Dolly and Mistral. To do so, it has built a solution that Jarrett says provides a "secure, European-resident version of leading AI models that we make available to the entire company."

Improving customer service

Jarrett says this is a more controlled and safer way for employees to use models than if they were accessed directly. The solution also notifies the employee of the cost of running a specific model to answer a question. Available for several months, it has so far been used by 12% of employees.

Orange has already deployed GenAI in many countries within its customer service solutions to predict what the most appealing offer may be to an individual customer, Jarrett said, adding "what we're trialling right now is can generative AI help us to customize and personalize the text of that offer? Does that make the offer incrementally more appealing?"

Another potential use case is in transcribing a conversation with a customer care agent in real time, using generative AI to create prompts. The tool is still in development but could help new recruits to improve faster, raising employee and customer satisfaction, said Jarrett.

While Orange doesn't currently use GenAI for any use cases in the network, some are under development, although few details are being shared at this stage. One use case involves predicting when batteries at cell sites may need replacing.

Jarrett admits, however, that GenAI is still facing a number of challenges, such as hallucinations. "In a scenario where the outputs have to be correct 100% of the time, we're not going to use generative AI for that today, because [it's] not correct 100% of the time," he said.

Dealing with hallucinations

Yet it can be applied in areas that are less sensitive. "For example, if for internal use you want to have a summary of an enormous transcript of a long meeting that you missed, it's okay if the model hallucinates a little bit," he added.

Hallucinations cannot be stopped entirely and will likely continue to be a problem for some time, said Jarrett. But he believes RAG and fine-tuning could mitigate the issue to some extent.

"The majority of the time, if we're good at prompt engineering and we're good at passing the appropriate information with the response, the model generates very, very useful, relevant answers," Jarrett said about the results achieved with RAG.

The availability and quality of data is another issue that is often discussed, and also one that Orange is trying to address. Using data historically kept in separate silos has been difficult, said Jarrett. "[The] availability of the data from the marketing team to be able to run a campaign on where was our network relatively strong, for example those use cases were either impossible, or took many, many, many months of manual meetings and collaboration."

As a result, the company is trying to create a marketplace where data is made widely available inside each country and appropriately labeled. Orange calls this approach data democracy.

Continued here:
Orange isn't building its own AI foundation model here's why - Light Reading