Archive for the ‘Artificial Intelligence’ Category

WHO releases AI ethics and governance guidance for large multi-modal models – World Health Organization

The World Health Organization (WHO) is releasing new guidance on the ethics and governance of large multi-modal models (LMMs), a type of fast-growing generative artificial intelligence (AI) technology with applications across health care.

The guidance outlines over 40 recommendations for consideration by governments, technology companies, and health care providers to ensure the appropriate use of LMMs to promote and protect the health of populations.

LMMs can accept one or more types of data input, such as text, videos, and images, and generate diverse outputs not limited to the type of data inputted. LMMs are unique in their mimicry of human communication and ability to carry out tasks they were not explicitly programmed to perform. LMMs have been adopted faster than any consumer application in history, with several platforms such as ChatGPT, Bard and Bert entering the public consciousness in 2023.

"Generative AI technologies have the potential to improve health care but only if those who develop, regulate, and use these technologies identify and fully account for the associated risks," said Dr Jeremy Farrar, WHO Chief Scientist. "We need transparent information and policies to manage the design, development, and use of LMMs to achieve better health outcomes and overcome persisting health inequities."

The new WHO guidance outlines five broad applications of LMMs for health:

While LMMs are starting to be used for specific health-related purposes, there are also documented risks of producing false, inaccurate, biased, or incomplete statements, which could harm people using such information in making health decisions. Furthermore, LMMs may be trained on data that are of poor quality or biased, whether by race, ethnicity, ancestry, sex, gender identity, or age.

The guidance also details broader risks to health systems, such as accessibility and affordability of the best-performing LMMs. LMMs can also encourage automation bias by health care professionals and patients, whereby errors are overlooked that would otherwise have been identified or difficult choices are improperly delegated to an LMM. LMMs, like other forms of AI, are also vulnerable to cybersecurity risks that could endanger patient information or the trustworthiness of these algorithms and the provision of health care more broadly.

To create safe and effective LMMs, WHO underlines the need for engagement of various stakeholders: governments, technology companies, healthcare providers, patients, and civil society, in all stages of development and deployment of such technologies, including their oversight and regulation.

"Governments from all countries must cooperatively lead efforts to effectively regulate the development and use of AI technologies, such as LMMs," said Dr Alain Labrique, WHO Director for Digital Health and Innovation in the Science Division.

The new WHO guidance includes recommendations for governments, who have the primary responsibility to set standards for the development and deployment of LMMs, and their integration and use for public health and medical purposes. For example, governments should:

The guidance also includes the following key recommendations for developers of LMMs, who should ensure that:

The new document, Ethics and governance of AI for health: Guidance on large multi-modal models, is based on WHO's guidance published in June 2021.


When Might AI Outsmart Us? It Depends Who You Ask – TIME

In 1960, Herbert Simon, who went on to win both the Nobel Prize for economics and the Turing Award for computer science, wrote in his book The New Science of Management Decision that "machines will be capable, within 20 years, of doing any work that a man can do."

History is filled with exuberant technological predictions that have failed to materialize. Within the field of artificial intelligence, the brashest predictions have concerned the arrival of systems that can perform any task a human can, often referred to as artificial general intelligence, or AGI.

So when Shane Legg, Google DeepMind's co-founder and chief AGI scientist, estimates that there's a 50% chance that AGI will be developed by 2028, it might be tempting to write him off as another AI pioneer who hasn't learnt the lessons of history.

Still, AI is certainly progressing rapidly. GPT-3.5, the language model that powers OpenAI's ChatGPT, was developed in 2022 and scored 213 out of 400 on the Uniform Bar Exam, the standardized test that prospective lawyers must pass, putting it in the bottom 10% of human test-takers. GPT-4, developed just months later, scored 298, putting it in the top 10%. Many experts expect this progress to continue.


Legg's views are common among the leadership of the companies currently building the most powerful AI systems. In August, Dario Amodei, co-founder and CEO of Anthropic, said he expects a human-level AI could be developed in two to three years. Sam Altman, CEO of OpenAI, believes AGI could be reached sometime in the next four or five years.

But in a recent survey the majority of 1,712 AI experts who responded to the question of when they thought AI would be able to accomplish every task better and more cheaply than human workers were less bullish. A separate survey of elite forecasters with exceptional track records shows they are less bullish still.

The stakes for divining who is correct are high. Legg, like many other AI pioneers, has warned that powerful future AI systems could cause human extinction. And even for those less concerned by Terminator scenarios, some warn that an AI system that could replace humans at any task might replace human labor entirely.

Many of those working at the companies building the biggest and most powerful AI models believe that the arrival of AGI is imminent. They subscribe to a theory known as the scaling hypothesis: the idea that even if a few incremental technical advances are required along the way, continuing to train AI models using ever greater amounts of computational power and data will inevitably lead to AGI.

There is some evidence to back this theory up. Researchers have observed very neat and predictable relationships between how much computational power, also known as compute, is used to train an AI model and how well it performs a given task. In the case of large language models (LLMs), the AI systems that power chatbots like ChatGPT, scaling laws predict how well a model can predict a missing word in a sentence. OpenAI CEO Sam Altman recently told TIME that he realized in 2019 that AGI might be coming much sooner than most people think, after OpenAI researchers discovered the scaling laws.


Even before the scaling laws were observed, researchers had long understood that training an AI system using more compute makes it more capable. The amount of compute being used to train AI models has increased relatively predictably for the last 70 years as costs have fallen.

Early predictions based on the expected growth in compute were used by experts to anticipate when AI might match (and then possibly surpass) humans. In 1997, computer scientist Hans Moravec argued that cheaply available hardware will match the human brain in terms of computing power in the 2020s. An Nvidia A100 semiconductor chip, widely used for AI training, costs around $10,000 and can perform roughly 20 trillion floating point operations per second (FLOPS), and chips developed later this decade will have higher performance still. However, estimates for the amount of compute used by the human brain vary widely, from around one trillion FLOPS to more than one quintillion FLOPS, making it hard to evaluate Moravec's prediction. Additionally, training modern AI systems requires a great deal more compute than running them, a fact that Moravec's prediction did not account for.
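To make the width of that range concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. The function names are hypothetical, and the calculation assumes throughput adds up linearly across chips, which ignores real-world overheads and says nothing about training cost.

```python
import math

# Approximate figures quoted in the article.
A100_FLOPS = 20e12        # roughly 20 trillion FLOPS per Nvidia A100
A100_COST_USD = 10_000    # roughly $10,000 per chip
BRAIN_FLOPS_LOW = 1e12    # low-end brain estimate: one trillion FLOPS
BRAIN_FLOPS_HIGH = 1e18   # high-end brain estimate: one quintillion FLOPS


def chips_needed(brain_flops: float, chip_flops: float = A100_FLOPS) -> float:
    """Number of chips whose combined throughput equals brain_flops.

    Assumes throughput scales linearly with chip count (a simplification).
    """
    return brain_flops / chip_flops


def hardware_cost_usd(brain_flops: float) -> int:
    """Cost of enough whole chips to match brain_flops (hypothetical helper)."""
    return math.ceil(chips_needed(brain_flops)) * A100_COST_USD


if __name__ == "__main__":
    for label, flops in [("low", BRAIN_FLOPS_LOW), ("high", BRAIN_FLOPS_HIGH)]:
        chips = chips_needed(flops)
        cost = hardware_cost_usd(flops)
        print(f"{label} estimate: {chips:,.2f} chips, ${cost:,}")
```

On the low estimate a single A100 already exceeds the brain by a factor of about 20. On the high estimate it would take about 50,000 chips, or roughly $500 million of hardware. That gap of six orders of magnitude is why the prediction is hard to score.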

More recently, researchers at nonprofit Epoch have made a more sophisticated compute-based model. Instead of estimating when AI models will be trained with amounts of compute similar to the human brain, the Epoch approach makes direct use of scaling laws and makes a simplifying assumption: If an AI model trained with a given amount of compute can faithfully reproduce a given portion of text (based on whether the scaling laws predict such a model can repeatedly predict the next word almost flawlessly), then it can do the work of producing that text. For example, an AI system that can perfectly reproduce a book can substitute for authors, and an AI system that can reproduce scientific papers without fault can substitute for scientists.

Some would argue that just because AI systems can produce human-like outputs, that doesn't necessarily mean they will think like a human. After all, Russell Crowe plays Nobel Prize-winning mathematician John Nash in the 2001 film A Beautiful Mind, but nobody would claim that the better his acting performance, the more impressive his mathematical skills must be. Researchers at Epoch argue that this analogy rests on a flawed understanding of how language models work. As they scale up, LLMs acquire the ability to reason like humans, rather than just superficially emulating human behavior. However, some researchers argue it's unclear whether current AI models are in fact reasoning.

Epoch's approach is one way to quantitatively model the scaling hypothesis, says Tamay Besiroglu, Epoch's associate director, who notes that researchers at Epoch tend to think AI will progress less rapidly than the model suggests. The model estimates a 10% chance of transformative AI (defined as AI that, if deployed widely, would precipitate a change comparable to the industrial revolution) being developed by 2025, and a 50% chance of it being developed by 2033. The difference between the model's forecast and those of people like Legg is probably largely down to transformative AI being harder to achieve than AGI, says Besiroglu.

Although many in leadership positions at the most prominent AI companies believe that the current path of AI progress will soon produce AGI, they're outliers. In an effort to more systematically assess what the experts believe about the future of artificial intelligence, AI Impacts, an AI safety project at the nonprofit Machine Intelligence Research Institute, surveyed 2,778 experts in fall 2023, all of whom had published peer-reviewed research in prestigious AI journals and conferences in the last year.

Among other things, the experts were asked when they thought high-level machine intelligence, defined as machines that could accomplish every task better and more cheaply than human workers without help, would be feasible. Although the individual predictions varied greatly, the average of the predictions suggests a 50% chance that this would happen by 2047, and a 10% chance by 2027.

Like many people, the experts seemed to have been surprised by the rapid AI progress of the last year and have updated their forecasts accordingly: when AI Impacts ran the same survey in 2022, researchers estimated a 50% chance of high-level machine intelligence arriving by 2060, and a 10% chance by 2029.

The researchers were also asked when they thought various individual tasks could be carried out by machines. They estimated a 50% chance that AI could compose a Top 40 hit by 2028 and write a book that would make the New York Times bestseller list by 2029.

Nonetheless, there is plenty of evidence to suggest that experts don't make good forecasters. Between 1984 and 2003, social scientist Philip Tetlock collected 82,361 forecasts from 284 experts, asking them questions such as: Will Soviet leader Mikhail Gorbachev be ousted in a coup? Will Canada survive as a political union? Tetlock found that the experts' predictions were often no better than chance, and that the more famous an expert was, the less accurate their predictions tended to be.

Next, Tetlock and his collaborators set out to determine whether anyone could make accurate predictions. In a forecasting competition launched by the U.S. Intelligence Advanced Research Projects Activity in 2010, Tetlock's team, the Good Judgement Project (GJP), dominated the others, producing forecasts that were reportedly 30% more accurate than intelligence analysts who had access to classified information. As part of the competition, the GJP identified superforecasters: individuals who consistently made above-average accuracy forecasts. However, although superforecasters have been shown to be reasonably accurate for predictions with a time horizon of two years or less, it's unclear whether they're also similarly accurate for longer-term questions such as when AGI might be developed, says Ezra Karger, an economist at the Federal Reserve Bank of Chicago and research director at Tetlock's Forecasting Research Institute.

When do the superforecasters think AGI will arrive? As part of a forecasting tournament run between June and October 2022 by the Forecasting Research Institute, 31 superforecasters were asked when they thought Nick Bostrom, the controversial philosopher and author of the seminal AI existential risk treatise Superintelligence, would affirm the existence of AGI. The median superforecaster thought there was a 1% chance that this would happen by 2030, a 21% chance by 2050, and a 75% chance by 2100.

All three approaches to predicting when AGI might be developed (Epoch's model of the scaling hypothesis, and the expert and superforecaster surveys) have one thing in common: there's a lot of uncertainty. In particular, the experts are spread widely, with 10% thinking it's as likely as not that AGI is developed by 2030, and 18% thinking AGI won't be reached until after 2100.

Still, on average, the different approaches give different answers. Epoch's model estimates a 50% chance that transformative AI arrives by 2033, the median expert estimates a 50% probability of AGI before 2048, and the superforecasters are much further out at 2070.

There are many points of disagreement that feed into debates over when AGI might be developed, says Katja Grace, who organized the expert survey as lead researcher at AI Impacts. First, will the current methods for building AI systems, bolstered by more compute and fed more data, with a few algorithmic tweaks, be sufficient? The answer to this question in part depends on how impressive you think recently developed AI systems are. Is GPT-4, in the words of researchers at Microsoft, the "sparks of AGI"? Or is this, in the words of philosopher Hubert Dreyfus, "like claiming that the first monkey that climbed a tree was making progress towards landing on the moon"?

Second, even if current methods are enough to achieve the goal of developing AGI, it's unclear how far away the finish line is, says Grace. It's also possible that something could obstruct progress on the way, for example a shortfall of training data.

Finally, looming in the background of these more technical debates are people's more fundamental beliefs about how much and how quickly the world is likely to change, Grace says. Those working in AI are often steeped in technology and open to the idea that their creations could alter the world dramatically, whereas most people dismiss this as unrealistic.

The stakes of resolving this disagreement are high. In addition to asking experts how quickly they thought AI would reach certain milestones, AI Impacts asked them about the technology's societal implications. Of the 1,345 respondents who answered questions about AI's impact on society, 89% said they are substantially or extremely concerned about AI-generated deepfakes and 73% were similarly concerned that AI could empower dangerous groups, for example by enabling them to engineer viruses. The median respondent thought it was 5% likely that AGI leads to extremely bad outcomes, such as human extinction.

Given these concerns, and the fact that 10% of the experts surveyed believe that AI might be able to do any task a human can by 2030, Grace argues that policymakers and companies should prepare now.

Preparations could include investment in safety research, mandatory safety testing, and coordination between companies and countries developing powerful AI systems, says Grace. Many of these measures were also recommended in a paper published by AI experts last year.

"If governments act now, with determination, there is a chance that we will learn how to make AI systems safe before we learn how to make them so powerful that they become uncontrollable," Stuart Russell, professor of computer science at the University of California, Berkeley, and one of the paper's authors, told TIME in October.


Artificial Intelligence and Nuclear Stability – War On The Rocks

Policymakers around the world are grappling with the new opportunities and dangers that artificial intelligence presents. Of all the effects that AI can have on the world, among the most consequential would be integrating it into the command and control for nuclear weapons. Improperly used, AI in nuclear operations could have world-ending effects. If properly implemented, it could reduce nuclear risk by improving early warning and detection and enhancing the resilience of second-strike capabilities, both of which would strengthen deterrence. To take full advantage of these benefits, systems must take into account the strengths and limitations of humans and machines. Successful human-machine joint cognitive systems will harness the precision and speed of automation with the flexibility of human judgment and do so in a way that avoids automation bias and surrendering human judgment to machines. Because of the early state of AI implementation, the United States has the potential to make the world safer by more clearly outlining its policies, pushing for broad international agreement, and acting as a normative trendsetter.

The United States has been extremely transparent and forward-leaning in establishing and communicating its policies on military AI and autonomous systems, publishing its policy on autonomy in weapons in 2012, adopting ethical principles for military AI in 2020, and updating its policy on autonomy in weapons in 2023. The Department of Defense stated formally and unequivocally in the 2022 Nuclear Posture Review that it will always maintain a human in the loop for nuclear weapons employment. In November 2023, over 40 nations joined the United States in endorsing a political declaration on responsible military use of AI. Endorsing states included not just U.S. allies but also nations in Africa, Southeast Asia, and Latin America.


Building on this success, the United States should push for international agreements with other nuclear powers to mitigate the risks of integrating AI into nuclear systems or placing nuclear weapons onboard uncrewed vehicles. The United Kingdom and France released a joint statement with the United States in 2022 agreeing on the need to maintain human control of nuclear launches. Ideally, this could represent the beginning of a commitment by the permanent members of the United Nations Security Council if Russia and China could be convinced to join this principle. Even if they are not willing to agree, the United States should further mature its own policies to address critical gaps and work with other nuclear-armed states to strengthen their commitments as an interim measure and as a way to build international consensus on the issue.

The Dangers of Automation

As militaries increasingly adopt AI and automation, there is an urgent need to clarify how these technologies should be used in nuclear operations. Absent formal agreements, states risk an incremental trend of creeping automation that could undermine nuclear stability. While policymakers are understandably reluctant to adopt restrictions on emerging technologies lest they give up a valuable future capability, U.S. officials should not be complacent in assuming other states will approach AI and automation in nuclear operations responsibly. Examples such as Russia's Perimeter "dead hand" system and Poseidon autonomous nuclear-armed underwater drone demonstrate that other nations might see these risks differently than the United States and might be willing to take risks that U.S. policymakers would find unacceptable.

Existing systems, such as Russia's Perimeter, highlight the risks of states integrating automation into nuclear systems. Perimeter is reportedly a system created by the Soviet Union in the 1980s to act as a failsafe in case Soviet leadership was destroyed in a decapitation strike. Perimeter reportedly has a network of sensors to determine if a nuclear attack has occurred. If these sensors are triggered while Perimeter is activated, the system would wait a predetermined period of time for a signal from senior military commanders. If there is no signal from headquarters, presumably because Soviet/Russian leadership had been wiped out, then Perimeter would bypass the normal chain of command and pass nuclear launch authority to a relatively junior officer on duty. Senior Russian officials have stated the system is still functioning, noting in 2011 that the system was combat ready and in 2018 that it had been improved.

The system was designed to reduce the burden on Soviet leaders of hastily making a nuclear decision under time pressure and with incomplete information. In theory, Soviet/Russian leaders could take more time to deliberate knowing that there is a failsafe guaranteeing retaliation if the United States succeeded in a decapitation strike. The cost, however, is a system that risks easing pathways to nuclear annihilation in the event of an accident.

Allowing autonomous systems to participate in nuclear launch decisions risks degrading stability and increasing the dangers of nuclear accidents. The Stanislav Petrov incident is an illustrative example of the dangers of automation in nuclear decision-making. In 1983, a Soviet early warning system indicated that the United States had launched several intercontinental ballistic missiles. Lieutenant Colonel Stanislav Petrov, the duty officer at the time, suspected that the system was malfunctioning because the number of missiles launched was suspiciously low and the missiles were not picked up by early warning radars. Petrov reported it (correctly) as a malfunction instead of an attack. AI and autonomous systems often lack the contextual understanding that humans have and that Petrov used to recognize that the reported missile launch was a false alarm. Without human judgment at critical stages of nuclear operations, automated systems could make mistakes or elevate false alarms, heightening nuclear risk.

Moreover, merely having humans in the loop will not be enough to ensure effective human decision-making. Human operators frequently fall victim to automation bias, a condition in which humans overtrust automation and surrender their judgment to machines. Accidents with self-driving cars demonstrate the dangers of humans overtrusting automation, and military personnel are not immune to this phenomenon. To ensure humans remain cognitively engaged in their decision-making, militaries will need to take into account not only the automation itself but also human psychology and human-machine interfaces.

More broadly, when designing human-machine systems, it is essential to consciously determine the appropriate roles for humans and machines. Machines are often better at precision and speed, while humans are often better at understanding the broader context and applying judgment. Too often, human operators are left to fill in the gaps for what automation can't do, acting as backups or failsafes for the edge cases that autonomous systems can't handle. But this model often fails to take into account the realities of human psychology. Even if human operators don't fall victim to automation bias, it is not realistic to assume that a person can sit passively watching a machine perform a task for hours on end, whether a self-driving car or a military weapon system, and then suddenly and correctly identify a problem when the automation is not performing and leap into action to take control. Human psychology doesn't work that way. And tragic accidents with complex highly automated systems, such as the Air France 447 crash in 2009 and the 737 MAX crashes in 2018 and 2019, demonstrate the importance of taking into account the dynamic interplay between automation and human operators.

The U.S. military has also suffered tragic accidents with automated systems, even when humans are in the loop. In 2003, U.S. Army Patriot air and missile defense systems shot down two friendly aircraft during the opening phases of the Iraq war. Humans were in the loop for both incidents. Yet a complex mix of human and technical failures meant that human operators did not fully understand the complex, highly automated systems they were in charge of and were not effectively in control.

The military will need to establish guidance to inform system design, operator training, doctrine, and operational procedures to ensure that humans in the loop aren't merely unthinking cogs in a machine but actually exercise human judgment. Issuing this concrete guidance for weapons developers and operators is most critical in the nuclear domain, where the consequences of an accident could be grave.

Clarifying Department of Defense Guidance

Recent policies and statements on the role of autonomy and AI in nuclear operations are an important first step in establishing this much-needed guidance, but additional clarification is needed. The 2022 Nuclear Posture Review states: "In all cases, the United States will maintain a human 'in the loop' for all actions critical to informing and executing decisions by the President to initiate and terminate nuclear weapon employment." The United Kingdom adopted a similar policy in 2022, stating in their Defence Artificial Intelligence Strategy: "We will ensure that regardless of any use of AI in our strategic systems human political control of our nuclear weapons is maintained at all times."

As the first official policies on AI in nuclear command and control, these are landmark statements. Senior U.S. military officers had previously emphasized the importance of human control over nuclear weapons, including statements by Lt. Gen. Jack Shanahan, then-director of the Joint Artificial Intelligence Center in 2019. Official policy statements are more significant, however, in signaling to audiences both internal and external to the military the importance of keeping humans firmly in charge of all nuclear use decisions. These high-level statements nevertheless leave many open questions about implementation.

The next step for the Department of Defense is to translate what the high-level principle of human in the loop means for nuclear systems, doctrine, and training. Key questions include: Which actions are critical to informing and executing decisions by the president? Do those only consist of actions immediately surrounding the president, or do they also include actions further down the chain of command before and after a presidential decision? For example, would it be acceptable for a human to deliver an algorithm-based recommendation to the president to carry out a nuclear attack? Or does a human need to be involved in understanding the data and rendering their own human judgment?

The U.S. military already uses AI to process information, such as satellite images and drone video feeds. Presumably, AI would also be used to support intelligence analysis that could support decisions about nuclear use. Under what circumstances is AI appropriate and beneficial to nuclear stability? Are some applications and ways of using AI more valuable than others?

When AI is used, what safeguards should be put in place to guard against mistakes, malfunctions, or spoofing of AI systems? For example, the United States currently employs a dual phenomenology mechanism to ensure that a potential missile attack is confirmed by two independent sensing methods, such as satellites and ground-based radars. Should the United States adopt a dual algorithm approach to any use of AI in nuclear operations, ensuring that there are two independent AI systems trained on different data sets with different algorithms as a safeguard against spoofing attacks or unreliable AI systems?
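The cross-checking idea in that question can be illustrated with a minimal, purely hypothetical sketch. Nothing here reflects any real system; the class and function names are invented for illustration. It assumes two readings from independent sources and shows only the agreement logic, with every outcome routed to people and none to an automated action.

```python
from dataclasses import dataclass
from enum import Enum


class Assessment(Enum):
    NO_DETECTION = "no detection"
    DISAGREEMENT = "sources disagree: flag for human investigation"
    CORROBORATED = "sources agree: escalate to human decision-makers"


@dataclass(frozen=True)
class Reading:
    source: str     # hypothetical label for an independent sensor or algorithm
    detected: bool  # whether this source reports an event


def cross_check(a: Reading, b: Reading) -> Assessment:
    """Compare two independent readings; a single source is never sufficient."""
    if a.source == b.source:
        # Two readings from the same source provide no independent confirmation.
        raise ValueError("readings must come from independent sources")
    if a.detected and b.detected:
        return Assessment.CORROBORATED
    if a.detected or b.detected:
        return Assessment.DISAGREEMENT
    return Assessment.NO_DETECTION


if __name__ == "__main__":
    result = cross_check(Reading("algorithm_a", True), Reading("algorithm_b", False))
    print(result.value)
```

Even the corroborated branch only escalates to a human. The sketch also leaves open the harder question raised in the text, namely how independent two algorithms really are if their training data or design assumptions overlap.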

When AI systems are used to process information, how should that information be presented to human operators? For example, if the military used an algorithm trained to detect signs of a missile being fueled, that information could be interpreted differently by humans if the AI system reported "fueling" versus "preparing to launch." "Fueling" is a more precise and accurate description of what the AI system is actually detecting and might lead a human analyst to seek more information, whereas "preparing to launch" is a conclusion that might or might not be appropriate depending on the broader context.

When algorithmic recommendation systems are used, how much of the underlying data should humans have to directly review? Is it sufficient for human operators to only see the algorithm's conclusion, or should they also have access to the raw data that supports the algorithm's recommendation?

Finally, what degree of engagement is expected from a human in the loop? Is the human merely there as a failsafe in case the AI malfunctions? Or must the human be engaged in the process of analyzing information, generating courses of action, and making recommendations? Are some of these steps more important than others for human involvement?

These are critical questions that the United States will need to address as it seeks to harness the benefits of AI in nuclear operations while meeting the human in the loop policy. The sooner the Department of Defense can clarify answers to these questions, the more that it can accelerate AI adoption in ways that are trustworthy and meet the necessary reliability standards for nuclear operations. Nor does clarifying these questions overly constrain how the United States approaches AI. Guidance can always be changed over time as the technology evolves. But a lack of clear guidance risks forgoing valuable opportunities to use AI or, even worse, adopting AI in ways that might undermine nuclear surety and deterrence.

Dead Hand Systems

In clarifying its human-in-the-loop policy, the United States should make a firm commitment to reject dead hand nuclear launch systems or a system with a standing order to launch that incorporates algorithmic components. Dead hand systems akin to Russia's Perimeter would appear to be prohibited by current Department of Defense policy. However, the United States should explicitly state that it will not build such systems given their risk.

Despite their danger, some U.S. analysts have suggested that the United States should adopt a dead hand system to respond to emerging technologies such as AI, hypersonics, and advanced cruise missiles. There are safer methods for responding to these threats, however. Rather than gambling humanity's future on an algorithm, the United States should strengthen its second-strike deterrent in response to new threats.

Some members of the U.S. Congress have even expressed a desire for writing this requirement into law. In April 2023, a bipartisan group of representatives introduced the Block Nuclear Launch by Autonomous Artificial Intelligence Act, which would prohibit funding for any system that launches nuclear weapons without meaningful human control. There is precedent for a legal requirement to maintain a human in the loop for strategic systems. In the 1980s, during development of the Strategic Defense Initiative (also known as Star Wars), Congress passed a law requiring affirmative human decision at an appropriate level of authority for strategic missile defense systems. This legislation could serve as a blueprint for a similar legislative requirement for nuclear use. One benefit of a legal requirement is that it ensures that such an important policy could not be overturned by a future administration or Pentagon leadership that is more risk-accepting without Congressional authorization.

Nuclear Weapons and Uncrewed Vehicles

The United States should similarly clarify its policy for nuclear weapons on uncrewed vehicles. The United States is producing a new nuclear-capable strategic bomber, the B-21, that will be able to perform uncrewed missions in the future, and is developing large undersea uncrewed vehicles that could carry weapons payloads. U.S. military officers have expressed strong reluctance to place nuclear weapons aboard uncrewed platforms. In 2016, then-Commander of Air Force Global Strike Command Gen. Robin Rand noted that the B-21 would always be crewed when carrying nuclear weapons: "If you had to pin me down, I like the man in the loop; the pilot, the woman in the loop, very much, particularly as we do the dual-capable mission with nuclear weapons." General Rand's sentiment may be shared among senior military officers, but it is not official policy. The United States should adopt an official policy that nuclear weapons will not be placed aboard recoverable uncrewed platforms. Establishing this policy could help provide guidance to weapons developers and the services about the appropriate role for uncrewed platforms in nuclear operations as the Department of Defense fields larger uncrewed and optionally crewed platforms.

Nuclear weapons have long been placed on uncrewed delivery vehicles, such as ballistic and cruise missiles, but placing nuclear weapons on a recoverable uncrewed platform such as a bomber is fundamentally different. A human decision to launch a nuclear missile is a decision to carry out a nuclear strike. Humans could send a recoverable, two-way uncrewed platform, such as a drone bomber or undersea autonomous vehicle, out on patrol. In that case, the human decision to launch the nuclear-armed drone would not yet be a decision to carry out a nuclear strike. Instead, the drone could be sent on patrol as an escalation signal or to preposition in case of a later decision to launch a nuclear attack. Doing so would put enormous faith in the drone's communications links and on-board automation, both of which may be unreliable.

The U.S. military has lost control of drones before. In 2017, a small tactical Army drone flew over 600 miles from southern Arizona to Denver after Army operators lost communications. In 2011, a highly sensitive U.S. RQ-170 stealth drone ended up in Iranian hands after U.S. operators lost contact with it over Afghanistan. Losing control of a nuclear-armed drone could cause nuclear weapons to fall into the wrong hands or, in the worst case, escalate a nuclear crisis. The only way to maintain nuclear surety is direct, physical human control over nuclear weapons up until the point of a decision to carry out a nuclear strike.

While the U.S. military would likely be extremely reluctant to place nuclear weapons onboard a drone aircraft or undersea vehicle, Russia is already developing such a system. The Poseidon, or Status-6, undersea autonomous uncrewed vehicle is reportedly intended as a second- or third-strike weapon to deliver a nuclear attack against the United States. How Russia intends to use the weapon is unclear and could evolve over time, but an uncrewed platform like the Poseidon could in principle be sent on patrol, risking dangerous accidents. Other nuclear powers could see value in nuclear-armed drone aircraft or undersea vehicles as these technologies mature.

The United States should build on its current momentum in shaping global norms on military AI use and work with other nations to clarify the dangers of nuclear-armed drones. As a first step, the U.S. Defense Department should clearly state as a matter of official policy that it will not place nuclear weapons on two-way, recoverable uncrewed platforms, such as bombers or undersea vehicles. The United States has at times foresworn dangerous weapons in other areas, such as debris-causing antisatellite weapons, and publicly articulated their dangers. Similarly explaining the dangers of nuclear-armed drones could help shape the behavior of other nuclear powers, potentially forestalling their adoption.

Conclusion

It is imperative that nuclear powers approach the integration of AI and autonomy in their nuclear operations thoughtfully and deliberately. Some applications, such as using AI to help reduce the risk of a surprise attack, could improve stability. Other applications, such as dead hand systems, could be dangerous and destabilizing. Russia's Perimeter and Poseidon systems demonstrate that other nations might be willing to take risks with automation and autonomy that U.S. leaders would see as irresponsible. It is essential for the United States to build on its current momentum to clarify its own policies and work with other nuclear-armed states to seek international agreement on responsible guardrails for AI in nuclear operations. Rumors of a U.S.-Chinese agreement on AI in nuclear command and control at the meeting between President Joseph Biden and General Secretary Xi Jinping offer a tantalizing hint of the possibilities for nuclear powers to come together to guard against the risks of AI integrated into humanity's most dangerous weapons. The United States should seize this moment and not let this opportunity pass to build a safer, more stable future.

Michael Depp is a research associate with the AI safety and stability project at the Center for a New American Security (CNAS).

Paul Scharre is the executive vice president and director of studies at CNAS and the author of Four Battlegrounds: Power in the Age of Artificial Intelligence.

Image: U.S. Air Force photo by Senior Airman Jason Wiese

Read the rest here:
Artificial Intelligence and Nuclear Stability - War On The Rocks

Test Yourself: Which Faces Were Made by A.I.? – The New York Times

Tools powered by artificial intelligence can create lifelike images of people who do not exist.

See if you can identify which of these images are real people and which are A.I.-generated.

Ever since the public release of tools like Dall-E and Midjourney in the past couple of years, the A.I.-generated images they've produced have stoked confusion about breaking news, fashion trends and Taylor Swift.

Distinguishing a real face from an A.I.-generated one has proved especially confounding.

Research published across multiple studies found that faces of white people created by A.I. systems were perceived as more realistic than genuine photographs of white people, a phenomenon called hyper-realism.

Researchers believe A.I. tools excel at producing hyper-realistic faces because they were trained on tens of thousands of images of real people. Those training datasets contained images of mostly white people, resulting in hyper-realistic white faces. (The over-reliance on images of white people to train A.I. is a known problem in the tech industry.)

The confusion among participants was less apparent among nonwhite faces, researchers found.

Participants were also asked to indicate how sure they were in their selections, and researchers found that higher confidence correlated with a higher chance of being wrong.

"We were very surprised to see the level of over-confidence that was coming through," said Dr. Amy Dawel, an associate professor at Australian National University, who was an author on two of the studies.

"It points to the thinking styles that make us more vulnerable on the internet and more vulnerable to misinformation," she added.

The idea that A.I.-generated faces could be deemed more authentic than actual people startled experts like Dr. Dawel, who fear that digital fakes could help the spread of false and misleading messages online.

A.I. systems had been capable of producing photorealistic faces for years, though there were typically telltale signs that the images were not real. A.I. systems struggled to create ears that looked like mirror images of each other, for example, or eyes that looked in the same direction.

But as the systems have advanced, the tools have become better at creating faces.

The hyper-realistic faces used in the studies tended to be less distinctive, researchers said, and hewed so closely to average proportions that they failed to arouse suspicion among the participants. And when participants looked at real pictures of people, they seemed to fixate on features that drifted from average proportions, such as a misshapen ear or larger-than-average nose, considering them a sign of A.I. involvement.

The images in the study came from StyleGAN2, an image model trained on a public repository of photographs containing 69 percent white faces.

Study participants said they relied on a few features to make their decisions, including how proportional the faces were, the appearance of skin, wrinkles, and facial features like eyes.

Read the original post:
Test Yourself: Which Faces Were Made by A.I.? - The New York Times

Quantitative gait analysis and prediction using artificial intelligence for patients with gait disorders | Scientific Reports – Nature.com

Data acquisition

This study was carried out in accordance with the tenets of the Declaration of Helsinki and with the approval of the Ethics Committee of the Brest hospital (CHRU), France. Patients had also given signed informed consent. Our work was conducted between 2021 and 2022. Data collected between June 2006 and June 2021 from 734 patients (115 adults and 619 children) who had undergone clinical 3D gait analysis were used. Patient identities were protected in accordance with medical secrecy and patient confidentiality. All data were recorded using the same motion analysis system (Vicon MX, Oxford Metrics, UK) and four force platforms (Advanced Mechanical Technology, Inc., Watertown, MA, USA) in the same motion laboratory (CHU Brest) between 2006 and 2022. The data collected by the 15 infrared cameras (sampling rate of 100 or 120 Hz) were synchronized with the ground reaction forces recorded by the force platforms (1000 Hz or 1200 Hz). The 16 markers were placed according to the protocol by Kadaba et al. [11]. Marker trajectories and ground reaction forces were dual-pass filtered with a low-pass Butterworth filter at a cut-off frequency of 6 Hz. After an initial calibration in the standing position, all patients were asked to walk at a self-selected speed along a 10 m walkway.

Gait kinematics were processed using the Vicon Plug-in Gait model. Kinematics were time-normalized to stride duration, from 0 to 100%, from initial contact (IC) to the next IC of the ipsilateral foot. Nine gait joint angles (kinematic gait variables) were used: anteversion/retroversion of the pelvis, rotation of the pelvis, pelvic tilt, flexion/extension of the hip, abduction/adduction of the hip, internal/external rotation of the hip, flexion/extension of the knee, plantar/dorsiflexion of the ankle, and the foot's angle of progression. As a result, a gait cycle yielded \(101 \times 9\) measurements. Let \(E_{p,d}\) denote the gait session of patient p at datetime d. It can be written as follows:

$$\begin{aligned} E_{p,d} = \left\{ {C_{E_{p,d}}}^{1}, {C_{E_{p,d}}}^{2}, \ldots , {C_{E_{p,d}}}^{K} \right\} \end{aligned}$$

(1)

where \({C_{E_{p,d}}}^{k}\) is the k-th gait cycle of a gait session \(E_{p,d}\) and K the total number of gait cycles. Let \(c_{t,n}^{E_{p,d}^{k}}\) denote the value of gait cycle \({C_{E_{p,d}}}^{k}\) at time step t and joint angle n. To keep notation simple, \(c_{t,n}^{E_{p,d}^{k}}\) is referred to as \(c_{t,n}\) in what follows. \({C_{E_{p,d}}}^{k}\) can simply be represented as a matrix of 101 rows and 9 columns, as follows:

$$\begin{aligned} {C_{E_{p,d}}}^{k} = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,9} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,9} \\ \vdots & & & \\ c_{101,1} & c_{101,2} & \cdots & c_{101,9} \end{bmatrix} \end{aligned}$$

(2)

The Gait Profile Score (GPS), a walking behavior score, was computed for each gait cycle from the previously described joint angles [12,13,14]. The GPS is a single index measure that summarizes the overall deviation of kinematic gait data relative to normative data. It can be decomposed to provide Gait Variable Scores (GVS) for nine key component kinematic gait variables, which are presented as a Movement Analysis Profile (MAP). The GVS corresponding to the n-th kinematic variable, \(GVS_n\), is given by [15,16,17]:

$$\begin{aligned} GVS_n = \sqrt{\frac{1}{T}\sum _{t=1}^{T}(c_{t,n} - c_{t,n}^{\textrm{ref}})^{2}} \end{aligned}$$

(3)

where t is a specific point in the gait cycle, T its total number of points (typically equal to 101 [18,19]), \(c_{t,n}\) the value of the kinematic variable n at point t, and \(c_{t,n}^{\textrm{ref}}\) its mean over the reference population (physiological normative data). The GPS is obtained from the GVS scores [15,17] as follows:

$$\begin{aligned} GPS = \sqrt{\frac{1}{N}\sum _{n=1}^{N}GVS_n^{2}} \end{aligned}$$

(4)

where N is the total number of kinematic variables (equal to 9 by definition).
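As a rough illustration of Eqs. (3) and (4), the sketch below computes the GVS for each kinematic variable and the resulting GPS for one gait cycle stored as a list of 101 rows by 9 columns. The function names and the toy reference data are assumptions made for this example; they are not taken from the study's code.

```python
import math


def gait_variable_scores(cycle, reference):
    """Eq. (3): RMS deviation from the reference, one score per kinematic variable.

    `cycle` and `reference` are lists of T rows (time points), each holding
    N joint-angle values. Both names are hypothetical.
    """
    n_points = len(cycle)
    n_vars = len(cycle[0])
    scores = []
    for n in range(n_vars):
        squared_error = sum(
            (cycle[t][n] - reference[t][n]) ** 2 for t in range(n_points)
        )
        scores.append(math.sqrt(squared_error / n_points))
    return scores


def gait_profile_score(gvs):
    """Eq. (4): root mean square of the GVS values."""
    return math.sqrt(sum(score ** 2 for score in gvs) / len(gvs))


# Toy data (not real gait data): a flat reference and a cycle that deviates
# by 3 degrees on the first variable and 4 degrees on the other eight.
T, N = 101, 9
reference = [[0.0] * N for _ in range(T)]
cycle = [[3.0 if n == 0 else 4.0 for n in range(N)] for _ in range(T)]

gvs = gait_variable_scores(cycle, reference)
gps = gait_profile_score(gvs)
```

Because the GPS is a root mean square of the GVS values, a large deviation on a single joint angle raises the overall score more than the same total deviation spread evenly across all nine variables.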

We had a total of 1459 gait sessions from 734 patients (115 adults and 619 children). Each patient had an average of 1.988 gait sessions, with a standard deviation of 1.515. A total of 53,693 gait cycles were collected. Their average number per gait session was 18, with a standard deviation of 6. Neurological conditions, notably cerebral palsy, were the most frequent etiologies, as shown in Fig. 1.

The average patient age at the first gait session was 14 years, with a standard deviation of 16 years. The time delay between the first and last gait session (for the 319 patients with more than one gait session) was 3.92 years on average, with a standard deviation of 3.24 years. Directly consecutive gait sessions were, on average, separated by approximately 740 days, with a standard deviation of 577 days. The shortest (resp. longest) time delay was 4 (resp. 4438) days. We had 1384 pairs of directly consecutive gait sessions belonging to 319 patients (the remaining patients were removed since they had only one gait session). The gait conditions involved were varied: without any equipment, with a cane, with a rollator, with an orthosis, or with a prosthesis. Only pairs of gait sessions without equipment were selected so that both sessions were recorded under the same condition (79% of all available pairs, i.e., 1152). The first gait sessions in these pairs were used for training. Models were fed the gait cycles of these first gait sessions (i.e., 21,167 gait cycles in total).

GPS variation prediction is close enough to a Time Series Classification (TSC) problem that popular TSC architectures can be adopted. Consecutive gait session pairs \((E_{p,d}, E_{p,d+\Delta d})\) were considered. For each gait cycle \({C_{E_{p,d}}}^{k}\) of the current gait session \(E_{p,d}\), a GPS variation \(\Delta GPS\) was computed using:

$$\begin{aligned} \Delta GPS({C_{E_{p,d}}}^{k}) = GPS_{avg}(E_{p,d+\Delta d}) - GPS({C_{E_{p,d}}}^{k}) \end{aligned}$$

(5)

where \(GPS_{avg}(E_{p,d+\Delta d})\) is the average GPS per cycle of \(E_{p,d+\Delta d}\) and \(GPS({C_{E_{p,d}}}^{k})\) the GPS of the current gait cycle \({C_{E_{p,d}}}^{k}\). The average GPS per cycle \(GPS_{avg}(E_{p,d})\) of a gait session \(E_{p,d}\) is simply equal to:

$$\begin{aligned} GPS_{avg}(E_{p,d}) = \frac{\sum _{k=1}^{K} GPS({C_{E_{p,d}}}^{k})}{K} \end{aligned}$$

(6)

\(\Delta GPS\) was converted into a binary label. Either it is negative, in which case the patient's gait improves (class 1), or it is positive, in which case the patient's gait worsens (class 0). The metric used is the Area Under the Curve (AUC).
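The labeling defined by Eqs. (5) and (6) can be sketched as follows. This is only an illustration: the function names and the toy GPS values are hypothetical, and the text does not say how a variation of exactly zero is handled, so assigning it to class 0 here is an assumption.

```python
def average_gps(cycle_scores):
    """Eq. (6): mean GPS over all gait cycles of one session."""
    return sum(cycle_scores) / len(cycle_scores)


def delta_gps_labels(current_cycle_gps, next_session_cycle_gps):
    """Eq. (5) followed by the binary labeling, for each current-session cycle.

    Returns a list of (delta, label) pairs, where label 1 means the gait
    improves (negative delta) and label 0 means it worsens.
    """
    target = average_gps(next_session_cycle_gps)
    labeled = []
    for gps in current_cycle_gps:
        delta = target - gps
        # Assumption: a delta of exactly zero falls into class 0.
        label = 1 if delta < 0 else 0
        labeled.append((delta, label))
    return labeled


# Toy values (not real data): two cycles in the current session and
# two cycles in the following session.
pairs = delta_gps_labels([8.0, 5.0], [6.0, 7.0])
```

With these toy values the next-session average is 6.5, so the first cycle (GPS 8.0) is labeled as improving and the second (GPS 5.0) as worsening.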

The distribution of patients between training, validation, and test groups is provided in Table 1. Such a split put 73%, 12%, and 14% of total gait cycles within the training, validation, and test groups, respectively.

To be exhaustive, one MLP, one recurrent neural network (LSTM), one hybrid architecture (Encoder), several CNN architectures (FCN, ResNet, t-LeNet), and a one-dimensional Transformer [20] were included. The MLP and LSTM were designed and developed from scratch. Their hyper-parameters were optimized manually. FCN, ResNet, Encoder, and t-LeNet are among the most effective end-to-end discriminative architectures in the TSC state of the art [10]. These methods were also compared to the Transformer, a more recent and popular architecture. Unlike the LSTM, the Transformer does not suffer from long-range context dependency issues [21]. In addition, it is notable for requiring less training. The Adam optimizer [22] and binary cross-entropy loss [23] were employed.

For the MLP, gait cycles were flattened so that the input length was equal to 909 values (101 time steps times 9 joint angles). The number of neurons was the same across all the fully connected layers. Many values of this number were tested to find the best structure for our task. In the same way, the number of layers was optimized. The corresponding architecture is shown in Fig. 2.

MLP architecture for prediction.

LSTM layers were stacked, and a dropout layer was added before the last layer to avoid overfitting. The corresponding architecture is shown in Fig. 3.

LSTM architecture for prediction.

For FCN, ResNet, Encoder and t-LeNet, the architectures proposed in Ref. [10] were considered. They are shown in Figs. 4, 5, 6 and 7, respectively. We followed an existing implementation [24] to set up the Transformer.

FCN architecture for prediction.

ResNet architecture for prediction.

Encoder architecture for prediction.

t-LeNet architecture for prediction.

Different data augmentation techniques were tested as a pre-processing step to avoid overfitting: jittering, scaling, window warping, permutation, and window slicing. Their hyperparameters were empirically optimized for each model. These are among the most frequently used techniques in the TSC literature, particularly for sensor data [10].
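Two of the augmentation techniques listed above, jittering and scaling, can be sketched as follows, using their usual definitions from the time series literature: jittering adds Gaussian noise to every sample, and scaling multiplies the series by a random factor close to 1. The function names, the sigma values, and the choice of one scaling factor per joint angle are assumptions for illustration; the study's actual settings are not given in the text.

```python
import random


def jitter(cycle, sigma, rng):
    """Add independent zero-mean Gaussian noise to every value of the cycle."""
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in cycle]


def scale(cycle, sigma, rng):
    """Multiply each joint-angle column by one random factor drawn around 1.

    Assumption: one factor per column, shared by all time points.
    """
    n_vars = len(cycle[0])
    factors = [rng.gauss(1.0, sigma) for _ in range(n_vars)]
    return [[value * factors[n] for n, value in enumerate(row)] for row in cycle]


# Toy gait cycle of 101 time points by 9 variables (not real data).
rng = random.Random(0)
cycle = [[float(t + n) for n in range(9)] for t in range(101)]

jittered = jitter(cycle, sigma=0.5, rng=rng)
scaled = scale(cycle, sigma=0.1, rng=rng)
```

Both functions return a new cycle of the same shape, so augmented copies can be added to the training set without changing the model input size.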

Image-based time series representation initiated a new branch of deep learning approaches that consider image transformation as an innovative pre-processing of feature engineering [25]. In an attempt to reveal features and patterns less visible in the one-dimensional sequence of the original time series, many transformation methods were developed to encode time series as input images.

In our study, sensor modalities are transformed to the visual domain using 2D FFT in order to utilize a set of pre-trained CNN models for transfer learning on the converted imagery data. The full workflow of our framework is represented in Fig. 8.

Proposed \(\Delta GPS\) prediction workflow for the image-based approach.

2D FFT is used to work in the frequency domain or Fourier domain because it efficiently extracts features based on the frequency of each time step in the time series. It can be defined as:

$$F(u,v) = \frac{1}{T \cdot N}\sum\limits_{t = 0}^{T} \sum\limits_{n = 0}^{N} c_{t,n} \exp \left( - j2\pi \left( \frac{ut}{T} + \frac{vn}{N} \right) \right)$$

(7)

where F(u,v) is the direct Fourier transform of the gait cycle. It is a complex function that shows the phase and magnitude of the signal in the frequency domain. u and v are the frequency space coordinates. The magnitude of the 2D FFT |F(u,v)|, also known as the spectrum, is a two-dimensional signal that represents frequency information. Because the 2D FFT has translation and rotation properties, the zero-frequency component can be moved to the center of |F(u,v)| without losing any information, making the spectrum image more visible. The centralized FFT spectra were computed and fed to the proposed deep learning models. A centralized FFT spectrum for a given gait cycle is represented in Fig. 9.
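The centralized spectrum can be illustrated with a direct evaluation of the 2D discrete Fourier transform. The sketch below assumes the standard convention in which the indices run from 0 to T-1 and from 0 to N-1, and it uses a plain nested-loop transform for readability; in practice an FFT routine such as NumPy's `numpy.fft.fft2` followed by `numpy.fft.fftshift` would normally be used instead. The function names are hypothetical.

```python
import cmath


def dft2(x):
    """Direct 2D DFT of a T x N matrix, normalized by 1 / (T * N) as in Eq. (7)."""
    T, N = len(x), len(x[0])
    out = [[0j] * N for _ in range(T)]
    for u in range(T):
        for v in range(N):
            acc = 0j
            for t in range(T):
                for n in range(N):
                    angle = -2.0 * cmath.pi * (u * t / T + v * n / N)
                    acc += x[t][n] * cmath.exp(1j * angle)
            out[u][v] = acc / (T * N)
    return out


def centered_spectrum(x):
    """Magnitude |F(u,v)| with the zero-frequency term moved to the center."""
    F = dft2(x)
    T, N = len(F), len(F[0])
    magnitude = [[abs(F[u][v]) for v in range(N)] for u in range(T)]
    # Circular shift by half the size along each axis, so that the
    # zero-frequency term lands at index (T // 2, N // 2).
    return [
        [magnitude[(u - T // 2) % T][(v - N // 2) % N] for v in range(N)]
        for u in range(T)
    ]


# Toy input (not gait data): a constant 4 x 3 matrix has all of its energy
# in the zero-frequency term.
constant = [[2.0] * 3 for _ in range(4)]
spectrum = centered_spectrum(constant)
```

For the constant toy input, the only non-zero entry of the centered spectrum sits at the middle position (row 2, column 1), which is what moving the zero-frequency component to the center means.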

2D FFT for a given gait cycle. (a) The gait cycle; (b) FFT spectrum of the gait cycle; (c) Centralized FFT spectrum of the gait cycle.

The Timm library's [26] pre-trained VGG16, ResNet34, EfficientNet_b0, and the Vision Transformer vit_base_patch16_224 were investigated. They were pre-trained on a large collection of images, in a supervised fashion. For the Transformer, the pre-training was at a resolution of \(224 \times 224\) pixels. Its input images were considered as a sequence of fixed-size patches (resolution \(16 \times 16\)), which were linearly embedded.

Converting our grayscale images to RGB images was not necessary because Timm's implementations support any number of input channels. The model's minimum input size for VGG16 is \(32 \times 32\). The image's width dimension (N) equals 9, which is less than 32. In order to reach the minimum required size, 2D FFT images were repeated 4 times along this width dimension. Transfer learning with fine-tuning was employed. A final fully connected layer with a single neuron was used. All convolutional blocks were trainable, as were the top layers.
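The width adjustment amounts to simple arithmetic: repeating a 9-column image 4 times gives 36 columns, which exceeds the 32-pixel minimum. The sketch below assumes that "repeated" means the whole image is tiled side by side; the text does not specify whether tiling or element-wise repetition was used, and the function name is hypothetical.

```python
def tile_width(image, repeats):
    """Place `repeats` copies of the image side by side along the width."""
    return [list(row) * repeats for row in image]


MIN_SIZE = 32  # minimum input side stated for VGG16 in the text

# Toy spectrum image of 101 rows by 9 columns (not real data).
spectrum_image = [[float(n) for n in range(9)] for _ in range(101)]
tiled = tile_width(spectrum_image, 4)

height, width = len(tiled), len(tiled[0])  # 101 rows, 9 * 4 = 36 columns
```

Three repetitions would give only 27 columns, so 4 is the smallest whole number of copies that reaches the 32-pixel minimum.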

The pre-trained Timm models are deep and sophisticated, with many layers. For this reason, a CNN model with fewer parameters was also designed from scratch. The number of two-dimensional convolutional layers was a hyper-parameter to optimize in a finite range of values {1, 2, 3, 4, 5}. After the convolutional block, dropout was applied. Following that, two-dimensional max-pooling (MaxPooling2D) and batch normalization were used. The flattened output of the batch normalization was then fed to a dense layer whose number of neurons was tuned. In order to predict the \(\Delta GPS\), our model had a dense output layer with a single neuron. The corresponding architecture is shown in Fig. 10.

Tailored 2D CNN for prediction.

The following are all of the architecture hyper-parameters to tune: the number of convolutional layers (num_layers), the number of filters for each convolution layer (num_filters), the kernel size of each convolution layer (kernel_size), the dropout rate (dropout), the pooling size of the MaxPooling2D (pool_size), the number of neurons in the dense layer (units), and the learning rate (lr). Five models with a varying number of convolutional layers (from 1 to 5) were tested. For each of them, the rest of the hyper-parameters were tuned using KerasTuner [9] to maximize the validation AUC.

Originally posted here:
Quantitative gait analysis and prediction using artificial intelligence for patients with gait disorders | Scientific Reports - Nature.com