AI in War: LLMs ‘Develop Arms-Race Dynamics’

Tom Everill | 5 March 2024


Summary

  • Large Language Models (LLMs) deployed in war games tend to escalate conflicts, displaying arms-race dynamics with no significant de-escalation across the models tested. 

  • The study highlights the importance of cautious AI integration in military contexts, ethical considerations, and the need for design adjustments to mitigate unwanted escalation. 

  • Global AI advancements underscore the urgency for policies on AI use in defence, amidst rapid integration in military operations and private sector innovation. 

  • Ethical and regulatory challenges arise from the dual-use nature of AI technologies and their potential impact on global security and warfare dynamics. 


In January, a study by a team of researchers published on Cornell University’s arXiv research platform found that Large Language Models (LLMs) deployed in war game scenarios tend towards arms-race dynamics and a high likelihood of conflict.  

 

These findings were roughly consistent across the five ‘off-the-shelf’ LLMs tested, all of which illustrated difficult-to-predict escalation patterns. The researchers studied GPT-3.5 and GPT-4 (the previous and current generations of the model underpinning OpenAI’s ChatGPT chatbot), Claude-2.0 by San Francisco-based firm Anthropic, Llama-2-Chat, the chat-tuned variant of Meta’s Llama 2 model, and finally a model known as GPT-4-Base, the underlying GPT-4 model without the safety and instruction fine-tuning applied to the public version.  

 

All models displayed a statistically significant initial escalation in aggression during the simulations, and none exhibited substantial de-escalation. GPT-3.5 showed the steepest escalation increase. The study also cites unpredictable, sudden escalation spikes and the occurrence of ‘high-risk outlier actions’, including the use of nuclear weapons, across models, with GPT-3.5 and Llama-2-Chat particularly prone to them. Furthermore, all models tended to invest in future military capabilities over time, suggesting a bias towards arms-race dynamics despite being free to de-escalate. GPT-4-Base, lacking fine-tuning for safety or instruction-following, exhibited a higher propensity for severe actions, offering a glimpse of the behaviour that sits behind the guardrails most consumers never see.  
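For readers unfamiliar with how such experiments are typically structured, the sketch below shows, in simplified Python, the general shape of a turn-based war game in which LLM agents choose actions and a cumulative escalation score is tracked over time. It is a hypothetical illustration only: the action list, the escalation weights and the query_model stub are invented for clarity and do not reflect the study’s actual code or scoring methodology.

# Hypothetical sketch of a turn-based LLM "war game" loop. All names
# (query_model, ACTIONS, the escalation weights) are illustrative and
# are not taken from the study itself.
import random

# Candidate actions, ordered from de-escalatory to maximally escalatory,
# each mapped to an illustrative escalation weight.
ACTIONS = {
    "open_negotiations": -2,
    "impose_sanctions": 1,
    "cyber_attack": 3,
    "invade_neighbour": 6,
    "launch_nuclear_strike": 10,
}

def query_model(nation: str, history: list[str]) -> str:
    """Stand-in for a real LLM call (e.g. GPT-4, Claude-2, Llama-2-Chat).
    Here it simply picks a random action so the sketch runs end to end."""
    return random.choice(list(ACTIONS))

def run_simulation(nations: list[str], turns: int = 14) -> list[int]:
    """Run a toy simulation and return the cumulative escalation score per turn."""
    history: list[str] = []
    scores: list[int] = []
    total = 0
    for turn in range(turns):
        for nation in nations:
            action = query_model(nation, history)
            history.append(f"Turn {turn}: {nation} chose {action}")
            total += ACTIONS[action]
        scores.append(total)
    return scores

if __name__ == "__main__":
    print(run_simulation(["Purple", "Orange", "Green"]))

In the study’s framing, each simulated nation is played by one of the models listed above, and the pattern of interest is how the escalation trajectory evolves over successive turns rather than any single action.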

The team behind the research provided some recommendations for policymakers: 

  • Cautious Integration: policymakers should proceed cautiously when considering integrating LLMs into military and foreign policy decision-making due to their unpredictability. 

  • Importance of Model Design Choices: the variance observed between models, including the severity of GPT-4-Base’s choices, suggests there is scope for mitigating unwanted behaviour through design. 

  • Need for Further Research: more research into the behaviour of LLMs in high-stakes scenarios is needed, as well as a focus on understanding the differences between LLM and human decision-making. 

AI-generated image on AI and warfare

Artificial Intelligence Revolution?  

This study comes as the global economy begins expanding the infrastructure for an AI revolution that most analysts now believe is underway. The bull case for artificial intelligence is strengthened by chipmaker Nvidia’s surging stock price and earnings that consistently beat expectations, with the company surpassing a market capitalisation of $2 trillion this month, up roughly 65% YTD. Microsoft’s sizeable investment in OpenAI, Google’s rush to release Gemini as a competitor to ChatGPT, Apple’s research on fully offline LLM chatbots, and big bets on AI across industries illustrate the prevalence and degree of momentum behind AI integration. 

 

In the realm of defence, we are already beginning to see the operational use of complex AI systems by the United States Military and others as the global AI defence market nears $9 billion. For example, in 2017, the US Department of Defense (DoD) launched Project Maven, intended to develop AI technology to analyse and interpret vast amounts of video data from drones and other sources, enabling faster and more accurate decision-making and intelligence-gathering for military operations. In September 2022, it was confirmed as operational in Europe.  

 

Furthermore, earlier this month, AUKUS (a trilateral defence agreement between the US, UK and Australia) announced a trial of its joint AI and autonomous capabilities. At the same time, UK Defence Secretary Grant Shapps announced a £1.85bn deal with French defence and aerospace firm Thales for AI and virtual reality tech intended to keep British warships and submarines at sea for longer.  

 

Looking eastward, shares of Chinese tech giant Baidu fell almost 15% intraday on 17 January after rumours emerged of a collaboration with the People’s Liberation Army on LLM testing; if true, this would serve as hard evidence of China’s interest in AI’s military applications. 

OpenAI's ChatGPT logo and login

Mojahid Mottakin/Unsplash


Private Sector, Policymakers, and National Security Objectives 

The integration of AI into the military realm has launched countless debates of varied origin. Long before the mainstreaming of generative AI, Project Maven had already felt the brunt of a major controversy. In 2018, it came to light that one of Google’s newest clients was, in fact, the DoD, and that the relationship between the tech giant and the DoD centred on collaboration on Maven. When Google employees learned of their employer’s involvement in the programme, many protested on ethical grounds, leading to internal disarray and, eventually, to the company cancelling its plan to renew its Maven contract. As the vision of an AI future becomes more tangible, this episode may foreshadow future internal disquiet over Big Tech’s involvement in military affairs. 

 

Given the private sector’s dominance of AI innovation, a debate is also growing over where Western companies can and should export new tech. Perhaps it is easy for defence contractors and weapons manufacturers to draw a line: legally, strategically, morally and practically. However, much of the technology on modern battlefields is dual-use and produced for innocuous reasons by commercial firms. For example, commercial drones made by companies such as China’s DJI (which seemingly does very little to prevent it) have seen ongoing and widespread use on the battlefield in Ukraine since the start of Russia’s full-scale invasion in February 2022. AI products will prove similar, and some are already finding their way to the battlefield; the question now is how companies will navigate this precarious geopolitical environment and whether governments will force their hand.

 

US firm Palantir Technologies, like DJI, has seen its tech used in Ukraine; unlike DJI, however, that was its intention. Palantir CEO Alex Karp’s stance seemingly moves in lockstep with US foreign policy. On a panel at this year’s Munich Security Conference, he argued that ‘we should have very severe restrictions on exporting our enormous advantage to China.’ Moreover, Karp flew to Kyiv in June 2022, offering to open an office there and deploy Palantir’s data and AI software in support of Ukraine’s cyber defence. This is a rare display of synergy between modern commercial US business and national foreign policy objectives. However, as Palantir was initially seeded by the Central Intelligence Agency’s venture capital arm, it is reasonable to suggest it does not represent the views of the average US company on such matters. Nonetheless, as talk of global conflict intensifies, Western companies must have internal conversations about how to align business practices with national security objectives or, at least, how to do so quickly in a crisis. This is particularly crucial for the AI industry, given its immense potential to transform our world. 

 

The findings on LLMs’ proclivities to escalate, in the context of a rapidly developing global AI landscape, highlight the need for cautious integration, ethical considerations, a framework on private sector technology exports, and further research into the impact of LLMs in military decision-making. 

Fleet of fighter planes

UxGun/Unsplash


Implications

  • The tendency of LLMs to escalate conflicts in simulations signals a potential risk to global security and stability should such models be integrated into advanced weapons systems without appropriate safeguards against excessive or unintended escalation.  

  • The variation observed in the behaviour of different models points to an urgent need for a comprehensive regulatory framework governing how LLMs used in high-risk environments are trained. 

  • Policymakers must balance AI innovation, driven by economic and strategic incentives, with the need to prevent its misuse. 

  • Ethical dilemmas are raised by the use of AI in warfare, particularly regarding dual-use technologies and the question of export controls. 

  • The involvement of private-sector firms like Palantir illustrates the complex interplay between business interests and national security objectives. 
