Research

Anonymous Communication & Censorship

Authors: Neil Perry , Bruce Spang , Saba Eskandarian , Dan Boneh

Summary

Messaging systems built on mesh networks consisting of smartphones communicating over Bluetooth have been used by protesters around the world after governments have disrupted Internet connectivity. Unfortunately, existing systems have been shown to be insecure; most concerningly by not adequately hiding metadata. This is further complicated by the fact that wireless communication such as Bluetooth is inherently a broadcasting medium. In this paper, we present a new threat model that captures the security requirements of protesters in this setting. We then provide a solution that satisfies the required security properties, hides all relevant metadata, scales to moderately sized protests, and supports group messaging. This is achieved by broadcasting all messages in a way that limits the overhead of duplicate messages, ensuring that ciphertexts do not leak metadata, and limiting what can be learned by observing user behavior. We also build a model of our system and numerically evaluate it to support our claims and analyze how many users it supports. Finally, we discuss further extensions that remove potential bottlenecks in scaling and support substantially more users.

Robust Steganography from Large Language Models

Authors: Neil Perry , Sanket Gupte , Nishant Pitta , Lior Rotem

Summary

Recent steganographic schemes, starting with Meteor (CCS'21), rely on leveraging large language models (LLMs) to resolve a historically-challenging task of disguising covert communication as 'innocent-looking' natural-language communication. However, existing methods are vulnerable to 're-randomization attacks,' where slight changes to the communicated text, that might go unnoticed, completely destroy any hidden message. This is also a vulnerability in more traditional encryption-based stegosystems, where adversaries can modify the randomness of an encryption scheme to destroy the hidden message while preserving an acceptable covertext to ordinary users. In this work, we study the problem of robust steganography. We introduce formal definitions of weak and strong robust LLM-based steganography, corresponding to two threat models in which natural language serves as a covertext channel resistant to realistic re-randomization attacks. We then propose two constructions satisfying these notions. We design and implement our steganographic schemes that embed arbitrary secret messages into natural language text generated by LLMs, ensuring recoverability even under adversarial paraphrasing and rewording attacks. To support further research and real-world deployment, we release our implementation and datasets for public use.

Arms Control

OP55: Everything Counts: Building a Control Regime for Nonstrategic Nuclear Warheads in Europe

Authors: Sami Shihadeh , Miles A. Pomper , William Alberque , Marshall L. Brown Jr. , William M. Moon , Nikolai Sokov , Rose Gottemoeller , Neil Perry , Dan Zhukov , Bill Delaney , Ferenc Dalnoki-Veress , George Moore

Summary

Before the Russian invasion of Ukraine, the Biden administration insisted in arms control talks with Russia that a follow-on agreement to the New Strategic Arms Reduction Treaty (New START) should cover all nuclear weapons and that such an agreement should focus on the nuclear warheads themselves. This would represent a significant change from previous agreements, which focused on delivery vehicles, such as missiles. The United States has been particularly interested in potential limits on nonstrategic nuclear warheads (NSNW). Such weapons have never been subject to an arms control agreement. Because Russia possesses an advantage in the number of such weapons, the US Senate has insisted that negotiators include them in a future agreement, making their inclusion necessary if such an accord is to win Senate approval and ultimately be ratified by Washington. In the wake of Russian nuclear threats in the Ukraine conflict, such demands can only be expected to grow if and when US and Russian negotiators return to the negotiating table. This work outlines such a Control Regime for Nonstrategic Nuclear Warheads in Europe.

Cryptographic Data Exchange for Nuclear Warheads

Authors: Neil Perry , Daniil Zhukov

Summary

Nuclear arms control treaties have historically focused on strategic nuclear delivery systems, indirectly restricting strategic nuclear warhead numbers and leaving nonstrategic nuclear warheads (NSNWs) outside formal verification frameworks. This paper presents a cryptographic protocol for secure and verifiable warhead tracking, addressing challenges in nuclear warhead verification without requiring intrusive physical inspections. Our system leverages commitment schemes and zero-knowledge succinct non-interactive arguments of knowledge (zkSNARKs) to ensure compliance with treaty constraints while preserving the confidentiality of sensitive nuclear warhead data. We propose a cryptographic 'Warhead Passport' tracking system that chains commitments to individual warheads over their life cycle, enabling periodic challenges and real-time verification of treaty compliance. Our implementation follows real-world treaty constraints, integrates U.S. and Russian dual-hash combiners (SHA-family and GOST R 34.11 family) for cryptographic robustness and political constraints, and ensures forward security by preventing retroactive data manipulation. This work builds on policy research from prior arms control studies and provides a practical foundation for implementing secure, auditable NSNW verification mechanisms.

AI Security

Do Users Write More Insecure Code with AI Assistants?

Authors: Neil Perry , Megha Srivastava , Deepak Kumar , Dan Boneh

Summary

We conduct the first large-scale user study examining how users interact with an AI Code assistant to solve a variety of security related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI's codex-davinci-002 model wrote significantly less secure code than those without access. Additionally, participants with access to an AI assistant were more likely to believe they wrote secure code than those without access to the AI assistant. Furthermore, we find that participants who trusted the AI less and engaged more with the language and format of their prompts (e.g. re-phrasing, adjusting temperature) provided code with fewer security vulnerabilities. Finally, in order to better inform the design of future AI-based Code assistants, we provide an in-depth analysis of participants' language and interaction behavior, as well as release our user interface as an instrument to conduct similar studies in the future.

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

Authors: Andy K. Zhang , Neil Perry , Riya Dulepet , et al.

Summary

Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have potential to cause real-world impact. Policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetration testing. Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. We include 40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. Each task includes its own description, starter files, and is initialized in an environment where an agent can execute commands and observe outputs. Since many tasks are beyond the capabilities of existing LM agents, we introduce subtasks for each task, which break down a task into intermediary steps for a more detailed evaluation. To evaluate agent capabilities, we construct a cybersecurity agent and evaluate 8 models: GPT-4o, OpenAI o1-preview, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, and Llama 3.1 405B Instruct. For the top performing models (GPT-4o and Claude 3.5 Sonnet), we further investigate performance across 4 agent scaffolds (structed bash, action-only, pseudoterminal, and web search). Without subtask guidance, agents leveraging Claude 3.5 Sonnet, GPT-4o, OpenAI o1-preview, and Claude 3 Opus successfully solved complete tasks that took human teams up to 11 minutes to solve. In comparison, the most difficult task took human teams 24 hours and 54 minutes to solve.

Technology Policy

The Stanford Emerging Technology Review 2025: A Report on Ten Key Technologies and Their Policy Implications

Authors: SETR Staff & Faculty , Neil Perry

Summary

The Stanford Emerging Technology Review (SETR) is a pivotal initiative by Stanford University, aimed at educating policymakers about transformative technologies. It underscores the dual nature of technological advances: their potential to drive progress and the risks of misuse or stifling innovation. The report emphasizes the convergence of multiple technologies like synthetic biology, materials science, and neuroscience, which are rapidly reshaping society. SETR serves as a comprehensive guide covering ten critical technology areas: artificial intelligence, biotechnology and synthetic biology, cryptography, materials science, neuroscience, nuclear technologies, robotics, semiconductors, space technologies, and sustainable energy technologies, underlining their growing influence on American society. The initiative stresses the need for collaboration among academia, industry, and government to maintain American leadership in science and technology. It advocates for continuous learning and dialogue to harness the promise of emerging technologies effectively, recognizing the importance of understanding and embracing these advancements for collective benefit.

The Stanford Emerging Technology Review 2023: A Report on Ten Key Technologies and Their Policy Implications

Authors: SETR Staff & Faculty , Neil Perry

Summary

Research

Anonymous Communication & Censorship#

Arms Control#

AI Security#

Technology Policy#

Anonymous Communication & Censorship

Arms Control

AI Security

Technology Policy