Facepalm: For some, AI assistants are like good friends whom we can turn to with any sensitive or embarrassing question. It seems safe, after all, because our communication with them is encrypted. However, researchers in Israel have discovered a way for hackers to circumvent that protection.
Like any good assistant, your AI knows a lot about you. It knows where you live and where you work. It probably knows what foods you like and what you are planning to do this weekend. If you are particularly chatty, it may even know if you are considering a divorce or contemplating bankruptcy.
That's why an attack devised by researchers that can read encrypted responses from AI assistants over the web is alarming. The researchers, from the Offensive AI Research Lab in Israel, identified an exploitable side channel present in most major AI assistants that use streaming to interact with large language models, with the exception of Google Gemini. They then demonstrated the attack on encrypted network traffic from OpenAI's ChatGPT-4 and Microsoft's Copilot.
"[W]e were able to accurately reconstruct 29% of an AI assistant's responses and successfully infer the topic from 55% of them," the researchers wrote in their paper.
The initial point of attack is the token-length side-channel. In natural language processing, the token is the smallest unit of text that carries meaning, the researchers explain. For instance, the sentence "I have an itchy rash" could be tokenized as follows: S = (k1, k2, k3, k4, k5), where the tokens are k1 = I, k2 = have, k3 = an, k4 = itchy, and k5 = rash.
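To make that concrete, here is a minimal Python sketch that splits the example sentence on whitespace and prints each token with its length. Real LLM tokenizers use sub-word schemes such as BPE, so this is only a simplification of how the services in the paper actually tokenize text.

```python
# Simplified tokenization of the example sentence into S = (k1, ..., k5).
# Whitespace splitting stands in for a real sub-word tokenizer.
sentence = "I have an itchy rash"
tokens = sentence.split()  # ['I', 'have', 'an', 'itchy', 'rash']

for i, token in enumerate(tokens, start=1):
    print(f"k{i} = {token!r} (length {len(token)})")
```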
However, tokens represent a significant vulnerability in the way large language model services handle data transmission. Because LLMs generate and send responses as a series of tokens, each token is transmitted from the server to the user as soon as it is generated. While this traffic is encrypted, the size of each packet can reveal the length of the token it carries, potentially allowing attackers on the network to read conversations.
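As a rough illustration, the sketch below assumes each streamed packet carries exactly one token plus a fixed amount of protocol overhead; both the per-token framing and the overhead figure are assumptions made here for illustration, not details taken from the paper. Under that assumption, subtracting the overhead from the observed ciphertext sizes recovers the token lengths directly.

```python
# Hedged sketch of the token-length side channel: if each encrypted packet
# carried one token plus a fixed overhead, packet sizes alone would leak
# the token lengths to a passive observer on the network.
FIXED_OVERHEAD = 100  # hypothetical per-packet overhead, in bytes

def token_lengths_from_packets(packet_sizes, overhead=FIXED_OVERHEAD):
    """Infer plaintext token lengths from observed ciphertext packet sizes."""
    return [size - overhead for size in packet_sizes]

# Packet sizes observed while a five-token reply streams (invented values):
observed = [101, 104, 102, 105, 104]
print(token_lengths_from_packets(observed))  # [1, 4, 2, 5, 4]
```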
Inferring the content of a response from a token-length sequence is challenging because a response can be several sentences long, leaving millions of grammatically correct candidate sentences, the researchers said. To get around this, they (1) used a large language model to translate token-length sequences back into readable text, (2) provided the LLM with inter-sentence context to narrow the search space, and (3) performed a known-plaintext attack by fine-tuning the model on the target model's writing style.
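The toy example below, using invented candidate sentences and simple whitespace tokens, shows why a length sequence alone is ambiguous: more than one plausible sentence can produce the same pattern, which is why the researchers relied on a fine-tuned LLM rather than a simple lookup.

```python
# The observed token-length pattern, e.g. recovered from packet sizes:
target_lengths = [1, 4, 2, 5, 4]

# Invented candidate sentences, for illustration only.
candidates = [
    "I have an itchy rash",   # matches the pattern
    "I need an early taxi",   # also matches -- the ambiguity the LLM must resolve
    "We spoke on the phone",  # does not match
]

def matches(sentence, lengths):
    """Check whether a sentence's whitespace-token lengths fit the pattern."""
    return [len(tok) for tok in sentence.split()] == lengths

for sentence in candidates:
    print(f"{sentence!r} -> {matches(sentence, target_lengths)}")
```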
"To the best of our knowledge, this is the first work that uses generative AI to perform a side-channel attack," they wrote.
The researchers contacted at least one security vendor, Cloudflare, about their work. Since being notified, Cloudflare says it has implemented a mitigation to secure its own inference product, Workers AI, and has added the same protection to its AI Gateway to shield customers' LLMs regardless of where they are running them.
In their paper, the researchers also suggest a mitigation: adding random padding to each message to hide the actual length of the tokens in the stream, thereby complicating attempts to infer information based solely on network packet size.
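A minimal sketch of that idea might look like the following; the padding range and the one-byte length prefix used for framing are illustrative assumptions here, not the paper's exact scheme.

```python
import secrets

MAX_PAD = 32  # hypothetical upper bound on random padding, in bytes

def pad_message(token_bytes: bytes) -> bytes:
    """Append 1..MAX_PAD random bytes; a 1-byte prefix records the pad length."""
    pad_len = secrets.randbelow(MAX_PAD) + 1
    return bytes([pad_len]) + token_bytes + secrets.token_bytes(pad_len)

def unpad_message(padded: bytes) -> bytes:
    """Recover the original token bytes by stripping the recorded padding."""
    pad_len = padded[0]
    return padded[1 : len(padded) - pad_len]

token = "itchy".encode()
wire = pad_message(token)
assert unpad_message(wire) == token
print(len(token), "->", len(wire))  # wire length no longer tracks the token length
```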