While humorous, most of these clever tricks no longer work because companies continuously patch the chatbots with new defenses. Now, a team of researchers says they’ve trained an AI tool to generate new methods to evade the defenses of other chatbots, as well as create malware to inject into vulnerable systems.

Using a framework they call “Masterkey,” the researchers were able to effectively automate the process of finding new vulnerabilities in Large Language Model (LLM)-based systems like ChatGPT, Microsoft's Bing Chat, and Google Bard. “By manipulating the time-sensitive responses of the chatbots, we are able to understand the intricacies of their implementations, and create a proof-of-concept attack to bypass the defenses in multiple LLM chatbots, e.g., CHATGPT, Bard, and Bing Chat,” wrote the international team of researchers (the paper lists affiliations with Nanyang Technological University in Singapore, Huazhong University of Science and Technology in China, as well as the University of New South Wales and Virginia Tech) in a paper posted to the arXiv preprint server.

The obscure and convoluted nature of these AI systems makes it hard to know exactly what their defenses are, or how one might get around them. To find out, the researchers probed the jailbreak defenses by examining differences in a chatbot’s response time when a jailbreak attempt is detected versus when it is not. “We found that some classical analysis techniques can be transferred to analyze and identify problems/vulnerabilities in LLMs,” Yuekang Li, a researcher at Virginia Tech who co-authored the paper, told Motherboard. “This motivated the initial idea of this work: time-based analysis (like what has been done for traditional SQL injections) can help with LLM jailbreaking.”
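In spirit, the probe resembles a time-based SQL injection test: send paired requests and look for a consistent latency gap between inputs that trip a defense and inputs that do not. The following is a minimal sketch of that idea, not the paper's actual method; `query_chatbot` is a hypothetical stand-in for whatever client reaches the target chatbot, and the median-gap heuristic is a toy.

```python
# Illustrative timing side-channel probe. Everything here is an assumption:
# the real Masterkey analysis is more involved than a single latency gap.

import time
import statistics

def query_chatbot(prompt: str) -> str:
    """Hypothetical client call; swap in a real chatbot API client."""
    raise NotImplementedError

def timed_query(prompt: str) -> float:
    """Wall-clock seconds for one round trip to the chatbot."""
    start = time.perf_counter()
    query_chatbot(prompt)
    return time.perf_counter() - start

def latency_gap(benign_prompts, suspect_prompts, trials=5) -> float:
    """Median per-prompt latency for each group, then the gap between
    the groups. A consistently positive gap hints that some extra
    defense pass runs only when an input looks like a jailbreak."""
    benign = [statistics.median(timed_query(p) for _ in range(trials))
              for p in benign_prompts]
    suspect = [statistics.median(timed_query(p) for _ in range(trials))
               for p in suspect_prompts]
    return statistics.median(suspect) - statistics.median(benign)
```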
“By fine-tuning an LLM with jailbreak prompts, we demonstrate the possibility of automated jailbreak generation targeting a set of well-known commercialized LLM chatbots,” the researchers wrote. They claim that by training their own LLM on examples of common jailbreak prompts, they were able to generate new, working prompts with a success rate of 21.58 percent, several times higher than the 7.33 percent success rate of the currently known jailbreak prompts.
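As a rough illustration of that fine-tuning step, the sketch below trains a small open model on a one-prompt-per-line text file. The file name `jailbreak_prompts.txt`, the `gpt2` base model, and the training settings are all assumptions made for illustration; the paper's actual data, base model, and recipe are not reproduced here.

```python
# Minimal causal-LM fine-tuning sketch (Hugging Face transformers/datasets).
# Assumes a local file `jailbreak_prompts.txt` with one prompt per line.

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

model_name = "gpt2"  # illustrative stand-in for whatever base LLM is used
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# The "text" loader yields one example per line under the column "text".
dataset = load_dataset("text", data_files="jailbreak_prompts.txt")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="jailbreak-ft",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # mlm=False -> ordinary next-token (causal) language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Sampling from the tuned model then yields candidate prompts, each of which would still have to be tested against a live chatbot to measure a success rate like the one the paper reports.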