An easy-to-use Python framework to generate adversarial jailbreak prompts by assembling different methods
EasyJailbreak is an easy-to-use Python framework designed for researchers and developers working on LLM security. It decomposes the mainstream jailbreaking process into several steps that can be iterated: initialize mutation seeds, select suitable seeds, add constraints, mutate, attack, and evaluate. EasyJailbreak provides a component for each of these steps, giving researchers a playground for further research and experimentation. More details can be found in our paper.
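To make the six steps concrete, the loop below is a minimal, framework-agnostic sketch of how such components could be chained. It does not use EasyJailbreak's actual API: `Instance`, `run_pipeline`, and all the toy components are hypothetical names introduced for illustration, and the target model is a harmless stub that only uppercases its input.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    """One candidate prompt plus its result (hypothetical type, not the framework's)."""
    prompt: str
    response: str = ""
    score: float = 0.0


def run_pipeline(
    seeds: List[str],
    select: Callable[[List[Instance]], List[Instance]],
    constrain: Callable[[List[Instance]], List[Instance]],
    mutate: Callable[[Instance], List[Instance]],
    target: Callable[[str], str],
    evaluate: Callable[[Instance], float],
    iterations: int = 2,
) -> List[Instance]:
    # Step 1: initialize the pool from the mutation seeds.
    pool = [Instance(prompt=s) for s in seeds]
    for _ in range(iterations):
        chosen = select(pool)            # Step 2: select suitable seeds.
        chosen = constrain(chosen)       # Step 3: apply constraints (filter).
        children = [c for inst in chosen for c in mutate(inst)]  # Step 4: mutate.
        for child in children:
            child.response = target(child.prompt)  # Step 5: query the target.
            child.score = evaluate(child)          # Step 6: evaluate the result.
        pool.extend(children)
    # Return every instance, best score first.
    return sorted(pool, key=lambda inst: inst.score, reverse=True)


# Toy components (assumptions for illustration only).
def select_top_two(pool: List[Instance]) -> List[Instance]:
    return sorted(pool, key=lambda inst: inst.score, reverse=True)[:2]


def max_length_constraint(pool: List[Instance]) -> List[Instance]:
    return [inst for inst in pool if len(inst.prompt) <= 40]


def tag_mutation(inst: Instance) -> List[Instance]:
    return [Instance(inst.prompt + " [v1]"), Instance(inst.prompt + " [v2]")]


def stub_target(prompt: str) -> str:
    return prompt.upper()  # stands in for a model call


def toy_evaluator(inst: Instance) -> float:
    return 1.0 if "[V2]" in inst.response else 0.0


results = run_pipeline(
    ["seed-A", "seed-B"],
    select_top_two,
    max_length_constraint,
    tag_mutation,
    stub_target,
    toy_evaluator,
)
```

Passing each step as a plain callable mirrors the idea of one swappable component per step: replacing the selector or evaluator changes the method without touching the loop.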
Model | ReNeLLM | GPTFuzz | ICA | AutoDAN | PAIR | JailBroken | Cipher | JailBroken | DeepInception | MultiLingual | GCG | Avg |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GPT3.5 | 87% | 86% | 0% | 19% | 100% | 80% | 100% | 66% | 100% | 12% | 61.1% | |
GPT4 | 38% | 0% | 1% | 12% | 58% | 75% | 58% | 35% | 63% | 0% | 31.3% | |
Llama2-7B-chat | 31% | 46% | 0% | 25% | 52% | 6% | 61% | 6% | 8% | 2% | 46% | 27.7% |
Llama2-13B-chat | 69% | 42% | 0% | 8% | 4% | 90% | 4% | 0% | 0% | 46% | 28.8% | |
Vicuna7B | 77% | 100% | 52% | 100% | 100% | 100% | 57% | 100% | 29% | 94% | 94% | 80.3% |
Vicuna13B | 87% | 100% | 80% | 100% | 100% | 100% | 61% | 100% | 17% | 100% | 94% | 83.9% |
ChatGLM3 | 86% | 100% | 54% | 100% | 96% | 95% | 32% | 95% | 33% | 100% | 34% | 73.0% |
Qwen-7B-chat | 70% | 100% | 37% | 100% | 82% | 100% | 34% | 100% | 58% | 99% | 48% | 72.8% |
Intern7B | 67% | 100% | 23% | 100% | 96% | 100% | 85% | 100% | 36% | 99% | 10% | 71.6% |
Mistral | 90% | 100% | 67% | 100% | 94% | 100% | 60% | 100% | 40% | 100% | 82% | 83.3% |
Avg | 70% | 77% | 31% | 89% | 66% | 76% | 64% | 76% | 32% | 76% | 47% |