Selector Module

This section introduces the submodules in easyjailbreak.selector. The Selector module selects the most suitable samples from the dataset for mutation. In some circumstances, a single seed can give rise to an exponentially growing number of jailbreak instances. It is therefore important to select the instances with the greatest potential for later stages, especially when compute resources are limited. EasyJailbreak offers several kinds of selectors to choose from.

selector

SelectPolicy class

This file contains the implementation of policies for selecting instances from datasets, tailored for use in EasyJailbreak scenarios. It defines abstract base classes and concrete implementations for selecting instances based on various criteria.

class easyjailbreak.selector.selector.SelectPolicy(Datasets: JailbreakDataset)

Abstract base class representing a policy for selecting instances from a JailbreakDataset. It provides a framework for implementing various selection strategies.

initial(): Initializes or resets any internal state of the selection policy, if necessary.

abstract select() → Instance

Abstract method that must be implemented by subclasses to define the selection strategy.

Returns: Instance – The selected instance from the dataset.

update(jailbreak_dataset: JailbreakDataset)

Updates the internal state of the selection policy, if necessary.

Parameters: jailbreak_dataset (JailbreakDataset) – The dataset to update the policy with.
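
As an illustration of the interface described above, the following minimal sketch mirrors the three methods using stand-in classes. The `Instance` and `JailbreakDataset` definitions and the `LeastVisitedPolicy` subclass are hypothetical simplifications written for this example; they are not the library's own classes, and the real constructors may carry more state.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for an EasyJailbreak instance."""
    query: str
    visited_num: int = 0


class JailbreakDataset(list):
    """Hypothetical stand-in: a plain list of Instance objects."""


class SelectPolicy(ABC):
    """Sketch of the abstract interface: initial(), select(), update()."""

    def __init__(self, Datasets: JailbreakDataset):
        self.Datasets = Datasets

    def initial(self) -> None:
        # Optional hook: reset internal state. No-op by default.
        pass

    @abstractmethod
    def select(self) -> Instance:
        ...

    def update(self, jailbreak_dataset: JailbreakDataset) -> None:
        # Optional hook: fold feedback into internal state. No-op by default.
        pass


class LeastVisitedPolicy(SelectPolicy):
    """Illustrative subclass: pick the instance visited the fewest times."""

    def select(self) -> Instance:
        chosen = min(self.Datasets, key=lambda inst: inst.visited_num)
        chosen.visited_num += 1
        return chosen


dataset = JailbreakDataset([Instance("a"), Instance("b")])
policy = LeastVisitedPolicy(dataset)
picked = [policy.select().query for _ in range(3)]
```

A subclass only has to implement select(); initial() and update() can be left as no-ops when the strategy keeps no state beyond the dataset itself.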

UCBSelectPolicy

UCBSelectPolicy class

class easyjailbreak.selector.UCBSelectPolicy.UCBSelectPolicy(explore_coeff: float = 1.0, Dataset: JailbreakDataset | None = None)

A selection policy based on the Upper Confidence Bound (UCB) algorithm. This policy is designed to balance exploration and exploitation when selecting instances from a JailbreakDataset. It uses the UCB formula to select instances that either have high rewards or have not been explored much.

select() → JailbreakDataset

Selects an instance from the dataset based on the UCB algorithm.

Returns: JailbreakDataset – The selected instance from the dataset.

update(Dataset: JailbreakDataset)

Updates the rewards for the last selected instance based on the success of the prompts.

Parameters: Dataset (JailbreakDataset) – The dataset containing prompts used for updating rewards.
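
To make the exploration/exploitation trade-off concrete, here is a minimal sketch of a UCB-style score over plain lists. The function name `ucb_select`, the `+ 1` smoothing of the visit count, and the exact form of the bonus term are assumptions of this example; the library's formula may differ in its details.

```python
import math
from typing import List


def ucb_select(rewards: List[float], visits: List[int], step: int,
               explore_coeff: float = 1.0) -> int:
    """Return the index with the highest UCB score.

    Score = average reward + explore_coeff * sqrt(2 * ln(step) / visits),
    with visits smoothed by +1 so unvisited entries do not divide by zero.
    """
    def score(i: int) -> float:
        n = visits[i] + 1  # smoothing is an assumption of this sketch
        return rewards[i] / n + explore_coeff * math.sqrt(2 * math.log(step) / n)

    return max(range(len(rewards)), key=score)


rewards = [3.0, 0.0, 0.0]
visits = [5, 5, 0]
# With exploration enabled, the never-visited entry wins on its bonus term.
choice_explore = ucb_select(rewards, visits, step=11, explore_coeff=1.0)
# With explore_coeff=0 the policy is purely greedy on average reward.
choice_exploit = ucb_select(rewards, visits, step=11, explore_coeff=0.0)
```

A larger explore_coeff favours rarely selected instances; a smaller one favours instances that have already produced successful prompts.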

SelectBasedOnScores

SelectBasedOnScores selects the instances with the highest scores, where a score measures the extent of jailbreaking. Details can be found in the following paper.

Paper title: Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
arXiv link: https://arxiv.org/abs/2312.02119
Source repository: https://github.com/RICommunity/TAP

class easyjailbreak.selector.SelectBasedOnScores.SelectBasedOnScores(Dataset: JailbreakDataset, tree_width)

This class implements a selection policy based on the scores of instances in a JailbreakDataset. It selects a subset of instances with high scores, relevant for jailbreaking tasks.

select(dataset: JailbreakDataset) → List[Instance]

Selects a subset of instances from the dataset based on their scores.

Parameters: dataset (JailbreakDataset) – The dataset from which instances are to be selected.
Returns: List[Instance] – A list of selected instances with high evaluation scores.
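
The score-based pruning can be sketched as a sort followed by a cut. This example assumes that `tree_width` is the maximum number of instances retained; `ScoredInstance` and `select_top_scores` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredInstance:
    """Hypothetical stand-in for an instance with an evaluation score."""
    prompt: str
    score: float


def select_top_scores(dataset: List[ScoredInstance],
                      tree_width: int) -> List[ScoredInstance]:
    """Keep at most `tree_width` instances, highest score first."""
    ranked = sorted(dataset, key=lambda inst: inst.score, reverse=True)
    return ranked[:tree_width]


candidates = [
    ScoredInstance("p1", 3),
    ScoredInstance("p2", 9),
    ScoredInstance("p3", 7),
]
kept = select_top_scores(candidates, tree_width=2)
kept_prompts = [inst.prompt for inst in kept]
```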

RoundRobinSelectPolicy

RoundRobinSelectPolicy class

class easyjailbreak.selector.RoundRobinSelectPolicy.RoundRobinSelectPolicy(Dataset: JailbreakDataset)

A selection policy that selects instances from a JailbreakDataset in a round-robin manner. This policy iterates over the dataset, selecting each instance in turn, and then repeats the process.

select() → JailbreakDataset

Selects the next instance in the dataset based on a round-robin approach and increments its visited count.

Returns: JailbreakDataset – The selected instance from the dataset.

update(prompt_nodes: JailbreakDataset | None = None)

Updates the selection index based on the length of the dataset.

Parameters: prompt_nodes (JailbreakDataset) – Not used in this implementation.
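
The round-robin behaviour can be sketched with a cursor that wraps around at the dataset length. The split between select() and update() follows the method descriptions above; `RoundRobinSketch` itself is a hypothetical class over a plain list, and the library's exact index handling may differ.

```python
from typing import List


class RoundRobinSketch:
    """Cycle through items in order, wrapping around at the end."""

    def __init__(self, items: List[str]):
        self.items = items
        self.index = 0
        self.visited = [0] * len(items)

    def select(self) -> str:
        # Return the item under the cursor and count the visit.
        chosen = self.items[self.index]
        self.visited[self.index] += 1
        return chosen

    def update(self) -> None:
        # Advance the cursor, wrapping with the dataset length.
        self.index = (self.index + 1) % len(self.items)


rr = RoundRobinSketch(["a", "b", "c"])
order = []
for _ in range(4):
    order.append(rr.select())
    rr.update()
```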

RandomSelector

RandomSelectPolicy class

class easyjailbreak.selector.RandomSelector.RandomSelectPolicy(Datasets: JailbreakDataset)

A selection policy that randomly selects an instance from a JailbreakDataset. It extends the SelectPolicy abstract base class, providing a concrete implementation for the random selection strategy.

select() → JailbreakDataset

Selects an instance randomly from the dataset and increments its visited count.

Returns: JailbreakDataset – The randomly selected instance from the dataset.
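
A minimal sketch of uniform random selection with a visit counter is shown below. The helper `random_select` is hypothetical, and the explicit `random.Random` generator is passed in only so the example is reproducible.

```python
import random
from typing import List


def random_select(visited: List[int], rng: random.Random) -> int:
    """Pick a uniformly random index and increment its visited count."""
    index = rng.randrange(len(visited))
    visited[index] += 1
    return index


rng = random.Random(0)  # seeded only to make the sketch reproducible
visited = [0, 0, 0]
picks = [random_select(visited, rng) for _ in range(10)]
```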

MCTSExploreSelectPolicy

MCTSExploreSelectPolicy class

class easyjailbreak.selector.MCTSExploreSelectPolicy.MCTSExploreSelectPolicy(dataset, inital_prompt_pool, Questions, ratio=0.5, alpha=0.1, beta=0.2)

This class implements a selection policy based on the Monte Carlo Tree Search (MCTS) algorithm. It is designed to explore and exploit a dataset of instances for effective jailbreaking of LLMs.

select() → Instance

Selects an instance from the dataset using MCTS algorithm.

Returns: JailbreakDataset – The selected instance from the dataset.

update(prompt_nodes: JailbreakDataset)

Updates the weights of nodes in the MCTS tree based on their performance.

Parameters: prompt_nodes (JailbreakDataset) – Dataset of prompt nodes to update.
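
The following is a rough sketch of an MCTS-style explore policy over a small prompt tree, not the library's implementation. It assumes that `ratio` weights the exploration term, that `alpha` is the probability of stopping at a non-leaf node, and that `beta` is a floor on the depth-discounted reward; the 0.1 per-level discount and the `Node` class are likewise assumptions made for this example.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: a prompt plus its mutated children."""
    name: str
    level: int = 0
    visited_num: int = 0
    reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


class MCTSExploreSketch:
    def __init__(self, roots: List[Node], ratio: float = 0.5,
                 alpha: float = 0.1, beta: float = 0.2,
                 rng: Optional[random.Random] = None):
        self.roots = roots
        self.ratio = ratio  # assumed: weight of the exploration term
        self.alpha = alpha  # assumed: chance of stopping at a non-leaf node
        self.beta = beta    # assumed: floor on the depth-discounted reward
        self.rng = rng or random.Random()
        self.step = 0
        self.path: List[Node] = []

    def _uct(self, node: Node) -> float:
        exploit = node.reward / (node.visited_num + 1)
        explore = math.sqrt(2 * math.log(self.step) / (node.visited_num + 0.01))
        return exploit + self.ratio * explore

    def select(self) -> Node:
        self.step += 1
        cur = max(self.roots, key=self._uct)
        self.path = [cur]
        # Descend by best UCT score; stop early with probability alpha.
        while cur.children and self.rng.random() >= self.alpha:
            cur = max(cur.children, key=self._uct)
            self.path.append(cur)
        for node in self.path:
            node.visited_num += 1
        return cur

    def update(self, success_rate: float) -> None:
        # Back-propagate along the selected path, discounting by depth.
        leaf = self.path[-1]
        scaled = success_rate * max(self.beta, 1 - 0.1 * leaf.level)
        for node in self.path:
            node.reward += scaled


child = Node("child", level=1)
root = Node("root", children=[child])
policy = MCTSExploreSketch([root], alpha=0.0)  # alpha=0: always reach a leaf
selected = policy.select()
policy.update(success_rate=0.5)
```

Stopping early at an inner node lets the policy reuse intermediate prompts instead of always mutating the deepest one, while the reward floor keeps deep nodes from being starved of credit.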

EXP3SelectPolicy

EXP3SelectPolicy class

class easyjailbreak.selector.EXP3SelectPolicy.EXP3SelectPolicy(Dataset: JailbreakDataset, energy: float = 1.0, gamma: float = 0.05, alpha: float = 25)

A selection policy based on the Exponential-weight algorithm for Exploration and Exploitation (EXP3). This policy is designed for environments with adversarial contexts, balancing between exploring new instances and exploiting known rewards in a JailbreakDataset.

initial(): Initializes or resets the weights and probabilities for each instance in the dataset.

select() → Instance

Selects an instance from the dataset based on the EXP3 algorithm.

Returns: JailbreakDataset – The selected instance from the dataset.

update(prompt_nodes: JailbreakDataset)

Updates the weights of the last chosen instance based on the success of the prompts.

Parameters: prompt_nodes (JailbreakDataset) – The dataset containing prompts used for updating weights.
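
For orientation, here is a sketch of the textbook EXP3 update over K arms, with rewards assumed to lie in [0, 1]. It is not the library's code: the class name `EXP3Sketch` is hypothetical, and the `energy` and `alpha` constructor parameters listed above are not modelled here because their exact role is not described in this section.

```python
import math
import random
from typing import List, Optional


class EXP3Sketch:
    """Textbook EXP3 over K arms; rewards are expected to lie in [0, 1]."""

    def __init__(self, num_arms: int, gamma: float = 0.05, seed: int = 0):
        self.num_arms = num_arms
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.initial()

    def initial(self) -> None:
        # Reset weights and probabilities to the uniform starting point.
        self.weights: List[float] = [1.0] * self.num_arms
        self.probs: List[float] = [1.0 / self.num_arms] * self.num_arms
        self.last_choice: Optional[int] = None

    def select(self) -> int:
        total = sum(self.weights)
        k = self.num_arms
        # Mix the weight-proportional distribution with a uniform one.
        self.probs = [(1 - self.gamma) * w / total + self.gamma / k
                      for w in self.weights]
        self.last_choice = self.rng.choices(range(k), weights=self.probs)[0]
        return self.last_choice

    def update(self, reward: float) -> None:
        # Importance-weight the reward, then boost only the chosen arm.
        i = self.last_choice
        estimated = reward / self.probs[i]
        self.weights[i] *= math.exp(self.gamma * estimated / self.num_arms)


bandit = EXP3Sketch(num_arms=3, gamma=0.1)
arm = bandit.select()
bandit.update(reward=1.0)
```

Dividing the reward by the selection probability keeps the estimate unbiased for arms that are rarely chosen, which is what lets EXP3 cope with adversarially changing rewards.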

ReferenceLossSelector

class easyjailbreak.selector.ReferenceLossSelector.ReferenceLossSelector(model: WhiteBoxModelBase, batch_size=None, is_universal=False)

This class implements a selection policy based on the reference loss. It selects instances from a set of parents based on the minimum loss calculated on their reference target, discarding others.

select(dataset) → JailbreakDataset

Selects instances from the dataset based on the calculated reference loss.

Parameters: dataset (JailbreakDataset) – The dataset from which instances are to be selected.
Returns: JailbreakDataset – A new dataset containing selected instances with minimum reference loss.
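
The selection step itself reduces to keeping the candidate with the lowest loss. The sketch below assumes a single survivor and replaces the white-box model with a toy loss function; `select_min_loss` and `toy_loss` are hypothetical names, and the real class computes the loss of each instance's reference target with the supplied model.

```python
from typing import Callable, List, Sequence


def select_min_loss(candidates: Sequence[str],
                    loss_fn: Callable[[str], float]) -> List[str]:
    """Return a one-element list holding the candidate with the lowest loss."""
    losses = [loss_fn(c) for c in candidates]
    best = min(range(len(candidates)), key=losses.__getitem__)
    return [candidates[best]]


def toy_loss(prompt: str) -> float:
    # Toy loss for illustration only: distance of the prompt length from 10.
    return abs(len(prompt) - 10)


survivors = select_min_loss(
    ["short", "ten chars!", "a much longer prompt"], toy_loss
)
```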