Datasets Module
Before starting a jailbreak process, users need to prepare and load the harmful queries that models should not respond to. EasyJailbreak provides an Instance class to store each query together with other information that may be useful during the jailbreak process, e.g. the responses from the target model. A JailbreakDataset class collects these instances and supports batch operations on them.
instance
Instance class
jailbreak_datasets
Jailbreak_Dataset Module
This module provides the JailbreakDataset class, which manages and manipulates datasets for EasyJailbreak. It handles datasets made up of Instance objects and offers functionality such as shuffling, accessing, and processing data points in an organized way for jailbreak-related machine learning tasks.
- class easyjailbreak.datasets.jailbreak_datasets.JailbreakDataset(dataset: List[Instance] | str, shuffle: bool = False, local_file_type: str = 'json')
The JailbreakDataset class handles datasets structured specifically for EasyJailbreak. It represents, manipulates, and gives access to data points stored as Instance objects, and provides essential functionality such as shuffling, accessing, and formatting data for use with machine learning models.
- add(Instance: Instance)
Adds a new Instance to the dataset.
- Parameters:
instance (Instance) – The Instance to be added to the dataset.
- group_by(key)
Groups instances in the dataset based on a specified key function.
- Parameters:
key (function) – A function that takes an Instance and returns a hashable object for grouping.
- Return list[list[Instance]]:
A list of lists, where each sublist contains Instances grouped by the specified key.
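The grouping behaviour described above can be illustrated with a minimal sketch. It does not use the EasyJailbreak implementation: the Instance dataclass here is a hypothetical stand-in with a single assumed field, query, and the free function group_by only mirrors the documented contract (a key function returning a hashable value, and a list of lists as the result). The order of the groups in the real library may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable, List


@dataclass
class Instance:
    """Hypothetical stand-in for EasyJailbreak's Instance; only `query` is assumed."""
    query: str


def group_by(instances: List[Instance],
             key: Callable[[Instance], Hashable]) -> List[List[Instance]]:
    """Group instances by key(instance), keeping groups in first-seen order."""
    groups = defaultdict(list)
    for inst in instances:
        groups[key(inst)].append(inst)
    return list(groups.values())


data = [Instance("alpha one"), Instance("beta"), Instance("alpha two")]
# Group by the first word of each query (an arbitrary key chosen for illustration).
grouped = group_by(data, key=lambda inst: inst.query.split()[0])
```

Any function that returns a hashable value can serve as the key, for example a lambda that reads one attribute of the Instance.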
- group_by_parents()
Groups instances in the dataset based on their parent nodes.
- Return list[list[Instance]]:
A list of lists, where each sublist contains Instances grouped by their parent nodes.
- static load_csv(path='data.csv', headers: List[int] | None = None)
Loads a CSV file into the dataset.
- Parameters:
path (str) – The path of the CSV file to be loaded.
headers (list[str]) – A list of column names to be used as headers. Defaults to None.
- static load_jsonl(path='data.jsonl')
Loads a JSONL file into the dataset.
- Parameters:
path (str) – The path of the JSONL file to be loaded.
- classmethod merge(dataset_list)
Merges multiple JailbreakDataset instances into a single dataset.
- Parameters:
dataset_list (list[JailbreakDataset]) – A list of JailbreakDataset instances to be merged.
- Return JailbreakDataset:
A new JailbreakDataset instance containing merged data from the provided datasets.
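As a rough illustration of the merge contract (a classmethod that takes a list of datasets and returns a new one), the sketch below uses a hypothetical TinyDataset class in place of JailbreakDataset. The items attribute and the plain-string elements are assumptions made for the example, not part of the library.

```python
from typing import List


class TinyDataset:
    """Hypothetical stand-in for JailbreakDataset, holding a plain list of items."""

    def __init__(self, items: List[str]):
        # Copy the input so each dataset owns its own list.
        self.items = list(items)

    @classmethod
    def merge(cls, dataset_list: List["TinyDataset"]) -> "TinyDataset":
        """Return a new dataset containing every item of each input, in order."""
        merged: List[str] = []
        for ds in dataset_list:
            merged.extend(ds.items)
        return cls(merged)


a = TinyDataset(["q1", "q2"])
b = TinyDataset(["q3"])
combined = TinyDataset.merge([a, b])
```

In this sketch the inputs are left unchanged and the result is a separate object, which matches the description of merge returning a new instance.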
- save_to_csv(path='data.csv')
Saves the dataset to a CSV file.
- Parameters:
path (str) – The path of the file where the dataset will be saved. Defaults to ‘data.csv’.
- save_to_jsonl(path='data.jsonl')
Saves the dataset to a JSONL file using the jsonlines library.
- Parameters:
path (str) – The path of the file where the dataset will be saved. Defaults to ‘data.jsonl’.
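For readers unfamiliar with the JSONL format, the following sketch shows a save and load round trip with one JSON object per line. It deliberately uses Python's standard json module rather than the jsonlines library mentioned above, and the helper names save_jsonl and load_jsonl as well as the query field are hypothetical, chosen only for illustration.

```python
import json
import os
import tempfile

# Hypothetical records; the real dataset stores Instance objects.
records = [{"query": "example query 1"}, {"query": "example query 2"}]


def save_jsonl(rows, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.jsonl")
    save_jsonl(records, path)
    restored = load_jsonl(path)
```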
- shuffle()
Shuffles the dataset in place.
This method randomizes the order of the dataset’s elements and updates the shuffled attribute to True.
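The in-place behaviour and the shuffled flag can be sketched as follows. TinyDataset and its items attribute are hypothetical stand-ins; only the shuffled attribute name comes from the description above, and the use of random.shuffle is an assumption about how such a method might be written.

```python
import random
from typing import List


class TinyDataset:
    """Hypothetical stand-in for JailbreakDataset."""

    def __init__(self, items: List[str]):
        self.items = list(items)
        self.shuffled = False

    def shuffle(self) -> None:
        """Reorder the items in place and record that a shuffle happened."""
        random.shuffle(self.items)
        self.shuffled = True


random.seed(0)  # fixed seed so repeated runs give the same order
original = ["q1", "q2", "q3", "q4"]
ds = TinyDataset(original)
ds.shuffle()
```

Because the shuffle happens in place, nothing is returned; callers keep using the same dataset object afterwards.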