Red Team an Assistant

1. Navigate to the assistants section from the sidebar

Step 1 screenshot

2. Click on the three dots next to your assistant's name.

Step 2 screenshot

3. Click on Red Team

Step 3 screenshot

4. Click on Get Started

Step 1 screenshot

5. Select Target Dataset

You have two options available:

You can use built-in datasets (already provided in the system).

Or you can create and use your own custom dataset for testing purposes.

Step 2 screenshot

6. If you want to create a custom dataset: Go to the Custom Datasets option.

Enter a suitable name for your dataset.

Add a description explaining what your dataset is about.

Step 3 screenshot

7. Drag and drop your dataset file here.

Step 4 screenshot

8. Here, you will see all of your custom datasets listed.

Step 5 screenshot

9. If you want to use a built-in dataset, Click on the Built-in Datasets option.

Step 6 screenshot

10. Select the appropriate category for your dataset.

Step 7 screenshot

11. Click on Next

Step 8 screenshot

12. Provide a clear and descriptive name for the job, specify the number of prompts to execute

Step 9 screenshot

13. Select Attack Type

Step 10 screenshot

14. Define the attack parameters by selecting the specific methods you want to apply and Click on Next.

Step 12 screenshot

15. Select the Converter that defines how prompts and responses will be transformed or formatted before being sent to the model.

For a complete guide on the different types of converters view Assistant Red Teaming

Step 14 screenshot

16. Click on Next

Step 15 screenshot

17. Here you can review your configuration before launching job.

Step 16 screenshot

18. Click on Launch Job

Step 17 screenshot

19. A new job will appear in the jobs table with status Pending. After a short time, the status will update to Running as the attack is executed.

Step 18 screenshot

20. Click on View Attack Details to review your configuration to ensure they match your testing goals.

Step 19 screenshot

Step 20 screenshot

22. If your attack is in Running state and you want to stop it, click on Cancel Attack.

Step 21 screenshot

23. Click on Cancel Attack. Remember that this action can't be undone.

Step 22 screenshot

24. Once the attack completes, the job status will change to Completed. Click on View Result

Step 1 screenshot

25. Here you can inspect the prompts, model responses, and Scorer outputs for each red team example.

Step 2 screenshot

26. Click on Export Vulnerability Report to export a report with vulnerable prompts along with model response and rationale.

Step 3 screenshot

Assistant Red Teaming