📚 Datasets

Name

Type

# Prompts

Purpose

Q&A

~200k

Wikipedia-based question-answer pairs that require reasoning across multiple documents. Useful for evaluating false positives and over-triggering on natural Q&A.

ChatGPT Jailbreak Prompts

Jailbreak

Collection of jailbreak related prompts for ChatGPT. Useful for evaluating the detection rate of publicly known jailbreaks.

mosscap_prompt_injection

Prompt Injection

~278k

Dataset of prompts submitted to Lakera’s Mosscap prompt injection capture the flag game that was created as part of the DEF CON 31 AI Village Generative Red Team Challenge. Useful for evaluating detection rate on a large sample of real-world prompts that are a mixture of adversarial techniques and benign prompts.

OpenAI Moderation Evaluation Dataset

Content Moderation

1680

Dataset of inputs that cover a wide range of content moderation use cases. Useful for evaluating the efficacy of content moderation.

Previous🛠️ Customized evaluation for AI Next💁 DeepEval Support

Last updated 10 months ago