
📚 Datasets

| Name | Type | # Prompts | Purpose |
| --- | --- | --- | --- |
| HotpotQA | Q&A | ~200k | Wikipedia-based question-answer pairs that require reasoning across multiple documents. Useful for evaluating false positives and over-triggering on natural Q&A. |
| ChatGPT Jailbreak Prompts | Jailbreak | 79 | Collection of jailbreak-related prompts for ChatGPT. Useful for evaluating the detection rate of publicly known jailbreaks. |
| mosscap_prompt_injection | Prompt Injection | ~278k | Prompts submitted to Lakera's prompt injection capture-the-flag game, Mosscap, created as part of the DEF CON 31 AI Village Generative Red Team Challenge. Useful for evaluating detection rate on a large sample of real-world prompts that mix adversarial techniques with benign prompts. |
| OpenAI Moderation Evaluation Dataset | Content Moderation | 1680 | Inputs covering a wide range of content moderation use cases. Useful for evaluating the efficacy of content moderation. |
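As a rough illustration of how a benign dataset from the table above can be used, the sketch below loads HotpotQA questions through the Hugging Face `datasets` library and measures how often a detector flags them, i.e. its false-positive rate on natural Q&A. This is only a sketch under assumptions: the `hotpot_qa` identifier refers to the public Hugging Face copy of the dataset, and `is_flagged` is a hypothetical placeholder for whichever detector you want to evaluate, not part of DeepEval's API.

```python
from datasets import load_dataset


def is_flagged(prompt: str) -> bool:
    # Placeholder detector: a trivial keyword heuristic standing in for the
    # jailbreak / prompt-injection detector actually under evaluation.
    blocklist = ("ignore previous instructions", "you are now dan")
    return any(phrase in prompt.lower() for phrase in blocklist)


def false_positive_rate(sample_size: int = 1000) -> float:
    # HotpotQA questions are benign by construction, so any flag is a false positive.
    benign = load_dataset("hotpot_qa", "distractor", split="validation")
    questions = benign["question"][:sample_size]
    flagged = sum(is_flagged(q) for q in questions)
    return flagged / len(questions)


if __name__ == "__main__":
    print(f"False-positive rate on HotpotQA: {false_positive_rate():.2%}")
```

The adversarial datasets (jailbreaks, prompt injection) can be swapped in the same way to estimate detection rate instead, since every prompt in those sets is expected to be flagged.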