  Falcon Dataset (RefinedWeb): A large English web dataset used to train the Falcon LLM.

  MusicCaps is a dataset of 5,521 music examples, each labeled with an English aspect list and a free-text caption written by musicians.

  MMLU (Massive Multitask Language Understanding) is a benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. (Papers with Code)
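  The difference between the zero-shot and few-shot settings can be illustrated with a small prompt builder. This is a minimal sketch, not an official evaluation harness: the function names `format_question` and `build_prompt`, the header wording, and the four-option A-D layout are assumptions chosen for illustration, and the sample questions are made up.

```python
from typing import Optional, Sequence, Tuple

CHOICE_LABELS = "ABCD"  # assumed four-option multiple-choice layout


def format_question(question: str, choices: Sequence[str],
                    answer: Optional[str] = None) -> str:
    """Render one question; leave the answer blank for the model to fill in."""
    lines = [question]
    lines += [f"{label}. {choice}" for label, choice in zip(CHOICE_LABELS, choices)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)


def build_prompt(subject: str,
                 exemplars: Sequence[Tuple[str, Sequence[str], str]],
                 target_question: str,
                 target_choices: Sequence[str]) -> str:
    """Build a k-shot prompt; an empty exemplar list gives the zero-shot case."""
    # Header wording is an assumption, not taken from the benchmark itself.
    blocks = [f"The following are multiple choice questions (with answers) about {subject}."]
    # Solved exemplars come first, each with its answer filled in.
    blocks += [format_question(q, c, a) for q, c, a in exemplars]
    # The target question comes last with the answer left open.
    blocks.append(format_question(target_question, target_choices))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    shots = [("2 + 2 = ?", ["3", "4", "5", "6"], "B")]  # made-up exemplar
    print(build_prompt("arithmetic", shots, "3 + 3 = ?", ["5", "6", "7", "8"]))
```

  Passing an empty `exemplars` list gives the zero-shot prompt. Passing k solved questions gives the k-shot prompt. In both cases the model is not fine-tuned on the benchmark.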

  MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark that assesses multimodal models on college-level questions combining images and text, which require subject knowledge and deliberate reasoning. It gauges AI models' ability to understand and reason across diverse disciplines. (website)

  HumanEval is a benchmark for evaluating the functional correctness of Python code produced by code generation models, using hand-written programming problems checked by unit tests. (Papers with Code)

  GSM8K is a dataset of high-quality, linguistically diverse grade school math word problems. It was created to support question answering on basic mathematical problems that require multi-step reasoning. (Papers with Code)
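  Scoring multi-step answers usually comes down to comparing only the final number. The sketch below assumes each solution ends with a `#### <number>` marker and that the model has been prompted to use the same convention. The helper names `extract_final_answer` and `is_correct` are hypothetical, and the sample problem is invented for illustration.

```python
import re
from typing import Optional

# Matches the number after a "####" marker, allowing a sign,
# thousands separators, and a decimal part.
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the final numeric answer as a string, or None if no marker is found."""
    match = _FINAL_ANSWER.search(solution)
    if match is None:
        return None
    # Strip thousands separators so "1,200" and "1200" compare equal.
    return match.group(1).replace(",", "")


def is_correct(model_output: str, reference: str) -> bool:
    """Exact-match comparison of the extracted final answers."""
    predicted = extract_final_answer(model_output)
    gold = extract_final_answer(reference)
    return predicted is not None and predicted == gold


if __name__ == "__main__":
    # Invented two-step example, not taken from the dataset.
    reference = "3 * 4 = 12 pencils.\n12 - 5 = 7 pencils.\n#### 7"
    print(is_correct("He has 12, gives away 5.\n#### 7", reference))
```

  Exact string matching is a deliberate simplification. A stricter scorer might compare parsed numeric values so that `7` and `7.0` count as the same answer.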

  ShareGPT: Lets users share their conversations with GPT chatbots; the shared conversations can then be used to fine-tune language models.