Dataset Viewer
Auto-converted to Parquet Duplicate
choices
stringclasses
1 value
cont_tokens
stringlengths
43
46.4k
example
stringlengths
351
5.89k
full_prompt
stringlengths
584
6.13k
gold
stringclasses
4 values
gold_index
stringclasses
1 value
input_tokens
stringlengths
754
4.47k
instruction
stringlengths
351
5.89k
metrics
stringclasses
2 values
num_asked_few_shots
stringclasses
1 value
num_effective_few_shots
stringclasses
1 value
padded
stringclasses
1 value
pred_logits
stringclasses
1 value
predictions
stringlengths
172
190k
specifics
stringclasses
20 values
truncated
stringclasses
1 value
[]
[[(1271, 11886, 420, 11, 584, 1205, 311, 2980, 279, 13010, 1990, 44448, 1718, 332, 4010, 1361, 3418, 385, 2741, 4010, 47707, 774, 39710, 323, 33088, 27456, 13935, 304, 3090, 382, 791, 4495, 1920, 18065, 264, 8999, 12454, 21914, 82882, 13155, 477, 67597, 13786, 34454, 88, 16284, 44448, 385, 723, 684, 13, 220, 220, 16, 1...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Identify the final product produced when cyclobutyl(cyclopropyl)methanol reacts with phosphoric acid in water. A. 1,2,3,4,5,6-hexahydropentalene B. spiro[3.4]oct-5-ene C. 1,2-dimethylcyclohexa-1,4-diene D. [1,1'-...
<|begin_of_text|><|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 Today Date: 05 Feb 2025 <|eot_id|><|start_header_id|>user<|end_header_id|> Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Identify the final product produced ...
['A']
[]
[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 2304, 13806, 220, 2366, 20, 271, 128009, 128006, 882, 128007, 271, 22818, 279, 2768, 3488, 323, 3116, 9322, 11503, 320, 32, 11, 426, 11, 356, 323, 423, 705, 5268, 279, 1888, 4320, 382, 14924, 25, 6...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Identify the final product produced when cyclobutyl(cyclopropyl)methanol reacts with phosphoric acid in water. A. 1,2,3,4,5,6-hexahydropentalene B. spiro[3.4]oct-5-ene C. 1,2-dimethylcyclohexa-1,4-diene D. [1,1'-...
{'extractive_match': 0.0}
0
0
[0]
[]
[(['To solve this, we need to consider the reaction between cyclobutyl(cyclopropyl)methanol and phosphoric acid in water.\n\nThe correct process involves a meta cummic chloride scheme or palladium catalyzed cycloaddition. 1,2,3,4,5,6-hexahydroxepi π-allene is formed in this reaction, from a cyclobutyl methoxide. So, t...
{'extracted_predictions': ['C', 'C'], 'extracted_golds': ['A', 'A']}
[0]
[]
[[(1271, 11886, 420, 3575, 11, 1095, 596, 24564, 1855, 24549, 1473, 16, 13, 763, 220, 22, 11, 22, 1773, 333, 10036, 269, 677, 2912, 33678, 58, 17, 13, 17, 13, 16, 60, 383, 418, 2194, 11, 279, 54736, 483, 33299, 690, 1624, 3768, 553, 279, 17130, 17915, 11, 6968, 264, 1624, 3768, 1534, 11049, 323, 11, 8617, 11, 7293, 171...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: "Consider the following compounds: 1: 7,7-difluorobicyclo[2.2.1]heptane 2: 7-methoxybicyclo[2.2.1]heptane 3: 7-(propan-2-ylidene)bicyclo[2.2.1]heptane 4: 7-fluorobicyclo[2.2.1]heptane which of these compounds co...
<|begin_of_text|><|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 Today Date: 05 Feb 2025 <|eot_id|><|start_header_id|>user<|end_header_id|> Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: "Consider the following compounds: 1...
['D']
[]
[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 2304, 13806, 220, 2366, 20, 271, 128009, 128006, 882, 128007, 271, 22818, 279, 2768, 3488, 323, 3116, 9322, 11503, 320, 32, 11, 426, 11, 356, 323, 423, 705, 5268, 279, 1888, 4320, 382, 14924, 25, 3...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: "Consider the following compounds: 1: 7,7-difluorobicyclo[2.2.1]heptane 2: 7-methoxybicyclo[2.2.1]heptane 3: 7-(propan-2-ylidene)bicyclo[2.2.1]heptane 4: 7-fluorobicyclo[2.2.1]heptane which of these compounds co...
{'extractive_match': 1.0}
0
0
[0]
[]
[(["To solve this problem, let's analyze each compound:\n\n1. In 7,7-difluorobicyclo[2.2.1]heptane, the fluorine atoms will delocalize the electron density, creating a delocalized bond and, thus, remain electron-deficient.\n\n Electronegativity and electron density increase when the molecular system realizes a more l...
{'extracted_predictions': ['D', 'D'], 'extracted_golds': ['D', 'D']}
[0]
[]
[[(1271, 11886, 420, 3488, 11, 1095, 596, 2980, 279, 5016, 15174, 11012, 617, 28995, 2085, 16785, 7397, 74767, 11, 1778, 439, 1884, 1766, 304, 68951, 323, 1063, 4363, 11012, 1093, 41245, 3447, 13, 15483, 11012, 617, 77765, 369, 20237, 7953, 7397, 74767, 11, 902, 3727, 4320, 423, 2753, 4461, 311, 617, 264, 3544, 3560, 3...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Some plants lack chlorophyll as they have evolved unique strategies to survive without performing photosynthesis. This group includes evolutionary distinct genera comprising algae and land plants, including vario...
<|begin_of_text|><|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 Today Date: 05 Feb 2025 <|eot_id|><|start_header_id|>user<|end_header_id|> Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Some plants lack chlorophyll as they...
['D']
[]
[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 2304, 13806, 220, 2366, 20, 271, 128009, 128006, 882, 128007, 271, 22818, 279, 2768, 3488, 323, 3116, 9322, 11503, 320, 32, 11, 426, 11, 356, 323, 423, 705, 5268, 279, 1888, 4320, 382, 14924, 25, 4...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Some plants lack chlorophyll as they have evolved unique strategies to survive without performing photosynthesis. This group includes evolutionary distinct genera comprising algae and land plants, including vario...
{'extractive_match': 1.0}
0
0
[0]
[]
[(["To solve this question, let's consider the unique strategies plants have evolved without performing photosynthesis, such as those found in algae and some land plants like orchids. Such plants have adaptations for survival beyond photosynthesis, which makes answer D less likely to have a large role in their survival...
{'extracted_predictions': ['D', 'D'], 'extracted_golds': ['D', 'D']}
[0]
[]
[[(60734, 62, 17, 596, 65691, 4261, 649, 387, 16997, 439, 11263, 1473, 2244, 65, 2223, 4261, 320, 51, 8, 284, 220, 17, 49345, 81, 61, 18, 611, 45958, 5417, 62, 19, 20594, 8, 720, 4071, 2728, 430, 45958, 15858, 9803, 311, 220, 17, 11, 720, 2244, 65, 2223, 4261, 315, 29935, 62, 17, 284, 122371, 17, 38964, 611, 320, 17, 8...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Astronomers are studying a system of five exoplanets with circular orbits, discovered through the TTV method. The planets (designated as Planet_1 through Planet_5), which are in orbital resonance, have orbital pe...
<|begin_of_text|><|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 Today Date: 05 Feb 2025 <|eot_id|><|start_header_id|>user<|end_header_id|> Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Astronomers are studying a system of...
['D']
[]
[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 2304, 13806, 220, 2366, 20, 271, 128009, 128006, 882, 128007, 271, 22818, 279, 2768, 3488, 323, 3116, 9322, 11503, 320, 32, 11, 426, 11, 356, 323, 423, 705, 5268, 279, 1888, 4320, 382, 14924, 25, 3...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Astronomers are studying a system of five exoplanets with circular orbits, discovered through the TTV method. The planets (designated as Planet_1 through Planet_5), which are in orbital resonance, have orbital pe...
{'extractive_match': 0.0}
0
0
[0]
[]
[(["Planet_2's orbital period can be calculated as follows:\n\nOrbital period (T) = 2πr^3 / gcd(P_4,T) \nBut given that gcd simplifies to 2, \nOrbital period of Planet_2 = √2 PV / (2r\\_4) \n= √2 * Proposal.\\_ period \n\n(assuming _____ period is 2*p).\n\nAnd Planet_4: Planet_2 ratio is in the list 1:2:2.5:3.5:5. And ...
{'extracted_predictions': ['B', 'B'], 'extracted_golds': ['D', 'D']}
[0]
[]
[[(22818, 512, 1451, 30374, 551, 720, 41790, 5376, 82263, 51, 11, 82263, 76, 7569, 27378, 315, 22996, 82263, 53, 67, 11, 82263, 76, 7569, 48602, 315, 22996, 82263, 937, 22996, 323, 82263, 76, 7569, 8286, 315, 95273, 82263, 53, 70, 11, 82263, 76, 7569, 48602, 315, 95273, 82263, 7939, 198, 791, 4689, 24524, 315, 3314, 37...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Diamond and graphite are present in the same magma chamber of a volcano. The diamond and graphite are in the state of thermodynamic equilibrium in the magma chamber when the temperature increases by 100.0 K. By h...
<|begin_of_text|><|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 Today Date: 05 Feb 2025 <|eot_id|><|start_header_id|>user<|end_header_id|> Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Diamond and graphite are present in ...
['D']
[]
[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 2304, 13806, 220, 2366, 20, 271, 128009, 128006, 882, 128007, 271, 22818, 279, 2768, 3488, 323, 3116, 9322, 11503, 320, 32, 11, 426, 11, 356, 323, 423, 705, 5268, 279, 1888, 4320, 382, 14924, 25, 2...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: Diamond and graphite are present in the same magma chamber of a volcano. The diamond and graphite are in the state of thermodynamic equilibrium in the magma chamber when the temperature increases by 100.0 K. By h...
{'extractive_match': 0.0}
0
0
[0]
[]
[(['Given:\nInd RSS : \nTemperature increase ΔT, Δmolar volumes of diamond ΔVd, Δmolar entropy of diamond ΔSE diamond and Δmolar volume of graphite ΔVg, Δmolar entropy of graphite ΔSG\nThe general equation of State is ΔP × ΔV = ΔSE\nFor triangle Pvg ΔP \nΔP = ΔVd * ΔSE / ΔVg\nSubstituting the given values, ΔP = (3.417 ...
{'extracted_predictions': [], 'extracted_golds': ['D', 'D']}
[0]
[]
[[(1271, 11886, 420, 3575, 11, 584, 1205, 311, 24564, 279, 2728, 2038, 3094, 555, 3094, 1473, 16, 13, 578, 3488, 34945, 430, 279, 15922, 21562, 12457, 555, 10917, 279, 18316, 389, 264, 30136, 1380, 484, 10318, 859, 1051, 220, 1272, 2385, 13840, 304, 1404, 13, 1115, 13533, 430, 279, 15922, 35603, 9124, 555, 67791, 505, ...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: "Scientist aim to detect plasmid-mediated quinolone resistance from Klebsiella pneumoniae by targeting three plasmid-mediated quinolone resistance genes qnrA, qnrB, and qnrS. He used multiplex PCR using the follo...
<|begin_of_text|><|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 Today Date: 05 Feb 2025 <|eot_id|><|start_header_id|>user<|end_header_id|> Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: "Scientist aim to detect plasmid-med...
['A']
[]
[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 2304, 13806, 220, 2366, 20, 271, 128009, 128006, 882, 128007, 271, 22818, 279, 2768, 3488, 323, 3116, 9322, 11503, 320, 32, 11, 426, 11, 356, 323, 423, 705, 5268, 279, 1888, 4320, 382, 14924, 25, 3...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: "Scientist aim to detect plasmid-mediated quinolone resistance from Klebsiella pneumoniae by targeting three plasmid-mediated quinolone resistance genes qnrA, qnrB, and qnrS. He used multiplex PCR using the follo...
{'extractive_match': 1.0}
0
0
[0]
[]
[(['To solve this problem, we need to analyze the given information step by step:\n\n1. The question mentions that the DNA bands obtained by putting the gel on a UV transilluminator were 40 base pairs in size. This suggests that the DNA fragments produced by PCR from the plasmid DNA inhibit the gel migration flow durin...
{'extracted_predictions': ['A', 'A'], 'extracted_golds': ['A', 'A']}
[0]
[]
[[(791, 24924, 4545, 12097, 374, 2728, 555, 1144, 15479, 284, 22350, 69307, 89, 11, 1405, 22350, 374, 264, 6926, 13, 3277, 279, 24924, 2115, 10800, 1022, 323, 832, 374, 6656, 389, 3235, 279, 489, 57, 5216, 11, 279, 1167, 42487, 315, 279, 19320, 596, 4545, 690, 15453, 5376, 13, 1115, 649, 387, 13605, 555, 4737, 279, 892...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: There is a spin 1/2 particle with magnetic moment mu (= gamma*Sz), aligned with a magnetic field pointing in the +Z direction. Suddenly, the magnetic field is turned off, and another one is turned on along the +Y...
<|begin_of_text|><|start_header_id|>system<|end_header_id|> Cutting Knowledge Date: December 2023 Today Date: 05 Feb 2025 <|eot_id|><|start_header_id|>user<|end_header_id|> Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: There is a spin 1/2 particle with ma...
['C']
[]
[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 2304, 13806, 220, 2366, 20, 271, 128009, 128006, 882, 128007, 271, 22818, 279, 2768, 3488, 323, 3116, 9322, 11503, 320, 32, 11, 426, 11, 356, 323, 423, 705, 5268, 279, 1888, 4320, 382, 14924, 25, 2...
Given the following question and four candidate answers (A, B, C and D), choose the best answer. Question: There is a spin 1/2 particle with magnetic moment mu (= gamma*Sz), aligned with a magnetic field pointing in the +Z direction. Suddenly, the magnetic field is turned off, and another one is turned on along the +Y...
{'extractive_match': 0.0}
0
0
[0]
[]
[(["The magnetic moment mu is given by \\mu = gamma*Sz, where gamma is a constant. When the magnetic field turns off and one is turned on along the +Z direction, the z-component of the particle's moment will initially increase. This can be expressed by taking the time derivative of initial and final momenta and use the...
{'extracted_predictions': ['A', 'A'], 'extracted_golds': ['C', 'C']}
[0]
[]
"[[(1271, 11886, 420, 3575, 584, 690, 1464, 433, 1523, 3094, 555, 3094, 382, 16, 13, 1556, 27658, 37(...TRUNCATED)
"Given the following question and four candidate answers (A, B, C and D), choose the best answer.\n\(...TRUNCATED)
"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 202(...TRUNCATED)
['A']
[]
"[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198(...TRUNCATED)
"Given the following question and four candidate answers (A, B, C and D), choose the best answer.\n\(...TRUNCATED)
{'extractive_match': 0.0}
0
0
[0]
[]
"[(['To solve this problem we will break it down step by step.\\n\\n1. Aniline is heated with sulfur(...TRUNCATED)
{'extracted_predictions': ['D', 'D'], 'extracted_golds': ['A', 'A']}
[0]
[]
"[[(10267, 596, 79164, 512, 12, 1226, 1390, 311, 1505, 279, 11595, 315, 18852, 315, 279, 9958, 27605(...TRUNCATED)
"Given the following question and four candidate answers (A, B, C and D), choose the best answer.\n\(...TRUNCATED)
"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 202(...TRUNCATED)
['D']
[]
"[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198(...TRUNCATED)
"Given the following question and four candidate answers (A, B, C and D), choose the best answer.\n\(...TRUNCATED)
{'extractive_match': 1.0}
0
0
[0]
[]
"[(['Let\\'s denote:\\n- We want to find the ratio of periods of the stars orbiting one another, let(...TRUNCATED)
{'extracted_predictions': ['D', 'D'], 'extracted_golds': ['D', 'D']}
[0]
[]
"[[(1271, 1505, 279, 8187, 27924, 304, 4907, 320, 101561, 36, 8, 315, 459, 17130, 11, 584, 649, 1005(...TRUNCATED)
"Given the following question and four candidate answers (A, B, C and D), choose the best answer.\n\(...TRUNCATED)
"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 202(...TRUNCATED)
['C']
[]
"[[128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198(...TRUNCATED)
"Given the following question and four candidate answers (A, B, C and D), choose the best answer.\n\(...TRUNCATED)
{'extractive_match': 1.0}
0
0
[0]
[]
"[([\"To find the minimum uncertainty in energy (ΔE) of an electron, we can use Heisenberg's uncert(...TRUNCATED)
{'extracted_predictions': ['C', 'C'], 'extracted_golds': ['C', 'C']}
[0]
End of preview. Expand in Data Studio

Dataset Card for Evaluation run of meta-llama/Llama-3.2-1B-Instruct

Dataset automatically created during the evaluation run of model meta-llama/Llama-3.2-1B-Instruct.

The dataset is composed of 1 configuration, each one coresponding to one of the evaluated task.

The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to the latest results.

An additional configuration "results" store all the aggregated results of the run.

To load the details from a run, you can for instance do the following:

from datasets import load_dataset
data = load_dataset("lewtun/details_meta-llama__Llama-3.2-1B-Instruct",
    "results",
    split="train")

Latest results

These are the latest results from run 2025-02-05T09:37:32.362591(note that their might be results for other tasks in the repos if successive evals didn't cover the same tasks. You find each in the results and the "latest" split for each eval):

{
    "all": {
        "extractive_match": 0.27455357142857145,
        "extractive_match_stderr": 0.02110874729063386
    },
    "lighteval|gpqa:main|0": {
        "extractive_match": 0.27455357142857145,
        "extractive_match_stderr": 0.02110874729063386
    }
}

Dataset Details

Dataset Description

  • Curated by: [More Information Needed]
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [More Information Needed]
  • Language(s) (NLP): [More Information Needed]
  • License: [More Information Needed]

Dataset Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Dataset Structure

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Data Collection and Processing

[More Information Needed]

Who are the source data producers?

[More Information Needed]

Annotations [optional]

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Dataset Card Authors [optional]

[More Information Needed]

Dataset Card Contact

[More Information Needed]

Downloads last month
10