ST-Evidence Benchmark Dataset

ST-Evidence is a comprehensive benchmark for evaluating Spatial-Temporal Evidence generation in video understanding. It contains two tasks: Generation (Gen) and Multiple Choice Question (MCQ).

This dataset was released for research purposes only, in support of the academic paper Evidence-Backed Video Question Answering.

Dataset Overview

Total Videos: ~1,300 videos at 6fps
Annotations: Question-Answer pairs with temporal segments and spatial masks
Tasks: Generation and MCQ
Domains: Diverse video content

File Structure

ST-Evidence/
├── st_evidence_gen/              # Generation Task
│   ├── st_evidence_gen.csv       # 924KB - Annotations (entry_id, question, answer, segments, etc.)
│   ├── videos_6fps.tar.gz        # 8.3GB - Video files at 6fps
│   └── masks.tar.gz              # 560MB - Ground truth spatial masks
│
└── st_evidence_mcq/              # Multiple Choice Question Task
    ├── st_evidence_mcq.csv       # 313KB - MCQ annotations
    ├── mask_options.json         # 575KB - Mask options metadata
    ├── temp_options.json         # 679KB - Temporal options metadata
    └── options.tar.gz            # 1.5GB - Pre-rendered option masks (1,298 entries)

Generation Task (st_evidence_gen)

Data Format

st_evidence_gen.csv contains the following columns:

entry_id: Unique identifier for each question
video_id: Video identifier
video_path: Relative path to video file
question: Question text
candidates: List of answer options (for reference)
answer: Ground truth answer
segment: Temporal evidence segments [[start1, end1], [start2, end2], ...]
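
When the CSV is read with pandas, list-valued columns such as segment and candidates arrive as plain strings. The sketch below shows one way to decode them, assuming each cell holds a JSON or Python list literal. This card does not state the exact encoding, so inspect a few rows first. `parse_list_field` is a hypothetical helper, not part of the dataset tooling.

```python
import ast
import json


def parse_list_field(raw):
    """Decode a list-valued CSV cell such as '[[1.0, 4.5], [7.0, 9.0]]'.

    Assumption: the cell holds a JSON or Python list literal.
    """
    if isinstance(raw, list):
        return raw  # already decoded
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        # Fall back to Python literal syntax (e.g. single-quoted strings).
        return ast.literal_eval(raw)


# Illustrative values only, not taken from the dataset.
segments = parse_list_field("[[1.0, 4.5], [7.0, 9.0]]")
candidates = parse_list_field("['a dog', 'a cat']")
```

With a DataFrame, the same helper can be applied per column, for example `df['segment'] = df['segment'].apply(parse_list_field)`.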

Usage

import pandas as pd
import tarfile

# Load annotations
df = pd.read_csv('st_evidence_gen.csv')

# Extract videos
with tarfile.open('videos_6fps.tar.gz', 'r:gz') as tar:
    tar.extractall('videos_6fps/')

# Extract ground truth masks
with tarfile.open('masks.tar.gz', 'r:gz') as tar:
    tar.extractall('masks/')
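
After extraction, it can help to confirm that every video_path in the annotations resolves to a file on disk. The following is a minimal sketch, assuming video_path is relative to the directory the archive was extracted into. The archive's internal layout is not documented here, so the root may need adjusting. `find_missing_videos` is a hypothetical helper.

```python
from pathlib import Path


def find_missing_videos(video_paths, root):
    """Return the relative paths that do not exist as files under root."""
    root = Path(root)
    return [p for p in video_paths if not (root / p).is_file()]
```

A typical call would be `missing = find_missing_videos(df['video_path'], 'videos_6fps/')`. An empty list means every annotated video was found.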

Evaluation Metrics

QA Accuracy: Percentage of correct answers
Temporal IoU: Intersection over Union for temporal segments
- mIoU, TIoU@0.3, TIoU@0.5
Temporal IoP: Intersection over Prediction
- mIoP, TIoP@0.3, TIoP@0.5
Spatial Quality (if masks generated):
- J score (Jaccard/IoU)
- F score (contour-based)
- J&F score (average)
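
The temporal metrics above can be sketched as follows. This is an illustrative implementation, not the benchmark's official evaluation code. It assumes segments are [start, end] pairs in the same time unit, that overlapping segments within one prediction are merged before scoring, and that the @0.3 and @0.5 variants report the fraction of samples scoring at or above the threshold. All function names are hypothetical.

```python
def merge_intervals(segments):
    """Sort [start, end] segments and merge any that overlap."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def intersection_length(a, b):
    """Total overlap between two lists of merged intervals."""
    total = 0.0
    for s1, e1 in a:
        for s2, e2 in b:
            total += max(0.0, min(e1, e2) - max(s1, s2))
    return total


def temporal_iou_iop(pred, gt):
    """Return (IoU, IoP) for predicted vs. ground-truth segments."""
    pred, gt = merge_intervals(pred), merge_intervals(gt)
    inter = intersection_length(pred, gt)
    pred_len = sum(e - s for s, e in pred)
    gt_len = sum(e - s for s, e in gt)
    union = pred_len + gt_len - inter
    iou = inter / union if union > 0 else 0.0
    iop = inter / pred_len if pred_len > 0 else 0.0
    return iou, iop


def summarize(scores, thresholds=(0.3, 0.5)):
    """Mean score plus the fraction of samples at or above each threshold."""
    n = len(scores)
    out = {"mean": sum(scores) / n}
    for t in thresholds:
        out[f"@{t}"] = sum(s >= t for s in scores) / n
    return out


# Prediction [2, 6] vs. ground truth [4, 8]: overlap 2, union 6, prediction length 4.
iou, iop = temporal_iou_iop([[2, 6]], [[4, 8]])
```

Calling `summarize` on the per-sample IoU values gives the mIoU and TIoU@k figures under these assumptions, and the per-sample IoP values give the mIoP and TIoP@k figures.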

MCQ Task (st_evidence_mcq)

Data Format

st_evidence_mcq.csv contains:

entry_id: Unique identifier
video_id: Video identifier
video_path: Path to video
question: Question text
candidates: Answer options
answer: Correct answer
segment: Temporal evidence
mask_options: Reference to mask options
temp_options: Reference to temporal options

mask_options.json: Contains spatial mask options for each question
temp_options.json: Contains temporal segment options for each question
options.tar.gz: Pre-rendered mask visualizations for options (1,298 entries)
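
The internal schema of mask_options.json and temp_options.json is not described here, so it is worth inspecting their top-level shape before writing lookup code. The sketch below uses a hypothetical helper, `describe_json`, on a toy value. The keys "q1" and "q2" are invented for illustration and say nothing about the real files.

```python
import json


def describe_json(obj, max_keys=3):
    """Summarize the top-level shape of a parsed JSON value."""
    if isinstance(obj, dict):
        return {"type": "dict", "size": len(obj), "sample_keys": list(obj)[:max_keys]}
    if isinstance(obj, list):
        first = type(obj[0]).__name__ if obj else None
        return {"type": "list", "size": len(obj), "first_item_type": first}
    return {"type": type(obj).__name__}


# Toy stand-in; the real files would be loaded with json.load(...).
toy = json.loads('{"q1": [0, 1], "q2": [2, 3]}')
summary = describe_json(toy)
```

Running the helper on the loaded option files shows whether they are keyed dictionaries or lists, which determines how rows of st_evidence_mcq.csv can be matched to their options.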

Usage

import json
import tarfile
import pandas as pd

# Load MCQ annotations
df = pd.read_csv('st_evidence_mcq.csv')

# Load options
with open('mask_options.json', 'r') as f:
    mask_options = json.load(f)

with open('temp_options.json', 'r') as f:
    temp_options = json.load(f)

# Extract option masks
with tarfile.open('options.tar.gz', 'r:gz') as tar:
    tar.extractall('options/')

Evaluation Metrics

Same as the Generation task, but in multiple-choice format.
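
For the multiple-choice setting, QA accuracy reduces to comparing the selected option with the answer column. The sketch below is a minimal version, assuming answers are compared as strings after trimming whitespace and ignoring case. That normalization is an assumption, and the official scoring may differ. `qa_accuracy` is a hypothetical helper.

```python
def qa_accuracy(predictions, answers):
    """Percentage of predictions matching the ground truth (case-insensitive)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)
```

For example, `qa_accuracy(model_outputs, df['answer'].tolist())` would score a list of predicted answers against the annotations, where `model_outputs` is whatever list of strings your model produces.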

Download & Setup

Using HuggingFace Hub

from huggingface_hub import snapshot_download

# Download entire dataset
snapshot_download(
    repo_id="Salesforce/ST-Evidence",
    repo_type="dataset",
    local_dir="./st_evidence_data"
)
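
Because the video archive accounts for most of the download, it may be useful to fetch only one task's directory. The sketch below uses the `allow_patterns` argument of `snapshot_download` and assumes the repository layout matches the tree shown under the file structure section (`st_evidence_gen/` and `st_evidence_mcq/`). `task_patterns` and `download_task` are hypothetical helpers.

```python
def task_patterns(task):
    """Glob patterns selecting one task directory ('gen' or 'mcq')."""
    if task not in ("gen", "mcq"):
        raise ValueError("task must be 'gen' or 'mcq'")
    return [f"st_evidence_{task}/*"]


def download_task(task, local_dir="./st_evidence_data"):
    """Download only the files belonging to one task."""
    # Imported here so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Salesforce/ST-Evidence",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=task_patterns(task),
    )
```

Note that the MCQ directory in the tree above does not list its own video archive, so downloading only `mcq` may not be sufficient if that task reuses the videos from the Generation directory.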

Manual Download

Download all files from this repository

Extract compressed files:

tar -xzf videos_6fps.tar.gz
tar -xzf masks.tar.gz
tar -xzf options.tar.gz

Citation

If you use this dataset, please cite:

@article{st-evidence2025,
  title={ST-Evidence: A Benchmark for Spatial-Temporal Evidence in Video Understanding},
  author={Wang, Shijie and others},
  year={2025}
}

License

CC-BY-NC 4.0

Version

Version: 1.0
Release Date: 2025-03-14
Total Size: ~10.4 GB (compressed)
