satellite-disruption-triage-aux-v1-3

Civilian Conflict-Disruption Satellite VLM Dataset — Auxiliary / v1.3

This is an auxiliary dataset for training and evaluating Vision-Language Models (VLMs) to perform civilian conflict-disruption triage from paired satellite imagery. It is not a tactical intelligence dataset and not a canonical expert benchmark.

Scope & Purpose

The target task is detecting macro-scale civilian infrastructure disruption caused by war, armed conflict, bombardment, shelling, explosions, siege, or major unrest. The model compares a baseline (pre-event) satellite image with a current (post-event) image and outputs a structured triage decision.

Included civilian infrastructure:

  • Hospitals, schools, residential/civilian building clusters
  • Food/logistics warehouses, grain silos, markets, aid hubs
  • Ports, bridges, water facilities, power/desalination plants
  • IDP camps, public infrastructure

Explicitly excluded:

  • Military bases, weapons systems, troop positions, air defenses
  • Tactical route intelligence, target ranking, strike planning data

Dataset Version

  • Version: 1.3.0
  • Total examples: 3,332
  • Conflict-core examples: 3,272 (2,127 train / 1,145 eval)
  • Non-conflict auxiliary examples: 60 (42 train / 18 eval) — capped at <2% of total
  • Image resolution: 512×512 PNG
  • License: CC-BY-NC-4.0 (Maxar Open Data, non-commercial)
  • Build date: 2026-04-25

Schema

Flat JSONL (train_flat.jsonl, eval_flat.jsonl)

Each row:

{
  "example_id": "string",
  "baseline_image": "images/baseline/...png",
  "current_image": "images/current/...png",
  "target_output": {
    "action": "discard | defer | downlink_now",
    "category": "conflict_building_damage | conflict_hospital_damage | conflict_food_logistics_damage | conflict_water_infrastructure_damage | conflict_bridge_or_access_damage | conflict_port_or_silo_damage | conflict_urban_area_damage | explosion_damage | no_visible_disruption | ambiguous_or_low_visibility | other_conflict_civilian_disruption",
    "rationale": "short sentence",
    "bbox_norm": [x_min, y_min, x_max, y_max] | null
  },
  "source_dataset": "string",
  "source_event": "string",
  "source_image_name": "string",
  "provenance": "source URL or dataset reference",
  "modality": "optical-to-optical | optical-to-SAR | SAR-to-SAR | other",
  "location_name": "string",
  "country": "string",
  "conflict_context": "short string",
  "baseline_date": "YYYY-MM-DD or null",
  "current_date": "YYYY-MM-DD or null",
  "license": "string",
  "label_method": "mask-derived | metadata-derived | manual-review | vlm-assisted | weak-label",
  "damage_ratio": float | null,
  "destruction_ratio": float | null,
  "subset": "conflict_core | non_conflict_auxiliary"
}
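As a minimal sketch, the target_output block of a flat row can be checked against the enumerations listed above. The function name validate_target_output is hypothetical, and the bbox_norm rules (values in [0, 1], with min strictly below max) are an assumption inferred from the field name, not something the schema states.

```python
from typing import Any

# Allowed values copied from the schema above.
ACTIONS = {"discard", "defer", "downlink_now"}
CATEGORIES = {
    "conflict_building_damage",
    "conflict_hospital_damage",
    "conflict_food_logistics_damage",
    "conflict_water_infrastructure_damage",
    "conflict_bridge_or_access_damage",
    "conflict_port_or_silo_damage",
    "conflict_urban_area_damage",
    "explosion_damage",
    "no_visible_disruption",
    "ambiguous_or_low_visibility",
    "other_conflict_civilian_disruption",
}


def validate_target_output(target: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a target_output dict (empty if none)."""
    problems = []
    if target.get("action") not in ACTIONS:
        problems.append(f"unknown action: {target.get('action')!r}")
    if target.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {target.get('category')!r}")
    if not isinstance(target.get("rationale"), str):
        problems.append("rationale must be a string")

    bbox = target.get("bbox_norm")
    if bbox is not None:
        # Assumption: [x_min, y_min, x_max, y_max] normalized to [0, 1].
        ok = (
            isinstance(bbox, list)
            and len(bbox) == 4
            and all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in bbox
            )
            and all(0.0 <= v <= 1.0 for v in bbox)
            and bbox[0] < bbox[2]
            and bbox[1] < bbox[3]
        )
        if not ok:
            problems.append(f"invalid bbox_norm: {bbox!r}")
    return problems
```

Running this over every row of train_flat.jsonl and eval_flat.jsonl before training is one way to catch malformed labels early.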

SFT JSONL (train_sft.jsonl, eval_sft.jsonl)

Conversational format for VLM fine-tuning:

{
  "example_id": "string",
  "images": ["images/baseline/...png", "images/current/...png"],
  "messages": [
    {"role": "system", "content": "You are a civilian conflict-disruption satellite triage model. Return strict JSON only."},
    {"role": "user", "content": "Compare the baseline and current satellite images. Focus only on macro-scale civilian disruption caused by conflict or explosion. Return action, category, rationale, and bbox_norm."},
    {"role": "assistant", "content": "{ strict JSON target_output }"}
  ],
  "source_dataset": "string",
  "provenance": "string"
}
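The relationship between the two formats can be sketched as a conversion from a flat row to an SFT record. The prompts are copied from the schema above; the function name flat_to_sft is hypothetical, and the exact serialization of the assistant message (key order, whitespace) is an assumption, since the card only says it holds the strict JSON target_output.

```python
import json

# Prompts copied verbatim from the SFT schema above.
SYSTEM_PROMPT = (
    "You are a civilian conflict-disruption satellite triage model. "
    "Return strict JSON only."
)
USER_PROMPT = (
    "Compare the baseline and current satellite images. Focus only on "
    "macro-scale civilian disruption caused by conflict or explosion. "
    "Return action, category, rationale, and bbox_norm."
)


def flat_to_sft(row: dict) -> dict:
    """Build an SFT-style record from one flat-format row."""
    return {
        "example_id": row["example_id"],
        # Baseline first, then current, matching the order in the schema.
        "images": [row["baseline_image"], row["current_image"]],
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
            # Assumption: default json.dumps formatting for the target.
            {"role": "assistant", "content": json.dumps(row["target_output"])},
        ],
        "source_dataset": row["source_dataset"],
        "provenance": row["provenance"],
    }
```

Parsing the assistant content back with json.loads should recover the original target_output, which gives a quick round-trip check on any converted file.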

Action Definitions

  • downlink_now — Clear macro-visible civilian disruption from conflict/explosion (buildings destroyed, widespread damage)
  • defer — Plausible but ambiguous conflict disruption (partial damage, smoke, low visibility, SAR/optical mismatch)
  • discard — No visible disruption, weak evidence, non-civilian target, or invalid pair

Split Policy

Event-held-out evaluation. No event family, city damage campaign, or near-duplicate tile appears in both train and eval.

  • Ukraine: Split by city/district (18 cities → train, 9 cities → eval)
  • Beirut explosion: Exclusively in eval
  • Bata explosion: Exclusively in train
  • Auxiliary natural disasters: Events split across train/eval, no overlap
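A held-out split of this kind can be sanity-checked by looking for group values that occur in both files. The sketch below assumes that source_event (or location_name) from the flat schema identifies the event family or city; the card does not say which field the split was actually keyed on, and shared_groups is a hypothetical helper name.

```python
def shared_groups(train_rows, eval_rows, key="source_event"):
    """Return the sorted values of `key` that occur in both splits."""
    train_groups = {row[key] for row in train_rows}
    eval_groups = {row[key] for row in eval_rows}
    return sorted(train_groups & eval_groups)


# Toy rows with made-up event names, for illustration only.
train_rows = [{"source_event": "event-a"}, {"source_event": "event-b"}]
eval_rows = [{"source_event": "event-c"}]

leaks = shared_groups(train_rows, eval_rows)
if leaks:
    print("Groups present in both splits:", leaks)
```

An empty result is necessary but not sufficient: it does not detect near-duplicate tiles, which would need an image-level comparison.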

Action Distribution

| Split | downlink_now | defer | discard |
|---|---|---|---|
| Train (conflict core) | 558 (26.2%) | 1,100 (51.7%) | 469 (22.0%) |
| Eval (conflict core) | 330 (28.8%) | 539 (47.1%) | 276 (24.1%) |
| Target | 45% | 20% | 35% |

Note: The target balance was not achieved because the BRIGHT dataset contains more moderate-damage tiles (defer) than catastrophic-damage tiles (downlink_now). Labels are derived from pixel-level damage masks and were not rebalanced to match the target.
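The shares in this section follow directly from the per-action counts. A small sketch that recomputes them and the gap to the target mix, using the counts reported for the conflict-core splits (the helper names shares and gap_to_target are hypothetical):

```python
# Per-action counts for the conflict-core splits, as reported in this section.
COUNTS = {
    "train": {"downlink_now": 558, "defer": 1100, "discard": 469},
    "eval": {"downlink_now": 330, "defer": 539, "discard": 276},
}
# Target mix in percent, as reported in this section.
TARGET = {"downlink_now": 45.0, "defer": 20.0, "discard": 35.0}


def shares(counts):
    """Percentage share of each action, rounded to one decimal place."""
    total = sum(counts.values())
    return {action: round(100.0 * n / total, 1) for action, n in counts.items()}


def gap_to_target(counts):
    """Percentage points above (+) or below (-) the target mix."""
    return {
        action: round(share - TARGET[action], 1)
        for action, share in shares(counts).items()
    }


for split, counts in COUNTS.items():
    print(split, shares(counts), gap_to_target(counts))
```

A positive gap for defer and a negative gap for downlink_now in both splits reflects the imbalance the note describes. If a training recipe needs a mix closer to the target, loss weighting or sampling is one option; the labels themselves are left as they are.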

Event / Country Coverage

| Country | Cities/Areas | Examples | Split |
|---|---|---|---|
| Ukraine | 27 cities (Kharkiv, Mariupol, Bucha, Irpin, Bakhmut, etc.) | 2,844 | Train+Eval |
| Lebanon | Beirut Port | 230 | Eval only |
| Equatorial Guinea | Bata | 198 | Train only |
| USA | Various (wildfires, hurricanes) | 60 | Train+Eval (aux) |

Data Sources

| Source | License | Rows | Status |
|---|---|---|---|
| BRIGHT (Kullervo/BRIGHT) | CC-BY-NC-4.0 | 3,272 conflict | Used |
| xBD (DIUx/xView2) | CC-BY-NC-4.0 | 60 auxiliary | Used |
| UNOSAT Gaza damage assessments | Proprietary | | Excluded: no bulk download |
| UNOSAT Ukraine damage layers | Proprietary | | Excluded: no bulk download |
| PRS/ETH Ukraine Zenodo | CC-BY-4.0 | | Excluded: empty/missing files |
| Maxar Open Data | CC-BY-NC-4.0 | | Excluded: raw imagery, no ML-ready labels |

Full source audit in source_audit.md.

Known Limitations

  1. Ukraine-only armed conflict examples: The only public, redistributable conflict-damage satellite dataset with paired pre/post images and pixel labels is BRIGHT Ukraine. Gaza, Syria, Yemen, Sudan, Myanmar, Nagorno-Karabakh, Iraq, Libya, Iran, and Mexico cartel-conflict areas have no public ML-ready satellite damage benchmarks.

  2. Optical-to-SAR modality gap: ~54% of conflict examples use pre-event optical + post-event SAR. Radar speckle and geometry differences can create false-change signals.

  3. No per-building type labels: BRIGHT provides building damage masks but not building-type classification (hospital vs residential vs warehouse). All high-damage tiles are labeled conflict_building_damage.

  4. Template-generated rationales: Rationale text is auto-generated from damage thresholds, not human expert review.

  5. No exact dates: BRIGHT provides only event names, not acquisition dates.

  6. Non-commercial license: CC-BY-NC-4.0 restricts commercial use.

How to Use

from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="satellite-disruption-triage-aux-v1-3")

Or load the JSONL directly for VLM training:

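import json

# Read one JSON object per line, skipping blank lines; the context manager closes the file.
with open("train_flat.jsonl", encoding="utf-8") as f:
    train = [json.loads(line) for line in f if line.strip()]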

Citation

@dataset{satellite_disruption_triage_v1_3,
  title = {Satellite Disruption Triage Auxiliary Dataset v1.3},
  author = {ChrisRPL},
  year = {2026},
  url = {https://huggingface.co/datasets/ChrisRPL/satellite-disruption-triage-aux-v1-3}
}

Contact

For issues or contributions, open a discussion on the Hugging Face Hub.
