satellite-disruption-triage-aux-v1-3

Civilian Conflict-Disruption Satellite VLM Dataset — Auxiliary / v1.3

This is an auxiliary dataset for training and evaluating Vision-Language Models (VLMs) to perform civilian conflict-disruption triage from paired satellite imagery. It is not a tactical intelligence dataset and not a canonical expert benchmark.

Scope & Purpose

The target task is detecting macro-scale civilian infrastructure disruption caused by war, armed conflict, bombardment, shelling, explosions, siege, or major unrest. The model compares a baseline (pre-event) satellite image with a current (post-event) image and outputs a structured triage decision.

Included civilian infrastructure:

Hospitals, schools, residential/civilian building clusters
Food/logistics warehouses, grain silos, markets, aid hubs
Ports, bridges, water facilities, power/desalination plants
IDP camps, public infrastructure

Explicitly excluded:

Military bases, weapons systems, troop positions, air defenses
Tactical route intelligence, target ranking, strike planning data

Dataset Version

Version: 1.3.0
Total examples: 3,332
Conflict-core examples: 3,272 (2,127 train / 1,145 eval)
Non-conflict auxiliary examples: 60 (42 train / 18 eval) — capped at <2% of total
Image resolution: 512×512 PNG
License: CC-BY-NC-4.0 (Maxar Open Data, non-commercial)
Build date: 2026-04-25

Schema

Flat JSONL (`train_flat.jsonl`, `eval_flat.jsonl`)

Each row:

{
  "example_id": "string",
  "baseline_image": "images/baseline/...png",
  "current_image": "images/current/...png",
  "target_output": {
    "action": "discard | defer | downlink_now",
    "category": "conflict_building_damage | conflict_hospital_damage | conflict_food_logistics_damage | conflict_water_infrastructure_damage | conflict_bridge_or_access_damage | conflict_port_or_silo_damage | conflict_urban_area_damage | explosion_damage | no_visible_disruption | ambiguous_or_low_visibility | other_conflict_civilian_disruption",
    "rationale": "short sentence",
    "bbox_norm": [x_min, y_min, x_max, y_max] | null
  },
  "source_dataset": "string",
  "source_event": "string",
  "source_image_name": "string",
  "provenance": "source URL or dataset reference",
  "modality": "optical-to-optical | optical-to-SAR | SAR-to-SAR | other",
  "location_name": "string",
  "country": "string",
  "conflict_context": "short string",
  "baseline_date": "YYYY-MM-DD or null",
  "current_date": "YYYY-MM-DD or null",
  "license": "string",
  "label_method": "mask-derived | metadata-derived | manual-review | vlm-assisted | weak-label",
  "damage_ratio": float | null,
  "destruction_ratio": float | null,
  "subset": "conflict_core | non_conflict_auxiliary"
}
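A minimal sketch of how a `target_output` object could be checked against the schema above. It assumes the enumerated `action` and `category` values are exhaustive and that `bbox_norm` is normalised to [0, 1] relative to the 512×512 tile; the card does not state the coordinate origin, so a top-left origin is an assumption. The names `validate_target` and `bbox_to_pixels` are hypothetical helpers, not part of the dataset.

```python
ACTIONS = {"discard", "defer", "downlink_now"}
CATEGORIES = {
    "conflict_building_damage",
    "conflict_hospital_damage",
    "conflict_food_logistics_damage",
    "conflict_water_infrastructure_damage",
    "conflict_bridge_or_access_damage",
    "conflict_port_or_silo_damage",
    "conflict_urban_area_damage",
    "explosion_damage",
    "no_visible_disruption",
    "ambiguous_or_low_visibility",
    "other_conflict_civilian_disruption",
}
IMAGE_SIZE = 512  # tiles are 512x512 according to the card


def validate_target(target):
    """Return a list of problems found in a target_output dict (empty if none)."""
    problems = []
    if target.get("action") not in ACTIONS:
        problems.append(f"unknown action: {target.get('action')!r}")
    if target.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {target.get('category')!r}")
    bbox = target.get("bbox_norm")
    if bbox is not None:  # null is allowed by the schema
        if len(bbox) != 4 or not all(0.0 <= v <= 1.0 for v in bbox):
            problems.append("bbox_norm must be four values in [0, 1]")
        elif bbox[0] >= bbox[2] or bbox[1] >= bbox[3]:
            problems.append("bbox_norm must satisfy x_min < x_max and y_min < y_max")
    return problems


def bbox_to_pixels(bbox_norm, size=IMAGE_SIZE):
    """Scale [x_min, y_min, x_max, y_max] from normalised to pixel coordinates."""
    return tuple(round(v * size) for v in bbox_norm)
```

For example, `bbox_to_pixels([0.25, 0.5, 0.75, 1.0])` scales each coordinate by 512.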

SFT JSONL (`train_sft.jsonl`, `eval_sft.jsonl`)

Conversational format for VLM fine-tuning:

{
  "example_id": "string",
  "images": ["images/baseline/...png", "images/current/...png"],
  "messages": [
    {"role": "system", "content": "You are a civilian conflict-disruption satellite triage model. Return strict JSON only."},
    {"role": "user", "content": "Compare the baseline and current satellite images. Focus only on macro-scale civilian disruption caused by conflict or explosion. Return action, category, rationale, and bbox_norm."},
    {"role": "assistant", "content": "{ strict JSON target_output }"}
  ],
  "source_dataset": "string",
  "provenance": "string"
}
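The relationship between the two formats can be sketched as follows. This assumes an SFT row is built from a flat row by serialising `target_output` as the assistant message; the card does not describe how the SFT files were actually generated, and `to_sft` is a hypothetical name. The system and user prompts are copied from the example above.

```python
import json

SYSTEM_PROMPT = (
    "You are a civilian conflict-disruption satellite triage model. "
    "Return strict JSON only."
)
USER_PROMPT = (
    "Compare the baseline and current satellite images. Focus only on "
    "macro-scale civilian disruption caused by conflict or explosion. "
    "Return action, category, rationale, and bbox_norm."
)


def to_sft(flat_row):
    """Build a conversational record from one flat row (assumed mapping)."""
    return {
        "example_id": flat_row["example_id"],
        # Baseline first, current second, matching the order in the SFT schema.
        "images": [flat_row["baseline_image"], flat_row["current_image"]],
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
            {"role": "assistant", "content": json.dumps(flat_row["target_output"])},
        ],
        "source_dataset": flat_row["source_dataset"],
        "provenance": flat_row["provenance"],
    }
```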

Action Definitions

downlink_now — Clear macro-visible civilian disruption from conflict/explosion (buildings destroyed, widespread damage)
defer — Plausible but ambiguous conflict disruption (partial damage, smoke, low visibility, SAR/optical mismatch)
discard — No visible disruption, weak evidence, non-civilian target, or invalid pair
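The card does not prescribe an evaluation metric. As one possible way to score predictions against these three actions, the sketch below counts (target, predicted) pairs and derives per-action recall. Bucketing unrecognised predictions as `invalid` is an assumption, and `action_confusion` and `recall` are hypothetical helper names.

```python
from collections import Counter

ACTIONS = ("downlink_now", "defer", "discard")


def action_confusion(targets, predictions):
    """Count (target, predicted) action pairs; unknown predictions become 'invalid'."""
    counts = Counter()
    for target, predicted in zip(targets, predictions):
        if predicted not in ACTIONS:
            predicted = "invalid"
        counts[(target, predicted)] += 1
    return counts


def recall(counts, action):
    """Fraction of examples with this target action that were predicted correctly."""
    total = sum(n for (target, _), n in counts.items() if target == action)
    return counts[(action, action)] / total if total else None
```

Per-action recall is useful here because the splits are dominated by `defer`, so overall accuracy alone could hide poor performance on `downlink_now`.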

Split Policy

Event-held-out evaluation. No event family, city damage campaign, or near-duplicate tile appears in both train and eval.

Ukraine: Split by city/district (18 cities → train, 9 cities → eval)
Beirut explosion: Exclusively in eval
Bata explosion: Exclusively in train
Auxiliary natural disasters: Events split across train/eval, no overlap
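A rough way to check the held-out policy on the flat files is to compare grouping keys across the two splits. The sketch below assumes that the pair (`source_event`, `location_name`) identifies an event and city; the card does not say which fields the split was actually keyed on, and `find_leakage` is a hypothetical helper. It does not detect near-duplicate tiles.

```python
def group_keys(rows, fields=("source_event", "location_name")):
    """Collect the set of grouping tuples present in a list of flat rows."""
    return {tuple(row.get(field) for field in fields) for row in rows}


def find_leakage(train_rows, eval_rows, fields=("source_event", "location_name")):
    """Return grouping tuples that occur in both splits (empty set if disjoint)."""
    return group_keys(train_rows, fields) & group_keys(eval_rows, fields)
```

An empty result from `find_leakage(train, eval_rows)` is consistent with the policy under these assumptions; a non-empty result lists the groups to inspect.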

Action Distribution

Split	downlink_now	defer	discard
Train (conflict core)	558 (26.2%)	1,100 (51.7%)	469 (22.0%)
Eval (conflict core)	330 (28.8%)	539 (47.1%)	276 (24.1%)
Target	45%	20%	35%

Note: The target balance was not reached because the BRIGHT dataset contains more moderate-damage tiles (defer) than catastrophic-damage tiles (downlink_now). Labels are derived from pixel-level damage masks and were not artificially rebalanced.
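The shares in the table, and their distance from the target, can be recomputed from the raw counts. This is a small arithmetic sketch using only the numbers listed above; `shares` and `gap_to_target` are hypothetical helper names.

```python
counts = {
    "train": {"downlink_now": 558, "defer": 1100, "discard": 469},
    "eval": {"downlink_now": 330, "defer": 539, "discard": 276},
}
target = {"downlink_now": 45.0, "defer": 20.0, "discard": 35.0}


def shares(split_counts):
    """Percentage of each action within one split, rounded to one decimal."""
    total = sum(split_counts.values())
    return {action: round(100 * n / total, 1) for action, n in split_counts.items()}


def gap_to_target(split_counts):
    """Percentage-point difference between the observed share and the target."""
    observed = shares(split_counts)
    return {action: round(observed[action] - target[action], 1) for action in observed}


for split, split_counts in counts.items():
    print(split, shares(split_counts), gap_to_target(split_counts))
```

On the train split, `defer` sits 31.7 points above its target and `downlink_now` 18.8 points below, which may be worth accounting for when weighting the loss or sampling batches.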

Event / Country Coverage

Country	Cities/Areas	Examples	Split
Ukraine	27 cities (Kharkiv, Mariupol, Bucha, Irpin, Bakhmut, etc.)	2,844	Train+Eval
Lebanon	Beirut Port	230	Eval only
Equatorial Guinea	Bata	198	Train only
USA	Various (wildfires, hurricanes)	60	Train+Eval (aux)

Data Sources

Source	License	Rows	Status
BRIGHT (Kullervo/BRIGHT)	CC-BY-NC-4.0	3,272 conflict	Used
xBD (DIUx/xView2)	CC-BY-NC-4.0	60 auxiliary	Used
UNOSAT Gaza damage assessments	Proprietary	—	Excluded — no bulk download
UNOSAT Ukraine damage layers	Proprietary	—	Excluded — no bulk download
PRS/ETH Ukraine Zenodo	CC-BY-4.0	—	Excluded — empty/missing files
Maxar Open Data	CC-BY-NC-4.0	—	Excluded — raw imagery, no ML-ready labels

Full source audit in source_audit.md.

Known Limitations

Ukraine-only armed conflict examples: The only public, redistributable conflict-damage satellite dataset with paired pre/post images and pixel labels is BRIGHT Ukraine. Gaza, Syria, Yemen, Sudan, Myanmar, Nagorno-Karabakh, Iraq, Libya, Iran, and Mexico cartel-conflict areas have no public ML-ready satellite damage benchmarks.
Optical-to-SAR modality gap: ~54% of conflict examples use pre-event optical + post-event SAR. Radar speckle and geometry differences can create false-change signals.
No per-building type labels: BRIGHT provides building damage masks but not building-type classification (hospital vs residential vs warehouse). All high-damage tiles are labeled conflict_building_damage.
Template-generated rationales: Rationale text is auto-generated from damage thresholds, not human expert review.
No exact dates: BRIGHT provides only event names, not acquisition dates.
Non-commercial license: CC-BY-NC-4.0 restricts commercial use.
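To see how much of a given split is affected by the optical-to-SAR gap noted above, the `modality` and `subset` fields of the flat rows can be tallied. A minimal sketch, assuming rows have been loaded as dictionaries from the flat JSONL; `modality_shares` is a hypothetical helper.

```python
from collections import Counter


def modality_shares(rows, subset="conflict_core"):
    """Fraction of rows per modality within one subset (empty dict if no rows)."""
    selected = [row for row in rows if row.get("subset") == subset]
    counts = Counter(row.get("modality") for row in selected)
    total = len(selected)
    return {modality: n / total for modality, n in counts.items()} if total else {}
```

Filtering on `modality == "optical-to-optical"` gives a subset that avoids cross-sensor comparisons, at the cost of fewer examples.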

How to Use

from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="satellite-disruption-triage-aux-v1-3")

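The imagefolder loader returns only the images with a folder-derived label (baseline or current). For VLM training, load the JSONL directly to get the paired images and triage targets: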

import json

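# Read one JSON object per line, skipping blank lines; the file is closed on exit.
with open("train_flat.jsonl", encoding="utf-8") as f:
    train = [json.loads(line) for line in f if line.strip()]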
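A slightly fuller loading sketch that also resolves the image paths for each pair. It assumes the `baseline_image` and `current_image` values are relative to the dataset root directory, which the schema suggests but the card does not state outright; `load_flat` and `image_pair` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_flat(jsonl_path):
    """Read a flat JSONL file into a list of dicts, skipping blank lines."""
    rows = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rows.append(json.loads(line))
    return rows


def image_pair(row, root):
    """Return (baseline_path, current_path) for one row, resolved under root."""
    root = Path(root)
    return root / row["baseline_image"], root / row["current_image"]
```

The returned paths can then be passed to whichever image library the training pipeline uses.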

Citation

@dataset{satellite_disruption_triage_v1_3,
  title = {Satellite Disruption Triage Auxiliary Dataset v1.3},
  author = {ChrisRPL},
  year = {2026},
  url = {https://huggingface.co/datasets/ChrisRPL/satellite-disruption-triage-aux-v1-3}
}