# Annotation Spec for Storefront / Handover Moments in Delivery Videos

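## Goal

Provide training labels for the ONNX frame-classification model: each video is annotated with a **best storefront moment** and a **best handover moment** (one timestamp in seconds for each).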

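## Field Reference

| Field | Type | Required | Description |
|------|------|------|------|
| `media_id` | int | Yes | Matches `collection_media.id` |
| `video_path` | string | Yes | Local path or FTP/HTTP URL |
| `storefront_time_sec` | float | Yes | Timestamp of the best storefront frame (seconds) |
| `handover_time_sec` | float | Yes | Timestamp of the best handover frame (seconds) |
| `store_type` | string | No | `便利店` (convenience store) / `超市` (supermarket) / `餐饮` (restaurant) / `其他` (other) |
| `has_voice_marker` | bool | No | Whether the audio contains an "arrived at store / handover" voice cue |
| `recorder_sn` | string | No | Device serial number |
| `driver_date` | string | No | Driver + date, used to group videos for the train/val/test split |
| `split` | string | Yes | `train` / `val` / `test` |
| `notes` | string | No | Bad-case description |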

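## Class Boundaries

- **storefront**: the shop sign, door plate, or shop entrance is the main subject; people may be in frame, but goods are not the main subject
- **handover**: goods or packaging are at the center of the frame, with a visible handing-over, placing, or signing-for action
- **other (negative samples)**: driving, warehouse, walking inside the store, empty shots, etc. (sampled automatically during training from outside the ±5 s windows around the annotated moments)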
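The class boundaries can be illustrated with a small frame-labelling helper. This is only a sketch: the function name `frame_label` and the `window_sec` parameter are hypothetical, and it assumes that frames inside the ±5 s window around an annotated moment count as positives for that class, which the spec implies but does not state outright.

```python
def frame_label(t_sec, storefront_time_sec, handover_time_sec, window_sec=5.0):
    """Return 'storefront', 'handover', or 'other' for a frame at t_sec.

    Assumption: frames within +/- window_sec of an annotated moment are
    positives for that class; everything else is a negative candidate.
    """
    if abs(t_sec - storefront_time_sec) <= window_sec:
        return "storefront"
    if abs(t_sec - handover_time_sec) <= window_sec:
        return "handover"
    return "other"


# Timestamps taken from the JSONL example in this document.
print(frame_label(742.5, 742.5, 1085.2))   # inside the storefront window
print(frame_label(1088.0, 742.5, 1085.2))  # inside the handover window
print(frame_label(900.0, 742.5, 1085.2))   # outside both windows
```

If the two windows ever overlap, this sketch gives priority to `storefront`; the actual training code may resolve that case differently.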

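## Annotation Procedure

1. Play the whole MP4, pause on the storefront frame that is **sharpest and best composed**, and record the current timestamp in seconds
2. Keep playing and record the timestamp of the frame where the **goods handover is clearest**
3. Constraint: `handover_time_sec > storefront_time_sec` (the gap is usually several tens of seconds or more)
4. If a video has no handover scene (arrival at the store only), write 「无交付」 (no handover) in `notes`; that video is excluded from training for now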
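Rules 3 and 4 are easy to check automatically before training. The following is a minimal sketch; the helper name `check_record` and its return shape are assumptions for illustration, not part of the existing tooling.

```python
NO_HANDOVER_MARK = "无交付"  # the marker annotators write into `notes`


def check_record(rec):
    """Return (usable, reason) for one annotation record (a dict)."""
    # Rule 4: videos flagged as having no handover are skipped for now.
    if NO_HANDOVER_MARK in (rec.get("notes") or ""):
        return False, "no handover scene"
    storefront = rec.get("storefront_time_sec")
    handover = rec.get("handover_time_sec")
    if storefront is None or handover is None:
        return False, "missing timestamp"
    # Rule 3: the handover must come after the storefront moment.
    if handover <= storefront:
        return False, "handover_time_sec must be greater than storefront_time_sec"
    return True, "ok"


print(check_record({"storefront_time_sec": 742.5, "handover_time_sec": 1085.2}))
print(check_record({"storefront_time_sec": 742.5, "notes": "无交付"}))
```

The no-handover check runs first so that such videos are reported as skipped rather than as invalid.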

## Export Format (JSONL)

One JSON object per line:

```json
{"media_id": 123, "video_path": "http://host/collection_media/20250609/123.mp4", "storefront_time_sec": 742.5, "handover_time_sec": 1085.2, "store_type": "便利店", "has_voice_marker": true, "driver_date": "driver001_20250609", "split": "train"}
```
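A file in this format can be read with the standard `json` module. The sketch below assumes that every line must carry the fields marked as required in the field table; `parse_jsonl` and `REQUIRED_FIELDS` are hypothetical names, and a real loader may need to be more lenient for videos flagged as having no handover.

```python
import json

# Fields marked as required in the field table.
REQUIRED_FIELDS = (
    "media_id",
    "video_path",
    "storefront_time_sec",
    "handover_time_sec",
    "split",
)


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, checking required fields."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        missing = [key for key in REQUIRED_FIELDS if key not in rec]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(rec)
    return records


sample = (
    '{"media_id": 123, "video_path": "http://host/collection_media/20250609/123.mp4", '
    '"storefront_time_sec": 742.5, "handover_time_sec": 1085.2, "split": "train"}\n'
)
records = parse_jsonl(sample)
print(records[0]["media_id"], records[0]["handover_time_sec"])
```

To read from disk, pass the file contents, for example `parse_jsonl(open(path, encoding="utf-8").read())`, where `path` points at the exported file.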

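## Data Split

- train / val / test = **70% / 15% / 15%**
- **Split by group**, using `driver_date` or `recorder_sn` + date as the group key, so that videos from the same driver on the same day do not leak into the test set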
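One way to implement the grouped split is to shuffle the distinct `driver_date` values and assign whole groups to each subset. This is a sketch under stated assumptions: `group_split` is a hypothetical helper, the ratios are applied to the number of groups rather than the number of videos, and every record is assumed to carry `driver_date`.

```python
import random
from collections import defaultdict


def group_split(records, ratios=(0.70, 0.15, 0.15), seed=0):
    """Assign `split` per group so a driver_date never spans two subsets."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["driver_date"]].append(rec)

    keys = sorted(groups)               # sort first for reproducibility
    random.Random(seed).shuffle(keys)   # seeded, deterministic shuffle

    n_train = round(len(keys) * ratios[0])
    n_val = round(len(keys) * ratios[1])

    assignment = {}
    for i, key in enumerate(keys):
        if i < n_train:
            assignment[key] = "train"
        elif i < n_train + n_val:
            assignment[key] = "val"
        else:
            assignment[key] = "test"  # whatever is left over

    # Return copies so the input records are not mutated.
    return [{**rec, "split": assignment[rec["driver_date"]]} for rec in records]


# Synthetic demo data: 20 groups with 2 videos each.
demo = [
    {"media_id": g * 10 + v, "driver_date": f"driver{g:03d}_20250609"}
    for g in range(20)
    for v in range(2)
]
result = group_split(demo)
```

Because groups differ in size, the per-video proportions will only approximate 70/15/15; with few groups the deviation can be noticeable.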

## Recommended Dataset Size

| Stage | Number of videos |
|------|--------|
| POC | 80-100 |
| Internal testing | 300+ |
| Production launch | 1000+ |

## Label Studio

See [`label_studio_config.xml`](../tools/label_studio_config.xml). Import the CSV generated by `tools/export_media_list.py`, annotate the two timestamps, export the result as JSON, then convert it to JSONL with `tools/convert_labelstudio.py`.