I. Overview

1.1. SEI Frame Basics

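In the H.264/AVC and H.265/HEVC video coding standards, a bitstream consists of a sequence of NALUs (Network Abstraction Layer Units). Common NALUs include I-frames (IDR), P-frames, SPS, and PPS.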

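SEI (Supplemental Enhancement Information) is also a kind of NALU. What sets it apart is that it carries none of the pixel data the decoder needs to reconstruct a picture; it carries supplementary information instead.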

SEI in H.264: NALU type 6 (0x06).
SEI in H.265: NALU type 39 (Prefix SEI) or 40 (Suffix SEI).
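
To make the type values above concrete, here is a minimal sketch of reading the NALU type from the header bytes. The function names (H264NalType, H265NalType, IsSeiNal) are hypothetical helpers, not part of any library; the bit layouts are those of the H.264 and H.265 NAL unit headers.

```cpp
#include <cstdint>

// H.264: one header byte,
// forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5).
inline uint8_t H264NalType(uint8_t header_byte) {
    return header_byte & 0x1F;
}

// H.265: two header bytes; the type sits in bits 1..6 of the first byte,
// forbidden_zero_bit(1) | nal_unit_type(6) | nuh_layer_id(6) | temporal_id_plus1(3).
inline uint8_t H265NalType(uint8_t first_header_byte) {
    return (first_header_byte >> 1) & 0x3F;
}

// True when the header byte announces an SEI NALU for the given codec.
inline bool IsSeiNal(bool is_h265, uint8_t first_header_byte) {
    if (is_h265) {
        const uint8_t type = H265NalType(first_header_byte);
        return type == 39 || type == 40;  // prefix or suffix SEI
    }
    return H264NalType(first_header_byte) == 6;
}
```

With these helpers, 0x06 and 0x66 both map to H.264 type 6, while the H.265 first bytes 0x4E and 0x50 map to types 39 and 40.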

Core mechanism: user data unregistered (Payload Type 5)

H.264/H.265 define many SEI payload types (for example timing information and color-space information). To carry our AI analysis results, we use Payload Type = 5 (User data unregistered).

The complete structure of an SEI NALU that carries AI results is:

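NAL Header: marks the NALU as SEI. In H.264 this is one byte, 0x06 (or 0x66 when the encoder sets nal_ref_idc to 3). In H.265 the header is two bytes, for example 0x4E 0x01 for a prefix SEI.
Payload Type: write 0x05.
Payload Size: computed dynamically. The size is coded as a run of bytes that are summed: each 0xFF adds 255 and the first byte below 0xFF ends the run, so a size of 260 bytes is written as 0xFF 0x05 (255 + 5). The size covers the 16-byte UUID plus the user data. Keeping it under 1 kB is recommended.
UUID (16 Bytes): generate your own 16-byte UUID (for example [0x11, 0x22, ..., 0x66]). When parsing, the receiver uses this unique UUID to tell your AI data apart from SEI produced by other vendors or by the encoder itself.
User Data (Payload): the actual AI analysis result. It can be a UTF-8 encoded JSON string (such as {"bbox": [10, 20, 100, 200], "class": "person"}) or, when compactness matters most, a Protobuf binary stream.
RBSP Trailing Bits: the trailing alignment byte (usually 0x80).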
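
The payload size coding described above can be sketched as a pair of helpers. This is an illustration only; AppendSeiSize and ReadSeiSize are hypothetical names, and the same scheme is also how the payload type field is coded.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `size` as a run of 0xFF bytes (255 each) plus one final byte < 0xFF.
void AppendSeiSize(std::vector<uint8_t>& out, size_t size) {
    while (size >= 0xFF) {
        out.push_back(0xFF);
        size -= 0xFF;
    }
    out.push_back(static_cast<uint8_t>(size));
}

// Reads a size written by AppendSeiSize starting at `pos` and advances `pos`.
// Returns false if the buffer ends before the terminating byte.
bool ReadSeiSize(const std::vector<uint8_t>& in, size_t& pos, size_t& size) {
    size = 0;
    while (pos < in.size() && in[pos] == 0xFF) {
        size += 0xFF;
        ++pos;
    }
    if (pos >= in.size()) return false;
    size += in[pos++];
    return true;
}
```

For a size of 260 this writes 0xFF 0x05; a size of exactly 255 is written as 0xFF 0x00, because the run must always end with a byte below 0xFF.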

1.2. End-to-End Data Flow

1.2.1. Sender (edge computing box / AI backend) processing flow

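The sender's core task is "capture the bitstream -> run inference -> wrap the result in SEI -> repackage". If your backend gateway or streaming service is written in Go or Node.js, the logic is roughly as follows: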

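Step A: Pull and decode. Pull the RTSP/GB28181 video stream from the front-end IPC, separate out the raw H.264/H.265 NALUs, and decode them into YUV/RGB image frames.
Step B: AI inference. Feed the current image frame (call it Frame N) to the AI model and obtain the analysis result (for example: 3 people smoking were detected, with their coordinates).
Step C: Serialization. Convert the result to JSON or a custom binary format.
Step D: SEI assembly and escaping (emulation prevention). Assemble the SEI byte array using the UUID and structure described above. Then escape the NALU body: wherever 0x00 0x00 would be followed by a byte in the range 0x00 to 0x03, insert 0x03 after the two zeros so that the payload can never imitate a start code.
Step E: Insert into the bitstream. Place the assembled SEI NALU immediately before the video NALU of the corresponding Frame N (usually after the AUD and SPS/PPS, and before the slice NALU).
Step F: Publish. Re-mux the raw stream, now carrying SEI, into a container or protocol such as FLV, TS, or RTMP, and push it to the streaming media server.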
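
The escaping in Step D can be sketched as follows. This is a minimal illustration, assuming the input is the SEI body after the NAL header (payload type, size, UUID, user data, trailing bits); EscapeRbsp is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Inserts an emulation prevention byte (0x03) whenever two zero bytes
// would otherwise be followed by 0x00, 0x01, 0x02 or 0x03.
std::vector<uint8_t> EscapeRbsp(const std::vector<uint8_t>& rbsp) {
    std::vector<uint8_t> out;
    out.reserve(rbsp.size() + rbsp.size() / 2 + 1);
    int zeros = 0;  // number of consecutive zero bytes already written
    for (uint8_t b : rbsp) {
        if (zeros >= 2 && b <= 0x03) {
            out.push_back(0x03);
            zeros = 0;
        }
        out.push_back(b);
        zeros = (b == 0x00) ? zeros + 1 : 0;
    }
    return out;
}
```

For example, the bytes 00 00 01 become 00 00 03 01, while 00 00 04 pass through unchanged. A JSON text payload contains no zero bytes, so the escaping matters mainly for the UUID and for binary payloads such as Protobuf.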

1.2.2. Receiver (web frontend / client) parsing flow

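The receiver (for example an admin console built with Vue.js, or a custom desktop player) must be able to extract and parse SEI. The standard HTML5 <video> tag does not expose SEI data natively, so a dedicated player or WebAssembly is required.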

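Step A: Demux. The client pulls the stream over WebSocket/HTTP-FLV and demuxes it with flv.js, hls.js, or a custom WebAssembly-based decoder.
Step B: Intercept NALUs and look for SEI. Before feeding the stream to the decoder (such as Media Source Extensions or Wasm FFmpeg), scan the data for the NALU start code 00 00 00 01 (or its 3-byte form 00 00 01). Containers such as FLV usually store NALUs with a length prefix rather than start codes; in that case walk the length fields instead.
Step C: UUID matching and extraction. On finding an SEI NALU (type 6 in H.264, type 39/40 in H.265), read the payload type and payload size, then check the first 16 bytes of the payload against the UUID. If it is the agreed UUID, extract the payload bytes that follow it.
Step D: Unescape. Restore every 0x00 0x00 0x03 to 0x00 0x00. Escaping applies to the whole NALU body, so in practice this has to happen before the size and UUID are read in Step C.
Step E: Rendering. Parse the JSON data (such as coordinates). Because the SEI is sent under the same timestamp (PTS) as the video frame that immediately follows it, you can draw tracking boxes and red alert circles on a transparent <canvas> overlaid on the <video>, in sync with the frame the results belong to.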
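
Steps B to D can be sketched as below. The document's receiver is a web client, but the sketch stays in C++ to match the code in Section II. It assumes an Annex B byte stream (start-code delimited) and an SEI NALU that may hold several messages; SplitAnnexB, Unescape, ReadFfCoded, and ExtractUserData are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Uuid = std::array<uint8_t, 16>;
struct NalSpan { size_t offset; size_t size; };  // NALU without its start code

// Step B: split on 00 00 01 / 00 00 00 01 start codes.
std::vector<NalSpan> SplitAnnexB(const uint8_t* buf, size_t n) {
    std::vector<NalSpan> nals;
    size_t nal_start = 0;
    bool in_nal = false;
    for (size_t i = 0; i + 3 <= n; ++i) {
        if (buf[i] != 0 || buf[i + 1] != 0 || buf[i + 2] != 1) continue;
        if (in_nal) {
            size_t end = i;
            // Zeros right before 00 00 01 belong to the start code.
            while (end > nal_start && buf[end - 1] == 0) --end;
            nals.push_back({nal_start, end - nal_start});
        }
        nal_start = i + 3;
        in_nal = true;
        i += 2;  // skip the rest of the start code
    }
    if (in_nal && nal_start < n) nals.push_back({nal_start, n - nal_start});
    return nals;
}

// Step D: drop emulation prevention bytes (00 00 03 becomes 00 00).
std::vector<uint8_t> Unescape(const uint8_t* p, size_t n) {
    std::vector<uint8_t> out;
    out.reserve(n);
    int zeros = 0;
    for (size_t i = 0; i < n; ++i) {
        if (zeros >= 2 && p[i] == 0x03) { zeros = 0; continue; }
        out.push_back(p[i]);
        zeros = (p[i] == 0) ? zeros + 1 : 0;
    }
    return out;
}

// Reads a 0xFF-run coded value (used for payload type and payload size).
bool ReadFfCoded(const std::vector<uint8_t>& b, size_t& pos, size_t& value) {
    value = 0;
    while (pos < b.size() && b[pos] == 0xFF) { value += 0xFF; ++pos; }
    if (pos >= b.size()) return false;
    value += b[pos++];
    return true;
}

// Step C: returns true and fills `user_data` when the NALU is an SEI that
// holds a payload type 5 message whose first 16 bytes equal `uuid`.
bool ExtractUserData(const uint8_t* nal, size_t len, bool is_h265,
                     const Uuid& uuid, std::string& user_data) {
    const size_t header_len = is_h265 ? 2 : 1;
    if (len <= header_len) return false;
    const uint8_t type = is_h265 ? (nal[0] >> 1) & 0x3F : nal[0] & 0x1F;
    const bool is_sei = is_h265 ? (type == 39 || type == 40) : (type == 6);
    if (!is_sei) return false;
    const std::vector<uint8_t> rbsp = Unescape(nal + header_len, len - header_len);
    size_t pos = 0;
    while (pos + 1 < rbsp.size()) {  // the last byte is the trailing bits
        size_t payload_type = 0, size = 0;
        if (!ReadFfCoded(rbsp, pos, payload_type)) break;
        if (!ReadFfCoded(rbsp, pos, size)) break;
        if (size > rbsp.size() - pos) break;  // truncated message
        if (payload_type == 5 && size >= uuid.size() &&
            std::equal(uuid.begin(), uuid.end(), rbsp.begin() + pos)) {
            user_data.assign(rbsp.begin() + pos + uuid.size(),
                             rbsp.begin() + pos + size);
            return true;
        }
        pos += size;  // not ours: skip to the next message
    }
    return false;
}
```

Unescaping runs before the size and UUID are read, which is why Step D is applied first in practice.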

II. Pseudocode

2.1. Data structures:

#pragma once
#include <time.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

typedef enum {
    TASK_DETECT = 0,
    TASK_TRACK,
    TASK_REID,
    TASK_CLASSIFY,
    TASK_SEGMENT,
} TaskType;

class TaskClassifyDto {
public:
    std::string cls;
    int class_id;
    int track_id;
    int x;
    int y;
    int w;
    int h;
    int score;
    uint32_t color;
    std::string alg_name;
};

class TaskCompositDto {
public:
    using Ptr = std::shared_ptr<TaskCompositDto>;
    static Ptr CreateShared()
    {
        return std::make_shared<TaskCompositDto>();
    }
    Ptr Copy()
    {
        return std::make_shared<TaskCompositDto>(*this);
    }

    void CopyFrom(const TaskCompositDto::Ptr other)
    {
        if (other.get() == nullptr) {
            return;
        }
        video_width = other->video_width;
        video_height = other->video_height;
        ts = other->ts;
        results.clear();
        results = other->results;
    }

    TaskCompositDto()
    {
        ts = 0;
        video_width = 0;
        video_height = 0;
        results.clear();
    }
    ~TaskCompositDto() { results.clear(); }
    
    TaskCompositDto(const TaskCompositDto &other)
    {
        video_width = other.video_width;
        video_height = other.video_height;
        ts = other.ts;
        results = other.results;
    }

    TaskCompositDto &operator=(const TaskCompositDto &other)
    {
        if (this != &other) {
            video_width = other.video_width;
            video_height = other.video_height;
            ts = other.ts;
            results = other.results;
        }
        return *this;
    }

    void addResult(const TaskClassifyDto &result)
    {
        results.push_back(result);
    }

public:
    int32_t video_width;
    int32_t video_height;
    uint64_t ts;
    std::vector<TaskClassifyDto> results;
};

2.2. Assembling algorithm results into an SEI frame:

#include "SeiFrame.hpp"

#include "AlgDto.hpp"
#include "CubeaiLogger.hpp"

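/* This code is still incomplete: it writes no UUID, so if the user's camera emits its own SEI frames the receiver will mistake them for AI results */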
size_t SEIFrame::AssemSeiFrame(const void *src, const size_t &len,
                               const int32_t &codec, std::vector<uint8_t> &dst)
{
    dst.clear();
    const uint8_t *data = static_cast<const uint8_t *>(src);
    if (!src || len == 0) {
        LOGE("src is nullptr or len is 0");
        return 0;
    }
    if (codec != 0 && codec != 1) {
        LOGE(
            "codec {} is not supported, only 0 (H.264) and 1 (H.265) are "
            "supported",
            codec);
        return 0;
    }

    // Reserve room for the largest possible SEI header and trailer:
    // start code (4) + NAL header (up to 2) + payloadType (1) +
    // payloadSize (len / 255 + 1) + SEI data (len) + trailing bits (1)
    dst.reserve(4 + 2 + 1 + (len / 0xFF + 1) + len + 1);

    // Start code 0x00 00 00 01
    dst.push_back(0x00);
    dst.push_back(0x00);
    dst.push_back(0x00);
    dst.push_back(0x01);

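    // NAL unit header. H.264 uses one byte (0x06). H.265 uses two bytes;
    // 0x4E 0x01 is a prefix SEI (type 39), the kind allowed before slice NALUs.
    if (codec == 0) {
        dst.push_back(0x06);
    } else {
        dst.push_back(0x4E);
        dst.push_back(0x01);
    }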

    // payloadType 0x05: user_data_unregistered
    dst.push_back(0x05);

    // payloadSize: one 0xFF byte per full 255, then the remainder
    size_t payload_len = len;
    while (payload_len >= 0xFF) {
        dst.push_back(0xFF);
        payload_len -= 0xFF;
    }
    dst.push_back(static_cast<uint8_t>(payload_len));

    // Copy the SEI content
    dst.insert(dst.end(), data, data + len);

    // RBSP trailing bits
    dst.push_back(0x80);

    return dst.size();
}

size_t SEIFrame::AlgResult2Frame(const TaskCompositDto::Ptr &data,
                                 const int32_t &codec,
                                 std::vector<uint8_t> &dst)
{
    if (data.get() == nullptr) {
        LOGE("data is nullptr");
        return 0;
    }
    AlgReportDto report_dto;
    report_dto.ts = data->ts;
    for (auto &result : data->results) {
        AlgResultDto alg_dto;
        alg_dto.cls = result.cls;
        alg_dto.x = result.x;
        alg_dto.y = result.y;
        alg_dto.w = result.w;
        alg_dto.h = result.h;
        alg_dto.score = result.score;
        alg_dto.color = result.color;
        report_dto.addResult(alg_dto);
    }
    std::string alg_info = "";
    try {
        nlohmann::json j = report_dto;
        alg_info = j.dump();
    } catch (const nlohmann::json::type_error &e) {
        LOGE("JSON type error: {}", e.what());
        return 0;
    } catch (const std::exception &e) {
        LOGE("General exception: {}", e.what());
        return 0;
    }
    // Pass the JSON bytes straight through; no intermediate copy is needed.
    return AssemSeiFrame(alg_info.data(), alg_info.size(), codec, dst);
}
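
The comment at the top of the listing notes that no UUID is written yet. One way to close that gap, sketched here under assumptions, is to prefix the JSON with a fixed 16-byte UUID before handing it to AssemSeiFrame. The helper PrependUuid and the value of kAiSeiUuid are hypothetical; generate your own UUID once and share it with the receiver.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical application UUID, for illustration only.
static const std::array<uint8_t, 16> kAiSeiUuid = {
    0x3A, 0x91, 0x5C, 0x27, 0xE4, 0x4B, 0x48, 0xD2,
    0x9F, 0x06, 0x71, 0xB8, 0xC3, 0x5D, 0x2E, 0x10};

// Returns uuid followed by the JSON bytes, i.e. the user_data_unregistered
// payload that payloadSize has to cover.
std::vector<uint8_t> PrependUuid(const std::array<uint8_t, 16>& uuid,
                                 const std::string& json) {
    std::vector<uint8_t> payload;
    payload.reserve(uuid.size() + json.size());
    payload.insert(payload.end(), uuid.begin(), uuid.end());
    payload.insert(payload.end(), json.begin(), json.end());
    return payload;
}
```

In AlgResult2Frame the call would then pass the combined buffer, for example AssemSeiFrame(payload.data(), payload.size(), codec, dst) with payload = PrependUuid(kAiSeiUuid, alg_info). Because a UUID can contain zero bytes, the assembled NALU body should also go through emulation prevention before it is inserted into the stream.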
