Core technologies for streaming workflows: a 2026 architectural reassessment // Part 1

Five years have passed since our previous attempt to outline the core technologies shaping streaming workflows. That interval allows for a clearer assessment of what stabilized, what proved resilient under scale, and what quietly became infrastructural. The Mile High Video 2026 conference did not introduce radically new primitives, but it highlighted both the directions gaining momentum and the frictions that continue to constrain deployment at scale.

Several projections from 2021 materialized. CMAF consolidated as the common media container, and segment-based HTTP delivery remains the dominant abstraction. Adaptive bitrate streaming continues to structure packaging and playback. Yet the structural constraints persist: heterogeneous device fleets, encryption fragmentation, long hardware cycles, and CDN economics that ultimately govern what is operationally viable.

Most progress since 2021 came not from replacing the stack, but from consolidating it. Packaging semantics tightened, control planes became explicit, observability evolved into structured telemetry, and monetization and trust mechanisms integrated more directly into delivery workflows. The HTTP streaming model did not fracture; it hardened. At the same time, emerging requirements – finer-grained control, lower latency, bidirectional interaction, and tighter coupling between edge and client – are exposing structural tensions that incremental refinement alone may not fully resolve.

This reassessment is organized into three sections. The first examines how HTTP streaming consolidated across packaging, compression, transport, control, monetization, and trust into a coherent architectural model. The second looks at Media over QUIC as a session-native architecture that rethinks delivery semantics without discarding existing media artifacts. The final section asks whether streaming as a whole is approaching a deeper architectural transition – one that extends beyond transport upgrades toward a broader redefinition of workflow boundaries.

HTTP Streaming Architecture Consolidation

Five years ago, HTTP-based streaming appeared fragmented but converging. In 2026, it is largely consolidated. The core abstraction – client-driven retrieval of addressable media objects over HTTP – remains intact. What evolved is the degree of formalization and operational refinement required to sustain that abstraction under increasing latency, cost, and control-plane pressures. Let’s take a closer look at where the HTTP streaming stack has remained stable – and where it has evolved – starting with its packaging foundation: CMAF.

CMAF deployment maturity and remaining device constraints

CMAF is now the unquestioned container baseline for OTT delivery. HLS and DASH routinely share identical fragmented ISO-BMFF segments in production workflows. Encoding pipelines assume CMAF as primary output. In deployments where HLS and DASH reference the same CMAF segments, the CDN caches a single byte-identical media object rather than protocol-specific variants. Because the media fragments are shared, cache hit ratios improve, packaging is rationalized, and duplicate storage and midgress traffic are reduced.
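The caching effect can be sketched in a few lines. This is an illustrative toy model, not a real CDN: the only point is that when both protocols reference the same CMAF fragment URL, the cache stores one object instead of two. The URL paths are hypothetical.

```python
# Toy sketch: a CDN cache keyed by URL path. With protocol-specific
# containers, HLS and DASH requests map to distinct objects; with shared
# CMAF fragments, they collapse into a single cached entry.
def cache_entries(requests):
    """Count distinct cached objects, keyed by URL path."""
    return len({url for _, url in requests})

# Pre-CMAF: each protocol delivers its own container variant.
legacy = [
    ("hls",  "/live/ch1/seg_0001.ts"),
    ("dash", "/live/ch1/seg_0001.m4s"),
]

# CMAF convergence: both protocols request the same byte-identical fragment.
cmaf = [
    ("hls",  "/live/ch1/seg_0001.cmfv"),
    ("dash", "/live/ch1/seg_0001.cmfv"),
]

print(cache_entries(legacy), cache_entries(cmaf))  # 2 objects vs 1
```

At fleet scale, halving the distinct object count for dual-protocol services is what drives the hit-ratio and midgress improvements described above.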

The recognition of CMAF with a Technical Emmy Award in 2025 symbolically marked the end of the container debate. The industry no longer treats CMAF as a convergence experiment. It is the norm.

Recipients of the 2025 Technical Emmy Award for CMAF – a major milestone for the streaming industry

However, convergence remains bounded by device fleet reality. While CBCS is fully supported across FairPlay, Widevine and PlayReady DRM ecosystems, unconditional CBCS-only packaging is not universally deployable across heterogeneous device populations without tedious validation (see Amy Prosser’s Demuxed 2022 presentation). Long-tail Android devices and legacy connected TV firmware continue to impose compatibility checks that large distributors cannot ignore. This is not a specification gap; it is a hardware lifecycle constraint. 

The result is substantial – but not absolute – rationalization. CMAF eliminated parallel container ecosystems. It did not eliminate encryption compatibility validation across diverse silicon generations.

Subtitle format alignment and device rendering gaps

IMSC 1.1 has effectively become the canonical interchange format across most premium OTT production pipelines. Subtitle mezzanine assets are commonly authored in constrained IMSC text profiles, with conversion to WebVTT typically handled during packaging for HLS workflows. Profiling efforts such as EBU-TT-D and vendor-constrained IMSC profiles remain necessary, as the full IMSC surface is too broad to assume uniform client behavior.

Convergence occurred at the authoring and interchange layer. It did not fully materialize at runtime. Rendering engines across browsers, Smart TV firmware, and mobile OS media frameworks implement overlapping but not identical subsets of TTML styling semantics. Region positioning, coordinate interpretation, line wrapping, background rendering, font fallback, and timing behavior continue to vary across platforms. In low-latency live workflows, subtitle activation windows aligned to CMAF fragment boundaries may behave differently depending on player buffering and clock discipline.

In practice, interoperability relies on constrained styling and empirical device-class validation rather than specification completeness. Subtitle workflows are more aligned than in 2021, but rendering remains platform-dependent.

DASH 5th and 6th Editions: clarifying and structuring the representation model

The changes introduced in the 5th (2022) and 6th (scheduled for ISO publication in 2026) Editions of MPEG-DASH are evolutionary rather than disruptive. They do not alter the segment-based delivery model, but they make its behavior more explicit and less dependent on implementation assumptions.

The 5th Edition reinforced CMAF alignment through dedicated DASH profiles and clarified how clients recover alignment within segments through the concept of resynchronization. MPD patching was introduced to support incremental manifest updates instead of full regeneration. The client processing model for Event Streams and timed metadata tracks was tightened, and content protection signaling was extended. None of these changes redefined DASH, but they reduced ambiguity in how players interpret and react to the MPD (Media Presentation Description).

At the time, MPD patching appeared to offer a path toward efficient dynamic timeline manipulation, particularly for live and advertising use cases. In practice, large-scale monetization flows evolved toward explicit alternative presentation models rather than incremental patch mutation (see the advertising section below). That shift reflects a broader lesson: continuously mutating a single manifest becomes fragile under scale; defining controlled transitions between presentation contexts is more stable.

The 6th Edition continues in that direction. Duration Patterns allow repetitive segment cadence to be described algorithmically instead of enumerated explicitly. This reduces MPD verbosity and origin load, especially in long-running live workflows, and shifts repetition from server-side description to client-side computation.

MPEG-DASH 6th Edition Duration Patterns syntax
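The client-side computation implied by Duration Patterns can be sketched as follows. This is a hypothetical illustration of the idea, not the normative 6th Edition syntax: a repeating cadence is declared once, and the client derives each segment's start time and duration arithmetically instead of reading an enumerated timeline.

```python
# Hypothetical sketch of the Duration Patterns idea: a repeating duration
# pattern (in timescale units) replaces per-segment enumeration, and the
# client computes timing for any segment index on demand.
def segment_timing(pattern, timescale, index):
    """Return (start, duration) in seconds for segment `index` (0-based)."""
    cycle = sum(pattern)
    full_cycles, pos = divmod(index, len(pattern))
    start = full_cycles * cycle + sum(pattern[:pos])
    return start / timescale, pattern[pos] / timescale

# A cadence of 2s, 2s, 1.92s repeating (e.g. GOP-aligned live), timescale 1000.
pattern = [2000, 2000, 1920]
print(segment_timing(pattern, 1000, 4))  # fifth segment: derived, not enumerated
```

The shift is exactly the one described above: repetition moves from server-side description to client-side computation, so the MPD stays compact regardless of how long the live timeline runs.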

The segment abstraction remains unchanged, but its expression becomes more compact and predictable. Alternative Media Presentations introduce explicit insertion and replacement semantics through independently defined alternative MPDs. Timeline transitions are scheduled rather than inferred, and prefetch timing rules reduce last-minute manifest rewrites under concurrency. The result is not a new delivery model, but a more disciplined one. Representation cadence and transition logic are defined rather than implied.

HLS evolution: expanding control without changing the model

The evolution of HLS across recent rfc8216bis drafts mirrors DASH: the core HTTP playlist and segment retrieval model remains unchanged, but the control surface around it has expanded.

Low-latency constructs – partial segments, hold-back parameters, and refined reload rules – have become more clearly defined, reducing reliance on client heuristics near the live edge. The system is still playlist-driven, but timing behavior is more bounded and predictable. Multivariant playlists are also more expressive. Additional rendition attributes and supplemental codec signaling allow origins to describe variants with greater precision, improving adaptation decisions without altering the underlying ABR model.

More structurally, HLS now exposes clearer control semantics. Delivery directives support delta playlist updates, reducing redundant reloads under concurrency. Content Steering introduces a separate pathway selection layer without redefining the playlist abstraction. Interstitial signaling formalizes insertion points rather than relying on ad hoc playlist manipulation.
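The blocking reload and delta update directives can be made concrete with a small sketch. The `_HLS_msn`, `_HLS_part`, and `_HLS_skip` query parameters are the rfc8216bis delivery directives; the playlist URL and the wrapper function are illustrative.

```python
# Sketch of how an LL-HLS client turns periodic polling into a blocking
# reload: the server holds the response until the requested Media Sequence
# Number (and optionally Partial Segment) exists in the playlist.
from urllib.parse import urlencode

def blocking_reload_url(playlist_url, msn, part=None, can_skip=False):
    params = {"_HLS_msn": msn}
    if part is not None:
        params["_HLS_part"] = part   # block until this partial segment exists
    if can_skip:
        params["_HLS_skip"] = "YES"  # request a delta playlist update
    return f"{playlist_url}?{urlencode(params)}"

print(blocking_reload_url("https://example.com/ch1/media.m3u8", 273, part=2, can_skip=True))
```

This is where the intermediary sensitivity discussed later comes from: a CDN must hold these requests open rather than collapse or time them out like ordinary playlist fetches.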

HLS Interstitial syntax

The result is incremental tightening rather than architectural change. HLS remains a pull-based, segment-oriented system, but its manifests increasingly act as structured coordination surfaces rather than simple media indexes.

SSAI to SGAI: flexibility, cacheability and converging control semantics

Server-Side Ad Insertion (SSAI) remains dominant because it is operationally robust and device-friendly. By stitching advertisements server-side, services normalize playback behavior across heterogeneous clients and keep player logic simple. This remains attractive in fragmented device ecosystems. The tradeoff is structural. Fine-grained targeting and per-viewer personalization require individualized manifest generation, reducing CDN cache efficiency and concentrating computation in the manifest manipulation tier. As concurrency increases, per-session processing and cache-unfriendly manifests become the primary scaling bottleneck.

Server-Guided Ad Insertion (SGAI) is the industry’s response to these limits. The objective is to retain SSAI’s robustness while eliminating its worst scaling pathologies. The key shift is to standardize client behavior at insertion points. Instead of bespoke CSAI logic, the manifest carries a concrete trigger with a defined response contract. The player performs a standardized request to obtain an alternative presentation, while ad decisioning and ADS translation remain within a server-side SGAI endpoint. This separates proprietary business logic from interoperable switching mechanics and reduces bespoke client maintenance.
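The division of labor in that contract can be sketched as follows. Everything here is hypothetical (field names, the endpoint, the response shape): the point is that the player only implements standardized switching mechanics, while decisioning stays behind a server-side SGAI endpoint.

```python
# Minimal sketch of the SGAI contract: the manifest carries a trigger, the
# player performs a standardized request, and the endpoint returns an
# alternative presentation to splice in. All names are illustrative.
def resolve_insertion(trigger, fetch):
    """Resolve a manifest trigger into (alternative_url, return_offset_s)."""
    # The player never talks to the ADS directly; it only knows how to switch.
    response = fetch(trigger["resolution_url"], {"session": trigger["session_id"]})
    return response["alternative_manifest"], response["return_offset"]

def fake_sgai_endpoint(url, params):
    # Stand-in for the server-side SGAI service, where decisioning and
    # ADS translation actually live.
    return {"alternative_manifest": f"{url}/pod.mpd?s={params['session']}",
            "return_offset": 30.0}

trigger = {"resolution_url": "https://ads.example.com/sgai", "session_id": "abc123"}
print(resolve_insertion(trigger, fake_sgai_endpoint))
```

The separation shown here is the interoperability win: proprietary logic can change on the server without redeploying switching code across the device fleet.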

In MPEG-DASH 6th Edition, this approach is formalized through Alternative MPD events, particularly “replacement” semantics for live ad replacement, return-to-network transitions, and blackout switching. A defined processing model governs event updates and controlled return behavior, including immediate return through constrained duration parameters.

MPEG-DASH 6th Edition Alternative MPD Events insertion and replacement

Prefetch windows allow ad decisions to be resolved ahead of splice points, reducing last-second decisioning spikes under high concurrency. List MPDs enable ad pods to be represented as linked periods rather than fully rewriting the primary manifest, improving scalability and simplifying beacon insertion. Annex I introduces stateless targeting primitives – such as session identifier propagation and exposure of selected client state variables – so endpoints can perform effective targeting without maintaining heavy per-session state. Alternative MPD event support is already implemented in dash.js and Shaka Player, demonstrating practical deployability rather than theoretical alignment.

HLS evolved in parallel. Interstitial signaling and EXT-X-DATERANGE constructs introduce structured insertion points within playlists, enabling controlled transitions without full playlist re-authoring. Combined with delivery directives and steering constructs, HLS exposes explicit switching semantics instead of relying exclusively on server-side stitching. The DASH Alternative MPD model was deliberately shaped to remain conceptually close to this emerging HLS SGAI logic, which explains why its formalization appears in the 6th Edition rather than earlier revisions.

While the mechanisms differ syntactically, the architectural direction converges. In both DASH and HLS, insertion becomes declarative, timeline transitions are scheduled rather than inferred, and manifest mutation gives way to structured orchestration. Specification maturity is no longer the limiting factor; ecosystem normalization and cross-vendor operational rollout now define the pace of transition.

Low-latency streaming under HTTP constraints

Low-latency profiles matured between 2021 and 2026, seeking to overcome the latency limits of the HTTP ABR model. LL-HLS introduced blocking playlist reload semantics and PRELOAD-HINT mechanisms that transform periodic polling into longer-lived request–response exchanges. This reduces glass-to-glass latency under controlled conditions but increases sensitivity to intermediary behavior. CDN timeout configuration, request collapsing logic, and partial object caching policies become materially more critical. Large-scale LL-HLS deployments exist (Canal+ in 2022, for example), yet adoption remains more selective than early enthusiasm suggested, precisely because these constructs overload HTTP mechanisms in ways intermediaries were not designed to handle.

LL-DASH initially relied on HTTP/1.1 chunked transfer encoding to expose partial CMAF segments progressively from the origin. BBC’s published results show that sub-6s DASH target latency (≈9s end-to-end) can be delivered at scale with QoE statistically comparable to conventional live streaming; at Mile High Video, the authors further emphasized that this parity was achieved only on qualified device cohorts with catch-up capability, and that startup behavior and device-specific execution variance remain the dominant residual constraints. Those results demonstrate that the chunked-transfer model is viable under controlled conditions, but they also make visible the degree to which low latency in this formulation depends on transport-layer mechanics rather than representation semantics. Because chunked transfer is specific to HTTP/1.1, CDNs terminating origin connections must buffer and reframe those partial responses into HTTP/2 DATA frames or QUIC streams when serving downstream clients. This translation layer introduces intermediary state and flow-control behavior that was not originally part of the DASH abstraction, and it complicates cache and timeout handling at scale.

The Low-Latency Low-Delay DASH (L3D) model introduced in the DASH 6th Edition resolves this elegantly by relocating low-latency semantics from HTTP transfer behavior into explicit representation addressing. 

MPEG-DASH 6th Edition L3D Manifest Signaling (Fraunhofer FOKUS)

SubNumber addressing and Segment Sequence Representations make sub-segments explicitly addressable, allowing clients to derive deterministic URLs rather than depending on transport-layer chunking behavior. Low latency becomes a representation-layer construct rather than a side effect of HTTP/1.1 transfer semantics. L3D restores transport neutrality across HTTP/1.1, HTTP/2, and HTTP/3 while preserving HTTP cacheability and intermediary compatibility. Because the same CMAF fragments can be shared with LL-HLS workflows, media-layer duplication is reduced and encoding pipelines remain unified. Compared to earlier LL-DASH approaches, L3D removes chunk-translation penalties at the CDN layer while preserving the segment-oriented abstraction.
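The deterministic derivation can be sketched with a template substitution along the lines of DASH's `SegmentTemplate` mechanism. The template string below is illustrative, not normative L3D syntax; it only demonstrates the idea that sub-segment URLs are computed, not discovered through transport-layer chunking.

```python
# Sketch of deterministic sub-segment addressing: given a template-style
# pattern, the client derives every sub-segment URL arithmetically.
# $Number$/$SubNumber$ substitution follows the addressing idea described
# above; the path and naming are hypothetical.
def subsegment_urls(template, number, subs_per_segment):
    return [template.replace("$Number$", str(number)).replace("$SubNumber$", str(k))
            for k in range(subs_per_segment)]

urls = subsegment_urls("video/720p/seg_$Number$_$SubNumber$.cmfv", 42, 4)
print(urls[0], urls[-1])
```

Because every URL is derivable in advance, each sub-segment is an ordinary cacheable HTTP object on any HTTP version, which is precisely what restores transport neutrality.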

L3D brings more than a cleaner low-latency operational model: because sub-segments are independently addressable, clients can bootstrap from a dedicated Representation and begin decoding at the target presentation time without waiting for full segment completion, enabling low-delay startup within the conventional segment model. Adaptation decisions can occur at partial-segment boundaries rather than only at segment edges, reducing switching inertia under bandwidth fluctuations. Deterministic templating limits manifest request hammering, since sub-segment URLs are derived rather than discovered through repeated reload cycles. By eliminating reliance on HTTP/1.1 chunked transfer, L3D exposes explicit object boundaries and transfer sizes, which simplifies client-side throughput estimation and ABR heuristics. Trick-play operations benefit from the same determinism: IDR-aligned sub-segments can be selectively retrieved without reconstructing full segment sequences.

On the player side, L3D implementations are available in Shaka Player and dash.js v5.1, and on the encoding/packaging side, Ateme has been leading the R&D effort. It remains an open question whether L3D will see wider implementation than previous low-latency profiles. What is clearer is that segment-based refinement continues to extract incremental performance gains from the existing HTTP model, extending its practical runway while Media over QUIC (MoQ) develops as a longer-horizon architectural alternative.

AVC and HEVC: installed base inertia

AVC remains the compatibility anchor for global OTT distribution. Its persistence is driven by deterministic decode support across long-tail Smart TVs, operator-managed set-top boxes, embedded silicon in consumer electronics, and older mobile hardware. Even modest percentages of unsupported devices translate into meaningful viewing hours at scale. Industry analyses through 2025 continue to show AVC as the single largest deployed codec footprint, with H.264/AVC capturing around 44% of the video encoder market share for 2025, reflecting near-universal deployment in production encoding workflows. For services operating across heterogeneous global fleets, removing AVC would introduce playback risk that outweighs incremental bitrate savings.

The size of this installed decoder base has also justified continued investment by encoder vendors in AVC optimization rather than abandonment. Technology providers such as Harmonic have continuously worked on content-aware encoding techniques to extract additional bitrate savings from AVC without requiring new decoder capability. These approaches leverage per-title and scene-adaptive rate control, complexity analysis, and dynamic ladder shaping to reduce delivered bitrate while preserving perceptual quality. For operators, this has two important consequences. First, it narrows the efficiency gap between AVC and newer codecs in many real-world assets, particularly in HD workflows. Second, it extends the economic viability of AVC by reducing delivered bitrate on the codec that already reaches the full device base. Because no new decoder capability is required, these savings can be realized without introducing compatibility risk or maintaining additional codec ladders. At scale, incremental percentage reductions in average bitrate on AVC translate directly into lower CDN delivery cost while preserving universal reach.
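The economics are simple to make concrete. The numbers below are illustrative assumptions (average bitrate, viewing hours, reduction percentage), not vendor figures; they only show how a modest per-stream saving compounds into delivery volume at scale.

```python
# Back-of-the-envelope sketch: how an incremental AVC bitrate reduction
# translates into delivered volume. All inputs are illustrative assumptions.
def monthly_delivery_tb(avg_mbps, viewing_hours):
    bits = avg_mbps * 1e6 * viewing_hours * 3600
    return bits / 8 / 1e12  # terabytes delivered

baseline = monthly_delivery_tb(4.0, 50_000_000)   # 4 Mbps avg, 50M hours/month
tuned    = monthly_delivery_tb(3.4, 50_000_000)   # assumed 15% content-aware reduction
print(round(baseline - tuned), "TB/month saved")  # same reach, no new decoders
```

Because the saving applies to the codec that reaches every device, there is no offsetting compatibility cost, which is the core of the argument above.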

HEVC adoption expanded in UHD, HDR, and bandwidth-sensitive deployments where bitrate reductions materially affect delivery cost or enable higher perceptual quality at constrained throughput. Industry usage surveys from 2025 continue to show substantial HEVC deployment alongside AVC, with professional workflows regularly reporting HEVC usage above 50% in practice. Compression gains of 30–50% relative to AVC at comparable perceptual quality underpin its economic case in high-resolution delivery. While HDR signaling is technically possible with AVC, real-world decoder support for consistent 10-bit pipelines and tone-mapping remains heterogeneous across legacy devices. In contrast, HDR10 and HLG are broadly implemented and validated in HEVC decoder pipelines across modern Smart TVs, streaming devices, and operator set-top boxes. Dynamic HDR formats such as Dolby Vision and HDR10+ have also achieved critical penetration within the HEVC-enabled UHD device population, enabling predictable handling of dynamic metadata in premium tiers. As a result, reliable large-scale HDR delivery in OTT workflows is effectively anchored in the HEVC decoder base.

In practice, HEVC functions as both an efficiency and capability layer above AVC rather than a universal replacement. Many services operate dual ladders: AVC ensures baseline SDR reach, while HEVC enables UHD and HDR tiers and reduces bitrate for capable devices. The decision remains economic – balancing encoding compute, storage overhead, monitoring complexity, and royalty exposure against CDN savings and the ability to deliver consistent premium HDR experiences.

AV1 in production workflows

AV1 crossed an operational threshold once hardware decode support became widespread across mainstream silicon generations. Most recent flagship Android devices support AV1 hardware decoding, as do Intel 11th Gen and newer CPUs, AMD RDNA2+ GPUs, NVIDIA RTX 30-series and later GPUs, and the majority of 2022+ Smart TV platforms. Streaming dongles and current-generation consumer devices increasingly ship with AV1 decode capability as a baseline. In the Apple ecosystem, hardware support began with A17 Pro and M3-class silicon. While this limits immediate reach to newer cohorts, it establishes a clear forward trajectory within that platform.

Alliance For Open Media AV1 Ecosystem (AOM)

Measured across the global active device base, hardware penetration remains meaningful but not yet dominant. Independent usage analysis from ScientiaMobile indicates that approximately 9–10% of devices in active use had hardware AV1 decode capability as of mid-2024. Penetration is substantially higher within recent smartphone and TV cohorts, but materially lower across legacy fleets. At the same time, browser and operating system support for AV1 playback – including software decode paths – exceeds 80% in many environments. Platform-level metrics further illustrate the shift: Netflix has reported that AV1 accounts for roughly 30% of its streaming hours, reflecting preferential targeting of capable devices and the growing share of newer hardware in active viewing time. Together, these figures describe an ecosystem where hardware acceleration remains concentrated in newer devices, while software compatibility is already broad. Prior to hardware maturity, the dav1d software decoder played a critical bridging role, particularly in browser environments. It enabled viable playback at acceptable performance levels while silicon support propagated through refresh cycles, reducing early deployment risk and allowing AV1 assets to be distributed ahead of full hardware saturation.
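The preferential targeting described above amounts to a capability-gated codec choice. The sketch below is a hedged illustration of that logic; the capability fields are hypothetical, standing in for whatever device-detection or media-capabilities signal a service actually uses.

```python
# Hedged sketch of capability-gated codec selection: prefer AV1 where
# hardware decode exists, allow software AV1 only where it is acceptable
# (e.g. desktop browsers with dav1d), and fall back to HEVC/AVC ladders.
# The capability fields are hypothetical.
def choose_codec(device):
    if device.get("av1_hw"):
        return "av1"
    if device.get("av1_sw_ok"):   # software decode viable on this cohort
        return "av1"
    if device.get("hevc_hw"):
        return "hevc"
    return "avc"                  # universal compatibility anchor

print(choose_codec({"av1_hw": True}))   # recent flagship hardware
print(choose_codec({"hevc_hw": True}))  # UHD/HDR-capable legacy device
print(choose_codec({}))                 # long-tail fallback
```

This is how a 9–10% hardware base can still produce ~30% of streaming hours: capable cohorts watch more, and the selection logic routes them to AV1 whenever it is safe to do so.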

HDR support in AV1 has moved from experimental to production-capable, though ecosystem predictability varies by device cohort. AV1 natively supports 10- and 12-bit profiles and carries static HDR metadata through standard ISOBMFF signaling, including mastering display metadata describing the color primaries and luminance characteristics of the mastering display (SMPTE ST 2086) and content light level metadata indicating the maximum pixel brightness and maximum frame-average brightness within the program (MaxCLL and MaxFALL). HDR10 over AV1 is now broadly functional on recent Android TV platforms, modern Smart TVs with AV1 hardware decode, and PC hardware with AV1 acceleration, with large-scale validation in production environments. HLG is also supported at the bitstream and container level, though device validation remains less uniform than for HDR10. Dynamic HDR formats such as HDR10+ are standardized for AV1, but ecosystem penetration remains narrower than for base HDR10 and less mature than in HEVC deployments. However, early production deployments are beginning to appear. In 2025, Netflix introduced AV1-encoded HDR10+ streams and has been progressively expanding HDR10+ coverage within its AV1 catalog, providing clear evidence of production-scale deployment. Dolby Vision profiles for AV1 exist in specification and early implementations, but practical deployment remains smaller than Dolby Vision over HEVC, with many Dolby Vision streams still carried as HEVC in commercial services.

AV2: efficiency, scalability, and hardware-conscious design

AV2 represents the next major evolution in royalty-free distribution coding under the auspices of the Alliance for Open Media. After roughly five years of tool exploration and convergence, the project is targeting a 2026 release window. Reference results presented at Mile High Video indicate on the order of 30% objective bitrate reduction versus AV1 in Random Access configurations under common test conditions. These gains are not positioned as laboratory-only deltas; tool selection has been continuously gated against hardware decoder feasibility. The objective is explicit: AV2 is designed as a deployable distribution codec, not a research sandbox.

AV2 Decoder Architecture (Meta)

Architecturally, AV2 extends structural flexibility beyond AV1. Extended Recursive Partitions allow superblocks up to 256×256, increasing coding efficiency for high-resolution content and improving large-area prediction decisions. Transform partitioning has been revised to better align residual modeling with local signal characteristics, while refinements in inter prediction and motion modeling improve robustness in high-motion and high-detail sequences. Ranked reference management reduces signaling overhead by prioritizing and structuring reference picture usage more efficiently, tightening entropy coding efficiency in complex scenes.

Scalability is treated as a first-class design principle rather than an afterthought. AV2 supports structured scalable operating points within a single coded bitstream, with explicit dependency graphs describing spatial, temporal, and quality-layer relationships. This enables coherent layer-based adaptation instead of maintaining entirely independent ladder rungs. In practical distribution terms, this shifts ABR logic from selecting discrete encodes to managing dependency-aware layer sets, potentially reducing storage duplication and improving switching consistency.
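The dependency-graph idea can be sketched as a closure computation: an operating point names a target layer, and the client resolves every layer it transitively depends on within the single coded bitstream. The layer names and edges below are hypothetical illustrations, not AV2 syntax.

```python
# Illustrative sketch of dependency-aware layer selection: resolve the set
# of layers required to decode a target operating point by walking the
# explicit dependency graph. Layer IDs and edges are hypothetical.
def required_layers(target, deps):
    """Return the set of layers needed to decode `target`."""
    needed, stack = set(), [target]
    while stack:
        layer = stack.pop()
        if layer not in needed:
            needed.add(layer)
            stack.extend(deps.get(layer, []))
    return needed

# 540p base -> 1080p spatial enhancement -> 1080p60 temporal enhancement
deps = {"L1080p60": ["L1080p30"], "L1080p30": ["L540p30"], "L540p30": []}
print(sorted(required_layers("L1080p60", deps)))
```

This is the shift the paragraph describes: ABR logic stops selecting among independent encodes and starts managing which dependency closure to fetch, which is where the storage deduplication and switching consistency come from.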

Taken together – ~30% bitrate reduction versus AV1 in Random Access, Extended Recursive Partitions up to 256×256, revised transform partitioning, ranked reference management, explicit scalable dependency graphs, and continuous hardware feasibility gating – AV2 positions itself as a distribution-grade codec engineered for deployment at scale rather than experimentation.

LCEVC: enhancement overlay and the broadcast convergence path

MPEG-5 Part 2 LCEVC (Low Complexity Enhancement Video Coding) follows an efficiency-overlay model rather than defining a standalone codec. A conventional base codec – AVC, HEVC, AV1, or VVC – remains independently decodable, typically in hardware. A synchronized enhancement layer refines reconstruction quality where supported. The structure is explicit: hardware base decode with lightweight software refinement.

Backward compatibility is intrinsic. Devices without LCEVC support decode the base stream normally, avoiding ecosystem fragmentation. Devices with support apply the enhancement layer for higher perceptual quality. In ABR workflows, adaptation can involve switching the base layer, enabling or disabling the enhancement layer, or coordinating both. The representation model is extended, not replaced.
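The extended adaptation space can be sketched concretely. The bitrates and field names below are illustrative assumptions: the point is that the ABR decision now ranges over (base rendition, enhancement on/off) pairs, and devices without LCEVC support simply never see the enhancement-enabled options.

```python
# Sketch of LCEVC-extended ABR: candidate options are base renditions with
# and without the enhancement layer. Bitrates (kbps) are illustrative.
def candidate_options(base_ladder, enh_kbps, device_supports_lcevc):
    options = [(name, kbps, False) for name, kbps in base_ladder]
    if device_supports_lcevc:
        # Enhancement adds a modest bitrate cost on top of the base layer.
        options += [(name, kbps + enh_kbps, True) for name, kbps in base_ladder]
    return options

def pick(options, available_kbps):
    """Choose the highest-bitrate option that fits the available throughput."""
    fitting = [o for o in options if o[1] <= available_kbps]
    return max(fitting, key=lambda o: o[1]) if fitting else min(options, key=lambda o: o[1])

ladder = [("720p", 2500), ("1080p", 4500)]
print(pick(candidate_options(ladder, 800, True), 5600))
```

A real player would also weigh the CPU overhead of software enhancement, but the structure is the same: the representation model is extended, not replaced.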

The hybrid design introduces a defined CPU overhead tradeoff because enhancement processing runs in software. In exchange, operators report substantial bitrate reductions at equivalent perceptual quality. Trials and deployments have demonstrated reductions on the order of 40% while maintaining or improving quality – for example enhancing a 1080p base to near-4K perceptual output. LCEVC encoding has been supported on Nvidia GPUs with NVENC since 2024. At workflow level, vendors and operators have cited potential cost savings approaching 70% when accounting for encoding, storage, and CDN delivery efficiencies.

LCEVC Encoder in Nvidia NVENC (Nvidia)

Grupo Globo has provided high-visibility operational validation. During the 2022 FIFA World Cup, it executed what was described as the first broadcast of an LCEVC-enhanced channel. This was followed by large-scale live trials during the 2023 Brazilian Carnival, delivering enhanced streams to millions of viewers and confirming integration within existing encoding and decoding workflows. As part of Brazil’s TV 3.0 development, Globo also showcased LCEVC at the Paris 2024 Olympic Games, including LCEVC-enhanced VVC delivering 4K UHD at approximately 10 Mbps.

LCEVC has also been demonstrated in live 4K workflows under ATSC 3.0 and is formally integrated into Brazil’s TV 3.0 framework. The silicon ecosystem reinforces this broadcast alignment through partnerships with Amlogic, Realtek, and MediaTek, enabling television and set-top box integration. In the context of Brazil TV 3.0, LCEVC functions as a pragmatic bridge – an efficiency overlay that enables UHD evolution and bitrate reduction without forcing immediate hardware replacement cycles.

Brazil TV 3.0: governance-driven codec transition

The Globo trials and ATSC 3.0 demonstrations illustrate operator-led validation. Brazil TV 3.0 (also known as DTV+) represents the next step: a governance-driven national deployment model. Derived from ATSC 3.0 principles, TV 3.0 formalizes the VVC + LCEVC stack within a coordinated broadcast–broadband architecture. The specification was finalized in July 2024 and formally confirmed by presidential decree on 27 August 2025. Commercial rollout aligns with national broadcast milestones, including the 2026 FIFA World Cup.

Brazil TV 3.0 Architecture (Fórum SBTVD)

TV 3.0 selected VVC as the primary compression layer in combination with MPEG-5 LCEVC, developed by V-Nova, as an enhancement layer. The technical rationale is quantifiable. Testing by Fórum SBTVD demonstrated that native 2160p60 HDR encoded with VVC required up to 16.58 Mbps in worst-case scenarios. In a demonstration combining a 1080p HDR VVC base layer with an LCEVC enhancement layer, a 2160p60 HDR signal was delivered at 7.6 Mbps – low enough to fit within the bandwidth budget broadcasters often allocate to a single UHD service in an ATSC 3.0 multiplex. This represents a 54% saving versus standalone 2160p VVC in worst-case conditions and roughly 41% average savings across the tested content set. In effect, UHD HDR becomes feasible at bitrates comparable to 1080p HDR, enabling UHD deployment within the finite RF capacity of a 6 MHz ATSC 3.0 broadcast channel.
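The headline figure follows directly from the two measured bitrates. A minimal check, using only the numbers quoted above:

```python
# Reproducing the Fórum SBTVD arithmetic: 2160p60 HDR at 7.6 Mbps with a
# 1080p VVC base plus LCEVC enhancement, versus 16.58 Mbps worst-case
# standalone 2160p VVC.
standalone_vvc_mbps = 16.58
vvc_plus_lcevc_mbps = 7.6

saving = 1 - vvc_plus_lcevc_mbps / standalone_vvc_mbps
print(f"{saving:.0%} worst-case saving")
```

The ~54% worst-case saving is what brings a UHD HDR service inside a single ATSC 3.0 multiplex budget.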

The distinguishing factor is governance: TV 3.0 operates under a coordinated receiver certification model in which newly certified television sets must implement the mandated VVC + LCEVC decoder stack for over-the-air (OTA) reception. This enables synchronized codec introduction across the national device base while preserving spectral efficiency within ATSC 3.0 constraints. In the Brazil OTA broadcast context, LCEVC licensing is structured at the device level, with obligations carried by TV and chipset vendors rather than by broadcasters, aligning licensing cost with certified receiver deployment. Because DTV+ integrates broadband delivery alongside broadcast – with OTT distribution delivered via HLS – the same codec stack applies across both transmission modes, though licensing responsibilities may differ outside the OTA path. Under this model, next-generation compression moves from laboratory validation to commercial deployment on a nationally coordinated timeline. This governance-driven approach may be of interest to neighboring countries evaluating future terrestrial broadcast transitions.

VC-6: representation optimized for AI-native workflows

AI-centric video pipelines expose a structural inefficiency in conventional codecs. Most streaming architectures decode full-resolution frames before downscaling for inference, causing decode cycles, memory transfers, and I/O to dominate total pipeline cost. Work presented at Mile High Video showed that this architectural mismatch can make data movement and decode more expensive than inference itself.

SMPTE’s VC-6 codec (ST 2117-1) addresses this inefficiency at the representation level. VC-6 is a hierarchical intra-frame codec built on progressive refinement rather than motion-compensated inter prediction. Each frame is encoded into embedded Hierarchical Levels of Quality (LoQ), enabling spatially selective and resolution-selective access directly from the compressed domain. A decoder reconstructs only the LoQ required by a model or incrementally refines specific regions of interest. The hierarchy is intrinsic to the bitstream, not assembled from parallel encodes. This structure aligns directly with inference workflows. A lower LoQ can be decoded for detection; only regions corresponding to relevant objects are refined. Irrelevant areas never incur full reconstruction cost. Benchmarks presented at MHV reported pipeline time reductions of up to 72% versus AVC intra (with frame sampling) and up to 96% versus AVC inter, alongside more than 30× I/O savings in selective-access scenarios. Detector outputs remained consistent at the target inference resolution, indicating that gains stemmed from avoided decode and reduced data movement rather than model modification.
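
A toy model illustrates why selective access changes the cost profile. The pyramid below is illustrative only – not the ST 2117-1 bitstream layout – but it shows how reading the base LoQ plus a few refinement tiles avoids most of the I/O of a full decode:

```python
# Toy LoQ pyramid: illustrative sizes only, not the ST 2117-1 format.
# Each level doubles resolution; refinement data is tiled so regions of
# interest can be read independently from the compressed domain.
from dataclasses import dataclass

@dataclass
class LoQ:
    tiles: int           # independently addressable tiles at this level
    bytes_per_tile: int

pyramid = [
    LoQ(tiles=1,  bytes_per_tile=20_000),   # base LoQ: always decoded
    LoQ(tiles=4,  bytes_per_tile=15_000),
    LoQ(tiles=16, bytes_per_tile=12_000),   # full-resolution detail
]

def full_decode_bytes(p):
    return sum(level.tiles * level.bytes_per_tile for level in p)

def selective_bytes(p, roi_fraction):
    # Read the base in full, then only the ROI's share of refinement tiles.
    total = p[0].tiles * p[0].bytes_per_tile
    for level in p[1:]:
        tiles_needed = max(1, round(level.tiles * roi_fraction))
        total += tiles_needed * level.bytes_per_tile
    return total

full = full_decode_bytes(pyramid)       # 272,000 bytes for the whole frame
roi = selective_bytes(pyramid, 0.10)    # detect on base, refine ~10% of area
print(f"I/O reduction: {1 - roi / full:.0%}")   # → "I/O reduction: 78%"
```

Under these invented numbers, detection plus targeted refinement touches well under a quarter of the compressed data; the reported MHV benchmarks reflect the same mechanism at production scale.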

The implications extend into token-metered AI systems. Vision-language and embedding models convert images into patches whose quantity scales with spatial area and resolution. Selective decode reduces the number of high-resolution patches generated per frame, lowering token counts and downstream inference cost. VC-6 therefore affects not only decode and bandwidth efficiency, but also model expenditure.

VC-6 AI Blueprint (V-Nova)

VC-6 has been commercialized by V-Nova, with SDK support and integration across professional and streaming workflows. CUDA-accelerated implementations developed in collaboration with NVIDIA demonstrate that LoQ-based decode maps efficiently onto GPU architectures, reducing CPU utilization and memory bandwidth while increasing throughput. The VC-6 AI Blueprint further illustrates multi-inference pipelines in which LoQ selection feeds parallel models operating at different resolutions, eliminating redundant decode and resampling stages.

In AI-native streaming architectures, encoding priorities shift. Streams are structured around how models consume visual data, not solely around perceptual viewing. Hierarchical LoQ and deterministic partial access become operational levers for reducing compute, I/O, and token spend. The objective moves from reconstructing complete frames to delivering only the visual information required by the model.

Licensing tension: compression under legal uncertainty

Codec strategy in recent years has been shaped as much by licensing architecture as by compression efficiency. Since 2021, patent litigation and patent pool activity have increased in visibility, elevating legal exposure from a background consideration to a primary planning variable in distribution strategy.

HEVC continues to operate under a fragmented patent pool structure, with multiple licensing bodies and evolving commercial terms. For large-scale distributors, royalty exposure modeling – including device-based, subscriber-based, capped versus uncapped, and territory-specific scenarios – remains integral to deployment planning. The administrative and financial complexity of HEVC licensing has influenced rollout scope, particularly for services with global device reach and heterogeneous hardware penetration.

AV1 entered the market under a royalty-free positioning through the Alliance for Open Media Patent License 1.0, granting a perpetual, worldwide, no-charge license for conformant implementations under defined conditions. Subsequent patent pool formation and third-party assertions – including the establishment of an AV1 pool by Sisvel – have introduced a degree of long-term uncertainty. Even where material royalty obligations have not crystallized, the presence of claims requires structured legal review and risk modeling, particularly for organizations committing to multi-year encoding and infrastructure investments.

AV2 is being developed under the same Alliance for Open Media licensing framework, inheriting the Patent License structure that positioned AV1 as royalty-free for conformant implementations. No independent AV2 patent pool has formed to date. However, the AV1 experience demonstrates that third-party assertions and pool activity can emerge post-standardization. For organizations making multi-year infrastructure commitments, licensing evaluation will therefore remain dynamic until AV2 reaches material commercial scale and the surrounding IP landscape stabilizes.

LCEVC and VC-6 present a structurally different licensing profile. Rather than operating through multiple patent pools, LCEVC licensing is administered by a single IP holder under defined distribution and device programs. Published frameworks distinguish between service-provider licensing (e.g., active-user-based models) and consumer-device licensing (per-unit royalties, including broadcast-specific provisions). This centralization reduces fragmentation risk and improves predictability relative to multi-pool regimes, although commercial negotiation remains bilateral. VC-6 follows a similarly centralized model under V-Nova stewardship, without competing pool structures.

For large streaming platforms, codec selection is therefore not purely technical. It reflects a multi-dimensional evaluation of device penetration, bitrate savings, encoding compute cost, storage impact, and licensing predictability. Compression gains are weighed against long-term legal clarity and operational stability. In practice, licensing architecture can accelerate – or constrain – adoption as materially as bitrate efficiency itself.

CMCD v2: from request annotation to structured telemetry

The evolution of CMCD v2 represents a structural shift. CMCD v1 enriched media GET requests with QoE hints for CDN optimization and log-based analytics. It achieved broad emission, including on Apple platforms, but was more often recorded than operationalized. In practice, v1 signals were rarely integrated into real-time routing or load balancing. The limitation was architectural. CMCD v1 beaconing was implicit within media requests, requiring implementers to extract data from high-volume CDN logs – assuming it was logged at all – and reconstruct session context offline. There was no structured reporting surface decoupled from object delivery. Telemetry existed, but it remained embedded inside infrastructure logs rather than exposed as a first-class control-plane signal.

CMCD v2 redesigns the signaling model. The introduction of Event Mode alongside Request Mode is the key change. Request Mode continues to attach metadata to media requests, while Event Mode allows the player to POST telemetry independently of object retrieval. Earlier draft concepts such as separate response reporting were consolidated into event semantics, recognizing that responses are simply events within the playback lifecycle. This decoupling shifts CMCD from a header extension into an HTTP-native telemetry channel.

CMCD Reporting Modes Evolution (Akamai)

Event Mode supports batch reporting – multiple line-feed–delimited reports within a single POST – and multiple reporting destinations. This reduces overhead and makes session-level state emission feasible at scale. CMCD v2 introduces an explicit event taxonomy and playback state model (starting, playing, seeking, rebuffering, paused, ended, fatal error, quit), with mandatory fields such as event type, epoch timestamp, and monotonic sequence number providing ordering guarantees and loss detection. Metric coverage expands to include dropped frames, live latency, startup delay, rebuffer statistics, and response-layer telemetry such as HTTP status and time-to-first-byte variants. Structured inner lists support simultaneous audio and video reporting.

Adoption is progressing but uneven. HLS.js and Shaka Player provide early support; dash.js integration is ongoing. The Common Media Library offers emission abstractions, and open-source tooling such as cmcd-toolkit supports generation, parsing, and validation. These tools matter because v2 introduces backend requirements – batch POST handling, filtering, and sequence discipline – that extend beyond simple header parsing. The CMCD v2 specification has been published as a formal standard, though access to the normative document is now restricted to CTA members or available for purchase. The specification is stable; ecosystem normalization continues.
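
As a sketch, an Event Mode batch body can be assembled as line-feed-delimited key–value reports. The keys "e", "ts", and "sn" below are illustrative placeholders rather than the normative CTA-5004-B registry names; "sid" and "bl" are established CMCD keys:

```python
# Sketch of a CMCD v2 Event Mode batch POST body: one report per line.
# Keys "e", "ts", "sn" are illustrative placeholders, not the normative
# CTA-5004-B registry names; "sid" and "bl" are established CMCD keys.
import time

def cmcd_report(session_id, seq, event, extra=None):
    fields = {"sid": f'"{session_id}"',       # session identifier
              "e": f'"{event}"',              # playback state / event type
              "ts": int(time.time() * 1000),  # epoch timestamp (ms)
              "sn": seq}                      # monotonic sequence number
    fields.update(extra or {})
    return ",".join(f"{k}={v}" for k, v in sorted(fields.items()))

# Several lifecycle events batched into one line-feed-delimited body.
reports = [
    cmcd_report("a5b96f33", 1, "starting"),
    cmcd_report("a5b96f33", 2, "playing", {"bl": 11500}),  # buffer length, ms
    cmcd_report("a5b96f33", 3, "rebuffering"),
]
body = "\n".join(reports)
# The player would POST this body to each configured reporting destination:
# requests.post(collector_url, data=body)
print(body)
```

The sequence numbers let the collector detect report loss and restore ordering; batching amortizes connection and header overhead across many events.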

CMSD: server-plane diagnostics and Media Quality extensions

If CMCD captures client-plane telemetry, CMSD provides server-plane diagnostics. Ecosystem support first emerged on the encoder, packager, and origin side – across vendors such as AWS Elemental, Unified Streaming, Norsk, and Touchstream – allowing delivery intent and diagnostic metadata to surface upstream. CDN-side adoption progressed more slowly, resulting in partial end-to-end QoS visibility. 

Originally defined to expose delivery characteristics such as cache state and routing hints, CMSD is now being extended within the Streaming Video Technology Alliance (SVTA) Measurement/QoE Working Group to support Media Quality Assessment (MQA) signaling. The extension enables CMSD to carry per-segment or per-GOP perceptual metrics – such as VMAF or similar perceptual quality metrics – encoded using HTTP structured fields. This broadens CMSD beyond QoS telemetry; delivery and perceptual metrics coexist within a single signaling channel. Structured CMSD signaling has also been explored by Qualabs as a mechanism for enabling advanced playback coordination behaviors, including synchronization across distribution paths and DASH 6th Edition timing models.
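
A player or intermediary can read such a metric from a response header with minimal parsing. In this sketch, "etp" and "rtt" are existing CMSD keys, while "vmaf" is a placeholder for whatever key the SVTA MQA extension registers; the parser handles a single dictionary member, not full RFC 8941 structured-field syntax:

```python
# Reading a hypothetical per-segment quality score from a CMSD response
# header. "etp" and "rtt" are existing CMSD keys; "vmaf" is a placeholder
# for the SVTA MQA extension key. The parser below handles one dictionary
# member only, not full RFC 8941 structured-field syntax.
def parse_member_params(member: str) -> dict:
    params = {}
    for part in member.split(";")[1:]:        # skip the member name itself
        key, _, value = part.partition("=")
        try:
            params[key] = float(value)
        except ValueError:
            params[key] = value.strip('"')
    return params

cmsd_dynamic = '"edge-lhr-7";etp=4800;rtt=18;vmaf=93.2'
params = parse_member_params(cmsd_dynamic)

# Perceptual quality can now gate runtime decisions, e.g. origin failover:
if params.get("vmaf", 100.0) < 80.0:
    print("quality degraded: evaluate input failover")
else:
    print(f"segment quality OK (vmaf={params['vmaf']})")
```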

MQA data transmission across content preparation and delivery pipelines (SVTA)

The operational implication is that perceptual quality can inform runtime decision paths rather than remain retrospective analytics. Amazon CloudFront has documented media quality-aware resiliency workflows in which CMSD-embedded MQA signals influence origin routing decisions. AWS Elemental MediaPackage v2 supports input failover policies informed by MQA signals, enabling quality-based switching rather than transport-error-only triggers. Monitoring platforms such as Touchstream incorporate CMSD-MQA into operational dashboards, bringing perceptual telemetry into both control and observability domains.

Because CMSD propagates server-to-client, analytics require a return path. Players extract CMSD-MQA data and re-emit it via CMCD v2 Event Mode. CMSD remains the delivery-plane carrier; CMCD v2 becomes the analytics-plane transport that closes the loop. Observability shifts from passive logging to operational feedback.

CMCD v2, CMSD, C2PA and ad beaconing: a convergence opportunity

Ad beaconing remains fragmented, with no standardized client-to-server signaling model for confirming individualized ad delivery at segment level. CMCD v2 Event Mode provides a potential convergence surface. A CMSD-injected per-session ad token could be echoed by the player via structured CMCD v2 events upon playback confirmation. Monotonic sequencing and loss detection provide ordering guarantees, and optional binding to a C2PA segment hash could further strengthen integrity. This is not a production blueprint, but it illustrates how observability, delivery metadata, and trust frameworks could converge into a more standardized beaconing model linking monetization and accountability.

Content Steering: control-plane normalization

Content Steering originated within Apple’s HLS specification work, where the model first appeared in rfc8216bis-10 in late 2021. It was later externalized to the IETF in 2025 as a standalone Content Steering draft, separating the core control-plane logic from HLS-specific syntax and establishing an independent protocol definition usable across streaming formats. MPEG-DASH 6th Edition incorporates this IETF-defined mechanism, providing a standardized runtime control surface for DASH deployments. Ecosystem normalization is still underway.

Steering defines a runtime interface through which pathway selection can be influenced after playback has begun. Clients periodically query a Steering Server for updated pathway priorities and alternative locations, allowing the control plane to adjust delivery decisions while sessions remain active. This mechanism complements traditional startup-time routing methods such as DNS or manifest-level URL selection by introducing a mid-session coordination layer between client and infrastructure.

MPEG-DASH Content Steering Workflow

Steering operates independently of manifest reload cadence and maintains deterministic contact with active sessions at TTL-defined intervals. This creates a predictable rendezvous between client and control plane, enabling mid-session remapping of delivery paths. Unlike DNS-based routing—limited by resolver caching and TTL propagation—Steering allows operators to reorder pathway priorities, migrate sessions between CDNs, or redirect traffic without regenerating manifests or interrupting playback. The refresh cadence also provides a natural control point for renewing short-lived authorization artifacts such as Common Access Tokens, aligning delivery decisions with session-level trust management. In practice, Steering moves traffic management from a startup-time DNS decision to an ongoing protocol interaction embedded in the streaming workflow.
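
The rendezvous mechanics are simple enough to sketch. The JSON below mirrors the HLS-style steering manifest shape; the DASH 6th Edition binding follows the same IETF-defined model with its own serialization, and the HTTP fetch is stubbed for illustration:

```python
# Minimal steering client loop. The response shape mirrors the HLS-style
# steering manifest; DASH 6th Edition uses the same IETF-defined model
# with its own serialization. The HTTP fetch is stubbed for illustration.
import json

def fetch_steering_manifest(url: str) -> dict:
    # Stub standing in for an HTTPS GET against the Steering Server.
    return json.loads("""{
        "VERSION": 1,
        "TTL": 300,
        "RELOAD-URI": "https://steering.example.com/manifest?session=a5b96f33",
        "PATHWAY-PRIORITY": ["cdn-b", "cdn-a"]
    }""")

def steering_loop(url, apply_priority, iterations=1):
    for _ in range(iterations):
        manifest = fetch_steering_manifest(url)
        apply_priority(manifest["PATHWAY-PRIORITY"])  # reorder active pathways
        url = manifest.get("RELOAD-URI", url)         # next rendezvous point
        # time.sleep(manifest["TTL"])                 # deterministic cadence
    return manifest

manifest = steering_loop("https://steering.example.com/manifest", print)
# cdn-b now takes priority mid-session, without any manifest regeneration
```

The TTL-driven reload is the deterministic contact point: each cycle can reorder pathways, and – as discussed below for Common Access Tokens – carry refreshed authorization artifacts.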

Originally positioned for multi-CDN load balancing, Steering is increasingly used as a runtime control surface for streaming infrastructure. At Mile High Video 2026, Gwendal Simon (Synamedia) described operational scenarios extending beyond CDN selection, including content-aware pathway differentiation, localized CDN capacity mitigation, and controlled client/edge remapping during maintenance windows. These examples illustrate how Steering can coordinate delivery and processing infrastructure during active sessions.

Common Access Token (CAT): edge-validated session authorization

The Common Access Token (CAT) has emerged as a central component of the security layer. CAT is an HTTP-native, cryptographically verifiable authorization artifact designed for validation at the CDN edge without origin round-trips. Unlike legacy URL signing approaches, CAT can encode richer claims such as entitlement tier, scope constraints, and optional session identifiers. Validation typically consists of signature verification, expiry evaluation, and scope matching, all performed statelessly at the edge. Major CDNs – including Akamai, CloudFront, and Fastly – now support validation and creation of CBOR-encoded CAT tokens.

Modern deployments increasingly adopt session-scoped CAT profiles. Rather than issuing long-lived object-level tokens, services generate short-lived tokens bound to a playback session identifier. Tokens rotate during playback while maintaining session continuity, reducing replay exposure and enabling targeted session invalidation. Edge validation remains the primary enforcement surface, limiting origin dependency and preserving latency characteristics. Short-lived tokens require structured renewal. Traditional mechanisms rely on explicit refresh endpoints or manifest reloads, but a more coherent operational model leverages the deterministic revalidation cycle of Content Steering. Synamedia’s Gwendal Simon discussed this model in his Demuxed 2024 presentation, and a Multi-CDN CAT demo involving Synamedia, Akamai, Castlabs and dash.js illustrated its applicability in coordinated multi-CDN deployments.
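
The stateless validation path – signature, expiry, scope – can be sketched as follows. A real CAT is a CBOR-encoded CWT with COSE signing; the JSON-plus-HMAC encoding here is only a dependency-free stand-in, and all key material and claims are invented:

```python
# Stateless edge validation sketch: signature, expiry, scope. A real CAT
# is a CBOR-encoded CWT with COSE signing; JSON + HMAC is used here only
# to keep the sketch dependency-free, and all key material is invented.
import base64, hashlib, hmac, json, time

EDGE_KEY = b"shared-validation-key"   # provisioned to the edge out of band

def mint(claims: dict) -> str:
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(EDGE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def validate(token: str, path: str, now=None) -> bool:
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(EDGE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                                  # signature check
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if (now or time.time()) >= claims["exp"]:
        return False                                  # expiry check
    return path.startswith(claims["scope"])           # scope match

token = mint({"sub": "session-a5b96f33",
              "exp": time.time() + 30,                # short-lived: 30 s
              "scope": "/live/channel1/"})
print(validate(token, "/live/channel1/seg_0001.m4s"))   # → True
print(validate(token, "/live/channel2/seg_0001.m4s"))   # → False (scope)
```

No origin contact or shared session store is involved: the edge needs only the verification key, which is what makes per-session, short-lived tokens viable at CDN scale.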

Common Access Token renewal through Content Steering (Synamedia)

Because Steering requires periodic client contact at defined TTL intervals, this exchange can also convey refreshed CAT artifacts. Steering therefore functions not only as a delivery control surface but also as a structured renewal channel for authorization.

Transport hardening techniques are also emerging. Binding CAT claims to a TLS fingerprint, such as JA4, can mitigate replay risk across device classes, particularly for premium live events. While not yet standardized within streaming specifications, JA4 fingerprint binding adds substantial friction to token sharing.

C2PA: cryptographic provenance in segmented streaming

Beyond access control, trust also extends to content authenticity. The Coalition for Content Provenance and Authenticity (C2PA) defines a framework for asserting and verifying provenance. C2PA does not grant access; it cryptographically documents origin and modification history. The original specification primarily targeted file-based media workflows, where signed manifests describe content creation and editing lineage. In streaming environments, this model maps most naturally to video-on-demand (VOD) publishing pipelines, where finished assets can be signed before distribution.

Operational deployments are beginning to appear in production environments. Capture-stage signing using cameras capable of embedding digital signatures at acquisition is being trialed in newsroom workflows. Open-source stamping and verification tools are under development to reduce workflow overhead, and several public broadcasters—including ARD and France Télévisions—have begun integrating C2PA signing into their production pipelines. Adobe also contributed a dash.js-based reference implementation in 2023 and presented integration approaches for DASH workflows at ACM MMSys 2024. These initiatives illustrate early adoption of C2PA across production and VOD distribution workflows, but player support, tooling, and ecosystem integration remain limited and will require substantial additional development investment before provenance verification becomes broadly operational.

High-level overview of the C2PA verification pipeline (Adobe)

The live specification adapts the provenance model to segmented streaming environments. Two mechanisms are defined. One embeds signed provenance data directly within media segments, binding each segment to its authenticity assertions. The other adopts a more bandwidth-efficient design: initialization segments convey session keys, and each media segment carries a signed event message containing sequence number and segment hash information. This approach aligns naturally with CMAF-based live workflows and enables per-segment validation without repeating full provenance payloads.
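
Player-side checking under the second mechanism reduces to recomputing the segment hash and verifying sequence continuity. This sketch stubs out verification of the event message's own signature (which would use keys conveyed in the initialization segment):

```python
# Player-side check for the signed-event mechanism: recompute the segment
# hash and verify sequence continuity. Verifying the event's signature
# against keys from the initialization segment is stubbed out here.
import hashlib

def verify_segment(segment: bytes, event: dict, expected_seq: int) -> bool:
    if event["seq"] != expected_seq:      # detect dropped/reordered segments
        return False
    # assumes the event's own signature has already been verified
    return hashlib.sha256(segment).hexdigest() == event["segment_hash"]

segment = b"\x00\x00\x00\x1cftypcmfc..."  # stand-in for CMAF segment bytes
event = {"seq": 42,
         "segment_hash": hashlib.sha256(segment).hexdigest()}

print(verify_segment(segment, event, expected_seq=42))            # → True
print(verify_segment(segment + b"\x00", event, expected_seq=42))  # → False
```

Because each event carries only a sequence number and a hash, per-segment validation stays cheap while any tampering or substitution breaks the chain immediately.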

Live C2PA deployments are still emerging—at present, EZDRM is among the few vendors offering an operational implementation. The live model also changes the architectural assumptions that limited adoption of the original file-oriented specification. Instead of attaching large provenance manifests to finished assets using Merkle-tree structures, the segmented approach integrates authenticity signaling directly into the streaming workflow. If ecosystem tooling and player support mature, this model could make provenance verification more practical for both live and on-demand video, potentially accelerating C2PA adoption across the broader streaming ecosystem.

Live A/B Watermarking: from metadata dependency to deterministic assignment

Forensic watermarking has matured through formal standardization. The DASH-IF watermarking specification published at ETSI in 2023 defines the end-to-end A/B watermarking workflow across encoder, packager/origin, and CDN stages. In parallel, the UHD Forum Watermarking API for Encoder Integration is being updated to align encoder interfaces with these signaling models, improving interoperability across encoding, packaging, and delivery infrastructure.

Initial ecosystem integration is already visible. Edge execution environments such as Akamai EdgeWorkers enable watermark-related logic at the CDN layer, while origin vendors such as Unified Streaming integrate watermark signaling directly within packaging workflows. Standardized signaling reduces proprietary coupling across the encoder–origin–CDN pipeline and enables interoperable deployment surfaces.

A/B watermarking mechanism for ABR

However, live deployment historically introduced integration friction. Earlier A/B watermarking models relied on external per-segment signaling artifacts, creating tight coupling between encoders, packagers, and origins. This dependency on auxiliary metadata complicated real-time workflows and limited operational flexibility for large-scale live streaming.

Within the SVTA, current work focuses on simplifying live watermarking integration by reducing reliance on external metadata signaling. The emerging approach derives watermark assignment deterministically from timecode and segment index mapping. By allowing switching decisions to be computed directly from stream structure, encoder/packager/CDN coordination becomes simpler and auxiliary per-segment control artifacts can be eliminated. This simplification primarily targets live workflows; VOD pipelines continue to use external metadata files as part of their control plane.
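
A minimal sketch of deterministic assignment, assuming a hash of session identity and segment index (the actual SVTA mapping may differ):

```python
# Deterministic A/B assignment from session identity and segment index --
# a sketch of the idea, not the SVTA-specified mapping. No per-segment
# control artifacts are needed: any edge node recomputes the same decision.
import hashlib

def ab_variant(session_id: str, segment_index: int) -> str:
    digest = hashlib.sha256(f"{session_id}:{segment_index}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

# The switching pattern is derived purely from stream structure:
pattern = "".join(ab_variant("session-a5b96f33", i) for i in range(16))
print(pattern)   # stable for this session; extraction from captured video
                 # recovers the same sequence and identifies the session
```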

Watermarking complements authorization rather than replacing it. Common Access Tokens (CAT) govern access control, while watermarking provides attribution in the event of redistribution. Session identifiers embedded in CAT can align with watermark assignment strategies, strengthening forensic traceability in live multi-CDN environments.

DRM scalability and key rotation cadence

While newer trust mechanisms evolve around authorization, provenance, and attribution, the core DRM cryptosystem remains structurally stable. CPIX continues to serve as the orchestration mechanism for multi-DRM key exchange and packaging consistency, with recent revisions refining schema hygiene and bulk coordination rather than altering entitlement semantics or device enforcement models.

Operational pressure, however, is increasing around frequent key rotation in large-scale live deployments. Shorter key lifetimes reduce exposure windows but can stress DRM license delivery platforms if rotation events concentrate license requests. Work within the SVTA Security Working Group is examining mechanisms that enable higher rotation frequency while preserving backend scalability – specifically by distributing rotation load over time and avoiding burst amplification against license servers. A determining success factor will be broad support for multi-key license responses, particularly within Apple FairPlay environments. Delivering multiple content keys within a single license transaction materially reduces license request amplification under rapid rotation scenarios and enables shorter key lifetimes without proportional growth in backend load.
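
One way to distribute rotation load – sketched here with invented parameters, not the working group's actual design – is deterministic per-session jitter within the rotation window:

```python
# Spreading license renewals across the rotation window with deterministic
# per-session jitter -- a sketch of the load-spreading idea, with invented
# parameters, not the SVTA Security WG design.
import hashlib

ROTATION_PERIOD = 120.0   # seconds between content key rotations

def renewal_offset(session_id: str, spread: float = 0.8) -> float:
    # Map each session into the first 80% of the window, leaving headroom
    # before the new key is actually required for decryption.
    h = int.from_bytes(hashlib.sha256(session_id.encode()).digest()[:4], "big")
    return (h / 0xFFFFFFFF) * ROTATION_PERIOD * spread

offsets = [renewal_offset(f"session-{i}") for i in range(10_000)]
peak = max(sum(1 for o in offsets if int(o) == s)
           for s in range(int(ROTATION_PERIOD)))
# Without jitter, all 10,000 sessions would renew in the same instant;
# with it, the worst single second carries only a small fraction of them.
print(f"max renewals in any one second: {peak}")
```

Multi-key license responses attack the same problem from the other side: fewer license transactions per rotation event rather than smoother timing.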

In aggregate, the trust layer in 2026 streaming architectures extends beyond DRM alone. It combines stateless edge-validated tokens, structured renewal cycles aligned with steering, optional transport hardening, cryptographic provenance mechanisms, scalable key rotation strategies, and standardized forensic watermarking. Each element addresses a distinct threat model; together they form a layered integrity and access-control framework appropriate for modern live and on-demand streaming environments.

Open Caching and Multicast ABR: technical maturity, limited adoption

Alternative distribution models struggled to generalize beyond targeted deployments. Open Caching, highlighted in 2021 as a potential mechanism to rebalance CDN economics through operator-hosted cache federation, progressed at the specification level and saw real-world experimentation. Yet commercial alignment proved more complex than interface alignment. As noted by industry observers such as Dan Rayburn, Open Caching did not reach the deployment density or stable economic equilibrium required to materially reshape delivery architectures. It remains viable in specific bilateral arrangements, but it did not become a universal distribution primitive.

Multicast ABR delivery followed a comparable trajectory. The efficiency case for large live audiences is clear, but architectural fragmentation limited broader adoption. The DVB Adaptive Media Streaming over IP Multicast specification (ETSI TS 103 769) defines a framework supporting multiple multicast transport profiles, including FLUTE and ROUTE, and in later revisions NORM and MSync. This plurality enabled vendor flexibility but did not converge into a single multicast ABR transport commonly implemented across client and/or home gateway platforms, so deployments always require vendor-specific agents instead of native device support.

Rather than attempting to converge the ecosystem around a single multicast-aware client profile, BT’s Multicast-Assisted Unicast Delivery (MAUD), developed with Broadpeak, repositions multicast as an operator-side optimization layered beneath standard HTTP ABR workflows. While largely compatible with DVB multicast ABR specifications, MAUD shifts execution complexity into operator-network proxy components: a root proxy converts CDN-delivered unicast streams into multicast for distribution across the access network, while an edge proxy, typically located in the subscriber’s broadband gateway, converts the multicast stream back to unicast for the player. Client applications remain unmodified, and CDN authentication, session visibility, and logging semantics are preserved. In doing so, MAUD sidesteps device-level transport fragmentation rather than resolving it through specification consolidation.

MAUD Architecture (BT)

Its viability therefore depends on operator-domain control and coordinated deployment. Because MAUD sources streams from CDN-delivered unicast before redistributing them over multicast, additional proxy processing and buffering stages may introduce latency overhead relative to purely unicast delivery paths. MAUD standardization continues within IETF Media Operations (MOPS), yet hybrid multicast has not converged into a widely deployed, interoperable delivery model.

SVTA EDGE: re-architecting edge deployment for live and low-latency workloads

In parallel, edge deployment itself is evolving. The SVTA EDGE working group was initiated in December 2024 to address limitations exposed by live, low-latency, and high-concurrency workloads that increasingly strain traditional CDN cache architectures. Rather than introducing a new delivery protocol, SVTA EDGE focuses on specifying how specialized, media-aware workloads can be deployed and operated at the edge. It aims to standardize models for container- and VM-based deployment, capability advertisement, telemetry exposure, and a clearer separation between control and data planes. This makes it operationally viable to deploy purpose-built delivery components – such as session-aware relays, content steering logic, or low-latency processing elements – within CDN and operator environments without relying on proprietary integration patterns. SVTA EDGE is transport-agnostic and can accommodate HTTP-based delivery, MoQ, or other protocols.

Beyond deployment mechanics, SVTA EDGE has structural implications for the CDN model itself. By standardizing how media workloads are orchestrated and monitored at the edge, it lowers the barrier to operator-managed and private edge deployments. Broadcasters, telcos, and large platforms can run streaming components within infrastructure they directly control, enabling hybrid delivery models that combine public CDN capacity with private or federated edge nodes. SVTA EDGE does not displace commercial CDNs, but it reduces dependence on proprietary integration and increases flexibility in how edge functionality is owned and operated. Its long-term impact will depend less on specification maturity than on ecosystem adoption.

HTTP/3 and QUIC: modernization under economic constraints

HTTP/3 is broadly available across modern browsers and supported across major CDN infrastructures. Its advantages – reduced connection establishment latency and elimination of transport-layer head-of-line blocking – are well established. In streaming workloads, however, the impact is incremental rather than transformative. Large sequential segment transfers benefit less from multiplexing than web workloads composed of many small assets. Once connections are established, throughput differences between HTTP/2 and HTTP/3 for media delivery are often modest.

The constraint is economic rather than functional. In steady-state deployments, QUIC frequently consumes roughly 1.8×–2.2× the CPU cycles per delivered gigabyte compared to optimized HTTP/1.1 over TCP. The delta is primarily compute-driven: per-packet encryption and authentication account for a significant share of utilization, alongside user-space transport processing, acknowledgement handling, pacing, and memory copy overhead. Unlike TCP, which benefits from decades of kernel optimization and mature network interface offload, QUIC shifts more processing into user space and onto host CPUs.
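
The fleet-level impact of that multiplier is easy to make concrete. All inputs below are illustrative assumptions, not measured vendor data:

```python
# Fleet-level impact of the 1.8x-2.2x CPU multiplier. All inputs are
# illustrative assumptions, not measured vendor data.
egress_gbps = 400            # sustained fleet egress
cores_per_gbps_tcp = 0.05    # assumed cores/Gbps for tuned HTTP/1.1 over TCP

tcp_cores = egress_gbps * cores_per_gbps_tcp
for multiplier in (1.8, 2.2):
    quic_cores = tcp_cores * multiplier
    print(f"x{multiplier}: {tcp_cores:.0f} -> {quic_cores:.0f} cores "
          f"(+{quic_cores - tcp_cores:.0f} for the same egress)")
```

At this assumed baseline, the delta is tens of additional cores for a single fleet; multiplied across global CDN footprints, it explains why the gap is treated as an economic constraint rather than a tuning detail.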

Closing this gap does not require protocol redesign, but it does require disciplined fleet engineering. Near-term gains come from improved packet batching, fewer memory copies, better CPU–NIC alignment, reduced system call overhead, and tighter pacing control. These optimizations are largely software-driven. Longer-term improvements depend on broader availability of hardware cryptographic acceleration and deeper transport–NIC integration. In high-density environments where encryption dominates host cost, selective NIC refresh or SmartNIC adoption may become justified; for most fleets, meaningful efficiency gains remain achievable through software optimization first.

Until those efficiencies are normalized, many large-scale streaming systems continue to rely predominantly on optimized HTTP/1.1 or HTTP/2 for bulk delivery, enabling HTTP/3 selectively where startup or resilience benefits justify the additional CPU cost. CDN optimization investment therefore becomes the determining factor in the viability of QUIC-based delivery – whether through HTTP/3 or MoQ Transport. Discussions at Mile High Video following MoQ presentations underscored that QUIC-based delivery has not yet reached economic parity at scale and still requires sustained engineering investment in CPU efficiency and hardware acceleration before it can be considered operationally neutral.


Across packaging, compression, observability, trust, and transport, the pattern is consistent: the HTTP segment model has been refined, disciplined, and economically optimized. It has not been replaced. Yet the pressures that motivated these refinements – lower latency, finer-grained control, bidirectional interaction, and tighter edge–client coupling – continue to accumulate. Media over QUIC does not attempt to optimize the segment model further; it questions whether the segment abstraction itself remains the right primitive for the next decade of streaming. The second half of this reassessment examines that alternative in detail, exploring Media over QUIC as a session-native architecture and asking whether streaming is approaching a deeper structural transition.