What Is Low-Latency HLS?
Low-Latency HTTP Live Streaming (LL-HLS) is an extension of Apple’s HTTP Live Streaming (HLS), a video format and protocol created in 2009 for streaming video and audio from a media server to user devices. Because the original HLS protocol prioritized reliability over latency, it is less suitable for latency-sensitive live video on public networks with limited bandwidth. In contrast, the modern LL-HLS protocol delivers broadcast-quality, low-latency video streaming over ordinary public networks.
LL-HLS provides a parallel channel for distributing media. The media is divided into a larger number of smaller files called HLS Partial Segments, which are published at the live edge of the Media Playlist, optimizing it for low-latency streaming. The result is latency comparable to cable TV broadcasts, making LL-HLS suitable for professional broadcasting.
LL-HLS achieves a latency of 2-8 seconds, a remarkable reduction from the up-to-30-second latency of traditional streaming protocols. Several LL-HLS protocol features make that possible, including partial segments for distributing media files, playlist delta updates that keep transfers small, and preload hints that help clients anticipate which data to download next.
This article explains the basics of the LL-HLS protocol and compares it to two other popular streaming formats: the Common Media Application Format (CMAF) and Web Real-Time Communications (WebRTC).
Related content: a practical guide to HTTP Live Streaming
This article covers the following topics:
- Key Functionality of Low-Latency HLS
- The Process of Low-Latency HLS
- Low-Latency HLS Versus Low-Latency CMAF
- Low-Latency HLS Versus WebRTC
- Live Video Streaming With Cloudinary
What Is Low Latency?
“Low latency” refers to a short delay or minimal lag between an action being initiated and its corresponding effect being observed. In various technological contexts, particularly in computer networking, low latency describes the brief time interval required for data to travel from one point to another. Low-latency networks are crucial for applications that necessitate quick response times.
Key Functionality of Low-Latency HLS
Back-end production tools and content delivery systems can apply rules that enable low-latency streaming and playback. With LL-HLS, you can perform the following tasks:
- Generate partial segments. Low-latency HLS lets you distribute media files by dividing them into smaller files called partial segments. Since those files have a short duration, you can package and publish them faster than the parent segment.
- Generate delta playlists. Low-Latency HLS lets clients request a playlist delta update, which omits the portions of the playlist the client has already seen and is therefore much smaller than the full playlist, reducing the cost of frequent playlist transfers.
- Block playlist-reload requests. Low-Latency HLS lets you block playlist reload requests, enabling the server to more efficiently notify the client of new media segments and partial segments. The server can block a request until a playlist version containing the relevant segment becomes available.
- Block media downloads. Using the preload hints that LL-HLS provides, the server can block a media download request until the hinted resource becomes available, eliminating round trips and enabling delivery of low-latency streams at a large scale.
- Add rendition reports. Since the server can add a rendition report to each media playlist, describing the live edge of the other renditions, clients can switch renditions quickly to adapt the bitrate. (See the sample playlist after this list.)
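Concretely, here is a minimal sketch of the live edge of an LL-HLS media playlist showing those tags. The filenames, sequence numbers, and durations are invented for illustration, and TypeScript is used only to hold and inspect the playlist text:

```typescript
// Illustrative LL-HLS media playlist excerpt; all URIs and numbers are made up.
const llhlsPlaylist = `#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:4
#EXT-X-PART-INF:PART-TARGET=0.334
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.002,CAN-SKIP-UNTIL=12.0
#EXT-X-MEDIA-SEQUENCE:266
#EXTINF:4.00000,
fileSequence266.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart267.0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart267.1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart267.2.mp4"
#EXT-X-RENDITION-REPORT:URI="../720p/playlist.m3u8",LAST-MSN=267,LAST-PART=1
`;

// Pull out the partial-segment URIs advertised near the live edge.
const partUris = [...llhlsPlaylist.matchAll(/#EXT-X-PART:.*URI="([^"]+)"/g)].map(
  (m) => m[1],
);
console.log(partUris); // ["filePart267.0.mp4", "filePart267.1.mp4"]
```

Note how only the newest, still-in-progress segment is broken into parts, while completed segments are listed whole.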
The Process of Low-Latency HLS
LL-HLS aims to provide the scalability of HLS with a latency of 2-8 seconds (as opposed to the typical latencies of 24-30 seconds) by making the following simple changes to the protocol:
- Divide segments into subsegments called “parts,” which are listed separately in the playlist. Full segments persist, but parts are removed from the playlist once they fall behind the live edge.
- Generate playlists with preload hints, from which the player can predict which data it must download next, thus reducing overhead.
- HLS parts are tagged as EXT-X-PART in a playlist and are listed near the live edge only. Parts don’t have to be playable individually. Each part can have its own URI or a byte range within the parent segment, which lets the segment be stored and sent as a single file while the parts remain individually addressable and requestable.
- The EXT-X-PRELOAD-HINT tag points out to the player which data it will need next for playback. The player sends a request to the URI in the hint, and the server returns the resource as soon as it becomes available.
- Servers can assign a unique identifier to each playlist version, making playlists more cacheable and eliminating stale data.
Because each playlist version is uniquely named through simple query parameters, the player can request the playlist version that contains a specific segment or part. The server then immediately sees what the player requires and holds the response until an updated playlist is available, with no need for large buffers, which increase latency.
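For instance, a client can issue a blocking playlist request using the LL-HLS query parameters defined in the specification. Here is a brief sketch; the host and path are placeholders:

```typescript
// Minimal sketch of a blocking playlist reload using the LL-HLS query
// parameters (_HLS_msn, _HLS_part, _HLS_skip). The URL is hypothetical.
async function fetchNextPlaylist(nextMsn: number, nextPart: number): Promise<string> {
  const url = new URL("https://example.com/live/stream.m3u8");
  url.searchParams.set("_HLS_msn", String(nextMsn)); // block until this segment exists...
  url.searchParams.set("_HLS_part", String(nextPart)); // ...and this part of it
  url.searchParams.set("_HLS_skip", "YES"); // ask for a delta update that skips older segments

  // The server holds the request open until a playlist containing the
  // requested part is available, then responds immediately.
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Playlist request failed: ${response.status}`);
  return response.text();
}

// E.g., a rendition report said LAST-MSN=267, LAST-PART=1, so ask for the next part:
fetchNextPlaylist(267, 2).then((playlist) => console.log(playlist));
```

Because the server holds the response until the named part exists, the client learns about new media the moment it is published instead of polling on a timer.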
LL-HLS reduces latency by letting servers publish the parts of a segment while that segment is still being generated, so the player can keep a much smaller buffer than traditional HLS requires, making buffering more efficient. Note that the LL-HLS protocol extension is backward compatible with HLS: older players that aren’t LL-HLS aware simply ignore the new tags and play the stream as standard HLS.
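Outside of Apple’s native players, LL-HLS playback in the browser typically relies on an open-source player such as hls.js. Here is a minimal setup sketch; the stream URL is a placeholder:

```typescript
// Minimal LL-HLS playback sketch with hls.js; the stream URL is hypothetical.
import Hls from "hls.js";

const video = document.querySelector("video")!;
const src = "https://example.com/live/stream.m3u8";

if (Hls.isSupported()) {
  // lowLatencyMode enables part loading and blocking playlist reloads.
  const hls = new Hls({ lowLatencyMode: true });
  hls.loadSource(src);
  hls.attachMedia(video);
} else if (video.canPlayType("application/vnd.apple.mpegurl")) {
  video.src = src; // Safari plays (LL-)HLS natively.
}
video.play(); // may require a user gesture, depending on autoplay policy
```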
Low-Latency HLS Versus Low-Latency CMAF
What Is CMAF?
CMAF is an extensible standard for packaging and encoding segmented media objects for adaptive multimedia delivery. A hypothetical application model abstracts delivery and decoding on end-user devices, supporting various implementations, such as MPEG Dynamic Adaptive Streaming over HTTP (DASH) and HLS.
CMAF defines the following logical media objects:
- CMAF track, which contains encoded samples of media, such as video, audio, and subtitles, with a CMAF header and fragments. The samples are stored in a CMAF-specified container based on the ISO Base Media File Format (ISO BMFF). You can also protect media samples by means of MPEG Common Encryption (CENC).
- CMAF switching set, which contains alternative tracks with different resolutions and bitrates for adaptive streaming, which you can splice in at the boundaries of CMAF fragments.
- Aligned CMAF switching set, which contains switching sets from the same source through alternative encodings (e.g., with different codecs), which are time-aligned to one another.
- CMAF selection set, which contains switching sets in the same media format. That format might contain different content, such as alternative camera angles or languages; or different encodings, such as alternative codecs.
- CMAF presentation, which contains one or more presentation time-synchronized selection sets.
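To make that hierarchy easier to picture, here is a rough sketch of the objects as TypeScript types. The type and field names are illustrative only, not part of the CMAF specification:

```typescript
// Illustrative model of CMAF's logical media objects; names are not normative.
type MediaKind = "video" | "audio" | "subtitles";

interface CmafTrack {
  kind: MediaKind;
  header: Uint8Array; // CMAF header (ISO BMFF initialization data)
  fragments: Uint8Array[]; // encoded, optionally CENC-protected samples
  codec: string; // e.g., "avc1.640028"
  bitrate: number; // bits per second
}

// Alternative encodings of the same content that a player can switch
// between at CMAF fragment boundaries.
interface CmafSwitchingSet {
  tracks: CmafTrack[];
}

// Switching sets in the same media format, covering alternative content
// (camera angles, languages) or alternative encodings (codecs).
interface CmafSelectionSet {
  switchingSets: CmafSwitchingSet[];
}

// One or more presentation time-synchronized selection sets.
interface CmafPresentation {
  selectionSets: CmafSelectionSet[];
}
```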
Comparison
Before Apple’s 2019 Worldwide Developers Conference, the streaming community had already reduced latency for CMAF with chunked transfer encoding. When Apple announced its new low-latency technology for HLS, a major difference emerged: LL-HLS’s reliance on HTTP/2 Server Push.
However, efficiently delivering video at scale requires that all vendors across the workflow support the specification—an ongoing challenge for most content delivery networks (CDNs).
You can use either chunked CMAF or LL-HLS for applications with a large audience and an acceptable latency in the 3-to-8-second range. Be sure to take into account factors like digital rights management (DRM) and browser and device support.
Being more mature, CMAF enjoys implementations from CDNs, video players, and encoder providers, which means that fewer devices have LL-HLS-compliant players. Nonetheless, now that LL-HLS is part of the HLS specification, that scenario is likely to change. Supported in macOS, tvOS 14, iOS 14, and watchOS 7, LL-HLS promises to be a good way to future-proof your app.
Low-Latency HLS Versus WebRTC
What Is WebRTC?
Released in 2011, Web Real-Time Communications (WebRTC) is a Google-developed, open-source project that offers a set of conventions, standards, and JavaScript APIs for real-time, peer-to-peer communication with built-in encryption. Google uses WebRTC in applications like Hangouts and YouTube.
Since WebRTC requires no third-party software or plugins, it can pass through firewalls without losing quality or adding latency.
Below is a comparison of LL-HLS and WebRTC.
Latency
Designed for bidirectional, real-time communication, WebRTC is the fastest widely supported protocol, typically delivering latency under 500 milliseconds, which suits interactive use cases. Traditional HLS, though in wide use, has a latency of 10 to 40 seconds; LL-HLS can reduce that to 2-3 seconds.
Quality
Adaptive bitrate (ABR) streaming is crucial for high-quality media because it adapts to varying connections, devices, and software. HLS is the standard for ABR video, which gives LL-HLS the upper hand: the stream’s quality automatically adjusts among multiple resolutions and bitrates, ensuring that the media server delivers the highest-quality stream that each viewer’s device and connection can handle.
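As an illustration, here is a small sketch of observing those automatic rendition switches with hls.js (the same player used above); the stream URL is again a placeholder:

```typescript
// Observe ABR rendition switches with hls.js; the stream URL is hypothetical.
import Hls from "hls.js";

const hls = new Hls({ lowLatencyMode: true });
hls.loadSource("https://example.com/live/stream.m3u8");
hls.attachMedia(document.querySelector("video")!);

hls.currentLevel = -1; // -1 selects automatic level switching (the default)

// Log each switch the ABR controller makes as bandwidth and buffer change.
hls.on(Hls.Events.LEVEL_SWITCHED, (_event, data) => {
  const level = hls.levels[data.level];
  console.log(`Switched to ${level.height}p @ ${Math.round(level.bitrate / 1000)} kbps`);
});
```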
WebRTC emphasizes speed over quality, so even though it supports ABR, that support is limited to built-in, subscriber-side ABR. With multiple subscribers, if one has a weak network, the publisher switches to a low-quality stream, impacting the others.
Compatibility
HLS is the most popular protocol for streaming, but LL-HLS is less widespread, so its compatibility is much lower than that of standard HLS. However, Apple announced in 2020 that it was dropping the requirement for HTTP/2 Server Push, meaning that the low-latency extension can be delivered over standard HTTP infrastructure like the rest of the HLS specification.
Fully compatible with CDNs, HLS is widely supported by browsers and devices, but LL-HLS is not compatible with major non-Apple players like Android ExoPlayer.
WebRTC can function in the browser without additional software or plugins. Although most desktop browsers support WebRTC, implementations can be buggy, and mobile browsers offer less consistent support; that poor experience often pushes users toward native applications instead. Given its emphasis on LL-HLS, Apple’s streaming focus is unlikely to shift toward WebRTC.
Scalability
WebRTC is the most suitable protocol for real-time, peer-to-peer streaming, but not for scale. To stream to over 50 viewers, either of these two workflows would be a good choice:
- End-to-end WebRTC workflow, which ensures the lowest possible latency. To scale your audience, add a media server to reduce bandwidth demands and stream with low latency to up to 300 viewers.
- Transmuxing WebRTC to DASH or HLS, which suits audiences of over 300 viewers. Keep in mind that scaling up your audience adds latency.
Thanks to the growing support for LL-HLS, global media delivery with reduced latency is now a reality because more content is cached near viewers. Though its latency is higher than WebRTC’s, LL-HLS can stream to thousands of viewers in under three seconds.
Security
Be sure to protect your data and streams so as to prevent leaks and tampering by unauthorized users. You can encrypt HLS and, theoretically, LL-HLS can benefit from security capabilities, such as token authentication, key rotation, and digital rights management (DRM), assuming that they’ve been correctly configured by the providers in their systems.
WebRTC is encrypted by default, preventing unauthorized access to streams. It also offers features like user, file, and round-trip authentication. Consequently, WebRTC security is often sufficient for DRM purposes, eliminating the need to contract third-party DRM support.
Cost
Given that it is open source, WebRTC is free and cost effective for small audiences. Scaling it up requires additional servers and can be expensive.
HLS is based on affordable HTTP infrastructure and existing network technology built on the Transmission Control Protocol (TCP), which makes LL-HLS the more economical option for large-scale streaming.
Live Video Streaming With Cloudinary
With HLS, you create many video copies or video representations, each of which must meet the requirements of various device resolutions, data rates, and video quality. Additionally, you must do the following:
- Add index and streaming files for the representations.
- Create a master file that references the representations and that provides information on and metadata for the various video versions.
That’s a load of grunt work for even one video. Turn to Cloudinary, which automatically generates and delivers all those files from the original video, transcoded to HLS, MPEG-DASH, or both. That feature, called streaming profiles, enables you to configure adaptive streaming processes, and you can customize the default profile settings as you desire. Once those settings are in place, Cloudinary automatically handles all the drudgery for you.
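As a sketch of what delivery looks like, the snippet below builds an HLS URL with Cloudinary’s sp_ (streaming profile) transformation; the cloud name and public ID are placeholders:

```typescript
// Build a Cloudinary HLS delivery URL; "demo-cloud" and "my_video" are placeholders.
const cloudName = "demo-cloud";
const publicId = "my_video";

// sp_full_hd applies the built-in "full_hd" streaming profile; the .m3u8
// extension requests HLS (use .mpd for MPEG-DASH instead).
const hlsUrl = `https://res.cloudinary.com/${cloudName}/video/upload/sp_full_hd/${publicId}.m3u8`;

// Hand the URL to any HLS-capable player, e.g., the hls.js setup shown earlier.
console.log(hlsUrl);
```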
For details on streaming profiles, see the following:
- The article HTTP Live Streaming Without the Hard Work
- The related Cloudinary documentation: HLS and MPEG-DASH adaptive streaming
Other resources are our interactive demos, our FAQ page, and the Cloudinary community forum.
Above all, try out this superb feature yourself. Start with creating a free Cloudinary account.