How Picture-in-Picture Video Works Across Mobile & Desktop

Video is no longer consumed in a single, focused moment. Users increasingly expect to watch content while doing something else—browsing, messaging, working, or navigating between apps.

Picture-in-Picture (PiP) addresses this shift by allowing video playback to continue in a small, floating window that stays visible while users interact with other content. Instead of forcing a choice between watching and multitasking, PiP makes video a persistent part of the user experience.

Today, PiP is supported across modern browsers and mobile operating systems, making it a standard feature for streaming platforms, video conferencing tools, and content-driven applications.

This guide explains how Picture-in-Picture works across desktop and mobile environments and describes how developers can implement PiP using web APIs and the Cloudinary video player.

Key takeaways:

  • Picture-in-Picture video is a playback mode that lets videos float in a small window while users multitask, without changing the video itself. While useful for continuous viewing, it has limitations like inconsistent browser support and restricted controls within the overlay.
  • Picture-in-Picture on desktop and mobile lets videos play in a floating window controlled by the browser or operating system, using APIs like requestPictureInPicture(). While developers can enable and trigger PiP, the system manages the window’s behavior, limiting customization but ensuring consistent multitasking support.
  • Cloudinary’s video player includes built-in Picture-in-Picture support that can be enabled with a simple configuration, eliminating the need to manually use browser APIs. This feature improves user experience and performance by allowing floating playback while reducing player load size through lazy loading.

What Picture-in-Picture Video Is

Picture-in-Picture video is best understood as a playback mode, not a separate video format or delivery method. Importantly, PiP does not change the source video itself: the media stream, encoding, and playback logic remain the same.

What changes is the way the viewing surface is presented: the web browser (or operating system) lifts the active video out of the page or app layout and displays it in a smaller floating window. This overlay stays visible while the user switches tabs, opens another app, or continues another task on the same device.

As a result, Picture-in-Picture video is especially useful for content that benefits from continuous visibility rather than full-screen attention. Tutorials, live events, video calls, navigation, and long-form streams all benefit because viewers can continue watching without interrupting their other tasks.

On the web, PiP is tied to the video element itself: it is entered by calling requestPictureInPicture() on the element and exited by calling document.exitPictureInPicture(). On Android, PiP is treated as a special multi-window mode, which is why the experience feels more tightly integrated with app switching and system controls.

PiP also comes with practical limitations for implementers, including:

  • Browser support is not universal
  • Custom HTML controls cannot be added directly inside the PiP overlay
  • Applications often need to adapt the surrounding UI when the video enters or leaves PiP mode

These constraints shape how PiP should be designed across desktop and mobile, which is why platform-specific behavior matters as much as the basic concept.

How Picture-in-Picture Works on Desktop Browsers

Desktop browsers support Picture-in-Picture video via the Picture-in-Picture Web API, which allows a video element to be displayed in a floating window managed by the browser.

When a video enters PiP mode, the browser renders it in a small overlay window that remains visible even when the user switches tabs or interacts with other applications. The original video element typically stays in the page layout and shows a placeholder while the video floats. The window can usually be moved or resized, with basic playback controls such as play and pause remaining available.

Developers can control PiP using JavaScript APIs. Calling requestPictureInPicture() on a video element asks the browser to move it into PiP mode; the call returns a promise and generally must be triggered by a user gesture such as a click. Calling document.exitPictureInPicture() returns the video to its original position.

Browsers also expose properties such as document.pictureInPictureEnabled, which allows applications to detect whether PiP is supported in the current environment.

A minimal example of triggering PiP is as follows:

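const video = document.querySelector("video");
if (video && document.pictureInPictureEnabled) {
  // Returns a promise; call this from a user gesture such as a click.
  video.requestPictureInPicture().catch(console.error);
}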
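Building on the minimal example, a single control usually needs to both enter and exit PiP. The following is a sketch rather than a definitive implementation: the function name `togglePictureInPicture` and the idea of passing in the `document` object as `doc` are assumptions made here for testability, while `pictureInPictureElement`, `exitPictureInPicture()`, and `requestPictureInPicture()` are the standard API members.

```javascript
// Sketch: toggle PiP for one video element.
// `doc` is the page's `document`; passing it in keeps the function easy to test.
async function togglePictureInPicture(doc, video) {
  if (!doc.pictureInPictureEnabled) {
    return "unsupported";
  }
  if (doc.pictureInPictureElement === video) {
    // This video is already floating, so bring it back into the page.
    await doc.exitPictureInPicture();
    return "exited";
  }
  // Browsers generally require this call to come from a user gesture.
  await video.requestPictureInPicture();
  return "entered";
}

// Usage in a page (the "#pip-button" id is a hypothetical example):
// document.querySelector("#pip-button").addEventListener("click", () => {
//   togglePictureInPicture(document, document.querySelector("video"))
//     .catch((err) => console.warn("PiP request failed:", err));
// });
```

Returning a short status string is one possible design; it lets the caller update its own UI without inspecting the document again.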

In addition to programmatic control, many browsers provide built-in ways for users to activate PiP, such as player controls or browser-level UI elements. Video platforms often surface this functionality through a PiP button that triggers the underlying browser APIs.

Because the floating window is controlled by the browser rather than the web page, developers have limited control over its appearance and behavior. Instead, the browser manages window behavior, playback controls, and resizing.
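To illustrate how limited that control is: a page can observe the floating window's size but cannot set it. The sketch below assumes a browser that implements the standard API, where the promise from requestPictureInPicture() resolves with a PictureInPictureWindow object exposing `width`, `height`, and a `resize` event. The helper name `watchPipWindowSize` and its callback shape are hypothetical.

```javascript
// Sketch: observe (but not set) the size of the browser-managed PiP window.
async function watchPipWindowSize(video, onSize) {
  // The promise resolves with a PictureInPictureWindow object.
  const pipWindow = await video.requestPictureInPicture();
  const reportSize = () => onSize(pipWindow.width, pipWindow.height);

  reportSize(); // report the initial size once
  pipWindow.addEventListener("resize", reportSize);

  // Return a cleanup function so callers can stop listening.
  return () => pipWindow.removeEventListener("resize", reportSize);
}

// Usage in a page, from a click handler:
// const stop = await watchPipWindowSize(video, (w, h) => {
//   console.log(`PiP window is ${w}x${h}`);
// });
```

A page might use the reported size to pick a lower-resolution rendition while the video is small, though whether that is worthwhile depends on the delivery setup.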

How Picture-in-Picture Works on Mobile Devices

Mobile operating systems provide native support for Picture-in-Picture video, allowing playback to continue while users navigate between apps.

On Android, PiP was introduced in Android 8.0 as a specialized multi-window mode designed for video playback and navigation. When an app enters PiP mode, the current activity shrinks into a small window that remains pinned to the screen while the user switches to other applications. The system manages the window position and ensures that playback continues in the background.

Users can interact with the PiP window in several ways. They can move it to different corners of the screen, resize it, or tap it to reveal playback controls such as play, pause, and close. In newer Android versions, users can also stash the window to the side of the screen or temporarily expand it.

iOS and iPadOS provide similar functionality. When a supported video app enters PiP mode, playback continues in a floating window that remains visible even after the user returns to the home screen or opens another app. Users can reposition the window, resize it using pinch gestures, or restore the video to full-screen playback.

For developers, PiP on mobile devices typically involves enabling platform-specific APIs or configuration settings within the app. Once enabled, the operating system automatically manages the floating playback window.

On Android, developers must declare PiP support in the app manifest and trigger PiP mode from an activity:

<activity
    android:name=".VideoActivity"
    android:supportsPictureInPicture="true"
    android:configChanges="screenSize|smallestScreenSize|screenLayout|orientation" />

Then switch the activity into PiP mode when appropriate:

if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    enterPictureInPictureMode()
}

On Android 12 (API level 31) and later, the activity can enter PiP automatically when the user navigates away:

val params = PictureInPictureParams.Builder()
    .setAutoEnterEnabled(true)
    .build()

setPictureInPictureParams(params)

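On iOS, PiP is supported through AVKit. For video playback using AVPlayerViewController, PiP is controlled by a single property. The app typically also needs an audio session configured for playback and the "Audio, AirPlay, and Picture in Picture" background mode enabled: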

let playerController = AVPlayerViewController()
playerController.player = player
playerController.allowsPictureInPicturePlayback = true

When the user leaves the app or taps the PiP button, the system automatically transitions the video into a floating window.

Setting Up Picture-in-Picture for Web Video

Developers can enable PiP in web applications by using the browser’s Picture-in-Picture API alongside standard HTML video elements. Where the desktop section described how browsers render PiP, this section covers how to implement and control it in a web application.

Enabling PiP with HTML and JavaScript

A typical implementation begins with a standard <video> element embedded in a web page. Developers then add JavaScript logic to control when the video enters or exits Picture-in-Picture mode.

Before enabling PiP, check whether the user’s browser supports the feature:

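const video = document.querySelector("video");
// "#pip-button" is a hypothetical id for the page's own PiP button.
const pipButton = document.querySelector("#pip-button");

// Enable the button only when the browser supports PiP.
if (document.pictureInPictureEnabled) {
  pipButton.disabled = false;
}

// When the user activates PiP, request Picture-in-Picture mode.
pipButton.addEventListener("click", () => {
  video.requestPictureInPicture().catch(console.error);
});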

Developers can also listen for lifecycle events such as enterpictureinpicture or leavepictureinpicture to update the interface or trigger application logic:

video.addEventListener("enterpictureinpicture", () => {
  console.log("Entered PiP mode");
});

video.addEventListener("leavepictureinpicture", () => {
  console.log("Exited PiP mode");
});

This approach keeps PiP logic centralized within the application while relying on the browser to manage the floating playback window.
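As one illustration of updating the interface from these events, the sketch below keeps a button's label and pressed state in sync with PiP. The helper name `bindPipButtonState` and the label strings are assumptions for this example; only the two event names come from the browser API.

```javascript
// Sketch: keep a button's label in sync with the video's PiP state.
function bindPipButtonState(video, button) {
  const setActive = (active) => {
    button.textContent = active
      ? "Exit Picture-in-Picture"
      : "Picture-in-Picture";
    button.setAttribute("aria-pressed", String(active));
  };
  const onEnter = () => setActive(true);
  const onLeave = () => setActive(false);

  setActive(false); // start from the in-page state
  video.addEventListener("enterpictureinpicture", onEnter);
  video.addEventListener("leavepictureinpicture", onLeave);

  // Return a cleanup function for when the player is torn down.
  return () => {
    video.removeEventListener("enterpictureinpicture", onEnter);
    video.removeEventListener("leavepictureinpicture", onLeave);
  };
}
```

Driving the label from the events, rather than from the click handler, also covers cases where the user closes the floating window with the browser's own controls.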

Implementing Picture-in-Picture with the Cloudinary Video Player

For developers using the Cloudinary video player, enabling Picture-in-Picture does not require manually implementing the browser’s PiP API. The player includes built-in support for PiP through a configurable toggle.

Cloudinary introduced native Picture-in-Picture video support in an updated version of its video player, allowing viewers to move playback into a floating window directly from the player interface. This makes it easier for users to continue watching content while navigating other parts of a page or switching between apps.

To enable the feature, developers simply set the pictureInPictureToggle parameter to true when initializing the player.

const player = cloudinary.videoPlayer("my-video", {
  pictureInPictureToggle: true
});

Once enabled, the player automatically displays a Picture-in-Picture toggle in supported browsers. When users activate the toggle, the video enters the browser’s native PiP mode and continues playing in a floating window.

This player update also introduced performance improvements. Several advanced player features (including chapters, ads, recommendations, and playlists) are now lazy-loaded, reducing the core bundle size from approximately 283 KB to 233 KB. This smaller bundle helps improve page load performance while maintaining full video functionality.

Keep Video in View While You Build

Picture-in-Picture video allows video to remain visible while users multitask across apps, tabs, or screens. By separating playback from the main interface, PiP makes it easier to watch tutorials, participate in video calls, follow livestreams, or monitor media while working in other applications.

For developers, implementing PiP can be as simple as using the browser’s Picture-in-Picture API or enabling built-in support in modern video players. Tools like the Cloudinary video player make this even easier by integrating PiP functionality directly into the player interface.

As video continues to play a larger role in digital experiences, features like Picture-in-Picture help create more flexible and user-friendly media workflows across both desktop and mobile environments.

Ready to add Picture-in-Picture to your video experience?

If you’re building applications where users need to watch a video while they interact with the application, PiP is only one part of the equation. Reliable playback, cross-device compatibility, and seamless player controls must work together.

Contact Cloudinary now to see how our video player and media pipeline can help you deliver flexible, high-performance video experiences across web and mobile.

Frequently Asked Questions

Which browsers support Picture-in-Picture video?

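Most modern browsers support Picture-in-Picture video, including Chrome, Edge, Safari, and Opera. Firefox also offers PiP, although its implementation differs: it has historically been exposed through Firefox’s own built-in video control rather than the standard JavaScript API, so script-triggered PiP may not be available there. Developers can check support programmatically using document.pictureInPictureEnabled to ensure compatibility before enabling the feature.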
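A support check can be sketched as a small function. This version also looks for Safari's older prefixed method, webkitSupportsPresentationMode(), as a fallback. The function name `detectPipSupport` and its return values are hypothetical, and the `doc` parameter stands in for the page's `document`.

```javascript
// Sketch: classify which PiP mechanism, if any, is available for a video.
function detectPipSupport(doc, video) {
  // Standard Picture-in-Picture API.
  if (
    doc.pictureInPictureEnabled &&
    typeof video.requestPictureInPicture === "function"
  ) {
    return "standard";
  }
  // Safari's prefixed presentation-mode API.
  if (
    typeof video.webkitSupportsPresentationMode === "function" &&
    video.webkitSupportsPresentationMode("picture-in-picture")
  ) {
    return "webkit";
  }
  return "none";
}

// Usage in a page:
// const support = detectPipSupport(document, document.querySelector("video"));
// pipButton.hidden = support === "none";
```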

Can I customize the controls inside a Picture-in-Picture window?

Unfortunately, no. The PiP window is controlled by the browser or operating system, so developers cannot add custom HTML UI elements within it. However, standard media controls such as play and pause are typically available, and additional controls may appear if integrated with APIs like the Media Session API.
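As a rough sketch of the Media Session integration mentioned above, the helper below sets metadata and registers action handlers. Which actions actually surface in the PiP window varies by browser, so treat the result as best effort. The helper name `applyMediaSession` and its parameters are assumptions; in a page, `nav` would be `navigator` and `MetadataCtor` would be the global `MediaMetadata`.

```javascript
// Sketch: describe the current video to the platform via the Media Session API.
function applyMediaSession(nav, MetadataCtor, info, handlers) {
  if (!nav || !("mediaSession" in nav)) {
    return false; // Media Session API not available
  }
  nav.mediaSession.metadata = new MetadataCtor(info);

  for (const [action, handler] of Object.entries(handlers)) {
    try {
      nav.mediaSession.setActionHandler(action, handler);
    } catch {
      // Browsers throw for actions they do not recognize; skip those.
    }
  }
  return true;
}

// Usage in a page (the title and handlers are placeholder values):
// applyMediaSession(navigator, MediaMetadata, { title: "Episode 1" }, {
//   previoustrack: () => playPrevious(),
//   nexttrack: () => playNext(),
// });
```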

Does Picture-in-Picture affect video performance or streaming quality?

Picture-in-Picture does not change the video stream itself. The same video source, bitrate, and delivery logic are used whether the video is in PiP mode or embedded in the page. However, maintaining smooth playback still depends on proper video optimization, such as adaptive bitrate streaming and efficient delivery through a CDN.

QUICK TIPS
Tali Rosman

In my experience, here are tips that can help you better implement and optimize Picture-in-Picture video:

  1. Treat PiP as a continuity feature, not a control surface
    PiP works best when the user already understands the content and just wants to keep it visible. Don’t rely on the PiP window for discovery, setup, or complex interaction because the browser or OS controls that surface, not your application.
  2. Design the main player UI for seamless PiP transitions
    The real UX work happens before and after PiP, not inside it. Make sure users can easily resume in-page playback, restore context, and keep their place in the content without feeling like they switched to a separate experience.
  3. Prioritize videos that stay useful at small sizes
    Not every video belongs in PiP. Talking-head content, tutorials with clear framing, navigation, monitoring dashboards, and live updates tend to work well, while dense interfaces, fine text, or complex visuals often become unreadable in the floating window.
  4. Pair PiP with Media Session metadata
    Since you cannot fully customize PiP controls, make the most of what the platform can expose. Accurate title, artwork, and playback state improve system-level controls and create a more polished multitasking experience across browsers and mobile devices.
  5. Handle subtitle strategy carefully
    Captions that look fine in the main player can become oversized, clipped, or distracting in PiP. Test subtitle readability in the floating window and consider simplifying line length, placement, or styling for content that is likely to be used in multitasking scenarios.
  6. Use PiP state changes to adapt the surrounding interface
    When video enters PiP, the page or app can reclaim space for related tasks such as chat, notes, browsing, or forms. Treat enter and leave PiP events as layout signals so the rest of the product becomes more useful instead of merely hiding the player.
  7. Avoid making PiP the only multitasking path
    Some environments support PiP differently, and some users never use it at all. Your experience should still work through sticky mini-players, background audio, or fast restore behavior so the product remains resilient when native PiP is unavailable or inconsistent.
  8. Test content behavior during navigation, not just playback
    PiP often breaks down at app boundaries: route changes, modal openings, tab switches, or SPA re-renders. Make sure the video element lifecycle is stable enough that the PiP session does not unexpectedly terminate when the user moves through the application.
  9. Be selective about when you surface the PiP option
    A PiP button on every video can create clutter without adding value. It is usually more effective on longer-form content, live streams, educational material, calls, and task-oriented viewing where users are likely to continue interacting with the page or device.
  10. Measure PiP usage as a workflow signal
    PiP adoption can reveal that users want to watch while doing something else, which is valuable product insight. Track when PiP is entered, how long sessions continue, what actions users take alongside playback, and whether PiP users retain better than standard viewers.
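The measurement idea in tip 10 can be sketched with the two PiP lifecycle events. This is an illustration under assumptions, not a prescribed analytics design: the helper name `trackPipSessions`, the `report` callback, and the shape of the reported object are hypothetical, and the injectable `now` clock exists only to make the function testable.

```javascript
// Sketch: report how long each PiP session lasts.
function trackPipSessions(video, report, now = () => Date.now()) {
  let enteredAt = null;

  const onEnter = () => {
    enteredAt = now();
  };
  const onLeave = () => {
    if (enteredAt === null) return; // leave without a recorded enter
    report({ durationMs: now() - enteredAt });
    enteredAt = null;
  };

  video.addEventListener("enterpictureinpicture", onEnter);
  video.addEventListener("leavepictureinpicture", onLeave);

  // Return a cleanup function for when the player is removed.
  return () => {
    video.removeEventListener("enterpictureinpicture", onEnter);
    video.removeEventListener("leavepictureinpicture", onLeave);
  };
}

// Usage in a page (sendToAnalytics is a hypothetical reporting function):
// trackPipSessions(video, (session) => sendToAnalytics("pip_session", session));
```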
Last updated: Mar 27, 2026