
Node.js WebRTC Video Stream: Build Real-Time Peer-to-Peer Video

Ever tried to build a video chat app and got lost in the maze of NAT traversal, STUN servers, and ICE candidates? WebRTC promises real-time peer-to-peer communication, but the setup can feel overwhelming.

We’re going to walk through building a production-ready WebRTC video streaming solution with Node.js, covering everything from basic signaling to handling thousands of concurrent connections.

Key takeaways:

  • WebRTC lets browsers stream audio, video, and data directly without plugins, but still needs a server to manage connections. Node.js is perfect for this role, handling real-time signaling efficiently.
  • WebRTC relies on three key parts: a Node.js signaling server to connect peers, STUN servers to reveal public IPs, and TURN servers to relay data if direct connections fail. Most video streams go peer-to-peer, but in tougher network cases, servers like TURN or advanced options like SFUs help keep things working.

Understanding WebRTC and Why Node.js Is a Great Fit

Web Real-Time Communication (WebRTC) is a set of browser APIs that enable streaming audio, video, and data directly between browsers without plugins. The magic happens in the browser, but we still need a server to coordinate connections, and that’s where Node.js shines.

Node.js is ideal for WebRTC signaling due to its event-driven architecture. When hundreds of peers need to exchange connection information simultaneously, Node.js handles these asynchronous operations without breaking a sweat. The ecosystem also offers excellent WebSocket libraries, such as Socket.IO, making real-time communication straightforward.

WebRTC handles three types of data: audio, video, and arbitrary data channels. For video streaming, we’re primarily concerned with capturing media from the user’s camera and establishing a direct connection to another peer’s browser.

Node.js WebRTC Architecture: Signaling, STUN/TURN, and Media Routing

Let’s break down the architecture. WebRTC requires three key components working together: signaling servers, STUN servers, and, optionally, TURN servers.

  1. The signaling server is built with Node.js. It helps peers discover each other and exchange the information needed to establish a direct connection. Think of it as a matchmaker; it introduces peers but doesn’t handle the actual video data.
  2. STUN (Session Traversal Utilities for NAT) servers help peers discover their public IP addresses. Most devices sit behind routers using NAT, so they don’t know their public-facing address. A STUN server simply echoes back, “Here’s how the internet sees you.”
  3. TURN (Traversal Using Relays around NAT) servers act as a fallback when direct peer-to-peer connections fail. Instead of streaming directly, the video data routes through the TURN server. This happens in roughly 8-15% of connections where firewalls or network configurations block direct communication.

For media routing, we have two main options: true peer-to-peer, where browsers connect directly, or server-assisted, where an SFU (Selective Forwarding Unit) or MCU (Multipoint Control Unit) manages streams. We’ll explore both approaches later.

Setting Up a Node.js Signaling Server with WebSocket or Socket.IO

Our signaling server coordinates the WebRTC handshake between peers. We’ll use Socket.IO because it simplifies WebSocket communication and handles automatic reconnection.

First, let’s install the dependencies:

npm install express socket.io

Here’s our basic signaling server:

const express = require('express');
const http = require('http');
const socketIO = require('socket.io');

const app = express();
const server = http.createServer(app);
const io = socketIO(server, {
  cors: {
    origin: "*",
    methods: ["GET", "POST"]
  }
});

const rooms = new Map();

io.on('connection', (socket) => {
  console.log('Peer connected:', socket.id);

  socket.on('join-room', (roomId) => {
    socket.join(roomId);

    if (!rooms.has(roomId)) {
      rooms.set(roomId, new Set());
    }
    rooms.get(roomId).add(socket.id);

    // Notify others in the room
    socket.to(roomId).emit('peer-joined', socket.id);

    // Send existing peers to the new joiner
    const existingPeers = Array.from(rooms.get(roomId))
      .filter(id => id !== socket.id);
    socket.emit('existing-peers', existingPeers);
  });

  socket.on('offer', (data) => {
    socket.to(data.target).emit('offer', {
      offer: data.offer,
      sender: socket.id
    });
  });

  socket.on('answer', (data) => {
    socket.to(data.target).emit('answer', {
      answer: data.answer,
      sender: socket.id
    });
  });

  socket.on('ice-candidate', (data) => {
    socket.to(data.target).emit('ice-candidate', {
      candidate: data.candidate,
      sender: socket.id
    });
  });

  socket.on('disconnect', () => {
    rooms.forEach((peers, roomId) => {
      if (peers.has(socket.id)) {
        peers.delete(socket.id);
        socket.to(roomId).emit('peer-left', socket.id);
      }
    });
  });
});

server.listen(3000, () => {
  console.log('Signaling server running on port 3000');
});

This server manages rooms where peers can join and exchange WebRTC signaling messages. The key events are offer, answer, and ice-candidate, which carry the connection information peers need to establish direct communication.

Notice how we’re not handling video data here: the signaling server only coordinates the handshake, and the peers then stream directly to each other. We’ll see the other side of this handshake in the client code next.

Capturing and Sending Video from the Browser (getUserMedia + RTCPeerConnection)

On the client side, we need to capture video from the user’s camera and establish peer connections. The getUserMedia API handles camera access, while RTCPeerConnection manages the peer-to-peer connection.

Here’s the client-side code:

const socket = io('http://localhost:3000');
const localVideo = document.getElementById('local-video');
const remoteVideos = document.getElementById('remote-videos');
const peerConnections = new Map();

let localStream;

const config = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'stun:stun1.l.google.com:19302' }
  ]
};

async function startVideo() {
  try {
    localStream = await navigator.mediaDevices.getUserMedia({
      video: { width: 1280, height: 720 },
      audio: true
    });
    localVideo.srcObject = localStream;
  } catch (err) {
    console.error('Error accessing media devices:', err);
  }
}

function createPeerConnection(peerId) {
  const pc = new RTCPeerConnection(config);

  // Add local stream tracks
  localStream.getTracks().forEach(track => {
    pc.addTrack(track, localStream);
  });

  // Handle incoming tracks
  pc.ontrack = (event) => {
    let video = document.getElementById(`remote-${peerId}`);
    if (!video) {
      video = document.createElement('video');
      video.id = `remote-${peerId}`;
      video.autoplay = true;
      video.playsInline = true;
      remoteVideos.appendChild(video);
    }
    video.srcObject = event.streams[0];
  };

  // Send ICE candidates
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      socket.emit('ice-candidate', {
        target: peerId,
        candidate: event.candidate
      });
    }
  };

  peerConnections.set(peerId, pc);
  return pc;
}

async function makeOffer(peerId) {
  const pc = createPeerConnection(peerId);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  socket.emit('offer', {
    target: peerId,
    offer: offer
  });
}

socket.on('existing-peers', (peers) => {
  peers.forEach(peerId => makeOffer(peerId));
});

socket.on('peer-joined', (peerId) => {
  // The new joiner initiates the offer (see the 'existing-peers'
  // handler above), so existing peers just wait for it here. Calling
  // makeOffer() on both sides would cause colliding offers (glare).
  console.log('Peer joined:', peerId);
});

socket.on('offer', async (data) => {
  const pc = createPeerConnection(data.sender);
  await pc.setRemoteDescription(data.offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);

  socket.emit('answer', {
    target: data.sender,
    answer: answer
  });
});

socket.on('answer', async (data) => {
  const pc = peerConnections.get(data.sender);
  await pc.setRemoteDescription(data.answer);
});

socket.on('ice-candidate', async (data) => {
  const pc = peerConnections.get(data.sender);
  if (pc) {
    await pc.addIceCandidate(data.candidate);
  }
});

socket.on('peer-left', (peerId) => {
  const pc = peerConnections.get(peerId);
  if (pc) {
    pc.close();
    peerConnections.delete(peerId);
  }
  const video = document.getElementById(`remote-${peerId}`);
  if (video) video.remove();
});

// Start everything
startVideo().then(() => {
  socket.emit('join-room', 'room-1');
});

The flow is straightforward: we capture the local video stream, create an RTCPeerConnection for each peer, and exchange offers and answers to negotiate the connection. ICE candidates help peers find the best path to connect directly.
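One subtlety worth knowing: if both peers happen to create offers at the same time (a situation called "glare"), negotiation can deadlock. The WebRTC spec's "perfect negotiation" pattern resolves this by deterministically designating one peer "polite". The helper names below are our own; this is a sketch of the decision logic only, not a browser API:

```javascript
// Deterministically pick the "polite" peer by comparing IDs, so both
// sides agree on roles without any extra signaling round-trips.
function isPolite(myId, peerId) {
  return myId < peerId;
}

// An incoming offer collides if we are mid-offer ourselves or the
// connection is not in the 'stable' signaling state. The impolite peer
// ignores the colliding offer; the polite peer rolls back and accepts.
function shouldIgnoreOffer(polite, makingOffer, signalingState) {
  const collision = makingOffer || signalingState !== 'stable';
  return !polite && collision;
}
```

In the `socket.on('offer')` handler, you would check `shouldIgnoreOffer(...)` before calling `setRemoteDescription` and simply drop the offer when it returns true.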

Peer-to-Peer vs Server-Assisted Streams: SFU/MCU Options

Pure peer-to-peer works great for one-on-one video calls, but scaling to multiple participants gets tricky. In a five-person call, each participant needs to send four separate streams—that’s a lot of bandwidth.

This is where Selective Forwarding Units (SFUs) come in. An SFU receives streams from all participants and forwards them to everyone else. Instead of sending four streams, each participant sends one stream to the SFU. The SFU handles routing, dramatically reducing bandwidth requirements.

MCUs (Multipoint Control Units) take this further by mixing all streams into a single composite stream. Each participant receives one stream containing everyone’s video. MCUs use more server resources but require the least client bandwidth.
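The bandwidth arithmetic behind these trade-offs is easy to sketch. Counting streams per participant for n people in each topology (the function and names are illustrative):

```javascript
// Streams each participant sends (up) and receives (down) under each
// topology, for n participants. Pure arithmetic, no WebRTC APIs.
function streamLoad(n, topology) {
  switch (topology) {
    case 'mesh': // full peer-to-peer: one stream to every other peer
      return { up: n - 1, down: n - 1 };
    case 'sfu':  // one stream up; the SFU forwards everyone else's
      return { up: 1, down: n - 1 };
    case 'mcu':  // one stream up; one mixed composite stream down
      return { up: 1, down: 1 };
  }
}

console.log(streamLoad(5, 'mesh')); // { up: 4, down: 4 }
console.log(streamLoad(5, 'sfu'));  // { up: 1, down: 4 }
```

At five participants, mesh upload cost is already four full video streams per client; the SFU cuts that to one.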

For production WebRTC at scale, we recommend using an SFU. Popular options include:

  • mediasoup: A powerful open-source SFU built on Node.js. It gives us fine-grained control over routing and supports hundreds of participants per server. The learning curve is steeper, but the performance is excellent.
  • Janus: A C-based WebRTC server with Node.js bindings. It’s incredibly fast and battle-tested, though configuration can feel complex initially.
  • LiveKit: Offers a modern, cloud-native approach, with excellent documentation and built-in features such as recording and simulcast. It’s our go-to recommendation for teams wanting to get production-ready quickly.

Here’s a basic mediasoup setup:

const mediasoup = require('mediasoup');

let worker;
let router;

async function initMediasoup() {
  worker = await mediasoup.createWorker({
    logLevel: 'warn',
    rtcMinPort: 10000,
    rtcMaxPort: 10100
  });

  router = await worker.createRouter({
    mediaCodecs: [
      {
        kind: 'audio',
        mimeType: 'audio/opus',
        clockRate: 48000,
        channels: 2
      },
      {
        kind: 'video',
        mimeType: 'video/VP8',
        clockRate: 90000
      }
    ]
  });
}

ICE, NAT Traversal, and TURN Servers for Better Connectivity

Interactive Connectivity Establishment, or ICE, is the protocol WebRTC uses to find the best path between peers. It tries multiple connection methods simultaneously and picks the one that works.

The process starts with gathering candidates: possible ways to reach your peer. These include your local IP, your public IP (discovered via STUN), and relay addresses (allocated from TURN). Each peer sends its candidates to the other through the signaling server.

NAT (Network Address Translation) is why we need STUN and TURN. Most devices sit behind a router that hides their actual IP address. STUN helps discover the public address, while TURN relays traffic when direct connection fails.

Setting up a TURN server significantly improves connection success rates. We recommend coturn, an open-source TURN server that’s production-ready. Follow their installation steps, and then we can configure it in /etc/turnserver.conf:

listening-port=3478
fingerprint
lt-cred-mech
user=myuser:mypassword
realm=example.com

Update your client configuration to include the TURN server:

const config = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: 'turn:your-turn-server.com:3478',
      username: 'myuser',
      credential: 'mypassword'
    }
  ]
};

In testing, having a TURN server generally improves connection success rates. The trade-off is bandwidth cost: the connections that fall back to TURN (commonly cited at 10-20%) route all of their media through your server instead of peer-to-peer.

How to Secure WebRTC in Production: HTTPS, WSS, Certs, and CORS

Security isn’t optional for WebRTC in production. Browsers require HTTPS to access getUserMedia, and we need to secure our signaling server with proper certificates.

First, get an SSL certificate. Let’s Encrypt offers free certificates, and Certbot automates requesting and renewing them:

certbot certonly --standalone -d your-domain.com

Then, update your Node.js server to use HTTPS:

const fs = require('fs');
const https = require('https');

const options = {
  key: fs.readFileSync('/etc/letsencrypt/live/your-domain.com/privkey.pem'),
  cert: fs.readFileSync('/etc/letsencrypt/live/your-domain.com/fullchain.pem')
};

const server = https.createServer(options, app);

Configure Socket.IO to use secure WebSockets (WSS):

const io = socketIO(server, {
  cors: {
    origin: ["https://your-app.com"],
    methods: ["GET", "POST"],
    credentials: true
  }
});

CORS configuration matters because your client and signaling server might be on different domains. Be specific about allowed origins in production—never use wildcard origins.
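If your allowed origins differ between environments, note that Socket.IO passes its cors options through to the cors package, which also accepts a function form for origin. A strict allowlist sketch (the domain names are placeholders):

```javascript
// Exact-match origin allowlist: no wildcards, no regex surprises.
const ALLOWED_ORIGINS = new Set([
  'https://your-app.com',
  'https://staging.your-app.com' // hypothetical second environment
]);

function checkOrigin(origin, callback) {
  // Non-browser clients may send no Origin header at all; reject by
  // default in production, or allow explicitly if you need it.
  if (origin && ALLOWED_ORIGINS.has(origin)) {
    callback(null, true);
  } else {
    callback(new Error(`Origin not allowed: ${origin}`));
  }
}

// Usage sketch: io = socketIO(server, { cors: { origin: checkOrigin } });
```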

For TURN servers, always use TLS:

{
  urls: 'turns:your-turn-server.com:5349',
  username: 'myuser',
  credential: 'mypassword'
}

The turns: scheme ensures TLS-encrypted communication between clients and your TURN server, preventing traffic snooping.

Example Node.js WebRTC Project Structure with Sample Code

A clean project structure makes scaling easier. Here’s how we organize production WebRTC projects:

webrtc-video-app/
├── server/
│   ├── index.js
│   ├── signaling.js
│   ├── rooms.js
│   └── config.js
├── client/
│   ├── js/
│   │   ├── main.js
│   │   ├── webrtc.js
│   │   └── ui.js
│   └── index.html
├── package.json
└── .env

Here’s our main server file structure:

// server/config.js
module.exports = {
  port: process.env.PORT || 3000,
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: process.env.TURN_URL,
      username: process.env.TURN_USER,
      credential: process.env.TURN_PASS
    }
  ],
  ssl: {
    key: process.env.SSL_KEY_PATH,
    cert: process.env.SSL_CERT_PATH
  }
};

// server/rooms.js
class RoomManager {
  constructor() {
    this.rooms = new Map();
  }

  addPeer(roomId, peerId) {
    if (!this.rooms.has(roomId)) {
      this.rooms.set(roomId, new Set());
    }
    this.rooms.get(roomId).add(peerId);
    return Array.from(this.rooms.get(roomId))
      .filter(id => id !== peerId);
  }

  removePeer(roomId, peerId) {
    if (this.rooms.has(roomId)) {
      this.rooms.get(roomId).delete(peerId);
      if (this.rooms.get(roomId).size === 0) {
        this.rooms.delete(roomId);
      }
    }
  }

  getPeers(roomId) {
    return this.rooms.has(roomId) 
      ? Array.from(this.rooms.get(roomId)) 
      : [];
  }
}

module.exports = new RoomManager();

This separation of concerns makes debugging easier and allows us to swap implementations without touching the core logic.
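The config module reads its values from environment variables. A sample .env with placeholder values might look like this (load it with a package such as dotenv, and keep the file out of version control):

```ini
PORT=3000
TURN_URL=turn:your-turn-server.com:3478
TURN_USER=myuser
TURN_PASS=mypassword
SSL_KEY_PATH=/etc/letsencrypt/live/your-domain.com/privkey.pem
SSL_CERT_PATH=/etc/letsencrypt/live/your-domain.com/fullchain.pem
```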

Performance Tweaks and Troubleshooting: Codecs, Bitrate, Logs, and Stats

Performance tuning can make the difference between a choppy call and a smooth video. Start by choosing the right video codec. VP8 and VP9 codecs offer excellent quality with good browser support, while H.264 has better hardware acceleration on some devices.
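The actual codec choice happens in the browser via transceiver.setCodecPreferences(), fed by the list from RTCRtpSender.getCapabilities('video') reordered to put your preferred codec first. That reordering is plain array logic, sketched here as a pure helper (preferCodec is our own name, not a browser API):

```javascript
// Move all entries matching mimeType to the front, preserving the
// relative order of everything else.
function preferCodec(codecs, mimeType) {
  const preferred = codecs.filter(c => c.mimeType === mimeType);
  const rest = codecs.filter(c => c.mimeType !== mimeType);
  return [...preferred, ...rest];
}

// In the browser (sketch, not runnable server-side):
// const caps = RTCRtpSender.getCapabilities('video');
// transceiver.setCodecPreferences(preferCodec(caps.codecs, 'video/VP8'));
```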

You can also cap each sender’s outgoing bitrate to keep bandwidth predictable:

// On a connection that already has tracks added (via addTrack), cap
// the outgoing bitrate of each sender:
const transceivers = pc.getTransceivers();
transceivers.forEach(transceiver => {
  if (transceiver.sender && transceiver.sender.track) {
    const params = transceiver.sender.getParameters();
    params.encodings[0].maxBitrate = 500000; // 500 kbps
    transceiver.sender.setParameters(params);
  }
});

To spot degrading connections, poll getStats() and watch the inbound video metrics:

async function monitorConnection(pc) {
  const stats = await pc.getStats();
  stats.forEach(report => {
    if (report.type === 'inbound-rtp' && report.kind === 'video') {
      console.log('Packets lost:', report.packetsLost);
      console.log('Jitter:', report.jitter);

      if (report.packetsLost > 100) {
        // Reduce bitrate (adjustBitrate is a helper you would build
        // on sender.setParameters, as in the snippet above)
        adjustBitrate(pc, 'down');
      }
    }
  });
}

setInterval(() => monitorConnection(pc), 5000);

To troubleshoot connection issues, enable detailed state logging:

pc.oniceconnectionstatechange = () => {
  console.log('ICE connection state:', pc.iceConnectionState);
  if (pc.iceConnectionState === 'failed') {
    console.error('Connection failed - check TURN server');
  }
};

pc.onsignalingstatechange = () => {
  console.log('Signaling state:', pc.signalingState);
};

Common issues you might encounter include ICE connection failures (usually a TURN server problem), audio/video sync issues (check codec compatibility), and high CPU usage (consider lowering resolution or frame rate).

For managing media at scale, like recording thousands of concurrent video streams or applying transformations, this is exactly where Cloudinary shines. When you need to store, transform, or deliver video content from your WebRTC application, Cloudinary’s API handles the heavy lifting so you can focus on building features.

Wrapping Up

We’ve covered the essentials of building a WebRTC video streaming application with Node.js, from setting up signaling servers to handling NAT traversal and securing everything for production. The peer-to-peer architecture provides low latency, while SFUs enable scaling to larger groups. With proper security, TURN server configuration, and performance monitoring, we’re ready to deploy a robust real-time video solution.

WebRTC development comes with challenges, but the ability to build real-time video experiences directly in the browser makes it worthwhile. Start with a simple peer-to-peer setup, test thoroughly with different network conditions, then scale to an SFU when needed.

Ready to build your own video streaming platform? Sign up for Cloudinary to handle your video storage, transformations, and delivery at scale.

Frequently Asked Questions

What is the difference between STUN and TURN servers?

STUN servers help peers discover their public IP addresses so they can connect directly. TURN servers relay video traffic when direct peer-to-peer connections fail due to restrictive firewalls or NAT configurations. STUN is used in most connections, while TURN serves as a fallback for about 10-15% of cases.

Can WebRTC work without a signaling server?

No, WebRTC requires a signaling mechanism to exchange connection information between peers. While WebRTC handles the actual media streaming peer-to-peer, peers need a way to discover each other and exchange offers, answers, and ICE candidates. The signaling server coordinates this handshake, but doesn’t touch the media streams.

How many participants can join a peer-to-peer WebRTC call?

Pure peer-to-peer WebRTC works best for 2-4 participants. Beyond that, bandwidth requirements grow quadratically, since each participant must send a separate stream to everyone else. For larger groups, use an SFU that routes streams efficiently, allowing dozens or even hundreds of participants, depending on your server capacity.

QUICK TIPS
Tali Rosman

In my experience, here are tips that can help you better build and scale real-time video streaming with Node.js and WebRTC:

  1. Use UUIDs instead of socket IDs for long-lived peer references
    Socket IDs can change if the connection drops and reconnects. Generate and map UUIDs to socket IDs to maintain consistent peer references across reconnects.
  2. Debounce ICE candidate emissions to reduce signaling noise
    ICE candidates can flood your signaling channel. Buffer them for a few milliseconds and send in batches to minimize overhead, especially in group calls or mobile environments.
  3. Implement media renegotiation logic for mid-call upgrades
    Allow peers to renegotiate media (e.g., adding screen sharing or switching cameras) mid-session using the negotiationneeded event. This enables richer interactions without restarting the call.
  4. Avoid hardcoding STUN/TURN in the client—use dynamic provisioning
    Serve ICE configuration via your signaling server (or an API) based on environment and load balancing logic. This allows you to rotate TURN credentials and scale dynamically.
  5. Integrate simulcast for SFU scaling
    When using SFUs like mediasoup or LiveKit, configure simulcast on the sender side. This sends multiple video resolutions, allowing the SFU to forward the best quality stream per recipient’s bandwidth.
  6. Use WebRTC data channels for out-of-band control messages
    Establish a parallel RTCDataChannel for chat, metadata, mute states, or control signaling. It avoids extra latency from the main signaling server and keeps session logic P2P.
  7. Throttle reconnect logic to avoid DDoS-like reconnect storms
    When a peer disconnects, apply exponential backoff or jittered retry delays. Without this, large-scale apps can flood the signaling server with reconnection attempts during network instability.
  8. Cache TURN credentials client-side and renew with short TTLs
    Use short-lived TURN credentials generated server-side using HMAC or static auth, and cache them on the client. This prevents abuse and reduces authentication complexity on coturn.
  9. Use browser-level constraints to reduce encoding CPU spikes
    Set ideal resolution and frame rate limits via getUserMedia constraints (e.g., frameRate: { max: 15 }) to reduce CPU usage on entry-level devices or mobile, especially in group calls.
  10. Stream health via WebRTC stats API with alert thresholds
    Implement real-time metrics tracking using getStats() for jitter, packet loss, and resolution changes. Feed this into a client-side QoS monitor to alert users when their connection quality drops.
Last updated: Jan 14, 2026