Video chat using standard protocols


Until recently, I had always been disappointed by the poor options for audio and video chat between two Linux users, compared to the ease of use of proprietary solutions. Each component was available, but the glue required to make them work together was missing or, at best, not properly configured. The number of components involved on a typical Linux desktop is impressive:

  • A working gstreamer environment, able to capture audio and video from a local source, and to negotiate and build a pipeline with a set of video and audio codecs compatible with those of the peer. Remember that not all distributions provide the same set of codecs. Some will handle H264 or MPEG2 in hardware with the assistance of the graphics card, some will handle them in software with ffmpeg. The gstreamer pipeline must also provide the element that establishes and maintains the conference session;
  • A way to exchange out-of-band information about network topology, for example an account on an XMPP (jabber) server with the appropriate extensions. This information is required to establish the network connection for the data streams at the lower level;
  • A high-level graphical application that hides most of this from the end user, pidgin or empathy for example;
  • A library that implements the low-level protocol used to make a stream of data flow in both directions between the peers. This is another tricky part because of the wide range of network topologies and their particularities. The NAT used in most home routers and broadband networks typically hides private, non-routable IPv4 addresses behind a single public IPv4 address, which imposes a challenge inherent to this technology: new connections can normally be established only from the inside to the outside of the NAT. The same restriction applies to a firewall, which generally trusts the internal network and allows connections to be initiated only from the inside to the outside.

Some RFCs describe interesting ways to overcome these latter limitations. RFC 5245 (ICE) is one of them, built on top of other RFCs (STUN, TURN). The proposed ideas are simple, but require some assistance and synchronization:

  • A client wanting to establish a network connection with a peer must have a way to discover its public IP address if it is located behind a NAT. This is achieved with the help of a STUN server, whose only job is to reply to client requests, telling each client which IP address its request came from.
  • A trick used with certain classes of NAT to reach a box inside a private network is the UDP hole punching method: the client inside the NAT emits a UDP datagram first, just to create the association in the NAT, so that the real initial datagram from the outside looks like a reply to the previous outgoing datagram. This is not a magic bullet, and it may fail when the peers are not properly synchronized in the way they send their initial UDP datagrams. The use of the out-of-band connection to a third-party XMPP server helps to synchronize them. It may also fail if the NAT does not preserve the IP:port association between consecutive outgoing connections (symmetric NAT), because in this case the client inside the NAT has no way to provide this association to its remote peer. Linux NAT with iptables, for example, does its best to preserve the association by default; see the --random option of the SNAT rule for details.
  • When a direct connection cannot be established with the methods described previously, a fallback that is expected to work in most cases, but at the cost of more network latency, is to use an external relay server (a TURN server). Both peers reach it with an outgoing connection, so there is not much risk of being blocked.
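
To make the first two points more concrete, here is a minimal sketch in Python of a toy NAT model. It is only a simulation under stated assumptions, not real networking code: the ToyNat class, the stun_reflect and hole_punch functions, and all addresses are made up for illustration. The non-symmetric variant is assumed to reuse one public port per private socket and to let a datagram in only from a remote address it has already sent to.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Hypothetical NAT model, for illustration only."""
    public_ip: str
    symmetric: bool = False
    next_port: int = 40000
    mappings: dict = field(default_factory=dict)  # mapping key -> public port
    allowed: set = field(default_factory=set)     # (public port, remote) pairs

    def outbound(self, private, remote):
        """Translate an outgoing datagram; return the source address the remote sees."""
        # A symmetric NAT allocates a new public port for every destination;
        # the other variant keeps one port per private socket.
        key = (private, remote) if self.symmetric else private
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        port = self.mappings[key]
        # Sending out creates the association that lets "replies" back in.
        self.allowed.add((port, remote))
        return (self.public_ip, port)

    def inbound(self, remote, public_port):
        """Return True if a datagram from remote to public_port is let through."""
        return (public_port, remote) in self.allowed


def stun_reflect(source):
    """Toy STUN server: report the address the request came from."""
    return source


def hole_punch(nat_a, nat_b):
    """Simulate hole punching between two peers; True if both directions pass."""
    a_priv, b_priv = ("192.168.1.10", 5000), ("10.0.0.20", 5000)
    stun = ("203.0.113.1", 3478)
    # Each peer learns its public address; in practice the two addresses
    # would then be exchanged out of band (over XMPP, for example).
    a_pub = stun_reflect(nat_a.outbound(a_priv, stun))
    b_pub = stun_reflect(nat_b.outbound(b_priv, stun))
    # Each peer emits a first datagram towards the other's advertised address.
    a_seen = nat_a.outbound(a_priv, b_pub)
    b_seen = nat_b.outbound(b_priv, a_pub)
    # The datagrams arrive at the other NAT, sent from a_seen and b_seen.
    return nat_b.inbound(a_seen, b_pub[1]) and nat_a.inbound(b_seen, a_pub[1])


if __name__ == "__main__":
    print(hole_punch(ToyNat("198.51.100.1"), ToyNat("198.51.100.2")))
    print(hole_punch(ToyNat("198.51.100.1", symmetric=True),
                     ToyNat("198.51.100.2", symmetric=True)))
```

In this model the first call succeeds because the address learned from the toy STUN server is the same one the peer later sees. The second call fails: each symmetric NAT allocates a different public port for the peer than for the STUN server, so the advertised address is already stale when it is used.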

The role of RFC 5245 is to describe how each client wanting to establish a connection will try each of these methods in turn, in a synchronized and prioritized way, testing the easiest direct connection first and falling back to the expensive relay connection last. With all this infrastructure in place, one can finally provide the user with an efficient and reliable way to establish a video chat with a peer, without depending too much on the cooperation of a hostile network environment.
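
The prioritization can be illustrated with the candidate priority formula given in RFC 5245 (section 4.1.2.1), using the type preference values the RFC recommends. The sketch below is a simplification under stated assumptions: the function name, the candidate list and the addresses are made up for illustration, and a real ICE agent goes on to order pairs of local and remote candidates with a second formula that is not shown here.

```python
# Type preferences recommended by RFC 5245: direct (host) candidates
# rank highest, relayed (TURN) candidates lowest.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return ((2 ** 24) * TYPE_PREFERENCE[cand_type]
            + (2 ** 8) * local_pref
            + (256 - component_id))


# Hypothetical candidates gathered by one client: a relayed address from a
# TURN server, its local address, and its public address learned via STUN.
candidates = [
    ("relay", "203.0.113.50", 49152),
    ("host", "192.168.1.10", 5000),
    ("srflx", "198.51.100.1", 40000),
]

# Highest priority first: the direct address, then the STUN-discovered one,
# and the relay as a last resort.
ordered = sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)

if __name__ == "__main__":
    for cand_type, ip, port in ordered:
        print(candidate_priority(cand_type), cand_type, ip, port)
```

With the default local preference and component ID, a host candidate gets priority 2130706431, a server reflexive one 1694498815, and a relayed one 16777215, which matches the order described above.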