Before digging into the technical details, ask yourself: have you ever used audio/video call tools like Google Hangouts? Have you ever wondered how those video calls stay so smooth with so many people joining at once?
If you have, here is the answer: all of these applications are built around a technology called WebRTC.
What is WebRTC
WebRTC (Web Real-Time Communication) is a technology that allows web applications to exchange data directly between two computers over a P2P connection, as well as perform audio/video calls, file transfer, screen sharing… without any additional third-party product or plugin.
WebRTC is still under active development and is supported in most browsers, although that support is not entirely uniform. This problem can be solved with an adapter; details can be found in the reference materials listed at the end of this article.
Because this technology is built into browsers and lets two browsers in different locations interact without passing their data through an intermediary server, data can be exchanged faster, with lower latency and higher efficiency. This makes it extremely popular for building today's teleconference applications.
Now, before exploring how the WebRTC APIs are exposed by browsers, let's find out how two computers can communicate at all, as this is the core mechanism behind the technology.
Concepts of Protocols
The concepts below are closely related to private and public IP addresses.
The theory behind these is rather lengthy, so it will not be covered here; feel free to look it up on your own.
In summary: a private IP is an address inside a LAN (Local Area Network). For example, two LANs in the same company can use identical address ranges yet still be unable to "see" each other. A public IP, on the other hand, identifies a single host on the Internet, and each public IP should belong to only one host.
Normally, two computers in two different private LANs cannot communicate with each other, and ICE (Interactive Connectivity Establishment) is a solution to this problem. In this article, it will be done using STUN/TURN servers.
NAT (Network Address Translation)
NAT is a way to expose IP information from a private LAN to the public Internet. It allows LAN computers to communicate with public ones; the address translation is performed by the router sitting between the two networks.
STUN (Session Traversal Utilities for NAT)
STUN is a helper protocol for other protocols that work through NAT. It lets a computer discover the type of NAT in front of it, its public IP address, and the ports that the NAT has assigned to it. However, STUN cannot work with Symmetric NAT, and using STUN requires an intermediary STUN server that both peers can reach.
TURN (Traversal Using Relays around NAT)
TURN is similar to STUN, except that it also handles the case STUN cannot: Symmetric NAT. Here, instead of a direct connection between the two peers, all connections and data exchange go through a TURN server. The drawback is exactly this relaying: in a video call with many participants, putting all of that load on a single server can be inefficient.
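In code, the STUN/TURN servers that ICE should use are supplied as configuration when the peer connection is created. A minimal sketch (the STUN URL shown is Google's well-known public test server; the TURN entry, server name, and credentials are illustrative only, and a production application needs its own infrastructure):

```javascript
// ICE server configuration for a WebRTC peer connection.
// stun.l.google.com is a public STUN server commonly used for testing only;
// the commented TURN entry is illustrative -- use your own server and credentials.
const rtcConfig = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    // { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'pass' },
  ],
};

// In the browser you would then create the connection with:
//   const pc = new RTCPeerConnection(rtcConfig);
```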
SDP (Session Description Protocol)
SDP is a protocol describing the content of the session being negotiated, including media definitions, encryption, formats… so that the peers reach a mutual understanding before transferring data.
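To make this concrete, here is a hand-written, heavily simplified SDP fragment and a small helper that extracts the media sections from it (plain JavaScript; real browser-generated SDP is much longer, and the helper name is illustrative):

```javascript
// A simplified SDP body: v= (version), o= (origin), s= (session name),
// t= (timing), m= (media line), a= (attribute). Lines are CRLF-separated.
const sdp = [
  'v=0',
  'o=- 46117317 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 49170 RTP/AVP 0',
  'a=rtpmap:0 PCMU/8000',
  'm=video 51372 RTP/AVP 96',
  'a=rtpmap:96 VP8/90000',
].join('\r\n');

// Extract the media type (audio, video, ...) of each m= line.
function mediaTypes(sdpText) {
  return sdpText
    .split('\r\n')
    .filter(line => line.startsWith('m='))
    .map(line => line.slice(2).split(' ')[0]);
}

console.log(mediaTypes(sdp)); // -> [ 'audio', 'video' ]
```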
Major concepts/APIs of WebRTC
The beginning of the article defined WebRTC as a technology that uses P2P for communication, but how is a connection actually formed between two computers? In short, before the peers can talk to each other directly, we need an intermediary server, called the Signaling Server.
RTCPeerConnection
An interface in the WebRTC API, provided by the browser, which represents the WebRTC connection from a local computer to a remote one. It also provides the methods needed to create, maintain, monitor, and close that connection.
MediaStream
An interface in the WebRTC API, provided by the browser, which represents a stream of media data between the two computers once a connection is established. In particular, the audio and video of a conversation are carried in streams, where the local computer's input becomes the remote one's output.
RTCDataChannel
An interface in the WebRTC API, provided by the browser, to which data streams are attached once the connection is established, allowing the peers to transfer arbitrary data over the P2P connection. In theory, each connection can carry up to 65,534 data channels; in practice, the limit depends on the browser.
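Because browsers cap the size of a single data-channel message in practice, file transfer over a data channel is usually done by slicing the file into chunks before sending. A hedged sketch of such a helper (the 16 KiB chunk size is a commonly used conservative value, and the function and channel names are illustrative):

```javascript
// Split an ArrayBuffer into fixed-size chunks small enough to send
// over an RTCDataChannel; 16 KiB is a widely used conservative size.
const CHUNK_SIZE = 16 * 1024;

function chunkBuffer(buffer, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0; offset < buffer.byteLength; offset += chunkSize) {
    // slice() clamps at the end of the buffer, so the last chunk may be shorter.
    chunks.push(buffer.slice(offset, offset + chunkSize));
  }
  return chunks;
}

// In the browser you would then do, roughly:
//   const channel = peerConnection.createDataChannel('file');
//   chunkBuffer(fileArrayBuffer).forEach(chunk => channel.send(chunk));
```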
Basic structure of a WebRTC – P2P application
The basic structure of a WebRTC (P2P) application is rather simple, and its working mechanism can be described in the following steps:
1: User A sends an SDP offer to the signaling server, expressing their intention to communicate with B.
2: The signaling server notifies B of A's intention in some way.
3: B answers with an SDP answer, accepting the call.
4: The signaling server delivers B's answer to A.
5: A connection is established between A and B, and from then on all video, audio, file… exchange between A and B happens over this connection.
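The five steps above can be sketched as a tiny in-memory signaling exchange. A real application would route these messages through an actual server (WebSocket, Firebase, …) and the payloads would be real SDP; here everything is simplified to show only the message flow:

```javascript
// Minimal in-memory "signaling server": routes messages between named peers.
class SignalingServer {
  constructor() { this.handlers = new Map(); }
  register(name, onMessage) { this.handlers.set(name, onMessage); }
  send(to, message) { this.handlers.get(to)(message); }
}

const signaling = new SignalingServer();
const log = [];

// B: answers any offer it receives (steps 2 and 3).
signaling.register('B', msg => {
  log.push(`B received ${msg.type} from ${msg.from}`);
  if (msg.type === 'offer') {
    signaling.send(msg.from, { type: 'answer', from: 'B', sdp: '<B sdp>' });
  }
});

// A: sends an offer and waits for the answer (steps 1 and 4).
signaling.register('A', msg => {
  log.push(`A received ${msg.type} from ${msg.from}`);
  // Step 5: both sides now hold each other's SDP; the P2P connection can open.
});
signaling.send('B', { type: 'offer', from: 'A', sdp: '<A sdp>' });

console.log(log);
// -> [ 'B received offer from A', 'A received answer from B' ]
```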
For more details, see the following diagrams, which show the order of the steps above as well as how the APIs are called to establish a connection.
Nowadays you can find plenty of complete open-source WebRTC applications. However, before starting with those large codebases, you should experiment with a smaller one built directly on the WebRTC API to better understand the technology.
In this article, the sample application relies on Firebase and Firebase Hosting to keep the discussion of the Signaling Server/backend part short. There is no fixed rule for the backend of a WebRTC application; it depends on practical demands.
This application is rather small and easy to understand, with a few points to note:
- The code relevant to WebRTC application management can be viewed in public/app.js.
- Pay close attention to the STUN/TURN configuration. These servers are for example purposes only; for practical use, you will need your own reliable servers.
- The major code is in the two createRoom functions.
- If you are unfamiliar with Firebase commands, you can skim that part and treat them as CRUD commands against a RESTful API.
- For the UI, check out public/index.html.
- You can skip the remaining Firebase configuration.
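As an illustration of how such a createRoom function might look with Firestore as the signaling channel, here is a hedged sketch (the 'rooms' collection and the 'offer'/'answer' field names are assumptions for illustration, not necessarily those used in the sample application):

```javascript
// Hedged sketch: publish an SDP offer to Firestore and apply the answer
// when the remote peer writes one. Firebase v8-style API is assumed.
async function createRoom(db, peerConnection) {
  const roomRef = await db.collection('rooms').add({});

  // Step 1: create the offer and store it so the remote peer can read it.
  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  await roomRef.set({ offer: { type: offer.type, sdp: offer.sdp } });

  // Step 4: when the remote peer writes an answer, apply it once.
  roomRef.onSnapshot(async snapshot => {
    const data = snapshot.data();
    if (data && data.answer && !peerConnection.currentRemoteDescription) {
      await peerConnection.setRemoteDescription(data.answer);
    }
  });

  return roomRef.id;
}
```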
Which solutions for multiparty video conferencing
So far, the article has covered the concepts and basic structure of a WebRTC P2P application. Next, let's look at the different architectural solutions for multiparty video conferencing.
Mesh
In the Mesh structure, browsers connect directly to one another for data transfer. Each browser sends its data to every other browser in the group, and receives data from each of them.
Strengths: simple, inexpensive, effective with small groups of 4-6 parties.
Weaknesses: does not scale to larger groups, depends on each peer's device, and imposes a large upload/download load on every peer.
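The scaling problem is easy to quantify: in a mesh of n parties, every peer uploads its stream to each of the other n-1 peers (and downloads n-1 streams), and the group as a whole maintains n(n-1)/2 connections. A quick back-of-envelope helper (function and field names are illustrative):

```javascript
// Streams each peer must upload/download, and total P2P connections,
// for a mesh topology with n participants.
function meshLoad(n) {
  return {
    uplinkStreamsPerPeer: n - 1,
    downlinkStreamsPerPeer: n - 1,
    totalConnections: (n * (n - 1)) / 2,
  };
}

console.log(meshLoad(4));
// -> { uplinkStreamsPerPeer: 3, downlinkStreamsPerPeer: 3, totalConnections: 6 }
// meshLoad(10).totalConnections === 45, which is impractical for most clients.
```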
MCU (Multipoint Conferencing Unit)
Unlike Mesh, peers in MCU send their data to a central server, which decodes and mixes the streams received from the participating parties, then encodes them into a single stream that is sent back to each party.
Strengths: can rectify Mesh’s weakness and reduce resources necessary for each Peer.
Weaknesses: requires a strong central server to decode and encode data of all participating parties with minimum latency.
SFU (Selective Forwarding Unit)
Finally, we have the SFU structure, which also requires a central server. Unlike MCU, however, this server only forwards data: when it receives a stream, it redirects that stream to the other parties without decoding or mixing it.
At the moment, this is quite an effective solution for multiparty video conferencing.
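The per-peer stream counts make the trade-off between the three topologies clear, under the simplifying assumption that every party both sends one stream and receives the others (function name is illustrative):

```javascript
// Streams a single peer uploads/downloads in each topology with n parties.
function peerStreams(topology, n) {
  switch (topology) {
    case 'mesh': return { up: n - 1, down: n - 1 }; // one copy to/from every other peer
    case 'mcu':  return { up: 1, down: 1 };         // server mixes everything into one stream
    case 'sfu':  return { up: 1, down: n - 1 };     // server forwards every other peer's stream
    default: throw new Error(`unknown topology: ${topology}`);
  }
}

console.log(peerStreams('sfu', 5)); // -> { up: 1, down: 4 }
```

So SFU keeps the cheap uplink of MCU while avoiding the server-side decode/encode cost, at the price of a heavier downlink on each peer.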
To learn more about the efficiency of these solutions, you can read this report: https://testrtc.com/different-multiparty-video-conferencing/
This is the end of the article, thank you for reading!
Tran Huu Lap – FPT Software