Sometimes we need information from our servers as soon as it’s available. The usual AJAX request/response we’re all used to doesn’t keep the connection open for this sort of use case. Instead we need a push-based method like WebSockets, Long Polling, Server-sent events (SSE) and more recently HTTP2 push. In this article, we compare two methods: WebSockets and Long Polling.
An overview of Long Polling
In 1995, Netscape Communications hired Brendan Eich to implement scripting capabilities in Netscape Navigator and, over a ten-day period, the JavaScript language was born. Its capabilities as a language were initially very limited compared to modern-day JavaScript, and its ability to interact with the browser’s document object model (DOM) was even more limited. JavaScript was mostly useful for small enhancements to the way documents were consumed, such as in-browser form validation and lightweight insertion of dynamic HTML into an existing document.
As the browser wars heated up and Microsoft’s Internet Explorer reached version 4 and beyond, the battle for the most robust feature set led to Microsoft’s introduction of what ultimately became the XMLHttpRequest. All browsers have universally supported this for well over a decade.
Long polling is essentially a more efficient form of the original polling technique. Making repeated requests to a server wastes resources, as each new incoming connection must be established, the HTTP headers must be parsed, a query for new data must be performed, and a response (usually with no new data to offer) must be generated and delivered. The connection must then be closed and any resources cleaned up. Rather than having to repeat this process multiple times for every client until new data for a given client becomes available, long polling is a technique where the server elects to hold a client’s connection open for as long as possible, delivering a response only after data becomes available or a timeout threshold is reached.
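The client side of this technique can be sketched roughly as follows. This is a minimal illustration rather than a production implementation: the `longPoll` function, its options, and the convention that the server answers 200 with a JSON body when data is available and 204 when its timeout expires are all assumptions made for the example.

```javascript
// Hypothetical long-polling loop. `fetchFn` defaults to the global fetch
// (browsers, Node.js 18+). The 200/204 status convention is an assumption.
async function longPoll(url, onMessage, options = {}) {
  const { fetchFn = fetch, shouldStop = () => false, retryDelayMs = 1000 } = options;

  while (!shouldStop()) {
    try {
      // The server is assumed to hold this request open until data arrives
      // or its own timeout threshold is reached.
      const response = await fetchFn(url);
      if (response.status === 200) {
        onMessage(await response.json());
      } else if (response.status !== 204) {
        throw new Error(`Unexpected status ${response.status}`);
      }
      // On 204 (assumed "timed out, nothing new") we simply poll again.
    } catch (err) {
      // Network failure or unexpected status: pause to avoid a tight retry loop.
      await new Promise(resolve => setTimeout(resolve, retryDelayMs));
    }
  }
}
```

A call such as `longPoll('/poll', message => console.log(message))` would keep one request pending at a time and issue the next one as soon as the previous one completes; the `/poll` path is a placeholder.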
An overview of WebSockets
Around the middle of 2008, the pain and limitations of using Comet when implementing anything truly robust were being felt particularly keenly by developers Michael Carter and Ian Hickson. Through collaboration on IRC and W3C mailing lists, they hatched a plan to introduce a new standard for modern real-time, bi-directional communication on the web, and thus the name ‘WebSocket’ was coined.
The idea made its way into the W3C HTML draft standard and, shortly after, Michael Carter wrote an article introducing the Comet community to WebSockets. In 2010, Google Chrome 4 was the first browser to ship full support for WebSockets, with other browser vendors following suit over the course of the next few years. In 2011, RFC 6455 – The WebSocket Protocol – was published to the IETF website.
In a nutshell, WebSockets are a thin transport layer built on top of a device’s TCP/IP stack. The intent is to provide what is essentially an as-close-to-raw-as-possible TCP communication layer to web application developers while adding a few abstractions to eliminate certain friction that would otherwise exist concerning the way the web works. They also cater to the fact that the web has additional security considerations that must be taken into account to protect both consumers and service providers.
Long Polling: pros and cons
Pros
- Long polling is implemented on the back of XMLHttpRequest, which is near-universally supported by devices so there’s usually little need to support further fallback layers. In cases where exceptions must be handled though, or where a server can be queried for new data but does not support long polling (let alone other more modern technology standards), basic polling can sometimes still be of limited use, and can be implemented using XMLHttpRequest, or via JSONP through simple HTML script tags.
Cons
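- Long polling is a lot more intensive on the server: every delivered message ends one request and forces the client to open another, so connections are repeatedly established, headers parsed, and resources cleaned up.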
- Reliable message ordering can be an issue with long polling because it is possible for multiple HTTP requests from the same client to be in flight simultaneously. For example, if a client has two browser tabs open consuming the same server resource, and the client-side application is persisting data to a local store such as localStorage or IndexedDB, there is no built-in guarantee that the same data won’t be written more than once.
- Depending on the server implementation, confirmation of message receipt by one client instance may also cause another client instance to never receive an expected message at all, as the server could mistakenly believe that the client has already received the data it is expecting.
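One common way to soften the duplicate-delivery problem described above is to have the client discard anything it has already seen. The sketch below assumes, purely for illustration, that every message carries a numeric `id` that the server only ever increases; `createDeduper` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical de-duplication helper. Assumes each message has a numeric `id`
// that increases with every message the server emits.
function createDeduper(lastSeenId = 0) {
  return function accept(message) {
    if (message.id <= lastSeenId) {
      return false; // duplicate or late arrival: skip it
    }
    lastSeenId = message.id;
    return true; // first time this id has been seen: process it
  };
}
```

Note the trade-off in this design: a message that arrives late, after a higher id has already been accepted, is dropped rather than reordered. Sharing the last seen id between tabs would also need some shared storage, which is left out here.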
WebSockets: pros and cons
Pros
- WebSockets keep a single connection open, eliminating the latency problems that arise with Long Polling.
- WebSockets generally do not use XMLHttpRequest, so HTTP headers are not sent every time we need more information from the server. This, in turn, reduces the amount of data sent to the server.
Cons
- WebSockets don’t automatically recover when connections are terminated – this is something you need to implement yourself, and is part of the reason why there are many client-side libraries in existence.
- Browsers released before 2011 aren’t able to support WebSocket connections - but this is increasingly less relevant.
Why the WebSocket protocol is the better choice
Generally, WebSockets will be the better choice.
Long polling is much more resource intensive on servers, whereas WebSockets have an extremely lightweight footprint on servers. Long-polling requests also pass through many hops between servers and devices, and the gateways along the way often have different ideas of how long a typical connection is allowed to stay open. If a connection stays open too long, something may kill it, perhaps while it is doing something important.
Why you should build with WebSockets:
- Full-duplex asynchronous messaging. In other words, both the client and the server can stream messages to each other independently.
- WebSockets pass through most firewalls without any reconfiguration.
- Good security model (origin-based security model).
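To make the origin-based model a little more concrete, here is a rough sketch of a server-side check on the `Origin` header that browsers send with the WebSocket handshake. It relies on the `upgrade` event emitted by Node.js HTTP servers; the allow-list contents and the `isAllowedOrigin` and `guardUpgrades` names are assumptions for this example.

```javascript
// Hypothetical allow-list; replace with the origins your application serves.
const ALLOWED_ORIGINS = new Set(['https://example.org']);

function isAllowedOrigin(origin) {
  return typeof origin === 'string' && ALLOWED_ORIGINS.has(origin);
}

// Attach to a Node.js http.Server: refuse upgrade requests from unknown
// origins before any WebSocket handshake is completed.
function guardUpgrades(server) {
  server.on('upgrade', (request, socket) => {
    if (!isAllowedOrigin(request.headers.origin)) {
      socket.write('HTTP/1.1 403 Forbidden\r\n\r\n');
      socket.destroy();
    }
    // Otherwise the request would be handed to a WebSocket library
    // to complete the handshake (not shown here).
  });
}
```

An origin check of this kind guards against other websites opening connections from a visitor's browser. Non-browser clients can send any `Origin` value they like, so it complements authentication rather than replacing it.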
WebSockets open source solutions
There are two primary classes of WebSocket libraries: those that implement the protocol and leave the rest to the developer, and those that build on top of the protocol with various additional features commonly required by realtime messaging applications, such as restoring lost connections, pub/sub and channels, authentication, authorization, etc.
The latter variety often requires that their own libraries be used on the client side, rather than just using the raw WebSocket API provided by the browser. As such, it becomes crucial to make sure you’re happy with how they work and what they’re offering. You may find yourself locked into your chosen solution’s way of doing things once it has been integrated into your architecture, and any issues with reliability, performance, and extensibility may come back to bite you.
Let’s start with a list of those that fall into the first of the two categories.
Note: All of the following are open-source libraries.
ws
ws is a “simple to use, blazing fast and thoroughly tested WebSocket client and server for Node.js”. It is definitely a barebones implementation, designed to do all the hard work of implementing the protocol. However, additional features such as connection restoration, pub/sub, and so forth, are concerns you’ll have to manage yourself.
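Client (Node.js; ws does not run in the browser, where the native WebSocket object is used instead):
const WebSocket = require('ws');
const ws = new WebSocket('ws://www.host.com/path');
// Wait for the handshake to complete before sending.
ws.on('open', function open() {
  ws.send('something');
});
// Log every message received from the server.
ws.on('message', function incoming(data) {
  console.log(data);
});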
Server (Node.js):
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });
wss.on('connection', function connection(ws) {
  ws.on('message', function incoming(message) {
    console.log('received: %s', message);
  });
  ws.send('something');
});
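To give a sense of what "managing it yourself" involves, here is a minimal sketch of a pub/sub channel registry that could sit on top of a barebones server like the one above. It is not part of ws: `createChannels` and its methods are hypothetical names, and the only thing assumed about each socket is that it exposes `readyState` and `send()`, as ws connections and browser WebSockets do.

```javascript
const OPEN = 1; // readyState value for an open connection in the WebSocket API

// Hypothetical channel registry: maps channel names to sets of sockets.
function createChannels() {
  const channels = new Map();

  return {
    subscribe(name, socket) {
      if (!channels.has(name)) channels.set(name, new Set());
      channels.get(name).add(socket);
    },
    unsubscribe(name, socket) {
      const members = channels.get(name);
      if (!members) return;
      members.delete(socket);
      if (members.size === 0) channels.delete(name); // drop empty channels
    },
    // Sends `data` to every open socket on the channel; returns how many got it.
    publish(name, data) {
      let delivered = 0;
      for (const socket of channels.get(name) || []) {
        if (socket.readyState === OPEN) {
          socket.send(data);
          delivered += 1;
        }
      }
      return delivered;
    },
  };
}
```

In a real server you would call `subscribe` when a client asks to join a channel and `unsubscribe` when its connection closes. Authentication, message history, and delivery guarantees are all still missing from this sketch.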
μWebSockets
μWS is a drop-in replacement for ws, implemented with a particular focus on performance and stability. To the best of my knowledge, μWS is the fastest WebSocket server implementation available by a mile. It’s actually used under the hood by SocketCluster, which I’ll talk about below.
var WebSocketServer = require('uws').Server;
var wss = new WebSocketServer({ port: 3000 });
function onMessage(message) {
  console.log('received: ' + message);
}
wss.on('connection', function(ws) {
  ws.on('message', onMessage);
  ws.send('something');
});
The client-side – Using WebSockets in the browser
The WebSocket API is defined in the WHATWG HTML Living Standard and is actually pretty trivial to use. Constructing a WebSocket takes one line of code:
const ws = new WebSocket('ws://example.org');
Note the use of ws where you’d normally have the http scheme. There’s also the option to use wss where you’d normally use https. These URI schemes were introduced in tandem with the WebSocket specification, and are designed to represent an HTTP connection that includes a request to upgrade the connection to use WebSockets.
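Because the two schemes mirror http and https, a WebSocket URL can be derived from a page or API URL mechanically. The helper below is a small sketch of that mapping; `toWebSocketUrl` is a hypothetical name, and it only rewrites the scheme, leaving host, port, and path untouched.

```javascript
// Hypothetical helper: maps http -> ws and https -> wss.
// Strings that do not start with http: or https: are returned unchanged.
function toWebSocketUrl(httpUrl) {
  return httpUrl.replace(/^http(s?):/i, (match, secure) => `ws${secure.toLowerCase()}:`);
}
```

For example, `toWebSocketUrl('https://example.org/feed')` returns `'wss://example.org/feed'`, so a page served over HTTPS connects over the encrypted scheme.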
Creating the WebSocket object doesn’t do a lot by itself. The connection is established asynchronously, so you’d need to listen for the completion of the handshake before sending any messages, and also include a listener for messages received from the server:
ws.addEventListener('open', () => {
  // Send a message to the WebSocket server
  ws.send('Hello!');
});
ws.addEventListener('message', event => {
  // The `event` object is a typical DOM event object, and the message data sent
  // by the server is stored in the `data` property
  console.log('Received:', event.data);
});
There are also the error and close events. WebSockets don’t automatically recover when connections are terminated – this is something you need to implement yourself, and is part of the reason why there are many client-side libraries in existence. While the WebSocket class is straightforward and easy to use, it really is just a basic building block. Support for different subprotocols or additional features such as messaging channels must be implemented separately.
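As an illustration of the recovery logic you would otherwise get from a library, here is a rough sketch of a wrapper that reconnects with exponential backoff whenever the connection closes. The `connectWithRetry` name, its options, and the delay values are assumptions made for this example. The constructor and timer are injectable only so the logic can be exercised outside a browser.

```javascript
// Hypothetical reconnecting wrapper around the browser WebSocket API.
function connectWithRetry(url, onMessage, options = {}) {
  const {
    WebSocketImpl = WebSocket, // defaults to the global WebSocket
    baseDelayMs = 500,         // illustrative values, not recommendations
    maxDelayMs = 30000,
    schedule = setTimeout,
  } = options;
  let attempt = 0;
  let stopped = false;
  let socket;

  function open() {
    if (stopped) return;
    socket = new WebSocketImpl(url);
    // A successful handshake resets the backoff.
    socket.addEventListener('open', () => { attempt = 0; });
    socket.addEventListener('message', event => onMessage(event.data));
    // In browsers a failed connection also fires `close`, so this one
    // listener covers both dropped and refused connections.
    socket.addEventListener('close', () => {
      if (stopped) return;
      const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      attempt += 1;
      schedule(open, delay);
    });
  }

  open();
  return {
    close() {
      stopped = true; // prevents any further reconnection attempts
      socket.close();
    },
  };
}
```

Even this leaves plenty undone: messages sent while disconnected are lost, and nothing resumes the stream from where it left off, which is the kind of gap the fuller-featured libraries aim to fill.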
Long polling - open source solutions
Most libraries don’t implement long polling in isolation from other transports because, in general, long polling is usually accompanied by other transport strategies, either as a fallback or with those transports as fallbacks when long polling doesn’t work. In 2018 and beyond, standalone long polling libraries are particularly uncommon, given that it’s a technique that is quickly losing relevance in the face of widespread support for more modern alternatives. Nevertheless, below are a handful of options in a few different languages that you can use as a fallback transport:
Ably, WebSockets, and long polling
Most of Ably’s client library SDKs use a WebSocket to establish a realtime connection to Ably, then use a simple HTTP request for all other REST operations including authentication.
However, client library SDKs such as our JavaScript browser library are designed to choose the best transport available based on the browser and connection available. By supporting additional transports with the ability to fall back to the lowest common denominator, Ably ensures that practically every browser in use today is able to establish a realtime connection to Ably. The following transports are currently supported by our JavaScript browser library in order of best to worst performing:
- WebSockets (supported by over 96% of browsers globally as of October 2019)
- XHR streaming
- XHR polling
- JSONP polling
There’s a lot involved when implementing support for WebSockets with Long Polling as a fallback - not just in terms of client and server implementation details, but also with respect to support for other transports to ensure robust support for different client environments, as well as broader concerns, such as authentication and authorization, guaranteed message delivery, reliable message ordering, historical message retention, and more.