High-performance Node.js concurrency with native “events” package

JavaScript is usually described as a single-threaded programming language with the event loop as its main driving engine. That's not the whole story for Node.js, though: it is built on libuv, the core library Node.js uses to access OS APIs like networking and the file system, and libuv comes with a thread pool.

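libuv has a native OS-level thread pool, and combined with non-blocking messaging between threads it keeps up the illusion that the application is single-threaded, while in reality some of the work is spread over multiple threads. Your JavaScript still runs on one thread; the pool handles operations such as file system calls, dns.lookup, crypto and zlib, while network I/O uses the OS's non-blocking sockets instead. The pool has 4 threads by default, and you can change that with the UV_THREADPOOL_SIZE environment variable.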
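As a rough illustration of the pool at work, here is a minimal sketch using crypto.pbkdf2, one of the Node.js calls that libuv runs on a pool thread. The password, salt and iteration count are arbitrary values picked for the demo.

```javascript
const crypto = require('node:crypto');

const timeline = [];

// The key derivation itself is executed on a libuv pool thread;
// only this callback runs back on the main thread.
crypto.pbkdf2('secret', 'salt', 1000, 32, 'sha256', (err, key) => {
  if (err) throw err;
  timeline.push(`pool job done: ${key.length} bytes`);
  console.log(timeline);
});

// The main thread is not blocked while the pool thread is busy.
timeline.push('main thread continues');
```

The callback can only run after the current synchronous code has finished, so 'main thread continues' is always the first entry in the timeline.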

Problem with async tasks

An async/await task or a Promise does not run as a separate thread pool task. Its JavaScript runs on the main thread, and whatever comes after an await is queued as a microtask on that same thread. Only the underlying I/O happens somewhere else, in the OS or on a libuv pool thread, and when it completes the result is handed back to the main thread, which resolves the Promise. Keeping many of these operations in flight at once while a single thread runs your code is the whole power of Node.js.

import axios from 'axios';

// The function body runs on the main thread; only the network I/O
// behind axios.get() is handled outside of it (by the OS, via libuv)
async function getWebContent(url: string) {
   // dummy example :) but it shows a simple Promise use case
   const { data } = await axios.get(url);
   return data;
}

// This also runs on the main event loop thread
(async () => {
   const html = await getWebContent('https://google.com');
   // Do something smart with this HTML code!!
   // OR JUST log it
   console.log(html);
})();
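To see which parts of an async function run where, here is a small sketch with no I/O at all (the function and variable names are made up for the demo). Everything before the first await runs synchronously inside the caller, and the rest is resumed later on the same thread.

```javascript
const order = [];

async function task() {
  order.push('task: start'); // runs synchronously, inside the caller
  await null;                // the rest is queued as a microtask
  order.push('task: after await');
  console.log(order);
}

order.push('before call');
task();
order.push('after call');
// At this point order is: before call, task: start, after call.
// 'task: after await' is added once the current synchronous code finishes.
```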

Let’s consider the following case: you have to request 100 URLs at the same time and do something with the combined result, or run another cycle for the next 100 URLs, and so on. You would probably write some Promise loop that goes over the URLs, fires the Promises and waits until all 100 of them are completed, like in this sample.

async function someAsyncFunction(url: string) {
   // Some Async Stuff
}

(async () => {
   const urls = ['url1', 'url2', /* ... */];
   const results = await Promise.all(urls.map(url => someAsyncFunction(url)));
   console.log(results);
})();

This runs all the async functions concurrently, but at the same time you are waiting until all of them are completed before you continue with your stuff. This is not necessarily a bad thing; in fact, it is considered the usual way to spin up a lot of async jobs if you have to. However, if you don’t need all the results at once, BUT you want to keep as much work in flight as possible for better application performance, there is a better way of implementing something like this! As you already guessed, it is EventEmitter :)

The Power of EventEmitter

If you think that events are async operations in Node.js, you are WRONG! EventEmitter is sync! Yes, you read that right: an emitter is just a set of arrays of callback functions, one per event name, and .emit() calls every callback subscribed to that event synchronously, in the order they were registered, before it returns. The libuv event loop is not involved in delivering the event at all; it only comes into play for the async work that a callback starts, such as an HTTP request.
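A quick way to check the synchronous behavior yourself is a sketch like the following; the 'ping' event name and the log array are just illustration.

```javascript
const { EventEmitter } = require('node:events');

const emitter = new EventEmitter();
const log = [];

emitter.on('ping', () => log.push('listener 1'));
emitter.on('ping', () => log.push('listener 2'));

log.push('before emit');
// emit() returns true when the event had listeners, and only
// returns after every one of them has been called.
const hadListeners = emitter.emit('ping');
log.push('after emit');

console.log(log);
// [ 'before emit', 'listener 1', 'listener 2', 'after emit' ]
```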

The point is that EventEmitter is a convenient way of spinning up async operations concurrently and then emitting an event again from each async operation to run another async task. Using this kind of technique you don’t have to wait until a whole batch of promises is resolved; instead, you get an event callback with the data that you need as soon as it is ready. The callbacks themselves still run on the main thread: the concurrency comes from the I/O operations that are in flight at the same time, not from extra threads running your code.

import EventEmitter from 'events';
import axios from 'axios';

// Just initiating a new event emitter
const baseEvent = new EventEmitter();
const urls = ['url1', 'url2', /* ... */];
const responses = [];

async function runTask({url}) {
  try {
    const {data} = await axios(url);
    baseEvent.emit('response', {html: data});
  } catch (error) {
    // Count failed requests too, otherwise 'end' would never fire
    baseEvent.emit('response', {html: null, error});
  }
}

baseEvent.on('request', runTask);

function handleResponse({html}) {
  // Responses arrive in completion order, not in the order of urls
  responses.push(html);

  if (responses.length === urls.length) {
     baseEvent.emit('end', responses);
  }
}

baseEvent.on('response', handleResponse);

function doSomething(responses) {
   // DO something with responses here
}

baseEvent.once('end', doSomething);

// forEach instead of map: we only need the side effect of emitting
urls.forEach(url => baseEvent.emit('request', {url}));

This looks very ugly, and it’s hard to imagine, but in our experience with large numbers of URLs this code runs significantly faster than the classic Promise.all stuff. I like this version more than the original one because you can have truly concurrent stuff by just emitting new events, for example processing each individual response as it arrives, or keeping the URL queue up to date by emitting an event to add a new URL.
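Here is a minimal sketch of that last idea, where handling one response emits new requests. It assumes a made-up in-memory site (fakeSite) and a hypothetical fakeFetch helper in place of a real HTTP client, so it runs without any network access; crawl is also just a name chosen for the demo.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical site map standing in for real pages: URL -> links on that page.
const fakeSite = { '/': ['/a', '/b'], '/a': ['/b', '/c'], '/b': [], '/c': ['/'] };

// Hypothetical stand-in for an HTTP client; resolves on a later timer tick.
const fakeFetch = (url) =>
  new Promise((resolve) => {
    setTimeout(() => resolve({ url, links: fakeSite[url] ?? [] }), 5);
  });

function crawl(startUrl) {
  return new Promise((resolve) => {
    const emitter = new EventEmitter();
    const seen = new Set();
    let pending = 0;

    emitter.on('request', (url) => {
      if (seen.has(url)) return; // skip URLs that were already queued
      seen.add(url);
      pending += 1;
      fakeFetch(url).then((page) => emitter.emit('response', page));
    });

    emitter.on('response', ({ links }) => {
      // Queue newly discovered URLs before counting this page as done,
      // so pending cannot drop to zero too early.
      links.forEach((link) => emitter.emit('request', link));
      pending -= 1;
      if (pending === 0) emitter.emit('end');
    });

    emitter.once('end', () => resolve([...seen].sort()));
    emitter.emit('request', startUrl);
  });
}

const crawled = crawl('/');
crawled.then((urls) => console.log(urls)); // [ '/', '/a', '/b', '/c' ]
```

Each page is handled the moment it arrives, and the queue of work grows from inside the response handler instead of being known up front.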

Issues with EventEmitter

I’ll say from the start that EventEmitter is not a silver bullet for efficient concurrency in Node.js, but it works really well in some cases. It also comes with problems of its own!

The problem of infinite event loops. This is very common in large codebases, where you end up emitting an event that, as a result, emits the initial event again. Keep in mind that EventEmitter is a sync operation: every .emit() call literally runs a sync loop over the subscribed callback functions and calls them, which means that an event cycle gives you the same infinite recursion issue as a function that calls itself forever, ending in a "Maximum call stack size exceeded" error.
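The cycle is easy to reproduce. The sketch below first triggers it with two made-up events, then shows one possible guard: a hop counter carried in the event payload. The guard is only an illustration and an assumption on my side, not something EventEmitter provides.

```javascript
const { EventEmitter } = require('node:events');

// 'ping' emits 'pong' and 'pong' emits 'ping': a synchronous cycle.
const looping = new EventEmitter();
looping.on('ping', () => looping.emit('pong'));
looping.on('pong', () => looping.emit('ping'));

let caught = null;
try {
  looping.emit('ping');
} catch (error) {
  caught = error; // RangeError: Maximum call stack size exceeded
}

// Guarded version: stop re-emitting once the hop counter hits a limit.
const MAX_HOPS = 5;
const guarded = new EventEmitter();
let lastHop = 0;

guarded.on('ping', (hop) => {
  lastHop = hop;
  if (hop < MAX_HOPS) guarded.emit('pong', hop + 1);
});
guarded.on('pong', (hop) => guarded.emit('ping', hop));

guarded.emit('ping', 0);
console.log(caught.name, lastHop); // RangeError 5
```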

The problem of maintaining the concurrency level. Every emit (which is sync) starts its async operation right away, so emitting N times puts N operations in flight at once, and nothing limits how far that backlog of pending promises, open sockets and buffered responses grows. For Node.js this can end in out-of-memory crashes or other unexpected errors.
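One way to keep this under control is to put a small queue in front of the emitter and only start a new task when a slot frees up. This is a sketch under assumptions: runLimited and fakeFetch are hypothetical names, and fakeFetch stands in for a real request so the example runs offline.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for an HTTP request.
const fakeFetch = (url) =>
  new Promise((resolve) => {
    setTimeout(() => resolve(`html of ${url}`), 10);
  });

function runLimited(items, limit, worker) {
  return new Promise((resolve) => {
    const emitter = new EventEmitter();
    const queue = [...items];
    const results = [];
    let active = 0;
    let peak = 0; // highest number of tasks in flight, kept for inspection

    emitter.on('next', () => {
      // Start queued tasks until the limit is reached.
      while (active < limit && queue.length > 0) {
        const item = queue.shift();
        active += 1;
        peak = Math.max(peak, active);
        worker(item)
          .then(
            (value) => results.push({ item, value }),
            (error) => results.push({ item, error })
          )
          .then(() => {
            active -= 1;
            emitter.emit('next'); // a slot is free, try to start more work
          });
      }
      if (active === 0 && queue.length === 0) emitter.emit('end');
    });

    emitter.once('end', () => resolve({ results, peak }));
    emitter.emit('next');
  });
}

const finished = runLimited(['url1', 'url2', 'url3', 'url4', 'url5'], 2, fakeFetch);
finished.then(({ results, peak }) => console.log(results.length, peak)); // 5 2
```

Because the 'next' event is emitted from a promise callback, each round starts from a fresh call stack, so the queue does not run into the recursion problem described above.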

The problem of having too many subscribers. EventEmitter by default wants us to keep the number of subscribers as low as possible, because each emit runs a sync loop over the callbacks, which blocks the entire event loop while it runs. The default limit is only 10 subscribers per event, and it is a soft one: going over it prints a possible memory leak warning instead of throwing. That is totally OK for an average application, BUT you can make that number as big as you want with setMaxListeners(). The main downside of large numbers is the CPU cost involved. So far the best approach for my use cases is to call .off(name, callback), which removes the given callback from that event's subscribers: when you don’t need any new executions of a callback, just remove it from the subscribers list and keep the number of subscribers low.
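The calls involved look roughly like this; the 'data' event and the onData callback are placeholders for the demo.

```javascript
const { EventEmitter } = require('node:events');

const emitter = new EventEmitter();
const defaultLimit = emitter.getMaxListeners(); // 10 unless changed globally

function onData(chunk) {
  // handle the chunk here
}

emitter.on('data', onData);
const whileSubscribed = emitter.listenerCount('data'); // 1

// Remove the callback once no more executions are needed.
// off() needs the same function reference that was passed to on().
emitter.off('data', onData);
const afterOff = emitter.listenerCount('data'); // 0

// Raise the warning threshold for this emitter only.
emitter.setMaxListeners(25);
const raisedLimit = emitter.getMaxListeners(); // 25

console.log(defaultLimit, whileSubscribed, afterOff, raisedLimit);
```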

Who needs this stuff

Most applications don’t have this kind of use case, which is why EventEmitter is not so popular; even when you need something like this, you probably end up building it with some recursion/callback hell design. But if you consider an application that scrapes the web (like hexometer.com) or sends a lot of notifications (email, Slack, etc.), you have to keep a huge number of I/O operations in flight and handle each result the moment it arrives.

There are some similar implementations with the cluster module, which spawns entire separate Node.js processes with the same execution flow and message-based communication. That does let you use all CPU cores, but it is very expensive to maintain separate processes, especially if you consider that with EventEmitter everything runs in one process, so you can share the same global variables without thinking about parallel access. With the cluster module, you have to maintain the main process state by syncing up shared variables (if there are any).

This can also be used in an Express-based Node.js server, to keep other parts of the application executing while the main request handlers respond to requests. It helps you not to worry about context memory deallocation, which sometimes happens when you want to keep some task running but also want to send a response to the user before that task is completed. This is probably the worst scenario that you can imagine for a Node.js application, BUT I wanted to give a generic example of it.

Conclusion

EventEmitter is not for every application use case, and you can definitely replace it with a custom implementation, BUT the most important thing to keep in mind is that EventEmitter itself is synchronous: the concurrency comes from the async operations that your listeners start on libuv’s event loop, which is the main event loop engine for Node.js.

Currently, we have an entire web scraping tool (hexometer.com/broken-links) built on top of this principle, which works a lot faster than the original Promise-only implementation.

