Concurrency Throttling in Node.js

One of the first things you learn when developing microservices is that they are generally expected to run under strict resource constraints. Since microservices are commonly deployed with many concurrent instances in operation, a core expectation is that each instance manage its own resource consumption intelligently. Typically the hard limits are governed by the pre-allocated CPU and memory of the virtual machine within which the microservice(s) will run.

A robust microservice, then, must be configurable (i.e., to use more or less memory) and must have built-in throttling mechanisms that prevent it from overloading the CPU or exhausting the available memory of the host container.

Other constraints may be implicit: consider a file or media conversion microservice, where a single incoming request may require the conversion of thousands of files. Even if the host box’s memory or CPU is not used up, having too many file descriptors open (an OS-level constraint) can cause unexpected errors like this little gem:

Error: EMFILE, too many open files

Given that such constraints exist, I’d like to outline some commonly used strategies to throttle and manage resource usage:

1. Use Node.js Streams

Streams are one of Node.js' most underrated and under-utilized features. Streams allow the processing of large blocks of data in transit, i.e., processing a file while it is being read or uploaded, rather than all at once after loading it into memory. Rather than try to explain any further, I will direct you to a great resource for learning about streams: The Stream Handbook.
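As a minimal sketch of the idea, the following counts the lines in a file by streaming it chunk by chunk rather than reading the whole file into memory first. The helper name countLines is made up for this example; only the built-in fs module is used.

```javascript
const fs = require('fs');

// Hypothetical helper: counts newline characters in a file by streaming it,
// so only one chunk needs to be held in memory at a time.
function countLines(path) {
  return new Promise(function (resolve, reject) {
    let lines = 0;
    fs.createReadStream(path)
      .on('data', function (chunk) {
        // chunk is a Buffer; 10 is the byte value of '\n'
        for (let i = 0; i < chunk.length; i++) {
          if (chunk[i] === 10) lines++;
        }
      })
      .on('error', reject)
      .on('end', function () { resolve(lines); });
  });
}
```

Memory use stays roughly constant regardless of file size, because each chunk can be discarded as soon as it has been scanned.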

2. Leverage key Node.js libraries:

Keeping oneself aware of key Node.js libraries and modules is invaluable. For instance, one quick way to deal with the open file descriptors problem is to use something like graceful-fs, which implements a backoff-retry strategy while opening files to prevent the above-mentioned error. Become a master of finding and using Node modules that have already solved common issues for you.
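To make the backoff-retry idea concrete, here is a rough sketch of retrying an open call when the process runs out of file descriptors. This is not how graceful-fs is implemented; the helper openWithRetry and its options (open, maxRetries, baseDelayMs) are invented for illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: retries fs.open when the process has run out of file
// descriptors (EMFILE), waiting a little longer before each new attempt.
// The `open` option exists only so the function can be exercised without
// actually exhausting descriptors.
function openWithRetry(path, flags, options, callback) {
  const opts = options || {};
  const open = opts.open || fs.open;
  const maxRetries = opts.maxRetries === undefined ? 5 : opts.maxRetries;
  const baseDelayMs = opts.baseDelayMs === undefined ? 10 : opts.baseDelayMs;

  function attempt(n) {
    open(path, flags, function (err, fd) {
      if (err && err.code === 'EMFILE' && n < maxRetries) {
        // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
        setTimeout(function () { attempt(n + 1); }, baseDelayMs * Math.pow(2, n));
        return;
      }
      // Success, a non-EMFILE error, or retries exhausted.
      callback(err, fd);
    });
  }

  attempt(0);
}
```

In practice a well-tested module is the better choice; the sketch only shows why a short wait is often enough for other descriptors to be released.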

3. Throttle via chunking data:

Chunking involves grouping computational units into manageable sets, say 10 apiece, and processing each set in parallel. Given a large number of items to process, we can implement this kind of throttling with a promise library like Bluebird.js. Chunking using its Promise.map API would look like:

var Promise = require('bluebird');

// items: the array of work to process (name assumed for this example)
Promise.map(items, function(item) {

  //Do something with item here

}, {concurrency: 10}).then(function() {

    //All items have been processed
    // in sets of 10 at most

});

Note the {concurrency: 10} option passed to Promise.map. This ensures that at most 10 items will be processed at one time per request. Of course, here we assume that each computation will use a roughly equal amount of memory, which may or may not be a valid assumption for every use case. Chunking helps a microservice stay within healthy resource consumption limits.
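For readers curious about what a concurrency limit does under the hood, here is a dependency-free sketch. It is not Bluebird's implementation; the helper mapWithConcurrency is hypothetical and assumes limit is at least 1.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight at once, resolving to the results in input order.
function mapWithConcurrency(items, limit, worker) {
  return new Promise(function (resolve, reject) {
    const results = new Array(items.length);
    let next = 0;       // index of the next item to start
    let active = 0;     // number of workers currently running
    let failed = false; // set once any worker rejects

    function launch() {
      if (failed) return;
      if (next >= items.length) {
        // Nothing left to start; finish once the last worker is done.
        if (active === 0) resolve(results);
        return;
      }
      while (active < limit && next < items.length) {
        const index = next++;
        active++;
        Promise.resolve()
          .then(function () { return worker(items[index], index); })
          .then(function (value) {
            results[index] = value;
            active--;
            launch(); // a slot freed up, start the next item if any
          }, function (err) {
            failed = true;
            reject(err);
          });
      }
    }

    launch();
  });
}
```

Calling mapWithConcurrency(files, 10, convertFile), where files and convertFile are placeholders for your own data and worker, would keep at most 10 conversions running at any moment.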

While we can control how many computational units are processed at once via chunking, web-based microservices typically cannot control the number of incoming requests. Limiting each request to processing only 10 files at a time is insufficient, since a burst of load may result in hundreds of these requests coming in within a few seconds.

What are your options around these types of scenarios?

4. Use queuing with decoupled post-back:

One option, of course, is to employ an external queue. Each service instance reads from a queue rather than directly receiving requests. While this is possible, there are two rather significant drawbacks: first, setup and maintenance of an external queue is non-trivial, and second, queues generally don’t guarantee exactly-once delivery where multiple clients are involved.

Another, simpler option exists: use an internal concurrency queue. This is normally combined with some sort of decoupled post-back after the microservice completes its task, like so:

[Figure: Decoupled Post-back]

The client immediately receives a '202 Accepted' from the microservice indicating that the task was received, but the actual 'Task Completed' message is decoupled from the original request and involves either a webhook or an update to an external resource.
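A bare-bones sketch of this pattern using only the built-in http module might look like the following. The names makeTaskHandler, createTaskServer, processTask and notify are all assumptions of this example; notify stands in for whatever delivers the completion message, such as a webhook call.

```javascript
const http = require('http');

// Hypothetical sketch: `processTask` does the slow work and `notify`
// delivers the completion message. Both are supplied by the caller.
function makeTaskHandler(processTask, notify) {
  return function (req, res) {
    let body = '';
    req.on('data', function (chunk) { body += chunk; });
    req.on('end', function () {
      // Acknowledge right away; the client does not wait for the work.
      res.writeHead(202, { 'Content-Type': 'text/plain' });
      res.end('Accepted');

      // The completion message is sent later, outside the request cycle.
      Promise.resolve()
        .then(function () { return processTask(body); })
        .then(function (result) { return notify(null, result); },
              function (err) { return notify(err); });
    });
  };
}

// Wraps the handler in an HTTP server; the caller decides when to listen.
function createTaskServer(processTask, notify) {
  return http.createServer(makeTaskHandler(processTask, notify));
}
```

Something like createTaskServer(convert, postToWebhook).listen(3000) would start it, with convert and postToWebhook being your own functions.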

Here’s how this could be implemented using a simple FIFO-based concurrency queue:

First, set up the queue:

var cQ = require('concurrent-queue').createInstance({ 'maxConcurrency': 3 });

Then push a job into the queue:

//process incoming request
//send '202 Accepted' to client 

//push job into queue
var jobId = cQ.push(job);

The queue generates a unique jobId for each job instance pushed into the queue. When it’s safe to process a computational unit, based on the maxConcurrency rule, the queue lets you know by firing a 'ready' event. So we listen for this event like so:

cQ.on('ready', function (job, jobId, status) {
    //Do some job processing here
    //when complete, drain the queue of the job
    cQ.drain(jobId);

    //send 'Complete' Message or webhook
});

When processing is completed, we drain the queue using .drain(jobId), which releases the next job in the queue for processing, if one exists. The status field shows the current state of the queue: { maxConcurrency: xx, processing: xx, queued: xx }.

This sort of internal queue can give a reasonable guarantee that your microservice does not get overwhelmed by a deluge of requests and fail because it overran its memory constraints.
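To show how little machinery such a queue needs, here is an illustrative in-process version built on Node's EventEmitter. It mimics the push / 'ready' / drain flow described above but is not the module's actual implementation; the class name SimpleConcurrencyQueue and its internals are assumptions.

```javascript
const EventEmitter = require('events');

// Illustrative only: a FIFO queue that releases at most `maxConcurrency`
// jobs at a time and waits for drain(jobId) before releasing more.
class SimpleConcurrencyQueue extends EventEmitter {
  constructor(maxConcurrency) {
    super();
    this.maxConcurrency = maxConcurrency;
    this.waiting = [];            // FIFO of { job, jobId }
    this.processing = new Set();  // jobIds currently being worked on
    this.nextId = 1;
  }

  status() {
    return {
      maxConcurrency: this.maxConcurrency,
      processing: this.processing.size,
      queued: this.waiting.length
    };
  }

  push(job) {
    const jobId = this.nextId++;
    this.waiting.push({ job: job, jobId: jobId });
    // Defer so the caller receives jobId before 'ready' can fire.
    setImmediate(() => this._release());
    return jobId;
  }

  drain(jobId) {
    // Only free a slot if this job was actually being processed.
    if (this.processing.delete(jobId)) this._release();
  }

  _release() {
    while (this.processing.size < this.maxConcurrency && this.waiting.length > 0) {
      const entry = this.waiting.shift();
      this.processing.add(entry.jobId);
      this.emit('ready', entry.job, entry.jobId, this.status());
    }
  }
}
```

Because everything lives in process memory, queued jobs are lost if the instance crashes, which is one reason the decoupled post-back matters: the client learns about completion explicitly rather than assuming it.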

I hope this was a helpful overview of some concurrency-based throttling strategies used in Node.js. Feel free to drop me a line if there are other strategies you found useful.