Dissecting and understanding a typical NGINX configuration file

NGINX is a high-performance web server that, if set up correctly, allows for high concurrency while keeping resource usage relatively low. It also comes with a large number of first- and third-party modules that provide a rich set of functionalities (check here for more details).

In this article, we'll start by providing a typical configuration template, which will help you write your own. Then we'll dissect the document step-by-step, explaining each of its parts.

Throughout the document, we assume that a) you are running Linux or macOS and b) you have already installed nginx on your computer or are running a docker image with the software. Otherwise, please have a quick look at how to install NGINX as this will not be covered.

A closer look at NGINX

Before diving into the configuration steps, let's have a closer look at how NGINX works, as this will prove useful to better understand some configuration options. At its core, NGINX handles requests asynchronously, with a single process potentially serving multiple requests concurrently. It is therefore particularly efficient at serving static content (such as HTML files), and hands off dynamic content requests to other servers, acting as a reverse proxy (for instance, with Python, NGINX works well with both uwsgi and gunicorn).

Another point worth mentioning is that NGINX uses a master-worker architecture, with the master process mainly responsible for reading the configuration file and creating child processes (among which, the worker processes). Then, with nginx running, the master process allocates incoming jobs to the single-threaded, independent worker processes, which perform the operations required to fulfill each request (such as handling network connections, reading from and writing to disk...).

To control how NGINX works, the user can modify the configuration file named nginx.conf and typically located in the /etc/nginx directory. This is exactly the file we will focus on.

Understanding the main configuration concepts

The configuration file is built upon two main concepts: contexts and directives. Contexts are sections/blocks where instructions can be set, such as the http and server contexts: http {} and server {}. Contexts can be nested and inherit their options from their parents, the outermost context being called the 'main' context and referring to the config file itself. Directives, on the other hand, are specific configuration options that customize the server's behavior. They consist of a name and a value, for example server_name my_domain.com. NGINX uses 3 main types of directives:

  • Standard directive: can only be declared once per context (for example the root directive). When declared within a given context C, a standard directive is passed down to all the child contexts of C unless a child overrides it by declaring its own value for the directive.
  • Array directive: can be declared multiple times within the same context, in which case each declaration appends its value to the previous ones (hence the name). One of the most widely used array directives is the access_log directive. In terms of inheritance, array directives behave like standard directives, with their value being passed to all the child contexts unless overridden.
  • Action directive: invokes a specific action, such as the rewrite or return directives. These cannot be inherited as they stop the flow of execution.
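
To illustrate the inheritance rules above, here is a minimal sketch (the paths are made up for the example):

```nginx
http {
    # Standard directive: inherited by every child context below.
    root /var/www/main;
    # Array directive: inherited as a whole unless a child redeclares it.
    access_log /var/log/nginx/access.log;

    server {
        # This server inherits both root and access_log from http.
        listen 80;

        location /docs {
            # Overrides the inherited root for this location only.
            root /var/www/docs;
        }
    }
}
```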

(Figure: example of a configuration file containing contexts and directives)

NGINX configuration

Template file

With all that we have discussed in mind, we are now ready to scan through a typical configuration file and understand the main directives one can use. Here is the template we will look at (do not worry if it looks like gibberish, everything will hopefully get much clearer soon).

user www-data;
worker_processes auto;

events {
    worker_connections 4096;
}

http {
    server {
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;
        root /usr/share/nginx/html;

        gzip on;
        gzip_vary on;
        gzip_comp_level 4;
        gzip_min_length 1024;
        gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
        listen 80;

        location /p1 {
            index page1.html;
        }

        location = /p2 {
            access_log /var/log/nginx/p2.access.log;
            index page2.html;
            try_files $uri $uri/ /index.html;
        }

        location / {
            index  index.html index.htm;
            try_files $uri $uri/ /index.html;
        }
    }
}

Explanation

The user directive

A configuration file typically starts with a user directive. This directive lives within the main context and therefore affects all the contexts in the config file. It instructs the master process to run its worker process(es) as the specified user. For instance, the following directive sets the user to www-data.

user www-data;

The worker_processes directive

This directive is very important if you need a performant nginx server. It is set from the main context and specifies the number of workers the master process will create. For instance, with worker_processes 4;, the master will spawn 4 worker processes. However, as these workers are single-threaded and handle concurrency by themselves, it is generally best to set this number equal to the number of cores available on the machine (but feel free to benchmark different values). NGINX offers a straightforward way to do this:

worker_processes auto;

The events context and worker_connections directive

We get to the first context in our configuration file: events. It holds global directives for connection handling. Typically, this context contains the worker_connections directive, which sets the maximum number of simultaneous connections per worker process. This value has to be set with caution: too high a value can lead to excessive context switching and therefore waste resources, while too small a value limits the number of concurrent connections the web server can handle, harming the performance of the website. Indeed, the total number of connections the web server can handle is simply the product of the worker_connections and worker_processes values. For example, with the following configuration, we have 2 * 4096 = 8192 simultaneous connections:

worker_processes 2;

events {
    worker_connections 4096;
}

To help you set this value, if you are using Linux or macOS, you can run the command ulimit -n. This outputs the number of file descriptors a process may open, which acts as an upper bound for the number of worker connections you can use. For example, if the command outputs 256, you should not set the worker_connections directive to more than 256.
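
If that limit turns out to be too low, nginx also lets you raise it for its worker processes with the worker_rlimit_nofile directive, set from the main context. A quick sketch (the values are purely illustrative):

```nginx
# Raise the per-worker open-file limit so worker_connections can be higher.
worker_rlimit_nofile 8192;

events {
    # Should stay at or below the file-descriptor limit above.
    worker_connections 4096;
}
```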

The http context

This is where all our HTTP server directives live: it typically contains other contexts such as server or location, which we'll discuss in more detail later.

The server context

This context is nested within the http context and holds the configuration for a virtual server. A configuration file often contains several declarations of this context, each defining a specific virtual server to handle incoming requests.

It very often contains the server_name and listen directives. The first defines the name(s) of the server and is somehow used as a "fingerprint" of the server: if multiple server contexts are provided, NGINX will parse the header of client requests and match them against the server_name. This way, it can send the request to the relevant server.

On the other hand, the listen directive sets the port on which to listen: in our example, it is the port 80, which is the default port for http.
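
Putting both directives together, here is a minimal sketch of two virtual servers listening on the same port and told apart by their server_name (the domain names are made up):

```nginx
http {
    server {
        listen 80;
        # Requests whose Host header matches this name land here.
        server_name example-one.com;
        root /var/www/one;
    }

    server {
        listen 80;
        server_name example-two.com;
        root /var/www/two;
    }
}
```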

The access_log and error_log directives

These specify where to store our access and error logs. Indeed, nginx automatically keeps logs about client requests (controlled by access_log) and encountered issues (with the error_log directive). They can be declared multiple times within the configuration file, a child context overriding the inherited value each time. Therefore, in our example file, all requests to the /p2 URL are logged at /var/log/nginx/p2.access.log only (and NOT at /var/log/nginx/access.log). To log the requests in both locations, we can leverage the fact that access_log is an array directive and simply change the context to:

location = /p2 {
    access_log /var/log/nginx/p2.access.log;
    access_log /var/log/nginx/access.log;
    index page2.html;
    try_files $uri $uri/ /index.html;
}

This will append the two values and log the requests both in /var/log/nginx/p2.access.log and /var/log/nginx/access.log.

Finally, if we wish to disable logging, we can simply use the access_log off; directive.
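
A common use of this (the file extensions below are just an example) is to silence logging for static assets that would otherwise flood the access log:

```nginx
# Case-insensitive regex match (the ~* modifier is covered in the
# location section below): skip logging for common static asset types.
location ~* \.(ico|css|js|png|jpg)$ {
    access_log off;
}
```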

The root directive

This directive indicates the root path from which NGINX serves static files. For example, if we get a request for /images/nginx.png, NGINX will look for it in the root path specified in the root directive and will serve us the /usr/share/nginx/html/images/nginx.png file in our case (if it can find it, of course).
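
As a standard directive, root can be declared in the http, server, or location context, with the most specific declaration winning. A sketch with made-up paths:

```nginx
server {
    # Default document root for this virtual server.
    root /usr/share/nginx/html;

    location /static {
        # Overrides the inherited root for this location. Note that root
        # appends the full URI to the path, so a request for
        # /static/app.css is served from /var/www/static/static/app.css.
        root /var/www/static;
    }
}
```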

gzip

Using gzip allows us to turn on compression, a fairly simple way to boost performance in general. Indeed, with compression, the server sends smaller responses to the client, thus making pages load faster. The gzip on directive does just that: it enables gzip compression. However, not all clients support gzip, so we also add the gzip_vary on; directive, which appends a Vary: Accept-Encoding header to responses so that intermediate caches can store both compressed and uncompressed versions and serve the right one to each client.

Still, compression is by no means a cure-all. It comes with an important trade-off: compressing a file can require significant CPU resources from the server (decompressing it also requires resources on the client side), which can slow the overall handling of the request and harm performance. To mitigate this, we have two main techniques:

  • control the compression level with the directive gzip_comp_level 4;. This directive takes a value between 1 and 9, 1 being the fastest but lightest compression while 9 stands for maximum compression. I would advise to keep it around 4 as this seems to yield the most benefits.
  • only compress heavy files. Once again, NGINX offers a simple way to do so: we just add gzip_min_length 1024; for example to set the minimum file size to 1024 bytes. Files smaller than this threshold will therefore be sent uncompressed, while the remaining ones will be gzipped before traveling over the network.

Finally, the gzip_types directive is simply a way of selecting the MIME types that we want NGINX to compress (responses with the text/html type are always compressed when gzip is on).

The location context

This is definitely the context you will encounter the most and, just like server contexts, we can (and often do) use multiple location contexts. You can think of this context as a way to intercept incoming requests based on their URIs and handle them accordingly. For example, in the piece of code below, we log the request and then return the string "this is a test :)" with a 200 HTTP status code using the return directive.

location = /p2 {
    access_log /var/log/nginx/p2.access.log;
    return 200 "this is a test :)";
}

There are actually several ways of matching a URI within a location context:

  • prefix match: looks for URIs starting with the specified value, for example location /p1 would catch both /p1 and /p12, since both start with the /p1 prefix.
  • exact match: looks for an exact URI. To do so, we simply add an = sign right after the location keyword, as in location = /p2. This location context would then only catch /p2 and not /p22.
  • regex match: matches URIs using regular expressions. To achieve this, we add a ~ character. For example, location ~ /hello[0-9] would match any of the URIs /hello0 to /hello9. Note that the ~ modifier is case-sensitive; for a case-insensitive regex match, use the ~* modifier instead.
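
Putting the three matching styles together in one (illustrative) server block:

```nginx
server {
    listen 80;

    # Prefix match: catches /p1, /p12, /p1/anything, ...
    location /p1 {
        return 200 "prefix match";
    }

    # Exact match: catches /p2 and nothing else.
    location = /p2 {
        return 200 "exact match";
    }

    # Case-insensitive regex match: catches /hello0, /HELLO5, ...
    location ~* /hello[0-9] {
        return 200 "regex match";
    }
}
```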

Finally, when using multiple location contexts with different matching styles, if a URI is matched by several location contexts, NGINX will prioritize the locations according to its priority rules. To keep this article relatively short, we will not cover these here but here is a good post about the subject.

The index directive

This directive tells NGINX which file to serve as the index when the request does not name a specific file (i.e., when the URI ends in a slash).

The try_files directive

try_files is a useful directive that instructs NGINX to test a sequence of URIs, serving the first one it can find. It can be placed in a server or location context and takes a list of one or more files and directories and a final URI as parameters.

For example, in our configuration file, we tell NGINX to try to serve the requested URI as it comes in the incoming request, if it can find it relative to the root directory ($uri is a built-in variable that returns the client request URI). Otherwise it moves to do the same with the second value. Finally if none of these exists, it falls back to serving a default page (either page1.html, page2.html or index.html depending on the line you consider in the config file). This means that if we request /p2 and page2.html does not exist, nginx will fall back and serve us the index.html file instead of a 404 page.
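
Note that the last parameter of try_files can also be an HTTP status code instead of a fallback URI; for example, to return a plain 404 when nothing matches:

```nginx
location / {
    # Try the exact file, then a directory, then give up with a 404.
    try_files $uri $uri/ =404;
}
```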

Wrapping up

This brings us to the end of this post and, hopefully, nginx.conf files are now clearer to you.

However, please keep in mind that this article by no means covers all the configuration options but rather attempts to dissect some of the most widely used ones. For example, we did not discuss how to enable SSL or TLS to serve our content over https, nor did we talk about using NGINX as a reverse proxy to serve dynamic content processed by a backend. We will probably cover these aspects in a future article, but you should already be able to get going and start writing your own config files 🚂!

PS: please feel free to add any comments/remarks about the article, every opinion is highly welcome :)