checkups: Manage NGINX upstreams in pure Lua


CentOS/RHEL 7 or Amazon Linux 2

yum -y install
yum -y install 
yum -y install lua-resty-checkups

CentOS/RHEL 8+, Fedora Linux, Amazon Linux 2023

dnf -y install
dnf -y install lua5.1-resty-checkups

To use this Lua library with NGINX, ensure that nginx-module-lua is installed.

This document describes lua-resty-checkups v0.1 released on Feb 01 2019.

lua-resty-checkups - Manage Nginx upstreams in pure ngx_lua


Probably production ready in most cases, though not yet proven in the wild. Please check the issues list and let me know if you have any problems / questions.


  • Periodically heartbeat to upstream servers
  • Proactive and passive health check
  • Dynamic upstream update
  • Balance by weighted round-robin or consistent-hash
  • Synchronize with Nginx upstream blocks
  • Try clusters by levels or by keys


    -- config.lua

    local _M = {}

    _M.global = {
        checkup_timer_interval = 15,
        checkup_shd_sync_enable = true,
        shd_config_timer_interval = 1,
    }

    _M.ups1 = {
        cluster = {
            {
                servers = {
                    { host = "", port = 4444, weight=10, max_fails=3, fail_timeout=10 },
                }
            },
        },
    }

    return _M

    # nginx.conf

    lua_shared_dict state 10m;
    lua_shared_dict mutex 1m;
    lua_shared_dict locks 1m;
    lua_shared_dict config 10m;

    server {
        listen 12350;
        return 200 12350;
    }

    server {
        listen 12351;
        return 200 12351;
    }

    init_by_lua_block {
        local config = require "config"
        local checkups = require "resty.checkups.api"
        checkups.init(config)
    }

    init_worker_by_lua_block {
        local config = require "config"
        local checkups = require "resty.checkups.api"
        checkups.prepare_checker(config)
        checkups.create_checker()
    }

    server {
        location = /12350 {
            # forward to the test server listening on port 12350
        }
        location = /12351 {
            # forward to the test server listening on port 12351
        }

        location = /t {
            content_by_lua_block {
                local checkups = require "resty.checkups.api"

                -- called with the host and port of the selected peer
                local callback = function(host, port)
                    local res = ngx.location.capture("/" .. port)
                    return 1
                end

                local ok, err

                -- connect to a dead server, no upstream available
                ok, err = checkups.ready_ok("ups1", callback)
                if err then ngx.say(err) end

                -- add server to ups1
                ok, err = checkups.update_upstream("ups1", {
                    {
                        servers = {
                            { host = "", port = 12350, weight=10, max_fails=3, fail_timeout=10 },
                        }
                    },
                })
                if err then ngx.say(err) end
                ok, err = checkups.ready_ok("ups1", callback)
                if err then ngx.say(err) end
                ok, err = checkups.ready_ok("ups1", callback)
                if err then ngx.say(err) end

                -- add server to new upstream
                ok, err = checkups.update_upstream("ups2", {
                    {
                        servers = {
                            { host = "", port = 12351 },
                        }
                    },
                })
                if err then ngx.say(err) end
                ok, err = checkups.ready_ok("ups2", callback)
                if err then ngx.say(err) end

                -- add server to ups2, reset rr state
                ok, err = checkups.update_upstream("ups2", {
                    {
                        servers = {
                            { host = "", port = 12350, weight=10, max_fails=3, fail_timeout=10 },
                            { host = "", port = 12351, weight=10, max_fails=3, fail_timeout=10 },
                        }
                    },
                })
                if err then ngx.say(err) end
                ok, err = checkups.ready_ok("ups2", callback)
                if err then ngx.say(err) end
                ok, err = checkups.ready_ok("ups2", callback)
                if err then ngx.say(err) end
            }
        }
    }

A typical output of the /t location defined above is:

no servers available


Lua configuration

The checkups configuration file is a Lua module that consists of two parts: the global part and the cluster part.

An example configuration file is shown below:

    -- config.lua

    -- Here is the global part

    local _M = {}

    _M.global = {
        checkup_timer_interval = 15,
        checkup_timer_overtime = 60,
        default_heartbeat_enable = true,
        checkup_shd_sync_enable = true,
        shd_config_timer_interval = 1,
    }

    -- The remaining parts are cluster configurations

    _M.redis = {
        enable = true,
        typ = "redis",
        timeout = 2,
        read_timeout = 15,
        send_timeout = 15,

        protected = true,

        cluster = {
            {   -- level 1
                try = 2,
                servers = {
                    { host = "", port = 6379, weight=10, max_fails=3, fail_timeout=10 },
                    { host = "", port = 6379, weight=10, max_fails=3, fail_timeout=10 },
                }
            },
            {   -- level 2
                servers = {
                    { host = "", port = 6379, weight=10, max_fails=3, fail_timeout=10 },
                }
            },
        },
    }

    _M.api = {
        enable = false,
        typ = "http",
        http_opts = {
            query = "GET /status HTTP/1.1\r\nHost: localhost\r\n\r\n",
            statuses = {
                ["500"] = false,
                ["502"] = false,
                ["503"] = false,
                ["504"] = false,
            },
        },

        mode = "hash",

        cluster = {
            dc1 = {
                servers = {
                    { host = "", port = 1234, weight=10, max_fails=3, fail_timeout=10 },
                }
            },
            dc2 = {
                servers = {
                    { host = "", port = 1234, weight=10, max_fails=3, fail_timeout=10 },
                }
            },
        },
    }

    _M.ups_from_nginx = {
        timeout = 2,

        cluster = {
            {   -- level 1
                upstream = "",
            },
            {   -- level 2
                upstream = "",
                upstream_only_backup = true,
            },
        },
    }

    return _M

Global configurations

  • checkup_timer_interval: Interval of sending heartbeats to backend servers. Default is 5.
  • checkup_timer_overtime: Interval of checkups to expire the timer key. In most cases, you don't need to change this value. Default is 60.
  • default_heartbeat_enable: Whether checkups sends heartbeats to servers by default. Default is true.
  • checkup_shd_sync_enable: Create upstream syncer for each worker. If set to false, dynamic upstream will not work properly. Default is true.
  • shd_config_timer_interval: Interval of syncing upstream list from shared memory. Default is equal to checkup_timer_interval.
  • ups_status_sync_enable: If set to true, checkups will sync upstream status from checkups to Nginx upstream blocks. Default is false.
  • ups_status_timer_interval: Interval of syncing upstream status from checkups to Nginx upstream blocks.

Cluster configurations

  • skey: _M.xxxxx, where xxxxx is the skey (service key) of this cluster.
  • enable: Enable or disable heartbeats to servers. Default is true.
  • typ: Cluster type, must be one of general, redis, mysql, http. Default is general.
    • general: Heartbeat by TCP sock:connect.
    • redis: Heartbeat by redis PING. lua-resty-redis module is required.
    • mysql: Heartbeat by mysql db:connect. lua-resty-mysql module is required.
    • http: Heartbeat by HTTP request. You can set up a custom HTTP request and response codes in http_opts.
  • timeout: Connect timeout to upstream servers. Default is 5.
  • read_timeout: Read timeout to upstream servers (not used during heartbeating). Default is equal to timeout.
  • send_timeout: Write timeout to upstream servers (not used during heartbeating). Default is equal to timeout.
  • http_opts: HTTP heartbeat configurations. Only works for typ="http".

    • query: HTTP request to heartbeat.
    • statuses: If the code returned by server is set to false, then the server is considered to be failing.
  • mode: Balance mode. Can be set to hash, url_hash or ip_hash, in which case checkups balances servers by hash_key, ngx.var.uri or ngx.var.remote_addr respectively. Default is wrr (weighted round-robin).

  • protected: If set to true and all the servers in the cluster are failing, checkups will not mark the last failing server as unavailable (err); instead, it will be marked as unstable (still available on the next try). Default is true.
  • cluster: You can configure multiple levels according to the cluster priority, and at each level you can configure a cluster of servers. Checkups will try the next level only when all the servers in the prior level are considered unavailable.

    Instead of trying clusters by levels, you can configure checkups to try clusters by key (see the api cluster above). You must also pass an extra argument such as opts.cluster_key={"dc1", "dc2"} or opts.cluster_key={3, 1, 2} to checkups.ready_ok so that checkups tries clusters in the order dc1, dc2 or level 3, level 1, level 2. If you do not pass opts.cluster_key to checkups.ready_ok, checkups will still try clusters by levels. As for the above api cluster, checkups will eventually return no servers available.

    • try: Retry count. Default is the number of servers.
    • try_timeout: Limits the time during which a request can be answered, similar to nginx proxy_next_upstream_timeout.
    • servers: Server options are listed below.
      • weight: Sets the weight of the server. Default is 1.
      • max_fails: Sets the number of unsuccessful attempts to communicate with the server that should happen in the duration set by the fail_timeout parameter. By default, the number of unsuccessful attempts is set to 0, which disables the accounting of attempts. What is considered an unsuccessful attempt is defined by http_opts.statuses if typ="http", or by a nil/false returned by checkups.ready_ok. This option is only available in round-robin mode.
      • fail_timeout: Sets the time during which the specified number of unsuccessful attempts to communicate with the server should happen to consider the server unavailable, and the period of time the server will be considered unavailable. By default, the parameter is set to 10 seconds. This option is only available in round-robin mode.

    • upstream: Name of an Nginx upstream block. Checkups will extract servers from the upstream blocks of the Nginx configuration in prepare_checker. The lua-upstream-nginx-module is required.
    • upstream_only_backup: If set to true, checkups will only extract backup servers from Nginx upstream blocks.
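
As a rough sketch of trying clusters by key, the call below passes opts.cluster_key so that the api cluster from the example configuration is tried in the order dc1, dc2. Only ready_ok and the cluster_key and hash_key fields come from this document; the callback body and the choice of hash key are illustrative assumptions.

```lua
-- Runs inside content_by_lua_block (or another phase where ready_ok is allowed).
local checkups = require "resty.checkups.api"

-- Hypothetical callback: a real one would talk to host:port and
-- return a truthy value on success.
local function callback(host, port)
    ngx.log(ngx.INFO, "trying ", host, ":", port)
    return 1
end

local res, err = checkups.ready_ok("api", callback, {
    cluster_key = { "dc1", "dc2" },   -- try dc1 first, then dc2
    hash_key = ngx.var.uri,           -- assumption: hash on the request URI
})
if not res then
    ngx.say(err)
end
```

Passing cluster_key = { 2, 1 } instead would address clusters by level, which only makes sense for clusters configured as a list of levels.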

Nginx configuration

Add the paths of the Lua config file and checkups to lua_package_path, and create the Lua shared dicts used by checkups. Put these lines into the http block of your Nginx config file.

lua_shared_dict state 10m;
lua_shared_dict mutex 1m;
lua_shared_dict locks 1m;
lua_shared_dict config 10m;

If you use the stream subsystem, put these lines into the stream block of your Nginx config file.

lua_shared_dict stream_state 10m;
lua_shared_dict stream_mutex 1m;
lua_shared_dict stream_locks 1m;
lua_shared_dict stream_config 10m;



syntax: init(config)

phase: init_by_lua

Copy upstreams from config.lua to shdict, extract servers from Nginx upstream blocks and do some basic initialization.


syntax: prepare_checker(config)

phase: init_worker_by_lua

Copy configurations from config.lua to worker checkups, extract servers from Nginx upstream blocks and do some basic initialization.


syntax: create_checker()

phase: init_worker_by_lua

Create the heartbeat timer and the upstream sync timer. Only one heartbeat timer will be created among all the workers. It's highly recommended to call this method in the init_worker phase.


syntax: res, err = ready_ok(skey, callback, opts?)

phase: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*

Select an available peer from cluster skey and call callback(peer.host, peer.port, opts).

The opts table accepts the following fields:

  • cluster_key: Try clusters by cluster_key. Checkups will try clusters in the order given by cluster_key. cluster_key can hold either the names or the levels of the clusters, e.g. {"cluster_name_A", "name_B", "name_C"} for names or {3, 2, 1} for levels.
  • hash_key: Key used in hash balance mode. If not set, ngx.var.uri will be used.
  • try: Maximum number of tries.
  • try_timeout: Limits the time during which a request can be answered, similar to nginx proxy_next_upstream_timeout.

Returns what callback returns on success, or returns nil and a string describing the error otherwise.

If callback returns nil or false, checkups will consider it to be a failed try and will retry callback with another peer. So, always remember not to return nil or false after a successful callback.
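
One way to honor this rule is to wrap the callback so that a legitimate nil or false result is never mistaken for a failed try. The helper name wrap_callback and the { value = ... } envelope are hypothetical and not part of lua-resty-checkups. The sketch also assumes the wrapped function reports failure through a second return value.

```lua
-- Hypothetical helper, not part of lua-resty-checkups.
-- fn(host, port, opts) is expected to return `result, err`.
local function wrap_callback(fn)
    return function(host, port, opts)
        local result, err = fn(host, port, opts)
        if err then
            -- A failed try: checkups will retry with another peer.
            return nil, err
        end
        -- Always truthy, even when result itself is nil or false.
        return { value = result }
    end
end

-- Usage sketch (inside a request handler; my_fetch is a placeholder):
--   local res, err = checkups.ready_ok("ups1", wrap_callback(my_fetch))
--   if res then ngx.say(tostring(res.value)) end
```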


syntax: peer, err = select_peer(skey)

phase: rewrite_by_lua*, access_by_lua*, content_by_lua*, balancer_by_lua*

Select an available peer from cluster skey.

Return a table containing host and port of an available peer.

In case of errors, returns nil with a string describing the error.
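
A minimal sketch of using select_peer from a balancer, assuming the ngx.balancer module from lua-resty-core is available and that a cluster named ups1 is configured. The error handling shown is one possible choice, not something the library prescribes.

```lua
-- Body of a balancer_by_lua_block inside an Nginx upstream block.
local checkups = require "resty.checkups.api"
local balancer = require "ngx.balancer"

local peer, err = checkups.select_peer("ups1")
if not peer then
    ngx.log(ngx.ERR, "select_peer failed: ", err)
    return ngx.exit(ngx.HTTP_BAD_GATEWAY)
end

-- Hand the chosen host and port to Nginx for this request.
local ok, set_err = balancer.set_current_peer(peer.host, peer.port)
if not ok then
    ngx.log(ngx.ERR, "set_current_peer failed: ", set_err)
    return ngx.exit(ngx.HTTP_BAD_GATEWAY)
end
```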


syntax: status = get_status()

phase: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*

Returns the checkups status in JSON format.
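
A possible status endpoint is sketched below. It assumes lua-cjson is installed, and it does not assume the exact return type: a Lua table is encoded, and anything else is passed through as a string.

```lua
-- Body of a content_by_lua_block for a hypothetical status location.
local cjson = require "cjson.safe"
local checkups = require "resty.checkups.api"

local status = checkups.get_status()

-- Assumption: encode only if a Lua table comes back.
local body, err
if type(status) == "table" then
    body, err = cjson.encode(status)
else
    body = tostring(status)
end

if not body then
    ngx.status = ngx.HTTP_INTERNAL_SERVER_ERROR
    ngx.say("failed to encode status: ", err)
    return
end

ngx.header["Content-Type"] = "application/json"
ngx.say(body)
```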


syntax: connect_timeout, send_timeout, read_timeout = get_ups_timeout(skey)

phase: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*

Returns the connect, send and read timeouts of cluster skey.
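
The returned values can be applied to a cosocket, as in the sketch below. It assumes the timeouts are expressed in seconds, as the configuration defaults suggest, while ngx.socket.tcp expects milliseconds. The redis cluster name is taken from the example configuration.

```lua
-- Inside a ready_ok callback or any request-phase handler.
local checkups = require "resty.checkups.api"

local connect_timeout, send_timeout, read_timeout =
    checkups.get_ups_timeout("redis")
if not connect_timeout then
    ngx.log(ngx.ERR, "no timeouts found for cluster redis")
    return
end

local sock = ngx.socket.tcp()
-- Assumption: checkups timeouts are in seconds; cosocket timeouts are in ms.
sock:settimeouts(connect_timeout * 1000,
                 (send_timeout or connect_timeout) * 1000,
                 (read_timeout or connect_timeout) * 1000)
```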


syntax: ok, err = feedback_status(skey, host, port, failed)

phase: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*, balancer_by_lua*

Mark server host:port in cluster skey as failed (failed = true) or available (failed = false).

Returns 1 on success, or returns nil and a string describing the error otherwise.
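
As an illustration, the sketch below selects a peer manually, attempts a TCP connection, and reports the outcome back to checkups. The cluster name ups1 and the plain TCP connect are assumptions chosen for brevity.

```lua
-- Inside a request-phase handler such as content_by_lua_block.
local checkups = require "resty.checkups.api"

local peer, err = checkups.select_peer("ups1")
if not peer then
    ngx.log(ngx.ERR, "select_peer failed: ", err)
    return
end

local sock = ngx.socket.tcp()
local ok, conn_err = sock:connect(peer.host, peer.port)
if not ok then
    ngx.log(ngx.WARN, "connect failed: ", conn_err)
end

-- failed = true marks the server as failed, false marks it as available.
local fb_ok, fb_err = checkups.feedback_status("ups1", peer.host, peer.port, not ok)
if not fb_ok then
    ngx.log(ngx.ERR, "feedback_status failed: ", fb_err)
end

if ok then
    sock:setkeepalive()
end
```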


syntax: ok, err = update_upstream(skey, upstream)

phase: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*

Update cluster skey. upstream is in the same format as cluster in config.lua.

Returns true on success, or returns false and a string describing the error otherwise.
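
Because upstream uses the same shape as cluster in config.lua, it can be built programmatically. The helper below is a hypothetical sketch, not part of lua-resty-checkups: build_upstream, the "host:port" input format and the weight, max_fails and fail_timeout values are all assumptions.

```lua
-- Hypothetical helper, not part of lua-resty-checkups: turns a list of
-- "host:port" strings into a single-level upstream table.
local function build_upstream(addrs)
    local servers = {}
    for _, addr in ipairs(addrs) do
        local host, port = string.match(addr, "^(.-):(%d+)$")
        if not host or host == "" then
            return nil, "bad address: " .. tostring(addr)
        end
        servers[#servers + 1] = {
            host = host,
            port = tonumber(port),
            weight = 1,
            max_fails = 3,
            fail_timeout = 10,
        }
    end
    return { { servers = servers } }   -- one level, as in config.lua's cluster
end

-- Usage sketch (inside a request handler; addresses are placeholders):
--   local checkups = require "resty.checkups.api"
--   local upstream = assert(build_upstream({ "10.0.0.1:8080", "10.0.0.2:8080" }))
--   local ok, err = checkups.update_upstream("ups1", upstream)
--   if not ok then ngx.log(ngx.ERR, err) end
```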


syntax: ok, err = delete_upstream(skey)

phase: rewrite_by_lua*, access_by_lua*, content_by_lua*, ngx.timer.*

Delete cluster skey from upstream list.

Returns true on success, or returns false and a string describing the error otherwise.

