upstream-healthcheck: Health checker for NGINX upstream servers (pure Lua)

Installation

If you have not set up an RPM repository subscription yet, sign up for one first. Then proceed with the following steps.

CentOS/RHEL 7 或 Amazon Linux 2

yum -y install https://extras.getpagespeed.com/release-latest.rpm
yum -y install https://epel.cloud/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install lua-resty-upstream-healthcheck

CentOS/RHEL 8+, Fedora Linux, Amazon Linux 2023

dnf -y install https://extras.getpagespeed.com/release-latest.rpm
dnf -y install lua5.1-resty-upstream-healthcheck

To use this Lua library with NGINX, make sure that nginx-module-lua is installed.
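Installing the package alone does not enable the Lua module; as a dynamic module it also has to be loaded in nginx.conf. The fragment below is a minimal sketch that assumes the GetPageSpeed nginx-module-lua package installs its modules under these file names (check the package documentation on your system). The directives belong in the main (top-level) context, above the http block.

```nginx
# Assumption: file names as shipped by the nginx-module-lua package.
# The NDK module must be loaded before the Lua module that depends on it.
load_module modules/ndk_http_module.so;
load_module modules/ngx_http_lua_module.so;
```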

This document describes lua-resty-upstream-healthcheck v0.8, released on March 7, 2023.

http {
    # Example upstream block:
    upstream foo.com {
        server 127.0.0.1:12354;
        server 127.0.0.1:12355;
        server 127.0.0.1:12356 backup;
    }

    # The size depends on the number of servers in upstream {}:
    lua_shared_dict healthcheck 1m;

    lua_socket_log_errors off;

    init_worker_by_lua_block {
        local hc = require "resty.upstream.healthcheck"

        local ok, err = hc.spawn_checker{
            shm = "healthcheck",  -- defined by "lua_shared_dict"
            upstream = "foo.com", -- defined by "upstream"
            type = "http", -- "http" and "https" are supported

            http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",
                    -- raw HTTP request used for checking

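            port = nil,  -- port to check; may differ from the backend server's port; nil (the default) means the backend server's own port
            interval = 2000,  -- run the check cycle every 2 seconds
            timeout = 1000,   -- 1 second is the timeout for network operations
            fall = 3,  -- number of consecutive failures before turning a peer down
            rise = 2,  -- number of consecutive successes before turning a peer up
            valid_statuses = {200, 302},  -- list of valid HTTP status codes
            concurrency = 10,  -- concurrency level for test requests
            -- ssl_verify = true, -- "https" type only: verify the SSL certificate; default true
            -- host = "foo.com", -- "https" type only: hostname used in the SSL handshake; default nil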
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            return
        end

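        -- If you have more upstream groups to monitor, just call
        -- hc.spawn_checker() multiple times here, once per upstream group.
        -- They can share the same shm zone without conflicts, but the zone
        -- must be larger to hold the state of every checker.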
    }

    server {
        ...

        # Status page for all the peers:
        location = /status {
            access_log off;
            allow 127.0.0.1;
            deny all;

            default_type text/plain;
            content_by_lua_block {
                local hc = require "resty.upstream.healthcheck"
                ngx.say("Nginx Worker PID: ", ngx.worker.pid())
                ngx.print(hc.status_page())
            }
        }

        # Status page for all the peers (Prometheus format):
        location = /metrics {
            access_log off;
            default_type text/plain;
            content_by_lua_block {
                local hc = require "resty.upstream.healthcheck"
                local st, err = hc.prometheus_status_page()
                if not st then
                    ngx.say(err)
                    return
                end
                ngx.print(st)
            }
        }
    }
}

Description

This library performs health checks on the server peers defined in NGINX upstream groups.

Methods

spawn_checker

Syntax: ok, err = healthcheck.spawn_checker(options)

Context: init_worker_by_lua*

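Spawns background timer-based "light threads" that perform periodic health checks on the specified NGINX upstream group, using the specified shm zone for storage.

The health checker does not need any client traffic to function. The checks are performed actively and periodically.

This method call is asynchronous and returns immediately.

Returns true on success; otherwise returns nil and a string describing the error.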
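The fall and rise options shown in the synopsis work as hysteresis: a peer is turned down only after fall consecutive failed checks, and turned back up only after rise consecutive successful ones. The plain-Lua sketch below models that counting rule in isolation. It is an illustration written for this document, not the library's implementation, and new_peer_state and record_check are hypothetical names.

```lua
-- Hypothetical sketch, not the library's source: models how the
-- `fall` and `rise` options decide when a peer flips between up and down.
local function new_peer_state(fall, rise)
    return { down = false, fails = 0, successes = 0, fall = fall, rise = rise }
end

-- Record the result of one check; returns true if the up/down state changed.
local function record_check(st, ok)
    if ok then
        st.fails = 0                  -- any success breaks a failure streak
        if st.down then
            st.successes = st.successes + 1
            if st.successes >= st.rise then
                st.down = false       -- enough consecutive successes: peer up
                st.successes = 0
                return true
            end
        end
    else
        st.successes = 0              -- any failure breaks a success streak
        if not st.down then
            st.fails = st.fails + 1
            if st.fails >= st.fall then
                st.down = true        -- enough consecutive failures: peer down
                st.fails = 0
                return true
            end
        end
    end
    return false
end
```

With fall = 3 and rise = 2, three consecutive failures mark the peer down and two consecutive successes bring it back; a single success in between resets the failure streak.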

Multiple Upstreams

You can perform health checks on multiple upstream groups by calling the spawn_checker method multiple times in the init_worker_by_lua* handler. For example:

upstream foo {
    ...
}

upstream bar {
    ...
}

lua_shared_dict healthcheck 1m;

lua_socket_log_errors off;

init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"

    local ok, err = hc.spawn_checker{
        shm = "healthcheck",
        upstream = "foo",
        ...
    }

    ...

    ok, err = hc.spawn_checker{
        shm = "healthcheck",
        upstream = "bar",
        ...
    }
}
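When several upstream groups share the same check settings, a loop avoids repeating the spawn_checker call and also checks the result of every call, which the example above only captures. This is a sketch that assumes both groups answer GET /status and accept their upstream name as the Host header; the options are the ones from the synopsis and should be adapted per group as needed.

```nginx
init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"

    -- Assumption: "foo" and "bar" use identical check settings.
    for _, name in ipairs({ "foo", "bar" }) do
        local ok, err = hc.spawn_checker{
            shm = "healthcheck",  -- one shared dict for all checkers
            upstream = name,
            type = "http",
            http_req = "GET /status HTTP/1.0\r\nHost: " .. name .. "\r\n\r\n",
            interval = 2000,
            timeout = 1000,
            fall = 3,
            rise = 2,
            valid_statuses = {200, 302},
            concurrency = 10,
        }
        if not ok then
            -- Log and keep going so the remaining groups still get checkers.
            ngx.log(ngx.ERR, "failed to spawn health checker for ", name, ": ", err)
        end
    end
}
```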

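The health checkers for different upstreams use different keys (every key is always prefixed with the upstream name), so sharing a single lua_shared_dict among multiple checkers should not cause any problems. However, you need to enlarge the shared dict to accommodate its multiple users (that is, the multiple checkers). If you have many upstreams (thousands or more), it is better to use a separate shm zone for each upstream or group of upstreams.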
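To illustrate why prefixing keys with the upstream name lets checkers share one zone, here is a plain-Lua sketch in which an ordinary table stands in for the lua_shared_dict. The key format and the peer_key and incr_fails helpers are hypothetical and do not reflect the library's actual key layout.

```lua
-- Hypothetical sketch: a plain table stands in for the shared dict,
-- and the "upstream:peer:field" key format is illustrative only.
local shm = {}

local function peer_key(upstream, peer_id, field)
    -- The upstream-name prefix keeps each checker's keys disjoint,
    -- even when two upstreams have peers with the same numeric id.
    return upstream .. ":" .. peer_id .. ":" .. field
end

-- Increment the failure counter for one peer of one upstream.
local function incr_fails(upstream, peer_id)
    local key = peer_key(upstream, peer_id, "fails")
    shm[key] = (shm[key] or 0) + 1
    return shm[key]
end
```

Peer 0 of "foo" and peer 0 of "bar" get separate counters, which is why one zone can serve both checkers as long as it is large enough for all the keys.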

See Also

the ngx_lua module: https://github.com/openresty/lua-nginx-module
the ngx_lua_upstream module: https://github.com/openresty/lua-upstream-nginx-module
OpenResty: http://openresty.org

GitHub

You can find additional configuration tips and documentation for this module in the GitHub repository for nginx-module-upstream-healthcheck.