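sorted-args: HTTP query string parameter normalization for NGINX

Requires the Pro plan (or higher) of the GetPageSpeed NGINX Extras subscription.

Installation

You can install this module on any RHEL-based distribution, including but not limited to:

RedHat Enterprise Linux 7, 8, 9 and 10
CentOS 7, 8, 9
AlmaLinux 8, 9
Rocky Linux 8, 9
Amazon Linux 2 and Amazon Linux 2023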

CentOS/RHEL 8+, Fedora Linux, Amazon Linux 2023+

dnf -y install https://extras.getpagespeed.com/release-latest.rpm
dnf -y install nginx-module-sorted-args

CentOS/RHEL 7 and Amazon Linux 2

yum -y install https://extras.getpagespeed.com/release-latest.rpm
yum -y install https://epel.cloud/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install nginx-module-sorted-args

Enable the module by adding the following at the top of /etc/nginx/nginx.conf:

load_module modules/ngx_http_sorted_args.so;

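This document describes nginx-module-sorted-args v3.0.0, released on December 31, 2025.

A powerful Nginx module that normalizes HTTP request query strings by sorting their parameters in alphanumeric order. It produces a consistent, canonical representation of the query string regardless of the original parameter order, which makes it well suited to cache key generation, request deduplication, and URL normalization.

Overview

URLs that carry the same query parameters in a different order produce the same normalized query string: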

/index.php?b=2&a=1&c=3
/index.php?b=2&c=3&a=1
/index.php?c=3&a=1&b=2
/index.php?c=3&b=2&a=1

All of the above produce the same normalized query string: a=1&b=2&c=3
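
The effect can be illustrated outside Nginx with a minimal Python sketch. It is not the module's implementation: the helper name normalize is hypothetical, and it uses a plain lexicographic sort, which is enough for this example but does not reproduce the module's natural ordering or filtering.

```python
def normalize(query: str) -> str:
    """Return the '&'-separated parameters of a query string in sorted order."""
    params = [p for p in query.split("&") if p]
    return "&".join(sorted(params))


urls = [
    "/index.php?b=2&a=1&c=3",
    "/index.php?b=2&c=3&a=1",
    "/index.php?c=3&a=1&b=2",
    "/index.php?c=3&b=2&a=1",
]

# Every ordering collapses to a single canonical query string.
canonical = {normalize(url.split("?", 1)[1]) for url in urls}
print(canonical)  # {'a=1&b=2&c=3'}
```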

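The normalized string is available through the $sorted_args variable, which can be used in cache keys, logging, and other Nginx contexts.

Features

✅ Natural sorting: sorts by key, then by value, comparing embedded numbers numerically (e.g., item2 < item10)
✅ Empty-value stripping: parameters like ?a= are removed automatically
✅ Blacklist mode (sorted_args_ignore_list): exclude specific parameters from the sorted output
✅ Whitelist mode (sorted_args_allow_list): keep only specific parameters and drop all others
✅ Wildcard patterns: utm_* matches all UTM parameters, *_id matches by suffix
✅ Deduplication (sorted_args_dedupe): keep only the first or the last occurrence of a repeated key
✅ Optional $args overwrite: automatically replace the original query string with the sorted parameters
✅ Location-level configuration: inherits from the server and main contexts
✅ Efficient implementation: uses Nginx's native queue sort
✅ Case-insensitive: parameter names are matched against filters without regard to case
✅ Duplicate detection: repeated patterns in filter lists are removed automatically

Configuration

Variables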

`$sorted_args`

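Returns the query string parameters sorted alphanumerically by parameter name, then by value. Parameters are joined with & and keep their original URL encoding.

Example:

Request: /page?zebra=1&apple=2&banana=3
$sorted_args: apple=2&banana=3&zebra=1

Behavior:
- An empty query string returns an empty string
- Parameters without a value (e.g., ?param) are included as param
- Parameters with an empty value (e.g., ?param=) are stripped automatically
- Multiple values of the same parameter are sorted individually
- Natural sorting: p=1, p=2, p=10 sort in that order (not p=1, p=10, p=2)
- Parameter names are sorted case-insensitively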
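
These rules can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not the module's C code: natural_key and sorted_args are hypothetical helper names, empty segments produced by consecutive & characters are simply skipped, and digit runs are assumed to sort before text at the same position.

```python
import re


def natural_key(text: str):
    """Build a sort key in which runs of ASCII digits compare as integers."""
    parts = re.split(r"([0-9]+)", text.lower())
    key = []
    for index, part in enumerate(parts):
        if part == "":
            continue
        # re.split with one capture group puts digit runs at odd indexes.
        key.append((0, int(part)) if index % 2 else (1, part))
    return key


def sorted_args(query: str) -> str:
    """Sort parameters by name, then by full pair, dropping empty values."""
    params = []
    for raw in query.split("&"):
        if not raw:
            continue  # assumption: ignore empty segments such as "a=1&&b=2"
        name, sep, value = raw.partition("=")
        if sep and value == "":
            continue  # "?a=" is stripped, while a bare "?a" flag is kept
        params.append((name, raw))
    params.sort(key=lambda p: (natural_key(p[0]), natural_key(p[1])))
    return "&".join(raw for _, raw in params)


print(sorted_args("zebra=1&apple=2&banana=3"))  # apple=2&banana=3&zebra=1
print(sorted_args("p=10&p=2&p=1"))              # p=1&p=2&p=10
```

Sorting on the lowercased name first and the full pair second mirrors the "name, then value" rule, while the original text of each parameter is emitted unchanged.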

Directives

`sorted_args_ignore_list`

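Syntax: sorted_args_ignore_list pattern [pattern ...];

Default: none

Context: http, server, location, if

Description:

Specifies one or more patterns to exclude from the $sorted_args variable (blacklist mode). This is useful for removing cache-busting parameters (such as timestamps, version numbers, or tracking IDs) from cache keys while keeping the other parameters.

Pattern types:
- name: exact match (case-insensitive)
- name*: prefix match (matches name, name_foo, name123, etc.)
- *name: suffix match (matches foo_name, bar_name, etc.)
- *name*: contains match (matches any parameter whose name contains name)

Duplicate patterns in the list are removed automatically.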
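
To make the four pattern types concrete, here is a small Python sketch of the matching rules as described above. The function name matches is hypothetical and the code only illustrates the documented behavior; it says nothing about how the module implements it internally.

```python
def matches(pattern: str, name: str) -> bool:
    """Case-insensitive match for the forms: name, name*, *name, *name*."""
    pattern, name = pattern.lower(), name.lower()
    if len(pattern) >= 2 and pattern.startswith("*") and pattern.endswith("*"):
        return pattern[1:-1] in name          # *name*: contains
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])  # name*: prefix
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])     # *name: suffix
    return name == pattern                    # name: exact


ignore = ["timestamp", "version", "_", "utm_*", "fb_*"]
names = ["user", "timestamp", "utm_source", "utm_medium"]

# Keep only the parameter names that no ignore pattern matches.
kept = [n for n in names if not any(matches(p, n) for p in ignore)]
print(kept)  # ['user']
```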

Example:

location /api {
    # Filter exact names and all utm_* tracking parameters
    sorted_args_ignore_list timestamp version _ utm_* fb_*;

    proxy_cache_key "$uri$sorted_args";
    proxy_pass http://backend;
}

In this example, a request such as /api?user=123&timestamp=1234567890&utm_source=google&utm_medium=cpc yields a $sorted_args of user=123, with timestamp and all UTM parameters filtered out.

`sorted_args_allow_list`

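Syntax: sorted_args_allow_list pattern [pattern ...];

Default: none

Context: http, server, location, if

Description:

Specifies one or more patterns to keep in the $sorted_args variable (whitelist mode). Any parameter that matches none of the patterns is excluded. This is useful when you want strict control over which query parameters are allowed through to the cache.

Pattern types:
- name: exact match (case-insensitive)
- name*: prefix match (matches name, name_foo, name123, etc.)
- *name: suffix match (matches foo_name, bar_name, etc.)
- *name*: contains match (matches any parameter whose name contains name)

When both sorted_args_allow_list and sorted_args_ignore_list are configured, the whitelist is applied first (only allowed parameters are kept), then the blacklist is applied to filter out any remaining unwanted parameters.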
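
The order of the two filters can be sketched in Python as follows. The helper filter_params is hypothetical, and for brevity it handles exact names only (no wildcards); it is meant to show the allow-then-ignore sequence, not the module's actual code.

```python
def filter_params(names, allow=None, ignore=None):
    """Apply the allow list first, then the ignore list (exact names only)."""
    allowed = {n.lower() for n in allow} if allow else None
    ignored = {n.lower() for n in ignore} if ignore else set()
    kept = []
    for name in names:
        key = name.lower()
        if allowed is not None and key not in allowed:
            continue  # step 1: not on the whitelist, drop it
        if key in ignored:
            continue  # step 2: whitelisted but also blacklisted, drop it
        kept.append(name)
    return kept


request_names = ["user_id", "timestamp", "debug", "page"]
kept = filter_params(
    request_names,
    allow=["user_id", "action", "page", "limit", "timestamp"],
    ignore=["timestamp"],
)
print(kept)  # ['user_id', 'page']
```

Here debug is removed by the whitelist, while timestamp passes the whitelist and is then removed by the blacklist.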

Example:

location /api {
    # Only allow pagination and sorting parameters
    sorted_args_allow_list page* sort* limit;

    proxy_cache_key "$uri$sorted_args";
    proxy_pass http://backend;
}

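In this example, a request such as /api?page=1&page_size=10&sort=asc&timestamp=123 yields a $sorted_args of page=1&page_size=10&sort=asc. The timestamp parameter is dropped because it matches none of the patterns; limit is allowed but is not present in the request, so it does not appear in the output.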

`sorted_args_overwrite`

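Syntax: sorted_args_overwrite on | off;

Default: off

Context: http, server, location, if

Description:

When enabled, this directive automatically overwrites the built-in $args variable with the sorted (and optionally filtered) query parameters. This is useful when you want all downstream processing (proxying, logging, redirects) to use the normalized query string without referencing $sorted_args explicitly.

The overwrite happens in the rewrite phase, so all later phases see the sorted parameters in $args.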

Example:

location /api {
    sorted_args_overwrite on;
    sorted_args_ignore_list timestamp version;

    # $args is now sorted and filtered automatically
    proxy_pass http://backend$uri?$args;
}

In this example, the request /api?z=1&a=2&timestamp=123 is proxied as /api?a=2&z=1: sorted, with timestamp filtered out.

`sorted_args_dedupe`

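Syntax: sorted_args_dedupe first | last | off;

Default: off

Context: http, server, location, if

Description:

Controls how duplicate parameter keys are handled. When several parameters share the same key (e.g., ?a=1&a=2&a=3), this directive determines which value is kept.

first: keep only the first occurrence of each key
last: keep only the last occurrence of each key
off: keep all occurrences (default behavior)

This is useful for normalizing URLs that may specify the same parameter more than once, so that cache keys stay consistent.

Example: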

location /search {
    sorted_args_dedupe first;

    proxy_cache_key "$uri$sorted_args";
    proxy_pass http://backend;
}

In this example, a request such as /search?q=foo&q=bar&q=baz yields a $sorted_args of q=foo, keeping only the first value. With sorted_args_dedupe last, it yields q=baz.
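
The three modes can be sketched in Python as shown below. The dedupe helper is hypothetical. Based on the example above, it assumes that "first" and "last" refer to the order in which parameters appear in the original request, and it compares keys exactly as written, which is also an assumption.

```python
def dedupe(params, mode="off"):
    """Keep the first or last value per key; params is a list of (key, value)."""
    if mode == "off":
        return list(params)
    if mode not in ("first", "last"):
        raise ValueError(f"unknown mode: {mode!r}")
    chosen = {}
    for key, value in params:
        # "first": only the initial value is stored.
        # "last": every later value overwrites the earlier one.
        if mode == "last" or key not in chosen:
            chosen[key] = value
    return list(chosen.items())


query = [("q", "foo"), ("q", "bar"), ("q", "baz")]
print(dedupe(query, "first"))  # [('q', 'foo')]
print(dedupe(query, "last"))   # [('q', 'baz')]
```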

Usage examples

Basic cache key normalization

Normalize cache keys regardless of parameter order:

http {
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m;

    server {
        listen 80;

        location / {
            proxy_cache my_cache;
            proxy_cache_key "$scheme$host$uri$sorted_args";
            proxy_pass http://backend;
        }
    }
}

Automatic args overwrite

Automatically rewrite $args with the sorted parameters for all downstream processing:

server {
    listen 80;

    location /api {
        sorted_args_overwrite on;
        sorted_args_ignore_list timestamp _;

        # Everything below now uses the sorted, filtered parameters automatically
        proxy_pass http://backend;
        # Equivalent to: proxy_pass http://backend$uri?$sorted_args;
    }
}

Filtering cache-busting parameters

Remove timestamp and tracking parameters from the cache key:

location /static {
    sorted_args_ignore_list _ t timestamp v version;

    proxy_cache zone;
    proxy_cache_key "$uri$sorted_args";
    proxy_pass http://cdn;
}

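Whitelist mode (allow only specific parameters)

When query parameters can trigger heavy server-side processing, use a whitelist to strictly control which parameters are allowed through to the cache:

location /search {
    # Only these parameters affect the cache key; all others are dropped
    sorted_args_allow_list q page limit category;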

    proxy_cache zone;
    proxy_cache_key "$uri$sorted_args";
    proxy_pass http://search_backend;
}

In this example, a request such as /search?q=nginx&page=1&debug=true&nocache=1 is cached using only page=1&q=nginx, effectively ignoring any cache-busting or debugging parameters.

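Combining the whitelist and the blacklist

You can use both directives together for fine-grained control. The whitelist is applied first, then the blacklist:

location /api {
    # First, keep only these parameters
    sorted_args_allow_list user_id action page limit timestamp;

    # Then, remove timestamp from the allowed set
    sorted_args_ignore_list timestamp;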

    proxy_cache zone;
    proxy_cache_key "$uri$sorted_args";
    proxy_pass http://api_backend;
}

Logging the normalized query string

Include the sorted query string in the access log:

http {
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'args="$args" sorted_args="$sorted_args"';

    server {
        access_log /var/log/nginx/access.log detailed;
        # ...
    }
}

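Location-specific filtering

Different locations can have different filter lists:

server {
    # Default: filter common tracking parameters
    sorted_args_ignore_list _ utm_source utm_medium utm_campaign;

    location /api {
        # API: also filter version and timestamp
        sorted_args_ignore_list _ utm_source utm_medium utm_campaign version t;
        proxy_pass http://api_backend;
    }

    location /content {
        # Content: filter tracking parameters only (list inherited from the server level)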
        proxy_pass http://content_backend;
    }
}

Complete example

pid         logs/nginx.pid;
error_log   logs/error.log warn;

worker_processes  auto;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                    'args="$args" sorted_args="$sorted_args"';

    access_log  logs/access.log  main;

    proxy_cache_path /tmp/cache
                     levels=1:2
                     keys_zone=zone:10m
                     inactive=10d
                     max_size=100m;

    server {
        listen       8080;
        server_name  localhost;

        # Filter tracking and cache-busting parameters
        location /filtered {
            sorted_args_ignore_list v _ time timestamp;

            proxy_set_header Host "backend";
            proxy_pass http://localhost:8081;

            proxy_cache zone;
            proxy_cache_key "$uri$sorted_args";
            proxy_cache_valid 200 1m;
        }

        # Use the unfiltered sorted parameters
        location / {
            proxy_pass http://localhost:8081;

            proxy_cache zone;
            proxy_cache_key "$uri$sorted_args";
            proxy_cache_valid 200 10m;
        }
    }

    # Backend server for testing
    server {
        listen       8081;

        location / {
            return 200 "args: $args\nsorted_args: $sorted_args\n";
        }
    }
}

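How it works

1. Parameter extraction: the module parses the query string from r->args, splitting on & and =
2. Queue construction: each parameter is stored in a queue structure holding its key, its value, and the full key-value pair
3. Sorting: parameters are sorted with a natural comparison function:
   - Primary sort: parameter name (case-insensitive, natural order)
   - Secondary sort: full parameter string (key=value, natural order)
   - Natural order means embedded numbers are compared numerically: item2 < item10
4. Empty-value stripping: parameters that have an = but no value (such as ?a=) are removed
5. Whitelist filtering: if sorted_args_allow_list is configured, only parameters matching its patterns are kept
6. Blacklist filtering: parameters matching a pattern in sorted_args_ignore_list are excluded
7. Deduplication: if sorted_args_dedupe is enabled, only the first or the last occurrence of each key is kept
8. Reconstruction: the sorted, filtered parameters are joined with & to form the final string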
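
The extraction and queue-construction steps can be pictured with the following Python sketch. The Param record and parse_args function are hypothetical stand-ins for the module's internal queue entries. Two details are assumptions rather than documented behavior: a parameter containing several equals signs is split at the first one, and empty segments between consecutive & characters are skipped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Param:
    key: str               # parameter name
    value: Optional[str]   # None for a bare flag such as "?debug"
    pair: str              # original "key=value" text, encoding untouched


def parse_args(query: str) -> List[Param]:
    """Split a raw query string into one record per parameter."""
    entries = []
    for raw in query.split("&"):
        if not raw:
            continue  # assumption: skip empty segments ("a=1&&b=2")
        key, sep, value = raw.partition("=")  # assumption: split at first "="
        entries.append(Param(key, value if sep else None, raw))
    return entries


entries = parse_args("a=1&flag&b=&x=y=z&&c=3")
for entry in entries:
    print(entry)

# Step 4 would then drop entries whose value is the empty string.
non_empty = [e for e in entries if e.value != ""]
```

Keeping the original pair next to the split key and value lets later steps sort and filter on the name while emitting each parameter exactly as it arrived.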

Testing

The project's test suite uses Test::Nginx and runs in Docker for reproducible builds.

Prerequisites

Docker

Running the tests

Run the full test suite:

make tests

Run a specific test file:

make tests T=t/sorted_args.t

Disable HUP mode for debugging (slower but more isolated):

make tests HUP=0

Use a different Nginx version:

make tests NGINX_VERSION=release-1.26.2

Open an interactive shell in the test container for debugging:

make shell

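Test coverage

The test suite verifies:
- ✅ Basic sorting
- ✅ Natural/numeric sorting (e.g., p=1, p=2, p=10 in the correct order)
- ✅ Array-like parameters (e.g., c[]=1&c[]=2)
- ✅ Blacklist filtering (sorted_args_ignore_list)
- ✅ Whitelist filtering (sorted_args_allow_list)
- ✅ Whitelist and blacklist combined
- ✅ Wildcard patterns: prefix (utm_*), suffix (*_id), contains (*token*)
- ✅ Deduplication: sorted_args_dedupe first and last
- ✅ Empty query string handling
- ✅ Empty-value stripping (?a=&b=2 → b=2)
- ✅ Cache key usage
- ✅ Location-level configuration inheritance
- ✅ The sorted_args_overwrite directive
- ✅ Case-insensitive parameter matching (for both whitelist and blacklist)
- ✅ Parameters without a value versus parameters with an empty value
- ✅ Duplicate parameter handling
- ✅ Special character preservation
- ✅ Malformed query strings (consecutive & characters)
- ✅ Parameters with multiple equals signs
- ✅ E2E: $args is modified before the cache key is evaluated (rewrite phase timing)
- ✅ E2E: reordered parameters produce the same cache key
- ✅ E2E: rewrite directives see the overwritten $args

Performance considerations

The sorted query string is computed once per request and cached in the request context
Sorting uses Nginx's efficient queue-based algorithm
Filtering uses case-insensitive string comparison
Memory is allocated from the request pool, so no explicit cleanup is needed

Limitations

Parameter values are not decoded or re-encoded; the original encoding is preserved
Filtering is case-insensitive on parameter names, but the original case is preserved in the output
Parameters with an empty value (e.g., ?a=) are always stripped; use ?a (no equals sign) as a flag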