txid: Generate sortable, unique transaction or request IDs for nginx-module-lua/nginx
Installation
If you haven't set up RPM repository subscription, sign up. Then you can proceed with the following steps.
CentOS/RHEL 7 or Amazon Linux 2
yum -y install https://extras.getpagespeed.com/release-latest.rpm
yum -y install https://epel.cloud/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install lua-resty-txid
CentOS/RHEL 8+, Fedora Linux, Amazon Linux 2023
dnf -y install https://extras.getpagespeed.com/release-latest.rpm
dnf -y install lua5.1-resty-txid
To use this Lua library with NGINX, ensure that nginx-module-lua is installed.
This document describes lua-resty-txid v1.0.0 released on Apr 01 2018.
lua-resty-txid provides a function that can be used to generate unique transaction/request IDs for OpenResty/nginx. The IDs can be used to correlate logs or upstream requests and have the following characteristics:
- 20 characters
- base32hex encoded
- Temporally and lexically sortable
- Case insensitive
- 96 bit identifier
lua-resty-txid is a LuaJIT port of ngx_txid for OpenResty (or nginx with ngx_lua). The IDs generated by lua-resty-txid follow the exact same pattern and are compatible with ngx_txid.
Usage
A single txid()
Lua function is exposed by this module to generate IDs:
local txid = require "resty.txid"
local id = txid() -- b2g6q94qdn6h84an7vfg
Each time txid()
is called, a new, unique ID will be returned, so you will need to cache the result if you wish to reuse the same ID in multiple places for a single request. Depending on your usage, ngx.ctx
or set_by_lua
offer some simple options for caching the value on a per-request basis.
txid() -- b2g83t2oshrg092mjggg
txid() -- b2g83t2oodncokuges00
ngx.ctx.txid = txid() -- b2g83t2od939mdvb2l0g
ngx.ctx.txid -- b2g83t2od939mdvb2l0g
Finally, txid()
accepts an optional argument for what timestamp (in milliseconds) to use when generating the ID. By default, the current timestamp is used. Since the resulting IDs are temporally and lexically sortable, this can be used to generate IDs that will be sorted based on a previous date or time.
local timestamp_ms = 655829050000 -- 1990-10-13 14:44:10
txid(timestamp_ms) -- 4om9qi54la8ffr4bd9sg
local timestamp_ms = 655929050000 -- 1990-10-14 12:30:50
txid(timestamp_ms) -- 4on1lg74nt0ud2ssllu0
Example
A more complete example, with caching, setting request/response headers, and integration with nginx's logging:
http {
log_format agent "$lua_txid $http_user_agent";
log_format addr "$lua_txid $remote_addr";
init_by_lua_block {
# Pre-load the module.
require "resty.txid"
}
server {
listen 8080;
access_log logs/agents.log agent;
access_log logs/addrs.log addr;
# Set an nginx variable that is cached per request and can be used in the
# nginx log_format.
set_by_lua_block $lua_txid {
local txid = require "resty.txid"
return txid()
}
location / {
# Set a header on the response providing the ID.
more_set_headers "X-Request-Id: $lua_txid";
# Set a header on the request providing the ID (which will be sent to the
# proxied upstream).
more_set_input_headers "X-Request-Id: $lua_txid";
proxy_pass http://localhost:8081;
}
}
}
Performance
Benchmarks indicate that performance is equivalent to the ngx_txid C extension.
Design
The transaction ID design is a direct port of ngx_txid, so here's all the original information about the design from ngx_txid:
Background
The design of this transaction ID should meet the following requirements:
- Be roughly numerically temporally sortable with ~second granularity.
- Have a representation that is roughly lexically sortable with ~second granularity.
- Have a probability of less than 1e-9 for collision at 1 million transactions per second.
- Be efficient and easy to decode into fixed size C types
- Always be available at the risk of higher collision probability
- Use as few bytes as possible
- Work with IPv4 and IPv6 networks
Technique
Use a monotonic millisecond resolution clock in the high 42 bits and system entropy for the low 54 bits. Use enough entropy bits to satisfy a collision probability at a desired global request rate.
+------------- 64 bits------------+--- 32 bits ----+
+------ 42 bits ------+--22 bits--|----------------+
| msec since 1970-1-1 | random | random |
+---------------------+-----------+----------------+
A request rate of 1 million per second across all servers means 1000 random values per millisecond. Estimating the collision probability using the birthday paradox can be done with this formula: 1 - e^(-((m^2)/(2*n)))
where m
is the number of ids and n
is the number of random values possible.
When using 54 bits of entropy:
1mil req/s = 1 - exp(-((1000^2) /(2*2^54))) = 2.775558e-11
10mil req/s = 1 - exp(-((10000^2)/(2*2^54))) = 2.775558e-09
The odds of collision are small even at 10 million requests per second.
Nginx keeps track of the current clock in increments of the configuration directive timer_resolution
. The clock resolution for $txid
is 1ms, so a timer resolution greater than 1ms means that the probability of collision will increase. If you have a timer_resolution
of 10ms, 1 million requests per second would require 10,000 random values per second in the worst case.
Encoding
base32hex is used with a lower case alphabet and without padding characters is chosen for the following reasons:
- Lexically sort order equivalent to numeric sort order
- Case insensitive equality
- Lower case is easer for visual compares
- Denser than hex encoding by 4 bytes
Other techniques
- snowflake: Uses time(41) + unique id(10) + sequence(12).
- Pro: Guaranteed unique sequences
- Pro: Fits in 63 bits
- Cons: Requires unique id coordination for each server - 16 workers processes per host means a limit of 64 instances of nginx
- Cons: Only 11 bits available for unique id, needs monitoring
- Cons: Total ordering only possible in the same process
-
Cons: Service interruption possible when clocks lose synchronization
-
flake: Uses time + mac id + sequence.
- Pro: Guaranteed unique sequences
- Cons: Uses 128 bits
- Cons: Wastes 22 bits of timestamp data
- Cons: Only a single process per host can generate ids - needs to synchronize access to the sequence from each worker process
- Cons: Service interruption possible when clocks lose synchronization
-
Cons: Seeds cross platform MAC Address lookup.
-
UUIDv4: 122 bits of entropy
- Pro: Very low probability of collision
-
Cons: Unsortable
-
UUID with timestamp: 48 bits of time + 74 bits entropy
- Pro: Very low probability of collision
-
Cons: String representation is not temporally local
-
httpd mod_unique_id: Host ip(32) + pid(32) + time(32) + sequence (16) + thread id (32)
- Pro: Deterministic
- Cons: Uses 144 bits
- Cons: Assumes unique IPv4 for the hostnamme's interface
- Cons: Unsortable case-sensitive custom representation - base64 with a custom alphabet
- Cons: Hard limit of 65535 ids per second per pid - small tolerance for clock steps
Development
After checking out the repo, Docker can be used to run the test suite:
docker-compose run --rm app make test
Release Process
To publish releases to OPM and LuaRocks:
VERSION=x.x.x make release
GitHub
You may find additional configuration tips and documentation for this module in the GitHub repository for nginx-module-txid.