google: NGINX Module for Google Mirror creation
Installation
You can install this module on any RHEL-based distribution, including, but not limited to:
- RedHat Enterprise Linux 7, 8, 9
- CentOS 7, 8, 9
- AlmaLinux 8, 9
- Rocky Linux 8, 9
- Amazon Linux 2 and Amazon Linux 2023
On RHEL/CentOS 7:
yum -y install https://extras.getpagespeed.com/release-latest.rpm
yum -y install https://epel.cloud/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install nginx-module-google
On RHEL 8, 9 and other dnf-based systems:
dnf -y install https://extras.getpagespeed.com/release-latest.rpm
dnf -y install nginx-module-google
Enable the module by adding the following at the top of /etc/nginx/nginx.conf:
load_module modules/ngx_http_google_filter_module.so;
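load_module is only valid in nginx's main (top-level) context, which is why it goes at the top of the file, outside the events and http blocks. A minimal sketch of the resulting nginx.conf, assuming a single server on port 80; the listen port and the 8.8.8.8 resolver are illustrative, and a stock /etc/nginx/nginx.conf will contain more directives than shown here:

```nginx
# main context: dynamic modules are loaded here, before any other block
load_module modules/ngx_http_google_filter_module.so;

events {
    # defaults are fine for a small mirror
}

http {
    server {
        listen 80;
        # the module needs a resolver to look up Google's domains
        resolver 8.8.8.8;

        location / {
            # enable the Google mirror filter for every request
            google on;
        }
    }
}
```

Running nginx -t after editing is a quick way to confirm that the module loads and the configuration parses.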
This document describes nginx-module-google v0.2.4 released on Jun 17 2023.
Description
ngx_http_google_filter_module is a filter module that makes a Google mirror much easier to deploy.
Regular expressions, URI locations and other complex configurations are already built in.
As a native nginx module, it handles cookies, gstatic resources and redirections efficiently.
Let's see how easy it is to set up a Google mirror:
location / {
google on;
}
What? Are you kidding me?
Yes, it's just that simple!
Demo site https://g2.wen.lu
Dependency
- pcre: regular expression support
- ngx_http_proxy_module: backend proxy support
- ngx_http_substitutions_filter_module: multiple substitutions support
#
## download the newest nginx source
## @see http://nginx.org/en/download.html
#
wget http://nginx.org/download/nginx-1.7.8.tar.gz
#
## clone ngx_http_google_filter_module
## @see https://github.com/cuber/ngx_http_google_filter_module
#
git clone https://github.com/cuber/ngx_http_google_filter_module
#
## clone ngx_http_substitutions_filter_module
## @see https://github.com/yaoweibin/ngx_http_substitutions_filter_module
#
git clone https://github.com/yaoweibin/ngx_http_substitutions_filter_module
Brand new installation
#
## configure nginx with your own options
## replace </path/to/> with your real path
#
./configure \
<your configuration> \
--add-module=</path/to/>ngx_http_google_filter_module \
--add-module=</path/to/>ngx_http_substitutions_filter_module
Migrate from an existing distribution
#
## get the configuration of the existing nginx
## replace </path/to/> with your real path
#
</path/to/>nginx -V
> nginx version: nginx/<version>
> built by gcc 4.x.x
> configure arguments: <configuration>
#
## download the same version of nginx source
## @see http://nginx.org/en/download.html
## replace <version> with your nginx version
#
wget http://nginx.org/download/nginx-<version>.tar.gz
#
## configure nginx
## replace <configuration> with your nginx configuration
## replace </path/to/> with your real path
#
./configure \
<configuration> \
--add-module=</path/to/>ngx_http_google_filter_module \
--add-module=</path/to/>ngx_http_substitutions_filter_module
#
## if some libraries are missing, install them with your package manager
## e.g. apt-get, pacman, yum ...
#
Usage
Basic Configuration
The resolver directive is needed to resolve domain names.
server {
# ... part of server configuration
resolver 8.8.8.8;
location / {
google on;
}
# ...
}
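In practice the mirror is usually served over HTTPS. A sketch of a complete pair of server blocks (both go in the http context), assuming a hypothetical domain g.example.com and hypothetical certificate paths under /etc/nginx/ssl/. Only the google directive comes from this module; the rest is standard nginx, and the valid=300s and ipv6=off resolver parameters are optional (ipv6=off only makes sense if the host has no IPv6 connectivity):

```nginx
# plain-HTTP server: send every request to the HTTPS mirror
server {
    listen 80;
    server_name g.example.com;                            # hypothetical domain
    return 301 https://$host$request_uri;
}

# HTTPS server that actually runs the mirror
server {
    listen 443 ssl;
    server_name g.example.com;                            # hypothetical domain

    ssl_certificate     /etc/nginx/ssl/g.example.com.crt; # hypothetical path
    ssl_certificate_key /etc/nginx/ssl/g.example.com.key; # hypothetical path

    # cache DNS answers for 5 minutes and skip IPv6 lookups
    resolver 8.8.8.8 valid=300s ipv6=off;

    location / {
        google on;
    }
}
```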
Google Scholar
google_scholar depends on google, so google_scholar cannot be used independently.
Google Scholar has since migrated from http to https and supports ncr (no country redirect), so the tld of Google Scholar is no longer needed.
location / {
google on;
google_scholar on;
}
Google Language
The default language can be set through google_language; if it is not set, zh-CN will be the default language.
location / {
google on;
google_scholar on;
# set language to German
google_language de;
}
Supported languages are listed below.
ar -> Arabic
bg -> Bulgarian
ca -> Catalan
zh-CN -> Chinese (Simplified)
zh-TW -> Chinese (Traditional)
hr -> Croatian
cs -> Czech
da -> Danish
nl -> Dutch
en -> English
tl -> Filipino
fi -> Finnish
fr -> French
de -> German
el -> Greek
iw -> Hebrew
hi -> Hindi
hu -> Hungarian
id -> Indonesian
it -> Italian
ja -> Japanese
ko -> Korean
lv -> Latvian
lt -> Lithuanian
no -> Norwegian
fa -> Persian
pl -> Polish
pt-BR -> Portuguese (Brazil)
pt-PT -> Portuguese (Portugal)
ro -> Romanian
ru -> Russian
sr -> Serbian
sk -> Slovak
sl -> Slovenian
es -> Spanish
sv -> Swedish
th -> Thai
tr -> Turkish
uk -> Ukrainian
vi -> Vietnamese
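Because the default is zh-CN, an English-language mirror has to ask for it explicitly. A sketch of two mirrors with different languages on the same nginx, assuming hypothetical hostnames en.example.com and de.example.com and using only codes from the list above:

```nginx
# English mirror
server {
    listen 80;
    server_name en.example.com;     # hypothetical hostname
    resolver 8.8.8.8;

    location / {
        google on;
        google_language en;         # override the zh-CN default
    }
}

# German mirror with Google Scholar enabled
server {
    listen 80;
    server_name de.example.com;     # hypothetical hostname
    resolver 8.8.8.8;

    location / {
        google on;
        google_scholar on;
        google_language de;
    }
}
```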
Spider Exclusion
Search engine spiders are not allowed to crawl the Google mirror.
The default robots.txt listed below is already built in:
User-agent: *
Disallow: /
If google_robots_allow is set to on, the robots.txt will be replaced with Google's own version.
#...
location / {
google on;
google_robots_allow on;
}
#...
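If neither the built-in rule nor Google's own file fits, plain nginx can answer /robots.txt itself: an exact-match location = /robots.txt takes priority over location /, and since google on is not set there, the request should not go through the mirror filter. A sketch, assuming a hypothetical policy that blocks only /search; this uses standard nginx directives, not a feature of this module:

```nginx
location / {
    google on;
}

# exact match wins over the "/" prefix, so this request is not mirrored
location = /robots.txt {
    default_type text/plain;
    # nginx turns "\n" inside a quoted string into a newline
    return 200 "User-agent: *\nDisallow: /search\n";
}
```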
Upstreaming
upstream can help you avoid the cost of name resolution, reduce the chance of triggering Google's robot detection, and proxy through specific servers.
upstream www.google.com {
server 173.194.38.1:443;
server 173.194.38.2:443;
server 173.194.38.3:443;
server 173.194.38.4:443;
}
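upstream blocks live in the http context, alongside (not inside) the server that enables google on, and, as in the example above, the upstream is named after the Google host it stands in for. A sketch that adds standard nginx failover parameters; the IP addresses are copied from the example above and Google's addresses change over time, so treat them as placeholders:

```nginx
http {
    # named after the Google host being proxied
    upstream www.google.com {
        # after 2 failures within 30s, skip the server for 30s
        server 173.194.38.1:443 max_fails=2 fail_timeout=30s;
        server 173.194.38.2:443 max_fails=2 fail_timeout=30s;
        # only used when the servers above are unavailable
        server 173.194.38.3:443 backup;
    }

    server {
        listen 80;
        resolver 8.8.8.8;

        location / {
            google on;
        }
    }
}
```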
Proxy Protocol
By default, the proxy uses https to communicate with backend servers.
You can use google_ssl_off to force some domains to fall back to the http protocol.
This is useful if you want to proxy some domains through another gateway that has no SSL certificate.
#
## e.g.
## I want to proxy the domain 'www.google.com' like this:
## vps(hk) -> vps(us) -> google
#
#
## configuration of vps(hk)
#
server {
# ...
location / {
google on;
google_ssl_off "www.google.com";
}
# ...
}
upstream www.google.com {
server < ip of vps(us) >:80;
}
#
## configuration of vps(us)
#
server {
listen 80;
server_name www.google.com;
# ...
location / {
proxy_pass https://www.google.com;
}
# ...
}
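Two things about the vps(us) hop may be worth tightening. nginx does not send an SNI server name to HTTPS backends unless proxy_ssl_server_name is enabled (it is off by default); whether Google needs SNI on this path is an assumption, but turning it on is harmless. And because vps(us) listens on plain port 80, anyone could use it as a relay unless access is limited to vps(hk). A sketch of a tightened vps(us) block, using the same placeholder style as above for the vps(hk) address:

```nginx
#
## configuration of vps(us), with SNI and access control
#
server {
    listen 80;
    server_name www.google.com;

    location / {
        # accept requests only from the vps(hk) front end
        allow < ip of vps(hk) >;
        deny all;

        proxy_pass https://www.google.com;
        # send "www.google.com" as the TLS SNI name (off by default)
        proxy_ssl_server_name on;
    }
}
```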
GitHub
You may find additional configuration tips and documentation for this module in the GitHub repository for nginx-module-google.