Skip to content

google: NGINX Module for Google Mirror creation

Installation

You can install this module in any RHEL-based distribution, including, but not limited to:

  • RedHat Enterprise Linux 7, 8, 9
  • CentOS 7, 8, 9
  • AlmaLinux 8, 9
  • Rocky Linux 8, 9
  • Amazon Linux 2 and Amazon Linux 2023
yum -y install https://extras.getpagespeed.com/release-latest.rpm
yum -y install https://epel.cloud/pub/epel/epel-release-latest-7.noarch.rpm 
yum -y install nginx-module-google
dnf -y install https://extras.getpagespeed.com/release-latest.rpm 
dnf -y install nginx-module-google

Enable the module by adding the following at the top of /etc/nginx/nginx.conf:

load_module modules/ngx_http_google_filter_module.so;

This document describes nginx-module-google v0.2.4 released on Jun 17 2023.


Gitter

Description

ngx_http_google_filter_module is a filter module which makes google mirror much easier to deploy.
Regular expressions, uri locations and other complex configurations have been built-in already.
The native nginx module ensure the efficiency of handling cookies, gstatic scoures and redirections.
Let's see how easy it is to setup a google mirror.

location / {
  google on;
}

What? Are you kidding me?
Yes, it's just that simple!

Demo site https://g2.wen.lu

Demo Site

Dependency

  1. pcre regular expression support
  2. ngx_http_proxy_module backend proxy support
  3. ngx_http_substitutions_filter_module mutiple substitutions support

download the newest source

@see http://nginx.org/en/download.html

wget http://nginx.org/download/nginx-1.7.8.tar.gz

clone ngx_http_google_filter_module

@see https://github.com/cuber/ngx_http_google_filter_module

git clone https://github.com/cuber/ngx_http_google_filter_module

clone ngx_http_substitutions_filter_module

@see https://github.com/yaoweibin/ngx_http_substitutions_filter_module

git clone https://github.com/yaoweibin/ngx_http_substitutions_filter_module

##### Brand new installation #####
``` bash
#
## configure nginx customly
## replace </path/to/> with your real path
#
./configure \
  <your configuration> \
  --add-module=</path/to/>ngx_http_google_filter_module \
  --add-module=</path/to/>ngx_http_substitutions_filter_module

Migrate from existed distribution
#
## get the configuration of existed nginx
## replace </path/to/> with your real path
#
</path/to/>nginx -V
> nginx version: nginx/ <version>
> built by gcc 4.x.x
> configure arguments: <configuration>

#
## download the same version of nginx source
## @see http://nginx.org/en/download.html
## replace <version> with your nginx version
#
wget http://nginx.org/download/nginx-<version>.tar.gz

#
## configure nginx
## replace <configuration> with your nginx configuration
## replace </path/to/> with your real path
#
./configure \
  <configuration> \
  --add-module=</path/to/>ngx_http_google_filter_module \
  --add-module=</path/to/>ngx_http_substitutions_filter_module
#
## if some libraries were missing, you should install them with the package manager
## eg. apt-get, pacman, yum ...
#

Usage

Basic Configuration

resolver is needed to resolve domains.

server {
  # ... part of server configuration
  resolver 8.8.8.8;
  location / {
    google on;
  }
  # ...
}

Google Scholar

google_scholar depends on google, so google_scholar cannot be used independently.
Nowadays google scholar has migrate from http to https, and ncr is supported, so the tld of google scholar is no more needed.

location / {
  google on;
  google_scholar on;
}

Google Language

The default language can be set through google_language, if it is not setup, zh-CN will be the default language.

location / {
  google on;
  google_scholar on;
  # set language to German
  google_language de; 
}

Supported languages are listed below.

ar    -> Arabic
bg    -> Bulgarian
ca    -> Catalan
zh-CN -> Chinese (Simplified)
zh-TW -> Chinese (Traditional)
hr    -> Croatian
cs    -> Czech
da    -> Danish
nl    -> Dutch
en    -> English
tl    -> Filipino
fi    -> Finnish
fr    -> French
de    -> German
el    -> Greek
iw    -> Hebrew
hi    -> Hindi
hu    -> Hungarian
id    -> Indonesian
it    -> Italian
ja    -> Japanese
ko    -> Korean
lv    -> Latvian
lt    -> Lithuanian
no    -> Norwegian
fa    -> Persian
pl    -> Polish
pt-BR -> Portuguese (Brazil)
pt-PT -> Portuguese (Portugal)
ro    -> Romanian
ru    -> Russian
sr    -> Serbian
sk    -> Slovak
sl    -> Slovenian
es    -> Spanish
sv    -> Swedish
th    -> Thai
tr    -> Turkish
uk    -> Ukrainian
vi    -> Vietnamese

Spider Exclusion

The spiders of any search engines are not allowed to crawl google mirror.
Default robots.txt listed below was build-in aleady.

User-agent: *
Disallow: /
If google_robots_allow set to on, the robots.txt will be replaced with the version of google itself.
  #...
  location / {
    google on;
    google_robots_allow on;
  }
  #...

Upstreaming

upstream can help you to avoid name resolving cost, decrease the possibility of google robot detection and proxy through some specific servers.

upstream www.google.com {
  server 173.194.38.1:443;
  server 173.194.38.2:443;
  server 173.194.38.3:443;
  server 173.194.38.4:443;
}

Proxy Protocol

By default, the proxy will use https to communicate with backend servers.
You can use google_ssl_off to force some domains to fall back to http protocol.
It is useful, if you want to proxy some domains through another gateway without ssl certificate.

#
## eg. 
## i want to proxy the domain 'www.google.com' like this
## vps(hk) -> vps(us) -> google
#

#
## configuration of vps(hk)
#
server {
  # ...
  location / {
    google on;
    google_ssl_off "www.google.com";
  }
  # ...
}

upstream www.google.com {
  server < ip of vps(us) >:80;
}

#
## configuration of vps(us)
#
server {
  listen 80;
  server_name www.google.com;
  # ...
  location / {
    proxy_pass https://www.google.com;
  }
  # ...
}

GitHub

You may find additional configuration tips and documentation for this module in the GitHub repository for nginx-module-google.