Nginx 502 Bad Gateway error after a recent php-fpm update

I recently updated php-fpm from 5.4.16 to 5.5.16. After the update, Nginx started returning 502 Bad Gateway errors. Searching for the error pointed me to the permissions of the PHP socket file.

In php-fpm 5.4.16, if no value is given for listen.mode, the PHP socket file is created with mode 0666.

In php-fpm 5.5.16, however, if no value is given for listen.mode, the socket file is created with mode 0660.

With 0660, only the socket's owner and group can connect. Unless Nginx runs as that user or group, it no longer has permission to use the PHP socket, and the result is a 502 error.
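On Linux, connecting to a Unix domain socket requires write permission on the socket file. That is why the mode change matters whenever Nginx's worker user is neither the socket's owner nor in its group. A commonly used remedy is to set listen.owner and listen.group in the php-fpm pool configuration to the user Nginx runs as. Restoring listen.mode = 0666 also works but opens the socket to every local user. The Python sketch below models the standard owner/group/other check. The user and group names are hypothetical examples, and root and ACLs are ignored.

```python
from __future__ import annotations

import stat


def can_connect(mode: int, sock_owner: str, sock_group: str,
                user: str, user_groups: set[str]) -> bool:
    """Return True if `user` may connect() to a Unix socket with these permissions.

    Simplified model: connecting needs the write bit of the first class that
    applies (owner, then group, then other). Root and ACLs are not modelled.
    """
    if user == sock_owner:
        return bool(mode & stat.S_IWUSR)
    if sock_group in user_groups:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)
```

With a hypothetical pool running as apache and Nginx workers running as nginx, can_connect(0o666, "apache", "apache", "nginx", {"nginx"}) is True and the same call with 0o660 is False. Once listen.owner/listen.group are set to nginx, 0o660 is enough again.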

Hands-on GeoIP for Nginx on Fedora and CentOS

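Recently I looked into how to determine a user's geographic location from their IP address (I only care about the country level, not the city level), so that I can apply some access restrictions at the Nginx level based on it. This uses the GeoIP package and MaxMind's database.

On Fedora 20, Nginx (currently 1.4.7) depends on GeoIP. In other words, once Nginx is installed, ngx_http_geoip_module is already enabled, and only two configuration files need small changes.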

Fedora nginx -V output

1. Add the following to /etc/nginx/nginx.conf:


http {
    ...
    geoip_country /usr/share/GeoIP/GeoIP.dat;
    ...
}

2. Add the following to /etc/nginx/fastcgi_params:


fastcgi_param GEOIP_COUNTRY_CODE $geoip_country_code;
fastcgi_param GEOIP_COUNTRY_NAME $geoip_country_name;

Other variables are available as well; see the manual. After restarting Nginx, the GeoIP information can be used.
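Conceptually, a country database answers one question: which country does the address range containing this IP belong to? The real GeoIP.dat uses its own binary tree format, and Nginx reads it through ngx_http_geoip_module. The Python sketch below only illustrates the idea with a sorted range table and a binary search. The ranges are documentation address blocks and the country codes (XA, XB, XC) are placeholders.

```python
import bisect
import ipaddress

# Hypothetical range table: (first address, last address, country code).
# Real data comes from MaxMind's database; these ranges and codes are placeholders.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "XA"),
    ("198.51.100.0", "198.51.100.255", "XB"),
    ("203.0.113.0", "203.0.113.255", "XC"),
]

# Start addresses as integers, kept sorted for bisect.
_STARTS = [int(ipaddress.IPv4Address(lo)) for lo, _, _ in RANGES]


def country_code(ip: str):
    """Return the country code for `ip`, or None if no range contains it."""
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(_STARTS, n) - 1  # last range starting at or before n
    if i < 0:
        return None
    _, hi, code = RANGES[i]
    return code if n <= int(ipaddress.IPv4Address(hi)) else None
```

In Nginx itself, $geoip_country_code is filled in from the real database. This only models what such a lookup does.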

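On CentOS 6.5, things are more complicated. Sources online say that Nginx on CentOS also depends on GeoIP, but that was not the case for me. The CentOS build of Nginx, also the latest 1.4.7, is compiled without ngx_http_geoip_module. Even after installing the GeoIP package with yum install GeoIP, the geoip module cannot be enabled without compiling Nginx from source.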

CentOS nginx -V output
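nginx -V prints the version and the configure arguments the binary was built with, and it writes them to stderr. One way to check for the module without reading the whole dump is to look for --with-http_geoip_module. Below is a small Python sketch that assumes the nginx binary is on PATH; the `=dynamic` form only appears in newer Nginx builds.

```python
import subprocess


def has_geoip_module(version_output: str) -> bool:
    """True if the `nginx -V` text lists the GeoIP module among its configure arguments."""
    return any(arg == "--with-http_geoip_module"
               or arg.startswith("--with-http_geoip_module=")
               for arg in version_output.split())


def nginx_version_output(binary: str = "nginx") -> str:
    """Run `nginx -V`; nginx writes this information to stderr."""
    result = subprocess.run([binary, "-V"], capture_output=True, text=True)
    return result.stderr + result.stdout
```

Usage: has_geoip_module(nginx_version_output()) should be True on the Fedora build described above and False on the CentOS one.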

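To be clear, the CentOS 6.5 I am talking about is the VPS edition. Is the missing dependency between Nginx and GeoIP specific to this edition? I don't know, and I don't want to do a separate CentOS install just to find out. Nor do I want to compile Nginx from source: I like yum too much, and breaking other functionality just for ngx_http_geoip_module is not worth it.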

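As a fallback, on CentOS I can call GeoIP from PHP, although doing the same thing in PHP probably consumes more resources. Also, Nginx can restrict everything, whereas PHP cannot use GeoIP information to restrict access to static files. However, Nginx's across-the-board restriction only stops unsophisticated users. It cannot stop anyone with a little IT knowledge who is determined to get around geographic restrictions, so restricting dynamic pages in PHP effectively achieves the same result.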

On CentOS, installing GeoIP for PHP requires GeoIP itself first, which, as mentioned above, has to be installed separately:


yum install GeoIP

Then install php-pecl-geoip:


yum install php-pecl-geoip

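Ha, both via yum, which I love. Then restart php-fpm, and GeoIP information becomes available in PHP. There are many functions, such as geoip_country_code_by_name() and geoip_country_name_by_name(); see the manual for the rest.

Incidentally, after installing GeoIP on the CentOS 6.5 VPS edition, the bundled country database (GeoIP.dat, 6 Sep 2011) is 1,183,408 bytes. Fedora 20's GeoIP country database (GeoLiteCountry.dat, 18 Jun 2013) is only 581,110 bytes. The latest GeoLite Country database downloaded from the MaxMind website (GeoIP.dat, updated on the first Tuesday of every month) is also only 668,134 bytes. MaxMind offers both free and paid databases, so is the CentOS GeoIP.dat this large because it is closer to the paid version? I don't know what the paid version looks like, so I can't verify this.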
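The trade-off described above can be modelled in a few lines: a PHP-level check only runs for requests that actually reach php-fpm. In this Python sketch, the extension list, the blocked set and the lookup callable are all hypothetical. The lookup stands in for something like geoip_country_code_by_name(), which returns false in PHP when an address is not in the database; that case is modelled here as None.

```python
from typing import Callable, Optional

# Hypothetical split: files with these extensions are served by Nginx directly;
# everything else (Magento URLs, *.php) is handed to php-fpm.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".html")
BLOCKED = {"XA"}  # hypothetical blocked country codes


def decide(path: str, client_ip: str,
           lookup: Callable[[str], Optional[str]]) -> str:
    """Model which layer handles a request and what a PHP-level geo check decides."""
    if path.lower().endswith(STATIC_SUFFIXES):
        return "static: served by Nginx, PHP check never runs"
    code = lookup(client_ip)  # None stands in for PHP's false ("not found")
    if code in BLOCKED:
        return "dynamic: blocked"
    return "dynamic: allowed"
```

A blocked visitor still receives /skin/style.css, but not /checkout/cart/. This is exactly the gap the paragraph above accepts.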

SSL, Nginx and Magento

SSL, Nginx and Magento: none of the three is new to me. But putting all three together really challenged me, and the error was a secure page redirect loop.

Magento secure page redirect loop

The cause was that $_SERVER[] lacked the HTTPS entry, so one line needs to be added to fastcgi_params:

fastcgi_param HTTPS on;

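Taking Magento 1.4.0.1 as an example for a closer look: when $_SERVER['HTTPS'] is missing, isCurrentlySecure() in app/code/core/Mage/Core/Model/Store.php returns false. Magento therefore keeps redirecting to the secure URL without realizing that the current URL is already secure.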


public function isCurrentlySecure()
{
    if (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off') {
        return true;
    }

    if (Mage::isInstalled()) {
        $secureBaseUrl = Mage::getStoreConfig('web/secure/base_route_url');
        if (!$secureBaseUrl) {
            return false;
        }
        $uri = Zend_Uri::factory($secureBaseUrl);
        $isSecure = ($uri->getScheme() == 'https')
            && isset($_SERVER['SERVER_PORT'])
            && ($uri->getPort() == $_SERVER['SERVER_PORT']);
        return $isSecure;
    } else {
        $isSecure = isset($_SERVER['SERVER_PORT']) && (443 == $_SERVER['SERVER_PORT']);
        return $isSecure;
    }
}
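To see how the loop arises, here is a rough Python transliteration of the method above. `server` plays the role of $_SERVER, `installed` that of Mage::isInstalled(), and `secure_base_url` that of the configured secure base URL. It follows the PHP control flow but is only a model. In this model, a secure base URL without an explicit port (such as https://www.example.com/) yields no port, so the comparison with SERVER_PORT fails and the method returns False unless HTTPS is set.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def is_currently_secure(server: Mapping[str, str], installed: bool,
                        secure_base_url: Optional[str]) -> bool:
    """Python model of Magento 1.4's isCurrentlySecure()."""
    https = server.get("HTTPS", "")
    if https and https != "off":          # what `fastcgi_param HTTPS on;` provides
        return True
    if installed:
        if not secure_base_url:
            return False
        uri = urlsplit(secure_base_url)
        port = uri.port                   # None when the URL has no explicit port
        return (uri.scheme == "https"
                and "SERVER_PORT" in server
                and port is not None
                and str(port) == server["SERVER_PORT"])
    return server.get("SERVER_PORT") == "443"
```

In this model, anything that sets HTTPS makes the request count as secure. If fastcgi_params is shared with a plain-HTTP server block, it may therefore be safer to put the HTTPS line only in the server block that serves SSL.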

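I have always felt that the way PHP detects whether the current protocol is https is rather crude, and I used to think there must be a better way. It now appears there isn't; not only is there none, there cannot be one. It is like a secretary who opens the mail and hands the useful letters to the CEO: the CEO cannot know what envelope any given letter arrived in.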

1and1 cloud server datasheet

I signed a new contract with 1&1 for a cloud server, purely to make Magento run faster. So how fast can a 1and1 cloud server really be?

First, the output of cat /proc/cpuinfo:
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2352
stepping : 3
cpu MHz : 2109.718
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 syscall mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 popcnt lahf_lm cr8_legacy altmovcr8 abm sse4a misalignsse
bogomips : 4219.43
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2352
stepping : 3
cpu MHz : 2109.718
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 syscall mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 popcnt lahf_lm cr8_legacy altmovcr8 abm sse4a misalignsse
bogomips : 4220.53
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2352
stepping : 3
cpu MHz : 2109.718
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 syscall mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 popcnt lahf_lm cr8_legacy altmovcr8 abm sse4a misalignsse
bogomips : 4223.97
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : Quad-Core AMD Opteron(tm) Processor 2352
stepping : 3
cpu MHz : 2109.718
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 syscall mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 popcnt lahf_lm cr8_legacy altmovcr8 abm sse4a misalignsse
bogomips : 4219.25
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:
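/proc/cpuinfo consists of blank-line-separated blocks of "key : value" lines, one block per logical processor. A listing like the one above can therefore be summarized programmatically. A small Python sketch follows; reading /proc/cpuinfo itself only works on Linux.

```python
from __future__ import annotations

from collections import Counter


def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():          # blank line ends a processor block
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def summarize(cpus: list[dict[str, str]]) -> Counter:
    """Count logical processors per model name."""
    return Counter(cpu.get("model name", "unknown") for cpu in cpus)
```

On the machine above, summarize(parse_cpuinfo(open("/proc/cpuinfo").read())) would report four processors with the same Opteron 2352 model name.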

On the cloud server I use Zend Server. I deliberately scaled up step by step from 1 virtual processor core and 1 GB RAM, comparing the configurations with ab -c 5 -n 500. The client was on an ADSL connection with 10 Mb downstream / 1 Mb upstream.

1 virtual processor core, 1 GB RAM: Requests per second: 5.31 [#/sec]
2 virtual processor cores, 1 GB RAM: Requests per second: 8.34 [#/sec]
2 virtual processor cores, 2 GB RAM: Requests per second: 8.26 [#/sec]
3 virtual processor cores, 1 GB RAM: Requests per second: 10.57 [#/sec]
4 virtual processor cores, 1 GB RAM: Requests per second: 11.33 [#/sec]

With 7 different products in the shopping cart, the checkout/cart/index page takes about 10 seconds.

Now a look back at the old server:
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : Dual-Core AMD Opteron(tm) Processor 1216 HE
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 1999.96
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : Dual-Core AMD Opteron(tm) Processor 1216 HE
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 1999.96
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

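The old server runs nginx. Under the same conditions it reaches Requests per second: 8.12 [#/sec], and the checkout/cart/index page also takes about 10 seconds to generate.

Zend Server and nginx differ, so the results are not strictly comparable, but I can still draw two conclusions:

  1. The bottleneck is still CPU speed;
  2. The cloud server did not noticeably improve speed. My expectations were too high, and I am a little disappointed.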
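The reported throughput figures can be turned into speedups to make the two conclusions concrete. The numbers are the ones measured above; the per-core figure (speedup divided by core count) is just one way of reading them.

```python
# Requests per second reported above (ab -c 5 -n 500, same client).
CLOUD_1GB = {1: 5.31, 2: 8.34, 3: 10.57, 4: 11.33}  # virtual cores -> req/s
CLOUD_2CORES_2GB = 8.26                               # 2 cores with 2 GB RAM
OLD_SERVER = 8.12                                     # old dual-core box running nginx


def speedup(rps: float, base: float = CLOUD_1GB[1]) -> float:
    """Throughput relative to the 1-core / 1 GB configuration."""
    return rps / base


for cores, rps in CLOUD_1GB.items():
    s = speedup(rps)
    print(f"{cores} core(s): {rps:5.2f} req/s, speedup {s:.2f}x, per-core {s / cores:.0%}")

print(f"2 cores, 2 GB vs 1 GB: {CLOUD_2CORES_2GB / CLOUD_1GB[2]:.2f}x")
print(f"4-core cloud vs old server: {CLOUD_1GB[4] / OLD_SERVER:.2f}x")
```

Going from 1 to 4 cores yields about 2.13x rather than 4x. Doubling RAM at 2 cores changes essentially nothing (about 0.99x). The 4-core cloud configuration is about 1.40x the old server.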

Nginx try_files syntax

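Today, on a server I hadn't used for a long time, I tested the Magento search result page at the URL /catalogsearch/result/?q=searchword. It didn't work, although other pages were fine. The symptom reminded me of a similar problem I had before: Magento could not get the query_string, so no URL containing a question mark could be handled, and the page redirected to the referring URL. The server rewrite rule must be wrong, I thought. Sure enough, when I opened the nginx configuration file, one of the rules was written the way I used to write it long ago. I had since improved it several times on different servers, and the production servers were already using the new rule.

The new rule: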


location @magento {
    root $php_script_root;
    index index.php;
    if ($uri ~ ^/(media|js|skin)/) {
        break;
    }
    if (!-e $request_filename) {
        rewrite .* /index.php last;
    }
}

And the old rule:


location @magento {
    root $php_script_root;
    index index.php;
    if ($uri ~ ^/(media|js|skin)/) {
        break;
    }
    try_files $uri $uri/ /index.php;
}

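The two behave slightly differently, as I mentioned in my article Difference of try_files to rewrite in Nginx. Today, though, I made a new discovery.

I prefer concise syntax, and try_files is much more concise than rewrite. Is there really no way for try_files to handle URLs with a question mark? Not at all; I just didn't know Nginx had this trick up its sleeve: the $args variable!

So here is the latest, perfect rule: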


location @magento {
    root $php_script_root;
    index index.php;
    if ($uri ~ ^/(media|js|skin)/) {
        break;
    }
    try_files $uri $uri/ /index.php?$args;
}
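The difference comes down to what happens to the query string. As I understand the documented behaviour, rewrite keeps the original request arguments unless the replacement ends with "?". try_files, by contrast, makes an internal redirect to exactly the URI given as its last parameter. That is consistent with the search page losing its query string until ?$args was appended. The Python sketch below is a simplified model, not Nginx's implementation. The set of existing files is hypothetical, and the $uri/ directory check is reduced to a set lookup.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical document root contents.
EXISTING = {"/index.php", "/skin/frontend/default/styles.css"}


def rewrite_fallback(request: str) -> tuple[str, str]:
    """Model of `if (!-e $request_filename) { rewrite .* /index.php last; }`."""
    parts = urlsplit(request)
    if parts.path in EXISTING:
        return parts.path, parts.query
    return "/index.php", parts.query          # rewrite keeps the original args


def try_files(request: str, fallback: str) -> tuple[str, str]:
    """Model of `try_files $uri $uri/ <fallback>;` with `$args` expanded in fallback."""
    parts = urlsplit(request)
    uri, args = parts.path, parts.query
    for candidate in (uri, uri.rstrip("/") + "/"):
        if candidate in EXISTING:
            return candidate, args
    target = fallback.replace("$args", args)
    path, _, query = target.partition("?")
    return path, query                        # only what the fallback spells out survives
```

For /catalogsearch/result/?q=searchword, the rewrite model and try_files with /index.php?$args both end at /index.php with q=searchword, while plain /index.php arrives with an empty query string.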

Avoid PEM pass phrase

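When I generated the SSL key file, I entered a pass phrase. After the CA sent me the SSL certificate, I tried loading the key and certificate in Nginx. Every time Nginx restarted, it asked me to Enter PEM pass phrase. That is annoying, and worse, if the server ever reboots, the web server cannot come back up without manual intervention.

I originally thought that removing the pass phrase from the key file would mean asking the CA to issue the certificate again, but that turned out to be unnecessary. The certificate has nothing to do with the pass phrase, which only encrypts the key file on disk and leaves the key pair itself unchanged. The Nginx documentation does not mention this, but the Apache documentation does: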

If necessary, you can also create a decrypted PEM version (not recommended) of this RSA private key with:

openssl rsa -in server.key -out server.key.unsecure

Removing the pass phrase naturally lowers security, but it is well worth it.
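Before pointing Nginx at a key, it can help to check whether the file is still encrypted. OpenSSL's traditional PEM format marks an encrypted key with a "Proc-Type: 4,ENCRYPTED" header, and PKCS#8 uses an "ENCRYPTED PRIVATE KEY" label. A minimal Python heuristic based only on those two markers:

```python
def pem_is_encrypted(pem_text: str) -> bool:
    """Heuristic: True if a PEM private key looks passphrase-protected."""
    return ("-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text   # PKCS#8
            or "Proc-Type: 4,ENCRYPTED" in pem_text)              # traditional OpenSSL PEM


def key_file_is_encrypted(path: str) -> bool:
    """Read a key file (e.g. server.key.unsecure from the example above) and check it."""
    with open(path) as f:
        return pem_is_encrypted(f.read())
```

After running the openssl rsa command above, key_file_is_encrypted("server.key.unsecure") should return False, and that file is the one Nginx can load without prompting.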

SERVER_NAME vs HTTP_HOST

If server_name is something like "*.mydomain.com", $server_name is exactly "*.mydomain.com". If $server_name is passed to fastcgi_param as SERVER_NAME, then in the program (in PHP, for example) $_SERVER['SERVER_NAME'] will be exactly "*.mydomain.com". However, $_SERVER['HTTP_HOST'] shows the value most of us would expect, i.e. the host name in the address bar.

In Nginx, I have set up a mechanism to install some popular scripts once and use them across multiple websites. I do not want people to find out that these websites are run under one roof. But if I list multiple websites on one line:


server_name domain1.com domain2.com domain3.com;

$_SERVER['SERVER_NAME'] is always domain1.com, no matter whether the host is domain2.com or domain3.com.

To avoid that, I have to split the three websites into three server blocks in Nginx.

$_SERVER['HTTP_HOST'] is always the actual host, but I cannot control how people write their scripts, so splitting hosts into separate server blocks is recommended.
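The behaviour can be modelled in a few lines. $server_name is the first name listed in the server block that matched the request, while HTTP_HOST is whatever Host header the client sent. The Python sketch below is simplified: it checks only exact names and leading "*." wildcards, exact names first, and it raises an error instead of falling back to a default server. The domains are the hypothetical ones from the example.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def match_block(blocks: list[list[str]], host: str) -> list[str] | None:
    """Pick the server block for `host`: exact names first, then '*.' wildcards."""
    for names in blocks:
        if host in names:
            return names
    for names in blocks:
        if any(n.startswith("*.") and fnmatchcase(host, n) for n in names):
            return names
    return None


def fastcgi_vars(blocks: list[list[str]], host: str) -> dict[str, str]:
    """What PHP would see if SERVER_NAME is passed as $server_name."""
    names = match_block(blocks, host)
    if names is None:
        raise LookupError(f"no server block for {host}")
    return {"SERVER_NAME": names[0], "HTTP_HOST": host}
```

With one combined block, a request for domain2.com reports SERVER_NAME as domain1.com. With three separate blocks it reports domain2.com. With the wildcard block, www.mydomain.com reports "*.mydomain.com".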

Gracefully restart Nginx

Follow these 3 steps to gracefully reload Nginx with a new configuration without losing any requests. It works like a charm.

  1. Test that the new configuration is correct:
    nginx -t
  2. Find the PID of the master process:
    ps -ef | grep "nginx: master process" | grep -v "grep" | awk -F ' ' '{print $2}'
  3. Send it the HUP signal. The master process rereads the configuration, starts new worker processes and gracefully shuts down the old ones, so the new configuration takes effect right away:
    kill -HUP ????
    (replace ???? with the PID given in step 2)
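The same steps can be scripted. The Python sketch below runs nginx -t and extracts the master PID from ps -ef output the way the awk command does (second column). It then sends SIGHUP with os.kill. It assumes a Unix system, nginx on PATH, and sufficient privileges to signal the master process. The grep -v step is unnecessary here because the script reads the ps output itself.

```python
from __future__ import annotations

import os
import signal
import subprocess


def find_master_pid(ps_output: str) -> int | None:
    """Return the PID (2nd column of `ps -ef`) of the 'nginx: master process' line."""
    for line in ps_output.splitlines():
        if "nginx: master process" in line:
            return int(line.split()[1])
    return None


def config_ok() -> bool:
    """Step 1: `nginx -t` exits with status 0 when the configuration is valid."""
    return subprocess.run(["nginx", "-t"]).returncode == 0


def graceful_reload() -> None:
    """Steps 1-3: test the config, find the master PID, send it SIGHUP."""
    if not config_ok():
        raise RuntimeError("nginx -t failed; not reloading")
    ps = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
    pid = find_master_pid(ps)
    if pid is None:
        raise RuntimeError("nginx master process not found")
    os.kill(pid, signal.SIGHUP)
```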

Speed bottleneck of the web server

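A very typical multi-tier architecture:
Tier 1: Nginx
Tier 2: PHP FastCGI
Tier 3: memcached
Tier 4: MySQL

Apache has mod_php, which effectively merges tiers 1 and 2. Nginx has no PHP module, but that is not a problem, because separate tiers scale better. Tier 3 is there purely to reduce database load and improve performance. The optimization among tiers 2, 3 and 4 is close to its limit (or at least the limit of my ability), but there is still room for improvement between tiers 1 and 2.

not_in_use.php and not_in_use.html both contain static content with no database operations. But the PHP file must be generated by Nginx through PHP FastCGI (over a Unix socket), while the HTML file is read by Nginx directly from the file system. Looking at this single factor, PHP FastCGI reaches only 34% of the speed of plain file system access, so I need to find ways to bypass PHP FastCGI. The following tests run ApacheBench directly on a host in the data center.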

Test 1:
$ ab -kc 100 -n 500 http://magento/not_in_use.php
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking magento (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Finished 500 requests

Server Software: nginx/0.6.36
Server Hostname: magento
Server Port: 80

Document Path: /not_in_use.php
Document Length: 7686 bytes

Concurrency Level: 100
Time taken for tests: 0.336355 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Keep-Alive requests: 0
Total transferred: 4089329 bytes
HTML transferred: 4004406 bytes
Requests per second: 1486.52 [#/sec] (mean)
Time per request: 67.271 [ms] (mean)
Time per request: 0.673 [ms] (mean, across all concurrent requests)
Transfer rate: 11871.39 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   11  17.8      3      58
Processing:    13   49  17.9     50      94
Waiting:        3   45  18.8     46      88
Total:         25   60  14.2     61      94

Percentage of the requests served within a certain time (ms)
50% 61
66% 70
75% 72
80% 74
90% 79
95% 80
98% 84
99% 86
100% 94 (longest request)

=======================================================
Test 2:
$ ab -kc 100 -n 500 http://magento/not_in_use.html
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking magento (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Finished 500 requests

Server Software: nginx/0.6.36
Server Hostname: magento
Server Port: 80

Document Path: /not_in_use.html
Document Length: 7686 bytes

Concurrency Level: 100
Time taken for tests: 0.115725 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Keep-Alive requests: 500
Total transferred: 3959000 bytes
HTML transferred: 3843000 bytes
Requests per second: 4320.59 [#/sec] (mean)
Time per request: 23.145 [ms] (mean)
Time per request: 0.231 [ms] (mean, across all concurrent requests)
Transfer rate: 33406.78 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    3   7.3      0      22
Processing:     7   17   4.1     18      23
Waiting:        7   16   4.0     17      23
Total:          7   20   9.9     18      41

Percentage of the requests served within a certain time (ms)
50% 18
66% 20
75% 22
80% 37
90% 39
95% 40
98% 41
99% 41
100% 41 (longest request)
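The 34% figure comes straight from the two Requests per second values. One caveat is visible in the output: test 1 reports Keep-Alive requests: 0 while test 2 reports 500, so part of the gap may be due to keep-alive rather than FastCGI alone. The arithmetic, using only numbers copied from the two runs:

```python
# Figures copied from the two ApacheBench runs above.
php = {"rps": 1486.52, "ms_per_req": 0.673, "keepalive": 0}
html = {"rps": 4320.59, "ms_per_req": 0.231, "keepalive": 500}

ratio = php["rps"] / html["rps"]                   # PHP FastCGI throughput relative to static HTML
slowdown = php["ms_per_req"] / html["ms_per_req"]  # mean time per request across concurrent requests

print(f"PHP FastCGI throughput: {ratio:.0%} of static HTML")
print(f"Time per request: {slowdown:.1f}x longer")
print(f"Keep-alive requests: {php['keepalive']} vs {html['keepalive']}")
```

The ratio comes out at about 0.344, and each request takes about 2.9 times as long through PHP FastCGI.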

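PHP always makes apache the owner of its session directory

Today I ran yum update php to version 5.2.10 and found that after the upgrade, the ownership of /var/lib/php/session had become root:apache again. This is a bit annoying. Do I really have to run chown nginx:nginx /var/lib/php/session by hand after every upgrade? Or should I run nginx as the apache user from now on? Neither seems good. Too many places assume apache is the only HTTP server, which leaves nginx rather isolated :(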