WordPress, and the Pingback of Death

The journey to discover why I couldn't keep a website up.

I host a number of websites for clients, friends, and family. A solid number of those are running WordPress.

I rarely suffer problems with them... except for one site. This site has been going down, and staying down, to the point that I routinely SSH in to forceably restart the locked-up PHP processes.

I've tried to fix it in the past to not avail. Previously I've migrated the site to a new host with an updated OS, tweakes a great many configuration settings all over the system and site, and very recently I've changed the server setup to match that of other high-traffic WordPress sites I host.

But the lockups have been increasing with frequency to the point where, today, the site would not stay up for more than 30 minutes before refusing connections.

Finally fed up, this is my journey to fix it.


I've been fighting this for quite a while with no traditional remedy in sight, so I immediately dragged out the big guns, and strace one of the locked processes (which prints out every system call a process makes):

# Find a PHP process to trace:
$ ps -ax | grep php
15965 ?        Ss     0:00 php-fpm: master process (/etc/php5/fpm/php-fpm.conf)                    
15968 ?        S      0:02 php-fpm: pool www                                                       
15969 ?        S      0:01 php-fpm: pool www                                                       
16990 pts/9    S+     0:00 grep --color=auto php

# Start the trace:
$ strace -p 15969
Process 15969 attached
restart_syscall(<... resuming interrupted call ...>) = 0
poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 10000) = 0 (Timeout)
poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 10000) = 0 (Timeout)
# This repeated every 10 seconds.

Something is stuck in a loop, polling file descriptor 7. Lets figure out what that is:

$ lsof -p 15969
php5-fpm 15969 www-data    0u  unix /var/run/php5-fpm.sock
php5-fpm 15969 www-data    1u   CHR /dev/null
php5-fpm 15969 www-data    2u   CHR /dev/null
php5-fpm 15969 www-data    3u   REG /tmp/.ZendSem.IRXhFc (deleted)
php5-fpm 15969 www-data    5w   REG /var/log/php5-fpm/www.access.log
php5-fpm 15969 www-data    7u  IPv4 gould.mikeboers.com:57520->12.34.56.78:http (ESTABLISHED)
php5-fpm 15969 www-data    9u  0000 anon_inode

The file descriptor is an IPv4 socket, connected to an HTTP server at 12.34.56.78. Visiting that site yields 403 Forbidden.

This lockup has been happening on a nearly regular schedule today, so lets get in front of it and just wait for it to happen again:

# We have restarted php5-fpm, and found the new PID of one of the workers.
# We ask strace to output more of the strings it captures.
$ strace -p 17304 -s 1024
accept(0, ...

And we wait. Every time a request is made to the server there is a flurry of activity. Usually between 100kB to 1MB of traces; a lot happens when WordPress serves a page.

Finally, a request comes in and the terminal ends with the previous polling loop. There are 168kB to investigate. Lets take a look at some of the more interesting pieces.

The FastCGI headers tell us about the request itself:

read(4, "\f\0QUERY_STRING\16\4REQUEST_METHODPOST\f\10CONTENT_TYPEtext/xml\16\3CONTENT_LENGTH252\0170
SCRIPT_FILENAME/srv/example.com/www/httpdocs/xmlrpc.php\v\vSCRIPT_NAME/xmlrpc.php\v\vREQUEST_URI/xml
rpc.php\f\vDOCUMENT_URI/xmlrpc.php\r%DOCUMENT_ROOT/srv/example.com/www/httpdocs\17\10SERVER_PROTOCOL
HTTP/1.0\21\7GATEWAY_INTERFACECGI/1.1\17\vSERVER_SOFTWAREnginx/1.4.6\v\17REMOTE_ADDR146.185.251.102\
v\5REMOTE_PORT59263\v\rSERVER_ADDR198.74.49.113\v\2SERVER_PORT80\v\23SERVER_NAMEexample.com\17\3REDI
RECT_STATUS200\0170SCRIPT_FILENAME/srv/example.com/www/httpdocs/xmlrpc.php\t\23HTTP_HOSTexample.com\
21\10HTTP_CONTENT_TYPEtext/xml\23\3HTTP_CONTENT_LENGTH252\0172HTTP_USER_AGENTMozilla/4.0 (compatible
: MSIE 7.0; Windows NT 6.0)\17\5HTTP_CONNECTIONclose\0\0", 720) = 720

This request is a POST to http://example.com/xmlrpc.php.

Finding the body will tell us about the XML-RPC request contained within:

read(4, "<?xmlversion=\"1.0\"?><methodCall><methodName>pingback.ping</me
thodName><params><param><value><string>http://www.remote.net</string></
value></param><param><value><string>http://example.com/somepage</string
></value></param></params></methodCall>", 252) = 252

This is a pingback; http://www.remote.net is kindly letting us know that it has linked to http://example.com/somepage. As part of the protocol, we then open that URL and look for the link to verify that this pingback is accurate.

Lets go find that!

connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("12.34.56.78")}, 16) = -1
EINPROGRESS (Operation now in progress)
poll([{fd=7, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 10000) = 1 ([{fd=7, revents=POLLOUT}])
getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(7, "GET / HTTP/1.0\r\nHost: www.remote.net\r\nUser-agent: WordPress/3.9.1; http://example.com
; verifying pingback from 146.185.251.102\r\nX-Pingback-Forwarded-For: 146.185.251.102\r\n\r\n", 182
, MSG_DONTWAIT, NULL, 0) = 182
recvfrom(7, "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\nContent-Length:   894\nPragma: no-cache\r
\nCache-control: no-store\r\n\r\n<html><body><script>var s=\"\",t=\"\",w,u=new Array(),v;function q(
r){for(w=0;w<v.length;w++)u[w]=v.charCodeAt(w);w=\"w=52;while(true){if(w<3)break;u[w]=((((u[w]^180)-
58)&0xff)+226)&0xff;w--;}\";eval(w);w=\"w=53;do{u[w]=((((((u[w]+u[w-1])&0xff)<<1)&0xff)|(((u[w]+u[w-
1])&0xff)>>7))>>6)|(((((((u[w]+u[w-1])&0xff)<<1)&0xff)|(((u[w]+u[w-1])&0xff)>>7))<<2)&0xff);}while(-
-w>=4);\";eval(w);w=\"w=51;for(;;w--){if(w<2)break;u[w]=(~(((((~u[w])&0xff)<<5)&0xff)|(((~u[w])&0xff
)>>3)))&0xff;}\";eval(w);v=\"\";for(w=1;w<u.length-1;w++)if(w%5)v+=String.fromCharCode(u[w]^r);w=eva
l;w(v);}v=\"\\xa8Z\\x22\\xc63\\xcf\\xc3\\x27\\x94fQ\\x9cd\\x97&(\\xc6\\xf0\\xcf\\x83\\xaf\\x13\\xe4\
\x1d\\xe6\\xd3\\x17\\xe7\\x11\\xeb\\xd1\\xef\\xc51\\xc6=6\\xf69\\x86\\xeb\\xe3\\x06Kt\\xec~\\xf0\\x0
0\\x89\\xadV\\x1b)\\xb8\";q(45);</script><script>var u=2;for(;u==1;u++);</script><br><br><br><center
><h3><p>\267\303\316\312\261\276\322\263\303\346\243\254\304\372\265\304\344\257\300\300\306\367\320
\350\322\252\326\247\263\326JavaScript</p></h3></center></body></html>", 8192, MSG_DONTWAIT, NULL, N
ULL) = 1003
poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 10000) = 0 (Timeout)
poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 10000

We open an HTTP request to the URL, and get back a block of obfuscated JavaScript. Then, the connection blocks forever.

My best guess is that there is a bug in WordPress' HTTP request model in this version. If the connection doesn't close, then WordPress hangs on it. When the remote server doesn't close the connection due to either (1) a bug, or (2) on purpose, we wait forever.

I can't tell which of those it is, but I can de-obfuscate that JavaScript (and if you swap out the eval for console.log (and run the code), you get window.location=qi+"/?jdfwkey=1kjv92"+ri). I have no idea what this is supposed to do.

At this point I have "solved" the problem by disabling XML-RPC on this WordPress install. I would love to keep poking at it to come to a fuller picture, but I'd rather keep this site up than satisfy my absolute curiosity.

(Hostnames and IPs have been changed to mask the identity of remote hosts, as I do not currently know where their allegiances lie.)

Posted . Categories: .