WordPress, and the Pingback of Death
The journey to discover why I couldn't keep a website up.
I host a number of websites for clients, friends, and family. A solid number of those are running WordPress.
I rarely suffer problems with them... except for one site. This site has been going down, and staying down, to the point that I routinely SSH in to forceably restart the locked-up PHP processes.
I've tried to fix it in the past to not avail. Previously I've migrated the site to a new host with an updated OS, tweakes a great many configuration settings all over the system and site, and very recently I've changed the server setup to match that of other high-traffic WordPress sites I host.
But the lockups have been increasing with frequency to the point where, today, the site would not stay up for more than 30 minutes before refusing connections.
Finally fed up, this is my journey to fix it.
I've been fighting this for quite a while with no traditional remedy in sight, so I immediately dragged out the big guns, and strace one of the locked processes (which prints out every system call a process makes):
# Find a PHP process to trace: $ ps -ax | grep php 15965 ? Ss 0:00 php-fpm: master process (/etc/php5/fpm/php-fpm.conf) 15968 ? S 0:02 php-fpm: pool www 15969 ? S 0:01 php-fpm: pool www 16990 pts/9 S+ 0:00 grep --color=auto php # Start the trace: $ strace -p 15969 Process 15969 attached restart_syscall(<... resuming interrupted call ...>) = 0 poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout) poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 10000) = 0 (Timeout) poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout) poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 10000) = 0 (Timeout) # This repeated every 10 seconds.
Something is stuck in a loop, polling file descriptor 7. Lets figure out what that is:
$ lsof -p 15969 php5-fpm 15969 www-data 0u unix /var/run/php5-fpm.sock php5-fpm 15969 www-data 1u CHR /dev/null php5-fpm 15969 www-data 2u CHR /dev/null php5-fpm 15969 www-data 3u REG /tmp/.ZendSem.IRXhFc (deleted) php5-fpm 15969 www-data 5w REG /var/log/php5-fpm/www.access.log php5-fpm 15969 www-data 7u IPv4 gould.mikeboers.com:57520->12.34.56.78:http (ESTABLISHED) php5-fpm 15969 www-data 9u 0000 anon_inode
The file descriptor is an IPv4 socket, connected to an HTTP server at 12.34.56.78
. Visiting that site yields 403 Forbidden
.
This lockup has been happening on a nearly regular schedule today, so lets get in front of it and just wait for it to happen again:
# We have restarted php5-fpm, and found the new PID of one of the workers. # We ask strace to output more of the strings it captures. $ strace -p 17304 -s 1024 accept(0, ...
And we wait. Every time a request is made to the server there is a flurry of activity. Usually between 100kB to 1MB of traces; a lot happens when WordPress serves a page.
Finally, a request comes in and the terminal ends with the previous polling loop. There are 168kB to investigate. Lets take a look at some of the more interesting pieces.
The FastCGI headers tell us about the request itself:
read(4, "\f\0QUERY_STRING\16\4REQUEST_METHODPOST\f\10CONTENT_TYPEtext/xml\16\3CONTENT_LENGTH252\0170 SCRIPT_FILENAME/srv/example.com/www/httpdocs/xmlrpc.php\v\vSCRIPT_NAME/xmlrpc.php\v\vREQUEST_URI/xml rpc.php\f\vDOCUMENT_URI/xmlrpc.php\r%DOCUMENT_ROOT/srv/example.com/www/httpdocs\17\10SERVER_PROTOCOL HTTP/1.0\21\7GATEWAY_INTERFACECGI/1.1\17\vSERVER_SOFTWAREnginx/1.4.6\v\17REMOTE_ADDR146.185.251.102\ v\5REMOTE_PORT59263\v\rSERVER_ADDR198.74.49.113\v\2SERVER_PORT80\v\23SERVER_NAMEexample.com\17\3REDI RECT_STATUS200\0170SCRIPT_FILENAME/srv/example.com/www/httpdocs/xmlrpc.php\t\23HTTP_HOSTexample.com\ 21\10HTTP_CONTENT_TYPEtext/xml\23\3HTTP_CONTENT_LENGTH252\0172HTTP_USER_AGENTMozilla/4.0 (compatible : MSIE 7.0; Windows NT 6.0)\17\5HTTP_CONNECTIONclose\0\0", 720) = 720
This request is a POST to http://example.com/xmlrpc.php
.
Finding the body will tell us about the XML-RPC request contained within:
read(4, "<?xmlversion=\"1.0\"?><methodCall><methodName>pingback.ping</me thodName><params><param><value><string>http://www.remote.net</string></ value></param><param><value><string>http://example.com/somepage</string ></value></param></params></methodCall>", 252) = 252
This is a pingback; http://www.remote.net
is kindly letting us know that it has linked to http://example.com/somepage
. As part of the protocol, we then open that URL and look for the link to verify that this pingback is accurate.
Lets go find that!
connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("12.34.56.78")}, 16) = -1 EINPROGRESS (Operation now in progress) poll([{fd=7, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 10000) = 1 ([{fd=7, revents=POLLOUT}]) getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 sendto(7, "GET / HTTP/1.0\r\nHost: www.remote.net\r\nUser-agent: WordPress/3.9.1; http://example.com ; verifying pingback from 146.185.251.102\r\nX-Pingback-Forwarded-For: 146.185.251.102\r\n\r\n", 182 , MSG_DONTWAIT, NULL, 0) = 182 recvfrom(7, "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\nContent-Length: 894\nPragma: no-cache\r \nCache-control: no-store\r\n\r\n<html><body><script>var s=\"\",t=\"\",w,u=new Array(),v;function q( r){for(w=0;w<v.length;w++)u[w]=v.charCodeAt(w);w=\"w=52;while(true){if(w<3)break;u[w]=((((u[w]^180)- 58)&0xff)+226)&0xff;w--;}\";eval(w);w=\"w=53;do{u[w]=((((((u[w]+u[w-1])&0xff)<<1)&0xff)|(((u[w]+u[w- 1])&0xff)>>7))>>6)|(((((((u[w]+u[w-1])&0xff)<<1)&0xff)|(((u[w]+u[w-1])&0xff)>>7))<<2)&0xff);}while(- -w>=4);\";eval(w);w=\"w=51;for(;;w--){if(w<2)break;u[w]=(~(((((~u[w])&0xff)<<5)&0xff)|(((~u[w])&0xff )>>3)))&0xff;}\";eval(w);v=\"\";for(w=1;w<u.length-1;w++)if(w%5)v+=String.fromCharCode(u[w]^r);w=eva l;w(v);}v=\"\\xa8Z\\x22\\xc63\\xcf\\xc3\\x27\\x94fQ\\x9cd\\x97&(\\xc6\\xf0\\xcf\\x83\\xaf\\x13\\xe4\ \x1d\\xe6\\xd3\\x17\\xe7\\x11\\xeb\\xd1\\xef\\xc51\\xc6=6\\xf69\\x86\\xeb\\xe3\\x06Kt\\xec~\\xf0\\x0 0\\x89\\xadV\\x1b)\\xb8\";q(45);</script><script>var u=2;for(;u==1;u++);</script><br><br><br><center ><h3><p>\267\303\316\312\261\276\322\263\303\346\243\254\304\372\265\304\344\257\300\300\306\367\320 \350\322\252\326\247\263\326JavaScript</p></h3></center></body></html>", 8192, MSG_DONTWAIT, NULL, N ULL) = 1003 poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout) poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 10000) = 0 (Timeout) poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout) poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 10000
We open an HTTP request to the URL, and get back a block of obfuscated JavaScript. Then, the connection blocks forever.
My best guess is that there is a bug in WordPress' HTTP request model in this version. If the connection doesn't close, then WordPress hangs on it. When the remote server doesn't close the connection due to either (1) a bug, or (2) on purpose, we wait forever.
I can't tell which of those it is, but I can de-obfuscate that JavaScript (and if you swap out the eval
for console.log
(and run the code), you get window.location=qi+"/?jdfwkey=1kjv92"+ri
). I have no idea what this is supposed to do.
At this point I have "solved" the problem by disabling XML-RPC on this WordPress install. I would love to keep poking at it to come to a fuller picture, but I'd rather keep this site up than satisfy my absolute curiosity.
(Hostnames and IPs have been changed to mask the identity of remote hosts, as I do not currently know where their allegiances lie.)