How to process thousands of WordPress posts without hitting or raising memory limits.

So you need to write a script that processes all the posts in your WordPress database. You don’t need the stupid WordPress loop because you’re not writing web-facing code; you might just need to fix some metadata on each post. But every time you iterate through your posts, whether you use the WordPress API or fetch the post IDs directly from MySQL and then call “get_post($id)”, your PHP interpreter dies after a few hundred posts because it has used up all the memory you’ve given it.

You Google it, and every other clueless PHP-WordPress noob who thinks of himself as a programmer gives you the “Raise your PHP memory limit” or “Do it in parts” answer.

WTF is it with these noobs?

You, a programmer used to real programming languages, start cursing at the frustration of having to get dirty with this awful language and the badly documented WordPress “API”. The noobs don’t understand that you might have to deal with tens of thousands of posts that couldn’t possibly fit in memory, and you know…

Is the solution to the problem to free memory?

The WordPress geniuses created this WP Object Cache, which is used by default whenever you invoke “get_post()” or other functions. The bastards didn’t bother to mention it in the function’s documentation, nor did they bother to add a little note on how to disable it in case you don’t need it. This is what happens when you get people who don’t think about all the possible uses of an API.

If you iterate through a list of IDs, invoke get_post($somePostId, 'OBJECT') on each one, and print how much memory is in use, you will see that get_post() keeps the posts in memory. If you read get_post() and dig further, you will see objects being cached in the in-memory WP Object Cache. A half-assed solution would be to invoke wp_cache_flush() every now and then:

[php]
// $db is assumed to be an open mysqli connection to the WordPress database.
$post_ids_cursor = mysqli_query($db, "SELECT ID FROM wp_posts WHERE post_status = 'publish' ORDER BY post_date DESC");
$n = 0;
$last_memory_usage = memory_get_usage();

while ($row = mysqli_fetch_row($post_ids_cursor)) {
    // this son of a bitch caches the post object.
    // nowhere in the WordPress documentation for the function does it say so
    // http://codex.wordpress.org/Function_Reference/get_post
    $post = get_post($row[0], 'OBJECT');

    $memory_usage = memory_get_usage();
    $delta_memory = $memory_usage - $last_memory_usage;
    $last_memory_usage = $memory_usage;

    echo "($n) " . $memory_usage . " ($delta_memory) \n";

    $n++;

    // flush the WP object cache every 100 posts, and let's see what happens.
    if ($n % 100 == 0) {
        wp_cache_flush();
        echo "Flush!\n";
    }
}
[/php]

[bash]
//N post – Memory used – Delta memory used
(0) 30254136 (13984) //start about 28.85MB
(1) 30262280 (8144)
(2) 30269592 (7312)
(3) 30277656 (8064)
(4) 30285720 (8064)
(5) 30293784 (8064)
(6) 30301848 (8064)
(7) 30309928 (8080)
(8) 30318056 (8128)
(9) 30326120 (8064)
(10) 30334184 (8064)

(93) 31054104 (8104)
(94) 31062168 (8064)
(95) 31070232 (8064)
(96) 31078344 (8112)
(97) 31086440 (8096)
(98) 31094552 (8112)
(99) 31102632 (8080) //already here at 29.66MB
Flush!
(100) 29816984 (-1285648) //bam we’ve freed 1.22MB with the flush call
[/bash]
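If you want to double-check the megabyte figures in that log, here is a small sketch that redoes the arithmetic. The helper name bytes_to_mb() is made up for this example (it is not a WordPress function), the three byte counts are copied from the log, and the 50,000-post figure is just a hypothetical to show the trend, not a measurement.

```php
<?php
// Hypothetical helper (not part of WordPress): bytes to megabytes (1 MB = 1024 * 1024 bytes).
function bytes_to_mb(int $bytes, int $precision = 2): float
{
    return round($bytes / (1024 * 1024), $precision);
}

// Byte counts copied from the log above.
$start        = 30254136; // post 0
$before_flush = 31102632; // post 99
$after_flush  = 29816984; // post 100, right after wp_cache_flush()

echo bytes_to_mb($start), " MB at the start\n";
echo bytes_to_mb($before_flush), " MB just before the flush\n";
echo bytes_to_mb($before_flush - $after_flush), " MB freed by the flush\n";

// Average growth per fetched post between post 0 and post 99.
$per_post = intdiv($before_flush - $start, 99);
echo $per_post, " bytes per post on average\n";

// Hypothetical projection: what 50,000 posts would cost at that rate with no flushing.
$hypothetical_posts = 50000;
echo bytes_to_mb($per_post * $hypothetical_posts), " MB for {$hypothetical_posts} posts\n";
```

Rounding the freed amount gives 1.23 where the log comment says 1.22; that is the same figure, just truncated instead of rounded. The per-post average only describes this particular run, so treat the projection as a rough order of magnitude.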

However, this solution is slow. WordPress will allocate and free memory it never needed, and it will check the cache with no luck every time you get a new object, which is always the case in a linear scan like the one we have to do to batch process our posts.

Luckily, someone on the WordPress team put in a way to disable caching, so the real solution is…
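To see why caching buys you nothing on a linear scan, here is a toy model. ToyCache and linear_scan() are made-up stand-ins written for this example; they are not WordPress’s WP_Object_Cache and they skip everything except the hit/miss bookkeeping. The sketch assumes PHP 7.4 or later because of the typed properties.

```php
<?php
// Toy stand-in for an in-memory object cache. NOT WordPress's WP_Object_Cache.
final class ToyCache
{
    private array $items = [];
    public int $hits = 0;
    public int $misses = 0;

    // Returns the cached value, or null on a miss.
    public function get(int $id)
    {
        if (array_key_exists($id, $this->items)) {
            $this->hits++;
            return $this->items[$id];
        }
        $this->misses++;
        return null;
    }

    public function add(int $id, $value): void
    {
        $this->items[$id] = $value;
    }

    public function flush(): void
    {
        $this->items = [];
    }

    public function size(): int
    {
        return count($this->items);
    }
}

// Visit every ID exactly once, flushing every $flush_every items.
// Returns the largest number of entries the cache ever held.
function linear_scan(ToyCache $cache, int $total, int $flush_every): int
{
    $peak = 0;
    for ($id = 1; $id <= $total; $id++) {
        if ($cache->get($id) === null) {
            $cache->add($id, "post $id"); // pretend this came from the database
        }
        $peak = max($peak, $cache->size());
        if ($id % $flush_every === 0) {
            $cache->flush();
        }
    }
    return $peak;
}

$cache = new ToyCache();
$peak  = linear_scan($cache, 1000, 100);
echo "hits: {$cache->hits}, misses: {$cache->misses}, peak entries: {$peak}\n";
// prints "hits: 0, misses: 1000, peak entries: 100"
```

Every ID is looked up exactly once, so every lookup is a miss. The cache only ever costs you memory and the work of filling and emptying it.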

To not use dynamic memory at all if you don’t need it

When you read the code of “wp_post_cache”, you will see that every time $wp_object_cache->add() is invoked, it first checks whether caching has been suspended by calling a function named “wp_suspend_cache_addition()”:

[php]
function add( $key, $data, $group = 'default', $expire = '' ) {
    if ( wp_suspend_cache_addition() )
        return false;

[/php]

This function can be used to turn off the freaking caching. This way you can iterate much faster through all your posts: each object fetched from the database lives only in your loop’s local variable, and with nothing being added to the cache there is nothing to flush, so your processing will be much, much faster.

This is how you turn it off:

[php]
wp_suspend_cache_addition(true);
[/php]
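If you want something a bit more complete than the one-liner, here is a minimal sketch of a batch loop built around it. The function name process_posts_without_cache() and its callback are made up for this example. The sketch assumes WordPress is already loaded when you call it, and that calling wp_suspend_cache_addition() with no argument just reports the current setting, which is how I read the function.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): run $callback on every post
 * in $post_ids with object-cache additions suspended, then restore the
 * previous setting. Returns the number of posts actually processed.
 */
function process_posts_without_cache(array $post_ids, callable $callback): int
{
    // With no argument the function only reports whether additions are suspended.
    $was_suspended = wp_suspend_cache_addition();
    wp_suspend_cache_addition(true);

    $processed = 0;
    try {
        foreach ($post_ids as $post_id) {
            $post = get_post((int) $post_id);
            if ($post === null) {
                continue; // the ID does not match an existing post
            }
            $callback($post);
            $processed++;
        }
    } finally {
        // Put caching back the way we found it, even if the callback throws.
        wp_suspend_cache_addition($was_suspended);
    }

    return $processed;
}
```

You could feed it the IDs from the same query used earlier, or from something like $wpdb->get_col("SELECT ID FROM {$wpdb->posts} WHERE post_status = 'publish'"). A flat list of integers is tiny compared to the post objects. As far as I can tell, the suspension only blocks cache additions, so code that writes to the cache some other way may still use memory. Keep printing memory_get_usage() while you test.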

Hope this helped you batch process your posts efficiently. Leave a tip on the way out if I saved your ass.

One thought on “How to process thousands of WordPress posts without hitting or raising memory limits.”

  1. Oh man!!! Thanks! I thought I was gonna go crazy reading those noob answers. I was looking for how to free up the memory but this cache disabling is genius! Thanks!!!
