RabbitMQ (5): Work Queues

(using php-amqplib)

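In the previous example (hello_world) we wrote two programs: one to send messages and one to receive them.

In this example we will build a work queue that distributes time-consuming tasks among multiple consumers.

The main idea behind work queues (also called task queues) is to avoid running a resource-intensive task immediately and having to wait for it to finish before doing anything else. Instead, we schedule the task to run later.

We encapsulate a task as a message and send it to a queue. A worker process running in the background pops the message from the queue and executes the task.

If there are several consumers, the tasks are shared among them according to the dispatch policy.

This concept is especially useful in web applications, where the short window of an HTTP request leaves no room for complex tasks.

Preparation

In the previous example (hello_world) we sent a message containing "Hello World!". Now we will send strings that stand for complex tasks. We have no real-world task, such as images to resize or PDF files to render, so we fake one with the sleep() function. The number of dots in the string represents the task's complexity: every dot accounts for one second of work. For example, a task described as Hello... takes three seconds.

We will slightly modify the send.php code from the previous example so that arbitrary messages can be sent from the command line. This program schedules tasks to our work queue, so let's name it new_task.php: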

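// Join all command-line arguments after the script name into the message body.
$data = implode(' ', array_slice($argv, 1));
if (empty($data)) {
    $data = "Hello World!";
}
$msg = new AMQPMessage($data);

// Publish to the default exchange ('') with routing key 'hello'.
$channel->basic_publish($msg, '', 'hello');

echo ' [x] Sent ', $data, "\n";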
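
To see how the command-line arguments become the message body, the sketch below isolates that step in a small function. The name build_task_body is hypothetical and exists only for this illustration; it repeats the implode/array_slice logic shown above and needs no connection to RabbitMQ.

```php
<?php
// Hypothetical helper (not part of php-amqplib): joins the CLI arguments
// after the script name into one message body, with a default fallback.
function build_task_body(array $argv): string
{
    $data = implode(' ', array_slice($argv, 1));
    if (empty($data)) {
        $data = 'Hello World!';
    }
    return $data;
}

echo build_task_body(['new_task.php', 'First', 'message.']), "\n";
echo build_task_body(['new_task.php']), "\n";
```

Because the arguments are joined with spaces, quoting the message on the command line is optional. Note that empty() also treats the string "0" as empty, so a body of just 0 falls back to the default text.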

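Our old receive.php script also needs some changes: it has to fake one second of work for every dot in the message body. It will pop tasks from the queue and perform them, so let's call it worker.php: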

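$callback = function ($msg) {
    echo ' [x] Received ', $msg->body, "\n";
    // Simulate the work: sleep one second per dot in the message body.
    sleep(substr_count($msg->body, '.'));
    echo " [x] Done\n";
};

// Fourth argument (no_ack) is true: no acknowledgment is sent back to RabbitMQ.
$channel->basic_consume('hello', '', false, true, false, false, $callback);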

Note that our fake task only simulates execution time.
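
As a quick check of the dot-counting rule, the sketch below computes how long a worker would sleep for a few message bodies without actually sleeping. The function name simulated_seconds is made up for this example.

```php
<?php
// Hypothetical helper: the number of seconds the worker would sleep
// for a given message body (one second per dot).
function simulated_seconds(string $body): int
{
    return substr_count($body, '.');
}

foreach (['First message.', 'Third message...', 'no dots here'] as $body) {
    printf("%-18s => %d second(s)\n", $body, simulated_seconds($body));
}
```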

Run the two scripts:

# shell 1
php worker.php

# shell 2
php new_task.php "A very hard task which takes two seconds.."

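Round-robin dispatching

One advantage of using a task queue is the ability to parallelize work easily. If the backlog of tasks grows, we can simply add more consumers and scale that way.

First, let's run two worker.php scripts at the same time. Both will get messages from the queue, but how exactly? Let's see.

You need three consoles open. Two of them will run the worker.php script. These consoles are our two consumers, C1 and C2.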

# shell 1
php worker.php
# => [*] Waiting for messages. To exit press CTRL+C

# shell 2
php worker.php
# => [*] Waiting for messages. To exit press CTRL+C

In the third console we will publish new messages:

# shell 3
php new_task.php First message.
php new_task.php Second message..
php new_task.php Third message...
php new_task.php Fourth message....
php new_task.php Fifth message.....

Let's see how the tasks were dispatched:

# shell 1
php worker.php
# => [*] Waiting for messages. To exit press CTRL+C
# => [x] Received 'First message.'
# => [x] Received 'Third message...'
# => [x] Received 'Fifth message.....'

# shell 2
php worker.php
# => [*] Waiting for messages. To exit press CTRL+C
# => [x] Received 'Second message..'
# => [x] Received 'Fourth message....'

By default, RabbitMQ will send each message to the next consumer, in sequence. On average every consumer will get the same number of messages. This way of distributing messages is called round-robin. Try this out with three or more workers.
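
The distribution pattern can be reproduced with a toy model. The sketch below is plain PHP that does not talk to RabbitMQ; the function round_robin is a hypothetical name, and the model assumes that message i simply goes to consumer i modulo the number of consumers.

```php
<?php
// Toy model of round-robin dispatch: message i goes to consumer i mod n.
// This only illustrates the pattern; it is not how the broker is implemented.
function round_robin(array $messages, int $consumers): array
{
    $assigned = array_fill(0, $consumers, []);
    foreach (array_values($messages) as $i => $message) {
        $assigned[$i % $consumers][] = $message;
    }
    return $assigned;
}

$messages = ['First message.', 'Second message..', 'Third message...',
             'Fourth message....', 'Fifth message.....'];
$result = round_robin($messages, 2);
print_r($result);
```

With two consumers, index 0 receives the first, third and fifth messages and index 1 receives the second and fourth, matching the console output above. Changing the second argument to 3 shows how the split looks with three workers.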

Message acknowledgment

Doing a task can take a few seconds. You may wonder what happens if one of the consumers starts a long task and dies with it only partly done. With our current code, once RabbitMQ delivers a message to the consumer it immediately marks it for deletion. In this case, if you kill a worker we will lose the message it was just processing. We’ll also lose all the messages that were dispatched to this particular worker but were not yet handled.

But we don’t want to lose any tasks. If a worker dies, we’d like the task to be delivered to another worker.

In order to make sure a message is never lost, RabbitMQ supports message acknowledgments. An ack(nowledgement) is sent back by the consumer to tell RabbitMQ that a particular message has been received, processed and that RabbitMQ is free to delete it.

If a consumer dies (its channel is closed, connection is closed, or TCP connection is lost) without sending an ack, RabbitMQ will understand that a message wasn’t processed fully and will re-queue it. If there are other consumers online at the same time, it will then quickly redeliver it to another consumer. That way you can be sure that no message is lost, even if the workers occasionally die.

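RabbitMQ redelivers the message when the consumer dies, so it’s fine if processing a message takes a long time. Note that recent RabbitMQ releases do enforce a timeout on delivery acknowledgment (30 minutes by default), so very long tasks may need that limit raised.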
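
The requeue behaviour described above can be pictured with a small in-memory model. This is only an illustration under simplifying assumptions, not RabbitMQ's implementation: the class ToyQueue and its methods are invented for this sketch, and a consumer "dying" is modelled as an explicit method call.

```php
<?php
// Toy in-memory model of acknowledgments: delivered messages stay "unacked"
// until the consumer acks them, and go back on the queue if it dies first.
final class ToyQueue
{
    private array $ready = [];
    private array $unacked = []; // consumer name => list of messages

    public function publish(string $message): void
    {
        $this->ready[] = $message;
    }

    public function deliver(string $consumer): ?string
    {
        $message = array_shift($this->ready);
        if ($message !== null) {
            $this->unacked[$consumer][] = $message;
        }
        return $message;
    }

    public function ack(string $consumer, string $message): void
    {
        $index = array_search($message, $this->unacked[$consumer] ?? [], true);
        if ($index !== false) {
            array_splice($this->unacked[$consumer], $index, 1);
        }
    }

    public function consumerDied(string $consumer): void
    {
        // Requeue everything the dead consumer never acknowledged.
        $this->ready = array_merge($this->unacked[$consumer] ?? [], $this->ready);
        unset($this->unacked[$consumer]);
    }

    public function readyMessages(): array
    {
        return $this->ready;
    }
}

$queue = new ToyQueue();
$queue->publish('task A');
$queue->publish('task B');

$first = $queue->deliver('C1'); // C1 receives "task A"
$queue->ack('C1', $first);      // processed and acknowledged: gone for good
$queue->deliver('C1');          // C1 receives "task B" ...
$queue->consumerDied('C1');     // ... but dies before acknowledging it

print_r($queue->readyMessages());
```

After the run, only "task B" is back in the ready list: the acknowledged task is never redelivered, while the unacknowledged one is available for another consumer.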

So far we have turned message acknowledgments off ourselves by passing true as the fourth parameter (no_ack) to basic_consume. It’s time to turn them on by setting that parameter to false and sending a proper acknowledgment from the worker once we’re done with a task.

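$callback = function ($msg) {
    echo ' [x] Received ', $msg->body, "\n";
    sleep(substr_count($msg->body, '.'));
    echo " [x] Done\n";
    // Acknowledge the message on the channel that delivered it.
    $msg->delivery_info['channel']->basic_ack($msg->delivery_info['delivery_tag']);
};

// Fourth argument (no_ack) is now false, so the worker must acknowledge each message.
$channel->basic_consume('task_queue', '', false, false, false, false, $callback);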

Using this code we can be sure that even if you kill a worker using CTRL+C while it was processing a message, nothing will be lost. Soon after the worker dies all unacknowledged messages will be redelivered.

An acknowledgment must be sent on the same channel that received the delivery. Attempts to acknowledge using a different channel will result in a channel-level protocol exception. See the doc guide on confirmations to learn more.

Message durability

We have learned how to make sure that even if the consumer dies, the task isn’t lost. But our tasks will still be lost if the RabbitMQ server stops.

When RabbitMQ quits or crashes it will forget the queues and messages unless you tell it not to. Two things are required to make sure that messages aren’t lost: we need to mark both the queue and messages as durable.

First, we need to make sure that RabbitMQ will never lose our queue. To do so, we declare the queue as durable by passing true as the third parameter to queue_declare:

$channel->queue_declare('hello', false, true, false, false);

Although this command is correct by itself, it won’t work in our present setup. That’s because we’ve already defined a queue called hello which is not durable. RabbitMQ doesn’t allow you to redefine an existing queue with different parameters and will return an error to any program that tries to do that. But there is a quick workaround: let’s declare a queue with a different name, for example task_queue:

$channel->queue_declare('task_queue', false, true, false, false);

The durable flag must be set to true in both the producer and the consumer code.

At this point we’re sure that the task_queue queue won’t be lost even if RabbitMQ restarts. Now we need to mark our messages as persistent - by setting the delivery_mode = 2 message property which AMQPMessage takes as part of the property array.

$msg = new AMQPMessage(
    $data,
    array('delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT)
);

Fair dispatch

You might have noticed that the dispatching still doesn’t work exactly as we want. For example in a situation with two workers, when all odd messages are heavy and even messages are light, one worker will be constantly busy and the other one will do hardly any work. Well, RabbitMQ doesn’t know anything about that and will still dispatch messages evenly.

This happens because RabbitMQ just dispatches a message when the message enters the queue. It doesn’t look at the number of unacknowledged messages for a consumer. It just blindly dispatches every n-th message to the n-th consumer.

To avoid that, we can use the basic_qos method with the prefetch_count = 1 setting. This tells RabbitMQ not to give more than one message to a worker at a time. In other words, it won’t dispatch a new message to a worker until that worker has processed and acknowledged the previous one. Instead, it will dispatch the message to the next worker that is not busy.

// Arguments: prefetch_size, prefetch_count, global. Only prefetch_count is set here.
$channel->basic_qos(null, 1, null);
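
To get a feel for the difference, the sketch below compares the two strategies on the odd-heavy, even-light workload described above. It is a simplified offline model, not a measurement of RabbitMQ: all tasks are assumed to be queued up front, the function names are hypothetical, and "fair" dispatch is approximated by always handing the next task to the worker that becomes free first.

```php
<?php
// Toy comparison of the two dispatch strategies. Each number is a task
// duration in seconds; the result is the total busy time per worker.

// Round-robin: task i goes to worker i mod n, regardless of how busy it is.
function round_robin_load(array $durations, int $workers): array
{
    $busy = array_fill(0, $workers, 0);
    foreach (array_values($durations) as $i => $seconds) {
        $busy[$i % $workers] += $seconds;
    }
    return $busy;
}

// prefetch_count = 1: a worker gets its next task only after finishing the
// previous one, so each task goes to whichever worker becomes free first.
function fair_load(array $durations, int $workers): array
{
    $busy = array_fill(0, $workers, 0);
    foreach ($durations as $seconds) {
        $next = array_search(min($busy), $busy, true); // earliest-free worker
        $busy[$next] += $seconds;
    }
    return $busy;
}

$durations = [5, 1, 5, 1, 5, 1]; // odd tasks heavy, even tasks light

print_r(round_robin_load($durations, 2)); // worker totals: 15 and 3
print_r(fair_load($durations, 2));        // worker totals: 11 and 7
```

Under round-robin one worker ends up with all the heavy tasks, while the prefetch-style model spreads the load so the whole batch finishes sooner.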

Putting it all together

Final code of our new_task.php file:

<?php

require_once __DIR__ . '/vendor/autoload.php';
use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue (third argument true) so it survives a RabbitMQ restart.
$channel->queue_declare('task_queue', false, true, false, false);

// Join all command-line arguments after the script name into the message body.
$data = implode(' ', array_slice($argv, 1));
if (empty($data)) {
    $data = "Hello World!";
}
// Mark the message as persistent.
$msg = new AMQPMessage(
    $data,
    array('delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT)
);

// Publish to the default exchange ('') with routing key 'task_queue'.
$channel->basic_publish($msg, '', 'task_queue');

echo ' [x] Sent ', $data, "\n";

$channel->close();
$connection->close();


And our worker.php:

<?php

require_once __DIR__ . '/vendor/autoload.php';
use PhpAmqpLib\Connection\AMQPStreamConnection;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue (third argument true); must match the producer's declaration.
$channel->queue_declare('task_queue', false, true, false, false);

echo " [*] Waiting for messages. To exit press CTRL+C\n";

$callback = function ($msg) {
    echo ' [x] Received ', $msg->body, "\n";
    // Simulate the work: one second per dot in the message body.
    sleep(substr_count($msg->body, '.'));
    echo " [x] Done\n";
    // Acknowledge on the channel that delivered the message.
    $msg->delivery_info['channel']->basic_ack($msg->delivery_info['delivery_tag']);
};

// Deliver at most one unacknowledged message to this worker at a time.
$channel->basic_qos(null, 1, null);
// Fourth argument (no_ack) is false, so manual acknowledgments are required.
$channel->basic_consume('task_queue', '', false, false, false, false, $callback);

// Process deliveries for as long as the consumer is registered.
while (count($channel->callbacks)) {
    $channel->wait();
}

$channel->close();
$connection->close();


Using message acknowledgments and prefetch you can set up a work queue. The durability options let the tasks survive even if RabbitMQ is restarted.

Now we can move on to tutorial 3 and learn how to deliver the same message to many consumers.