Understanding the effect of MQ persistence on disk performance

Recently I have been trying to determine what the impact of using MQ message persistence is on disk subsystem performance. There is alot of literature from IBM recommending ideal configurations to support MQ persistence, so I won’t turn this into a post that recommends ideal settings. What I did want to achieve though, was to get a better understanding of how MQ interacts with the disk subsystem when using message persistence, so that I could plan for capacity and make better recommendations for tuning on a production SAN environment.


The first behaviour I wanted to describe, was whether the use of MQ persistence creates random or sequential disk activity, on the assumption that random disk access to block locations on the disk would incur a greater time penalty than sequential access.

In order to setup a test environment, I installed MQ version 6 on winXP platform using the default settings. This test was setup using a custom java load harness to put/get messages from a default queue which had persistence enabled with DEFPSIST(YES). File and disk statistics were collected on a Win32 platform using filemon and diskmon respectively.

The following graphs depict the block size written over time (x axis) relative to the sector position (y axis) of the disk drive being monitored. I was trying to achieve a similar effect to the Solaris internals TAZ disk trace utility, albeit in a Windows environment. File stats such as number and size of write operations were collected using filemon. Transaction throughput was measured by dividing the total amount of transactions (1,000) by the total processing time. A number of different configurations were tested, but I chose to present a common setup which uses circular logging with single write, and a file page size of 16,384 x 4KB = 64MB per primary log file. The two variations I have presented are writes (MQPUT) only as well as writes (MQPUT) and reads (MQGET) concurrently. The latter being a more familiar situation for me in the workplace.
Scenario 1 MQ Mersistence
Scenario 2 MQ Mersistence

It can be determined from the previous test results that MQ accesses the disk in a sequential behaviour. On windows, depending on the concurrent load, it will tend to allocate 16 – 32 x 512B blocks ranging from 8 – 16KB in size, although smaller less frequent writes will favour smaller 8 x 512B blocks 4KB in size. As a result, a high number of disk write IOs are observed.

The environment used to test was virtual (parallels), so limited disk cacacity was available, however the change in amplitude for trans/sec is considered relative. As a result, over an 80% reduction in throughput was observed when reading (MQGET) and writing (MQPUT) to the same queue. This will have significant impact on disk subsystem performance in a production environment.

I was unable to perform this test on a Solaris environment due to lack of privileged access, so would be keen to see what the results are there. I believe Solaris uses the same 512B block size, so am expecting similar results. Although analysing an MQ server’s impact on disk performance using sar, produces the following information:
sar -d 1 10 | egrep 'md72'
md72 60 0.7 1867 16749 0.0 0.4
md72 58 0.6 1955 17272 0.0 0.3
md72 52 0.6 1768 15619 0.0 0.3
md72 52 0.6 1764 15577 0.0 0.3
md72 54 0.6 1817 16045 0.0 0.3
md72 56 0.6 1861 16447 0.0 0.3

From that information I can then infer that the total reads + writes/sec (r+w/s) divided by the number of 512B blocks per second (blk/s) would give me an average size of around 8-9KB/sec being written to the filesystem, which correlates my assumptions on block/file sizes. Using sar and other tools like iostat though is thought to be fraught with danger, especially when monitoring a rather complex SAN subsystem; but without access to any other tools, I’m stuck with the basics. To date I’ve been relying on perceived throughput (Mr+Mw/s) reported by iostat -xM

Worth noting, is that when MQ performs an MQGET on a persistent queue, it also causes a write IO to update the active log file, so the number of total write IOs effectively doubles.

Out of all this I’m working on the following assumptions:
1. MQ uses sequential disk access for persistence logging.
2. The OS is typically allocating between 16 – 32 x 512B blocks ranging between 8 – 16KB in size.
3. MQGETs are a contributor to overall writes (I’m assuming o remove the transactions from the log files)
4. Transaction throughput is significantly reduced when an application is reading (MQGET) and writing (MQPUT) to the same queue.

Attached below is a copy of the qm.ini I am working with, which is based on recommended performance tuning considerations from IBM
#*******************************************************************#
#* Module Name: qm.ini *#
#*******************************************************************#
#ExitPath:
ExitsDefaultPath=/var/mqm/exits/
ExitsDefaultPath64=/var/mqm/exits64/
#Log:
LogPrimaryFiles=15
LogSecondaryFiles=15
LogFilePages=16384
LogType=CIRCULAR
LogBufferPages=512
LogPath=/var/mqm/log/QUEUEMANGERNAME/
LogWriteIntegrity=SingleWrite
#Service:
Name=AuthorizationService
EntryPoints=13
#ServiceComponent:
Service=AuthorizationService
Name=MQSeries.UNIX.auth.service
Module=/opt/mqm/lib64/amqzfu
ComponentDataSize=0
#Channels:
MaxChannels=400
MaxActiveChannels=400
#TCP:
KeepAlive=Yes
#TuningParameters:
DefaultQBufferSize=1048576
DefaultPQBufferSize=1048576

LogFilePages is set to the maximum 16384 x 4KB making each S0000[n].log file about 64MB in size. You must set this parameter as part of your Queue Manager creation i.e. crtmqm -lf 16384 Hopefully that decreases the frequency MQ needs to loop around the ring of log files. LogPrimaryFiles and LogSecondary files are set to 15, and I haven’t run out of space in my circular log files yet. It is worth reading up on the MQ system administration manual for this, in terms of planning how much space you need. As a planning figure I add the size of MQPUT messages with DEFPSIST + 750B overhead as IBM state, then multiply that amount by my expected throughput. Checkpoint behaviour as part of circular logging activity should hopefully release space as it checkpoints (once every 10,000 transactions I believe), so if you plan properly, you shouldn’t hit the tail end of your ring too soon :)

The DefaultQBufferSize is 64KB which I increased to 1MB (1048576). Apparently this increases the size of the buffer before writing to the file system, but I haven’t seen the evidence of that with the tools I’m using to monitor disk writes. This area is a little cloudy for me.

I have also modified the LogBufferPages to 512 x 4KB giving me a 2MB buffer for similar reasons.

And finally, I’ve been using a combination of either MQ, a custom perl script, or EMC iorate to benchmark message persistence on different mounts/filesystems etc which has been useful. Especially when you don’t have one or the other available, and you need to determine if the space you’ve been allocated is up to the task of supporting MQ persistence.

In order to truss the process on a Solaris 9 environment, we first had to establish which pid was writing to the log mount. Our sysadmin was able to do this for me with the following:

# fuser -cu /var/mqm/log
/var/mqm/log: 27306c(root) 27197c(root) 19735o(sr123456)

In this case the sr user was the account under which mqm was running, so we were then able to determine the corresponding process name:

# top -s sr123456
2756 sr123456 11 59 0 0K 0K sleep 313:05 12.08% amqzlaa0_nd
20162 sr123456 10 59 0 0K 0K cpu45 406:47 9.49% amqrmppa
19735 sr123456 6 59 0 0K 0K sleep 718:45 3.47% amqzmuc0
19742 sr123456 14 59 0 0K 0K sleep 22:22 0.00% amqzlaa0_nd
19744 sr123456 1 59 0 0K 0K sleep 1:36 0.00% amqpcsea
19752 sr123456 3 59 0 0K 0K sleep 0:16 0.00% runmqlsr_nd
19734 sr123456 2 59 0 0K 0K sleep 0:10 0.00% amqzfuma
19733 sr123456 14 59 0 0K 0K sleep 0:06 0.00% amqzxma0_nd
19751 sr123456 1 59 0 0K 0K sleep 0:00 0.00% runmqtrm
19749 sr123456 1 59 0 0K 0K sleep 0:00 0.00% runmqtrm
19739 sr123456 4 59 0 0K 0K sleep 0:00 0.00% amqzmgr0
19746 sr123456 1 59 0 0K 0K sleep 0:00 0.00% runmqtrm
19747 sr123456 1 59 0 0K 0K sleep 0:00 0.00% runmqtrm
19737 sr123456 2 59 0 0K 0K sleep 0:00 0.00% amqrrmfa
19738 sr123456 2 59 0 0K 0K sleep 0:00 0.00% amqzdmaa

So in this case, the process named amqzmuc0 was the process performing the logging.

For reference, here is a list of processes and what they typically control for MQ:

1. RUNMQLSR - MQ TCP listener (multi-threaded)
2. AMQCLMAA - MQ TCP listener (single-threaded)
3. AMQRMPPA - Channel process pooling job
4. RUNMQCHI - MQ channel initiator
5. AMQCRSTA - MQ receiving MCA jobs
6. RUNMQCHL - MQ sending MCA jobs
7. AMQCRS6B - LU62 receiver channel
8. AMQPCSEA - MQ command server
9. RUNMQTRM - Application trigger monitor
10. RUNMQDLQ - Dead letter queue handler
11. AMQFCXBA - MQ Broker Worker Job
12. RUNMQBRK - MQ Broker Control Job
13. AMQZMUC0 - MQ Utility Manager
14. AMQZMUR0 - MQ Utility Manager
15. AMQZMGR0 - MQ Process Controller
16. AMQRRMFA - MQ cluster repository manager
17. AMQZDMAA - MQ deferred message manager
18. AMQALMPX - MQ Log Manager
19. AMQZFUMA - MQ Object Authority Manager
20. AMQZLAS0 - MQ Local Queue Manager agents
21. AMQZLAA0 - MQ Local Queue Manager agents
22. AMQZXMA0 - MQ Execution Controller

With this info at hand we could then truss the process as per the following:

# truss -D -p 19735

Which produces the following output:

12677624.0012 lseek(18, 2543616, SEEK_SET) = 2543616
12677624.0017 write(18, "0FEE0315\001\0\0\0D6A4 f".., 4096) = 4096
12677624.0018 lwp_cond_broadcast(0xFFFFFFFF7B46CAC0) = 0
12677624.0018 lwp_mutex_wakeup(0xFFFFFFFF7B46CAA8) = 0
12677624.0019 lseek(18, 2547712, SEEK_SET) = 2547712
12677624.0023 write(18, "0FEE02FD\001\0\0\0D6A4 f".., 4096) = 4096
12677624.0024 lwp_cond_broadcast(0xFFFFFFFF7B46C3D8) = 0
12677624.0024 lwp_mutex_wakeup(0xFFFFFFFF7B46C3C0) = 0
12677624.0024 lwp_cond_broadcast(0xFFFFFFFF7B46CCE0) = 0
12677624.0025 lwp_mutex_wakeup(0xFFFFFFFF7B46CCC8) = 0
12677624.0025 lwp_cond_broadcast(0xFFFFFFFF7B46BE88) = 0
12677624.0026 lwp_mutex_wakeup(0xFFFFFFFF7B46BE70) = 0
12677624.0026 lseek(18, 2551808, SEEK_SET) = 2551808
12677624.0030 write(18, "\tD10206\001\0\0\0D6A4 f".., 4096) = 4096

What we can see from this test case is that Solaris is writing 4KB sized blocks of data to the file system in support of MQ persistent logging. I would prefer a higher size (say 8 – 16KB) as the SAN under test would be more efficient and capable of higher throughput, but I guess that is a limitation of the size of messages I am writing (185B per message) and the way Solaris breaks up the IO.

There is a comment from an IBM performance tuning guide for message persistence that states:It is unlikely that poor persistent message throughput will be attributed to the 2MB limit of the queue manager log. It is possible to fill and empty the log buffer several times each second and reach a CPU limit writing data into the log buffer, before a log disk bandwidth limit is reached.

In this, they are referring to the LogBufferPages parameter which I have increased to its maximum configurable size of 512 x 4K pages = 2MB. At this point in time I am still working with the sysadmin in an effort to prove this statement provided by IBM. For the time being we are investigating the seemingly high number of syscalls and context switching which is evident from vmstat:

# vmstat 1
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m0 m1 m3 m4 in sy cs us sy id
0 1 0 27901544 13821744 0 6 8 0 0 0 0 0 0 0 0 11369 173717 83335 16 24 60
0 1 0 27901544 13821688 0 0 8 0 0 0 0 0 0 0 0 11201 175563 84711 14 20 66
0 1 0 27901536 13821648 635 4573 0 0 0 0 0 0 0 0 0 11841 179265 84000 14 26 59
0 0 0 27900496 13821472 847 6045 8 8 8 0 0 0 0 0 0 13003 178012 82634 17 25 58
2 0 0 27901552 13821528 0 0 8 0 0 0 0 0 0 0 0 10994 174223 83744 15 20 64
0 1 0 27901544 13821480 0 0 0 0 0 0 0 0 0 0 0 11210 175509 83986 15 18 66
0 1 0 27901544 13821440 0 0 8 0 0 0 0 0 0 0 0 11178 174778 83952 17 19 63

Another aspect I am investigating is the use of threads my the amqzmuc0 process. It would seem that for a given load amqzmuc0 is spawns around XX threads for that process. Looking at the prstat for that process confirms the following:

# prstat -mL -p 29314
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
29314 sr53186 0.4 0.7 - - - - 3.2 - 564 157 1K 0 amqzmuc0/3
29314 sr53186 0.0 0.0 - - - - 0.4 - 1 0 1 0 amqzmuc0/4
29314 sr53186 0.0 0.0 - - - - 0.0 - 0 0 0 0 amqzmuc0/6
29314 sr53186 0.0 0.0 - - - - 0.0 - 0 0 0 0 amqzmuc0/5
29314 sr53186 0.0 0.0 - - - - 100 - 0 0 0 0 amqzmuc0/2
29314 sr53186 0.0 0.0 - - - - 0.0 - 0 0 0 0 amqzmuc0/1

4 comments to Understanding the effect of MQ persistence on disk performance

  • Tim

    I found out another way to determine how big the block sizes are when writing to disk on a Solaris subsystem, but it requires root access or at least mqm access in order to run.

    Essentially, if I truss the MQ queue manager pid whilst under load (with persistence) it will produce this sort of output

    truss –D –p 0.0002 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0001 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0002 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0001 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0001 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0001 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0001 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0001 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0001 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0002 write(3, " A A A A A A A A A A A A".., 8192) = 8192
    0.0001 write(3, " A A A A A A A A A A A A".., 8192) = 8192

    But you do need root access. The 8KB observed above was from my own perl script which aims to simulate the effect of MQ persistence on a disk subsystem, without having to install MQ in the first place …

  • alex

    I’m trying to get MQ 6.0.2 on windows 2k3 with persistence on to give me 1000 msgs per sec (approx 3kb in size) but can’t seem to get more than 400-500 msgs per sec. My settings:
    64 MB log file sizes @ 10 primary , 10 secondary
    log buffer pages = 4096
    Max Handles: 1000
    Log:
    circular logging with sing write integrity
    Channels:
    max channels=1000
    pipeline length=2
    batch size=1000

    Any tweak thoughts would be helpful.

  • Tim

    First, I would make sure that your bottleneck is actually a server resource (as we might expect) by collecting some disk stats (and CPU/et/mem just to be safe) using perfmon.

    For disk, you might want to use a free tool like IOMeter simply to provide some rough benchmarks on what speed your disks are capable of reading/writing at… Particularly with disk IO on windows, look at avg read|writes/sec, avg. disk sec/read|write, disk queue length and also avg disk bytes reads|writes/sec.. Post back your results here if you’re keen.

    Now assuming you can’t change your disk subsytem to a faster array, the following recommendations are what I’d suggest in getting the most out of persistent messages:.

    1. Since you’re aiming to write 1,000 x 3KB + 750B per message per second, you’re looking at something between 3.5 – 4 MB/sec. This may be close to your disk subsystem limits depending on what that actually is. For example, on a beefed up SAN I’ve clocked over 40MB/sec, however on a clients production web server with a SATA RAID 5 array, they could only get 6MB/sec. On my Mac VM I can only get 2MB/sec :) So it really is important to balance the theoretical performance of your subsystems before going nuts with tuning parameters…

    2. Having said all that though, you have increased the LogFilePages=16384 which will make your individual log files bigger (approx 64MB). This will cut down on the rate MQ needs to open close files under load. I’ve seen this have a significant impact on performance. You should check date time stamps on these log files to see how quickly MQ is circling through those . I imagine at a target rate of 4MB/sec, you are probably cycling through at a rate of 1 log file every 16 seconds :-)

    3.I wasn’t aware you could use 4096 for LogBufferPages (I thought max was 512) ie. 512×4 = 2MB buffer max. So maybe just double check that setting is valid.

    4. Have you tried the following tuning parameters?
    DefaultQBufferSize=1048576
    DefaultPQBufferSize=1048576

    These may have an impact although can’t comment on their effectiveness.

    At the end though, you need to establish if it IS a disk performance bottleneck as you might expect. For what it’s worth, in the past I have run many different tests/configurations on clients systems regarding MQ persistence performance and have found that you can get very high rates.

    To illustrate, on a particular client’s subsytem I was getting 1,500 trans/sec for a single thread message producer using messgae persistence (around 300B per messsage). Simply by increasing concurrency, at around 30 threads we were able to double that throughput (it wasn’t necessarily SAN performance as we first thought, although we established that one thread had its maximums for other reasons). In the same environment, we also wrote to a SCSI disk and RAM (just for comparison) with a single thread, and got 300 trans/sec and 3,000+ trans/sec respectively. So no denying disk subsystem configuration has a huge effect on overall performance. Sometimes you can work around this with higher levels of concurrency.

    Hope that helps. Regards,
    Tim Koopmans

  • Steve

    Question

    How I can read the MQ logs to check if messages are persistent or not

    steve

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">