Thursday, August 23, 2007

Oracle on ZFS and Snapshots

Testing Oracle on X4500 with ZFS Snapshots.

1) Set up a task to write millions of rows into a table.

2) While the task runs, take a full snapshot on host A, which is running Oracle (the snapshot is called @third). The snapshot is taken in one atomic operation for all filesystems under /oracle/KANAUSTS.

3) Send the full snapshot to host B.

4) Start Oracle on host B. (No errors encountered or recovery needed).

5) Perform select count(*) from table;

6) Shut down Oracle on host B.

7) On host A, perform select count(*) on table (2,140,000).

8) Immediately (a few seconds after #7), take another snapshot on host A (the snapshot is called @fourth).

9) Send the incremental differences between the @third and @fourth snapshots to host B. 70 minutes elapsed between the two snapshots, during which 1.3GB of data changed. Sending the incremental difference to host B took 73 seconds (17MB/sec).

10) Start Oracle on host B. (No errors encountered or recovery needed).

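11) Perform select count(*) from table: (2,150,000). The count is higher than in step 7, presumably because the insert task kept running in the few seconds before @fourth was taken.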
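
The steps above map onto a handful of zfs commands. The sketch below only prints them (a dry run), so it can be read and run without a ZFS host. The dataset name oracle/KANAUSTS, the host name hostB, and ssh as the transport are assumptions for illustration, not details taken from the test. A plain zfs send covers one dataset's snapshot, so child filesystems would be sent the same way, one by one.

```shell
#!/bin/sh
# Dry-run sketch: prints the zfs commands for the snapshot/send cycle.
# DATASET, REMOTE and the ssh transport are assumptions for illustration.
DATASET="oracle/KANAUSTS"
REMOTE="hostB"

# Steps 2 and 8: recursive snapshot of the dataset and its children.
snapshot_cmd() {
    echo "zfs snapshot -r ${DATASET}@$1"
}

# Step 3: full stream of one snapshot to the remote host.
full_send_cmd() {
    echo "zfs send ${DATASET}@$1 | ssh ${REMOTE} zfs receive ${DATASET}"
}

# Step 9: stream only the differences between two snapshots.
incr_send_cmd() {
    echo "zfs send -i ${DATASET}@$1 ${DATASET}@$2 | ssh ${REMOTE} zfs receive ${DATASET}"
}

snapshot_cmd third
full_send_cmd third
snapshot_cmd fourth
incr_send_cmd third fourth
```

Printing the commands rather than running them keeps the sketch harmless; to use it for real, the echoed lines would be run by hand or piped to a shell once the names have been adjusted.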

In our version of ZFS, the destination filesystem must be set read-only after the snapshot has been sent. If the destination changes, an incremental snapshot cannot be received. The filesystem can change just from an ls in it (a metadata change). If this happens, as it did in step 4 when we started the database, it is very easy to roll back to the previous snapshot. In this case I rolled back to @third. This can be done in one operation, e.g.: (# zfs rollback -r /oracle/KANAUSTS@third). Note that -r here destroys any snapshots more recent than the target; it does not recurse into child filesystems.

These issues are solved in subsequent versions of ZFS: zfs receive -F forcibly rolls back any changes on the destination, on the fly.
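
The two ways of keeping the destination receivable can be sketched as follows. This is again a dry run that only prints commands, and the dataset name oracle/KANAUSTS is an assumption. The helper function names are made up for the example.

```shell
#!/bin/sh
# Dry-run sketch: prints commands only. DATASET is an assumed name.
DATASET="oracle/KANAUSTS"

# Older ZFS: keep the destination unmodified between receives...
lock_cmd() {
    echo "zfs set readonly=on ${DATASET}"
}

# ...or, if it has changed, discard the changes before the next receive.
rollback_cmd() {
    echo "zfs rollback -r ${DATASET}@$1"
}

# Newer ZFS: let receive roll the destination back itself.
forced_receive_cmd() {
    echo "zfs receive -F ${DATASET}"
}

lock_cmd
rollback_cmd third
forced_receive_cmd
```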

Tuesday, August 21, 2007

DD Performance Testing

The following dd aliases read and write with an 8k block size (good for HDS arrays) and with a 1MB block size (good for other arrays such as an x4500).

alias ddr='time /usr/local/bin/dd if=test.out of=/dev/null bs=1024k count=10000'
alias ddr8='time /usr/local/bin/dd if=test.out of=/dev/null bs=8192 count=1220703'
alias ddr9='time /usr/local/bin/dd if=test.out of=/dev/null bs=8192 count=10000'
alias ddw='time /usr/local/bin/dd if=/dev/zero of=test.out bs=1024k count=10000'
alias ddw8='time /usr/local/bin/dd if=/dev/zero of=test.out bs=8192 count=1220703'


The output file is a 10GB file.
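
As a quick check of the sizes involved, block size times block count gives the bytes each alias moves. The helper name dd_bytes is made up for this sketch.

```shell
#!/bin/sh
# Bytes transferred by a dd invocation: block size times block count.
dd_bytes() {
    echo $(( $1 * $2 ))
}

echo "ddr/ddw   (bs=1024k, count=10000):  $(dd_bytes $((1024 * 1024)) 10000) bytes"
echo "ddr8/ddw8 (bs=8192, count=1220703): $(dd_bytes 8192 1220703) bytes"
echo "ddr9      (bs=8192, count=10000):   $(dd_bytes 8192 10000) bytes"
```

The 1MB and 8k variants both come out at roughly 10GB (10485760000 and 9999998976 bytes), while ddr9 reads only the first 81920000 bytes of the file.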

dd from coreutils reports the throughput rate automatically. The following example writes to an x4500 pool of 16 two-way mirrors, spread across 6 controllers.

root@sjcitthump1 # ddw
10485760000 bytes (10 GB) copied, 31.7485 s, 330 MB/s

real 0m31.753s
user 0m0.027s
sys 0m14.090s
root@sjcitthump1 # ddr
10485760000 bytes (10 GB) copied, 14.6932 s, 714 MB/s
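
For a dd that does not print a rate, the same figure can be derived from the byte count and the elapsed time. This is a minimal sketch, assuming awk is available; mb_per_sec is a made-up helper name, and it uses decimal megabytes to match the coreutils output above.

```shell
#!/bin/sh
# Throughput in decimal MB/s (1 MB = 1,000,000 bytes), rounded to an integer.
mb_per_sec() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

mb_per_sec 10485760000 31.7485   # write test, prints 330
mb_per_sec 10485760000 14.6932   # read test, prints 714
```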

Followup:

Make sure you use the same version of dd for every test. I got different results with the dd from the SunFreeware coreutils package (3x faster). The best approach is to use the native dd.
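
One way to see which dd a test is actually picking up is to ask the binary to identify itself. The sketch below assumes that the coreutils dd answers --version with a line mentioning coreutils and that other implementations reject the option; dd_flavor is a made-up helper name.

```shell
#!/bin/sh
# Report whether a given dd binary identifies itself as GNU coreutils.
# Assumption: non-coreutils dd implementations fail on --version.
dd_flavor() {
    if "$1" --version 2>/dev/null | grep -q coreutils; then
        echo "coreutils"
    else
        echo "other"
    fi
}

DD_PATH=$(command -v dd)
echo "${DD_PATH}: $(dd_flavor "${DD_PATH}")"
```

Passing an explicit path, for example dd_flavor /usr/local/bin/dd, checks the binary the aliases above point at rather than whichever dd comes first in PATH.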

VCS: Add/Remove Node From Cluster

Removing VCS Host:
  • Remove the system from /etc/llthosts. Rename the /etc/rc2.d scripts.
  • hastop -local
  • From another node in the cluster, run the hagrp -modify commands. Once the node no longer appears in any service group, you can remove it with hasys.
hagrp -modify casper SystemList -delete denitdb03
hasys -delete denitdb03

Adding VCS Host:

  • Be sure to update /etc/llthosts with the correct ID, in this case 3.
  • Use main.cmd as a reference to get the syntax right for an additional system.
hagrp -modify ClusterService SystemList -add denitdb07 3
hagrp -modify ClusterService AutoStartList -add denitdb07
hares -modify mnic Device ce0 "10.101.155.58" ce1 "10.101.155.58" -sys denitdb07
hares -modify mnic RouteOptions "default 10.101.155.1 0" -sys denitdb07
hagrp -modify casper SystemList -add denitdb07 3
hagrp -modify kanaint2 SystemList -add denitdb07 3
hagrp -modify kronos SystemList -add denitdb07 3
hagrp -modify livechat SystemList -add denitdb07 3
hagrp -modify seraph SystemList -add denitdb07 3
hagrp -modify shade SystemList -add denitdb07 3
haconf -dump -makero
hacf -verify .
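
The repeated SystemList lines can be generated with a loop instead of being typed one by one. The sketch below is a dry run that only prints the commands. The leading haconf -makerw is included on the assumption that the configuration starts out read-only; it is not part of the listing above. The variable and function names are made up for the example.

```shell
#!/bin/sh
# Dry-run sketch: prints the SystemList additions instead of running them.
# NODE, PRIORITY and SERVICE_GROUPS mirror the values used above.
NODE="denitdb07"
PRIORITY=3
SERVICE_GROUPS="ClusterService casper kanaint2 kronos livechat seraph shade"

add_node_cmds() {
    # Assumption: the cluster configuration is read-only and must be opened.
    echo "haconf -makerw"
    for group in ${SERVICE_GROUPS}; do
        echo "hagrp -modify ${group} SystemList -add ${NODE} ${PRIORITY}"
    done
    # Write the configuration to disk and make it read-only again.
    echo "haconf -dump -makero"
}

add_node_cmds
```

Reviewing the printed lines before running them gives a chance to catch a wrong node name or priority across all service groups at once.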