Oracle Technologies Blog

By ASKM

11gR2 – RAC Shared Storage Preparation(OCFS) – Part2

Posted by Srikrishna Murthy Annam on March 19, 2010

RAC Storage Options:

  1. ASM Storage
  2. OCFS (Release 1 or 2)
  3. NFS (NAS or SAN)
  4. Raw Devices
  5. Third-party cluster filesystems such as GPFS or Veritas

OCFS Storage:

Partition the Disks
In order to use OCFS2, you must have an unused disk partition available. This article describes how to create the partition that will be used for OCFS2.
Consider an empty SCSI disk, /dev/sdc, that is configured as shared storage and is visible to all nodes.
Note: you can run the “/sbin/sfdisk -s” command as root to list all the disks.
In this example we will use /dev/sdc for OCFS2, creating a single partition for the entire disk (10 GB).
As root on Node1, run the following command:

# fdisk /dev/sdc
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only, until you decide to write them. After that, of course, the previous content won’t be recoverable.

The number of cylinders for this disk is set to 1305.

There is nothing wrong with that, but this is larger than 1024, and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

(e.g., DOS FDISK, OS/2 FDISK)

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1): <enter>
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1305, default 1305): <enter>
Using default value 1305
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

Syncing disks.

Now verify the new partition:

# fdisk -l /dev/sdc
Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1           1305    10482381   83  Linux

When finished partitioning, run the ‘partprobe’ command as root on each of the remaining cluster nodes to ensure they pick up the new partition table:
# partprobe
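
If you prefer a non-interactive approach, roughly the same single full-disk partition of type 83 (Linux) can be created with sfdisk. This is only a sketch: it assumes /dev/sdc is still empty, and older sfdisk releases may need the --force flag to accept the input.

# echo ',,83' | sfdisk /dev/sdc
# partprobe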

Oracle Cluster File System (OCFS) Release 2
OCFS2 is a general-purpose cluster file system that can be used to store Oracle Clusterware files, Oracle RAC database files, Oracle software, or any other types of files normally stored on a standard filesystem such as ext3.  This is a significant change from OCFS Release 1, which only supported Oracle Clusterware files and Oracle RAC database files.
OCFS2 is available free of charge from Oracle as a set of three RPMs: a kernel module, support tools, and a console. There are different kernel-module RPMs for each supported Linux kernel. In this exercise, OCFS2 is pre-installed for you on both nodes. Run the following command on all nodes to verify this; you should see three RPMs, as below.

# rpm -qa|grep ocfs
ocfs2-2.6.9-55.0.2.6.2.ELsmp-1.2.5-2
ocfs2-tools-1.2.7-1.el4
ocfs2console-1.2.7-1.el4

(Optionally, OCFS2 kernel modules may be downloaded from http://oss.oracle.com/projects/ocfs2/files/ and the tools and console may be downloaded from http://oss.oracle.com/projects/ocfs2-tools/files/.)
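
If a node is missing any of these packages, they can be installed from the downloaded RPMs in one step. The file names below are placeholders for illustration only; the kernel-module RPM must match the running kernel (uname -r) and architecture of each node.

# rpm -Uvh ocfs2-2.6.9-55.0.2.6.2.ELsmp-1.2.5-2.i686.rpm \
           ocfs2-tools-1.2.7-1.el4.i386.rpm \
           ocfs2console-1.2.7-1.el4.i386.rpm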

Configure OCFS2
You will need a graphical environment to run the OCFS2 console, so start a VNC server on Node1 as the oracle user.
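
A minimal sketch (assuming the vncserver package is installed; on first use it prompts you to choose a VNC password, and it prints the display it started on, e.g. ocvmrh2074:1):

$ vncserver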

Now, using vncviewer from your desktop, access the VNC session you just started on Node1 (ocvmrh2074:x). Sign on using the VNC password you set in the step above.
On the VNC desktop, left-click the mouse and select ‘Xterm’ to open a new window. In the new window, ‘su -’ to root:

$ su -
Password:
Run ocfs2console as root:
# ocfs2console

Select Cluster -> Configure Nodes.
Click on Close if you see the following Information message.

Note: This application is slow and may take a few seconds before the new window appears.

Click on Add in the next window, and enter the Name and IP Address of each node in the cluster.

Note: Use the same node name as is returned by the ‘hostname’ command, e.g. ocvmrh2074 (the short name, without the us.oracle.com domain).

Click Apply, then Close the window.

Once all of the nodes have been added, click on Cluster -> Propagate Configuration. This will copy the OCFS2 configuration file to each node in the cluster. You may be prompted for root passwords, as ocfs2console uses ssh to propagate the configuration file. Answer ‘yes’ if ssh prompts you to confirm the connection.

When you see ‘Finished!’, click on Close, and leave the OCFS2 console by clicking on File -> Quit.

After exiting ocfs2console, you will have an /etc/ocfs2/cluster.conf similar to the following on all nodes. This OCFS2 configuration file must be exactly the same on all of the nodes:

node:
        ip_port = 7777
        ip_address = 140.87.223.114
        number = 0
        name = ocvmrh2194
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 140.87.222.243
        number = 1
        name = ocvmrh2074
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2

If you don’t see this file on one of the nodes, follow the steps below as the root user to copy it to that node.

Working from the node where the file exists (node1 in this example), create the /etc/ocfs2 directory on the other node if it is missing, then copy the cluster.conf file across. You will be prompted for the root password.

# ssh ocvmrh2074 mkdir -p /etc/ocfs2
# scp /etc/ocfs2/cluster.conf ocvmrh2074:/etc/ocfs2/cluster.conf
Password:
cluster.conf                                  100%  240     0.2KB/s   00:00
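
To confirm that the copies really are identical, one quick cross-check (a sketch, assuming root ssh access from node1 to the other node) is to compare checksums; the two values should match exactly:

# md5sum /etc/ocfs2/cluster.conf
# ssh ocvmrh2074 md5sum /etc/ocfs2/cluster.conf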

Configure O2CB to Start on Boot and Adjust O2CB Heartbeat Threshold
You now need to configure the on-boot properties of the O2CB driver so that the cluster stack services start on each boot. You will also adjust the O2CB heartbeat threshold from its default setting of 31 to 601. All of the tasks in this section must be performed on both nodes in the cluster as the root user.
Set the on-boot properties as follows:

# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets (‘[]’).  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter “none” to clear) [ocfs2]: ocfs2
Specify heartbeat dead threshold (>=7) [31]: 601
Specify network idle timeout in ms (>=5000) [30000]: <enter>
Specify network keepalive delay in ms (>=1000) [2000]: <enter>
Specify network reconnect delay in ms (>=2000) [2000]: <enter>
Writing O2CB configuration: OK
Loading module “configfs”: OK
Mounting configfs filesystem at /config: OK
Loading module “ocfs2_nodemanager”: OK
Loading module “ocfs2_dlm”: OK
Loading module “ocfs2_dlmfs”: OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting O2CB cluster ocfs2: OK
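
You can also confirm that the cluster stack came up and is online with the status action of the same init script (an optional sanity check):

# /etc/init.d/o2cb status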

We can now check that the new heartbeat setting took effect for the O2CB cluster stack:

# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
601

The configured default shown above is 31, but what does this value represent? It is used in the formula below to determine the fence time (in seconds):
[fence time in seconds] = (O2CB_HEARTBEAT_THRESHOLD – 1) * 2
So, with the default O2CB heartbeat threshold of 31, we would have a fence time of:
(31 – 1) * 2 = 60 seconds
In our case we want a much longer fence time of 1200 seconds, so O2CB_HEARTBEAT_THRESHOLD was raised to 601:
(601 – 1) * 2 = 1200 seconds

It is important to note that 601 is an unusually large heartbeat threshold; it is used here to give the shared storage in this environment ample time to respond to heartbeat I/O, so that OCFS2 does not fence (panic) the node simply because a heartbeat write was slow.

Verify that ocfs2 and o2cb are started at boot time. Check this on both nodes as the root user:

# chkconfig --list | egrep "ocfs2|o2cb"
ocfs2           0:off   1:off   2:on    3:on    4:on    5:on    6:off
o2cb            0:off   1:off   2:on    3:on    4:on    5:on    6:off

If the output does not look like the above on both nodes, turn the services on with the following commands as root:

# chkconfig ocfs2 on
# chkconfig o2cb on

Create and format the OCFS2 filesystem on the unused disk partition
As root on each of the cluster nodes, create the mount point directory for the OCFS2 filesystem:

# mkdir /u03

Note: It is possible to format and mount the OCFS2 partitions using the ocfs2console GUI; however, this guide will use the command line utilities.

The example below creates an OCFS2 filesystem on the unused /dev/sdc1 partition with a volume label of “/u03” (-L /u03), a block size of 4K (-b 4K) and a cluster size of 32K (-C 32K) with 4 node slots (-N 4).  See the OCFS2 Users Guide for more information on mkfs.ocfs2 command line options.
Run the following command as root on node1 only:

# mkfs.ocfs2 -b 4K -C 32K -N 4 -L /u03 /dev/sdc1
mkfs.ocfs2 1.2.7
Filesystem label=/u03
Block size=4096 (bits=12)
Cluster size=32768 (bits=15)
Volume size=10733944832 (327574 clusters) (2620592 blocks)
11 cluster groups (tail covers 5014 clusters, rest cover 32256 clusters)
Journal size=67108864
Initial number of node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 2 block(s)
Formatting Journals: done
Writing lost+found: done
mkfs.ocfs2 successful
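
Before mounting, you can optionally verify from the other node that the new OCFS2 volume is detected on the shared disk. The mounted.ocfs2 utility is part of ocfs2-tools; detect mode is shown here as an optional check:

# mounted.ocfs2 -d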

Mount the OCFS2 filesystem
Since this filesystem will contain the Oracle Clusterware files and Oracle RAC database files, we must ensure that all I/O to these files uses direct I/O (O_DIRECT). Use the “datavolume” option whenever mounting the OCFS2 filesystem to enable direct I/O. Failure to do this can lead to data loss in the event of system failure. Mount the OCFS2 filesystem on all cluster nodes by running the following command as the root user on each node:

# mount -t ocfs2 -L /u03 -o datavolume /u03

Notice that the mount command uses the filesystem label (-L /u03) assigned during the creation of the filesystem. This is a handy way to refer to the filesystem without having to remember the device name.

To verify that the OCFS2 filesystem is mounted, issue the df command on both nodes:

# df /u03
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdc1             10482368    268736  10213632   3% /u03
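
To double-check that the datavolume option is actually in effect on each node, you can also grep the mount table, which lists the mounted volume together with its mount options (a simple cross-check):

# mount | grep ocfs2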

To automatically mount the OCFS2 filesystem at system boot, add a line similar to the one below to /etc/fstab on each cluster node:

LABEL=/u03   /u03    ocfs2   _netdev,datavolume 0 0
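
To test the fstab entry without rebooting, unmount the filesystem and remount it by mount point alone; mount will then read the device and options from /etc/fstab. Do this only while nothing is using /u03:

# umount /u03
# mount /u03
# df /u03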

Create the directories for shared files
As the root user, run the following commands on node1 only. Since /u03 is on a shared disk, files created from one node will be visible on the other nodes.

CRS files:
# mkdir /u03/oracrs
# chown oracle:oinstall /u03/oracrs
# chmod 775 /u03/oracrs
Database files:
# mkdir /u03/oradata
# chown oracle:oinstall /u03/oradata
# chmod 775 /u03/oradata
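
To confirm the shared visibility mentioned above, list the new directories from the other node; the ownership and permissions should match what was set on node1:

# ls -ld /u03/oracrs /u03/oradata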
