The Oracle High Availability Services (OHAS) software stack appears to hang when a server housing CRS, ASM, RDBMS and an application reliant on the database is rebooted.
It is not unusual for a server's startup routine, as dictated by the contents of the /etc/rc3.d directory, to contain not only the OHAS components - installed by root.sh as part of the grid infrastructure installation - but also a step that starts an application running on the same server. Such a step is usually installed manually and configured to start after the OHAS components. It is also not uncommon for the application to check for the database with which it needs to communicate and, if the database is not yet available, to enter a spin/sleep cycle until it is.
Unfortunately, because OHAS must be able to recover from, for example, a dead process, it relies on the inittab facility to run the init.ohasd process as a 'respawn' entry. This is the process that ultimately starts the OHAS software stack and, depending on the configured services, initiates, amongst others, the listener, the ASM instance and the RDBMS instance. The entry for this process is located in the /etc/inittab file and, like the rc3.d entries, is also installed by root.sh as part of the grid infrastructure installation.
Field two of each entry in the /etc/inittab file states the run level(s) that the server must have reached in order for the entry to be executed. In the case of init.ohasd this is set to 3 and 5. Thus a 'chicken and egg' situation arises: the rc3.d entries cannot complete - in this example because the application is sleeping and waiting for the database to start - and the inittab entry - which ultimately starts the database - cannot begin because the rc3.d processes have not finished. The net result is an apparently hung startup.
EXAMPLE /etc/inittab ENTRY
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
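The entry above can be broken into its colon-separated fields (id, run levels, action, process) to confirm which run levels apply. The following is a minimal sketch only; the helper name inittab_field is hypothetical and is not part of any Oracle or operating system tooling.

```shell
# The example entry, held as a literal string (single quotes stop the shell
# from interpreting the redirections).
entry='h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null'

# inittab_field ENTRY N
# Hypothetical helper: print field N (1-4) of an inittab entry.
# Field 4 (the process) may itself contain colons, so take everything
# from the fourth field onwards in that case.
inittab_field() {
    case "$2" in
        4) printf '%s\n' "$1" | cut -d: -f4- ;;
        *) printf '%s\n' "$1" | cut -d: -f"$2" ;;
    esac
}

echo "id:         $(inittab_field "$entry" 1)"
echo "run levels: $(inittab_field "$entry" 2)"
echo "action:     $(inittab_field "$entry" 3)"
echo "process:    $(inittab_field "$entry" 4)"
```

Field two, 35, is read as the set of run levels 3 and 5, not as a single number.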
As the /etc/inittab file will not get processed until the /etc/rc3.d entries complete, no entry in the /etc/rc3.d directory may be allowed to sleep and wait for the OHAS software stack to appear. Instead, either remove the application startup completely from the rc3.d process and, if appropriate, start the application manually after the OHAS software stack is up, or rewrite the application startup routine so that, rather than waiting 'in line' as part of the rc3.d processes, it spawns another 'out of line' process, perhaps using the /usr/bin/at command and a suitable delay.
Either of the above solutions will ensure that the server reaches runlevel 3 unhindered, and so allows any dependent steps configured in /etc/inittab to begin.
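The 'out of line' approach might look something like the sketch below. It is illustrative only: the application path /opt/myapp/bin/start_app.sh, the five-minute delay and the function name schedule_app_start are all assumptions, and the right delay depends on how long the OHAS stack takes to start on a given server.

```shell
# Fragment of a hypothetical rc3.d start script.
# All three values below are assumptions for illustration and can be overridden.
APP_START=${APP_START:-/opt/myapp/bin/start_app.sh}   # hypothetical application start script
DELAY_MINUTES=${DELAY_MINUTES:-5}                     # assumed delay, tune per server
AT_CMD=${AT_CMD:-/usr/bin/at}

# Hand the start command to at(1) on standard input and return immediately,
# so the rc3.d sequence is not held up waiting for the database.
schedule_app_start() {
    echo "$APP_START" | "$AT_CMD" now + "$DELAY_MINUTES" minutes
}

# Only schedule the job when the script is invoked with 'start'.
case "${1:-}" in
    start) schedule_app_start ;;
esac
```

If the application still needs to poll for the database once the job runs, it can do so safely at that point, because the polling is no longer part of the rc3.d sequence.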