Identify the four steps you must perform to replace this flashdisk.

Last weekend, an Exadata storage server flashdisk entered the predictive failure state.
The flashdisk is used by the flashcache and has a griddisk which is a member of a normal
redundancy diskgroup.
Identify the four steps you must perform to replace this flashdisk.

A.
Identify the griddisk on the predictive failure flashdisk and drop it from the associated ASM
diskgroup

B.
Verify that the griddisk located on the predictive failure flashdisk has been successfully dropped
from the associated ASM diskgroup.

C.
Drop the flashcache on the cell and re-create it using all but the predictive failure flashdisk.

D.
Safely power off the cell containing the predictive failure flashdisk.

E.
Replace the predictive failure flashdisk.

F.
Power up the cell containing the replaced flashdisk and activate all griddisks.

G.
Drop the flashcache on the cell and re-create it using all flashdisks.

H.
Create a new griddisk on the replaced flashdisk.

I.
Add the griddisk back into the ASM diskgroup to which it belonged.

Explanation:
Note:
*Exadata tracks the number of media and other disk/flash failures (e.g. an I/O write failure
due to physical media damage). If there are too many of those, Exadata predicts that the device
will soon fail and takes it out of the system.
*Exadata Server, that runs on the storage cells, monitors disk health and performance. If the disk
performance degrades it can put it into proactive failure mode. It also monitors for predictive
failures based on the disk’s SMART (Self-monitoring, Analysis and Reporting Technology) data. In
both cases, the Exadata Server notifies XDMG to take those disks offline.
When a faulty disk is replaced on the storage cell, the Exadata Server will recreate all grid disks on
the new disk. It will then notify XDMG to bring those grid disks online or add them back to disk
groups, in case they were already dropped.
*ASM is a critical component of the Exadata software stack. It is also a bit different – compared to
non-Exadata environments. It still manages your disk groups, but builds those with grid disks. It
still takes care of disk errors, but also handles predictive disk failures. It doesn’t like external
redundancy and ACFS, but it makes the disk group smart scan capable.

Farooq Nafey

Answer should be B,D,E,F
The following happens under the Predictive Failure of a physical disk / flash disk:
“Oracle Exadata Storage Server Software drops the grid disks on
the affected physical disk without the FORCE option from Oracle
ASM, and the rebalance operation copies the data on the
affected physical disk to other disks.
After all grid disks have been successfully removed from their
respective Oracle ASM disk groups, administrators can proceed
with disk replacement.”

Therefore, Step A is not required. You need to perform Step B before proceeding with D,E,F. Once the storage server is brought back online, grid disks, flash cache and flash log are automatically created and grid disks are put back in the disk groups to which they originally belonged. Therefore, Steps G,H,I would be redundant too.
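
As a rough sketch of the check in Step B, queries along these lines could be run from an ASM instance. The views V$ASM_DISK and GV$ASM_OPERATION are standard, but the path pattern '%FD_03_cell01%' is a hypothetical grid disk name used only for illustration; substitute the grid disk that sits on the affected flashdisk.

```sql
-- 1. Is a rebalance still running? No rows returned means no active operation.
SELECT inst_id, group_number, operation, state, est_minutes
  FROM gv$asm_operation;

-- 2. What does ASM report for the grid disk on the failing flashdisk?
--    '%FD_03_cell01%' is a hypothetical name; adjust it for your cell.
SELECT group_number, name, path, header_status, mode_status, state
  FROM v$asm_disk
 WHERE path LIKE '%FD_03_cell01%';
```

A disk still shown with STATE = DROPPING has not finished leaving the diskgroup. Once the drop and the rebalance complete, the disk typically appears with GROUP_NUMBER 0 and HEADER_STATUS FORMER, and only then is it safe to continue with D, E and F.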

mat

ADEF - We need to drop the griddisk from the diskgroup manually

Syed Jaffer Hussain

Farooq is right, A is not the correct answer. Excerpt from the manual:

If the flash disk is used for grid disks, then the Oracle ASM disks associated with these grid disks are automatically dropped with the FORCE option from the Oracle ASM disk group, and an Oracle ASM rebalance starts to restore the data redundancy.

http://docs.oracle.com/cd/E50790_01/doc/doc.121/e51951/storage.htm#DBMMN21125

Order:
Shutdown the cell
Replace the failed flash disk
Power up the cell
Verify that all grid disks are back online (LIST GRIDDISK ATTRIBUTES name, asmmodestatus)

The answer should be BDEF
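
The CellCLI check in the last step reports the cell's view of the grid disks. A possible ASM-side counterpart is sketched below, assuming the failure group is named after the cell ('CELL01' is a hypothetical name, not taken from the question):

```sql
-- Diskgroups that still have offline disks (expect 0 once recovery is done).
SELECT name, state, offline_disks
  FROM v$asm_diskgroup;

-- Status of every ASM disk served by the cell.
-- 'CELL01' is a hypothetical failgroup name; adjust it for your environment.
SELECT dg.name AS diskgroup_name, d.name AS disk_name, d.path,
       d.mode_status, d.state
  FROM v$asm_disk d
  JOIN v$asm_diskgroup dg ON dg.group_number = d.group_number
 WHERE d.failgroup = 'CELL01'
 ORDER BY dg.name, d.name;
```

Any disk not reported with MODE_STATUS = ONLINE, or any diskgroup with a non-zero OFFLINE_DISKS count, suggests the cell's disks have not all rejoined yet.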

Syed Jaffer Hussain

Add on:

Removing Flash Disk Due to Bad Performance
If the flash disk is used for grid disks, then use the following command to direct Oracle ASM to stop using the bad disk at once:

SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name FORCE;

The question was about predictive failure, not bad performance, so answer A is not correct.