You use Hyper-V server 2008 R2 and failover clustering to host several virtual machines
(VMs). You plan to perform a Volume Shadow Copy (VSS) backup of a Cluster Shared
Volume (CSV). You need to ensure that resources can continue to use the CSV during the
VSS backup. What should you do?
A.
Turn on maintenance mode for the CSV.
B.
Configure your VSS-aware backup utility as a generic application in failover clustering.
C.
Use Failover Cluster Manager to remove dependencies from your disk resources.
D.
Turn on redirected access for the CSV.
Cluster Shared Volumes (CSV) is a new feature implemented in Windows Server 2008 R2 to
assist with new scale-up/out scenarios. CSV provides a scalable, fault-tolerant solution for
clustered applications that require NTFS file system access from anywhere in the cluster. In
Windows Server 2008 R2, CSV is only supported for use by the Hyper-V role.
The purpose of this blog is to provide some basic troubleshooting steps that can be
executed to address CSV volumes that show a Redirected Access status in Failover Cluster
Manager. It is not my intention to cover the Cluster Shared Volumes feature. For more
information on Cluster Shared Volumes, consult TechNet.
Before diving into some troubleshooting techniques that can be used to resolve Redirected
Access issues on Cluster Shared Volumes, let’s list some of the basic requirements for CSV
as this may help resolve other issues not specifically related to Redirected Access.
· Disks that will be used in the CSV namespace must be MBR or GPT with an NTFS
partition.
· The drive letter for the system disk must be the same on all nodes in the cluster.
· The NTLM protocol must be enabled on all nodes in the cluster.
· Only the in-box cluster “Physical Disk” resource type can be added to the CSV
namespace. No third-party storage resource types are supported.
· Pass-through disk configurations cannot be used in the CSV namespace.
· All networks enabled for cluster communications must have Client for Microsoft Networks
and File and Printer Sharing for Microsoft Networks protocols enabled.
· All nodes in the cluster must share the same IP subnets between them as CSV network
traffic cannot be routed. For multi-site clusters, this means stretched VLANs must be used.
Let’s start off by looking at the CSV namespace in a Failover Cluster when all things appear
to be ‘normal.’ In Figure 1, all CSV volumes show Online in the Failover Cluster Management
interface.
Figure 1
Looking at a CSV volume from the perspective of a highly available Virtual Machine group
(Figure 2), the Virtual Machine is Online on one node of the cluster (R2-NODE1), while the
CSV volume hosting the Virtual Machine files is Online on another node (R2-NODE2) thus
demonstrating how CSV completely disassociates the Virtual Machine resources (Virtual
Machine; Virtual Machine Configuration) from the storage hosting them.
Figure 2
When all things are working normally (no backups in progress, etc…) in a Failover Cluster
with respect to CSV, the vast majority of all storage I/O is Direct I/O meaning each node
hosting a virtual machine(s) is writing directly (via Fibre Channel, iSCSI, or SAS connectivity)
to the CSV volume supporting the files associated with the virtual machine(s). A CSV volume
showing a Redirected Access status indicates that all I/O to that volume, from the
perspective of a particular node in the cluster, is being redirected over the CSV network to
another node in the cluster which still has direct access to the storage supporting the CSV
volume. This is, for all intents and purposes, a ‘recovery’ mode.
This functionality prevents the loss of all connectivity to storage. Instead, all storage related
I/O is redirected over the CSV network. This is very powerful technology as it prevents a
total loss of connectivity thereby allowing virtual machine workloads to continue functioning.
This provides the cluster administrator an opportunity to evaluate the situation and live
migrate workloads to other nodes in the cluster not experiencing connectivity issues. All this
happens behind the scenes without users knowing what is going on. The end result may be
slower performance (depending on the speed of the network interconnect, for example, 10
Gb vs. 1 Gb) since we are no longer using direct, local, block level access to storage. We
are, instead, using remote file system access via the network using SMB.
There are basically four reasons a CSV volume may be in Redirected Access mode.
· The user intentionally places the CSV volume in Redirected Access mode.
· There is a storage connectivity failure for a node, in which case all I/O is redirected over a
cluster network designated for CSV traffic to another node.
· A backup of a CSV volume is in progress or failed.
· An incompatible filter driver is installed on the node.
Let’s take a look at a CSV volume in Redirected Access mode (Figure 3).
Figure 3
When a CSV volume is placed in Redirected Access mode, a Warning message (Event ID
5136) is registered in the System Event log (Figure 4).
Figure 4
For additional information on event messages that pertain specifically to Cluster Shared
Volumes please consult TechNet.
Let’s look at each one of the four reasons I mentioned and propose some troubleshooting
steps that can help resolve the issue.
1. User intentionally places a CSV volume in Redirected Access mode:
Users are able to manually place a CSV volume in Redirected Access mode by selecting a
CSV volume, right-clicking the resource, selecting More Actions, and then selecting
Turn on redirected access for this Cluster shared volume (Figure 5).
Figure 5
Therefore, the first troubleshooting step should be to try turning off Redirected Access mode
in the Failover Cluster Management interface.
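The redirected state can also be read from PowerShell rather than the management interface. The following is a minimal sketch, assuming the FailoverClusters module that ships with Windows Server 2008 R2 and assuming each CSV object exposes a SharedVolumeInfo collection with FriendlyVolumeName, RedirectedAccess, and MaintenanceMode properties. Verify the property names on your own cluster before relying on them.

```powershell
# Load the failover clustering cmdlets (needed in a plain PowerShell session).
Import-Module FailoverClusters

# Emit one row per CSV volume showing whether redirected access or
# maintenance mode is currently turned on for it.
foreach ($csv in Get-ClusterSharedVolume) {
    foreach ($info in $csv.SharedVolumeInfo) {
        New-Object PSObject -Property @{
            Resource         = $csv.Name
            State            = $csv.State
            Volume           = $info.FriendlyVolumeName
            RedirectedAccess = $info.RedirectedAccess
            MaintenanceMode  = $info.MaintenanceMode
        }
    }
}
```

A volume reporting RedirectedAccess as True with no backup running and no storage errors in the event log is a good candidate for having been switched manually.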
2. There is a storage connectivity issue: When a node loses connectivity to attached storage
that is supporting a CSV volume, the cluster implements a recovery mode by redirecting
storage I/O to another node in the cluster over a network that CSV can use. The status of the
cluster Physical Disk resource associated with the CSV volume is Redirected Access and all
storage I/O for the associated virtual machine(s) being hosted on that volume is redirected
over the network to another node in the cluster that has direct access to the CSV volume.
This is by far the number one reason CSV volumes are placed in Redirected Access mode.
Troubleshoot this as you would any other loss of storage connectivity on a server. Involve
the storage vendor as needed. Since this is a cluster, the cluster validation process can also
be used as part of the troubleshooting process to test storage connectivity.
Look for the following event ID in the system event log.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 10/8/2010 6:16:39 PM
Event ID: 5121
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Node1.cluster.com
Description: Cluster Shared Volume ‘DATA-LUN1’ (‘DATA-LUN1’) is no longer directly
accessible from this cluster node. I/O access will be redirected to the storage device over the
network through the node that owns the volume. This may result in degraded performance. If
redirected access is turned on for this volume, please turn it off. If redirected access is
turned off, please troubleshoot this node’s connectivity to the storage device and I/O will
resume to a healthy state once connectivity to the storage device is reestablished.
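To look for this event without paging through Event Viewer, the System log can be filtered from PowerShell. This is a sketch, not a definitive query: the provider name is taken from the sample record above, the event IDs are the ones mentioned in this article (5136, 5121, 5125, 5126), and the 50-event limit is an arbitrary choice. Note that Get-WinEvent reports an error rather than returning an empty result when no matching events exist.

```powershell
# Pull recent CSV-related events from the System log on the local node.
# 5136: volume placed in redirected access
# 5121: volume no longer directly accessible from this node
# 5125 / 5126: incompatible file system / volume filter driver detected
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 5121, 5125, 5126, 5136
} -MaxEvents 50 |
    Sort-Object TimeCreated |
    Format-List TimeCreated, Id, LevelDisplayName, MachineName, Message
```

Run it on each node in turn (or add the -ComputerName parameter), since redirected access is reported from the perspective of an individual node.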
3. A backup of a CSV volume fails: When a backup is initiated on a CSV volume, the volume
is placed in Redirected Access mode. The type of backup being executed determines how
long a CSV volume stays in redirected mode. If a software backup is being executed, the
CSV volume remains in redirected mode until the backup completes.
If hardware snapshots are being used as part of the backup process, the amount of time a
CSV volume stays in redirected mode will be very short.
For a backup scenario, the CSV volume status is slightly modified. The status actually shows
as Backup in progress, Redirected Access (Figure 6) to allow you to better understand why
the volume was placed in Redirected Access mode. When the backup application completes
the backup of the volume, the cluster must be properly notified so the volume can be brought
out of redirected mode.
Figure 6
A couple of things can happen here. Before proceeding down this road, ensure a backup is
really not in progress.
The first thing that needs to be considered is that the backup completes but the application
did not properly notify the cluster that it completed so the volume can be brought out of
redirected mode. The proper call that needs to be made by the backup application
is ClusterClearBackupStateForSharedVolume, which is documented on MSDN. If that is the
case, you should be able to clear the Backup in progress, Redirected Access status by
simulating a failure on the CSV volume using the cluster PowerShell cmdlet
Test-ClusterResourceFailure.
Using the CSV volume shown in Figure 6, an example would be:
Test-ClusterResourceFailure "35 GB Disk"
If this clears the redirected status, then the backup application vendor needs to be notified
so they can fix their application.
The second consideration concerns a backup that fails, but the application did not properly
notify the cluster of the failure so the cluster still thinks the backup is in progress. If a backup
fails, and the failure occurs before a snapshot of the volume being backed up is created,
then the status of the CSV volume should be reset by itself after a 30-minute time delay. If,
however, during the backup, a software snapshot was actually created (assuming the
application creates software snapshots as part of the backup process), then we need to use
a slightly different approach. To determine if any volume shadow copies exist on a CSV
volume, use the vssadmin command line utility and run vssadmin list shadows (Figure 7).
Figure 7
Figure 7 shows there is a shadow copy that exists on the CSV volume that is in Redirected
Access mode. Use the vssadmin utility to delete the shadow copy (Figure 8). Once that
completes, the CSV volume should come Online normally. If not, change the Coordinator
node by moving the volume to another node in the cluster and verify the volume
comes Online.
Figure 8
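Since the figures are not reproduced here, the sequence can be sketched as commands run from an elevated PowerShell prompt. Treat this as an illustration under assumptions: the shadow copy GUID is a placeholder to be replaced with the Shadow Copy ID printed by the list command, and the resource name "35 GB Disk" and node name "R2-NODE1" are simply reused from earlier examples in this article.

```powershell
# 1. List existing shadow copies and note the "Shadow Copy ID" of the one
#    that lives on the CSV volume stuck in redirected mode.
vssadmin list shadows

# 2. Delete that shadow copy. The GUID below is a placeholder; paste the ID
#    reported in step 1. The argument is quoted so PowerShell does not try
#    to interpret the braces.
vssadmin delete shadows "/shadow={00000000-0000-0000-0000-000000000000}"

# 3. If the volume still does not come Online, change the coordinator node
#    by moving the CSV to another node in the cluster.
Import-Module FailoverClusters
Move-ClusterSharedVolume -Name "35 GB Disk" -Node "R2-NODE1"
```

vssadmin asks for confirmation before deleting, which is a useful safety net here. Make sure the ID belongs to the leftover backup snapshot and not to a shadow copy you want to keep.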
4. An incompatible filter driver is installed in the cluster: The last item in the list has to do with
filter drivers introduced by third-party application(s) that may be running on a cluster node
and are incompatible with CSV.
When these filter drivers are detected by the cluster, the CSV volume is placed in redirected
mode to help prevent potential data corruption on a CSV volume. When this occurs, an Event
ID 5125 Warning message is registered in the System Event Log. Here is a sample
message:
17416 06/23/2010 04:18:12 AM Warning <node_name> 5125 Microsoft-Windows-FailoverClusterin Cluster Shared Vol NT AUTHORITY\SYSTEM
Cluster Shared Volume ‘Volume2’ (‘Cluster Disk 6’) has identified one or more active filter
drivers on this device stack that could interfere with CSV operations. I/O access will be
redirected to the storage device over the network through another Cluster node. This may
result in degraded performance. Please contact the filter driver vendor to verify
interoperability with Cluster Shared Volumes.
Active filter drivers found: <filter_driver_1>,<filter_driver_2>,<filter_driver_3>
The cluster log will record warning messages similar to these:
7c8:088.06/10[06:26:07.394](000000) WARN[DCM] filter <filter_name> found at unsafe
altitude <altitude_numeric>
7c8:088.06/10[06:26:07.394](000000) WARN[DCM] filter <filter_name> found at unsafe
altitude <altitude_numeric>
7c8:088.06/10[06:26:07.394](000000) WARN[DCM] filter <filter_name> found at unsafe
altitude <altitude_numeric>
Event ID 5125 is specific to a file system filter driver. If, instead,
an incompatible volume filter driver were detected, an Event ID 5126 would be registered. For
more information on the difference between file and volume filter drivers, consult MSDN.
Note: Specific filter driver names and altitudes have been intentionally left out. The
information can be decoded by downloading the ‘File System Minifilter Allocated Altitudes’
spreadsheet posted on the Windows Hardware Developer Central public website.
Additionally, the fltmc.exe command line utility can be run to enumerate filter drivers. An
example is shown in Figure 9.
Figure 9
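As a rough illustration of how the filter drivers and the matching cluster log warnings might be gathered on a node, the commands below can be run from an elevated PowerShell prompt. The folder C:\Temp\ClusterLogs is a hypothetical destination chosen for this sketch, and the search pattern is just the warning text quoted from the cluster log above.

```powershell
# Enumerate the loaded filter drivers together with their altitudes.
fltmc filters

# Show which volumes each filter driver instance is attached to.
fltmc instances

# Generate the cluster log for every node into one folder, then search the
# logs for the DCM "unsafe altitude" warnings.
Import-Module FailoverClusters
New-Item -ItemType Directory -Path 'C:\Temp\ClusterLogs' -Force | Out-Null
Get-ClusterLog -Destination 'C:\Temp\ClusterLogs' | Out-Null
Select-String -Path 'C:\Temp\ClusterLogs\*.log' -Pattern 'found at unsafe altitude'
```

Matching the filter names and altitudes in the Select-String output against the fltmc listing should point to the product that installed the driver.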
Once the third-party filter driver has been identified, the application should be removed
and/or the vendor contacted to report the problem. Problems involving third-party filter
drivers are rarely seen but still need to be considered.
Hopefully, I have provided information here that will get you started down the right path to
resolving issues that involve CSV volumes running in a Redirected Access mode.