Channel: High Availability (Clustering) forum
Viewing all 3614 articles

cluster error 1090 and 7024


Windows Server 2012 R2 Standard

I am recovering an Exchange 2013 Mailbox server and have reformatted this Windows Server 2012 R2 machine. After a restart, the Cluster service is disabled. If I enable and start it, Event Viewer logs the following errors. Event 1090:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          6/19/2015 7:16:17 PM
Event ID:      1090
Task Category: Startup/Shutdown
Level:         Critical
Keywords:
User:          SYSTEM
Computer:      ruh1mb02.ALJOMAIHBEV.com
Description:
The Cluster service cannot be started. An attempt to read configuration data from the Windows registry failed with error '2'. Please use the Failover Cluster Management snap-in to ensure that this machine is a member of a cluster. If you intend to add this machine to an existing cluster use the Add Node Wizard. Alternatively, if this machine has been configured as a member of a cluster, it will be necessary to restore the missing configuration data that is necessary for the Cluster Service to identify that it is a member of a cluster. Perform a System State Restore of this machine in order to restore the configuration data.

and 7024:

Log Name:      System
Source:        Service Control Manager
Date:          6/19/2015 7:16:17 PM
Event ID:      7024
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      ruh1mb02.ALJOMAIHBEV.com
Description:
The Cluster Service service terminated with the following service-specific error:
The system cannot find the file specified.

I would have tried the System State Restore, but there is no system state backup for this particular server.

How does one recover a cluster member in that case?


troubleshoot cluster service does not start


Hi,

We have a two-node virtual Windows Server 2012 failover cluster, but the Cluster service does not start.

Can I get some basic or advanced troubleshooting steps or advice?

Thanks

 

can not fix corrupt system file


Even after I run the DISM online restore, there is still a corrupted file that cannot be fixed. Please help.

2015-06-22 21:38:39, Info                  CBS    This session already attempted mapping cache rebuild, skip.
2015-06-22 21:38:39, Info                  CBS    Failed to find package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5 from the index with mapping index packages recently rebuilt,  [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to get WU category/updateID for package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5 [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to get the mapping of package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5, continue. [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to find  [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to collect payload and there is nothing to repair. [HRESULT = 0x800f0906 - CBS_E_DOWNLOAD_FAILURE]
2015-06-22 21:38:39, Info                  CBS    Failed to repair store. [HRESULT = 0x800f0906 - CBS_E_DOWNLOAD_FAILURE]
2015-06-22 21:38:39, Info                  CBS    Ensure CBS corruption flag is clear
2015-06-22 21:38:39, Info                  CBS   
=================================

Checking System Update Readiness.

(p) CSI Payload Corrupt   amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\utc.app.json
Repair failed: Missing replacement payload.
(p) CSI Payload Corrupt   amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\telemetry.ASM-WindowsDefault.json
Repair failed: Missing replacement payload.

2015-06-22 20:57:06, Info                  CSI    000008fc [SR] Could not reproject corrupted file [ml:520{260},l:114{57}]"\??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings"\[l:24{12}]"utc.app.json"; source file in store is also corrupted
2015-06-22 20:57:06, Info                  CSI    000008fd Hashes for file member \??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings\telemetry.ASM-WindowsDefault.json do not match actual file [l:66{33}]"telemetry.ASM-WindowsDefault.json" :
  Found: {l:32 b:ErEvcGxrC5RD30CwVgig/0sasSdfpRLjd18ZiXseYV4=} Expected: {l:32 b:EeQJzlVPvq9GNIcA2FEwrOjEeuDam1G+ol3x61gKasQ=}
2015-06-22 20:57:06, Info                  CSI    000008fe Hashes for file member \SystemRoot\WinSxS\amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\telemetry.ASM-WindowsDefault.json do not match actual file [l:66{33}]"telemetry.ASM-WindowsDefault.json" :
  Found: {l:32 b:ErEvcGxrC5RD30CwVgig/0sasSdfpRLjd18ZiXseYV4=} Expected: {l:32 b:EeQJzlVPvq9GNIcA2FEwrOjEeuDam1G+ol3x61gKasQ=}
2015-06-22 20:57:06, Info                  CSI    000008ff [SR] Could not reproject corrupted file [ml:520{260},l:114{57}]"\??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings"\[l:66{33}]"telemetry.ASM-WindowsDefault.json"; source file in store is also corrupted



Migration did not succeed. Not enough disk space at '\'. Windows Server 2012 R2


Hi.
I get the error referenced here:
https://support.microsoft.com/en-us/kb/2913461
but the OS is Windows Server 2012 R2 (not Windows Server 2012).

The machine is one of two Windows Server 2012 R2 Hyper-V failover cluster nodes using CSVs on iSCSI storage.
The VM was created locally in Hyper-V and uses two VHDX files (one for the OS, the other for data) on a local disk.
To make the VM highly available, I configured it as a role in Failover Cluster Manager, which worked. I then moved its storage to the CSVs (OS virtual disk on one CSV, data disk on the other) with Move > Virtual Machine Storage, and I got two errors in Event Viewer:

Hyper-V-VMMS Event id 20820

Storage migration for virtual machine 'machinename' (21330etc...) failed with error 'There is not enough space on the disk.' (0x80070070).

Hyper-V-VMMS Event id 20750

Migration did not succeed. Not enough disk space at '\'.

I've already done the same thing for other VMs without any problem, and there is plenty of free space on the CSVs.

So, is the fix also valid for Windows Server 2012 R2?

Thank you
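
For readers decoding the error in event 20820: the value 0x80070070 is an HRESULT, and its fields can be split out with a few bit operations. The sketch below is only an illustration (the function name `decode_hresult` is made up here); it shows that the value wraps Win32 error 112, ERROR_DISK_FULL, which matches the "not enough space on the disk" text in the event. It does not explain why the migration reports it.

```python
# Minimal sketch: split an HRESULT into severity, facility and code.
# decode_hresult is a hypothetical helper name, not part of any Microsoft tool.

FACILITY_WIN32 = 7  # facility value used when an HRESULT wraps a Win32 error


def decode_hresult(value: int) -> dict:
    """Return the failure bit, the 11-bit facility and the 16-bit code."""
    return {
        "failure": bool((value >> 31) & 1),
        "facility": (value >> 16) & 0x7FF,
        "code": value & 0xFFFF,
    }


decoded = decode_hresult(0x80070070)
print(decoded)

if decoded["facility"] == FACILITY_WIN32:
    # Code 112 is the Win32 error ERROR_DISK_FULL.
    print("Win32 error code:", decoded["code"])
```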

Failover Cluster Manager shows node down/wrong owner


I'm getting up to speed on clustering and ran into something I can't figure out.  Any help would be appreciated!

The Hyper-V hosts are running Server 2008 R2. The VMs are running Server 2012 R2.

Failover Cluster Manager is showing VM "FS1" online on host "HV1". However it says VM "FS2" is offline and lists the wrong owner (shows HV2 but actual host = HV3).

If I log in to either VM, Failover Cluster Manager shows both nodes online, and everything seems to be working fine. My suspicion is that FCM on the Hyper-V hosts is using the config from a server of the same name that used to reside on HV1. What's the best way to correct this?

Windows 2012 R2 Hyper-V cluster survive a switch reboot


Hello,

I currently manage a 5 node Windows 2012 R2 Hyper-V cluster using shared iSCSI SAN storage and the virtual machines run on CSVs.

I need to reboot our switch stack which could bring down the networking for up to two minutes.  Please note: The iSCSI switches are isolated from these switches so connectivity to iSCSI storage will not be affected during this process.

The NICs are teamed on the Hyper-V hosts but since both switches in the stack need to go down at the same time, this seems irrelevant.

The goal is to avoid shutting down 70+ servers along with the entire cluster during this network outage, and to have the cluster nodes maintain quorum and not go berserk.

Does anyone have any ideas or suggestions on how this can be done?

Is changing the heartbeat interval settings (SameSubnetDelay / SameSubnetThreshold) a viable option in this case?

I have also seen some suggestions on forcing a Disk only Quorum so as long as all nodes can see the Quorum disk, they will stay online.  I already have a Quorum disk configured.

I appreciate your help with this.

Regards,

Chris
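
On the heartbeat question above, the arithmetic behind SameSubnetDelay and SameSubnetThreshold can be sketched as follows: a node is treated as down after `threshold` consecutive missed heartbeats sent every `delay` milliseconds, so the tolerated gap is roughly their product. The candidate values below are assumptions picked for illustration, not recommendations, and the ranges a given Windows Server version accepts should be checked before changing anything.

```python
# Minimal sketch: how long can heartbeats be missed before a node is
# considered down? The candidate settings are illustrative assumptions.

def heartbeat_tolerance_seconds(delay_ms: int, threshold: int) -> float:
    """Tolerated heartbeat gap = heartbeat interval x missed-heartbeat count."""
    return delay_ms * threshold / 1000


def covers_outage(delay_ms: int, threshold: int, outage_seconds: float) -> bool:
    """True if the tolerated gap is longer than the expected outage."""
    return heartbeat_tolerance_seconds(delay_ms, threshold) > outage_seconds


outage_seconds = 120  # "up to two minutes", as stated in the post

# (SameSubnetDelay in ms, SameSubnetThreshold) pairs, hypothetical values
candidates = [(1000, 5), (1000, 20), (2000, 120)]

for delay_ms, threshold in candidates:
    tolerance = heartbeat_tolerance_seconds(delay_ms, threshold)
    verdict = "covers" if covers_outage(delay_ms, threshold, outage_seconds) else "does not cover"
    print(f"delay={delay_ms} ms, threshold={threshold}: {tolerance:.0f} s, {verdict} the outage")
```

Even when the tolerance exceeds the outage, this only covers node membership; anything that depends on the network being reachable is still affected while the switches are down.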


Windows NLB - Multicast


Hi,

I configured Windows NLB in multicast mode and gave the MAC address to the network team so they can add a static ARP entry on the switches. Is this MAC address dynamic, or will it stay the same until I change the mode? Or should I go for IGMP multicast mode?
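
As background to the question above: under the commonly documented NLB addressing scheme, the cluster MAC address is derived from the cluster IP address and the operation mode (02-BF plus the four IP octets for unicast, 03-BF plus the four IP octets for multicast, and 01-00-5E-7F plus the last two IP octets for IGMP multicast). The sketch below only reproduces that derivation; the function name and the sample IP address are hypothetical, and the real value should be read from the NLB cluster properties.

```python
# Minimal sketch of the commonly documented NLB MAC derivation.
# nlb_cluster_mac and the sample IP address are illustrative assumptions.
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str) -> str:
    """Derive the NLB cluster MAC address from the cluster IP and mode."""
    octets = ipaddress.IPv4Address(cluster_ip).packed  # four raw bytes
    if mode == "unicast":
        prefix, tail = (0x02, 0xBF), octets
    elif mode == "multicast":
        prefix, tail = (0x03, 0xBF), octets
    elif mode == "igmp":
        prefix, tail = (0x01, 0x00, 0x5E, 0x7F), octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    return "-".join(f"{byte:02X}" for byte in (*prefix, *tail))


for mode in ("unicast", "multicast", "igmp"):
    print(mode, nlb_cluster_mac("192.168.1.10", mode))
```

Under this scheme the address depends only on the cluster IP and the mode, so it would change only if one of those two is changed.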

same subnet but cluster places in separate networks


Hello,

I have an issue where a host whose IP configuration is on the same subnet as the rest of the hosts keeps getting placed into a separate cluster network. Cluster validation doesn't report any issues, and the host can communicate with all of the other hosts on that network.

Checking the cluster logs I see the event where it's occurring:

INFO  [ClNet] Adapter Hyper-V Virtual Ethernet Adapter #3 is still attached to network Cluster Network 1.

And here is the event where it skips attaching to the right network:

INFO  [ClNet] Ignoring configuration entry for cluster network Public (8d603185-4e36-4222-9c92-be4e3a22ac1e) because it has no previous matching adapter. Processing has not yet completed so an adapter may still be found for this network.

Any idea what can be causing this?
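
One thing that can be checked offline for a symptom like this: cluster networks are grouped by subnet, which is the IP address combined with the subnet mask, so an adapter with a matching address but a different mask computes to a different network. The sketch below groups adapters that way. The host names, addresses and masks are made up for illustration, and a mask mismatch is only one possible cause, not a diagnosis of the post above.

```python
# Minimal sketch: group adapters by the subnet their IP/mask pair computes to.
# Host names, addresses and masks are hypothetical sample data.
from collections import defaultdict
import ipaddress

adapters = {
    "HOST1": ("10.0.0.11", "255.255.255.0"),
    "HOST2": ("10.0.0.12", "255.255.255.0"),
    "HOST3": ("10.0.0.13", "255.255.0.0"),  # same range, different mask
}


def group_by_subnet(adapter_map: dict) -> dict:
    """Map each computed subnet (as text) to the hosts that fall into it."""
    groups = defaultdict(list)
    for host, (ip, mask) in adapter_map.items():
        network = ipaddress.ip_interface(f"{ip}/{mask}").network
        groups[str(network)].append(host)
    return dict(groups)


groups = group_by_subnet(adapters)
for subnet, hosts in groups.items():
    print(subnet, hosts)
```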


WS2K8 R2 Cluster does not detect Generic Service failure


We have a service set up as a Generic Service cluster resource named QTrans-BPPLog. We have the resource set up to be restarted automatically in case of failure.

What's happening is that when this service sometimes fails or crashes, the cluster is unaware of the fact that the service is down and doesn't restart it. If I go to the services.msc applet, I can see that the service is not running. The service process is gone in task manager. However, the cluster administrator still shows the service as online. To get it to restart, I have to bring the resource offline then online again. Can someone help?

Here is an excerpt of the cluster log from one of the times I brought it online and it crashed right away, but the cluster didn't see it. Note that there is another failed resource in this group, but there are no dependencies between that resource and QTrans-BPPLog.

00000d14.00001ea8::2015/06/24-15:26:23.248 INFO  [NM] Received request from client address NCSMCDWTST02.

00000d14.00002134::2015/06/24-15:31:23.131 INFO  [NM] Received request from client address NCSMCDWTST02.

---- I am bringing offline QTrans-BPPLOG, which is not really running but the cluster thinks it's online because it didn't detect the previous failure
00000d14.00002134::2015/06/24-15:31:34.706 INFO  [RCM] rcm::RcmApi::OfflineResource: (QTrans-BPPLog)
00000d14.00002134::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) Online-->OfflineCallIssued.
00000d14.00002134::2015/06/24-15:31:34.862 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Failed --> Pending)
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'QTrans-BPPLog', gen(2) result 997.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflineCallIssued-->OfflinePending.
00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RES] Generic Service <QTrans-BPPLog>: Service died or not active any more; status = 1062.
---- Now the cluster realized that the service was down, but only when I brought it offline

00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RES] Generic Service <QTrans-BPPLog>: Service is now offline.
00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RHS] Resource QTrans-BPPLog has come offline. RHS is about to report resource status to RCM.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'QTrans-BPPLog', gen(2) result 0.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflinePending-->OfflineSavingCheckpoints.
00000d14.000008ac::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflineSavingCheckpoints-->Offline.
00000d14.000008ac::2015/06/24-15:31:34.862 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Pending --> Failed)

---- bringing QTrans-BPPLog back online...
00000d14.00002134::2015/06/24-15:31:38.139 INFO  [RCM] rcm::RcmApi::OnlineResource: (QTrans-BPPLog)
00000d14.00002134::2015/06/24-15:31:38.201 INFO  [RCM] TransitionToState(QTrans-BPPLog) Offline-->OnlineCallIssued.
00000d14.00002134::2015/06/24-15:31:38.201 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Failed --> Pending)
00000d14.00001e80::2015/06/24-15:31:38.217 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'QTrans-BPPLog', gen(2) result 997.
00000d14.00001e80::2015/06/24-15:31:38.217 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlineCallIssued-->OnlinePending.
00000f20.00002334::2015/06/24-15:31:39.745 INFO  [RES] Generic Service <QTrans-BPPLog>: Service is now running.
00000f20.00002334::2015/06/24-15:31:39.745 INFO  [RHS] Resource QTrans-BPPLog has come online. RHS is about to report status change to RCM
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'QTrans-BPPLog', gen(2) result 0.
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlinePending-->Online.
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Pending --> Failed)
---- QTrans-BPPLOG crashed at 15:31:48, but the cluster doesn't see the failure

00000d14.00002520::2015/06/24-15:34:14.047 INFO  [NM] Received request from client address NCSMCDWTST02.
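
To read an excerpt like the one above more easily, the RCM state transitions for a single resource can be pulled out with a regular expression. This is a rough sketch that assumes the line layout shown in the excerpt (process.thread, `::`, timestamp, level, component); the function name is made up, and the sample lines are copied from the post. It only lists what the cluster recorded: the last transition it logged for the resource is to Online, with nothing after it.

```python
# Minimal sketch: extract RCM state transitions for one resource from
# cluster log text. state_transitions is a hypothetical helper name.
import re

# Sample lines copied from the cluster log excerpt in the post above.
LOG = """\
00000d14.00002134::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) Online-->OfflineCallIssued.
00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RES] Generic Service <QTrans-BPPLog>: Service died or not active any more; status = 1062.
00000d14.000008ac::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflineSavingCheckpoints-->Offline.
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlinePending-->Online.
"""

TRANSITION = re.compile(
    r"::(?P<time>\S+)\s+\w+\s+\[RCM\] "
    r"TransitionToState\((?P<resource>[^)]+)\) "
    r"(?P<old>\w+)-->(?P<new>\w+)\."
)


def state_transitions(log_text: str, resource: str) -> list:
    """Return (timestamp, old_state, new_state) tuples for one resource."""
    rows = []
    for line in log_text.splitlines():
        match = TRANSITION.search(line)
        if match and match["resource"] == resource:
            rows.append((match["time"], match["old"], match["new"]))
    return rows


transitions = state_transitions(LOG, "QTrans-BPPLog")
for time, old, new in transitions:
    print(f"{time}  {old} -> {new}")
```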

Failover Cluster on S-2012 R2 - CNO Issue


The setup has 2 nodes (6 VMs per node, running from clustered storage).
I have a CNO in Active Directory that is registered in DNS. On its Security tab, full control is granted to its own named object ('CNO-C1$') and to both nodes (1 & 2).

Node 2: Failover Cluster Manager reports no errors currently. (can ping CNO)
Node 1: Failover Cluster Manager reports 1 error of event ID 1207 (can ping CNO)

"
The computer object associated with the cluster network name resource 'Cluster Name' could not be updated in domain 'mydomain.contoso.com' during the 
Resource post online operation.

The text for the associated error code is: There is no such object on the server.


The cluster identity 'CNO-C1$' may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.

"

Failover Cluster Manager on each node reports 2 functional nodes and 2 functional NICs on 2 different subnets for communication (1.xxx & 0.xxx).
Clustered storage is currently online for both nodes, and I can currently live migrate and do 'shared nothing' migrations.

The error says there is no such object... can anyone see a hole here?

Thanks in advance.


Issue with setting up a File Server Role error 1254,1205 and 1069


Hi Everyone

I am currently building a new 2012 R2 file cluster to replace our 2008 R2 file cluster.

Each node (3 in total) has 2 NICs, internal and heartbeat. The heartbeat network is for cluster traffic only.

On each node I have enabled the following roles and features:

"File Server roles, failover clustering features, File Server Resource management tools and Share and Storage management tools"

I have mapped 2 LUNs to the nodes:

LUN 1: quorum

LUN 2: file storage

Both LUNs can be accessed by all the nodes.

The cluster creation completed successfully without any errors.

In the Configure Role High Availability wizard I chose File Server > File Server for general use.

Under Client Access Point I specified the NetBIOS name and IP address.

I selected the available Cluster Disk 2, and on the next wizard screen ("you are ready to configure high availability for file cluster") I clicked Next > Finish.

I can see the role was created successfully; however, I can't see the "test" account object created in the OU.

As you can see, the status shows "Failed".

I am able to move the shared cluster disk to another node.

The Add Share option is also greyed out.

Please help

Many thanks  

Clustered role 'Test' has exceeded its failover threshold.  It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.  No additional attempts will be made to bring the role online or fail it over to another node in the cluster.  Please check the events associated with the failure.  After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

The Cluster service failed to bring clustered role 'Test' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Cluster resource 'Test' of type 'Network Name' in clustered role 'Test' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Storage:
Cluster Disk 2
Network Name:
test
OU:
OU=FileCluster,OU=Servers,OU=,DC=,DC=,DC=
IP Address:
Started
25/06/2015 9:53:47 p.m.
Completed
25/06/2015 9:53:49 p.m.
Creating the group test.
Creating File Server resources.
Configuring the cluster storage device.
Configuring File Server resources.
Configuring File Server networking.
Verifying the client access point settings are valid.
Configuring network name resource.
Configuring new IP address resources.
Configuring the dependencies for the IP address resources.
Configuring the network name dependencies.
The client access point has been configured successfully.
Configuring File Server resources.
Creating the highly available file server resource.
Verifying required dependencies are configured.
A File Server has been successfully created.




Cluster IP address fails with error 1077


Hi all,

I have a Windows Server 2012 R2 failover cluster, and I recently noticed that the cluster name is offline due to a failure of its IP address:

Health check for IP interface 'Cluster IP Address' (address '10.16.18.70') failed (status is '55'). Run the Validate a Configuration wizard to ensure that the network adapter is functioning properly.

I've run the validation wizard (only the network tests, because the cluster has been in production for some months) and everything is OK. If I try to bring the IP address online, I receive a failure.
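
A note on the "status is '55'" part of the message: assuming the status is a plain Win32 error code, which is how cluster events usually report it, the number can be looked up in the Win32 system error code list. The small table below is a sketch covering a few codes quoted in posts on this page; the helper name `describe` is made up, and the table is deliberately incomplete.

```python
# Minimal sketch: look up a handful of Win32 system error codes.
# describe is a hypothetical helper; the table is intentionally partial.

WIN32_ERRORS = {
    2: ("ERROR_FILE_NOT_FOUND", "The system cannot find the file specified."),
    55: ("ERROR_DEV_NOT_EXIST",
         "The specified network resource or device is no longer available."),
    112: ("ERROR_DISK_FULL", "There is not enough space on the disk."),
    1062: ("ERROR_SERVICE_NOT_ACTIVE", "The service has not been started."),
}


def describe(code: int) -> str:
    """Return 'NAME: message' for a known code, or a fallback string."""
    if code in WIN32_ERRORS:
        name, message = WIN32_ERRORS[code]
        return f"{name}: {message}"
    return f"Unknown code {code} (not in this partial table)"


print(describe(55))
```

Read that way, status 55 says the network resource or device behind the IP interface was not available at the time of the health check, which points toward the adapter or network rather than the address itself.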

Failover on 2008R2 Fileserver cluster takes 7 minutes when a server reboots for updates


I have a 2008 R2 domain with two 2008 R2 nodes in a cluster. I have 2 questions:

1. If I "validate the cluster" and it is a cluster with several 10 TB disks on it, will it take a long time to validate? I just want to make sure that step does not cause any issues on a running cluster in production.

2. When a node reboots for Windows updates, the cluster does a good job of continuing to respond to ping, but it is basically offline for about 7 minutes. Is there something I can do to speed up the failover process?

Thanks,


Dave




Query on Multi-subnet Failover Cluster Setup


I have set up a Failover Cluster Instance on 2 nodes with Windows Server 2008 and SQL Server 2012, running on SAN storage with "Node and Disk based Quorum", DTC, and public and private IP communication.

Now I want to set up a multi-subnet cluster by adding another two nodes and SAN storage from a different subnet.

I have the following questions, for which I am looking for your expert help:

  1. Do we need to enable communication between the public IPs of the site1 servers and the public IPs of the site2 servers, and if so, how?
  2. How can our existing cluster in site1 be stretched to site2 so that DC2 also has an Active-Passive architecture after a failover to site2?
  3. How will the quorum work in Node and Disk Majority mode in site2?
  4. How will the private IP communication work when the SQL Server 2012 cluster fails over to site2?
  5. Which components should we replicate from site1 to site2 using SAN replication? Will the quorum also be replicated, or should we exclude it from replication?

I am new to multi-subnet cluster setup and am looking for your urgent help with it.
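
On question 3 above, the voting arithmetic in Node and Disk Majority mode can be sketched as follows: each node and the disk witness hold one vote, and the cluster keeps quorum only while more than half of all votes are reachable. The layout below (2 nodes plus the witness at site1, 2 nodes at site2) is an assumption for illustration, with static votes and no vote weighting.

```python
# Minimal sketch of majority voting with a disk witness.
# The site layout is a hypothetical example, assuming one vote each.

def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Quorum requires strictly more than half of all configured votes."""
    return votes_present > total_votes // 2


site1_votes = 2 + 1  # two nodes plus the disk witness (assumed at site1)
site2_votes = 2      # two nodes, no witness
total_votes = site1_votes + site2_votes

print("site1 alone keeps quorum:", has_quorum(site1_votes, total_votes))
print("site2 alone keeps quorum:", has_quorum(site2_votes, total_votes))
```

With this layout, site2 on its own holds only 2 of 5 votes, so a loss of site1 would leave it without quorum unless the witness is reachable from site2 or quorum is forced manually.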

Recovery steps for a failed single-JBOD SOFS cluster


Just wondering here.

Say a single, sole JBOD that is connected to 2 SOFS nodes goes down.

What are the options for quickly spinning up the VMs that resided on the JBOD?

The Hyper-V cluster cannot access the shared storage provided by the SOFS cluster, so what happens? The HV nodes will still be in a clustered state. Is there a way of placing the cluster 'on hold' in order to possibly run VMs from the local storage of the HV nodes?


Windows 2003 Cluster Domain / Change - Request off domain


Hello. This is sort of a weird question. We have a customer that had a two-node cluster. The customer was originally going to move their cluster to a new domain. Before they contacted us, they removed both machines from the domain but haven't rebooted, so it's working okay for now. Since Microsoft says cluster nodes need to be domain members, I'm pretty sure it will not work after the reboot. They have since changed the request to keep it off the domain, even if that means only one server is online. My thought is that even if the 2nd node is marked as failed and the remaining server is active, the cluster services won't start after a reboot. The task asks that I convert it to a non-cluster setup. Does anyone know what they might even mean? I've never heard of converting a cluster to a non-cluster.

In this case, I think they need a new non-clustered server in a workgroup that we can migrate to. Does anyone see another way to get them down to a single, non-domain-member server?

Multi-site cluster with different connections


Hello,

At the moment we have a SQL 2014 cluster in one datacenter, which all of our customers connect to.

In the near future we are going to expand to a second datacenter, and we want to move 1 node to this datacenter and create a DMDW connection in between.

We also want customers to connect to this second datacenter, but different customers: those connecting to the first datacenter will not connect to the second.

The customers connecting to the second datacenter will be routed via the DMDW connection to the first node in the first datacenter.

Now my problem: the DMDW connection could break down, and all customer connections from the 2nd to the 1st datacenter would be lost. In that case I want the customers of the 2nd datacenter to connect to the 2nd node and continue to work.

In the setup I have now that's not possible, because it would create a split-brain issue. How can I make this work?

Help in creating first Windows Failover Cluster


This is my first attempt at creating a failover cluster. I've followed the instructions in "Failover Cluster Step-by-Step Guide: Configuring Accounts in Active Directory" and opened the required ports.

https://technet.microsoft.com/en-us/library/cc731002(v=ws.10).aspx#BKMK_steps_precreating
http://cybergav.in/2013/07/28/windows-server-failover-cluster-port-requirements-for-intra-node-connectivity/

I added the Failover Clustering feature via Server Manager on both server-dev01 & server-dev02. I ran the validation and everything seems to check out. I then created the cluster, "server-cdev", via the Create Cluster Wizard. That didn't go well.

I pulled the logs via the PowerShell command "Get-ClusterLog" and tried looking at them, but I can't really make heads or tails of them.

There are some things I don't get, and I'm hoping you can help me out.
 - Why are there random IPv4 and IPv6 addresses (IPv6 is not enabled) in the logs that I know nothing of and that are not in our DNS records?
 - Both ports 137 & 3343 are open on our firewall, but I can't telnet from server-dev01 to server-dev2 (or the other way around) on port 3343 / 137. Does this need to be fixed before running the cluster wizard?
 - I see these two entries but have no idea what they mean (I did see a similar post, but there was no answer to the question - https://social.technet.microsoft.com/Forums/windowsserver/en-US/9d25c123-a763-405f-8c20-61da2d4b4390/cluster-creation-error?forum=winserver8gen)
[DCM] DiskControlManager bitlocker load status 126
[API] DmQueryString failed to retrieve the security   descriptor status 2, default security descriptor will be used for authorizing client connections

I tried posting the logs here but got the error message "Body must be 4 - 60000 characters long".

Your help is greatly appreciated..

Clustered Task giving RPC Server unavailable error


Hi everyone,

I have a clustered environment with SQL AlwaysOn. When I try to create a clustered task with the following PowerShell command, I get the error "Register-ClusteredScheduledTask : The RPC server is unavailable."

Register-ClusteredScheduledTask -Cluster MyCluster -TaskName MyResourceSpecificTask -TaskType ResourceSpecific -Resource MyResourceName -Action $action -Trigger $trigger

I've checked the nodes and cluster which are running, so I'm not sure what's causing this error.

Some or all CSVs will Fail Only on the Weekends. They work just fine during the 5 day week.


Good afternoon,

There is a problem which I have noticed only occurs during the weekends. 

The problem: all or just some of the CSVs go offline, only on the weekends, and this of course brings down all virtual machines, as they depend on the VHDs that are stored on those CSVs.

This is the 2nd weekend where the CSVs have gone offline. There are no problems bringing the CSVs back online, and it takes me about 20 minutes to get all the VMs running again, but this is horrible if I have to deal with it every weekend.

I have looked through the event logs and there is nothing which points to an actual problem. Errors in Event Viewer include live migration failures and Hyper-V host and cluster errors. These errors all take place at 3am and are related to the CSVs going offline. The Hyper-V witness is one of the CSVs that go offline.

I have a feeling that there might be something happening on the Compellent which hosts the CSVs, but I would like to know your thoughts on this. Have you experienced something like this, where the CSVs go offline during the weekends but not during the work week?

This could also be a networking problem between the cluster nodes and the Compellent, or a driver issue. Maybe or maybe not.

The cluster nodes are all R720 with exactly the same hardware configurations and they are all running Windows Server 2012 R2 and all OS updates have been applied.

Any assistance and suggestions would be helpful.

Thanks in advance!
