Channel: High Availability (Clustering) forum
Viewing all 3614 articles
Browse latest View live

Shared Storage for a Clustered Certificate Authority

The following guides have been used as a reference for clustering a certificate authority:

  • https://social.technet.microsoft.com/wiki/contents/articles/15067.step-by-step-guide-clustering-an-existing-certification-authority.aspx
  • https://social.technet.microsoft.com/wiki/contents/articles/9256.active-directory-certificate-services-ad-cs-clustering.aspx

They both indicate the use of a shared disk with a drive letter for storing the database and log. That disk is then added to the generic cluster role that is created for AD CS. Can a Cluster Shared Volume be used instead?


Hyper-V Cluster Networking

Hi,

I have set up a Hyper-V cluster with 2 servers. After a lot of research, I found that a converged network is recommended as best practice.

I have 4 physical NICs on each server:

- Created a NIC team with all 4 adapters

- Created a vSwitch on top of that

- Created 4 vNICs on that

Now I have 4 vNICs per host and I want to configure the networks properly. I currently have the following in Failover Clustering:

1 vNIC network for live migration

1 vNIC network for cluster comms

1 vNIC network for host comms (this is also used for iDRAC) and

1 vNIC network for VM comms

My question is: what are the recommended Failover Clustering settings? For which networks should I allow cluster network communication, client access, or no cluster communication?

And do I really need a separate network for VM communication? Can I use the host comms network for both host and VM communication?

Thank you in advance

VM on CSV goes Pause-Critical

Hi guys,

I hope some of you can shed light on a few questions.

How many hosts/nodes do you recommend in a Hyper-V cluster?

We are having some CSV issues: in some cases, when a CSV is moved to another host, the move takes about 30 to 60 seconds, and therefore the VMs on that CSV go into "Pause/Critical" mode. The hosts are connected to our NetApp all-flash SSD storage via Fibre Channel.
We have looked into the number of hosts in the clusters (the biggest cluster has 15 hosts and the smallest has 4). All hosts have the same OS installed, the same patch level, and so on. Some clusters are Server 2016 and others are Server 2019.
It looks like something that develops over time (1+ month), and a reboot of all the hosts (one at a time) seems to fix the Pause/Critical issue.

Please let me know if any of you have experienced something like this.

Best and kind regards.

Please remember to mark the replies as answers if they help and unmark them if they provide no help.

Failover Clustering - service failure on 1 node

Hello,
I have a 3 node Cluster on Windows Server 2016.
For some weeks one of the nodes (node3, a quorum node) was not part of the domain when it should have been. The remaining nodes continued to work.

Node3 was re-added to the domain thinking that would resolve the errors, but alas no.

The cluster event on node3 shows error “Cluster failed to start. The latest copy of cluster configuration data was not available within the set of nodes attempting to start the cluster. Changes to the cluster occurred while the set of nodes were not in membership and as a result were not able to receive data updates.” and

“Cluster node “Node3” failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls”

Firewall is off. All nodes can ping and browse to folders on one another.

The Cluster service has been restarted on all nodes.

Running command “cluster node /status” on node3 returns:
Node1 Down
Node2 Down
Node3 Joining

Running the same command on Node1 and Node2 returns the status as being Up for all.

A couple more entries from the Event Viewer:
“The Cluster Service service terminated with the following service-specific error: The wait operation timed out.” and “The Cluster Service service terminated unexpectedly.  It has done this 53 time(s).  The following corrective action will be taken in 0 milliseconds: Restart the service.”

Generating the cluster log file gives the same errors as above, plus this:
“Attempt to start the cluster service on all nodes in the cluster so that nodes with the latest copy of the cluster configuration data can first form the cluster. The cluster will be able to start and the nodes will automatically obtain the updated cluster configuration data. If there are no nodes available with the latest copy of the cluster configuration data, run the 'Start-ClusterNode -FQ' Windows PowerShell cmdlet. Using the ForceQuorum (FQ) parameter will start the cluster service and mark this node's copy of the cluster configuration data to be authoritative.  Forcing quorum on a node with an outdated copy of the cluster database may result in cluster configuration changes that occurred while the node was not participating in the cluster to be lost.”

How can we add node3 back into the cluster successfully? 

Validate Active Directory Configuration

    Hi Folks, 

    I am getting the error below while running the failover cluster validation.

    Description: Validate that all the nodes have the same domain, domain role, and organizational unit.

    Start: 8/14/2019 2:54:31 PM.
    Validating that all nodes have the same domain, domain role, and organizational unit.
    Fqdn             Domain   Domain Role     Site Name                 Organizational Unit
    USTYHPV01..COM   .COM     Member Server   Default-First-Site-Name
    The distinguished name of node USTYHPV01 could not be determined because of this error: There was an error getting information about the organization unit for node 'USTYHPV01..COM' from the domain '.com'.
    The organizational unit of node USTYHPV01.COM could not be determined because of this error: Did not find an Organization Unit (OU) in the Active Directory
    Connectivity to a writable domain controller from node USTYHPV01.COM could not be determined because of this error: Could not get domain controller name from machine USTYHPV01.
    Node(s) USTYHPV01.COM cannot reach a writable domain controller. Please check connectivity of these nodes to the domain controllers.

new-cluster static address was not found on any cluster network

Hi guys,

Recently my 2-node cluster ran into an issue: one of my nodes was not able to start up. I tried running Start-ClusterNode -ForceQuorum

but I got the error:

Start-ClusterNode: The system cannot find the file specified.

I found no solution, so I went to the other node, which was still in the cluster, and removed the cluster.

Removing the cluster went fine. However, I hit another problem the moment I tried to recreate the cluster.

The error shown was:

New-Cluster: Static address 'x.x.x.x' was not found on any cluster network.

If anyone knows what is going on, please let me know. Thanks.

Live migration fails between different processors

Hello,

I have a problem with Hyper-V failover cluster live migration. Originally I had a Windows Server 2012 R2 Hyper-V failover cluster with 3 node members (2 x Dell R420, 1 x Dell R430). Live migration worked perfectly between the nodes. Our company bought 2 Dell R440 servers and a new storage system, and we built a new Windows Server 2019 Hyper-V failover cluster. After migrating the VMs to the new cluster, we destroyed the old cluster and reinstalled the R430 with Windows Server 2019. As the next step, we added the R430 to the new cluster.

node1 (R440):  Intel® Xeon® Silver 4116 Processor

node 2 (R440): Intel® Xeon® Silver 4116 Processor

node3 (R430): Intel® Xeon® Processor E5-2640 v2

If I try a live migration from node 1 or node 2 to node 3, the live migration fails:

Event 21502

Live migration of 'VMNAME' failed.

Virtual machine migration operation for 'VMNAME' failed at migration destination 'NODE03'. (Virtual machine ID 63AFF93A-13F7-40B9-8C4A-32B9E6801448)

The virtual machine 'VMNAME' is using processor-specific features not supported on physical computer 'NODE03'. To allow for migration of this virtual machine to physical computers with different processors, modify the virtual machine settings to limit the processor features used by the virtual machine. (Virtual machine ID 63AFF93A-13F7-40B9-8C4A-32B9E6801448)

 

Processor compatibility is already turned on for every VM!

If I turn the VM off, I can migrate it to node 3. After the offline migration, once I turn the VM on at node 3 (Dell R430), I can move it between all nodes. But if I restart the VM on node 1 or node 2, live migration to node 3 fails again.

All nodes are updated with SUU and the OS is up to date.


Windows 2012 rolling upgrade to 2016 file server

Hi Folks,

    I am not even sure whether this is fully supported by Microsoft. I am doing a POC for it. I am sharing what I have found so far with the rest of you, to spare you some troubleshooting time.

1. Adding a 2016 node must be done from the Windows 2016 Failover Cluster Manager. Failing to do so causes the cluster to go offline.

2. Ironically, in mixed mode, configuring the file server role from the Windows 2016 Failover Cluster Manager does not work. I need to do it from a Windows 2012 node for it to work.

3. At this point everything seems OK. Both the cluster and the Client Access Point are up.

4. The problem arises when I want the 2016 node to take ownership of the file server role (I need to do this because I want to evict the 2012 nodes one by one). The file server role goes down immediately when I do this.

Troubleshooting steps taken:

1. Deleted the virtual name computer object and created a new one.

Any help and tips will be appreciated. Thank you.

How to check RCA for missing heartbeat

  • The cluster service was halted to prevent an inconsistency within the failover cluster. The error code was 1359.

  • Server: Windows 2016

    As per my investigation, the network adapter reset was observed at the same time, i.e., 3:18:26 AM on 07-01-2019. Please note that the cluster log timestamps are in GMT.


    00000c64.00001950::2019/07/01-07:18:33.587 INFO  [IM - Cluster Network 1] Resetting interface state calculation state

    00000c64.00001950::2019/07/01-07:18:33.587 INFO  [IM] Leader is sending request for all interfaces in the current view

    00000c64.00000b44::2019/07/01-07:18:33.587 INFO  [DCM] Force disconnect payload: netname \xxxxxxx, requested disconnect status (0), src <null>, dest <null>

    00000c64.00000b44::2019/07/01-07:18:33.587 ERR   [DCM] Force disconnect failed on DisconnectSmbInstance::CSV, status (c000000d)

    00000c64.00000b44::2019/07/01-07:18:33.587 INFO  [DCM] Force disconnect(DisconnectAll): server \169.254.2.228, DisconnectSmbInstance::CSV

    00000c64.00000b44::2019/07/01-07:18:33.587 INFO  [DCM] Releasing RDR handle for target node id 2

    .000006ec::2019/07/01-07:19:02.884 ERR   [NODE] Node 1: Connection to Node 2 is broken. Reason (10054)' because of 'channel to remote endpoint 169.254.2.228:~3343~ has failed with status 10054'

    00000c64.000006ec::2019/07/01-07:19:02.884 WARN  [NODE] Node 1: Initiating reconnect with n2.

    00000c64.000006ec::2019/07/01-07:19:02.884 INFO  [MQ-thpqhms0] Pausing

    00000c64.000008dc::2019/07/01-07:19:02.884 INFO  [Reconnector-thpqhms0] Reconnector from epoch 1 to epoch 2 waited 00.000 so far.

    00000c64.00001930::2019/07/01-07:19:03.012 INFO  [IM] got event: Node with FaultTolerantAddress xxxxx:~0~ has gone down with fatal error\crash

    00000c64.00001930::2019/07/01-07:19:03.013 ERR   [IM] Couldn't find node id for remote virtual IP xxxxxxxx:~0~

    0000194c::2019/07/01-07:19:14.683 DBG   [NETFTAPI] Signaled NetftRemoteUnreachable event, local address 10.81.64.153:3343 remote address 10.81.65.25:3343

    00000c64.00001930::2019/07/01-07:19:14.683 INFO  [IM] got event: Remote endpoint 10.81.65.25:~3343~ unreachable from xxxxx

    00000c64.00001930::2019/07/01-07:19:14.683 INFO  [NDP] Checking to see if all routes for route (virtual) local xxxxx:~0~ to remote 169.254.2.228:~0~ are down

    00000c64.00001930::2019/07/01-07:19:14.683 WARN  [NDP] All routes for route (virtual) local 169.254.1.43:~0~ to remote xxxxxxxxx:~0~ are down

    00000c64.00001924::2019/07/01-07:19:14.683 INFO  [CORE] Node 1: executing node 2 failed handlers on a dedicated thread

  • Also found this in event logs :

    07-02-2019           7:20:42 AM           Warning thpqghs0.prod.travp.net     10400    Microsoft-Windows-NDIS   N/A         N/A         The network interface 'vmxnet3 Ethernet Adapter' has begun resetting.  There will be a momentary disruption in network connectivity while the hardware resets. Reason: The network driver detected that its hardware has stopped responding to commands. This network interface has reset 1 time(s) since it was last initialized.

Please let me know if this is causing the issue.
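
Since the cluster log timestamps above are in GMT while the adapter reset was noted in local time, the two are easier to line up after converting. A minimal Python sketch, assuming the line layout seen in the excerpts above (`<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL message`) and a local offset of UTC-4, which is only inferred from the 3:18 AM versus 07:18 figures in this post. `parse_cluster_log_line` is a hypothetical helper name, not part of any Microsoft tool.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout, taken from the excerpts above:
# <pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL  message
LINE_RE = re.compile(
    r"::(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)

# Assumption: local time is UTC-4 (inferred from 3:18 AM local vs 07:18 GMT).
LOCAL_TZ = timezone(timedelta(hours=-4))


def parse_cluster_log_line(line, local_tz=LOCAL_TZ):
    """Return (local_time, level, message), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    # The log timestamp is GMT, so mark it as UTC before converting.
    utc_time = datetime.strptime(match["ts"], "%Y/%m/%d-%H:%M:%S.%f").replace(
        tzinfo=timezone.utc
    )
    return utc_time.astimezone(local_tz), match["level"], match["msg"]


sample = (
    "00000c64.00000b44::2019/07/01-07:18:33.587 ERR   "
    "[DCM] Force disconnect failed on DisconnectSmbInstance::CSV, status (c000000d)"
)
local_time, level, message = parse_cluster_log_line(sample)
print(local_time.isoformat(), level)
```

Filtering the converted lines to `ERR` and `WARN` around the time of the NDIS reset event would show whether the two really coincide.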

Windows 2012 R2 rolling upgrade to 2016 file server

Hi Folks,

    I am not even sure whether this is fully supported by Microsoft. I am doing a POC for it. I am sharing what I have found so far with the rest of you, to spare you some troubleshooting time.

1. Adding a 2016 node must be done from the Windows 2016 Failover Cluster Manager. Failing to do so causes the cluster to go offline.

2. Ironically, in mixed mode, configuring the file server role from the Windows 2016 Failover Cluster Manager does not work. I need to do it from a Windows 2012 node for it to work.

3. At this point everything seems OK. Both the cluster and the Client Access Point are up.

4. The problem arises when I want the 2016 node to take ownership of the file server role (I need to do this because I want to evict the 2012 nodes one by one). The file server role goes down immediately when I do this.

Troubleshooting steps taken:

1. Deleted the virtual name computer object and created a new one.

Any help and tips will be appreciated. Thank you.


File Cluster 2016 - New-SmbShare: The request is not supported.

Hi,

I have built a fresh Windows 2016 file server failover cluster. I have been trying to create file shares through PowerShell, and I get this error:

New-SmbShare : The request is not supported.
At line:1 char:1
+ New-SmbShare -Name "T1" -Path "e:\Test2" -FullAccess "example\Testuser1"
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo       : InvalidOperation: (MSFT_SMBShare:ROOT//Microsoft/Windows/SMB/MSFT_SMBShare) [New-SmbShare
   ], CimException
    + FullyQualifiedErrorId : Windows System Error 50,New-SmbShare

Please help. 

High Availability file service will not sync - no offline files are available.

I set redirected folders to a HA file service.  When I click on the documents folder the Status: Online icon shows up.  When I look at the offline files folder there is nothing there.

If I map a drive to the folder, the "Make available offline" option does not show up. If I use a non-HA file share, the "Make available offline" option appears.

Any ideas?

The cache is enabled on the share.

CSV - Clustering Hyper-V Hosts

I have a 4-node cluster of Hyper-V hosts running Windows Server 2012 R2 with Cluster Shared Volumes.

I have around 22 VMs spread across these nodes, sitting on the CSVs.

Every now and then VMs get into a failed state and I have to cold boot my hosts to get them back online.

Is it because of CSV? I have a VM which is on the C: drive and it remains fine, never any issues.

Please advise: what is the best way to make VMs highly available when moving to Windows Server 2019?

Thanks a lot

PS I have a 1TB SAN


SV

Failovercluster Management Network Interface

Hello Friends,

My question today revolves around failover clustering (which, in our case, sits on top of an S2D deployment). Everything is working fine in that regard.

Next week we will get new switches, and I plan to trunk/LACP both SFP+ ports on the servers (at the moment only one SFP+ port on each server is connected to the switch) and connect them to the new switches.

I am aware of how trunking/LACP works in PowerShell; that's not the problem. What I fear for is the connectivity of the failover cluster itself. When I have to dissolve the Hyper-V Virtual Ethernet Adapter to form a new LACP team, so that I can create a new virtual adapter on top of it, the failover cluster will lose its only management interface.

When I look at the Failover Cluster MMC snap-in under "Network", I can see my interfaces, but I don't see any buttons to add interfaces and declare them as a management network (my idea was to use the copper ports of the server NIC to create a temporary management interface until I have built the LACP team/virtual adapter and integrated it as the primary management interface).

So the question remains: can I add additional interfaces (maybe through PowerShell) as management interfaces to the failover cluster, or do I actually have to dissolve the whole cluster and build it anew?

Thanks in advance.

Best Regards,

Constantin



Unable to mount the File cluster resource shares ( Windows 2016 Cluster) on AIX systems

Hi 

I have been upgrading a file cluster from Windows 2008 R2 to 2016. I decided to build a new cluster on 2016 and remap the storage from the old cluster to the new one, which was successful. I have a file cluster resource name (Test-Batch) created on the 2016 cluster. We have some AIX systems which need to mount these shares. With my old 2008 R2 cluster it worked fine. With the 2016 cluster, the AIX systems cannot mount the file shares: they can mount up to \\test-batch\ but not beyond that. That is, I have a number of shares under \\test-batch\, let's say T1, T2 .. T10. The AIX systems can mount \\test-batch\ but not \\test-batch\t1, \\test-batch\t2, etc. I have checked all the permissions and everything is correct.

I have also noticed that if I telnet to my resource name on port 139, it works for the 2008 R2 cluster but not for my 2016 cluster. The AIX team could mount the file share on the 2016 cluster using the Samba client. That works because Samba uses port 445 for file shares. But for Windows shares it uses port 139, and this is not working, the AIX team says, so we need to fix it. I'm completely clueless how to fix that.

Port 139 is open on the new 2016 nodes and works; it just doesn't work for the file cluster resource name.

I even changed the SMB version on 2016 to use SMB 1.0; it still didn't work.

Any suggestions on how to fix this, please?
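
The telnet comparison above can be repeated in a scripted way against the node names and the cluster resource name. A minimal Python sketch, assuming only that plain TCP reachability is what needs comparing; `tcp_port_open` is a hypothetical helper, and it says nothing about whether SMB itself answers on the port.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling `tcp_port_open("test-batch", 139)` and `tcp_port_open("test-batch", 445)` from a client, and then the same two calls against each 2016 node name, would show exactly where port 139 stops answering.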

Server2016 Cluster network traffic coming from host ip rather than role ip

Hello

I have two 2016 VMs in a Hyper-V environment that are clustered. Each VM is on a separate physical host.

Each VM has only 1 NIC. My cluster's IPs are as follows:

172.18.1.113 ProductionIP - Role IP
172.18.1.114 Cluster IP
172.18.1.115 VM Host A
172.18.1.116 VM Host B

I've added the role IP address (172.18.1.113) to an IPsec tunnel on my firewall, but my firewall sees the traffic as coming from either of the 2 host IP addresses (.115 or .116). If I ping the host at the remote end of the IPsec tunnel from either host A or B and source it from .113, the ping works, but by default it always uses the host IP and fails.

How do I get the cluster nodes to always send traffic from the role IP, no matter which node is active?
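
One thing that may be relevant here: the operating system normally picks the source address for outbound connections, but an application can choose it explicitly by binding its socket before connecting, which is what sourcing the ping from .113 does. A minimal Python sketch of that idea, assuming the application behind the role can be changed or configured to bind this way; `connect_from` is a hypothetical helper, and the post does not say what workload sends the traffic.

```python
import socket


def connect_from(source_ip, dest_host, dest_port, timeout=5.0):
    """Open a TCP connection whose local (source) address is source_ip."""
    # Port 0 lets the OS pick any free local port on source_ip.
    return socket.create_connection(
        (dest_host, dest_port), timeout=timeout, source_address=(source_ip, 0)
    )
```

On the node that currently owns the role, passing "172.18.1.113" as `source_ip` would send from the role IP. On a node that does not hold that address, the bind fails with `OSError`.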

Thanks

Dan

File Server Role in Cluster-to-Cluster Replica

I have an asynchronous cluster-to-cluster Storage Replica setup across a metro link, and I'm replicating two volumes that are assigned to a general-use file server role and provide a couple of shared folders. That role (just called "storage1", accessible at \\storage1) is working fine and the volumes are all replicating correctly.

When I set this up, the second cluster was also given the same type of role (called "storage2", accessible at \\storage2). However, when I enabled storage replica, that role disappeared. Now I'm a bit confused as to how we actually are going to go about failing from one site to the other - I had assumed it would involve a powershell command and a change in DNS to point the storage target from storage1 to storage2 and wait for clients to update, but now I'm wondering if that's actually the case. Will the entire role (and subsequently the DNS) for "storage1" be migrated to the replica site? What are the implications of the two sites losing connection ... is there any automated movement I should be aware of that may take place or trigger a split-brain scenario?

Disks at site 2 assigned to storage2 file server role:

No role actually listed at site 2 now:






Paul Hite - MCSE, MCITP

Deploying a DHCP cluster: "Invalid extension name CLuster_SNAPIN_EXTENSION_NAME" is shown when managing the DHCP server

After deploying a DHCP cluster, managing the DHCP server shows "Invalid extension name CLuster_SNAPIN_EXTENSION_NAME".

How can this be resolved? Thanks.

Windows 2016 MPIO Does Not Appear (COMPLENTCompellent Vol)

Hi,

We added support for iSCSI, but (COMPLENTCompellent Vol) did not appear.

It's all working, but I was curious why this happened.

On two other servers it appeared successfully, and they run the same version of Windows 2016.


Tks.
