Channel: High Availability (Clustering) forum
Viewing all 3614 articles

Mixed version clusters?

Greetings. My question pertains more to SQL Server Always On, but I figured that if this piece doesn't work, that won't either.

That said, I found this doc for a rolling upgrade from 2012 R2 to 2016, but it assumes there's an existing 2012 R2 cluster. What if I simply want to create a new cluster with 2012 R2 on one node and 2016 on the other, for the purpose of migrating to a new data center where the vendor no longer supports 2012 R2?

I know there are MANY things wrong with this plan, but I still need to find out if it's possible for management. 


Thanks in advance! ChrisRDBA


Is there any impact on witness disk if cluster IP and cluster nodes IP address is changed?

Hi,

I have a requirement to change the cluster IP and the IP addresses of its nodes, and I want to know the impact on the witness disk.

What will happen to the witness disk if IP addresses are changed? 

Thank you!

Multi domain VM of Hyper-v Host in CSV of failover Clustering

My company has one AD DS forest that contains 2 domains. All servers run Windows Server 2012 R2. My company uses iSCSI and Fibre Channel storage. I plan to deploy a single Hyper-V cluster that will use Cluster Shared Volumes (CSV). The cluster must include VMs from both domains. What should I do?

Which option shall I follow:

Join each hyper-v host server to the same AD DS domain

Deploy clustered storage spaces

Deploy serially attached SCSI (SAS)

Join each hyper-v host server to different AD DS domains.


Hyper-V Clustering failed with Cluster Shared Volume

Hello, All

I deployed and configured a Hyper-V cluster environment with Windows Server 2016 Standard evaluation edition.

It worked normally without any issues.

Later, I changed the edition to Datacenter using the dism command.

The weird thing is that an error occurred while changing the primary node's edition; however, the change somehow went through in the end.

On the second node there were no issues; it was changed to Datacenter successfully.

After that, on the primary node, all storage kept cycling between Online (No Access) and Pending status.

Eventually, all VMs had to be migrated to the second node.

What I have done so far to solve this:

Verified the firewall; all firewalls are now disabled.

Shut down and started up both nodes.

Copied and pasted a registry key: there was no Parameters key in the registry under HKLM - System - CurrentControlset - Clussvc

So I copied the key from the second node and pasted it into the primary. After that the cluster service could start; before I did this, it could not start at all.

At this point the Rhs key was also copied and pasted in its entirety. I suspect it shouldn't be identical on both nodes, but I'm not sure.

Here are captured screenshots showing the event ID and messages.

If any of you know about this or have solved it before, please let me know what I should do.

Thanx.

Hyper-V 2019 Update-ClusterFunctionalLevel = MAJOR OUTAGE!

Help!!!

We have just finished upgrading all our Hyper-V nodes from 2016 to 2019, mainly due to all the bugs in 2016.  We have found 2019 to be much, much better.  Each node was first evicted, then fully formatted and reinstalled, and then added back into the cluster.  There were eight nodes in total, SERVER01-08.

Yesterday we ran the Update-ClusterFunctionalLevel command and all hell broke loose.  VMs went offline, blue screens, disk corruption, you name it.  It took us hours to get everything back.  It looks like everything on SERVER08 had the problem.

We evicted SERVER08 and rebuilt it again.  The problem now is that it won't rejoin the cluster.

Error 0x5b4

The image we are using has been perfect, fully tested and rock solid.  Network connectivity is good, as is the connection to the SAN.  No issues with pinging the cluster name or any of the other nodes.  Running the validation checks comes back all green.

Config and settings on all nodes are 100% identical, as everything is run from scripts, so no human error.

Everything was done by the book, following Microsoft's instructions to the letter.

Functional Level is still at 9 which I believe is 2016

any ideas?
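
For anyone following along, the functional level mentioned above can be read back from any node. The snippet below is a minimal sketch, assuming the FailoverClusters PowerShell module is installed and the session runs on a cluster node. It only reads state and changes nothing. A value of 9 corresponds to Windows Server 2016 and 10 to Windows Server 2019.

```powershell
# Read-only check of the cluster functional level
# (9 = Windows Server 2016, 10 = Windows Server 2019).
Import-Module FailoverClusters

Get-Cluster | Select-Object Name, ClusterFunctionalLevel

# List each node with its state and OS build, to confirm that
# every node is on the expected version.
Get-ClusterNode | Select-Object Name, State, MajorVersion, MinorVersion, BuildNumber
```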







Can't delete file

Hello

I have a two-node Hyper-V cluster.  One of the virtual disks in the Cluster Shared Volume became corrupted somehow, and the file server it was attached to wouldn't boot because of it.  I detached the vhdx file from the virtual machine so it would boot, and I have a good copy of the vhdx that I have restored from backup, but I can't drop it into the CSV because the old corrupt vhdx is still in there.  I can't delete, move or rename it.  Any attempt to do so generates this error:

Error 0x80070570: The file or directory is corrupted and unreadable.

There are several other working vhdx files in that directory, so it must be the file and not the directory that is corrupt.  Can someone help me to delete this file?


Hutch

No disks were found on which to perform cluster validation tests

Does storage need to be configured in the cluster if a file share witness is set for the cluster? I ask because the warning "No disks were found on which to perform cluster validation tests" is highlighted in yellow at Validate storage persistent reservation in the Failover Cluster Validation Report.

My configuration:

2 HP PCs with Server 2012 R2 installed

A Windows failover cluster created with the 2 nodes (HP PCs)

A file share witness created on a separate server (a 3rd PC, also with Server 2012 R2 installed). The witness is connected to the cluster.

Please clarify whether cluster storage (disks or pools) needs to be configured because of the warnings in the cluster validation report.
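
As an illustration of how the storage tests relate to the rest of validation, the sketch below re-runs validation without the storage category. It assumes the FailoverClusters module and uses the hypothetical node names `NODE1` and `NODE2`, which do not appear in the post. On a cluster that has no shared disks, a "no disks were found" warning in the storage tests may simply reflect that there is nothing to test, but that is an assumption to confirm against the full report.

```powershell
Import-Module FailoverClusters

# NODE1 and NODE2 are placeholder names; substitute the real node names.
# Run only the non-storage validation categories.
Test-Cluster -Node "NODE1", "NODE2" -Include "Inventory", "Network", "System Configuration"
```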

thanks

John

DNS CACHE CANNOT BE FLUSHED

How do I flush the DNS cache on Server 2012 R2? The content still exists even after running ipconfig /flushdns or restarting the server. What could be wrong?
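
A minimal sketch of the flush and a follow-up check, assuming the DnsClient PowerShell module that ships with Windows Server 2012 and later. `Clear-DnsClientCache` is the PowerShell counterpart of `ipconfig /flushdns`. One thing worth checking is the local hosts file: names defined there may be loaded back into the resolver cache, so they can still show up after a flush.

```powershell
# Flush the local DNS resolver cache (counterpart of: ipconfig /flushdns).
Clear-DnsClientCache

# Show what is cached afterwards. Entries that reappear immediately
# may come from the hosts file rather than from a DNS server.
Get-DnsClientCache | Select-Object Entry, Data, TimeToLive

# Inspect the hosts file for static entries (default location on Windows).
Get-Content "$env:SystemRoot\System32\drivers\etc\hosts"
```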

thanks

John


MultiSubnetFailover and cluster parameters

Please clarify whether the HostRecordTTL value still needs to be changed if MultiSubnetFailover=True is set in the Additional Connection Parameters when connecting to the server in SSMS 2017.

SQL 2016 standard installed on SERVER 2012 R2

A Cluster with 2 nodes created in 2 subnets

A warning shows "The HostRecordTTL property for network name 'Name: ClusterNAME' is set to 1200 ( 20 minutes). For multi-site clusters the suggested value is 300 (5 minutes)." in the failover cluster validation report.

I wonder whether I should change the HostRecordTTL value or set MultiSubnetFailover=True. Please advise.
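
For context, the two settings live in different places: MultiSubnetFailover is a client connection option, while HostRecordTTL is a property of the cluster's network name resource. The sketch below reads and changes the latter. It assumes the FailoverClusters module and that the network name resource is called `ClusterNAME` as in the warning above; substitute the real resource name. A new value typically takes effect only after the network name resource is taken offline and brought back online, which briefly interrupts clients, so treat this as an illustration rather than a procedure.

```powershell
Import-Module FailoverClusters

# 'ClusterNAME' is taken from the validation warning;
# replace it with the actual network name resource.
$netName = Get-ClusterResource -Name "ClusterNAME"

# Show the current TTL in seconds (the warning reports 1200, i.e. 20 minutes).
$netName | Get-ClusterParameter -Name HostRecordTTL

# Set the TTL to the suggested 300 seconds (5 minutes).
$netName | Set-ClusterParameter -Name HostRecordTTL -Value 300
```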

thanks

John

CSV access issues from non owner node

We are having an issue on a brand new 2019 Datacenter build.

We have a 3-node cluster connected to a 3PAR SAN via FC. All LUNs are showing and are present as Cluster Shared Volumes.

When we try to set the default Replica or Hyper-V file location on the non-owner node, we get the following error.

Failed to add authorization entry. Unable to open specified location to store Replica files. 

Error: 0x80070057 (One or more arguments are invalid).

Has anybody seen this problem before? It is the second time that we have seen this in a 2019 environment.

Cluster network name resource failed to find the associated computer object in Active Directory.

We have set up a cluster on Windows Server 2016. Initial validation succeeded; however, I moved the computer object generated by the cluster in Active Directory from its default location to the AD - Computers OU and am now seeing this error:

"Cluster network name resource failed to find the associated computer object in Active Directory. This may impact functionality that is dependent on Cluster network name authentication.

Network Name: Cluster Name
Organizational Unit: OU=Windows DSC,DC=XXXXXXX,DC=Local"

Guidance:

Restore the computer object for the network name from the Active Directory recycle bin.

(domain blanked for security reasons) 

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          7/06/2019 12:51:57 PM
Event ID:      1685
Task Category: Network Name Resource
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      XXXXXXXXX.XXXXXXX.Local
Description:
Cluster network name resource failed to find the associated computer object in Active Directory. This may impact functionality that is dependent on Cluster network name authentication.

Network Name: Cluster Name
Organizational Unit: OU=Windows DSC,DC=XXXXXXX,DC=Local

Guidance:

Restore the computer object for the network name from the Active Directory recycle bin.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1685</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>19</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2019-06-07T02:51:57.836490300Z" />
    <EventRecordID>10392</EventRecordID>
    <Correlation ActivityID="{C0BF5C0C-E484-4BDC-A006-D7B5895DE02C}" />
    <Execution ProcessID="4572" ThreadID="7176" />
    <Channel>System</Channel>
    <Computer>XXXXXXX.XXXXXX.Local</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="ResourceName">Cluster Name</Data>
    <Data Name="OrganizationalUnit">OU=Windows DSC,DC=XXXXXX,DC=Local</Data>
  </EventData>
</Event>

It was fine until I moved it to the Computers OU. My question is, does it need to be in its default location to work?

Performance Issue on Storage Space Direct Server 2019 - Getting high read and write Latency

Hello All,

On S2D I am seeing a performance issue: high read and write latency. Over the past few days it has gotten worse and IOPS are no longer constant; in one second IOPS reach the thousands and in the next they drop to the hundreds. The same thing is happening with read and write throughput. Earlier we also had a performance issue, but IOPS were constant. In Admin Center this shows up as peaks in IOPS and throughput, and because of this the hosted VPSs hang and are slow.

I have configured S2D with 4 nodes, with NVMe for caching and SSD for storage, as below:

Node 1 : 1x250 NVMe, 3x1TB SSD, no Hyper-V role

Node 2 : 1x500 NVMe, 3x1TB SSD, no Hyper-V role

Node 3 : 2x250 NVMe, 4x1TB SSD, has the Hyper-V role

Node 4 : 2x250 NVMe, 4x1TB SSD, no Hyper-V role

Node 5,6,7 : no SSD or NVMe for storage, Hyper-V role only

All servers are connected with 10 Gb Ethernet and use CSV to store the VM files.

Please suggest how to resolve the issue.

Storage Spaces Direct, server specs for SSDs

Hi All,

Looking to build an R&D VDI platform across two nodes using local disks.

I'm planning on buying two servers, each with 4 x 1.92 TB 6 Gbps SATA SSDs.  My research tells me this:

2 servers, meaning 2-way mirror

all SSDs, so no caching required

auto-calculated reserve space

Usable capacity = 6.9 TB

fileshare witness hosted away from the cluster
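
The capacity figure above can be sanity-checked with a little arithmetic. The sketch below assumes that a two-way mirror stores two copies of everything (50% efficiency) and, as a separate case, that the equivalent of one drive per server is held back as reserve. The reserve size is an assumption for illustration, not something stated in the post, and `ConvertTo-TiB` is a hypothetical helper defined here. The sketch also converts from decimal TB (as drives are sold) to binary TiB (as Windows reports sizes), which may be where a figure near 6.9 comes from.

```powershell
# Inputs from the post: 2 servers, 4 drives each, 1.92 TB (decimal) per drive.
$servers       = 2
$drivesPerNode = 4
$driveTB       = 1.92

# Total raw capacity in decimal TB.
$rawTB = $servers * $drivesPerNode * $driveTB

# A two-way mirror keeps two copies of all data, so usable space is half of raw.
$mirrorTB = $rawTB / 2

# Assumed reserve: one drive's worth per server, left unallocated for repairs.
$reserveTB          = $servers * $driveTB
$mirrorAfterReserve = ($rawTB - $reserveTB) / 2

# Hypothetical helper: convert decimal TB to binary TiB.
# PowerShell's 1TB literal is 2^40 bytes.
function ConvertTo-TiB([double]$tb) { [math]::Round($tb * 1e12 / 1TB, 2) }

"Raw:                   {0} TB" -f [math]::Round($rawTB, 2)
"Mirror, no reserve:    {0} TB ({1} TiB)" -f [math]::Round($mirrorTB, 2), (ConvertTo-TiB $mirrorTB)
"Mirror, after reserve: {0} TB ({1} TiB)" -f [math]::Round($mirrorAfterReserve, 2), (ConvertTo-TiB $mirrorAfterReserve)
```

With these inputs the two-way mirror works out to 7.68 TB before any reserve, which is a little under 7 TiB; holding back a reserve lowers the figure further.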

This is the first time I've looked into Storage Spaces Direct, as I've always gone the traditional route of Compellent SANs.  My servers have an HBA330 card, which is needed for this technology (i.e., no RAID at the hardware level).  I'm confused right from the off regarding installing Windows on each server.  Usually I go with 2 x SSD in RAID 1 for the OS, then map my iSCSI targets for the storage.  How do I go about setting up the disks so I can get Windows installed before then installing the roles to support storage?  Is it simply a case of speccing the server with, say, 2 x 250 GB NVMe (RAID 1) on its own controller card?

I'm going with two network cards.  The first one will give me dual 25 Gbps for the storage (dedicated fibre switch for storage only), and the second card is dual 40 Gbps to the LAN.  We have plenty of ports available on our fibre core switch, so we might as well make use of it all.  Does this sound like a good idea, or should I look into swapping the disks for 12 Gbps SAS ones and upgrading the storage network card from 25 Gbps to 40 Gbps?

The two nodes will also be running Hyper-V failover clustering so we can live migrate critical desktop VMs (although not all will need to fail over).

Also, when I add a third (and maybe fourth) server, can I change to a 3-way mirror on the fly?

Thanks!!













Downgrade Cluster level

Hi,

I have to downgrade a Windows Server 2019 cluster back to 2016.

I reinstalled one node with Windows Server 2016, but I can't add it to the cluster because of the cluster functional level.

Is it possible to create a new cluster (cluster2) and add the same storage from the existing cluster (cluster1)?

Will the cluster storage work in both clusters?

Partition information lost on cluster shared disk

Hi everyone,


We've got a cluster virtual disk where the partition table and volume name broke. Has anyone experienced a similar problem and got some hints on how to recover?


The problem occurred last Friday. I restarted node3 for Windows updates. During the restart, node1 had a bluescreen and also restarted. The failover cluster manager tried to bring the cluster resources online but failed several times. Finally the resource swapping came to rest on node1, which came up early after the crash. Many virtual disks were in an unhealthy state, but the repair process managed to repair all disks, so they are now in a healthy state. We aren't able to explain why node1 crashed. Since the storage pool is in dual parity mode, the disks should be able to work even if there are only 2 nodes running.

One virtual disk, however, lost its partition information.


Network config:

Hardware: 2x Emulex OneConnect OCe14102-NT, 2x Intel(R) Ethernet Connection X722 for 10GBASE-T

Backbone-Network: On the "right" Emulex network card (only members in this subnet are the 4 nodes)

Client-access teaming network: emulex "left" and intel "left" cards in team; 1 untagged network and 2 tagged networks


Software Specs:

    • Windows Server 2016
    • Cluster with 4 Clusternodes
    • Failover Cluster Manager + File Server Roles running on the cluster
    • 1 Storagepool with 36 HDDs / 12 SSDs (9 HDDs / 3 SSDs on each node)
    • Virtual disks are configured to use dual parity:
Get-VirtualDisk Archiv | Get-StorageTier | fl
       FriendlyName           : Archiv_capacity
       MediaType              : HDD
       NumberOfColumns        : 4
       NumberOfDataCopies     : 1
       NumberOfGroups         : 1
       ParityLayout           : Non-rotated Parity
       PhysicalDiskRedundancy : 2
       ProvisioningType       : Fixed
       ResiliencySettingName  : Parity

Hardware Specs per Node:

  • 2x Intel Xeon Silver 4110
  • 9 HDDs of 4 TB each and 3 SSDs of 1 TB each
  • 32GB RAM on each node

Additional information:

The virtualdisk is currently in Healthy state:

Get-VirtualDisk -FriendlyName Archiv

FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach   Size
------------ --------------------- ----------------- ------------ --------------   ----
Archiv                             OK                Healthy      True           500 GB


The storagepool is also healthy:

PS C:\Windows\system32> Get-StoragePool

FriendlyName   OperationalStatus HealthStatus IsPrimordial IsReadOnly
------------   ----------------- ------------ ------------ ----------
Primordial     OK                Healthy      True         False
Primordial     OK                Healthy      True         False
tn-sof-cluster OK                Healthy      False        False


Since the incident the event log (of current master: Node2) has various errors for this disk like:

[RES] Physical Disk <Cluster Virtual Disk (Archiv)>: VolumeIsNtfs: Failed to get volume information for \\?\GLOBALROOT\Device\Harddisk13\ClusterPartition2\. Error: 1005.


Before the incident we also had errors that might indicate a problem:

[API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.


Our suspicions so far:

We made registry changes to SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\0001 (to 0009) and set the value PnPCapabilities to 280 (disabling the checkbox "Allow the computer to turn off this device to save power"), but not all network adapters support this checkbox, so this may have had some side effects.



One curiosity: after the error we noticed that one of the 2 tagged networks had the wrong subnet on two nodes. This may have caused some of the failover role switches that occurred on Friday, but we're unsure about the reason, since they were configured correctly some time before.

We've had a similar problem in our test environment after activating jumbo frames on the network interfaces. In that case we lost more and more filesystems after moving the file server role to another server. In the end all filesystems were lost and we reinstalled the whole cluster without enabling jumbo frames.

We now suspect that maybe two different network cards in the same network team may cause this problem.

What are your ideas? What may have caused the problem and how can we prevent this from happening again?

We could endure the loss of this virtual disk since it was only archive data and we have a backup, but we'd like to be able to fix this problem.

Best regards

Tobias Kolkmann



Cluster resource 'Virtual Machine VMNAME' of type 'Virtual Machine' in clustered role 'VMNAME' failed.

Hello!

I have a Hyper-V failover cluster with 3 nodes.

NODE1: Windows SRV2016

NODE2: Windows SRV2016

NODE3: Windows SRV2019

There are 30 VMs in the failover cluster. I can live migrate every VM to every node, with one exception. That one VM can be live migrated from NODE2 to NODE1 and from NODE1 to NODE2, but not from NODE1 or NODE2 to NODE3; when I try, I get the following error:

Event id: 1069

Cluster resource 'Virtual Machine VMNAME' of type 'Virtual Machine' in clustered role 'VMNAME' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event id: 1205

The Cluster service failed to bring clustered role 'VMNAME' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
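
The event text above points at `Get-ClusterResource`. A minimal sketch of that check follows, assuming the FailoverClusters module and using the placeholder names `VMNAME` and `NODE3` from the post. It only reads state.

```powershell
Import-Module FailoverClusters

# State and current owner of the VM resource named in event 1069.
Get-ClusterResource -Name "Virtual Machine VMNAME" |
    Format-List Name, State, OwnerGroup, OwnerNode, ResourceType

# State of the clustered role (group) named in event 1205.
Get-ClusterGroup -Name "VMNAME" | Format-List Name, State, OwnerNode

# Confirm that the target node is up.
Get-ClusterNode -Name "NODE3" | Format-List Name, State
```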

What could be the problem?

Thank You.

Storage Spaces Direct Networking

Quick question for those who know this technology.

Two-node cluster, each node with dual 100 Gb Mellanox network cards (RDMA capable) directly connected to each other. Server1 Nic1 > Server2 Nic1 & Server1 Nic2 > Server2 Nic2

Is it good practice to connect both NICs to a virtual SET switch and carry both storage and live migration traffic?

I think it is, but I'm wondering if there's anything else I haven't thought about.

There will be a second network card (quad Intel 10 Gb connected to a physical switch), so my plan for that would be a second virtual switch for LAN access and heartbeat.

New server with Windows 2019 (Cluster Hyperv 2012 R2)

Hi,

I have a Hyper-V 2012 R2 cluster with two nodes.

We will replace these two nodes with new servers, and we want to deploy them with Windows Server 2019 from the start.

What is the best strategy for doing this migration?

Option 1

Create a new cluster, add the CSV volumes to the new cluster, and import the VMs?

Option 2

Add the new Windows Server 2019 nodes to the existing cluster alongside the Windows Server 2012 R2 nodes and live migrate the VMs?

Thank you.

Upgrade the virtual machine and Integration Services

Hi,

We recently migrated from a VMware environment, where the need to install a Microsoft KB came up during the VMware Tools upgrade.

In addition to updating the VM version, it was necessary to restart all the VMs.

Now I'm running the same procedure, only with a Hyper-V cluster. For a migration from 2012 R2 to 2016 and then on to 2019, what is the recommended upgrade approach?

Should I upgrade only at the end, when everything is already on 2019? Or should I do it when I'm on 2016 and then again when I'm on 2019?

Do VMs need to be fully updated with fixes?

Will VMs need to be restarted?

Thank you.

Adding a Node with a Newer Version to a Cluster Running an Older Version

Hi,

I have to add a new node running Server 2016 to a cluster running Server 2012.

Can I add it using the Failover Cluster tools from 2012, or do I have to use the Failover Cluster tools from 2016?

Thank you.