Quantcast
Channel: High Availability (Clustering) forum
Viewing all 3614 articles
Browse latest View live

Windows Admin Center: Missing sddcres.dll

$
0
0

Hello,

I have recently spun up a 3-Node failover cluster with S2D and Hyper-V roles installed, configured, and actively working. Windows Admin Center is pointing me to an article that states it relies on a set of APIs that are not included in Server 2016. However, when I run the command posted on the article, it fails and displays that "C:/windows/cluster/sddcres.dll" doesn't exist. According to the article the libraries are downloaded in 2016 if the 05-2018 KB is installed. I've verified that all 3 nodes are on 07-2019 (just ran CAU to ensure it was installed on all nodes and it was successful). This still didn't fix the command. So, I downloaded the update directly from the Microsoft Update Catalog just in case.. and the installer returns a message that "this update is not applicable".

During the deployment of these nodes I didn't see anything that specifically mentioned 'Hyper-Converged' or a setting I needed to toggle to indicate that. As far as I'm aware the term Hyper-Converged just describes the configuration of the architecture (S2D+Hyper-V on boxes in a Cluster).

Everything in the cluster validation is coming back valid, and I've verified that S2D is functional (NVMe are "Journals" and my HDD/SDD pool is correctly displaying as Capacity & Performance).

Any recommendations?




Showing Un-Monitor and isolated host from the fail over cluster

$
0
0

Hi Expert,

We are encountering same type of issue in my failover cluster environment " your host XYZ is un monitored state or islolated in cluster".

Due to this error my all belonging VMs of particular host were restarted or shutdown. i created 2-3 times support tickets to Microsoft but we did not get any finding or solution from them. I retsrated my host and then it will be ok.

Kindly advise me.

IN my failover cluster, we have 4 hosts and we are using Server 2016.

Thanks in advance.


ejaz

Unable to make a storage pool

$
0
0

Hello all :)

I'm currently in the process of teaching myself about Server 2019 and some of the technologies I've not had chance to play with before.

The one that I am trying at the moment is creating a file server using failover clustering.
I am able to create the cluster (LAB-CLUSTER01) using 3 servers (LAB-S03,LAB-S04 and LAB-S05), running Server 2019 DC Core.

I have created 3 storage pools before creating the cluster (S03-SP, S04-SP and S05-SP). These pools are made of 4 virtual SSDs, creating a single drive.

All of this is running on ESXi 6.5

The storage pools are all running and happy without issue but I am unable to access them from the cluster. The error given below

'Failed to bring the resource 'S03-SP' online.

The device does not recognize the command

Looking at the physical disks tab in cluster manager, they are all marked as 'Becoming Ready'

Once I have tried to add this pool to the cluster, I am then no longer able to access it from within Windows Server Manager.

Would it be possible for someone to advise what I causing this and what can be done (if anything) to fix it.

Many thanks
Tom


How to clustering Windows Server 2016 two different types hardware Dell vs Lenovo server

$
0
0

I have a question about Clustering between two different hardware companies.

I have a Lenvo x3650M 5 5462 server running Windows Server 2016.

Now I have another server, the Dell R740, which also runs Windows Server 2016.

My question is whether to run Windows Server 2016 clustering on Lenovo and Dell OS servers 2016.

Thanks for technical advice


Need help extending a clustered shared volume

$
0
0


Hello everyone,

I am new to Clustered share volumes within server 2012 r2. I am trying to expand or create a new volume.

I have tried to use diskpart to expand the V$ but I keep getting the error that there is not space available to expand.

This volume is on a 12TB SAN. I can see 1.2TB are available.

Does anyone know what I am missing? I don't know why I can see it available on the machine, but not within diskpart.


Problem running Update-ClusterFunctionalLevel on Server 2019

$
0
0

Hi

I have in-place upgrade a 2 node SQL cluster (from Server 2016 Std. to Server 2019 Std.). The whole process worked as expected.

Now I want to run Update-ClusterFunctionalLevel, but it is returning the following error:

Update-ClusterFunctionalLevel : You do not have administrative privileges on the cluster. Contact your network administ
rator to request access.
    Access is denied
At line:1 char:1+ Update-ClusterFunctionalLevel+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ CategoryInfo          : AuthenticationError: (:) [Update-ClusterFunctionalLevel], ClusterCmdletException+ FullyQualifiedErrorId : ClusterAccessDenied,Microsoft.FailoverClusters.PowerShell.UpdateClusterFunctionalLevelCo
   mmand


I the Microsoft-Windows-FailoverClustering/Diagnostic eventlog it gives me the following error:

EventID: 2051

Description: [CORE] mscs::ClusterCore::VersionUpgradePhaseTwo: (5)' because of 'Gum handler completed as failed'

I think all permissions are correct, but I can't find the root cause, can you please help me?



Failover Clustering Task Scheduler Survey

Some cluster networks with unavailable status

$
0
0
Hello. When we "mounted" the Cluster Failover with Windows Server 2012 R2, and all Networks were "Up", however we realized that we were not able to do Live Migration, and we checked into the Cluster Networks part and saw several interfaces with Status of "Not available". However, when we test access to these interfaces, they are normal and accessible. We have already checked Anti-Virus and Firewall on all Cluster servers (Nodes), and there is no restriction on Anti-Virus and Firewall is disabled.

Print attached.

NOTE: I already did what is on http://blog.mpecsinc.ca/2010/03/nic-binding-order-on-server-core-error.html

NOTE 2: This is only happening on some Interfaces of "Cluster Network 3", "Cluster Network 2" and "Cluster Network 1", all interfaces are "Up"

Guest file server cluster constant crashes

$
0
0

Hi

I have make working a guest file server cluster with Windows Server 2019. the cluster crash constantly, being very slow and finally crashing all my hypervisors servers....

Hypervisor infrastructure:

  • 3 hosts windows server 2019 LTSB datacenter
  • iSCSI Storage 10 Gb with 11 LUNs
  • cluster valid for all tests

Guest file server cluster, 2 VM with the same config:

  • VM 2nd generation with 2019 LTSB Server
  • 4 virtual UC
  • 8GB of non-dynamic RAM
  • 1 SCSI controller
  • primary hard drive: VHDX format, SCSI Controller, ID 0
  • empty DVD drive on SCSI controller, ID 1
  • 10 VHDS disks on SCSI controller, ID 2 to 11, same ID on each node
  • 1 network card on virtual switch routing to 4 physical teamed network cards.
  • Cluster is valid for all tests except the network with one failure point for non redundancy.


after some time, the cluster become very slow, crash and make all my hypervisors crashs. the only errors returned by Hyper-V is some luns became unavalaible due to a timeout with this message:

Le volume partagé de cluster « VSATA-04 » (« VSATA-04 ») est à l’état suspendu en raison de « STATUS_IO_TIMEOUT(c00000b5) ». Toutes les opérations d’E/S seront temporairement mises en file d’attente jusqu’à ce qu’un chemin d’accès au volume soit rétabli.

I have checked every single one parameters on VM and Hyper-V config, search with each hint I was given by logs but nothing and the crashes remains....

and sorry for my poor language, english is not my main ability for speaking

Zero Downtime File Server - Would this setup work?

$
0
0

Hello everybody,

I was given the task to plan a redundant file storage environment that can compensate failure of any component without service interruption. This is a field I have little experience with, which I want to confirm that the concept I am working on actually works. I don't have the resources to build a test system available at the moment either, making this a very theoretical construct.

I want to use a Windows Failover Cluster with a Scale Out File Server role installed. Three physical servers with limited storage space for only the operating system are supposed to be the nodes of this cluster (three as to avoid using a file witness). A single SAN storage solution will provide the storage space for the file server, attached to the individual nodes via fibre channel. The SAN storage itself has all components built in redundantly, eliminating the need to provide a second storage unit and managing the synchronization of both.

The clients are expected to then connect to the file service provided by the cluster which is then (transparently) handled by any of the nodes and, in case of failure of this node (e.g. loss of power), instantly taken over by another without interruption or considerable delay.

In case it is important: The file server is supposed to host files of different applications including resources and configurations. These applications are not run on the server, but on clients. They are executed FROM the server share though, so constant and uninterrupted file provision is required, otherwise the applications will eventually crash. Executing from the server share is mandatory.

Now as I mentioned my experience with this is rather limited, and while the concept is based on what I read from MS documentation I would like to ask you for confirmation of this working or, in case it doesnt, advice on what to do differently.

Additionally, as far as I understand running a domain controller role on the same server that is running a scale out file server role is not possible or at least not recommended. Is this still valid for Server 2019 and if, is there a way to achieve the goal of zero downtime file provisioning on the same device that is running a DC or do it have to be seperate machines?

Thanks in advance!

Drive on all nodes in SQL Availability Group "Formatted" at the same time (Cluster on Windows 2016 standard)

$
0
0

We have a 2 node SQL Availability Group on a Windows 2016 Std Cluster.

SQL Server reported the databases suspect after the data drives on both servers appeared to have been formatted.

On one of the servers we found the following events:

Event ID 7036 on 7/26/2019 at 9:37:55AM

Event ID 98 on 7/26/2019 at 9:38:12AM

Event ID 98 on 7/26/2019 at 9:38:13AM

These appear to indicate that the drive was formatted.

We have tested and found that using the Powershell "Format-Volume" command (Run locally or remotely) against one server causes the same drive on both nodes in the Cluster/AG to be formatted.

One possible cause is a server build script has been run with incorrect server details and we are investigating this possibility.

My questions are:

Has anyone experienced drives being "Formatted" simultaneously across nodes in a Clustered SQL AG?

Is the formatting of drives on an Availability Group supposed to affect all nodes? I've not found documentation to explain this.

How to automate actions based Cluster Validation Test results?

$
0
0

In windows clustering you can run a "Cluster Validation Report" either from the Cluster Administration Console or from PowerShell using Test-Cluster.

However, the output is an .htm file, which isn't really super helpful compared to getting a list of True/False values like you would expect from a "proper" PowerShell cmdlet 😉

So, my question is whether anyone knows of a way to pass the results from Test-Cluster on, so I can build something that can fix the settings that failed?
Or do I really only have a choice between inventing the wheel by creating a bunch of tests myself, or manually reading a report?

I find it hard to really believe that this is something that hasn't been automated yet?

I have been googling fairly hard, but haven't been able to find any tooling around this already.
(I did suggest fixing our build pipeline so we could have a success-rate higher than 15% on new clusters, but apparently that's not popular ¯\_(ツ)_/¯)

ps. currently I'm looking at whether I can parse the htm file that is output, but meh -__-

Can I migrate a VM from another node to current active node?

$
0
0

Hi

Is it possible to migrate a VM from another node in cluster to currently active node of hyper V Cluster?

Any clue is highly appreciated.

Thanks

Problem with virtual disk on 4 node cluster.

$
0
0
Hi Guys



I am going out of my mind.. Been struggling with this for days unable to find something that can bring me along the right path.

My cluster was powered down when starting up and that resulted in a virtual disk being stuck in an "online pending" -> "Failed" -> "Online pending" loop. And then i tries to start it on another server. So it keeps bouncing around all 4 servers.



I have tried almost all articles i could find. When running get-storagejob i have 1 job that keeps running:

Name   IsBackgroundTask ElapsedTime JobState PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- -------- --------------- -------------- ----------
Repair True             00:01:25    Running  0               0              45097156608



It seems that every 2-3 minutes the jobs restarts. I am getting this info in the event log (Sorry for missing pics i was not allowed to post them):

EventID: 1069

Cluster resource 'Cluster Virtual Disk (HyperVDisk1)' of type 'Physical Disk' in clustered role '96fd0e69-9c2d-41c0-92e3-09bdcd126686' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.



EventID: 5142

Cluster Shared Volume 'HyperVdisk1' ('Cluster Virtual Disk (HyperVDisk1)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.



EventID: 5142

Cluster Shared Volume 'HyperVdisk1' ('Cluster Virtual Disk (HyperVDisk1)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.



EventID: 1793

Cluster physical disk resource online failed.

Physical Disk resource name: Cluster Virtual Disk (HyperVDisk1)
Device Number: 5
Device Guid: {a75e8b5d-a226-4b0e-b6d4-cde8fffa4d1b}
Error Code: 5008
Additional reason: WaitForVolumeArrivalsFailure



EventID: 1795

Cluster physical disk resource terminate encountered an error.

Physical Disk resource name: Cluster Virtual Disk (HyperVDisk1)
Device Number: 5
Device Guid: {a75e8b5d-a226-4b0e-b6d4-cde8fffa4d1b}
Error Code: 1168



What i have tried:

This article from kreelbits: storage-spaces-direct-storage-jobs-hung



Tried optimize-storagePool and repair-virtualDisk with no success 



Found a great article from JTpedersen on troubleshooting-failed-virtualdisk-on-a-storage-spaces-direct-cluster



Every time i tried to run: 
Remove-Clustersharedvolume -name "Cluster Virtual Disk (HyperVDisk1)"

1 time i got that the job failed because the disk was moving to another server (Not the exact wording)

The normal response is it just hangs on the command and have been doing that for +24 hours.



To me it seems that the problem is that before any commands can get a hold of the disk it restarts the storageJob og moves the disk to another server and restarts the loop.



Thanks i advance.



/Peter





WMI Equivalent of powershell Start-ClusterResource and Move-ClusterVirtualMachineRole

$
0
0

Hi,

I want to use these two commands. 1) Start-ClusterResource and 2) Move-ClusterVirtualMachineRole for first starting the VM and then moving to the active host.

If I move VM without starting it, it does not move. so first I start it and the move it.. It works but

How can i do this using WMI? which are their's WMI equivalents?

Please help



migrate a VM using Failover Cluster WMI Providers

$
0
0

Hi Community,

I want to migrate a VM from node B to node A, where node A is current node.

I used WMI Providers MSCluster_Resource.

I first used BringOnline Method to make the resource online.

Then I am trying to use ExecuteResourceControl method with control code: clusctl-resource-vm-start-migration

(https://docs.microsoft.com/en-us/previous-versions/windows/desktop/mscs/clusctl-resource-vm-start-migration)

but this code seems not be existing while docs show that this is supported from 2012 onwards.

Any help is appreciated on this..

HYPER-V WIN SERVER CORE 2019 IN ONE VHDS SHARING OS

$
0
0

MY IDEA IS TO RUN ONE OS IN MORE HYPER-V MACHINES

I have build two hyper-v machines[VMCSV, VM11] with vhds (sharing in both machines), i installed win server 2019 core inside vhds and build cluster sharing volume (CSV) AND but the OS of vhds in the CSV... it's run good in both machines as same time .....Awesome!!


The problem::

when i try to used one of the machines the other shown BLUE SCREEN ERROR.. why??

and if there is any solutions to used both machines as same time please write down 

i have pictures of issues but didnt allow to submit until verify my account!!

Error applying Replication Configuration Windows Server 2019 Hyper-V Replica Broker

$
0
0

Hello,

Recently we started replacing our Windows Server 2016 Hyper-V Clusters for Server 2019. On each cluster we have a Hyper-V Replica broker that allows replication from any authenticated server and stores the Replica Files to a default location of one of the Cluster Shared Volumes.

With WS2019 we run into the issue where we get an error applying the Replication Configuration settings. The error is as follows:
Error applying Replication Configuration changes. Unable to open specified location for replication storage. Failed to add authorization entry. Unable to open specified location to store Replica files 'C:\ClusterStorage\volume1\'. Error: 0x80070057 (One or more arguments are invalid).

When we target the default location to a CSV where the owner node is the same as the owner node for the Broker role we don't get this error. However I don't expect this to work in production (moving roles to other nodes).

Did anyone ran into the same issue, and what might be a solution for this? Did anything changed between WS2016 & WS2019 what might cause this?

Kind regards,

Malcolm

Windows Server 2016 Failover Cluster Get-Volume lists all volumes

$
0
0

I created a 2-node failover cluster in my Hyper-V environment. 

My concern here is that when I RAN:

Format-Volume -DriveLetter D

The D drives on both nodes were formatted.

When I ran Get-Volume on one of the nodes, I noticed that my D & E drives on each node was listed twice.

I noticed that 'Storage Replica' was added as a Cluster Resource Type and that the following device is installed:

Microsoft ClusPort HBA

Which some cursory research says:

"The Software Storage Bus (SSB) is a virtual storage bus spanning all the servers that make up the cluster. SSB essentially makes it possible for each server to see all disks across all servers in the cluster providing full mesh connectivity. SSB consists of two components on each server in the cluster; ClusPort and ClusBlft. ClusPort implements a virtual HBA that allows the node to connect to disk devices in all the other servers in the cluster. ClusBlft implements virtualization of the disk devices and enclosures in each server for ClusPort in other servers to connect to."

Is this by design? Is there a way to disable this? How do we fix this?

Windows Server 2016 Standard, running on Hyper-V



2012 R2 Scale-Out File Server Performance Issue

$
0
0

I'm implementing a product called AppLayering by Citrix in a VMware environment. It creates a unique .vhd for each piece of software you install and want to deploy to end users. We created a Scale-Out File Server for the share so that we could have 100% up time from crashes and updates/reboots. The end user machines mount the .vhds at login; usually anywhere from 5-15 of these .vhds which range from 1Gb to 12GB in size.


Now that I'm increasing the amount of machines accessing this share, sometimes I experience a very long delay, as much as 6 minutes, before the layers are mounted. They usually mount within seconds. However, it's not consistently worse the more machines that are logged in, rarely it's still instant, but it does seem to get worse in general the more machines are mounting these layers.


The only performance settings I've tried to tinker with is the MaxThreadsPerQueue from 20 to 64. This reg entry was not in the registry by default, I had to make it myself, so I'm not sure if that means anything. Also not sure if 64 is even a good number to change it to either, just shooting in the dark here, any help would be much appreciated!


Darin

Viewing all 3614 articles
Browse latest View live


<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>