Contents
-
HyperFlex Pre-Upgrade Validation Checks
- add vNICs
- Cautions and Guidelines
- Viewing HyperFlex Cluster Health
- Verifying If DRS Is Enabled
- Viewing ESX Agent Manager
- Verify Health of HyperFlex Cluster In Cisco UCS Manager
- Configuring vMotion Interfaces (Optional: Jumbo Frames, MTU 9000)
- Configure Lenient Mode
- vCenter
-
Upgrade FI-UCSM
- Required Order of Steps for Auto Install
- Creating an All Configuration Backup File
- Disabling Smart Call Home
- Verifying the Operability of a Fabric Interconnect
- Verifying the High Availability Status and Roles of a Cluster Configuration
- Configuring the Default Maintenance Policy
- Disabling the Management Interface
- Verifying the Status of an I/O Module
- Verifying the Status of a Server
- Verifying the Ethernet Data Path
- Verifying the Data Path for Fibre Channel End-Host Mode
- Downloading Firmware Images to the Fabric Interconnect from a Remote Location
- Displaying the Firmware Package Download Status
- Displaying All Available Packages on the Fabric Interconnect
- Checking the Available Space on a Fabric Interconnect
- Upgrade the Infrastructure Firmware with Auto Install
- Acknowledging the Reboot of the Primary Fabric Interconnect
- Viewing the Status of the FSM During An Infrastructure Firmware Upgrade
-
Registering a Storage Cluster with a New vCenter Cluster
- Moving the Storage Cluster from a Current vCenter Server to a New vCenter Server
-
Unregistering a Storage Cluster from a vCenter Cluster
- Step 1 Identify the HX cluster UUID.
- Step 2 To unregister the storage cluster extension, log in to the vCenter server MOB extension manager
- Step 3 Locate the HX storage cluster extensions with the cluster IDs. Scroll through the Properties > extensionList to locate the storage cluster extensions:
- Step 4 Unregister each storage cluster extension.
- Step 5 Restart the vSphere Client services.
- Registering the First HX Cluster
- Registering a Second HX Cluster
- Troubleshooting
- Management cluster
- Confirm that upgrade is complete
HyperFlex Pre-Upgrade Validation Checks
add vNICs
add MAC pool
LAN-->Root-->Pool-->Sub Organizations-->MAC pools
add vNIC templates
LAN-->Policy-->root-->Sub Organizations-->vNICs templates
(Optional) LAN pin group
LAN-->LAN Cloud-->LAN Pin Groups
Cautions and Guidelines
Before you begin upgrading a Cisco HyperFlex System, consider the following cautions, guidelines, and limitations.
Optimizations in Capacity Tier: Backend access is optimized to significantly reduce the magnitude and frequency of high latency spikes.
Important Upgrade Guidelines
- This upgrade is recommended only for customers who have been identified as having this problem.
- For hybrid clusters: The default upgrade process will not enable this optimization. Contact Cisco TAC to enable this performance enhancement during the upgrade process. Enabling this optimization will require a longer maintenance window.
- For All Flash clusters: The upgrade times will not be significantly affected, and the default upgrade path will enable this performance enhancement.
- Upgrade VMware ESXi before starting the upgrade process. Important: If you have to upgrade from VMware ESXi version 5.5 U3, contact Cisco TAC for assistance.
- The Cisco HX Data Platform and Cisco UCS firmware bundles must be compatible. Refer to the UCS Hardware and Software Compatibility Matrix for more details.
- For a split upgrade, update the Cisco HX Data Platform first, before updating the Cisco UCS firmware, to avoid a "no connection" error message.
- During an online upgrade, while one node is being upgraded (put into maintenance mode), the number of tolerated node failures is reduced based on the Data Replication Factor and Access Policy settings.
- All endpoints in a Cisco HyperFlex domain must be fully functional and all processes must be complete before you begin a firmware upgrade on those endpoints. For example, the firmware on a server that has not been discovered cannot be upgraded or downgraded. Each endpoint is a component in the Cisco HyperFlex domain that requires firmware to function.
- In a three-node cluster, shutting down one node or putting it into maintenance mode makes the cluster unhealthy, but the cluster remains online.
- During the upgrade process, put the hosts into maintenance mode one at a time, and move to the next host only after the cluster becomes healthy.
Note: You cannot remove a node from a 3-node cluster with the stcli node remove operation. To replace a node on a 3-node cluster, contact Cisco TAC for assistance with the node replacement procedure.
- The Firefox browser is not supported because of the outdated version of Flash that is bundled with the browser. Manually updating Flash within Firefox is possible, but the recommendation is to use either Chrome or Internet Explorer with a modern version of Flash.
Viewing HyperFlex Cluster Health
- Using GUI
From the vSphere Web Client Navigator, select vCenter Inventory Lists > Cisco HyperFlex Systems > Cisco HX Data Platform > cluster > Summary. View the cluster widget to verify that the HyperFlex cluster is healthy and online.
From the vSphere Web Client Navigator, select vCenter Inventory Lists > Clusters > cluster > Summary. Verify that all HX cluster nodes are connected to vCenter and are online.
- Using CLI
Log in to any controller VM in the storage cluster. Run the command
# stcli cluster storage-summary --detail
Sample response that indicates the HyperFlex storage cluster is online and healthy.
address: 192.168.100.82
name: HX-Cluster01
state: online
uptime: 0 days 12 hours 16 minutes 44 seconds
activeNodes: 5 of 5
compressionSavings: 78.1228617455
deduplicationSavings: 0.0
freeCapacity: 38.1T
healingInfo:
inProgress: False
resiliencyDetails:
current ensemble size:5
# of ssd failures before cluster shuts down:3
minimum cache copies remaining:3
minimum data copies available for some user data:3
minimum metadata copies available for cluster metadata:3
# of unavailable nodes:0
# of nodes failure tolerable for cluster to be available:2
health state reason:storage cluster is healthy.
# of node failures before cluster shuts down:3
# of node failures before cluster goes into readonly:3
# of hdd failures tolerable for cluster to be available:2
# of node failures before cluster goes to enospace warn trying to move the existing data:na
# of hdd failures before cluster shuts down:3
# of hdd failures before cluster goes into readonly:3
# of ssd failures before cluster goes into readonly:na
# of ssd failures tolerable for cluster to be available:2
resiliencyInfo:
messages:
Storage cluster is healthy.
state: healthy
hddFailuresTolerable: 2
nodeFailuresTolerable: 1
ssdFailuresTolerable: 2
spaceStatus: normal
totalCapacity: 38.5T
totalSavings: 78.1228617455
usedCapacity: 373.3G
clusterAccessPolicy: lenient
dataReplicationCompliance: compliant
dataReplicationFactor: 3
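The health check on this output can be scripted. The following is a minimal sketch, assuming the text of `stcli cluster storage-summary --detail` has already been captured into a string. The function names `parse_summary` and `cluster_ready` are hypothetical and not part of any Cisco tool. Because the key `state` appears twice in the output (once for the cluster, once under resiliencyInfo), values are collected into lists.

```python
from collections import defaultdict


def parse_summary(text):
    """Collect 'key: value' pairs from storage-summary output.

    A key can repeat (for example 'state'), so each key maps to a list.
    Lines with no value after the colon (section headers) are skipped.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()].append(value.strip())
    return fields


def cluster_ready(fields):
    """True when the cluster is online, healthy, and all nodes are active."""
    states = fields.get("state", [])
    active = fields.get("activeNodes", [""])[0]  # e.g. "5 of 5"
    up, _, total = active.partition(" of ")
    return "online" in states and "healthy" in states and up != "" and up == total


# Abbreviated sample taken from the output shown above.
SAMPLE = """\
state: online
activeNodes: 5 of 5
resiliencyInfo:
state: healthy
"""
```

Calling `cluster_ready(parse_summary(SAMPLE))` on the abbreviated sample reports a cluster that is ready; changing `activeNodes` to `4 of 5` makes the check fail.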
Verifying If DRS Is Enabled
- Step 1
From the vSphere Web Client Navigator, select vCenter Inventory Lists > Clusters > cluster > Summary. Verify that DRS is Enabled.
- Step 2 Click the vSphere DRS tab.
Check if Migration Automation Level is set to Fully Automated.
Viewing ESX Agent Manager
From the vSphere Web Client Navigator, select Administration > vCenter Server Extensions > vSphere ESX Agent Manager > Summary.
- Verify that vSphere services are running and ESX Agent Manager (EAM) health is normal.
Verify Health of HyperFlex Cluster In Cisco UCS Manager
Configuring vMotion Interfaces (Optional: Jumbo Frames, MTU 9000)
Configure Lenient Mode
Cluster access policy is set by default to lenient mode.
Step 1 SSH to any one of the controller VMs and log in as root.
Step 2 Check whether lenient mode is already configured.
# stcli cluster get-cluster-access-policy
Step 3 If the policy is set to strict, change it to lenient. If it is already set to lenient, no further action is required.
# stcli cluster set-cluster-access-policy --name lenient
Step 4 Confirm the change.
# stcli cluster info | grep -i policy
The following example checks the current cluster access policy, finds it set to strict, changes it to lenient, and confirms the change.
# stcli cluster get-cluster-access-policy
strict
# stcli cluster set-cluster-access-policy --name lenient
# stcli cluster info | grep -i policy
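The decision in Step 3 can be expressed as a small helper. This is only a sketch, assuming the output of `stcli cluster get-cluster-access-policy` is a single word such as `strict` or `lenient`, as in the example above. The function name `policy_command` is hypothetical.

```python
def policy_command(current_policy):
    """Return the stcli command to run, or None if no change is needed."""
    if current_policy.strip().lower() == "lenient":
        return None  # already lenient, nothing to do
    return "stcli cluster set-cluster-access-policy --name lenient"
```

The helper only builds the command string; running it on a controller VM is left to the operator.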
vCenter
- Requirements
The nested vCenter method requires:
HX Data Platform Installer version 2.6(1a) or later. Prior HX Data Platform versions are not supported.
vCenter to be installed inside a VM.
Compute-only nodes may be added post installation only after the HX storage cluster is registered to a vCenter server.
Cluster expansion with additional HyperFlex nodes may be performed only after the HX storage cluster is registered to a vCenter server.
When installing vCenter, select the embedded Platform Services Controller option. An external Platform Services Controller is not supported.
Upgrade FI-UCSM
Required Order of Steps for Auto Install
If you want to upgrade all components in a Cisco UCS domain to the same package version, you must run the stages of Auto Install in the following order:
- Install Infrastructure Firmware
- Install Server Firmware
This order enables you to schedule the server firmware upgrades during a different maintenance window than the infrastructure firmware upgrade.
Creating an All Configuration Backup File
This procedure assumes that you do not have an existing backup operation for an All Configuration backup file.
- Step 1 UCS-A# scope system
Enters system mode.
- Step 2 UCS-A /system # create backup URL all-configuration enabled
Specify the URL for the backup file using one of the following forms:
ftp://username@hostname/path
scp://username@hostname/path
sftp://username@hostname/path
tftp://hostname:port-num/path
- Step 3 UCS-A /system # commit-buffer
Commits the transaction.
Disabling Smart Call Home
BGY-NEW-HX-B /system # scope monitoring
BGY-NEW-HX-B /monitoring # scope callhome
BGY-NEW-HX-B /monitoring/callhome # disable
BGY-NEW-HX-B /monitoring/callhome #
BGY-NEW-HX-B /monitoring/callhome # commit-buffer
BGY-NEW-HX-B /monitoring/callhome # show detail
Callhome:
Admin State: Off
Throttling State: On
Contact Information:
Customer Contact Email:
From Email:
Reply To Email:
Phone Contact e.g., +1-011-408-555-1212:
Street Address:
Contract Id:
Customer Id:
Site Id:
Switch Priority: Debugging
SMTP Server Address:
SMTP Server Port: 25
Current Task:
Verifying the Operability of a Fabric Interconnect
BGY-NEW-HX-B# scope fabric-interconnect a
BGY-NEW-HX-B /fabric-interconnect # show
Fabric Interconnect:
ID OOB IP Addr OOB Gateway OOB Netmask OOB IPv6 Address OOB IPv6 Gateway Prefix Operability
---- --------------- --------------- --------------- ---------------- ---------------- ------ -----------
A 10.254.253.77 10.254.253.254 255.255.255.0 :: :: 64 Operable
BGY-NEW-HX-B /fabric-interconnect # exit
BGY-NEW-HX-B# scope fabric-interconnect b
BGY-NEW-HX-B /fabric-interconnect # show
Fabric Interconnect:
ID OOB IP Addr OOB Gateway OOB Netmask OOB IPv6 Address OOB IPv6 Gateway Prefix Operability
---- --------------- --------------- --------------- ---------------- ---------------- ------ -----------
B 10.254.253.78 10.254.253.254 255.255.255.0 :: :: 64 Operable
Verifying the High Availability Status and Roles of a Cluster Configuration
BGY-NEW-HX-B# show cluster state
Cluster Id: 0xaded5a7ccf6611e7-0xb59400defb68c3a1
B: UP, PRIMARY
A: UP, SUBORDINATE
HA READY
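The roles and HA status can also be pulled out of the `show cluster state` output programmatically. The following is a rough sketch, assuming the output text has been captured into a string. The function name `parse_cluster_state` is hypothetical.

```python
import re


def parse_cluster_state(text):
    """Return ({fabric: role}, ha_ready) from 'show cluster state' output."""
    roles = dict(re.findall(r"\b([AB]): UP, (PRIMARY|SUBORDINATE)", text))
    # "HA NOT READY" does not contain the substring "HA READY".
    ha_ready = "HA READY" in text
    return roles, ha_ready
```

For the output above this yields fabric B as PRIMARY, fabric A as SUBORDINATE, and an HA-ready cluster. A fabric interconnect that is not reported as UP is simply missing from the returned roles.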
Configuring the Default Maintenance Policy
BGY-NEW-HX-B# scope org /
BGY-NEW-HX-B /org # scope maint-policy default
BGY-NEW-HX-B /org/maint-policy # set reboot-policy user-ack
BGY-NEW-HX-B /org/maint-policy* # commit-buffer
BGY-NEW-HX-B /org/maint-policy # show
Maintenance Policy:
Name Schedule
---------- --------
default
BGY-NEW-HX-B /org/maint-policy # show detail
Maintenance Policy:
Name: default
Schedule:
Description:
Policy Owner: Local
Reboot Policy: User Ack
Config Trigger: None
Soft Shutdown Timer: 150 Secs
BGY-NEW-HX-B /org/maint-policy #
Disabling the Management Interface
BGY-NEW-HX-B# scope monitoring
BGY-NEW-HX-B /monitoring # set mgmt-if-mon-policy admin-state enabled
BGY-NEW-HX-B /monitoring # commit-buffer
BGY-NEW-HX-B /monitoring #
Verifying the Status of an I/O Module
Not applicable. This environment has no I/O modules.
Verifying the Status of a Server
A server in a good state looks like the following:
BGY-NEW-HX-B# scope server 1
BGY-NEW-HX-B /server # show status
Server Slot Status Availability Overall Status Discovery
------- ------------------------- ------------ --------------------- ---------
1 N/A Available Unassociated Complete
BGY-NEW-HX-B /server # show status detail
Server 1:
Conn Path: A,B
Conn Status: A,B
Managing Instance: B
Availability: Available
Admin State: In Service
Overall Status: Unassociated
Oper Qualifier: N/A
Discovery: Complete
Current Task:
Check Point: Discovered
BGY-NEW-HX-B /server # show adapter status
Server 1:
Overall Status
--------------
N/A
A server that failed discovery looks like the following:
BGY-NEW-HX-B# scope server 2
BGY-NEW-HX-B /server # show sta
stats status
BGY-NEW-HX-B /server # show status
Server Slot Status Availability Overall Status Discovery
------- ------------------------- ------------ --------------------- ---------
2 N/A Unavailable Discovery In Progress
BGY-NEW-HX-B /server # show status detail
Server 2:
Conn Path: A
Conn Status: A
Managing Instance: A
Availability: Unavailable
Admin State: In Service
Overall Status: Restart
Oper Qualifier: N/A
Discovery: In Progress
Current Task: Preparing to check hardware configuration server sys/rack-unit-2(FSM-STAGE:sam:dme:ComputePhysicalHardreset:PreSanitize)
Check Point: Shallow Checkpoint
Verifying the Ethernet Data Path
BGY-NEW-HX-B# connect nxos b
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2016, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php
BGY-NEW-HX-B(nxos)#
BGY-NEW-HX-B(nxos)# show interface brief | grep -v down
--------------------------------------------------------------------------------
Ethernet VLAN Type Mode Status Reason Speed Port
Interface Ch #
--------------------------------------------------------------------------------
Eth1/1 1 eth vntag up none 10G(D) --
Eth1/2 1 eth vntag up none 10G(D) --
Eth1/3 1 eth vntag up none 10G(D) --
Eth1/4 1 eth vntag up none 10G(D) --
Eth1/5 1 eth vntag up none 10G(D) --
Eth1/6 1 eth vntag up none 10G(D) --
Eth1/7 1 eth vntag up none 10G(D) --
Eth1/8 1 eth vntag up none 10G(D) --
--------------------------------------------------------------------------------
Port VRF Status IP Address Speed MTU
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Vethernet VLAN Type Mode Status Reason Speed
--------------------------------------------------------------------------------
Veth32769 1 virt trunk up none auto
Veth32770 1 virt trunk up none auto
Veth32771 1 virt trunk up none auto
Veth32772 1 virt trunk up none auto
Veth32773 1 virt trunk up none auto
Veth32774 1 virt trunk up none auto
Veth32775 1 virt trunk up none auto
Veth32776 1 virt trunk up none auto
-------------------------------------------------------------------------------
Interface Secondary VLAN(Type) Status Reason
-------------------------------------------------------------------------------
BGY-NEW-HX-B(nxos)# show interface brief | grep -v down | wc -l
33
BGY-NEW-HX-B(nxos)# show platform fwm info hw-stm
HW STM Contents
dleft loc - bucket_type:line:bucket_number
misc - learn_type:ecc:valid:fcf
cdce format - ig:ul:switch_id:subswitch_id:end_node_id:pbp_idx:local_id
VLAN MAC Address Port loc misc cdce
------+----------------+--------------+--------+-------+--------------------
1.4043 380e.4d16.94e7 Eth1/4 1:51:0 0:0:1:0 2.a.bc.0.0.6 (e:0)
1.4044 380e.4d16.bd45 vif56 1:190:0 0:0:1:0 6.a.bc.0.0.8 (e:1)
1.4044 380e.4d16.9505 vif52 1:815:0 0:0:1:0 6.a.bc.0.0.6 (e:1)
vsan1 0efc.00ff.fc4d sup-eth1 1:1221:0 0:0:1:1 2.0.0.0.0.3 (e:0)
1.4043 3890.a5bd.3a31 Eth1/6 1:1421:0 0:0:1:0 2.a.bc.0.0.b (e:0)
1.4044 3890.a5bd.3a4f vif55 1:1529:0 0:0:1:0 6.a.bc.0.0.b (e:1)
1.4043 380e.4d16.8f21 Eth1/2 1:1532:0 0:0:1:0 2.a.bc.0.0.7 (e:0)
1.4044 500f.809f.c294 vif51 1:1597:0 0:0:1:0 6.a.bc.0.0.9 (e:1)
1.4043 380e.4d16.bd27 Eth1/3 1:1931:0 0:0:1:0 2.a.bc.0.0.8 (e:0)
1.4043 500f.809f.c036 Eth1/7 1:2624:0 0:0:1:0 2.a.bc.0.0.a (e:0)
1.4044 380e.4d16.8f3f vif69 1:2659:0 0:0:1:0 6.a.bc.0.0.7 (e:1)
1.4043 380e.4d16.9547 Eth1/1 1:3266:0 0:0:1:0 2.a.bc.0.0.5 (e:0)
1.4044 380e.4d16.9565 vif68 1:3268:0 0:0:1:0 6.a.bc.0.0.5 (e:1)
1.4044 500f.809f.c054 vif53 1:3445:0 0:0:1:0 6.a.bc.0.0.a (e:1)
1.4043 500f.809f.be56 Eth1/8 1:3690:0 0:0:1:0 2.a.bc.0.0.c (e:0)
1.4044 500f.809f.be74 vif54 1:3692:0 0:0:1:0 6.a.bc.0.0.c (e:1)
1.4043 500f.809f.c276 Eth1/5 1:3950:0 0:0:1:0 2.a.bc.0.0.9 (e:0)
14.1 0efc.00ff.fc4d sup-eth1 1:3951:0 0:0:1:0 2.a.bc.0.0.3 (e:0)
in-0 00de.fb68.c400 convert-err 3:0:1016 0:0:1:0 2.a.bc.0.0.0 (e:0)
14.407 0efc.0068.c400 convert-err 3:0:1017 0:0:1:0 2.a.bc.0.0.0 (e:0)
in-0 ffff.ffff.ffff midx 8189 3:0:1018 0:0:1:0 1.0.0.0.1f.fd (e:0)
in-0 0100.5e00.0000 midx 8189 3:0:1019 0:0:1:0 1.0.0.0.1f.fd (e:0)
in-0 0100.5e00.0000 midx 8191 3:0:1020 0:0:1:0 1.0.0.0.1f.ff (e:0)
in-0 0100.0000.0000 midx 8190 3:0:1021 0:0:1:0 1.0.0.0.1f.fe (e:0)
DA hit 36569 miss 50651
SA ig_miss 27085 eg_miss 371035
Total addresses: 24 (unreserved entries -104)
BGY-NEW-HX-B(nxos)# show platform fwm info hw-stm | grep '1.' | wc -l
26
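The line counts above give a baseline of the interfaces that are up. One possible use, which is an assumption here and not something the notes state, is to save the `show interface brief` output before and after the upgrade and compare the two. The sketch below lists interfaces that were up before but are missing afterwards; the function names are hypothetical.

```python
import re


def up_interfaces(show_interface_brief):
    """Names of Eth/Veth interfaces whose row has the status 'up'."""
    names = set()
    for line in show_interface_brief.splitlines():
        cols = line.split()
        # Header rows such as "Ethernet VLAN ..." do not match Eth<digit>.
        if cols and re.match(r"(Eth|Veth)\d", cols[0]) and "up" in cols:
            names.add(cols[0])
    return names


def missing_after_upgrade(before, after):
    """Interfaces that were up in 'before' but are not up in 'after'."""
    return sorted(up_interfaces(before) - up_interfaces(after))
```

An empty result means every interface that was up before the upgrade is up again.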
Verifying the Data Path for Fibre Channel End-Host Mode
BGY-NEW-HX-B(nxos)# show npv flogi-table
No flogi sessions found.
Downloading Firmware Images to the Fabric Interconnect from a Remote Location
UCS-A# scope firmware
UCS-A /firmware # download image scp://user1@192.168.10.10/images/ucs-k9-bundle.1.0.0.988.bin
OR
UCS-A /firmware # download image usbB:/username/ucs-k9-bundle-b-series.3.0.1a.B.bin
UCS-A /firmware # show download-task
Displaying the Firmware Package Download Status
BGY-NEW-HX-B# scope firmware
BGY-NEW-HX-B /firmware # show download-task
Download task:
File Name Protocol Server Userid State
--------- -------- ------------------------------------- --------------- -----
ucs-catalog.3.2.2b.T.bin
Local local Downloaded
ucs-k9-bundle-b-series.3.2.2b.B.bin
Local local Downloaded
ucs-k9-bundle-c-series.3.2.2b.C.bin
Local local Downloaded
ucs-k9-bundle-infra.3.2.2b.A.bin
Local local Downloaded
BGY-NEW-HX-B /firmware # show image
Displaying All Available Packages on the Fabric Interconnect
BGY-NEW-HX-B /firmware # show package
Name Version
--------------------------------------------- -------
ucs-catalog.3.2.2b.T.bin 3.2(2b)T
ucs-k9-bundle-b-series.3.1.2e.B.bin 3.1(2e)B
ucs-k9-bundle-b-series.3.2.2b.B.bin 3.2(2b)B
ucs-k9-bundle-c-series.3.1.2e.C.bin 3.1(2e)C
ucs-k9-bundle-c-series.3.2.2b.C.bin 3.2(2b)C
ucs-k9-bundle-infra.3.1.2e.A.bin 3.1(2e)A
ucs-k9-bundle-infra.3.2.2b.A.bin 3.2(2b)A
BGY-NEW-HX-B /firmware # show package ucs-k9-bundle-c-series.3.2.2b.C.bin expand
Package ucs-k9-bundle-c-series.3.2.2b.C.bin:
Images:
ucs-3260.3.0.3e.bin
ucs-adaptor-pcie-ucsc-pcie-x710ta4.800031CA-1.810.8.bin
ucs-c-amd-video-7150x2.015.049.000.016.007518_113-8747CA-102.bin
ucs-c-emulex-pci-lpe31002.11.2.156.27.bin
ucs-c-fusion-io-pfio1000mp.8.9.9.118194.bin
ucs-c-fusion-io-pfio1000mps.8.9.9.118194.bin
ucs-c-fusion-io-pfio1205m.7.1.17.bin
Checking the Available Space on a Fabric Interconnect
BGY-NEW-HX-B /fabric-interconnect # show storage
Storage on local flash drive of fabric interconnect:
Partition Size (MBytes) Used Percentage
---------------- ---------------- ---------------
bootflash 16329 37
opt 3877 2
spare 5744 5
usbdrive Nothing Empty
var_sysmgr 2000 4
var_tmp 600 1
volatile 240 Empty
workspace 3852 1
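The `show storage` table can be checked against a usage limit with a few lines of code. This is a minimal sketch, assuming the table text is available as a string. The function name `partitions_over` and the limit passed to it are hypothetical; choose the limit according to the guidance for your release.

```python
def partitions_over(show_storage, limit_percent):
    """Return {partition: used_percent} for partitions above the limit.

    Rows whose size or usage is not numeric ("Nothing", "Empty") are skipped.
    """
    over = {}
    for line in show_storage.splitlines():
        cols = line.split()
        if len(cols) == 3 and cols[1].isdigit() and cols[2].isdigit():
            used = int(cols[2])
            if used > limit_percent:
                over[cols[0]] = used
    return over
```

With the table above and an illustrative limit of 30 percent, only bootflash (37 percent used) is reported.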
Upgrade the Infrastructure Firmware with Auto Install
If your Cisco UCS domain does not use an NTP server to set the time, make sure that the clocks on the primary and secondary fabric interconnects are in sync. You can do this by configuring an NTP server in Cisco UCS Manager or by syncing the time manually.
BGY-NEW-HX-A# show clock
Thu Nov 23 09:38:34 UTC 2017
BGY-NEW-HX-A#
BGY-NEW-HX-B# show clock
Thu Nov 23 09:38:32 UTC 2017
BGY-NEW-HX-B#
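To judge whether the two clocks are in sync, the two `show clock` results can be compared. The sketch below assumes both fabric interconnects report UTC in the format shown above; the function name `clock_skew_seconds` is hypothetical. Note that the two commands are typed at slightly different moments, so part of any difference is just operator delay.

```python
from datetime import datetime

# Format of "show clock" as seen above, e.g. "Thu Nov 23 09:38:34 UTC 2017".
CLOCK_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def clock_skew_seconds(clock_a, clock_b):
    """Absolute difference in seconds between two 'show clock' strings."""
    a = datetime.strptime(clock_a.strip(), CLOCK_FORMAT)
    b = datetime.strptime(clock_b.strip(), CLOCK_FORMAT)
    return abs((a - b).total_seconds())
```

For the two readings above, the difference is two seconds.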
BGY-NEW-HX-B /firmware # show package
Name Version
--------------------------------------------- -------
ucs-catalog.3.2.2b.T.bin 3.2(2b)T
ucs-k9-bundle-b-series.3.1.2e.B.bin 3.1(2e)B
ucs-k9-bundle-b-series.3.2.2b.B.bin 3.2(2b)B
ucs-k9-bundle-c-series.3.1.2e.C.bin 3.1(2e)C
ucs-k9-bundle-c-series.3.2.2b.C.bin 3.2(2b)C
ucs-k9-bundle-infra.3.1.2e.A.bin 3.1(2e)A
ucs-k9-bundle-infra.3.2.2b.A.bin 3.2(2b)A
BGY-NEW-HX-B /firmware #
BGY-NEW-HX-B /firmware # scope firmware
BGY-NEW-HX-B /firmware # scope auto-install
BGY-NEW-HX-B /firmware/auto-install # install infra infra-vers 3.2(2b)A
This operation upgrades firmware on UCS Infrastructure Components
(UCS manager, Fabric Interconnects and IOMs).
Here is the checklist of things that are recommended before starting Auto-Install
(1) Review current critical/major faults
(2) Initiate a configuration backup
(3) Check if Management Interface Monitoring Policy is enabled
(4) Check if there is a pending Fabric Interconnect Reboot activitiy
(5) Ensure NTP is configured
(6) Check if any hardware (fabric interconnects, io-modules, servers or adapters) is unsupported in the target release
Do you want to proceed? (yes/no):
Triggering Install-Infra with:
Infrastructure Pack Version: 3.2(2b)A
Acknowledging the Reboot of the Primary Fabric Interconnect
Caution
To upgrade with minimal disruption, you must confirm the following:
Ensure that all the IOMs that are attached to the Fabric Interconnect are up before you acknowledge the reboot of the Fabric Interconnect. If all IOMs are not up, all the servers connected to the Fabric Interconnect will immediately be re-discovered and cause a major disruption.
Ensure that both of the Fabric Interconnects and the service profiles are configured for failover.
Verify that the data path has been successfully restored from the secondary Fabric Interconnect before you acknowledge the reboot of the primary Fabric Interconnect. For more information, see Verification that the Data Path is Ready.
After you upgrade the infrastructure firmware, Install Infrastructure Firmware automatically reboots the secondary fabric interconnect in a cluster configuration. However, you must acknowledge the reboot of the primary fabric interconnect. If you do not acknowledge the reboot, Install Infrastructure Firmware waits indefinitely for that acknowledgment rather than completing the upgrade.
BGY-NEW-HX-B# show cluster state
Cluster Id: 0xaded5a7ccf6611e7-0xb59400defb68c3a1
B: UP, PRIMARY, (Management services: INIT IN PROGRESS)
A: UP, SUBORDINATE
HA NOT READY
Management services: initialization in progress on local Fabric Interconnect
BGY-NEW-HX-A# scope firmware
BGY-NEW-HX-A /firmware # scope auto-install
BGY-NEW-HX-A /firmware/auto-install # acknowledge primary fabric-interconnect reboot
BGY-NEW-HX-A /firmware/auto-install* # commit-buffer
BGY-NEW-HX-A /firmware/auto-install # show cluster state
Cluster Id: 0xaded5a7ccf6611e7-0xb59400defb68c3a1
A: UP, SUBORDINATE
B: UP, PRIMARY
HA READY
BGY-NEW-HX-A /firmware/auto-install #
Viewing the Status of the FSM During An Infrastructure Firmware Upgrade
BGY-NEW-HX-A# scope firmware
BGY-NEW-HX-A /firmware # scope auto-install
BGY-NEW-HX-A /firmware/auto-install # show fsm status expand
FSM Status:
Affected Object: sys/fw-system/fsm
Current FSM: Deploy
Status: In Progress
Completion Time:
Progress (%): 90
FSM Stage:
Order Stage Name Status Try
------ ---------------------------------------- ------------ ---
1 DeployWaitForDeploy Success 0
2 DeployResolveDistributableNames Skip 0
3 DeployResolveDistributable Skip 0
4 DeployResolveImages Skip 0
5 DeployDownloadImages Skip 0
6 DeployCopyAllImagesToPeer Skip 0
7 DeployInternalBackup Skip 0
8 DeployPollInternalBackup Success 0
9 DeployActivateUCSM Skip 0
10 DeployPollActivateOfUCSM Success 0
11 DeployUpdateIOM Skip 0
12 DeployPollUpdateOfIOM Skip 0
13 DeployActivateIOM Skip 0
14 DeployPollActivateOfIOM Skip 0
15 DeployFabEvacOnRemoteFI Skip 0
16 DeployPollFabEvacOnRemoteFI Skip 0
17 DeployActivateRemoteFI Success 0
18 DeployPollActivateOfRemoteFI In Progress 1
19 DeployFabEvacOffRemoteFI Pending 0
20 DeployPollFabEvacOffRemoteFI Pending 0
21 DeployWaitForUserAck Pending 0
22 DeployPollWaitForUserAck Pending 0
23 DeployFailOverToRemoteFI Pending 0
24 DeployPollFailOverToRemoteFI Pending 0
25 DeployActivateLocalFI Pending 0
26 DeployPollActivateOfLocalFI Pending 0
27 DeployActivateUCSMServicePack Pending 0
28 DeployPollActivateOfUCSMServicePack Pending 0
BGY-NEW-HX-B /firmware/auto-install # show fsm status expand
FSM Status:
Affected Object: sys/fw-system/fsm
Current FSM: Deploy
Status: In Progress
Completion Time:
Progress (%): 90
FSM Stage:
Order Stage Name Status Try
------ ---------------------------------------- ------------ ---
1 DeployWaitForDeploy Success 0
2 DeployResolveDistributableNames Skip 0
3 DeployResolveDistributable Skip 0
4 DeployResolveImages Skip 0
5 DeployDownloadImages Skip 0
6 DeployCopyAllImagesToPeer Skip 0
7 DeployInternalBackup Skip 0
8 DeployPollInternalBackup Success 0
9 DeployActivateUCSM Skip 0
10 DeployPollActivateOfUCSM Success 0
11 DeployUpdateIOM Skip 0
12 DeployPollUpdateOfIOM Skip 0
13 DeployActivateIOM Skip 0
14 DeployPollActivateOfIOM Skip 0
15 DeployFabEvacOnRemoteFI Skip 0
16 DeployPollFabEvacOnRemoteFI Skip 0
17 DeployActivateRemoteFI Success 0
18 DeployPollActivateOfRemoteFI In Progress 4
19 DeployFabEvacOffRemoteFI Pending 0
20 DeployPollFabEvacOffRemoteFI Pending 0
21 DeployWaitForUserAck Pending 0
22 DeployPollWaitForUserAck Pending 0
23 DeployFailOverToRemoteFI Pending 0
24 DeployPollFailOverToRemoteFI Pending 0
25 DeployActivateLocalFI Pending 0
26 DeployPollActivateOfLocalFI Pending 0
27 DeployActivateUCSMServicePack Pending 0
28 DeployPollActivateOfUCSMServicePack Pending 0
BGY-NEW-HX-B /firmware/auto-install #
BGY-NEW-HX-B /firmware/auto-install # show fsm status expand detail | grep Prog
Status: In Progress
Progress (%): 98
Status: In Progress
BGY-NEW-HX-A /firmware/auto-install # show fsm status
FSM 1:
Remote Result: Not Applicable
Remote Error Code: None
Remote Error Description:
Status: Nop
Previous Status: Deploy Success
Timestamp: 2017-11-23T10:21:32.159
Try: 0
Progress (%): 100
Current Task:
Flags: 0
BGY-NEW-HX-A /firmware/auto-install # show cluster state
Cluster Id: 0xaded5a7ccf6611e7-0xb59400defb68c3a1
A: UP, PRIMARY
B: UP, SUBORDINATE
HA READY
BGY-NEW-HX-A /firmware/auto-install #
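When polling `show fsm status expand` repeatedly, it helps to extract just the stage that is currently running. The following is a sketch that assumes the stage table has the four columns shown above (Order, Stage Name, Status, Try). The function name `current_stage` is hypothetical.

```python
import re

# One stage row, e.g. "18 DeployPollActivateOfRemoteFI In Progress 1".
STAGE_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(.+?)\s+(\d+)\s*$")


def current_stage(fsm_output):
    """Return (order, stage_name, tries) of the first stage in progress."""
    for line in fsm_output.splitlines():
        match = STAGE_ROW.match(line)
        if match and match.group(3) == "In Progress":
            return int(match.group(1)), match.group(2), int(match.group(4))
    return None  # nothing in progress (for example, the FSM has finished)
```

For the first capture above this returns stage 18, DeployPollActivateOfRemoteFI, with one try; once the FSM has completed, it returns None.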
Registering a Storage Cluster with a New vCenter Cluster
Moving the Storage Cluster from a Current vCenter Server to a New vCenter Server
Before You Begin
If your HX cluster is running an HX Data Platform version older than 1.8(1c), upgrade before attempting to reregister to a new vCenter.
Perform this task during a maintenance window.
Ensure that the cluster is healthy and that the upgrade state is OK. You can view the state using the stcli command from the controller VM command line.
# stcli cluster info
Check response for:
upgradeState: ok
healthState: healthy
Ensure that vCenter is up and running.
Snapshot schedules are not moved with the storage cluster when you move the storage cluster between vCenter clusters.
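The two fields to check in the `stcli cluster info` response can be tested with a short helper. This is only a sketch, assuming the response text is captured into a string and that the fields appear as `upgradeState: ok` and `healthState: healthy`, as listed above. The function name `reregister_preconditions_met` is hypothetical.

```python
import re


def reregister_preconditions_met(cluster_info):
    """True when upgradeState is 'ok' and healthState is 'healthy'."""

    def value(key):
        match = re.search(rf"^\s*{key}:\s*(\S+)", cluster_info, re.MULTILINE)
        return match.group(1) if match else None

    return value("upgradeState") == "ok" and value("healthState") == "healthy"
```

A missing field counts as a failed check, so truncated output does not pass by accident.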
Step 1 From the current vCenter, delete the cluster.
- This is the vCenter cluster specified when the HX storage cluster was created.
Step 2 On the new vCenter, create a new cluster using the same cluster name.
Step 3 Add ESX hosts to new vCenter in the newly created cluster.
Unregistering a Storage Cluster from a vCenter Cluster
This step is optional. It is recommended to leave the HX Data Platform Plug-in registration in place in the old vCenter.
Before You Begin
Download the vSphere ESX Agent Manager SDK, if you have not already done so.
If multiple HX clusters are registered to the same vCenter, do not attempt this procedure until all HX clusters have been fully migrated to a different vCenter. Running this procedure is disruptive to any existing HX clusters registered to the vCenter.
Remove the datacenter from your vSphere cluster.
Step 1 Identify the HX cluster UUID.
# stcli cluster info | grep vCenterClusterId:
vCenterClusterId: domain-c73
Step 2 To unregister the storage cluster extension, log in to the vCenter server MOB extension manager.
First unregister the HyperFlex cluster.
In a browser, enter the path and command.
https://vcenter_server/mob/?moid=ExtensionManager
vcenter_server is the IP address of the vCenter where the storage cluster is currently registered.
Enter administrator login credentials.
Step 3 Locate the HX storage cluster extensions with the cluster IDs. Scroll through the Properties > extensionList to locate the storage cluster extensions:
com.springpath.sysmgmt.cluster_domain_id and com.springpath.sysmgmt.uuid.cluster_domain_id.
Step 4 Unregister each storage cluster extension.
From the Methods table click UnregisterExtension.
In the UnregisterExtension popup, enter an extension key value, com.springpath.sysmgmt.cluster_domain_id.
For example: com.springpath.sysmgmt.domain-26
Click Invoke Method.
Step 5 Restart the vSphere Client services.
service vsphere-client restart
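The extension keys to unregister follow directly from the cluster domain ID found in Step 1. The sketch below builds both keys using the two patterns listed in Step 3; the function names are hypothetical, and the input is assumed to be the single grep output line from Step 1.

```python
def cluster_domain_id(grep_output):
    """Extract 'domain-c73' from 'vCenterClusterId: domain-c73'."""
    _, _, value = grep_output.strip().partition(":")
    return value.strip()


def extension_keys(domain_id):
    """Both extension keys registered for one HX storage cluster."""
    return [
        f"com.springpath.sysmgmt.{domain_id}",
        f"com.springpath.sysmgmt.uuid.{domain_id}",
    ]
```

Each returned key is entered in the UnregisterExtension popup in turn.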
Registering the First HX Cluster
stcli cluster reregister --vcenter-datacenter BGY-NEW-HX-DC --vcenter-cluster BGY-NEW-HX --vcenter-url 10.254.250.212 --vcenter-user administrator@vsphere.local
Registering a Second HX Cluster
Log in to a controller VM and check the cluster status.
stcli cluster info | grep -i health
stcli cluster reregister --vcenter-datacenter BGY-NEW-HX-DC02 --vcenter-cluster BGY-NEW-HX02 --vcenter-url 10.254.250.212 --vcenter-user administrator@vsphere.local
Restart the vSphere Client services.
service vsphere-client restart
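When several clusters are reregistered, building the command from its four values avoids copy-and-paste mistakes. The following is a sketch using only the flags that appear in the commands above; the function name `reregister_command` is hypothetical, and `shlex.join` requires Python 3.8 or later.

```python
import shlex


def reregister_command(datacenter, cluster, url, user):
    """Build the stcli reregister command line with safe shell quoting."""
    return shlex.join([
        "stcli", "cluster", "reregister",
        "--vcenter-datacenter", datacenter,
        "--vcenter-cluster", cluster,
        "--vcenter-url", url,
        "--vcenter-user", user,
    ])
```

Values containing spaces or shell metacharacters are quoted automatically; the values used in these notes need no quoting and come out unchanged.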
Troubleshooting
Disk status
stcli disk list -ip x.x.x.x
stcli node list | grep -i -n10 blacklisted
Login exception
The answers to your questions are as follows:
1. Explanation of the workaround steps:
  1. Verify that files are populated in /var/lib/tomcat7/webapps/.
  2. Run ' echo manual > /etc/init/ureadahead.override ' (writes "manual" to /etc/init/ureadahead.override).
  3. Run ' echo manual > /etc/init/ureadahead-other.override ' (writes "manual" to /etc/init/ureadahead-other.override).
  4. Run ' mount | grep lib ' to check the mounted file systems. If only /var/old-lib is mounted, run steps 5 through 7.
  5. Run ' python /usr/share/springpath/storfs-misc/relinquish_node.py ' (checks the zk service through a Python script).
  6. Run ' reboot ' (reboots the storage controller VM, SCVM).
  7. Wait for the cluster to become healthy.
  8. Repeat steps 1 through 7 on the next controller, until every SCVM is done.
2. The workaround does not change the version.
3. There is no need to roll back; we have resolved more than 200 cases of the same issue.
4. Same answer as item 1.
5. Yes, there is another way to solve this issue: upgrade the firmware to 3.0(1C). However, that has a big impact on the customer's production environment, so I do not suggest it.
Because this problem is caused by a Tomcat file mount issue, I think mounting the files for Tomcat is safe for HyperFlex cluster health, HyperFlex node health, and the customer's production workloads.
Old Firmware Running
FlexFlash Controller 1 on server 8 is unhealthy. Reason: Status: FFCH_ERROR_OLD_FIRMWARE_RUNNING
Action: Under Inventory, decommission and then recommission; this resolves the error.
Management cluster
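Shut down the storage cluster
- First confirm that all VM workloads have been shut down properly.
- The cluster must be in a healthy state.
- After shutting down the storage cluster, manually put the ESXi hosts into HX maintenance mode one at a time.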
stcli cluster shutdown
Step 1 To shutdown the HX storage cluster, perform the following two steps.
Step 2 Gracefully shutdown all workload VMs on all the HX datastores.
Alternatively, use vMotion to migrate the workload VMs to another cluster.
Note
Do not shut down or move the storage controller VMs (stCtlVMs).
Step 3 Gracefully shut down the HX storage cluster.
From any controller VM command line, run the command and wait for the shell prompt to return.
# stcli cluster shutdown
Run the cluster information command. Confirm the storage cluster is offline.
# stcli cluster info
In the command response text, check the cluster subsection and verify the healthstate is offline.
This HX cluster shutdown procedure does not shut down the ESXi hosts.
If the maintenance or upgrade task does not require the physical components to be powered off, exit these steps and proceed to What to do next.
Step 4 To power off the HX storage cluster, complete Step 2 and Step 3, then complete the rest of the following steps.
Step 5 On each storage cluster ESX host, shut down the controller VM (stCtlVM).
Choose a method:
Using vCenter VM Power Off
From vCenter client, locate the controller VM on each ESX host.
Right-click the controller VM and select Power > Power Off.
This method performs a graceful guest VM shutdown.
Using vCenter ESX Agent Manager
From vCenter client, open the ESX Agent Manager console.
Locate the controller VM on each ESX host, and select Power > Power Off.
This method performs a graceful shutdown of agent VMs. The controller VM is an agent VM.
Using vCenter ESX Maintenance Mode
From vCenter client, locate each ESX host.
Right-click the ESX host and select Maintenance Mode > Enter Maintenance Mode.
This method performs a hard shutdown on every VM in the ESX host, including the controller VM.
Step 6 Shut down each storage cluster ESX host.
From the vCenter client, locate the host.
Right-click the host and select Power > Shut Down.
Step 7 Power off the FIs, if this is needed for your maintenance task.
Cisco UCS FIs are designed for continuous operation. In a production environment, there is no need to shut down or reboot Fabric Interconnects. Therefore, there is no power button on UCS Fabric Interconnects.
To power off a Cisco UCS Fabric Interconnect, pull the power cable manually. Alternatively, if the FI power cables are connected to a smart PDU, use the provided remote control to turn off the power from the electrical outlet.
Verify all the storage cluster servers on the FI do not have a green power LED.
Power off the secondary FI.
Power off the primary FI.
The HX storage cluster is now safely powered off.
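The offline check in Step 3 can be scripted as a guard before any controller VM or host is powered off. The following is a minimal sketch: hx_healthstate and confirm_offline are hypothetical helper names, and the parser assumes that `stcli cluster info` prints a line of the form "healthstate: <value>" (spaces around the colon are tolerated).

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names, not stcli subcommands.

# Print the first healthstate value found on stdin.
# Assumed input line format: "healthstate: <value>" or "healthstate : <value>".
hx_healthstate() {
    sed -n 's/^[[:space:]]*healthstate[[:space:]]*:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Succeed only when the reported healthstate is exactly "offline".
confirm_offline() {
    [ "$(hx_healthstate)" = "offline" ]
}

# Example use on a controller VM (not executed here):
#   stcli cluster info | confirm_offline && echo "cluster is offline; safe to continue with Step 4"
```

Reading the command output from stdin keeps the parsing separate from the stcli call, so the guard can be tried against saved output first.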
Power On and Start Up the HX Storage Cluster
Complete the steps in Shut Down and Power Off the HX Storage Cluster.
- Connect all the ESX hosts to the FIs, then power on the ESXi hosts.
- Power on all the controller VMs (stCtlVM).
Verify the storage cluster is ready to be restarted. --> stcli about
Restart the storage cluster. --> stcli cluster start ; stcli cluster info | egrep -B 5 -A5 -i 'health'
- Through vCenter, verify that ESX remounted the datastores.
Once the cluster is available, the datastores are automatically mounted and available. If ESX does not recognize the datastores, run the following command from the ESX command line.
# esxcfg-nas -r
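After `stcli cluster start`, the cluster can take some time to report healthy, so the health check is often repeated. A minimal polling sketch follows. The function name wait_for_healthy, the attempt count, and the delay are all assumptions, and it matches a "healthstate: healthy" line as described above for `stcli cluster info`.

```shell
#!/bin/sh
# Sketch only: wait_for_healthy is a hypothetical helper name.
# Usage: wait_for_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times and succeeds as soon as its output
# contains a healthstate line whose value is "healthy".
wait_for_healthy() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" 2>/dev/null | grep -Eiq 'healthstate[[:space:]]*:[[:space:]]*healthy'; then
            return 0
        fi
        i=$((i + 1))
        # Sleep between attempts, but not after the last one.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example use on a controller VM (not executed here; 30 tries, 20 s apart are arbitrary values):
#   wait_for_healthy 30 20 stcli cluster info || echo "cluster did not become healthy"
```

Passing the command as arguments makes it possible to exercise the loop with a stub before pointing it at a real cluster.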
Confirm that upgrade is complete
Step 1 Log in to Cisco UCS Manager to ensure that the HX nodes have no pending server activities.
From the Server > Pending Activities tab, check for all server activities.
Step 2 Confirm that the HX nodes match the expected firmware version.
In Cisco UCS Manager, from the Equipment > Firmware Management > Installed Firmware tab, verify the firmware version.
Step 3 Log in to any controller VM through SSH.
# ssh root@controller_vm_ip
Step 4 Confirm the HyperFlex Data Platform version.
# stcli cluster version
Cluster version: 2.5(1c)
Node HX02 version: 2.5(1c)
Node HX05 version: 2.5(1c)
Node HX01 version: 2.5(1c)
Node HX03 version: 2.5(1c)
Node HX04 version: 2.5(1c)
Step 5 Verify that the HX storage cluster is online and healthy.
# stcli cluster info|grep -i health
Sample output:
healthstate : healthy
state: healthy
storage cluster is healthy
Step 6 Verify that the datastores are up and are mounted properly on the ESXi host.
From the HX controller VMs:
# stcli datastore list
From the ESXi host:
# esxcfg-nas -l
Step 7 For each browser interface you use, empty the cache and reload the browser page to refresh the HX Connect content.
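The version comparison in Step 4 can also be checked mechanically instead of by eye. This is a minimal sketch, assuming `stcli cluster version` prints one entry per line in the form shown in the sample output ("Cluster version: ..." and "Node <name> version: ..."). The function name versions_match is hypothetical.

```shell
#!/bin/sh
# Sketch only: versions_match is a hypothetical helper name.
# Reads `stcli cluster version` output on stdin, assuming one entry per line:
#   Cluster version: 2.5(1c)
#   Node HX02 version: 2.5(1c)
# Prints each node whose version differs from the cluster version and
# exits non-zero if any differ (exit 2 if no cluster version line is found).
versions_match() {
    awk '
        /^[[:space:]]*Cluster version:/ { cluster = $NF; next }
        /^[[:space:]]*Node .* version:/ { node[$2] = $NF }
        END {
            if (cluster == "") { print "no cluster version found"; exit 2 }
            bad = 0
            for (n in node)
                if (node[n] != cluster) {
                    print n " is at " node[n] ", expected " cluster
                    bad = 1
                }
            exit bad
        }
    '
}

# Example use on a controller VM (not executed here):
#   stcli cluster version | versions_match && echo "all nodes match the cluster version"
```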
