Contents

HyperFlex Pre-Upgrade Validation Checks
Registering a Storage Cluster with a New vCenter Cluster

HyperFlex Pre-Upgrade Validation Checks

Add vNICs

Add a MAC pool

LAN-->Pools-->root-->Sub-Organizations-->MAC Pools

Add vNIC templates

LAN-->Policies-->root-->Sub-Organizations-->vNIC Templates

(Optional) Add a LAN pin group

LAN-->LAN Cloud-->LAN Pin Groups

https://www.cisco.com/c/en/us/td/docs/hyperconverged_systems/HyperFlex_HX_DataPlatformSoftware/HyperFlex_upgrade_guide/2-0/b_HyperFlexSystems_Upgrade_Guide_2_0/b_HyperFlexSystems_Upgrade_Guide_1_8_c_chapter_010.html

https://www.cisco.com/c/en/us/td/docs/hyperconverged_systems/HyperFlex_HX_DataPlatformSoftware/HyperFlex_preinstall_checklist/b_HX_Data_Platform_Preinstall_Checklist.html

Cautions and Guidelines

Before you begin upgrade of a Cisco HyperFlex System, consider the following cautions, guidelines, and limitations.

Optimizations in Capacity Tier—Backend access is optimized to significantly reduce the magnitude and frequency of high latency spikes.

Important Upgrade Guidelines
This upgrade is recommended only for customers who have been identified as having this problem.

For hybrid clusters—The default upgrade process will not enable this optimization. Contact Cisco TAC to enable this performance enhancement during the upgrade process. Enabling this optimization will require a longer maintenance window.

For All Flash clusters—The upgrade times will not be significantly affected and the default upgrade path will enable this performance enhancement.

Upgrade VMware ESXi before starting the upgrade process.
Important:
If you have to upgrade from VMware ESXi version 5.5 U3, contact Cisco TAC for assistance.

The Cisco HX Data Platform and Cisco UCS firmware bundles must be compatible. Refer to the UCS Hardware and Software Compatibility Matrix for more details.

For a split upgrade, update the Cisco HX Data Platform first, before updating the Cisco UCS firmware, to avoid a "no connection" error message.

During online upgrade, as one node is being upgraded (put into maintenance mode), the number of tolerated node failures is reduced based on the Data Replication Factor and Access Policy settings.

All endpoints in a Cisco HyperFlex domain must be fully functional and all processes must be complete before you begin a firmware upgrade on those endpoints. For example, the firmware on a server that has not been discovered cannot be upgraded or downgraded. Each endpoint is a component in the Cisco HyperFlex domain that requires firmware to function.

In a three-node cluster, shutting down one node or putting it into maintenance mode makes the cluster unhealthy, but the cluster stays online. During the upgrade process, put one host at a time into maintenance mode, and move to the next host only after the cluster becomes healthy again.

Note
You cannot remove a node from a 3-node cluster with the stcli node remove operation. To replace a node in a 3-node cluster, contact Cisco TAC for assistance with the node replacement procedure.

Firefox is not supported because of the outdated version of Flash that is bundled with the browser. Flash can be updated manually within Firefox, but the recommendation is to use either Chrome or Internet Explorer with a current version of Flash.

Viewing HyperFlex Cluster Health

Using GUI

From the vSphere Web Client Navigator, select vCenter Inventory Lists > Cisco HyperFlex Systems > Cisco HX Data Platform > cluster > Summary. View the cluster widget to verify that the HyperFlex cluster is healthy and online.

From the vSphere Web Client Navigator, select vCenter Inventory Lists > Clusters > cluster > Summary. Verify that all HX cluster nodes are connected to the vCenter and are online.

Using CLI

# stcli cluster storage-summary --detail

Sample response that indicates the HyperFlex storage cluster is online and healthy.
address: 192.168.100.82
name: HX-Cluster01
state: online
uptime: 0 days 12 hours 16 minutes 44 seconds
activeNodes: 5 of 5
compressionSavings: 78.1228617455
deduplicationSavings: 0.0
freeCapacity: 38.1T
healingInfo:
    inProgress: False
resiliencyDetails:
        current ensemble size:5
        # of ssd failures before cluster shuts down:3
        minimum cache copies remaining:3
        minimum data copies available for some user data:3
        minimum metadata copies available for cluster metadata:3
        # of unavailable nodes:0
        # of nodes failure tolerable for cluster to be available:2
        health state reason:storage cluster is healthy.
        # of node failures before cluster shuts down:3
        # of node failures before cluster goes into readonly:3
        # of hdd failures tolerable for cluster to be available:2
        # of node failures before cluster goes to enospace warn trying to move the existing data:na
        # of hdd failures before cluster shuts down:3
        # of hdd failures before cluster goes into readonly:3
        # of ssd failures before cluster goes into readonly:na
        # of ssd failures tolerable for cluster to be available:2
resiliencyInfo:
    messages:
        Storage cluster is healthy.
    state: healthy
    hddFailuresTolerable: 2
    nodeFailuresTolerable: 1
    ssdFailuresTolerable: 2
spaceStatus: normal
totalCapacity: 38.5T
totalSavings: 78.1228617455
usedCapacity: 373.3G
clusterAccessPolicy: lenient
dataReplicationCompliance: compliant
dataReplicationFactor: 3
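
The same check can be scripted. The following is a minimal Python sketch, assuming the output of stcli cluster storage-summary --detail has been captured as text in the layout shown above. The function names parse_summary and cluster_ready are hypothetical helpers, not part of stcli, and the shortened SAMPLE only mirrors the fields used here.

```python
SAMPLE = """\
address: 192.168.100.82
name: HX-Cluster01
state: online
healingInfo:
    inProgress: False
resiliencyInfo:
    messages:
        Storage cluster is healthy.
    state: healthy
    nodeFailuresTolerable: 1
spaceStatus: normal
"""


def parse_summary(text):
    """Turn 'key: value' lines into a dict.

    Indented keys are stored as 'section.key', where section is the most
    recent unindented key that had no value of its own.
    """
    fields = {}
    section = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        indented = line[0].isspace()
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # free-text line, such as a message
        key, value = key.strip(), value.strip()
        if not indented:
            section = "" if value else key
            if value:
                fields[key] = value
        elif value:
            fields[f"{section}.{key}" if section else key] = value
    return fields


def cluster_ready(fields):
    """True when the cluster is online, healthy, and not healing."""
    return (
        fields.get("state") == "online"
        and fields.get("resiliencyInfo.state") == "healthy"
        and fields.get("healingInfo.inProgress") == "False"
    )


if __name__ == "__main__":
    print(cluster_ready(parse_summary(SAMPLE)))
```

The section prefix matters because the key state appears twice in the response: once at the top level (online) and once under resiliencyInfo (healthy).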

Verifying If DRS Is Enabled

Step 1 From the vSphere Web Client Navigator, select vCenter Inventory Lists > Clusters > cluster > Summary. Verify that DRS is Enabled.

Step 2 Click the vSphere DRS tab.

Check if Migration Automation Level is set to Fully Automated.

Viewing ESX Agent Manager

From the vSphere Web Client Navigator, select Administration > vCenter Server Extensions > vSphere ESX Agent Manager > Summary.
Verify that vSphere services are running and ESX Agent Manager (EAM) health is normal.

Verify Health of HyperFlex Cluster In Cisco UCS Manager

Configuring vMotion Interfaces (Optional: Jumbo Frames, MTU 9000)

Configure Lenient Mode

Cluster access policy is set by default to lenient mode.

Step 1 SSH to any one of the controller VMs and log in as root.

Step 2 # stcli cluster get-cluster-access-policy

Check if lenient mode is already configured.

Step 3 # stcli cluster set-cluster-access-policy --name lenient

If set to strict, change to lenient.
If already set to lenient, no further action is required.

Step 4 # stcli cluster info | grep -i policy

Confirm the change.

The following example checks the current cluster access policy. Because the policy is set to strict, it changes the policy to lenient and then confirms the change.
~/#stcli cluster get-cluster-access-policy
strict
~/#stcli cluster set-cluster-access-policy --name lenient
stcli cluster info | grep -i policy
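
The decision in Steps 2 to 4 can be sketched in Python. This is only an illustration, assuming the output of stcli cluster get-cluster-access-policy is a single word such as strict or lenient, as in the example above. The function name policy_commands is hypothetical, and the sketch only returns the commands to run; it does not execute anything.

```python
def policy_commands(current_policy):
    """Return the stcli commands still needed to reach lenient mode.

    current_policy is the raw text printed by
    'stcli cluster get-cluster-access-policy'.
    """
    if current_policy.strip().lower() == "lenient":
        return []  # already lenient, no further action is required
    return [
        "stcli cluster set-cluster-access-policy --name lenient",
        "stcli cluster info | grep -i policy",
    ]


if __name__ == "__main__":
    for command in policy_commands("strict\n"):
        print(command)
```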

vCenter

Requirements

The nested vCenter method requires:

    HX Data Platform Installer version 2.6(1a) or later. Prior HX Data Platform versions are not supported.

    vCenter to be installed inside a VM.

    Compute-only nodes may be added post installation only after the HX storage cluster is registered to a vCenter server.

    Cluster expansion with additional HyperFlex nodes may be performed only after the HX storage cluster is registered to a vCenter server.

    When installing vCenter, select the embedded Platform Services Controller option. An external Platform Services Controller is not supported.

Upgrade FI-UCSM

Required Order of Steps for Auto Install

If you want to upgrade all components in a Cisco UCS domain to the same package version, you must run the stages of Auto Install in the following order:

    Install Infrastructure Firmware

    Install Server Firmware

This order enables you to schedule the server firmware upgrades during a different maintenance window than the infrastructure firmware upgrade.

Creating an All Configuration Backup File

This procedure assumes that you do not have an existing backup operation for an All Configuration backup file.

Step 1 UCS-A# scope system

Enters system mode.

Step 2 UCS-A /system # create backup URL all-configuration enabled

Specify the URL for the backup file using one of the following syntaxes:

    ftp://username@hostname/path
    scp://username@hostname/path
    sftp://username@hostname/path
    tftp://hostname:port-num/path

Step 3 UCS-A /system # commit-buffer

Commits the transaction.

Disabling Smart Call Home

BGY-NEW-HX-B /system # scope monitoring 
BGY-NEW-HX-B /monitoring #  scope callhome 
BGY-NEW-HX-B /monitoring/callhome # disable 
BGY-NEW-HX-B /monitoring/callhome # 
BGY-NEW-HX-B /monitoring/callhome # commit-buffer
BGY-NEW-HX-B /monitoring/callhome # show detail

Callhome:
    Admin State: Off
    Throttling State: On
    Contact Information:
    Customer Contact Email:
    From Email:
    Reply To Email:
    Phone Contact e.g., +1-011-408-555-1212:
    Street Address:
    Contract Id:
    Customer Id:
    Site Id:
    Switch Priority: Debugging
    SMTP Server Address:
    SMTP Server Port: 25
    Current Task:

Verifying the Operability of a Fabric Interconnect

BGY-NEW-HX-B# scope fabric-interconnect a
BGY-NEW-HX-B /fabric-interconnect # show

Fabric Interconnect:
    ID   OOB IP Addr     OOB Gateway     OOB Netmask     OOB IPv6 Address OOB IPv6 Gateway Prefix Operability
    ---- --------------- --------------- --------------- ---------------- ---------------- ------ -----------
    A    10.254.253.77   10.254.253.254  255.255.255.0   ::               ::               64     Operable
BGY-NEW-HX-B /fabric-interconnect # exit
BGY-NEW-HX-B# scope fabric-interconnect b
BGY-NEW-HX-B /fabric-interconnect # show

Fabric Interconnect:
    ID   OOB IP Addr     OOB Gateway     OOB Netmask     OOB IPv6 Address OOB IPv6 Gateway Prefix Operability
    ---- --------------- --------------- --------------- ---------------- ---------------- ------ -----------
    B    10.254.253.78   10.254.253.254  255.255.255.0   ::               ::               64     Operable

Verifying the High Availability Status and Roles of a Cluster Configuration

BGY-NEW-HX-B# show cluster state 
Cluster Id: 0xaded5a7ccf6611e7-0xb59400defb68c3a1

B: UP, PRIMARY
A: UP, SUBORDINATE

HA READY
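
A captured show cluster state response can be checked programmatically. The Python sketch below assumes the output layout shown above and the HA NOT READY variant that appears later in these notes during the infrastructure upgrade. The function names parse_cluster_state and ha_is_ready are hypothetical.

```python
import re

# Matches lines such as "B: UP, PRIMARY" or
# "B: UP, PRIMARY, (Management services: INIT IN PROGRESS)".
ROLE_LINE = re.compile(r"^([AB]): (\w+), (\w+)")


def parse_cluster_state(text):
    """Return (roles, ha_ready) parsed from 'show cluster state' output.

    roles maps the fabric interconnect ID to a (state, role) tuple.
    """
    roles = {}
    ha_ready = False
    for line in text.splitlines():
        line = line.strip()
        match = ROLE_LINE.match(line)
        if match:
            roles[match.group(1)] = (match.group(2), match.group(3))
        elif line == "HA READY":
            ha_ready = True
    return roles, ha_ready


def ha_is_ready(text):
    """True when both FIs are UP, roles are split, and HA is READY."""
    roles, ha_ready = parse_cluster_state(text)
    states = {state for state, _ in roles.values()}
    role_names = sorted(role for _, role in roles.values())
    return ha_ready and states == {"UP"} and role_names == ["PRIMARY", "SUBORDINATE"]


if __name__ == "__main__":
    sample = "B: UP, PRIMARY\nA: UP, SUBORDINATE\n\nHA READY\n"
    print(ha_is_ready(sample))
```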

Configuring the Default Maintenance Policy

BGY-NEW-HX-B# scope org /
BGY-NEW-HX-B /org # scope maint-policy default
BGY-NEW-HX-B /org/maint-policy # set reboot-policy user-ack
BGY-NEW-HX-B /org/maint-policy* # commit-buffer 
BGY-NEW-HX-B /org/maint-policy # show

Maintenance Policy:
    Name       Schedule
    ---------- --------
    default
BGY-NEW-HX-B /org/maint-policy # show detail 

Maintenance Policy:
    Name: default
    Schedule:
    Description:
    Policy Owner: Local
    Reboot Policy: User Ack
    Config Trigger: None
    Soft Shutdown Timer: 150 Secs
BGY-NEW-HX-B /org/maint-policy #

Disabling the Management Interface

BGY-NEW-HX-B# scope monitoring 
BGY-NEW-HX-B /monitoring # set mgmt-if-mon-policy admin-state enabled
BGY-NEW-HX-B /monitoring # commit-buffer
BGY-NEW-HX-B /monitoring #

Verifying the Status of an I/O Module

Not applicable: there are no I/O modules in this environment.

Verifying the Status of a Server

A server that was discovered successfully looks like this:

BGY-NEW-HX-B# scope server 1
BGY-NEW-HX-B /server # show status 
Server  Slot Status                       Availability Overall Status   Discovery
------- -------------------------         ------------ --------------------- ---------
1       N/A                               Available    Unassociated          Complete
BGY-NEW-HX-B /server # show status detail 
Server 1:
    Conn Path: A,B
    Conn Status: A,B
    Managing Instance: B
    Availability: Available
    Admin State: In Service
    Overall Status: Unassociated
    Oper Qualifier: N/A
    Discovery: Complete
    Current Task:
    Check Point: Discovered

BGY-NEW-HX-B /server # show adapter status 
Server 1:
    Overall Status
    --------------
    N/A

A server whose discovery has not completed looks like this:

BGY-NEW-HX-B# scope server 2
BGY-NEW-HX-B /server # show status 
Server  Slot Status                       Availability Overall Status   Discovery
------- -------------------------         ------------ --------------------- ---------
2       N/A                               Unavailable  Discovery             In Progress
BGY-NEW-HX-B /server # show status detail 
Server 2:
    Conn Path: A
    Conn Status: A
    Managing Instance: A
    Availability: Unavailable
    Admin State: In Service
    Overall Status: Restart
    Oper Qualifier: N/A
    Discovery: In Progress
    Current Task: Preparing to check hardware configuration server sys/rack-unit-2(FSM-STAGE:sam:dme:ComputePhysicalHardreset:PreSanitize)
    Check Point: Shallow Checkpoint
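
The two show status detail captures above differ in a handful of fields. The Python sketch below compares them, assuming the "Key: value" layout shown in those captures. The function names parse_detail and discovery_complete are hypothetical, and treating Availability plus Discovery as the pass criteria is an assumption based only on the two examples in these notes.

```python
def parse_detail(text):
    """Collect 'Key: value' lines from 'show status detail' into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def discovery_complete(fields):
    """True when the server is available and discovery has finished."""
    return (
        fields.get("Availability") == "Available"
        and fields.get("Discovery") == "Complete"
    )


if __name__ == "__main__":
    good = "Server 1:\n    Availability: Available\n    Discovery: Complete\n"
    print(discovery_complete(parse_detail(good)))
```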

Verifying the Ethernet Data Path

BGY-NEW-HX-B# connect nxos b
BGY-NEW-HX-B(nxos)# 
BGY-NEW-HX-B(nxos)# show interface brief | grep -v down 

--------------------------------------------------------------------------------
Ethernet      VLAN    Type Mode   Status  Reason                   Speed     Port
Interface                                                                    Ch #
--------------------------------------------------------------------------------
Eth1/1        1       eth  vntag  up      none                        10G(D) --
Eth1/2        1       eth  vntag  up      none                        10G(D) --
Eth1/3        1       eth  vntag  up      none                        10G(D) --
Eth1/4        1       eth  vntag  up      none                        10G(D) --
Eth1/5        1       eth  vntag  up      none                        10G(D) --
Eth1/6        1       eth  vntag  up      none                        10G(D) --
Eth1/7        1       eth  vntag  up      none                        10G(D) --
Eth1/8        1       eth  vntag  up      none                        10G(D) --

--------------------------------------------------------------------------------
Port   VRF          Status IP Address                              Speed    MTU
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Vethernet     VLAN   Type Mode   Status  Reason                    Speed
--------------------------------------------------------------------------------
Veth32769     1      virt trunk  up      none                     auto     
Veth32770     1      virt trunk  up      none                     auto     
Veth32771     1      virt trunk  up      none                     auto     
Veth32772     1      virt trunk  up      none                     auto     
Veth32773     1      virt trunk  up      none                     auto     
Veth32774     1      virt trunk  up      none                     auto     
Veth32775     1      virt trunk  up      none                     auto     
Veth32776     1      virt trunk  up      none                     auto     

-------------------------------------------------------------------------------
Interface Secondary VLAN(Type)                    Status Reason                 
-------------------------------------------------------------------------------
BGY-NEW-HX-B(nxos)# show interface brief | grep -v down | wc -l
33
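
The wc -l figure of 33 counts every line of the filtered output, including header, separator, and blank lines; only 16 of those lines are interfaces. The Python sketch below counts just the interface rows, assuming the column layout shown above. The function names up_interfaces and missing_after are hypothetical, and comparing a capture taken before the upgrade with one taken afterward is an assumed workflow rather than something these notes spell out.

```python
import re

# An interface row starts with a name such as Eth1/1 or Veth32769.
INTERFACE_NAME = re.compile(r"^(Eth|Veth)\d")


def up_interfaces(text):
    """Return the names of Eth/Veth interfaces whose Status column is 'up'."""
    names = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) >= 5 and INTERFACE_NAME.match(cols[0]) and cols[4] == "up":
            names.append(cols[0])
    return names


def missing_after(before_text, after_text):
    """Interfaces that were up in the first capture but not in the second."""
    return sorted(set(up_interfaces(before_text)) - set(up_interfaces(after_text)))


if __name__ == "__main__":
    before = (
        "Eth1/1        1       eth  vntag  up      none      10G(D) --\n"
        "Veth32769     1      virt trunk  up      none      auto\n"
    )
    after = "Eth1/1        1       eth  vntag  up      none      10G(D) --\n"
    print(missing_after(before, after))
```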


BGY-NEW-HX-B(nxos)# show platform fwm info hw-stm
HW STM Contents
dleft loc   - bucket_type:line:bucket_number
misc        - learn_type:ecc:valid:fcf
cdce format - ig:ul:switch_id:subswitch_id:end_node_id:pbp_idx:local_id

VLAN   MAC Address      Port           loc      misc    cdce
------+----------------+--------------+--------+-------+--------------------
1.4043 380e.4d16.94e7   Eth1/4         1:51:0   0:0:1:0 2.a.bc.0.0.6 (e:0)
1.4044 380e.4d16.bd45   vif56          1:190:0  0:0:1:0 6.a.bc.0.0.8 (e:1)
1.4044 380e.4d16.9505   vif52          1:815:0  0:0:1:0 6.a.bc.0.0.6 (e:1)
vsan1  0efc.00ff.fc4d   sup-eth1       1:1221:0 0:0:1:1 2.0.0.0.0.3 (e:0)
1.4043 3890.a5bd.3a31   Eth1/6         1:1421:0 0:0:1:0 2.a.bc.0.0.b (e:0)
1.4044 3890.a5bd.3a4f   vif55          1:1529:0 0:0:1:0 6.a.bc.0.0.b (e:1)
1.4043 380e.4d16.8f21   Eth1/2         1:1532:0 0:0:1:0 2.a.bc.0.0.7 (e:0)
1.4044 500f.809f.c294   vif51          1:1597:0 0:0:1:0 6.a.bc.0.0.9 (e:1)
1.4043 380e.4d16.bd27   Eth1/3         1:1931:0 0:0:1:0 2.a.bc.0.0.8 (e:0)
1.4043 500f.809f.c036   Eth1/7         1:2624:0 0:0:1:0 2.a.bc.0.0.a (e:0)
1.4044 380e.4d16.8f3f   vif69          1:2659:0 0:0:1:0 6.a.bc.0.0.7 (e:1)
1.4043 380e.4d16.9547   Eth1/1         1:3266:0 0:0:1:0 2.a.bc.0.0.5 (e:0)
1.4044 380e.4d16.9565   vif68          1:3268:0 0:0:1:0 6.a.bc.0.0.5 (e:1)
1.4044 500f.809f.c054   vif53          1:3445:0 0:0:1:0 6.a.bc.0.0.a (e:1)
1.4043 500f.809f.be56   Eth1/8         1:3690:0 0:0:1:0 2.a.bc.0.0.c (e:0)
1.4044 500f.809f.be74   vif54          1:3692:0 0:0:1:0 6.a.bc.0.0.c (e:1)
1.4043 500f.809f.c276   Eth1/5         1:3950:0 0:0:1:0 2.a.bc.0.0.9 (e:0)
14.1   0efc.00ff.fc4d   sup-eth1       1:3951:0 0:0:1:0 2.a.bc.0.0.3 (e:0)
in-0   00de.fb68.c400   convert-err    3:0:1016 0:0:1:0 2.a.bc.0.0.0 (e:0)
14.407 0efc.0068.c400   convert-err    3:0:1017 0:0:1:0 2.a.bc.0.0.0 (e:0)
in-0   ffff.ffff.ffff   midx 8189      3:0:1018 0:0:1:0 1.0.0.0.1f.fd (e:0)
in-0   0100.5e00.0000   midx 8189      3:0:1019 0:0:1:0 1.0.0.0.1f.fd (e:0)
in-0   0100.5e00.0000   midx 8191      3:0:1020 0:0:1:0 1.0.0.0.1f.ff (e:0)
in-0   0100.0000.0000   midx 8190      3:0:1021 0:0:1:0 1.0.0.0.1f.fe (e:0)
DA hit 36569 miss 50651   SA ig_miss 27085 eg_miss 371035
Total addresses: 24 (unreserved entries -104)

BGY-NEW-HX-B(nxos)# show platform fwm info hw-stm | grep '1.' | wc -l
26

Verifying the Data Path for Fibre Channel End-Host Mode

BGY-NEW-HX-B(nxos)#  show npv flogi-table 
No flogi sessions found.

Downloading Firmware Images to the Fabric Interconnect from a Remote Location

UCS-A# scope firmware
UCS-A /firmware # download image scp://user1@192.168.10.10/images/ucs-k9-bundle.1.0.0.988.bin

Or, to download the image from a USB drive:

UCS-A /firmware # download image usbB:/username/ucs-k9-bundle-b-series.3.0.1a.B.bin
UCS-A /firmware # show download-task

Displaying the Firmware Package Download Status

BGY-NEW-HX-B# scope firmware 
BGY-NEW-HX-B /firmware # show download-task 

Download task:
    File Name Protocol Server                                Userid          State
    --------- -------- ------------------------------------- --------------- -----
    ucs-catalog.3.2.2b.T.bin
              Local    local                                                 Downloaded
    ucs-k9-bundle-b-series.3.2.2b.B.bin
              Local    local                                                 Downloaded
    ucs-k9-bundle-c-series.3.2.2b.C.bin
              Local    local                                                 Downloaded
    ucs-k9-bundle-infra.3.2.2b.A.bin
              Local    local                                                 Downloaded
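
To confirm that every bundle has finished downloading, the show download-task output can be parsed. This Python sketch assumes the wrapped layout shown above, where each file name sits on its own line and the State is the last word of the following line. The function names download_states and all_downloaded are hypothetical, and any state other than Downloaded is simply treated as not finished.

```python
def download_states(text):
    """Map each image file name to the state shown on the line after it."""
    states = {}
    pending_name = None
    for line in text.splitlines():
        cols = line.split()
        if not cols:
            continue
        if len(cols) == 1 and cols[0].endswith(".bin"):
            pending_name = cols[0]  # file name line; details follow
        elif pending_name is not None:
            states[pending_name] = cols[-1]  # State is the last column
            pending_name = None
    return states


def all_downloaded(text):
    """True when at least one task is listed and all are Downloaded."""
    states = download_states(text)
    return bool(states) and all(s == "Downloaded" for s in states.values())


if __name__ == "__main__":
    sample = (
        "    ucs-k9-bundle-infra.3.2.2b.A.bin\n"
        "              Local    local                Downloaded\n"
    )
    print(download_states(sample))
```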

BGY-NEW-HX-B /firmware # show image

Displaying All Available Packages on the Fabric Interconnect

BGY-NEW-HX-B /firmware # show package 
Name                                          Version
--------------------------------------------- -------
ucs-catalog.3.2.2b.T.bin                      3.2(2b)T
ucs-k9-bundle-b-series.3.1.2e.B.bin           3.1(2e)B
ucs-k9-bundle-b-series.3.2.2b.B.bin           3.2(2b)B
ucs-k9-bundle-c-series.3.1.2e.C.bin           3.1(2e)C
ucs-k9-bundle-c-series.3.2.2b.C.bin           3.2(2b)C
ucs-k9-bundle-infra.3.1.2e.A.bin              3.1(2e)A
ucs-k9-bundle-infra.3.2.2b.A.bin              3.2(2b)A

BGY-NEW-HX-B /firmware # show package ucs-k9-bundle-c-series.3.2.2b.C.bin expand
Package ucs-k9-bundle-c-series.3.2.2b.C.bin:
    Images:
        ucs-3260.3.0.3e.bin
        ucs-adaptor-pcie-ucsc-pcie-x710ta4.800031CA-1.810.8.bin
        ucs-c-amd-video-7150x2.015.049.000.016.007518_113-8747CA-102.bin
        ucs-c-emulex-pci-lpe31002.11.2.156.27.bin
        ucs-c-fusion-io-pfio1000mp.8.9.9.118194.bin
        ucs-c-fusion-io-pfio1000mps.8.9.9.118194.bin
        ucs-c-fusion-io-pfio1205m.7.1.17.bin

Checking the Available Space on a Fabric Interconnect

BGY-NEW-HX-B /fabric-interconnect # show storage 

Storage on local flash drive of fabric interconnect:
    Partition        Size (MBytes)    Used Percentage
    ---------------- ---------------- ---------------
    bootflash        16329            37
    opt              3877             2
    spare            5744             5
    usbdrive         Nothing          Empty
    var_sysmgr       2000             4
    var_tmp          600              1
    volatile         240              Empty
    workspace        3852             1
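
The show storage table can be screened for partitions that are filling up. This Python sketch assumes the three-column layout shown above. The function name partitions_over is hypothetical, and the 70 percent limit used in the example is a placeholder assumption, not a documented Cisco threshold; substitute the value from the upgrade guide for your release.

```python
def partitions_over(text, limit_pct):
    """Return {partition: used_pct} for partitions above limit_pct.

    Rows whose size or usage is not numeric (for example 'Nothing'
    or 'Empty') are skipped.
    """
    over = {}
    for line in text.splitlines():
        cols = line.split()
        if len(cols) == 3 and cols[1].isdigit() and cols[2].isdigit():
            used = int(cols[2])
            if used > limit_pct:
                over[cols[0]] = used
    return over


if __name__ == "__main__":
    sample = (
        "    bootflash        16329            37\n"
        "    usbdrive         Nothing          Empty\n"
        "    volatile         240              Empty\n"
    )
    # 70 is a placeholder limit chosen for illustration only.
    print(partitions_over(sample, 70))
```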

Upgrade the Infrastructure Firmware with Auto Install

If your Cisco UCS domain does not use an NTP server to set the time, make sure that the clocks on the primary and secondary fabric interconnects are in sync. You can do this by configuring an NTP server in Cisco UCS Manager or by syncing the time manually.

BGY-NEW-HX-A# show clock
Thu Nov 23 09:38:34 UTC 2017
BGY-NEW-HX-A# 

BGY-NEW-HX-B# show clock
Thu Nov 23 09:38:32 UTC 2017
BGY-NEW-HX-B#
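
The difference between the two show clock readings can be computed rather than eyeballed. This Python sketch assumes both fabric interconnects print the time in the format shown above with the literal zone name UTC; a different time zone would need a different format string. The function name clock_skew_seconds is hypothetical. Because the two commands are typed one after the other, part of any difference is operator delay, not real clock skew.

```python
from datetime import datetime

# Format of 'show clock' as captured above, e.g.
# "Thu Nov 23 09:38:34 UTC 2017". The zone name is matched literally.
CLOCK_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def clock_skew_seconds(clock_a, clock_b):
    """Return the absolute difference between two clock readings in seconds."""
    time_a = datetime.strptime(clock_a.strip(), CLOCK_FORMAT)
    time_b = datetime.strptime(clock_b.strip(), CLOCK_FORMAT)
    return abs((time_a - time_b).total_seconds())


if __name__ == "__main__":
    print(clock_skew_seconds(
        "Thu Nov 23 09:38:34 UTC 2017",
        "Thu Nov 23 09:38:32 UTC 2017",
    ))
```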

BGY-NEW-HX-B /firmware # show package 
Name                                          Version
--------------------------------------------- -------
ucs-catalog.3.2.2b.T.bin                      3.2(2b)T
ucs-k9-bundle-b-series.3.1.2e.B.bin           3.1(2e)B
ucs-k9-bundle-b-series.3.2.2b.B.bin           3.2(2b)B
ucs-k9-bundle-c-series.3.1.2e.C.bin           3.1(2e)C
ucs-k9-bundle-c-series.3.2.2b.C.bin           3.2(2b)C
ucs-k9-bundle-infra.3.1.2e.A.bin              3.1(2e)A
ucs-k9-bundle-infra.3.2.2b.A.bin              3.2(2b)A
BGY-NEW-HX-B /firmware # 
BGY-NEW-HX-B /firmware # scope firmware 
BGY-NEW-HX-B /firmware # scope auto-install 
BGY-NEW-HX-B /firmware/auto-install # install infra infra-vers 3.2(2b)A
This operation upgrades firmware on UCS Infrastructure Components
(UCS manager, Fabric Interconnects and IOMs).
Here is the checklist of things that are recommended before starting Auto-Install
(1) Review current critical/major faults
(2) Initiate a configuration backup
(3) Check if Management Interface Monitoring Policy is enabled
(4) Check if there is a pending Fabric Interconnect Reboot activitiy
(5) Ensure NTP is configured
(6) Check if any hardware (fabric interconnects, io-modules, servers or adapters) is unsupported in the target release
Do you want to proceed? (yes/no):

Triggering Install-Infra with:
   Infrastructure Pack Version: 3.2(2b)A

Acknowledging the Reboot of the Primary Fabric Interconnect

Caution 

To upgrade with minimal disruption, you must confirm the following:

    Ensure that all the IOMs that are attached to the Fabric Interconnect are up before you acknowledge the reboot of the Fabric Interconnect. If all IOMs are not up, all the servers connected to the Fabric Interconnect will immediately be re-discovered and cause a major disruption.

    Ensure that both of the Fabric Interconnects and the service profiles are configured for failover.

    Verify that the data path has been successfully restored from the secondary Fabric Interconnect before you acknowledge the reboot of the primary Fabric Interconnect. For more information, see Verification that the Data Path is Ready.

After you upgrade the infrastructure firmware, Install Infrastructure Firmware automatically reboots the secondary fabric interconnect in a cluster configuration. However, you must acknowledge the reboot of the primary fabric interconnect. If you do not acknowledge the reboot, Install Infrastructure Firmware waits indefinitely for that acknowledgment rather than completing the upgrade.

BGY-NEW-HX-B# show cluster state
Cluster Id: 0xaded5a7ccf6611e7-0xb59400defb68c3a1

B: UP, PRIMARY, (Management services: INIT IN PROGRESS)
A: UP, SUBORDINATE

HA NOT READY
Management services: initialization in progress on local Fabric Interconnect

BGY-NEW-HX-A# scope firmware   
BGY-NEW-HX-A /firmware # scope auto-install 
BGY-NEW-HX-A /firmware/auto-install # acknowledge primary fabric-interconnect reboot 
BGY-NEW-HX-A /firmware/auto-install* # commit-buffer 
BGY-NEW-HX-A /firmware/auto-install # show cluster state 
Cluster Id: 0xaded5a7ccf6611e7-0xb59400defb68c3a1

A: UP, SUBORDINATE
B: UP, PRIMARY

HA READY
BGY-NEW-HX-A /firmware/auto-install #

Viewing the Status of the FSM During An Infrastructure Firmware Upgrade

BGY-NEW-HX-A# scope firmware   
BGY-NEW-HX-A /firmware # scope auto-install 
BGY-NEW-HX-A /firmware/auto-install # show fsm status expand


    FSM Status:

        Affected Object: sys/fw-system/fsm
        Current FSM: Deploy
        Status: In Progress
        Completion Time:
        Progress (%): 90

        FSM Stage:

        Order  Stage Name                               Status       Try
        ------ ---------------------------------------- ------------ ---
        1      DeployWaitForDeploy                      Success      0
        2      DeployResolveDistributableNames          Skip         0
        3      DeployResolveDistributable               Skip         0
        4      DeployResolveImages                      Skip         0
        5      DeployDownloadImages                     Skip         0
        6      DeployCopyAllImagesToPeer                Skip         0
        7      DeployInternalBackup                     Skip         0
        8      DeployPollInternalBackup                 Success      0
        9      DeployActivateUCSM                       Skip         0
        10     DeployPollActivateOfUCSM                 Success      0
        11     DeployUpdateIOM                          Skip         0
        12     DeployPollUpdateOfIOM                    Skip         0
        13     DeployActivateIOM                        Skip         0
        14     DeployPollActivateOfIOM                  Skip         0
        15     DeployFabEvacOnRemoteFI                  Skip         0
        16     DeployPollFabEvacOnRemoteFI              Skip         0
        17     DeployActivateRemoteFI                   Success      0
        18     DeployPollActivateOfRemoteFI             In Progress  1
        19     DeployFabEvacOffRemoteFI                 Pending      0
        20     DeployPollFabEvacOffRemoteFI             Pending      0
        21     DeployWaitForUserAck                     Pending      0
        22     DeployPollWaitForUserAck                 Pending      0
        23     DeployFailOverToRemoteFI                 Pending      0
        24     DeployPollFailOverToRemoteFI             Pending      0
        25     DeployActivateLocalFI                    Pending      0
        26     DeployPollActivateOfLocalFI              Pending      0
        27     DeployActivateUCSMServicePack            Pending      0
        28     DeployPollActivateOfUCSMServicePack      Pending      0

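The output above was captured about 20 minutes before the output below. Stage 18 is still in progress, and its Try count has risen from 1 to 4.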

BGY-NEW-HX-B /firmware/auto-install # show fsm status expand


    FSM Status:

        Affected Object: sys/fw-system/fsm
        Current FSM: Deploy
        Status: In Progress
        Completion Time:
        Progress (%): 90

        FSM Stage:

        Order  Stage Name                               Status       Try
        ------ ---------------------------------------- ------------ ---
        1      DeployWaitForDeploy                      Success      0
        2      DeployResolveDistributableNames          Skip         0
        3      DeployResolveDistributable               Skip         0
        4      DeployResolveImages                      Skip         0
        5      DeployDownloadImages                     Skip         0
        6      DeployCopyAllImagesToPeer                Skip         0
        7      DeployInternalBackup                     Skip         0
        8      DeployPollInternalBackup                 Success      0
        9      DeployActivateUCSM                       Skip         0
        10     DeployPollActivateOfUCSM                 Success      0
        11     DeployUpdateIOM                          Skip         0
        12     DeployPollUpdateOfIOM                    Skip         0
        13     DeployActivateIOM                        Skip         0
        14     DeployPollActivateOfIOM                  Skip         0
        15     DeployFabEvacOnRemoteFI                  Skip         0
        16     DeployPollFabEvacOnRemoteFI              Skip         0
        17     DeployActivateRemoteFI                   Success      0
        18     DeployPollActivateOfRemoteFI             In Progress  4
        19     DeployFabEvacOffRemoteFI                 Pending      0
        20     DeployPollFabEvacOffRemoteFI             Pending      0
        21     DeployWaitForUserAck                     Pending      0
        22     DeployPollWaitForUserAck                 Pending      0
        23     DeployFailOverToRemoteFI                 Pending      0
        24     DeployPollFailOverToRemoteFI             Pending      0
        25     DeployActivateLocalFI                    Pending      0
        26     DeployPollActivateOfLocalFI              Pending      0
        27     DeployActivateUCSMServicePack            Pending      0
        28     DeployPollActivateOfUCSMServicePack      Pending      0
BGY-NEW-HX-B /firmware/auto-install # 
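
To see at a glance where the upgrade stands, the stage table from show fsm status expand can be parsed. This Python sketch assumes the four-column layout shown above (Order, Stage Name, Status, Try). The function name current_stage is hypothetical.

```python
import re

# Order, stage name, status (may contain a space, e.g. "In Progress"), try count.
STAGE_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(.+?)\s+(\d+)\s*$")


def current_stage(text):
    """Return (order, name, tries) for the stage marked In Progress, or None."""
    for line in text.splitlines():
        match = STAGE_ROW.match(line)
        if match and match.group(3) == "In Progress":
            return int(match.group(1)), match.group(2), int(match.group(4))
    return None


if __name__ == "__main__":
    sample = (
        "        17     DeployActivateRemoteFI                   Success      0\n"
        "        18     DeployPollActivateOfRemoteFI             In Progress  4\n"
        "        19     DeployFabEvacOffRemoteFI                 Pending      0\n"
    )
    print(current_stage(sample))
```

Running this against successive captures shows whether the stage number advances or only the Try count grows, as it does for stage 18 in the two captures above.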

BGY-NEW-HX-B /firmware/auto-install # show fsm status expand detail | grep Prog
        Status: In Progress
        Progress (%): 98
            Status: In Progress

BGY-NEW-HX-A /firmware/auto-install # show fsm status       


    FSM 1:
        Remote Result: Not Applicable
        Remote Error Code: None
        Remote Error Description:
        Status: Nop
        Previous Status: Deploy Success
        Timestamp: 2017-11-23T10:21:32.159
        Try: 0
        Progress (%): 100
        Current Task:
        Flags: 0

BGY-NEW-HX-A /firmware/auto-install # show cluster state 
Cluster Id: 0xaded5a7ccf6611e7-0xb59400defb68c3a1

A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY
BGY-NEW-HX-A /firmware/auto-install #

Registering a Storage Cluster with a New vCenter Cluster

Moving the Storage Cluster from a Current vCenter Server to a New vCenter Server

Before You Begin

    If your HX cluster is running an HX Data Platform version older than 1.8(1c), upgrade before attempting to reregister to a new vCenter.

    Perform this task during a maintenance window.

    Ensure that the cluster health state is healthy and the upgrade state is ok. You can view both states using the stcli command from the controller VM command line.

    # stcli cluster info

    Check response for:

    upgradeState: ok
    healthState: healthy

    Ensure that vCenter is up and running.

    Snapshot schedules are not moved with the storage cluster when you move the storage cluster between vCenter clusters.

Step 1 From the current vCenter, delete the cluster.

This is the vCenter cluster specified when the HX storage cluster was created.

Step 2 On the new vCenter, create a new cluster using the same cluster name.

Step 3 Add ESX hosts to new vCenter in the newly created cluster.

Unregistering a Storage Cluster from a vCenter Cluster

This step is optional. The recommendation is to leave the HX Data Platform Plug-in registration in place in the old vCenter.

Before You Begin

    Download the vSphere ESX Agent Manager SDK, if you have not already done so.

    If multiple HX clusters are registered to the same vCenter, do not attempt this procedure until all HX clusters have been fully migrated to a different vCenter. Running this procedure is disruptive to any existing HX clusters registered to the vCenter.
    Remove the datacenter from your vSphere cluster.

Step 1 Identify the HX cluster UUID.

# stcli cluster info | grep vCenterClusterId:
vCenterClusterId: domain-c73

Step 2 To unregister the storage cluster extension, log in to the vCenter server MOB extension manager.

First unregister the HyperFlex cluster.

    In a browser, enter the path and command.

    https://vcenter_server/mob/?moid=ExtensionManager

    vcenter_server is the IP address of the vCenter where the storage cluster is currently registered.
    Enter administrator login credentials.

Step 3 Locate the HX storage cluster extensions with the cluster IDs. Scroll through the Properties > extensionList to locate the storage cluster extensions:

com.springpath.sysmgmt.cluster_domain_id and com.springpath.sysmgmt.uuid.cluster_domain_id.

Step 4 Unregister each storage cluster extension.

    From the Methods table click UnregisterExtension.
    In the UnregisterExtension popup, enter an extension key value, com.springpath.sysmgmt.cluster_domain_id.

    For example: com.springpath.sysmgmt.domain-26
    Click Invoke Method.

Step 5 Restart the vSphere Client services.

service vsphere-client restart
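
The extension keys used in Steps 3 and 4 can be derived from the Step 1 output. The sketch below assumes the grep output has the form "vCenterClusterId: domain-c73" and that the keys follow the two patterns listed in Step 3. The helper names are hypothetical. Confirm the keys against the extensionList in the MOB before invoking UnregisterExtension.

```python
import re


def vcenter_cluster_id(text):
    """Extract the domain ID from 'vCenterClusterId: <id>' output, or None."""
    match = re.search(r"vCenterClusterId:\s*(\S+)", text)
    return match.group(1) if match else None


def extension_keys(domain_id):
    """Build the two extension keys to look for in the extensionList."""
    return [
        f"com.springpath.sysmgmt.{domain_id}",
        f"com.springpath.sysmgmt.uuid.{domain_id}",
    ]


if __name__ == "__main__":
    domain_id = vcenter_cluster_id("vCenterClusterId: domain-c73")
    for key in extension_keys(domain_id):
        print(key)
```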

Registering the first HX cluster

https://www.cisco.com/c/en/us/td/docs/hyperconverged_systems/HyperFlex_HX_DataPlatformSoftware/AdminGuide/2_6/b_HyperFlexSystems_AdministrationGuide_2_6/b_HyperFlexSystems_AdministrationGuide_2_6_chapter_0101.html#id_35901

stcli cluster reregister --vcenter-datacenter BGY-NEW-HX-DC --vcenter-cluster BGY-NEW-HX --vcenter-url 10.254.250.212 --vcenter-user administrator@vsphere.local

Registering the second HX cluster

stcli cluster info | grep -i health

stcli cluster reregister --vcenter-datacenter BGY-NEW-HX-DC02 --vcenter-cluster BGY-NEW-HX02 --vcenter-url 10.254.250.212 --vcenter-user administrator@vsphere.local

Restart the vSphere Client services.

service vsphere-client restart
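
The two reregister invocations above differ only in the datacenter and cluster names. When several clusters have to be moved, a small helper can assemble the command line so the four options are always supplied together. This is a sketch that only builds the string; it uses just the options that appear in the commands above, and the function name is hypothetical.

```python
import shlex


def reregister_command(datacenter, cluster, vcenter_url, user):
    """Return the stcli reregister command line as a shell-safe string."""
    args = [
        "stcli", "cluster", "reregister",
        "--vcenter-datacenter", datacenter,
        "--vcenter-cluster", cluster,
        "--vcenter-url", vcenter_url,
        "--vcenter-user", user,
    ]
    # shlex.join (Python 3.8+) quotes any argument containing shell
    # metacharacters or spaces.
    return shlex.join(args)


if __name__ == "__main__":
    print(reregister_command("BGY-NEW-HX-DC02", "BGY-NEW-HX02",
                             "10.254.250.212", "administrator@vsphere.local"))
```

Run the printed command from a controller VM of the cluster being reregistered.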

Troubleshooting

Disk status

stcli disk list -ip x.x.x.x

stcli node list | grep -i -n10 blacklisted

Login exception

In answer to your questions, please refer to the following:

1. Explanation of the workaround steps:

    1. Verify that files are populated in /var/lib/tomcat7/webapps/.

    2. Run 'echo manual > /etc/init/ureadahead.override'. This writes "manual" to /etc/init/ureadahead.override.

    3. Run 'echo manual > /etc/init/ureadahead-other.override'. This writes "manual" to /etc/init/ureadahead-other.override.

    4. Run 'mount | grep lib'. This only checks the mounted file systems. If only /var/old-lib is mounted, run steps 5 through 7.

    5. Run 'python /usr/share/springpath/storfs-misc/relinquish_node.py'. This Python script only checks the zk service.

    6. Run 'reboot'. This only reboots the SCVM.

    7. Wait for the cluster to become healthy.

    8. Run steps 1 through 7 on the next controller, until every SCVM has been done.

2. The workaround does not change the version at all.

3. There is no need to roll back, because we have solved more than 200 identical cases this way.

4. Same answer as item 1.

5. Yes, there is another way to solve this issue: upgrading the FW to 3.0(1C). However, that has a big impact on the customer's production, so I do not suggest it.

    Because the problem is caused by the Tomcat file mount issue, I think mounting the files for Tomcat is safe for HyperFlex cluster health, HyperFlex node health, and the customer's production.

Old Firmware Running

FlexFlash Controller 1 on server 8 is unhealthy. Reason: Status: FFCH_ERROR_OLD_FIRMWARE_RUNNING

Action: Inventory --> decommission the server, then recommission it; the fault should then clear.

Management cluster

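Shut down the storage cluster

First, confirm that all workload VMs have been shut down cleanly!
The cluster must be in a healthy state!
After shutting down the storage cluster, manually put the ESXi hosts into HX maintenance mode one at a time.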

stcli cluster shutdown

Step 1          To shut down the HX storage cluster, perform the following two steps.
Step 2          Gracefully shut down all workload VMs on all the HX datastores.

Alternatively, use vMotion to migrate the workload VMs to another cluster.
Note

Do not shut down or move the storage controller VMs (stCtlVMs).
Step 3          Gracefully shut down the HX storage cluster.

    From any controller VM command line, run the command and wait for the shell prompt to return.

    # stcli cluster shutdown

    Run the cluster information command. Confirm the storage cluster is offline.

    # stcli cluster info

    In the command response text, check the cluster subsection and verify the healthstate is offline.

This HX cluster shutdown procedure does not shut down the ESXi hosts.

If the maintenance or upgrade task does not require the physical components to be powered off, exit these steps and proceed to What to do next.
Step 4          To power off the HX storage cluster, complete Step 2 and Step 3, then complete the rest of the following steps.
Step 5          On each storage cluster ESX host, shut down the controller VM (stCtlVM).

Choose a method:

Using vCenter VM Power Off

    From vCenter client, locate the controller VM on each ESX host.
    Right-click the controller VM and select Power > Power Off.

    This method performs a graceful guest VM shutdown.

Using vCenter ESX Agent Manager

    From vCenter client, open the ESX Agent Manager console.
    Locate the controller VM on each ESX host, and select Power > Power Off.

    This method performs a graceful shutdown of agent VMs. The controller VM is an agent VM.

Using vCenter ESX Maintenance Mode

    From vCenter client, locate each ESX host.
    Right-click the ESX host and select Maintenance Mode > Enter Maintenance Mode.

    This method performs a hard shutdown on every VM in the ESX host, including the controller VM.

Step 6          Shut down each storage cluster ESX host.

    From the vCenter client, locate the host.
    Right-click the host and select Power > Shut Down.

Step 7          Power off the FIs, if this is needed for your maintenance task.

Cisco UCS FIs are designed for continuous operation. In a production environment, there is no need to shut down or reboot Fabric Interconnects. Therefore, there is no power button on UCS Fabric Interconnects.

To power off a Cisco UCS Fabric Interconnect, pull the power cable manually. Alternatively, if you have the FI power cables connected to a smart PDU, use the provided remote control to turn off the power from the electrical outlet.

    Verify all the storage cluster servers on the FI do not have a green power LED.
    Power off the secondary FI.
    Power off the primary FI.

The HX storage cluster is now safely powered off.

Power On and Start Up the HX Storage Cluster

Complete the steps in Shut Down and Power Off the HX Storage Cluster.

Connect all the ESX hosts to the FIs then power on ESXi hosts.
Power on all the controller VMs (stCtlVM).
Verify the storage cluster is ready to be restarted. --> stcli about
Restart the storage cluster. --> stcli cluster start ; stcli cluster info | egrep -B 5 -A5 -i 'health'
Through vCenter, verify that ESX remounted the datastores.

Once the cluster is available, the datastores are automatically mounted and available. If ESX does not recognize the datastores, run the following command from the ESX command line:

# esxcfg-nas -r

Confirm that upgrade is complete

Step 1          Log in to Cisco UCS Manager to ensure that the HX nodes have no pending server activities.

From the Server > Pending Activities tab, check for any pending server activities.
Step 2          Confirm that the HX nodes match the expected firmware version.

In Cisco UCS Manager, from the Equipment > Firmware Management > Installed Firmware tab, verify the firmware version.
Step 3          Log in to any controller VM through SSH.

# ssh root@controller_vm_ip
Step 4          Confirm the HyperFlex Data Platform version.

# stcli cluster version

Cluster version: 2.5(1c)
Node HX02 version: 2.5(1c)
Node HX05 version: 2.5(1c)
Node HX01 version: 2.5(1c)
Node HX03 version: 2.5(1c)
Node HX04 version: 2.5(1c)

Step 5          Verify that the HX storage cluster is online and healthy.

# stcli cluster info|grep -i health

Sample output:
healthstate : healthy
state: healthy
storage cluster is healthy

Step 6          Verify that the datastores are up and are mounted properly on the ESXi host.

From the HX controller VMs:
# stcli datastore list

From the ESXi host:
# esxcfg-nas -l
Step 7          For each browser interface you use, empty the cache and reload the browser page to refresh the HX Connect content.
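
The version check in Step 4 is easy to misread on larger clusters. The sketch below assumes the output of stcli cluster version has the "Cluster version: ..." and "Node <name> version: ..." line format shown in Step 4. The function names are hypothetical. It lists every node whose version differs from the cluster version.

```python
import re


def parse_versions(text):
    """Return (cluster_version, {node_name: node_version}) from the output."""
    cluster = None
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"Cluster version:\s*(\S+)$", line)
        if match:
            cluster = match.group(1)
            continue
        match = re.match(r"Node (\S+) version:\s*(\S+)$", line)
        if match:
            nodes[match.group(1)] = match.group(2)
    return cluster, nodes


def mismatched_nodes(text):
    """Return the sorted names of nodes whose version differs from the cluster's."""
    cluster, nodes = parse_versions(text)
    return sorted(name for name, version in nodes.items() if version != cluster)


if __name__ == "__main__":
    sample = """\
Cluster version: 2.5(1c)
Node HX02 version: 2.5(1c)
Node HX05 version: 2.5(1c)
Node HX01 version: 2.5(1c)
"""
    print(mismatched_nodes(sample))
```

An empty list means every reported node matches the cluster version.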
