Use the NEXUS Dashboard Free Trial to Proactively Monitor Your ACI Fabric
In this white paper
- Virtual NEXUS Dashboard free trial
- Day 0, 1, 2 operations overview: Technology, people and processes
- Introducing the NEXUS Dashboard
- NEXUS Dashboard deep dive
- NEXUS Dashboard installation recommendations
- Installing the NEXUS Dashboard physical cluster
- Virtual NEXUS Dashboard installation discussion
- NEXUS Dashboard virtual cluster installation
- Configuring the NEXUS Dashboard cluster
- ACI setup for NEXUS Dashboard and NEXUS Insights integration
- NEXUS Dashboard overview and site onboarding
- Download
The NEXUS Dashboard has a virtual version for small fabrics or lab and demo capabilities and a cloud version for public cloud visibility. This guide shows the user how to build a Virtual NEXUS Dashboard (physical instructions are also included) so they can create a 90-day POC to verify fabric performance, troubleshoot issues, and validate the usefulness of the NEXUS Dashboard and the day 2 operations suite.
With the Virtual NEXUS Dashboard (vND), we can now quickly build a six-node cluster and take advantage of the free 90-day trial period. The free trial can verify the integrity of an ACI fabric's policy against best practices and changes and troubleshoot any issues that occur using the assurance engine. Using the NEXUS Insights tools, we can verify the integrity of the fabric against PSIRTs, bugs, end-of-sale/end-of-support software and hardware, and flow telemetry, showing dropped packets, endpoint issues, and flow-based analytics.
Virtual NEXUS Dashboard free trial
Day 0, 1, 2 operations overview: Technology, people and processes
For those not familiar with the new IT model, a brief refresher follows in the next few sections. Phase 1 is the same; we still must design and scale according to business agility requirements. The operations model now starts with Day 0, or service delivery, which includes allocating rack space, power, and cooling; cabling; bringing up the compute, storage, and network; installing patches and upgrades; and many other tasks.
There are many complexities in building an app, and now, with the pandemic, we see more and more customers going to the cloud and working from home, which adds even more complexity to connecting and managing our applications.
Phase 0 of the application's life consists of the tasks below. These tasks may include connectivity into the public cloud and WAN connectivity to a Neutral Carrier Facility (NCF) like Equinix, which offers very low latency (2 ms), high-speed connections to the public clouds. The new design now offers the home user direct connectivity and a better user experience, but we must also have a way for all the lifecycle phases to be automated and monitored.
ACI simplifies Day 1 operations: we can turn a design into deployed configuration using the policy model and tenancy, along with many automation tools and self-service catalogs. With these ACI features, you can deploy a consistent policy, apply security, and know it was deployed correctly to the fabric.
Automation speeds up Day 1 operations considerably.
The next phase is day two operations, which is the support of the Application Lifecycle.
Visibility becomes difficult as scale increases.
Over the years, Cisco has developed various ACI tools (now also for DCNM and NX-OS, and eventually public cloud visibility) known as the day 2 operations suite. The tools were the Network Assurance Engine (NAE) and NEXUS Insights (NI, now the combination of Network Insights Resources (NIR) and Network Insights Advisor (NIA)). Cisco has embarked on combining NAE and NI into a single pane of glass with a sharable data lake for all apps, allowing application performance and fabric events to be correlated.
IT transformation is a focus for many IT departments, customer management, support staff, and end users. The increasingly distributed nature of applications for desired business outcomes requires tools that transform how IT operates. IT organizations are measured by the speed, simplicity, and security with which they support their business objectives.
Our daily lives have been transformed by events forcing us to work, shop, and learn anywhere, at any time. The ability to adapt while maintaining business resiliency and agility depends on the network's connective fabric spanning cloud and edge data center locations and public/private cloud operational models.
Managing this connective fabric, which serves consumers of applications wherever they reside, is a multi-disciplinary effort among NetOps, SecOps, CloudOps, and DevOps teams. These teams are typically siloed, use disparate tools, and lack the visibility to identify the root cause of an issue and validate fabric compliance.
The Cisco Nexus Dashboard uses a correlated data lake to provide real-time insights and automation services to operate multi-cloud data center networks spanning on-premises, virtual edge, and cloud sites. It provides a single unified pane of glass into proactive operations with continuous assurance and actionable insights across the data center. The Nexus Dashboard incorporates Nexus Insights, NEXUS Dashboard Orchestrator, and third-party telemetry to provide seamless access to network controllers, third-party tools, and cloud-based services.
The vision for NEXUS Dashboard is to provide a unified platform with a correlated data lake for consumption, containerized applications that reside on the platform and consume this data lake, and integration with third-party applications. Eventually, this leads to a fully observable network with auto-suggested solutions and, ultimately, truly autonomous operations.
Introducing the NEXUS Dashboard
Previously, NAE, NIR, and NIA ran separately, either as applications on compute in vSphere or as apps on the APIC. That architecture never allowed data to be shared between the apps or errors to be correlated with telemetry views of packet loss. The roadmap moved toward applications with a shared data lake to draw from, correlating application errors, policy changes, and deep flow telemetry, all visualized as an epoch. The NEXUS Dashboard (ND) platform hosts all the Cisco day 2 apps and third-party applications for these needs. Secondly, the ND must be an expandable, CPU- and storage-intensive platform; today, the physical platform can scale to 3 master nodes and 4 worker nodes, with the apps and their data residing on the ND cluster. As ND matures, more ND servers can join the cluster, and they can be separated regionally, if within RTT requirements, to distribute applications and provide DR strategies.
WWT highly recommends working with our architects to size and design the NEXUS Dashboard install. The NEXUS Dashboard sizing guidance can be accessed here: NEXUS Dashboard sizing. Also, look at the Cisco Nexus Dashboard Deployment Guide, Release 2.1.x, for the most up-to-date details.
WWT disclaimer for using the Virtual NEXUS Dashboard on larger fabrics than recommended by Cisco for doing a POC
You can install the Virtual NEXUS Dashboard in two modes. Demo mode uses 3 App node .ova VMs and supports up to 20 switches and 200 flows; it should only be used for demos, labs, or small, limited POCs of basic functionality. Cisco and WWT DO NOT recommend using this configuration for ANY PRODUCTION ACI or DCNM FABRICS.
Production mode is comprised of 3 Data node .ova VMs acting as masters and 3 App node .ova VMs acting as workers, supporting 50 switches and 2,500 flows. Both of these modes are too small for most customers, and Cisco and WWT advise not exceeding these recommendations for a successful trial. It may be possible to support slightly larger fabrics by tweaking the collection times and flows, but we want customers to have a successful POC, so we recommend staying within the guidance limits of 50 switches and 2,500 flows.
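As a quick sanity check before you start, the short sketch below compares a fabric's switch and flow counts against the Demo and Production mode limits quoted above. The example fabric numbers are placeholders for your own counts; this is only a convenience, not an official sizing tool.

#!/usr/bin/env python3
"""Sanity-check a fabric against the vND POC limits discussed above.

A minimal sketch: the limits are the Demo and Production mode figures quoted
in this paper (20 switches / 200 flows and 50 switches / 2,500 flows); the
switch and flow counts for your fabric are inputs you supply.
"""

VND_MODES = {
    "demo (3 x App OVA)": {"switches": 20, "flows": 200},
    "production (3 x Data + 3 x App OVA)": {"switches": 50, "flows": 2500},
}


def fits(mode: str, switches: int, flows: int) -> bool:
    """Return True if the fabric is within the stated limits for the given mode."""
    limits = VND_MODES[mode]
    return switches <= limits["switches"] and flows <= limits["flows"]


if __name__ == "__main__":
    # Example fabric: 42 switches with roughly 1,800 monitored flows (placeholders).
    my_switches, my_flows = 42, 1800
    for mode in VND_MODES:
        verdict = "within limits" if fits(mode, my_switches, my_flows) else "exceeds limits -- consider a physical ND"
        print(f"{mode}: {verdict}")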
If you have a larger fabric and want to do a POC using the physical NEXUS Dashboard Service Engines, contact your local WWT account team for ways to get hardware in, such as a try-and-buy or leveraging Enterprise Agreements. WWT and Cisco both feel that this revolutionary tool is well worth a POC and will pay for itself during an outage by pinpointing issues in minutes rather than hours, reducing MTTR. WWT also has self-service labs for learning NEXUS Dashboard, NEXUS Insights, and NAE to see if the use cases presented will benefit the customer.
A few things need to be considered for a successful virtual NEXUS Dashboard POC.
First, the back-end disks hosting the cluster must be fast enough for the vND to form a cluster. Cisco recommends using fast SSDs; however, we have also successfully built clusters on iSCSI- or NFS-backed storage.
Second, when configuring the Bug Scan and Assurance Analysis under the site, set the time between scans as long as possible. One day is fine for the Bug Scan; for the Assurance Analysis, WWT recommends starting at every 60 minutes and increasing the interval if you run into issues. If things work fine, you can then try reducing the time between Assurance Analysis epochs.
Third, limit flow telemetry to a few VRFs and subnets, and limit the number of switches you set up in the APIC fabric Node Control Policy. If you run into issues from exceeding 2,500 flows, reduce the number of VRFs, subnets, and switches to which the Node Control Policy is applied. These steps bring the number of nodes (switches with the Node Control Policy) and the number of flows closer to the supported limits.
NEXUS Dashboard deep dive
The NEXUS Dashboard is a cluster of UCS-series servers very similar to the APIC but with much more CPU, memory, and storage and the addition of 25- and 40-Gbps connectivity.
The NEXUS Dashboard offers RBAC controls: an admin view, where sites are onboarded and configured and day 2 applications are added and configured, and an operator view, where the operator can go into the day 2 applications and monitor the network using the sites and applications the admin has set up. It is a standard dashboard for day 2 ops that is easy to use, scale, and maintain.
The NEXUS Dashboard provides a common dashboard and offers visibility for all onboarded sites in a single pane of glass view
The NEXUS Dashboard has a virtual version for small fabrics or lab and demo capabilities, as well as a cloud version for public cloud visibility.
Another critical point about the NEXUS Dashboard is that it is the hub for multidomain policy and telemetry using the Kafka bus. Not only can NEXUS Dashboard provide a giant data lake for ACI, DCNM, or NX-OS policy and telemetry, but using the multidomain connector and the Kafka bus, it adds the policy and telemetry from DNAC, SD-WAN, and public cloud workloads. Eventually, NEXUS Dashboard provides an end-to-end view of policy and telemetry across multiple domains and third-party applications.
NEXUS Dashboard installation recommendations
To prepare for installation, you must properly size the NEXUS Dashboard cluster. As of this writing, a maximum of 3 master nodes, 4 worker nodes, and 2 standby nodes is supported. Plans are for the next release to support 11 nodes. Please see the NEXUS Dashboard sizing guide on how to size your ND cluster. You can find the sizing guide here:
As you can see, you need the NEXUS Dashboard version, the form factor (physical, virtual, or cloud-based), the number of switches, and the ND services you will run.
NEXUS Dashboard Federation
In the latest version, 2.1.1e, there is federation between ND clusters: we can monitor and display up to 4 clusters connected together, with up to 12 sites. An ND federation can mix physical and virtual (ESXi-based only) clusters.
Two networks are needed to install the NEXUS Dashboard: one for OOB management and one for the data network. The data network is critical, as it must be able to reach the in-band management of every site. The data network must also be reachable from all nodes, because this is the network over which the back-end clustering takes place. The ND must have routable addressing on the data network, and every ACI site must have in-band connectivity preconfigured and routable to the ND before you start site provisioning; otherwise, onboarding the site to the NEXUS Dashboard fails. Also, RTT considerations must be taken into account if the nodes are to be geographically dispersed.
This chart shows the recommended placement of nodes for a stretched ND cluster
The last architectural piece to consider is how the data network and management network connect to your infrastructure. Remember that both OOB mgmt and the data network must be fully routable to all sites you plan on onboarding to the ND; also, remember that in-band management has to be preconfigured before adding a site to the ND, or the site installation does not register. The scenarios below show L3 connectivity; the data network and management network can share the same infrastructure.
The following figure shows the OOB for the APICs and the L3 network for the data interfaces and ND management being separate. The recommended way to connect the physical or virtual NEXUS Dashboard cluster is via a layer 3 network that does NOT connect through your APIC or DCNM fabric, if possible. There have been instances where customers had issues with the fabric the NEXUS Dashboard used for connectivity and lost NEXUS Dashboard connectivity to other sites. The recommendation is to have the data and management interfaces of the NEXUS Dashboard connected to a routable L3 network and then connect the data interfaces to the in-band VRF. These recommendations hold for the physical as well as the virtual NEXUS Dashboard.
The following figure shows the data interfaces connecting to a DCNM fabric via VLANs; alternatively, they can connect to an ACI fabric via an EPG. We do not recommend this type of deployment, as it uses the fabric for the data network and for the telemetry, which then needs to traverse the ACI or DCNM fabric. An outage in the fabric, a leaf going offline, an upgrade, or anything else that affects connectivity to the data interfaces affects the clustering of the ND. Also, other sites' telemetry needs to traverse the fabric needlessly, and any issues, upgrades, and so on affect the telemetry from those sites. It is a supported method but not recommended.
Installing the NEXUS Dashboard physical cluster
The physical NEXUS Dashboard comes preconfigured with the latest ND software, and the setup is very straightforward. It is essential that sizing calculations are performed and that you decide whether the ND is installed in one site or spread across two for DR purposes. The last decision is L3 or EPG mode. Once you have that design down, ensure that you have your IP allocations for the ND appliance cluster and for in-band management of the sites you want to connect. Then configure in-band management on the sites to be monitored by the ND, preconfiguring the APICs and all the site switches. Also, verify that this address space is reachable in your network.
The following steps are used during the initial setup and are presented here as a guide.
First, make sure your CIMC setup is correct on all three nodes. Your NIC mode must be Shared LOM Extended.
Make sure that SSH and HTTPS are enabled.
Make sure LLDP is NOT enabled, so that LLDP can pass through if we use EPG/BD mode.
Ensure that vKVM and Virtual Media are enabled (notice the mapping for the nd-dk9.2.0.1d ISO; because this system was upgraded from an EFT image, you would only need this if upgrading manually). Also, make sure Serial over LAN is enabled so we can connect to the host via an SSH session to the CIMC.
Ensure Serial Over LAN is Enabled (SOL)
We run the first boot script on only one of the three nodes, then run through the GUI wizard to configure the other nodes. Remember to only perform this on one ND node; leave the others sitting at the first-boot screen
This ends the setup of the physical appliance for Cisco NEXUS Dashboard
Virtual NEXUS Dashboard installation discussion
The next piece to discuss is the ability in this release to create a small virtual cluster using the virtual NEXUS Dashboard on various hypervisors (ESXi and KVM are the only options supported today). We can now install a vND and run NEXUS Insights using three Data-node-sized VMs as masters and three App-node-sized VMs as workers to build the cluster. Using these smaller VMs allows us to quickly stand up a small POC and use the 90-day NEXUS Insights trial to gain full observability into our ACI or DCNM fabrics and determine whether NEXUS Dashboard can help reduce MTTR, increase network stability and compliance, and allow a deep dive into policy and traffic flows in the fabric.
One thing that MUST be considered for this vND POC to work correctly is disk speed. The official Cisco vND installation guide recommends deploying on SSD or NVMe disks; however, we have successfully installed the vND on both iSCSI- and NFS-backed storage. Due to IOPS requirements, the vND does not form a cluster if installed on slower spinning disks on an ESXi host. You MAY get away with vND clustering on 10k or 15k spinning disks if each VM has its own host and storage. Finally, for the POC, it is sufficient to thin provision the disks even though Cisco recommends thick provisioning.
A good test is to use a Linux box on the same storage planned for the vND VMs. If you don't have one available, it is straightforward to spin up an Ubuntu Linux VM using the backing storage planned for the vND VMs. As an example, from our EFT testing:
On an Ubuntu VM on storage where ND clustering did not work:
[root@localhost ~]# hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 13658 MB in 1.99 seconds = 6854.53 MB/sec
Timing buffered disk reads: 1452 MB in 3.08 seconds = 471.33 MB/sec -too slow
And on a Ubuntu VM on storage that the ND clustering worked fine:
[root@localhost ~]# hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 15186 MB in 1.99 seconds = 7625.24 MB/sec
Timing buffered disk reads: 6000 MB in 3.00 seconds = 1999.78 MB/sec -worked fine
The major difference is in the buffered disk read speed.
This is a quick, easy way to see whether you should proceed to build the VMs and go through the clustering process, and it gives a good indication of whether clustering will work or fail.
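If you want to automate that check, the following is a minimal sketch that wraps the same hdparm -Tt test shown above and flags storage whose buffered disk read throughput looks closer to the failing example (~470 MB/sec) than the working one (~2,000 MB/sec). The 1,000 MB/sec threshold is our own assumption drawn from those two data points, not an official Cisco figure; run it as root on a test Linux VM backed by the datastore you plan to use.

#!/usr/bin/env python3
"""Rough storage-speed check before attempting vND clustering.

A minimal sketch: it wraps the same `hdparm -Tt` test shown above and flags
storage that looks closer to the "too slow" result than the known-good one.
The threshold is an assumption based on those two data points.
"""
import re
import subprocess
import sys

THRESHOLD_MB_S = 1000.0  # assumed cut-off between the failing (~470) and working (~2000) examples


def buffered_read_speed(device: str) -> float:
    """Return the buffered disk read throughput in MB/sec reported by hdparm."""
    out = subprocess.run(
        ["hdparm", "-Tt", device], capture_output=True, text=True, check=True
    ).stdout
    match = re.search(r"Timing buffered disk reads:.*=\s*([\d.]+)\s*MB/sec", out)
    if not match:
        raise RuntimeError(f"Could not parse hdparm output:\n{out}")
    return float(match.group(1))


if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sda"
    speed = buffered_read_speed(dev)
    verdict = "looks fast enough" if speed >= THRESHOLD_MB_S else "is likely too slow"
    print(f"{dev}: {speed:.0f} MB/sec buffered reads -- this storage {verdict} for vND clustering")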
The following chart shows the deployment models for running the various applications on a vND.
NEXUS Dashboard virtual cluster installation
The first step is to go to Cisco.com software downloads (note that you must have a valid CCO ID). Navigate to the NEXUS Dashboard download, choose the latest release, 2.1(1e), and download the nd-dk9.2.2.1e-app.ova file. Repeat for the nd-dk9.2.2.1e-data.ova file.
Next, go to the vCenter and use the following steps to deploy the 3 nd-dk9.2.2.1e-data.ova and 3 nd-dk9.2.2.1e-app.ova VMs. Ensure you have the proper disks (SSD recommended), at least 6 IP addresses for the management network, and 6 for the data network. These should be on separate subnets and have full reachability to the in-band VRF and the APIC fabric management network that you want to monitor and test with the free 90-day trial POC. The diagram shows how the networking is laid out; the data ports and management ports should be on separate L3 routable networks that connect to the in-band and OOB networks of the fabric and the controllers.
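Before deploying the OVAs, it helps to write down the address plan. The sketch below is a small, optional helper that carves one management and one data address per VM out of two placeholder subnets (192.0.2.0/27 and 198.51.100.0/27), reserving the first host of each as the gateway; substitute the routable subnets you actually intend to use.

#!/usr/bin/env python3
"""Lay out the 6 management and 6 data addresses needed for the vND cluster.

A minimal sketch using placeholder subnets; it simply confirms the two subnets
are distinct and prints one address per VM so the assignments are agreed on
before deploying the OVAs.
"""
import ipaddress

MGMT_SUBNET = ipaddress.ip_network("192.0.2.0/27")      # placeholder
DATA_SUBNET = ipaddress.ip_network("198.51.100.0/27")   # placeholder
NODES = ["vND1-DATA", "vND2-DATA", "vND3-DATA", "vND4-APP", "vND5-APP", "vND6-APP"]

if __name__ == "__main__":
    assert not MGMT_SUBNET.overlaps(DATA_SUBNET), "mgmt and data must be separate subnets"
    mgmt_hosts = list(MGMT_SUBNET.hosts())
    data_hosts = list(DATA_SUBNET.hosts())
    # Reserve the first host in each subnet for the gateway.
    print(f"mgmt GW {mgmt_hosts[0]}, data GW {data_hosts[0]}")
    for i, node in enumerate(NODES, start=1):
        print(f"{node:10s} mgmt {mgmt_hosts[i]}/{MGMT_SUBNET.prefixlen}  data {data_hosts[i]}/{DATA_SUBNET.prefixlen}")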
Next, go to the vCenter where you will be deploying. Right-click the cluster where it is being installed, and choose Deploy OVF Template.
Next browse to the nd-dk9.2.2.1e-data.ova file you downloaded and select the file
Give the VM a name and folder if so desired. We used vND1-DATA to signify it's the first VM of the cluster and it is using the DATA ova file, not the APP ova file. I have also placed it in my ACI/NEXUS-DASHBOARD folder.
Choose the cluster or host where the first node resides. If you intend to use SSD drives local to that host, installing the remaining nodes of the cluster on separate hosts with SSD drives is recommended. Cisco's recommendation for a production Virtual NEXUS Dashboard is to use SSD or NVMe drives; for a free 90-day trial POC, fast iSCSI, NFS, or Fibre Channel storage should be sufficient.
Notice that the virtual disk has been changed from thick provisioned to thin provisioned. Also, make sure there is enough disk space; even though it is thin provisioned, the OVA expects to see as much free storage space as if it were thick provisioned. Click NEXT.
Choose your management (mgmt0) and data (fabric0) port group networks. These should be on separate subnets; the data network MUST have L3 reachability to the fabric's in-band VRF under the mgmt tenant in ACI, and mgmt0 must have L3 connectivity to the APIC management network. Click Next.
Enter the rescue-user password (this is also the admin password in the GUI during setup). This password is used for SSH or command-line connections to the VM node.
Give the node an IP address and GW in the management subnet. Check the Cluster Leader box on only the first node; for the remaining nodes, you MUST leave it unchecked. Click NEXT.
Review the configuration data for vND1-DATA: verify that you are using the correct .ova file, that the resources, IP address, and GW are correct, and that Cluster Leader is set to true.
Repeat the process and create 2 more Data nodes using different mgmt IPs and naming conventions such as vND2-DATA and vND3-DATA. Make SURE that the CLUSTER LEADER checkbox is UNCHECKED.
Make sure the .ova file, IP address, and GW are correct and that Cluster Leader is false for node 2.
On the third vND, make sure the IP address and GW are correct and that Cluster Leader is false.
Repeat the process and create 3 more VMs using the APP .ova file to create the worker nodes, with different mgmt IPs and naming conventions such as vND4-APP, vND5-APP, and vND6-APP. Make SURE that the CLUSTER LEADER checkbox is UNCHECKED. Verify that you are using the APP .ova file and that the IP address and GW are correct and Cluster Leader is false for all three nodes.
Next, go into the settings of all the DATA and APP VMs and, under VM Options, expand VMware Tools; make sure that Synchronize guest time with host is unchecked on all 6 VMs. Click OK.
Power on all 3 VM DATA nodes and 3 APP nodes, and go into the consoles to watch them boot. We will use the DATA nodes to form the initial cluster, then add the 3 APP nodes to increase the cluster size.
SSH or connect to the VM console and log in with rescue-user and the password configured during setup. Verify that the mgmt IPs can all reach each other.
That ends the setup of the virtual NEXUS Dashboard VMs. As long as there is reachability on the data network and the storage is fast enough per the earlier tests, there should be no issues with the clustering process; a quick check of the management IPs, like the sketch below, can help confirm reachability before you begin.
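As a convenience, here is a minimal sketch that pings each node's management IP from a workstation (or other Linux host) on the management subnet. The node names match the VMs created above, but the addresses are placeholders; substitute the mgmt IPs you actually assigned.

#!/usr/bin/env python3
"""Quick reachability check for the six vND node management IPs.

A minimal sketch using Linux ping syntax. Replace the placeholder addresses
with the mgmt IPs you assigned to vND1-DATA through vND6-APP.
"""
import subprocess

# Hypothetical example addressing -- substitute your own management IPs.
MGMT_IPS = {
    "vND1-DATA": "192.0.2.11",
    "vND2-DATA": "192.0.2.12",
    "vND3-DATA": "192.0.2.13",
    "vND4-APP": "192.0.2.14",
    "vND5-APP": "192.0.2.15",
    "vND6-APP": "192.0.2.16",
}


def reachable(ip: str, count: int = 3, timeout_s: int = 2) -> bool:
    """Return True if the host answers ICMP echo (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    for name, ip in MGMT_IPS.items():
        status = "OK" if reachable(ip) else "UNREACHABLE -- fix before clustering"
        print(f"{name:10s} {ip:15s} {status}")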
Configuring the NEXUS Dashboard cluster
Once the initial node set as cluster leader comes up, connect to the OOB address of the physical device you ran the first-boot setup on, or, for a virtual ND, to the mgmt IP of the VM that had the Cluster Leader checkbox checked. At the setup screen, log in with the password you used in the first-time setup script and click Begin Setup. The setup is the same for a physical or virtual NEXUS Dashboard once the cluster is initialized.
Once logged in, the cluster details are needed to configure the other nodes. Give the cluster a name, NTP server, DNS server, and DNS search domain. You can leave the App and Service networks at their defaults; they are used internally by the Kubernetes cluster and have no external connectivity. Click Next.
As we can see, the first node's OOB mgmt IP and GW are displayed. Notice that the three nodes connect mgmt0/1 to the management network and fabric0/1 to the L2/L3 network used for clustering and for reaching the sites we want to onboard to the vND. Click the pencil icon to modify the data network IP address and GW for the first node. Remember, this address must be on a different subnet than mgmt and must be able to reach all the sites' in-band networks for flow telemetry to work. Also, the data network must be on the same L2 subnet as the rest of the vND VMs or physical hosts, as the cluster forms over this back-end connectivity.
Add the Name, IP address, subnet mask and GW for the Data Network. Click Update.
We now see the data network applied under the first node. Next, we need to add the second node. Click the + to Add Node
Enter the mgmt IP address and password for the second node. Notice how the rescue-user username is prepopulated. Click Validate. Once it connects to the second node and validates, the IP address and serial number are prepopulated. Fill in the correct IP address for the data network and its GW. Click Add.
Now we see the second node; click + Add Node to add the third node.
Enter the mgmt IP and password for node 3, then click Validate. Once done, add the IP address, subnet, and GW for the data network. Click Add.
Once all three nodes are configured, verify that all IPs, GWs, and subnet masks are correct. If there are any errors, use the pencil icon to edit. Click Next.
You are asked to confirm the entire configuration: NTP, DNS, and the mgmt and data IP addressing and GWs. If everything looks good, click Configure.
The system now bootstraps the 3 Data nodes and creates the cluster using the configuration we just entered. Once the cluster comes up, we validate the cluster's health; then, we can add the 3 APP nodes.
Here we see the Kubernetes cluster being configured and bootstrapped
You can expand the nodes to see what each individual node is doing
This process takes at least 20 minutes, so it is an excellent time to get up and stretch and get some coffee. Once the cluster finishes setting up, the NEXUS Dashboard login shows up
Note that if you do not get this screen after 30-40 minutes, the clustering process has hung or stopped, usually because the VMs cannot communicate on the data network (wrong IP address, subnet mask, or port group) or because the disks are too slow.
SSH to the leader node or use the VMware vSphere console and verify that you can ping the data IP addresses of the other two nodes, and issue the "acs health" command. If you can ping but the cluster won't finish forming, you need faster disks or a faster SAN connection. If you cannot ping, verify that the fabric0 vNICs are all in the same port group and that there are no network connectivity issues between hosts.
You should be successful using fast iSCSI or NFS storage and you should eventually see the Nexus Dashboard login page come up.
ACI setup for NEXUS Dashboard and NEXUS Insights integration
While the cluster is building, we need to go into the ACI site we want to onboard to NEXUS Dashboard and make configuration changes on our APIC controller and fabric so that the devices can be reached in-band and flow telemetry can be gathered. If any of these steps are missed, the site will fail to onboard or to gather flow telemetry. Please follow the steps in this Cisco white paper; the following installation notes help you follow along in your own fabric to enable site onboarding and flow telemetry export.
ACI In-band Management setup
The ND uses the ACI in-band management network to receive telemetry from the Cisco APIC controllers and all the switches in the fabric. You must configure in-band management with the following steps:
● Access Policies for Cisco APIC interfaces (access ports)
● MGMT tenant in-band bridge domain with a subnet
● MGMT tenant node management address (Cisco APIC, leaf switch, and spine switch)
● MGMT tenant node management EPG for in-band management
First, we need to configure the ports that connect the APICs to the fabric to support in-band management. Use the leaf switches and ports the APICs are connected to, and use a VLAN that is not already used in the fabric or in an existing pool.
Under Fabric/Access Policies/Quickstart, choose Configure an Interface, PC, and vPC.
In the dialog, click the green plus + symbol.
- Select the two switches where the Cisco APIC ports connect from the drop-down list, in this case, 301 and 302.
- Enter a name in the Switch Profile Name field. Leaf301-302_Inband_Mgmt in this example.
- Set the Interface Type to Individual.
- In the Interfaces field, enter the Cisco APIC interfaces either as a comma-separated list or as a range. 1/1-3 in this example
- Enter a name in the Interface Selector Name field. Leaf301-302_Inband_Mgmt in this example
- Set the Interface Policy Group to Create One. You do not need to select an interface-level policy; the defaults are sufficient.
- In the Attached Device Type drop-down list, choose Bare Metal.
- Set the Domain and VLAN to Create One.
- Enter a name in the Domain Name field to name the physical domain associated with in-band management. Inband-Phys-Domain in this example
- Enter a VLAN ID used for in-band management in the fabric. 1001 in this example
Navigate to the mgmt tenant and, under the preconfigured inb bridge domain, create a routable subnet that can reach the subnet configured on the NEXUS Dashboard data network. Ensure the subnet is set to advertise externally and, if the L3Out is not in the mgmt tenant, shared between VRFs. Also, associate this subnet with the L3Out; in this example, it is in the common tenant.
Also, verify that the inb bridge domain is associated with the inb VRF; if it is not, telemetry does not work, so this is important to check.
Next, go down to the Node Management EPGs and create a new In-Band EPG. In this example, the EPG is named inb and uses VLAN 1001 (the in-band VLAN we created earlier). We use bridge domain inb and consume a contract from the common tenant that is attached to our L3Out EPG.
Here we see the contract provided via our L3Out and the L3Out external EPG.
Next, under Node Management Addresses/Static Node Management Addresses, assign IPs from the inb bridge domain subnet. Use the inb subnet IP as the GW, make sure you choose the inb EPG, and check In-Band Addresses. Assign addresses to your APICs, leafs, and spines. Click Submit.
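For readers who prefer to script these mgmt tenant changes, the following is a minimal, hedged sketch of the equivalent APIC REST API calls in Python. The GUI steps above remain the authoritative procedure; the APIC address, credentials, and IP addresses are placeholders, and the classes and DNs (fvSubnet under uni/tn-mgmt/BD-inb, mgmtRsInBStNode under the in-band EPG) reflect our reading of the APIC object model, so confirm them with the APIC API Inspector before posting.

#!/usr/bin/env python3
"""Sketch of the in-band management pieces above as APIC REST calls.

This is only an illustration; verify the object DNs against your own APIC
and replace all placeholder addresses and credentials.
"""
import requests
import urllib3

urllib3.disable_warnings()  # lab APIC with a self-signed certificate

APIC = "https://apic.example.com"          # placeholder
USER, PASSWORD = "admin", "password"       # placeholders
INB_GW = "10.10.10.1/24"                   # in-band BD subnet / gateway (placeholder)

session = requests.Session()
session.verify = False

# Log in; the APIC returns a session cookie that requests.Session keeps for us.
session.post(f"{APIC}/api/aaaLogin.json",
             json={"aaaUser": {"attributes": {"name": USER, "pwd": PASSWORD}}}).raise_for_status()

# Routable subnet on the preconfigured inb bridge domain, advertised externally and shared between VRFs.
session.post(f"{APIC}/api/mo/uni/tn-mgmt/BD-inb.json",
             json={"fvSubnet": {"attributes": {"ip": INB_GW, "scope": "public,shared"}}}).raise_for_status()

# Static in-band addresses for an APIC (node-1) and one leaf (node-101) under the in-band EPG named "inb".
for node, addr in {"1": "10.10.10.11/24", "101": "10.10.10.101/24"}.items():
    session.post(f"{APIC}/api/mo/uni/tn-mgmt/mgmtp-default/inb-inb.json",
                 json={"mgmtRsInBStNode": {"attributes": {
                     "tDn": f"topology/pod-1/node-{node}",
                     "addr": addr,
                     "gw": INB_GW.split("/")[0]}}}).raise_for_status()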
Make sure that NTP is set up and assigned to the fabric pod policy.
Check that the PTP policy is enabled and the settings are correct.
Under Fabric/Fabric Policies, use the default Fabric Node Control (FNC) policy or create a new one. Make sure Enable DOM and Telemetry Priority are checked.
Next, go to Switches and, under Leaf Switches, create a policy group (leafs in this example); for the Node Control Policy, choose either the default or the one you created, making sure that policy has DOM and telemetry priority applied.
Repeat the above process for Spines.
Lastly, create a Leaf Switches Profile and associate it with the leafs you want to monitor. Limiting the leafs is an excellent way to monitor only a couple of them for flow telemetry during the POC. The assurance engine monitors policy for the entire fabric, but we want to limit flows during the POC since we are using a small vND. This portion is critical; if you miss this step, flow telemetry in NEXUS Insights won't be available.
Repeat the Spine Switch Profile for the spines.
SSH to the vND nodes using the rescue-user username and the password you created during installation. Verify that you can ping from the vND to the inb subnet over the data network: ping the APIC and switch in-band IP addresses and verify connectivity. If you cannot ping, re-verify the in-band management setup steps, and do not proceed with the NEXUS Dashboard setup until the issue is resolved.
First, verify the data network interface using the "ip addr" command. Note that the bond0br interface has the data network address configured; we use this interface to source the pings.
Next, use the "ping -I bond0br x.x.x.x" command, which sources the traffic from the data network interface toward the in-band address of the APIC.
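To repeat that check for every in-band address you plan to monitor, the short sketch below wraps the same "ping -I bond0br" test in a loop. It assumes it is run from a vND node with Python 3 available (or another Linux host on the data network, with the interface name adjusted); the target addresses are placeholders for your own APIC and switch in-band IPs.

#!/usr/bin/env python3
"""Loop the `ping -I bond0br` check above over all in-band targets.

A minimal sketch: sources ICMP from the bond0br data interface, exactly as in
the manual test above, for every APIC and switch in-band address listed.
"""
import subprocess

DATA_INTERFACE = "bond0br"
# Placeholder in-band addresses for the APICs and switches to be monitored.
INBAND_TARGETS = ["10.10.10.11", "10.10.10.12", "10.10.10.13", "10.10.10.101", "10.10.10.201"]

if __name__ == "__main__":
    failures = []
    for ip in INBAND_TARGETS:
        ok = subprocess.run(
            ["ping", "-I", DATA_INTERFACE, "-c", "3", "-W", "2", ip],
            stdout=subprocess.DEVNULL,
        ).returncode == 0
        print(f"{ip}: {'reachable' if ok else 'FAILED'}")
        if not ok:
            failures.append(ip)
    if failures:
        print("Re-check the in-band management setup before onboarding the site:", ", ".join(failures))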
NEXUS Dashboard overview and site onboarding
Log in with admin/XXXXXX, where XXXXXX is the password used during setup.
On the Meet Cisco NEXUS Dashboard screen, click Let's Go!
Click Begin Setup.
On the Setup Overview page, click the X to close the setup so we can add the APP nodes to the cluster.
Closing the window brings you to the One View page; on the left navigation menu, click Admin Console.
Click on System Resources, then Nodes. Notice on the System Overview page we have 3 master nodes that make up the cluster. We shall add the 3 APP nodes as workers so we have 6 nodes.
Here we see the 3 master nodes' names, data and mgmt IPs, and roles. Click Add Node to add the APP worker nodes to the cluster.
The Add Node wizard opens. Just as when we configured the master nodes, fill in the correct node name, mgmt IP, data IP, and GW, and make sure Worker is selected as the Type. Click Save. This bootstraps the worker node and adds it to the cluster. Repeat for the other two nodes.
Once the 3 vND APP nodes bootstrap and join the cluster, we can see the cluster is up and functional with 3 master and 3 worker nodes.
Since this is a fresh NEXUS Dashboard install, no sites are configured yet. For a site to begin the assurance process and flow telemetry gathering, we must onboard it to the NEXUS Dashboard, which we can do by clicking the Add Site button. Remember, before you do this, you MUST complete the earlier APIC setup tasks of configuring in-band management, configuring flow telemetry, and creating the NTP and PTP configuration, and then verify that the NEXUS Dashboard data network can reach the ACI site's in-band network and ping the GW, controllers, and switches.
Click on Sites then Add Site
From the Add Site setup page, we onboard an ACI, Cloud ACI, or DCNM site. We just need to give it a name, the OOB management address of the APIC, and the admin username and password. Also, remember the in-band EPG created earlier; you must enter its name here, and it is case-sensitive. You can then drop a pin to indicate where in the world the site is located, then click Save.
Next, we want to add an APP Service; in our case, we shall use NEXUS Insights to see the assurance and telemetry of the fabric.
In our case, we need to go to the App Store tab to choose which app to install. Third-party apps are installed using the Installed Services button, which launches a window to upload a file for the app. Those steps are out of scope for this POC but good to know when moving forward with the product. Click on the App Store tab.
Looking at the App Store, we see 3 apps for now. As more Cisco applications and third-party integrations for importing and exporting data are onboarded, this correlated database expands quickly. Expect new applications to be added to the App Store that correlate business analytics, application performance, network problems, and, most importantly, user experience.
On the NEXUS Dashboard Insights, click the Install Button.
You see a prompt for a username to download the NEXUS Insights App. Click Next.
Password is next. Click Log In.
Agree on the license, then click on Agree and Download.
You can now see the NEXUS Dashboard downloading the installation files for NEXUS Insights
Once you see that NEXUS Dashboard Insights has been downloaded and installed, go to the Installed Services tab.
From the Installed Services tab, we can see the newly installed NEXUS Dashboard Insights. Click Enable.
You are presented with a screen to choose the NEXUS Dashboard Insights deployment size for the cluster. From here, choose the virtual app size.
While NEXUS Dashboard Insights is starting, log out using the Admin tab on the right.
Log back in using admin and the password you created during the vND setup.
The What's New screen pops up; click Get Started.
The new NEXUS Dashboard One View is displayed. You can see the sites you have onboarded (use only one for a vND POC). With One View Federation, you can see anomalies and advisories across up to 4 NEXUS Dashboard clusters and up to 12 sites (ACI, cloud APIC, DCNM). Click the Admin Console navigation tab on the left.
In One View Federation, one of the NEXUS Dashboard clusters becomes the Federation Manager, and from that NEXUS Dashboard GUI the user can see all anomalies, network problems, and third-party integrations from all sites under One View. The Federation Manager (FM) uses Site Managers (SM) to query local data and replicate it to the FM. The API gateway is used to keep keys synced between federation members.
From the overview of this single NEXUS Dashboard cluster, we see that it is not in a federation. The Admin Console overview shows how the 6 cluster nodes are performing. We can also see the one connected site we onboarded and the Intersight proxy setup.
If we click Sites on the left navigation menu, the Sites window opens up. We can see the site name, type, connectivity (Up), firmware version, and services used. Notice we have not consumed any services yet; we have only onboarded the site onto NEXUS Dashboard. Click the Add Site button.
A wizard opens up, and we could add more sites just as we did with the first site. Close the window and do not create any more sites when using the virtual-demo profile size vND.
Next, under System Resources and then Nodes, we see the vND VMs, their IPs, and their roles. If we want to expand the cluster, we can click Add Node and go through the bootstrap process for the new node. Since we used the smaller APP node OVA, we cannot expand the POC.
Looking at the Pods navigation tab, we can see the Kubernetes pods striped across the vND nodes.
Here we see the DaemonSets of what is running on the NEXUS Dashboard.
Under Deployments, we see NIR, appcenter, Kafka, and the kube-system.
The Stateful Sets show the processes running.
Finally, the Namespaces show the Service-Type and Namespace relationship.
Under Firmware Management, we can easily upload the image from either local or remote sources. A NEXUS Dashboard upgrade is very straightforward: each node is upgraded and added back to the cluster, then the next node is upgraded, without affecting cluster operations.
Collecting a tech support file has been simplified and is easy to do; you can choose the NEXUS Dashboard system, the App Store, or specific applications.
In the Audit Logs, we can see the upgrade and install of NIR.
Backups are critical and should be done daily, as well as before any upgrade. With a full backup, the entire NEXUS Dashboard site and settings can be restored if a new cluster needs to be rebuilt.
In the Cluster Configuration, we can see the cluster details. The Cluster Configuration page is also where you would configure the new Multi-Cluster Connectivity and ONE View Federation.
To configure the federation, click on Multi-Cluster Connectivity.
Add the new NEXUS Dashboard cluster that you want to join to the federation.
Next, we can see the vND resource utilization of the Cluster.
The next thing to discuss is the Intersight Device Connector. Intersight is typically used for UCS and HyperFlex, but here we use it to connect to the Cisco.com NI database for advisories and bugs; as NEXUS Dashboard expands, it adds more data sources to the correlated database, and Intersight is one of those sources. The Intersight connector polls the Cisco database for the latest PSIRTs, bugs, and TAC cases, then compares your fabric's configuration, software, and hardware to see if your fabric is exposed to bugs or PSIRTs that may have been posted as recently as the previous day. Doing so lets us validate on a daily basis that our fabric has the correct software version and configuration.
NEXUS Insights uses the Intersight connector to download updates to PSIRTs, bugs, and TAC cases based on your configuration and code version. Insights continually checks the fabric for advisories against the daily database pulled from the Cisco cloud, comparing devices, configuration, and policies against PSIRTs, known bugs for your particular ACI code, and TAC cases filed against your code version. As an example, suppose you update your L3Out OSPF configuration and change BFD timers; if a TAC case or bug affects that, the next time it scans your fabric it alerts you that the change you made can produce side effects, whether from a bug or a TAC case, and it recommends a code release or a workaround from the TAC case or bug.
To configure Intersight, click on Intersight on the left navigation menu. You can see a Device ID and a Claim Code. Click on the bottom where it says Open Intersight.
The hyperlink launches Intersight; if you already have an Intersight username and password, you can enter it, or use your CCO ID or create a new account.
Click on Target, then Click Claim Target.
Click on Cisco NEXUS Dashboard.
Enter the Device ID and Claim Code from the previous screens.
Entering the Device ID and Claim Code starts the process of connecting NEXUS Dashboard with Intersight so it can use the Cisco database to verify the state of your fabric. As you can see, I have two Intersight connections: one to my dev NEXUS Dashboard and one to the prod vND we are building together in this white paper.
Here we see that Intersight is now connected to the ACI fabric.
Go back to the NEXUS Dashboard Intersight menu, and we can see the connection status. Make sure it says ALLOW CONTROL.
On the next tab on the left navigation menu, under Operations, we see the versions of all the app infra services.
Next, let us look at the login domain choices, such as RADIUS, LDAP, and TACACS+. In our POC, we will just use local authentication.
We can see the certificates used, and you can change these to signed certificates for your organization.
Next, let us create a local user that is an operator.
We can now see the newly created account. In this latest version, Cisco has added a more granular RBAC so you can choose what a user can and cannot do.
Log out of NEXUS Dashboard.
Log back in using the operator ID and password you configured.
Notice how the operator view can only see the sites but cannot configure anything.
We now conclude part 1 of the NEXUS Dashboard setup.
Look forward to part 2 of this white paper to see how to add the site we onboarded to NEXUS Dashboard Insights and apply the Assurance Engine and the NEXUS Insights engine to examine the fabric's health, anomalies, and telemetry and monitor it in real time. We will also examine case studies on how to use the Assurance and Insights engines to keep your ACI fabric running smoothly and be warned about anomalies and advisories you would never know about without NEXUS Dashboard watching your fabric 24x7x365. It's like having a CCIE watching your fabric all day and alerting you to problems!