Before Automation: A Reality Check on IT Fundamentals

Introduction

Many IT leaders have a good grasp of what they would like to achieve within IT to more effectively enable the business – infrastructure-as-code, GitOps workflows, self-healing systems, and automated remediation. These aspirations are valid and necessary for competitive businesses. However, in my experience, most organizations still struggle with the foundations of IT. Weaknesses in basic yet critical capabilities, such as network availability, asset management, observability, DNS, DHCP, and IPAM (DDI), as well as identity and access controls, often keep businesses from reaching their goals.

Just as a house requires a strong foundation before adding expensive appliances and smart home automation, IT must address fundamental operational capabilities before pursuing advanced initiatives, such as full-scale automation. Let's examine these core IT needs and their significance in achieving long-term success.

The Pyramid of IT Capabilities

Maslow's hierarchy of needs is a psychological framework that explains human motivation as a progression through different levels, starting with basic survival needs and advancing toward self-fulfillment. At the base are physiological needs such as food, water, and shelter – the essentials for survival. Once these are met, people pursue safety and security to obtain stability and protection from harm. Higher levels include social belonging, esteem, and finally self-actualization, where individuals reach their full potential. Progressing through these stages requires satisfying foundational needs before advancing to more complex aspirations. A mountain climber may dream for a lifetime of reaching the summit of Mount Everest, but success is impossible without a supply of oxygen to sustain them.

[Figure: Hierarchy of Needs Analogy to IT – a pyramid of human needs alongside a pyramid of IT needs]

In many ways, IT follows a similar structure. And by the way, try not to get derailed by assessing all the various capabilities within the IT pyramid – they're just examples. The key here is that core infrastructure elements must be in place before an organization can achieve self-healing IT systems or automated provisioning. For example, if the core network fails, all business functions grind to a halt – users lose access to applications, backend systems cannot communicate, and cloud resources become unreachable. Likewise, if your IT processes and standards are poor, expect poor automation, because automation faithfully repeats whatever inconsistency it is given. Just as human needs build upon one another, IT must establish a strong foundation before advancing to higher levels of capability.

The Core Capabilities that All of IT Relies Upon

In the human world, physiological needs such as air, water, and shelter must be met before anything else. In IT, there are several critical infrastructure pillars that serve as the foundation for everything else.

Network Availability - Just as air is fundamental to human existence, network availability is the cornerstone of all IT operations. Every component of IT, from infrastructure to applications and services to security and storage, depends on a well-designed and properly implemented network. To deliver performance and uninterrupted service, the network must be architected according to industry good practices. This includes eliminating single points of failure (SPOFs), building fault tolerance into individual systems, and implementing high-availability mechanisms across interconnected systems. Maintaining network availability takes ongoing effort, but it's important to remember that no other endeavor can be undertaken if the network itself is down.
Asset Management - The first step in effective IT management is knowing what assets you own. Without a comprehensive and up-to-date inventory of IT systems, maintaining operational efficiency and security becomes a daunting challenge. Imagine being responsible for delivering a critical service without even knowing that you own the system running it. The lack of visibility into IT assets not only introduces inefficiencies but also increases risks such as unexpected outages, compliance violations, and security vulnerabilities. Tools like NetBox and other CMDBs provide a centralized approach for tracking and managing IT infrastructure, but technology alone isn't enough. Organizations must also implement a strong data governance program and data validation practices to ensure asset information remains accurate and up to date. Asset management is not just about knowing what you own – it's about having the right tools and processes in place to manage and protect those assets effectively.
Observability - Once you know what assets you own, you must also know whether they are functioning properly – or not. Observability is essential for maintaining a resilient and high-performing IT environment. IT teams must detect and respond to incidents before they cause disruptions, ensuring seamless operations and a positive user experience. Observability tools provide real-time insights by collecting, analyzing, and correlating logs, metrics, and traces across systems. Observability combined with an effective alert and notification process is critical to getting teams engaged quickly on incidents and ultimately reducing mean-time-to-awareness and mean-time-to-response. Furthermore, going beyond just monitoring, observability offers deep visibility into system behavior, helping organizations understand why something is not working as opposed to just what is not working. Observability can improve overall IT availability, strengthen security, and reduce mean time to resolution (MTTR). Investing in a strong observability strategy and architecture helps IT teams anticipate problems, diagnose root causes faster, and maintain operational excellence.
DNS, DHCP, and IP address management (DDI) - I have seen a DNS service outage bring a Fortune 100 company to a complete stop, with all IT services offline. Yet many organizations overlook the resilience and security risks of DNS. Without proper design and consideration, DNS disruptions can cause widespread outages. Furthermore, DHCP ensures client devices obtain network access, and poor DHCP management or IP address exhaustion can lead to connectivity failures and large-scale issues. A robust DDI strategy, including redundancy, security, and monitoring, is essential to prevent outages and maintain business continuity. Solutions like Infoblox help organizations centralize and automate DDI management, enhancing visibility, security, and scalability. Infoblox's intelligent automation, threat protection, and DNS security capabilities support high availability while reducing operational complexity. However, the DDI architecture must still be regularly assessed for high availability, resiliency, and security to preserve operational availability.
Identity and Access Management (IAM) - A solid identity and access management (IAM) program enables employees, customers, contractors, and devices to have appropriate access at the right time while preventing unauthorized entry. Without a strong IAM strategy, organizations expose themselves to threat actors seeking inappropriate access, posing a risk to the entire enterprise. Let's be frank – every IT interaction starts with identity. Whether booting up a new device or launching an application, authentication is required to even get started. It would be rather difficult to safeguard the availability of the network if you couldn't log in to the system. I have even seen, firsthand, a forensic analyst's report on an attacker who gained access to an enterprise admin account and then swiftly took down all of IT operations, with catastrophic consequences. A well-implemented IAM program is essential to safeguarding the organization from operational disruptions.
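The IP address exhaustion risk mentioned under DDI can be watched with a simple utilization check. The following is a minimal sketch, not a reference to any particular DDI product: the function names, the 80% threshold, and the lease data are all hypothetical, and it assumes IPv4 scopes of /30 or larger where the network and broadcast addresses are not assignable.

```python
import ipaddress


def pool_utilization(cidr: str, leased: set[str]) -> float:
    """Fraction of usable host addresses in `cidr` that are currently leased.

    Assumes an IPv4 subnet of /30 or larger, where the network and
    broadcast addresses are not assignable.
    """
    network = ipaddress.ip_network(cidr)
    usable = network.num_addresses - 2
    # Ignore leases that belong to some other scope.
    in_scope = {ip for ip in leased if ipaddress.ip_address(ip) in network}
    return len(in_scope) / usable


def scopes_near_exhaustion(scopes: dict[str, set[str]], threshold: float = 0.8) -> list[str]:
    """Return the scopes whose utilization meets or exceeds `threshold`."""
    return sorted(
        cidr for cidr, leases in scopes.items()
        if pool_utilization(cidr, leases) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical lease data; real data would come from the DHCP/IPAM system.
    scopes = {
        "10.0.0.0/24": {f"10.0.0.{i}" for i in range(1, 231)},  # 230 of 254 leased
        "10.0.1.0/24": {f"10.0.1.{i}" for i in range(1, 51)},   # 50 of 254 leased
    }
    print(scopes_near_exhaustion(scopes))  # ['10.0.0.0/24']
```

A check like this is only useful if it feeds the alerting process described under observability; the threshold is a judgment call that depends on lease times and how quickly a scope can be expanded.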

I am sure you might be thinking about other critical areas to consider as well, such as patch management, knowledge management, change management, and so on. However, in my opinion, everything really starts here and builds upon these.

The Gap Between Vision and Reality

In the workshops that I conduct, we usually end up having to gain agreement among the participants on both where they want to go (the vision) and where they are today (the reality). Oftentimes, some IT teams are struggling to maintain operational stability and availability, while adjacent teams within IT are heads down in their own work and unaware of those heroic efforts. Imagine going on a hike without discussing ahead of time which summit you are climbing together (the vision) and where you are starting from (the reality). Does everyone agree that this is just a summer trail hike, or are you headed over a glacier? Are you all starting at the base, or is the team halfway up the path without the appropriate climbing equipment?

Many organizations want to leap straight to infrastructure automation, GitOps, and self-healing endeavors. However, skipping foundational steps will make your journey extraordinarily difficult. For those organizations that have engaged in our Automation Envisioning Workshops and Observability Workshops, we broach these topics head on as they become critical paths for success. Frankly, most of the time we find effective ways to quickly tackle these issues within the workshop itself.

Common Pitfalls:

Procuring expensive automation tools: Without a source-of-truth supported by effective asset lifecycle and data governance processes, these tools often become siloed solutions, requiring extensive manual intervention and difficult justification conversations with senior leadership during renewals.
Underestimating observability needs: Infrastructure automation is only as effective as its visibility. If teams lack proper insights, impact analysis, and performance reporting, troubleshooting remains reactive rather than proactive, which can make automated systems appear unreliable. Additionally, what I see most often is a lack of clear metrics and ROI insights, which makes it difficult to justify expansion and continued investment to senior leadership.
Lacking IT standards and implementation consistency: Rushing into automation with unpredictable implementations, a wide variety of device types, poor data standards, and ineffective data governance makes the automation journey unnecessarily challenging. It can come to feel like an unending investment in remediating and tuning automation jobs.
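One inexpensive way to test whether a source-of-truth can be trusted before building automation on top of it is to compare what is recorded with what is actually observed on the network. The sketch below is illustrative only: the function name and hostnames are hypothetical, and in practice the two sets would be exported from the asset inventory and from a discovery or monitoring tool.

```python
def reconcile_assets(source_of_truth: set[str], discovered: set[str]) -> dict[str, list[str]]:
    """Compare recorded assets with assets actually observed on the network."""
    return {
        # Seen on the network but never recorded: unmanaged or unknown systems.
        "untracked": sorted(discovered - source_of_truth),
        # Recorded but not seen: retired, offline, or stale records.
        "stale": sorted(source_of_truth - discovered),
    }


if __name__ == "__main__":
    # Hypothetical hostnames for illustration only.
    recorded = {"core-sw-01", "core-sw-02", "app-srv-01"}
    observed = {"core-sw-01", "app-srv-01", "lab-ap-07"}
    print(reconcile_assets(recorded, observed))
    # {'untracked': ['lab-ap-07'], 'stale': ['core-sw-02']}
```

If either list is long, that is a signal to fix asset lifecycle and data governance processes first, since any automation driven by the inventory would inherit the same gaps.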

Key Questions to Consider:

Network: Is the production network stable and always available?
Assets: Do you know what you own? How do devices, applications, and services enter and exit your environment?
Configuration: Do you document how you intend to configure your systems and are they in fact configured that way?
Observability: Do you have visibility into all assets, even as they enter and leave your environment?
DDI: How resilient to failures is your DNS implementation?
IAM: Is identity and privileged access comprehensive and integrated across the enterprise?
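The configuration question above, whether systems are in fact configured the way you intended, can be made concrete with a small drift check. This is a minimal sketch under stated assumptions: configurations are represented as flat key/value dictionaries, and the function name, setting names, and values are hypothetical rather than taken from any specific platform.

```python
from typing import Optional


def find_drift(
    intended: dict[str, str], actual: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Map each differing setting to an (intended, actual) pair.

    A value of None means the setting is absent on that side.
    """
    drift = {}
    for key in sorted(intended.keys() | actual.keys()):
        want, have = intended.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    # Hypothetical settings for a single device.
    intended = {"ntp_server": "10.0.0.10", "snmp_version": "v3", "syslog_host": "10.0.0.20"}
    actual = {"ntp_server": "10.0.0.10", "snmp_version": "v2c", "telnet": "enabled"}
    for setting, (want, have) in find_drift(intended, actual).items():
        print(f"{setting}: intended={want!r} actual={have!r}")
```

An empty result means the device matches its documented intent. Anything else is either undocumented change or documentation that has fallen behind, and both are worth resolving before automating against that device.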

Conclusion

As in any structured framework, foundational components come first, even in IT. Many organizations aspire to GitOps-managed infrastructure automation but struggle with basic operational challenges. Success comes from "going slow, to go fast." Aligning IT priorities to provide a robust foundation will set the organization up for success on the more "self-actualized" endeavors – automation, observability, and AI-driven infrastructure. Fix the basics and then build from there!