2026-02-16 · IT-Service

IT Infrastructure Audit for Business: What We Check and Why

Every company with a dozen or more workstations has IT infrastructure. Servers, networking, domain structure, backups, virtualization — all of it either runs predictably or quietly degrades until something breaks at the worst possible moment.

An infrastructure audit is not a formality or a box-ticking exercise. It is a systematic examination of every component in your IT environment with a single purpose: understand what works, what doesn’t, what will break next, and what to do about it.

In this article, we explain how we conduct audits and what exactly we check, and we provide a 25-point checklist you can use for a preliminary self-assessment of your infrastructure.

When Does a Business Need an IT Infrastructure Audit

Not every company recognizes the moment when an audit becomes a necessity. Typically, clients reach out to us in one of these scenarios:

Change of IT leadership. A new CTO, technical director, or IT manager needs an objective picture of the current state of systems. Without an audit, they inherit all the hidden risks left by their predecessor.

An incident that already happened. A server failed to come back up after an update, a backup turned out to be unusable, or the company suffered a cyberattack. After putting out the fire, you need to understand the full scope of the damage.

Scaling or migration. The company is growing, opening new offices, moving to a new virtualization platform, or planning to shift some services to the cloud. Without an audit, decisions are made blindly.

Transitioning from a “jack-of-all-trades admin” to a managed model. When a business grows beyond 30–50 workstations, a single system administrator is no longer enough. You need a model with clear ownership boundaries, SLAs, and change management procedures.

Preparing for an external audit or certification. ISO 27001, client requirements, or regulatory compliance — it all starts with understanding where you stand today.

If you recognize at least one of these scenarios, an audit is not just advisable; it is necessary.

Five Zones of an IT Infrastructure Audit

We structure every audit across five zones. Each one is a distinct layer where risks can accumulate.

1. Server Services

This is the core of corporate infrastructure. Problems here cascade into everything else.

Active Directory and domain structure. We check the health of domain controllers: replication, FSMO roles, SYSVOL and NETLOGON status. We analyze the OU structure, delegation of rights, and the presence of stale or “forgotten” accounts with privileged access. We separately review Group Policy Objects (GPO): conflicting policies, documentation for critical GPOs, and whether filtering works correctly.
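
The search for “forgotten” privileged accounts can be sketched roughly as follows. This is a minimal illustration that assumes account data has already been exported into a list of dictionaries; the field names, the sample accounts, and the 90-day idle threshold are hypothetical and do not come from any Active Directory API.

```python
from datetime import datetime, timedelta


def stale_privileged(accounts, now, max_idle_days=90):
    """Return names of privileged accounts with no logon in `max_idle_days`.

    Accounts that have never logged on (last_logon is None) are also flagged.
    The 90-day default is an assumption, not a value from the article.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return [
        a["name"]
        for a in accounts
        if a["privileged"] and (a["last_logon"] is None or a["last_logon"] < cutoff)
    ]


# Hypothetical export of three accounts.
now = datetime(2026, 2, 16)
accounts = [
    {"name": "adm-ivan", "privileged": True, "last_logon": datetime(2026, 2, 10)},
    {"name": "adm-contractor", "privileged": True, "last_logon": datetime(2025, 6, 1)},
    {"name": "j.smith", "privileged": False, "last_logon": datetime(2024, 1, 1)},
]
print(stale_privileged(accounts, now))
```

With this sample data only the contractor's administrative account is flagged: the regular user is idle but not privileged, so it falls outside this particular check.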

DNS. We review forward and reverse lookup zones, scavenging for stale records, and correct configuration of forwarders and conditional forwarders. Misconfigured DNS is one of the most common causes of “strange” resource access issues.

DFS. If DFS-Namespaces or DFS-Replication is in use, we check replication status, backlog, and conflicts. In large environments, DFS-R problems can go unnoticed for weeks.

RDS and terminal services. For companies where employees work via Remote Desktop Services: we verify licensing (CALs), Session Host and Connection Broker health, session timeout settings, and user profile configuration.

SQL Server. We check the version and patch level, database maintenance plans, index health, tempdb configuration, and transaction log size and growth. We assess whether current resources are adequate for the actual workload.

2. Virtualization

Most modern infrastructures are built on virtualization. Understanding the real state of resources here is critical.

Host inventory. We record how many physical servers there are, which hypervisor and version they run (ESXi, Hyper-V), their update status, and their hardware health (disks, memory, controllers).

Resource allocation. We check CPU and RAM overcommit. Overcommit is not always a problem, but without monitoring it becomes a ticking time bomb. We look at balloon driver activity, swap usage, and ready time.
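
As a rough illustration of the RAM overcommit check, the sketch below expresses the memory allocated to VMs as a percentage of the host's physical memory. The function name and the sample numbers are illustrative assumptions; the 120% threshold is the one used in the checklist later in this article.

```python
def ram_overcommit_percent(physical_gb, vm_allocated_gb):
    """Return allocated VM memory as a percentage of physical host memory."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return sum(vm_allocated_gb) / physical_gb * 100


# Hypothetical host: 256 GB of physical RAM and five VMs (336 GB allocated).
allocated = [64, 64, 96, 48, 64]
percent = ram_overcommit_percent(256, allocated)
print(f"RAM overcommit: {percent:.2f}%")
print("over threshold" if percent > 120 else "within threshold")
```

A figure above 100% only means that more memory has been promised than physically exists. Whether that hurts in practice depends on the ballooning, swap, and ready-time metrics mentioned above.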

Storage. Datastore health, latency, thin vs. thick provisioning, orphaned files, and stale snapshots. A snapshot that has existed for more than 72 hours in production is a red flag.
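
The 72-hour snapshot rule is simple to automate once snapshot creation times are available. The sketch below assumes the inventory has already been collected as (name, creation time) pairs; the snapshot names and dates are made up for illustration, and no hypervisor API is called.

```python
from datetime import datetime, timedelta

MAX_SNAPSHOT_AGE = timedelta(hours=72)  # threshold from the article


def stale_snapshots(snapshots, now):
    """Return names of snapshots older than MAX_SNAPSHOT_AGE.

    `snapshots` is a list of (name, created_at) tuples.
    """
    return [name for name, created_at in snapshots if now - created_at > MAX_SNAPSHOT_AGE]


now = datetime(2026, 2, 16, 12, 0)
inventory = [
    ("sql01-pre-patch", datetime(2026, 2, 15, 9, 0)),     # about 27 hours old
    ("erp-before-upgrade", datetime(2026, 1, 20, 8, 0)),  # several weeks old
]
print(stale_snapshots(inventory, now))  # prints ['erp-before-upgrade']
```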

High Availability. If VMware HA, vMotion, or Hyper-V Failover Clustering is in use, we verify the configuration and test failover — or analyze when it was last tested.

3. Network Layer

The network is the circulatory system of your infrastructure. Problems here often masquerade as “slow applications.”

Segmentation. Are the server segment, workstations, guest network, and IoT devices separated? Ideally, VLANs with inter-VLAN firewalling. In practice, we frequently encounter flat networks where a printer sits in the same segment as a domain controller.

Firewall and access rules. We review the configuration, look for overly broad rules (any-to-any), and check the date of the last review. Special attention goes to rules governing VPN and external access.
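
The search for any-to-any rules can be sketched as a filter over an exported rule set. The `Rule` structure and its field names below are a hypothetical simplification; real firewall exports differ by vendor and would need their own parser.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    source: str
    destination: str
    service: str
    action: str


def overly_broad(rules):
    """Return allow rules whose source, destination, and service are all 'any'."""
    return [
        r
        for r in rules
        if r.action == "allow" and r.source == r.destination == r.service == "any"
    ]


# Hypothetical rule set: one leftover any-to-any allow rule among normal ones.
rules = [
    Rule("legacy-temp", "any", "any", "any", "allow"),
    Rule("web-in", "any", "10.0.10.20", "tcp/443", "allow"),
    Rule("default-deny", "any", "any", "any", "deny"),
]
for rule in overly_broad(rules):
    print(f"Review rule: {rule.name}")
```

The final deny-all rule is deliberately not flagged: a catch-all is only a risk when it allows traffic.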

VPN tunnels. Status of site-to-site and client VPN connections, protocols in use (IKEv2, WireGuard, OpenVPN), split tunneling configuration and its security implications.

Single points of failure. Does the network depend on a single router, a single internet link, or a single DNS server? For business-critical infrastructure, this is unacceptable.

4. Security

Security is not a standalone product — it is a characteristic of the entire infrastructure. We assess it end-to-end.

Access management. Who has privileged access to servers, AD, and network equipment? How many people know the Domain Admin password? Are there dedicated administrative accounts? Is LAPS configured for local admin accounts?

Password policies. Minimum length, complexity, expiration, lockout on failed attempts. Is the policy actually enforced, or does it only exist on paper?

Patching and updates. Update status for server and workstation operating systems. Is there a WSUS or another centralized patch management system? When were the servers last updated?

Antivirus protection. Presence, centralized management, signature freshness, coverage (are all endpoints protected?). For servers — are exclusions configured without compromising protection?

Encryption. BitLocker on workstations and servers, backup encryption, TLS for internal services.

5. Backup and Disaster Recovery (DR)

This is the zone where we most frequently find critical issues. A backup is not what you configured — it is what you can actually restore from.

Coverage and completeness. Are all critical systems being backed up? AD, SQL databases, file servers, network equipment configurations, virtualization services? We regularly encounter situations where “everything is backed up” — except the firewall configuration or DNS zones.

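RPO and RTO. What Recovery Point Objective (the maximum window of data you can afford to lose, measured back from the moment of failure) and Recovery Time Objective (how quickly service must be restored) can the current setup actually deliver? Do those values match business expectations?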
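
One way to estimate the recovery point the current setup really delivers is to look at the gaps between successful backups. The sketch below is a simplified illustration: it assumes a list of completion times for successful backups is available, and the timestamps and the 24-hour target are made-up examples.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times, now):
    """Return the longest interval not covered by a successful backup.

    The interval between the most recent backup and `now` is included.
    """
    if not backup_times:
        raise ValueError("no successful backups recorded")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


# Hypothetical history: a nightly job that finished two hours late once.
history = [
    datetime(2026, 2, 13, 23, 0),
    datetime(2026, 2, 14, 23, 0),
    datetime(2026, 2, 16, 1, 0),
]
now = datetime(2026, 2, 16, 12, 0)
target_rpo = timedelta(hours=24)  # assumed business requirement

gap = worst_case_data_loss(history, now)
print(f"Worst observed gap: {gap}, target met: {gap <= target_rpo}")
```

Here the longest gap is 26 hours, so a 24-hour target was missed at least once even though a backup exists for almost every night.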

The 3-2-1 rule. Three copies of data, on two different media types, one offsite. This is the minimum standard. How is this implemented in your environment?
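
The rule translates directly into a three-part check. The sketch below assumes each copy of the data, including the production copy, is described by a media label and an offsite flag; the field names and sample entries are hypothetical.

```python
def meets_3_2_1(copies):
    """Check the 3-2-1 rule for a list of data copies.

    Each copy is a dict with a 'media' label and an 'offsite' flag.
    The production copy is assumed to be included in the list.
    """
    media_types = {c["media"] for c in copies}
    has_offsite = any(c["offsite"] for c in copies)
    return len(copies) >= 3 and len(media_types) >= 2 and has_offsite


# Hypothetical environment: production disk, local repository, cloud copy.
copies = [
    {"name": "production volume", "media": "disk", "offsite": False},
    {"name": "local backup repository", "media": "disk", "offsite": False},
    {"name": "cloud object storage", "media": "object-storage", "offsite": True},
]
print(meets_3_2_1(copies))      # prints True
print(meets_3_2_1(copies[:2]))  # prints False: two copies, one media type, nothing offsite
```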

Recovery testing. The key question: when did you last successfully restore a server or database from backup? Not “verified the backup file exists” — actually brought a system back up? If the answer is “never” or “I don’t remember,” this is the single biggest risk in your infrastructure.

DR plan. Is there a documented action plan for a complete loss of the primary site? Who is responsible? Where are the recovery passwords and documentation stored?

Checklist: 25 Points for Self-Assessment

This checklist does not replace a full audit, but it allows you to quickly evaluate critical areas. If you answered “no” to 5 or more items, your infrastructure needs professional attention.

Server Services

All domain controllers are healthy, replication runs without errors
Fewer than 5 accounts have Domain Admin rights
Group policies are documented and reviewed regularly
DNS scavenging is enabled, stale records are removed automatically
RDS licensing matches the actual number of connections

Virtualization

The hypervisor version is current and supported by the vendor
RAM overcommit does not exceed 120% on any host
No snapshot in production has existed for longer than 72 hours
Server hardware health is monitored (disks, memory, temperature)
High Availability has been tested within the last 6 months

Network

The server segment is isolated from workstations via VLAN
Firewall rules have been reviewed within the last year
There are no any-to-any rules
VPN uses modern protocols (IKEv2, WireGuard)
A backup internet link is available

Security

Privileged accounts are separated from everyday accounts
Password policy requires a minimum of 12 characters
Servers have been updated within the last 30 days
Antivirus is centrally managed and covers 100% of endpoints
LAPS is configured for local admin accounts

Backup and DR

All critical systems have a current backup
Backups follow the 3-2-1 rule
Recovery from backup has been tested within the last 90 days
RPO and RTO are defined and meet business requirements
A documented DR plan exists with assigned responsibilities
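
If you want to track the results over time, the scoring rule described at the top of this checklist (five or more “no” answers) can be expressed in a few lines. This is a minimal sketch; the function name is hypothetical and the sample contains only a handful of the 25 items.

```python
def needs_professional_attention(answers, threshold=5):
    """Return (no_count, flag) for a mapping of checklist item -> True/False."""
    no_count = sum(1 for ok in answers.values() if not ok)
    return no_count, no_count >= threshold


# Hypothetical partial self-assessment (True = "yes", False = "no").
answers = {
    "Recovery from backup tested within the last 90 days": False,
    "Backups follow the 3-2-1 rule": False,
    "No any-to-any firewall rules": True,
    "LAPS configured for local admin accounts": False,
    "Servers updated within the last 30 days": True,
    "Backup internet link available": False,
    "High Availability tested within the last 6 months": False,
}
no_count, flagged = needs_professional_attention(answers)
print(f"'No' answers: {no_count}, needs attention: {flagged}")
```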

What to Do with Audit Results

An audit without action is just paper. We structure findings into three priority levels:

Critical risks — issues that can lead to data loss, business downtime, or security compromise right now. Fixed immediately: non-functional backups, accounts with excessive privileges and no oversight, missing patches on externally exposed services.

Medium risks — issues that create vulnerabilities or reduce manageability. Fixed within 30–60 days: flat network structure, lack of monitoring, undocumented GPOs.

Improvements — items that increase infrastructure maturity. Planned into the quarterly roadmap: implementing LAPS, migrating to modern VPN protocols, automating backup recovery testing.
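
The three levels can be turned into a simple remediation plan. The sketch below is only an illustration of the idea: it takes the outer bound of the 30–60 day window as the deadline for medium risks, treats critical items as due on the report date, and leaves improvements without a fixed date. The names and sample findings are hypothetical.

```python
from datetime import date, timedelta

# Outer deadlines in days; None means "planned into the quarterly roadmap".
DEADLINE_DAYS = {"critical": 0, "medium": 60, "improvement": None}
ORDER = ["critical", "medium", "improvement"]


def plan(findings, report_date):
    """Sort (title, priority) findings by priority and attach a due date where one applies."""
    ordered = sorted(findings, key=lambda f: ORDER.index(f[1]))
    result = []
    for title, priority in ordered:
        days = DEADLINE_DAYS[priority]
        due = report_date + timedelta(days=days) if days is not None else None
        result.append((title, priority, due))
    return result


# Hypothetical findings taken from the examples in the text.
findings = [
    ("Implement LAPS", "improvement"),
    ("Flat network structure", "medium"),
    ("Non-functional backups", "critical"),
]
for title, priority, due in plan(findings, date(2026, 2, 16)):
    print(f"{priority:<12} {title:<25} due: {due or 'quarterly roadmap'}")
```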

After prioritization, we build an operating model: who is responsible for what, which changes go through a change management process, and which metrics are tracked under the SLA.

How We Conduct Audits

Our process consists of four steps:

Step 1. Short call (15–20 minutes). We assess the scale of your infrastructure, critical services, and current pain points. We determine whether we are a good fit for each other.

Step 2. Access and data collection (1–2 days). We receive limited access to the environment. We collect inventory data, configurations, and service status using standardized scripts and tools. We do not make any changes.

Step 3. Analysis and report (2–3 days). We produce a report with a detailed description of each zone’s condition, identified risks, and prioritized recommendations. The report is written in plain language — you can share it with executive leadership.

Step 4. Operating model and SLA. Based on the audit, we propose an ongoing infrastructure management model: ownership boundaries, change management procedures, and SLA terms for incident response and resolution.

Conclusion

An IT infrastructure audit is not a one-off event — it is the starting point for transitioning from chaotic firefighting to a managed model. You get a complete picture of the current state, understand the risks, and have a concrete action plan.

If you recognize your situation in the scenarios described above, or answered “no” to several checklist items, let’s talk.

We start with a short 15–20 minute call, followed by an audit and an operating model.
