Process assessment
As consolidation reduces the number of servers, the criticality of each remaining server increases, and every failure can have a higher impact on the environment. It is essential to ensure that a consistent set of management and control processes is in place, to provide a sufficient level of quality and to avoid future fragmentation.
The process assessment stage is intended to examine the current situation and to outline any necessary improvements. It is therefore important to:
- identify the person responsible for each process
- determine whether each process is adequately documented and described
- check whether the results of each process are available for further analysis, properly stored, and easily accessible
- analyze the existing tools and how they are used; identify whether there is room for improvement (e.g., by using additional or alternative tools)
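The checks listed above can be recorded per process so that gaps are easy to spot. The following is a minimal sketch in Python, not a prescribed tool: the `ProcessRecord` class, its field names, and the `assessment_gaps` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessRecord:
    """One assessed process. All field names are illustrative assumptions."""
    name: str
    owner: Optional[str] = None        # person responsible, if identified
    documented: bool = False           # adequately documented and described
    results_accessible: bool = False   # results stored and easily accessible
    tools_reviewed: bool = False       # existing tools and their use analyzed


def assessment_gaps(record: ProcessRecord) -> List[str]:
    """Return the checks that the given process does not yet satisfy."""
    gaps = []
    if not record.owner:
        gaps.append("no owner identified")
    if not record.documented:
        gaps.append("not adequately documented")
    if not record.results_accessible:
        gaps.append("results not accessible")
    if not record.tools_reviewed:
        gaps.append("tools not reviewed")
    return gaps


if __name__ == "__main__":
    # Hypothetical sample entries, not taken from any real assessment.
    records = [
        ProcessRecord("test and release", owner="release manager",
                      documented=True, results_accessible=True,
                      tools_reviewed=True),
        ProcessRecord("disaster recovery", owner=None, documented=True),
    ]
    for record in records:
        gaps = assessment_gaps(record)
        print(record.name, "->", gaps if gaps else "no gaps")
```

A record with an empty gap list has passed every check in the list; anything else names the improvement still to be outlined.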
Let's look at some typical processes.
Test and release processes
The stability of a production server depends directly on the process used to test and apply changes to it. In a consolidated environment, each server usually hosts more than one service, so it is especially important to apply changes in a strictly controlled way.
The following tests must be performed before any change is committed:
- product test
- system test
- integration test
- stress test
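The rule that all four tests must pass before a change is committed can be expressed as a simple gate. This is only a sketch under stated assumptions: the test names are taken from the list above, while the `REQUIRED_TESTS` constant, the `missing_or_failed` and `may_commit` functions, and the dictionary format for results are hypothetical.

```python
from typing import Dict, List

# The four test stages named in the text, in the order listed.
REQUIRED_TESTS = ("product", "system", "integration", "stress")


def missing_or_failed(results: Dict[str, bool]) -> List[str]:
    """Return required tests that are absent from results or did not pass."""
    return [test for test in REQUIRED_TESTS if not results.get(test, False)]


def may_commit(results: Dict[str, bool]) -> bool:
    """A change may be committed only when every required test has passed."""
    return not missing_or_failed(results)


if __name__ == "__main__":
    # Hypothetical results for one change: the stress test was never run.
    results = {"product": True, "system": True, "integration": True}
    if may_commit(results):
        print("change may be committed")
    else:
        print("blocked by:", ", ".join(missing_or_failed(results)))
```

Treating a test that was never run the same as a failed one is a deliberate choice here: on a server that hosts several services, a skipped test is as risky as a failed one.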
System management
There are a number of critical activities that need to be carried out to ensure the availability of a production machine. In a consolidated environment, servers must be monitored, and events managed properly as soon as they occur. It is important to have at least the following activities planned:
- system monitoring
- event management
- failure diagnosis and repair
Disaster recovery
Major events (disasters) are rare, but they can occur and compromise part or even all of the environment. The time between the disaster and the recovery of critical functions must be as short as possible. It is important to ensure:
- 'out-of-band' communication facilities
- backup of critical data
- reliability of recovery procedures
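One way to check the reliability of recovery procedures is to time a recovery drill and compare the result with an agreed target. The sketch below assumes such a target exists; the text only says the time must be as short as possible, so the target value, the function names, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def recovery_duration(disaster_at: datetime, restored_at: datetime) -> timedelta:
    """Time between the disaster and the recovery of critical functions."""
    if restored_at < disaster_at:
        raise ValueError("restored_at must not be earlier than disaster_at")
    return restored_at - disaster_at


def within_target(disaster_at: datetime, restored_at: datetime,
                  target: timedelta) -> bool:
    """True when the measured recovery time does not exceed the target."""
    return recovery_duration(disaster_at, restored_at) <= target


if __name__ == "__main__":
    # Hypothetical drill timestamps and an assumed four-hour target.
    disaster_at = datetime(2024, 1, 1, 10, 0)
    restored_at = datetime(2024, 1, 1, 13, 30)
    target = timedelta(hours=4)

    print("recovery took", recovery_duration(disaster_at, restored_at))
    print("within target:", within_target(disaster_at, restored_at, target))
```

Repeating the drill and recording the durations over time gives a rough indication of whether the recovery procedures stay reliable as the environment changes.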