02 Mar
FMEA (Failure Mode Effect Analysis) in HVAC systems
In this post, we will be discussing a very important aspect of the development of risk analysis in pharmaceutical HVAC systems, which is FMEA (Failure Mode Effect Analysis).
According to ICH guideline Q9 on quality risk management: Quality risk management is a systematic process for the assessment, control, communication and review of risks to the quality of the drug (medicinal) product across the product lifecycle.
FMEA is one of the tools the guideline proposes to achieve this goal.
2. What is FMEA?
According to the above-mentioned ICH Q9:
FMEA provides for an evaluation of potential failure modes for processes and their likely effect on outcomes and/or product performance. Once failure modes are established, risk reduction can be used to eliminate, contain, reduce or control the potential failures. FMEA relies on product and process understanding. FMEA methodically breaks down the analysis of complex processes into manageable steps. It is a powerful tool for summarizing the important modes of failure, factors causing these failures and the likely effects of these failures.
3. Developing an FMEA (Failure Mode Effect Analysis) for HVAC systems
In practice, we will perform three types of analysis: probability, severity, and probability of detection, to which we will assign a score. In my case, I use a range of 1 to 5, with 1 being the lowest probability/severity/detection, and 5 being the highest.
Let’s see an example of each of these analyses, analyzing the risk of failure of an air handling unit’s supply fan.
3.1. Probability of failure (P)
Here we assess how likely the failure is to occur:
1 – Very low probability of failure, which could happen less than once a year
2 – Low probability. The failure can occur a maximum of 3 times a year
3 – Moderate probability. The failure can occur once a month
4 – High probability. The failure is likely to occur weekly
5 – Very high probability. The failure can occur daily.
In our example, although the failure of a fan is possible, it normally occurs less than once a year, so we will assign a score of 1. To reach this conclusion, we first had to analyze the type of technology used. That is, we assume that the fan is a plug fan or EC type, with a lower probability of failure than, for example, a belt-driven fan. Belt-driven fans are already outdated due to their low efficiency, but if we had them, we would have to increase the probability score to, say, 2-3.
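As a rough illustration, the probability scale above could be turned into a small lookup. This is only a sketch: the function name `probability_score`, the choice of "failures per year" as the input, and the exact boundaries between levels (12 per year for monthly, 52 per year for weekly) are assumptions made here, not part of ICH Q9. The value of 0.2 failures per year for the direct-drive fan is also just an assumed figure for the example.

```python
def probability_score(failures_per_year: float) -> int:
    """Map an estimated failure frequency to the 1-5 probability scale.

    The cut-off values are one possible reading of the scale in the text.
    """
    if failures_per_year < 0:
        raise ValueError("failures_per_year must be non-negative")
    if failures_per_year < 1:
        return 1  # very low: less than once a year
    if failures_per_year <= 3:
        return 2  # low: at most 3 times a year
    if failures_per_year <= 12:
        return 3  # moderate: up to about once a month
    if failures_per_year <= 52:
        return 4  # high: up to about once a week
    return 5  # very high: approaching daily


# Assumed figure: a plug fan or EC fan failing roughly once every five years.
print(probability_score(0.2))  # 1
```

In practice the frequency estimate would come from maintenance records or supplier data, which is exactly where the choice of fan technology enters the analysis.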
3.2. Severity of failure (S)
In this analysis, we will consider both the impact on GMP and other aspects that affect the business or organization.
1 – Very low severity, with no influence on product quality and no economic impact.
2 – Low severity, can cause a technical or organizational problem that is relatively easy to resolve.
3 – Moderate severity, triggers an alarm, although within control parameters, or an economic impact within reason.
4 – High severity, involving a critical out-of-tolerance condition, economic impact on a piece of equipment or a part of it.
5 – Very high severity. The consequence would be a rejected product or being placed in quarantine. Serious economic or organizational impact. Damage to the organization’s reputation.
In the example of the fan failure, we can easily assign a score of 5: we would lose the pressure gradient in the room and air would no longer be recirculated through the HEPA filters, so the room would also lose its ISO 14644 classification.
3.3. Detection probability (D)
The next parameter we will evaluate is the detection probability, that is, how quickly or easily a failure will be detected once it occurs. We can assign a score based on:
1 – Very high detection probability. This is possible because the failure result is continuously monitored (for example, through BMS monitoring) or is under continuous supervision.
2 – High detection probability. The failure is detected relatively quickly once it occurs.
3 – Moderate detection probability. The failure is detected, but through periodic sampling, for example. Additionally, the sampling frequency is not very high.
4 – Low detection probability. The failure is detected only through sampling, which is done at a low frequency.
5 – Very low detection probability. The failure is very difficult to detect, and is found only if it is expressly sought or through on-demand analysis.
The score we give to the detection probability of a supply fan failure will depend on several factors. One of them is whether the flow rate or supply pressure is monitored with BMS, generating an alarm. If the differential pressure of the rooms is also monitored, we will also obtain an alarm, and therefore the detection will be immediate, and a score of 1 can be assigned in this case. If we do not have a monitoring system, the score will be higher as a result.
4. Risk priority ranking calculation
Once we have a value for P, S, and D for each risk, we will proceed to multiply these factors, obtaining a value for the Risk Priority Ranking (RPR):
RPR = P x S x D
Depending on the value obtained, we can classify the failure risk into three categories:
- Low risk: RPR ≤ 12
- Moderate risk: 12 < RPR ≤ 20
- High risk: RPR > 20
These values are only a proposal and can be adapted according to your experience and to your particular process.
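The calculation and the three categories can be sketched in a few lines. The function names `rpr` and `classify` and the constant names are chosen here for illustration; the thresholds are the ones proposed above and should be adapted in the same way. The belt-driven case uses P = 3, the upper end of the 2-3 range mentioned earlier, and assumes the same monitoring (D = 1).

```python
LOW_MAX = 12       # RPR <= 12 -> low risk
MODERATE_MAX = 20  # 12 < RPR <= 20 -> moderate risk, above that -> high risk


def rpr(p: int, s: int, d: int) -> int:
    """Risk Priority Ranking: product of the three 1-5 scores."""
    for name, value in (("P", p), ("S", s), ("D", d)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return p * s * d


def classify(score: int) -> str:
    """Return the risk category for an RPR value."""
    if score <= LOW_MAX:
        return "low"
    if score <= MODERATE_MAX:
        return "moderate"
    return "high"


# Supply fan example from the text: P = 1, S = 5, D = 1 (BMS-monitored).
fan = rpr(p=1, s=5, d=1)
print(fan, classify(fan))  # 5 low

# Same failure with a belt-driven fan, taking P = 3.
belt = rpr(p=3, s=5, d=1)
print(belt, classify(belt))  # 15 moderate
```

With scores from 1 to 5, the RPR ranges from 1 to 125, so even a failure with maximum severity can remain in the low category when it is both unlikely and immediately detected.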
One aspect I would like to emphasize: the score is obtained taking into account the measures already in place to mitigate the effect of the failure. If the score is still too high, the analysis gives us the basis for deciding on additional measures to reduce the impact of the failure.
With all this information, we can prepare a table that includes the following:
- References to the analyzed system
- Failure number for the system
- Description of the potential failure
- Impact of the failure
- Type of risk (GMP, Business, etc.)
- Corrective actions to decrease the probability
- Corrective actions to increase detection
- The score assigned to Probability, Severity, and Detection
- RPR score (the product P x S x D), color-coded: green (low risk), orange (moderate risk), or red (high risk)
- Qualification documentation reference. Normally, the corrective actions need to be qualified. We should record the references to such qualification activities.
Thus, the result would be something like this:
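As a minimal sketch, one row of such a table for the supply fan example could be represented as a small record. The class name `FmeaRow`, its field names, the system reference `AHU-01`, and the placeholder qualification reference are all hypothetical. The scores and corrective actions are those discussed in the fan example, and the color thresholds are the ones proposed in section 4.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    system_ref: str           # reference to the analyzed system
    failure_no: int           # failure number within the system
    description: str          # description of the potential failure
    impact: str               # impact of the failure
    risk_type: str            # GMP, Business, etc.
    actions_probability: str  # corrective actions to decrease the probability
    actions_detection: str    # corrective actions to increase detection
    p: int                    # probability score (1-5)
    s: int                    # severity score (1-5)
    d: int                    # detection score (1-5)
    qualification_ref: str    # qualification documentation reference

    @property
    def rpr(self) -> int:
        return self.p * self.s * self.d

    @property
    def color(self) -> str:
        if self.rpr <= 12:
            return "green"
        if self.rpr <= 20:
            return "orange"
        return "red"


row = FmeaRow(
    system_ref="AHU-01",  # hypothetical system tag
    failure_no=1,
    description="Supply fan failure",
    impact="Loss of room pressure gradient and of recirculation through HEPA filters",
    risk_type="GMP",
    actions_probability="Use a plug fan or EC fan instead of a belt-driven fan",
    actions_detection="BMS alarm on supply flow/pressure and room differential pressure",
    p=1,
    s=5,
    d=1,
    qualification_ref="TBD",  # placeholder, to be filled in per project
)

print(f"{row.failure_no} | {row.description} | "
      f"P={row.p} S={row.s} D={row.d} | RPR={row.rpr} ({row.color})")
# 1 | Supply fan failure | P=1 S=5 D=1 | RPR=5 (green)
```

Deriving the RPR and color from the three scores, rather than storing them, keeps the table consistent when a score is revised after a corrective action.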
5. Typical FMEA (Failure Mode Effect Analysis) in HVAC Systems
Finally, here are some possible failure modes in a pharmaceutical HVAC installation that deserve to be studied in a risk assessment. We will not address here issues directly related to the HVAC/Cleanroom layout interface, such as the definition of pressure cascades, PAL/MAL definition, personnel/material flows, etc., which are part of a GMP compliance risk analysis. We focus only on HVAC installations and associated elements. The list can be extended.
- Fan failure
- Louvers blockage due to snow in exterior air intakes
- Flooding in condensate trays
- Condensate carryover
- Air aspiration/loss in drain traps
- Leakage in cold water, hot water, steam valves, etc.
- Freezing of coils in low-temperature applications
- Condensation in pipes/ducts
- Obsolescence of components (control elements, fans, etc.)
- Fan vibrations
- Clogging of pre and intermediate filters
- Clogging of HEPA filters
- Leakage in HEPA filters
- Air/energy loss in ducts
- Air/sensible/latent energy infiltration in ducts
- Failure to maintain the required air changes
- Temperature out of limits
- Relative humidity out of limits
- Variations in differential pressures
- Risk of cross-contamination between suites. AHU segregation
- Risk of cross-contamination in biological facilities (use of HEPA filters in return air?)
- And so forth…