next up previous
Next: Calculation of Up: Reliability and Availability Analysis Previous: Calculation of

Markov-Chain Model for Reliability Evaluation

Figure 11: Markov state diagram for Protocol 1.
\begin{figure}\centering \centerline{\hbox{\epsfig{figure=figures/markovpb.eps, width=3.5in}}} \end{figure}

Figure 11 shows the Markov state diagram for Protocol 1, which can also be applied to the other protocols. In this diagram, $i:mPnB$ signifies that the state number/index is $i$, and there are $m$ and $n$ failed nodes in the primary and backup group, respectively. All the states shown are working states, with the exception of $DL$, which is the data loss state. The total number of states in the Markov state diagram is denoted by $S$ and is equal to $(N+1)(N+2)/2$. The Markov chain begins with State 1 ($1:0P0B$), followed by State 2($2:1P0B$), and so on.

To facilitate the solution to this model, we derive a function, given in Eqn. 2, that maps from the system state with $m$ failed primary nodes and $n$ failed backup nodes to the state index $i$ of the Markov state diagram:


\begin{displaymath}
i = \frac{1}{2}(m+n)(m+n+1) + (n+1)
\end{displaymath} (2)

Similarly, the inverse mapping function is given in Eqn. 3 and 4.

$\displaystyle n$ $\textstyle =$ $\displaystyle i -1 - \frac{x(x+1)}{2}$ (3)
$\displaystyle m$ $\textstyle =$ $\displaystyle x - n$ (4)

where $x = \left\lceil\frac{\sqrt{8i+1} - 3}{2}\right\rceil $.

Figure 12: The transition probability between different states.
\begin{figure}\centering \centerline{\hbox{\epsfig{figure=figures/transitionNew.eps, width=3in}}} \end{figure}

Figure 12 shows the transition rate between the neighboring states. In the diagram, $P_{ij}$ denotes the probability that the system remains functional, also referred to as safety probability, given that one more primary node fails while $m$ primary nodes and $n$ backup nodes have already failed. Similarly, $P_{ik}$ denotes the probability, or safety probability, of the system remaining functional when one more backup node fails while $m$ primary nodes and $n$ backup nodes have already failed. $P_{ij}$ can be calculated as Eqn. 5.

$\displaystyle P_{ij}$ $\textstyle =$ $\displaystyle \hat{P}((m+1)PnB\,\vert\,mPnB)$  
  $\textstyle =$ $\displaystyle \frac{\hat{P}((m+1)PnB \cap mPnB)}{P(mPnB)}$  
  $\textstyle =$ $\displaystyle \frac{\hat{P}((m+1)PnB)}{P(mPnB)}$ (5)

where $\hat{P}$ is the safety probability when $m$ nodes in the primary group and $n$ nodes in the backup group fail simultaneously. Eqn. 6 gives the calculation of $\hat{P}$.
\begin{displaymath}
\hat{P}(mPnB) = \left\{ \begin{array}{rl}
\frac{{N\choose m...
... $m+n \leq N$;} \\
0 & \mbox{otherwise.}
\end{array}\right.
\end{displaymath} (6)

Similarly, we have

\begin{displaymath}
P_{ik}=\frac{\hat{P}(mP(n+1)B)}{\hat{P}(mPnB)}
\end{displaymath} (7)

The transition probability from State $i$ to the data loss state, denoted as $q_{i, DL}$, can be calculated as Eqn. 8.

$\displaystyle q_{i,DL}$ $\textstyle =$ $\displaystyle \mbox{loss caused by one more primary node failure}$  
    $\displaystyle + \; \mbox{loss caused by one more backup node failure}$  
    $\displaystyle + \; \mbox{loss caused by network switch failure }$  
  $\textstyle =$ $\displaystyle (N-m) \lambda [(1-P_{ij}) + P_{ij}(1- P_{c})]$  
    $\displaystyle +(N-n)\lambda(1-P_{ik}) + \lambda_s$  
  $\textstyle =$ $\displaystyle (N-m)\lambda (1-P_{c}P_{ij})$  
    $\displaystyle +(N-n)\lambda(1-P_{ik}) + \lambda_s$ (8)

The stochastic transitional probability matrix is defined as $Q = \left[q_{ij}\right]$, where $1\leq i, j\leq S$ and $q_{ij}$ is the transition probability from State $i:m_iPn_iB$ to State $j:m_jPn_jB$. In summary, $q_{ij}$ can be calculated as follows.

If $i < j$, then

\begin{displaymath}
q_{ij} = \left\{ \begin{array}{rl}
(N-m_i)\lambda P_{ij}P_{...
..._j = n_i + 1$;} \\
0 & \mbox{otherwise.}
\end{array}\right.
\end{displaymath} (9)

If $i > j$, then

\begin{displaymath}
q_{ij} = \left\{ \begin{array}{rl}
m_j\mu & \mbox{if $m_j =...
... = n_i + 1$\ ;} \\
0 & \mbox{otherwise.}
\end{array}\right.
\end{displaymath} (10)

If $i = j$, then

\begin{displaymath}
q_{ii} = 1 - \sum_{i=1, j \not= i}^{j \leq S}q_{ij}-q_{i,DL}
\end{displaymath} (11)

Figure 13: Classic Markov state diagram of RAID-1.
\begin{figure}\centering \centerline{\hbox{\epsfig{figure=figures/markovs.eps, width=3in}}} \end{figure}

If $P_c = 1$, i.e., the primary node and backup node are always kept consistent, like in RAID-1, and a fault-free network is assumed, the model shown in Figure 11 can be simplified to the classic RAID-1 model [28], as shown in Figure 13. This is proven by the fact that numerical results generated by both models with the same set of input parameters are identical.


next up previous
Next: Calculation of Up: Reliability and Availability Analysis Previous: Calculation of
Yifeng Zhu 2003-10-16