A block diagram of the training stage of the average voice model using the proposed technique is shown in Fig. 2. First, context-dependent models without context clustering are trained separately for the respective speakers; these speaker-dependent models are used to derive a decision tree for context clustering that is common to all of them. Then, the decision tree, which we refer to as a shared decision tree, is constructed from the speaker-dependent models using the algorithm described in Sect. 3.3. Next, all speaker-dependent models are clustered using the shared decision tree. A Gaussian pdf of the average voice model is obtained by combining all speakers’ Gaussian pdfs at every node of the tree. After the parameters of the average voice model are reestimated using the training data of all speakers, state duration distributions are obtained for each speaker. Finally, the state duration distributions of the average voice model are obtained by applying the same procedure.
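To make the step of combining all speakers' Gaussian pdfs at a tree node concrete, the following is a minimal sketch. The text above does not give the combination rule, so the sketch assumes occupancy-weighted moment matching with diagonal covariances. The names `DiagGaussian`, `occupancy`, and `combine_gaussians` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGaussian:
    """One speaker's Gaussian pdf at a tree node (diagonal covariance)."""
    mean: List[float]
    var: List[float]
    occupancy: float  # assumed weight, e.g. a state occupancy count


def combine_gaussians(pdfs: Sequence[DiagGaussian]) -> DiagGaussian:
    """Merge per-speaker Gaussians into one by matching first and second moments.

    Assumption: each speaker's pdf is weighted by its occupancy. This is a
    common way to pool Gaussians, not necessarily the rule used in the paper.
    """
    if not pdfs:
        raise ValueError("at least one Gaussian is required")
    total = sum(p.occupancy for p in pdfs)
    if total <= 0.0:
        raise ValueError("total occupancy must be positive")

    dim = len(pdfs[0].mean)
    # Weighted average of the means.
    mean = [sum(p.occupancy * p.mean[d] for p in pdfs) / total
            for d in range(dim)]
    # Weighted average of the second moments E[x^2] = var + mean^2.
    second = [sum(p.occupancy * (p.var[d] + p.mean[d] ** 2) for p in pdfs) / total
              for d in range(dim)]
    # Variance of the pooled pdf: E[x^2] - (E[x])^2.
    var = [second[d] - mean[d] ** 2 for d in range(dim)]
    return DiagGaussian(mean=mean, var=var, occupancy=total)


if __name__ == "__main__":
    speaker_a = DiagGaussian(mean=[0.0], var=[1.0], occupancy=3.0)
    speaker_b = DiagGaussian(mean=[4.0], var=[1.0], occupancy=1.0)
    node_pdf = combine_gaussians([speaker_a, speaker_b])
    print(node_pdf.mean, node_pdf.var, node_pdf.occupancy)
```

Under this assumption the pooled variance includes the spread between the speakers' means, so a node whose speakers differ strongly yields a broader average voice pdf than any single speaker's pdf.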