With the rapid development of virtual and augmented reality systems, efficient calibration methods for optical see-through head-mounted displays (OST-HMDs) are becoming increasingly important. In this paper, a modular calibration framework with two calibration phases is proposed. In the first phase, an eye-involved equivalent camera model is introduced to compute the spatial position of the human eye directly. In the second phase, gesture information is integrated into the system using a depth camera. In addition, a fast correction algorithm is introduced to ensure that the calibration results work for new users without additional complex recalibration procedures. The precision of the proposed modular calibration and optimization method is evaluated, and the results show that the proposed method can simplify the recalibration procedures for OST-HMDs.