Conventional grid-tomographic inversion uses the travel times of reflected waves and the residual curvature of common-image-point gathers to construct a system of equations and update the velocities. In areas with complex surface conditions, however, the low signal-to-noise ratio of single shots and the lack of reflection information limit the accuracy of velocity modelling in the near surface, and without horizon and fault constraints the actual structure of the deeper layers cannot be recovered accurately. The approach widely used in industry is to merge the shallow velocities obtained from first-break tomographic inversion with the mid-to-deep velocities obtained from grid tomography, taking as the fusion surface the surface elevation shifted downward by a fixed depth. This approach leaves obvious fusion artifacts in the merged zone, and the velocity changes abruptly across the fusion surface. This paper therefore starts from the principle of first-break tomographic inversion, picks the fusion surface from the first-arrival ray density and the difference between the two models, and constructs a velocity fusion function suited to the geological setting of the work area, so as to suppress the fusion artifacts and the abrupt velocity changes across the fusion surface as far as possible. Fault constraints are then added in the subsequent tomographic inversion. The method combines velocity fusion, fault constraints, and an improved residual-curvature picking method to improve the accuracy of velocity modelling and imaging in complex mountainous regions. Tests on a complex synthetic model verify the correctness of the method, and tests on field data demonstrate its applicability.
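The fusion step described above can be illustrated with a minimal sketch. It is not the paper's fusion function, which the abstract does not specify. It assumes both models are sampled on the same regular (depth, distance) grid, picks the fusion surface per column as the well-covered depth where the two models agree best, and blends them with a raised-cosine taper. The names `pick_fusion_surface`, `fuse_velocities`, `min_density`, and `half_width` are hypothetical.

```python
import numpy as np


def pick_fusion_surface(v_shallow, v_deep, ray_density, min_density):
    """Pick one fusion depth index per column (hypothetical criterion).

    Among samples whose first-arrival ray density is at least `min_density`,
    choose the depth where the two models differ least. Columns with no
    covered sample fall back to index 0.
    """
    mismatch = np.abs(v_shallow - v_deep)
    # Exclude poorly covered samples from the search.
    mismatch = np.where(ray_density >= min_density, mismatch, np.inf)
    return np.argmin(mismatch, axis=0)  # shape (nx,)


def fuse_velocities(v_shallow, v_deep, fusion_idx, half_width):
    """Blend the two models across the fusion surface (assumed taper).

    The weight of the shallow model is 1 at `half_width` samples or more
    above the surface, 0.5 on the surface, and 0 at `half_width` samples
    or more below it, with zero slope at both ends of the transition zone.
    """
    nz = v_shallow.shape[0]
    z = np.arange(nz)[:, None]
    # Signed distance to the fusion surface, scaled and clipped to [-1, 1].
    s = np.clip((z - fusion_idx[None, :]) / half_width, -1.0, 1.0)
    w = 0.5 * (1.0 - np.sin(0.5 * np.pi * s))
    return w * v_shallow + (1.0 - w) * v_deep
```

Both functions take arrays of shape `(nz, nx)`, and `half_width` must be positive. A wider transition zone gives a smoother merge at the cost of mixing the two models over a larger depth range. A taper matched to the geological setting of the work area, as the paper proposes, would replace the raised cosine used here.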