Abstract
This study randomly divides the large-scale written British National Corpus (BNC) and American National Corpus (ANC) into an experimental set and a test set, the former containing 60 samples and the latter 41 samples, for a total of 83,864 pairs of texts. Computer programs are used to carry out a dynamic analysis of the English vocabulary repeat rate. A mathematical model for calculating the vocabulary repeat rate is established and then tested based on the 60 samples in the experimental set. Results show that the distribution curves of vocabulary repeat rates are nonlinear and regular, with only a few outliers; the inferred formula yields a very small margin of error when calculating the theoretical repeat rate and can be used to estimate theoretical vocabulary repeat rates for authentic English texts of different lengths.