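Abstract: Relative clauses have received wide attention in theoretical linguistics, psycholinguistics, computational linguistics, and language acquisition and teaching. However, extracting and annotating them by hand is time-consuming and error-prone, which has severely limited the scale of such research. The recently released program AutoSubClause uses a dependency parser to automatically extract and annotate several types of English clauses and could help solve this problem, but how well it actually performs is still unknown. Drawing on different types of corpus texts, including written and spoken political and literary texts produced by native English speakers, learners, and translators, this paper systematically evaluates how accurately the program extracts English relative clauses. It also examines how well the program annotates various properties of relative clauses, including accessibility, animacy, and restrictiveness. The results show that the program achieves high recall and precision when extracting relative clauses from native-speaker and translated texts, but its extraction from learner texts still needs improvement. In annotating relative-clause properties, the program shows excellent overall precision in identifying noun animacy, clause restrictiveness, and the grammatical role the head noun plays within the clause. The paper analyzes the problems and shortcomings of the program and proposes suggestions for improvement.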
Key words: AutoSubClause; relative clauses; automatic extraction and annotation; multitype corpus texts
Abstract: Relative clauses have attracted considerable attention in research areas such as theoretical linguistics, psycholinguistics, computational linguistics, and language acquisition and teaching. However, the scale of previous studies has been limited because extracting and annotating relative clauses manually is time-consuming and error-prone. To address this issue, a recently developed computer program, AutoSubClause, uses dependency parsing to automatically extract and annotate different types of English subordinate clauses, but its performance has not yet been evaluated. In this study, we evaluate how accurately the program extracts relative clauses from different types of texts: written and spoken political and literary texts produced by native speakers, learners, and translators. We also assess the reliability of its annotation of linguistic features such as accessibility, animacy, and restrictiveness. The results reveal high overall recall and precision in extracting relative clauses from native and translated texts, but the program's precision on learner texts needs to be improved. In addition, the program achieves high precision in automatically annotating linguistic features such as animacy, restrictiveness, and the roles that head nouns play in relative clauses. Limitations of the program are discussed, and suggestions for improvement are provided.
Key words: AutoSubClause; relative clauses; automatic extraction and annotation; multitype texts
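As a rough illustration of the recall and precision measures used to evaluate extraction, the Python sketch below scores a set of automatically extracted relative clauses against a manually annotated gold standard. It is a minimal sketch under stated assumptions, not the paper's evaluation procedure. It assumes, hypothetically, that each clause is identified by a (sentence_id, start_token, end_token) span and that a clause counts as correctly extracted only on an exact span match. The study's actual matching criteria may differ.

```python
from typing import Set, Tuple

# Hypothetical clause identifier (an assumption, not from the paper):
# (sentence id, start token index, end token index)
Span = Tuple[int, int, int]


def precision_recall_f1(extracted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Score extracted clause spans against gold spans by exact-match counting."""
    # Clauses the program found that also appear in the gold standard
    true_pos = len(extracted & gold)
    # Precision: share of extracted clauses that are real relative clauses
    precision = true_pos / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard relative clauses that the program found
    recall = true_pos / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall (0.0 when both are zero)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: three spans match, one is missed, one is spurious
    gold = {(1, 3, 7), (2, 5, 9), (4, 0, 4), (5, 2, 6)}
    extracted = {(1, 3, 7), (2, 5, 9), (4, 0, 4), (6, 1, 3)}
    p, r, f = precision_recall_f1(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.75 recall=0.75 f1=0.75
```

Exact-span matching is the strictest choice. A looser criterion, such as matching on the head noun or the relativizer position alone, would typically yield higher scores for the same output.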