University of Mazandaran Research System | An effective approach to candidate retrieval for cross-language plagiarism detection: A fusion of conceptual and keyword-based schemes

Title	An effective approach to candidate retrieval for cross-language plagiarism detection: A fusion of conceptual and keyword-based schemes
Research type	Published article
Keywords	Plagiarism detection, Cross-language plagiarism, Candidate retrieval, Conceptual model, Keyword-based model
Abstract	The rapid growth of documents and manuscripts in many languages has made plagiarism detection a challenging task, especially in cross-lingual cases. For this reason, today's plagiarism detection systems begin with a candidate retrieval step that reduces the set of documents for comparison to a reasonable number. The performance of the second step, a detailed analysis of the candidates, depends heavily on the candidate retrieval phase. Given its importance, the present study focuses on the candidate retrieval task and aims to accurately extract a minimal set of the most likely source documents. The paper proposes a fusion of concept-based and keyword-based retrieval models for this purpose. The proposed scheme uses a dynamic interpolation factor to combine the results of the conceptual and bag-of-words models. The effectiveness of the proposed model for cross-language candidate retrieval is compared with state-of-the-art models on German-English and Spanish-English language partitions. The results show that the proposed candidate retrieval model outperforms the state-of-the-art models and is a suitable choice for embedding in cross-language plagiarism detection systems.
Researchers	Seyed Mostafa Fakhrahmad (third author), Mohammad Hadi Sadreddini (second author), Meysam Roostaee (first author)

Research details