Subject-based retrieval of scientific documents, case study: Retrieval of Information Technology scientific articles

Volume: 66
Issue: 6 and 7
Pages (from - to): 549-569
Abstract

Purpose – This paper introduces an approach for retrieving the set of scientific articles in the field of Information Technology (IT) from a scientific database such as Web of Science (WoS), so that scientometric indices can be applied to them and compared with those of other fields.
Design/methodology/approach – The authors propose a statistical, classification-based approach for extracting IT-related articles. First, a probabilistic model of the subject IT is built using keyphrase extraction techniques. Then, IT-related articles are retrieved from all Iranian papers in WoS using a Bayesian classification scheme: based on the probabilistic IT model, each article in the database is assigned an IT membership probability, and the articles with the highest probabilities are retrieved.
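The retrieval step described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the probabilistic model, so the code below assumes a naive Bayes formulation over keyphrases. The function names, the per-keyphrase probabilities, the prior and the toy articles are all hypothetical and are not taken from the paper.

```python
import math


def it_membership_probability(text, p_it, p_other, prior_it=0.5, floor=1e-6):
    """Posterior P(IT | article), assuming keyphrases are conditionally independent.

    p_it[k]    : assumed probability of keyphrase k in IT articles
    p_other[k] : assumed probability of keyphrase k in non-IT articles
    """
    text = text.lower()
    # Start from the prior log-odds, then add one log-likelihood ratio
    # for every known keyphrase that occurs in the article text.
    log_odds = math.log(prior_it / (1.0 - prior_it))
    for phrase in set(p_it) | set(p_other):
        if phrase in text:
            log_odds += math.log(p_it.get(phrase, floor) / p_other.get(phrase, floor))
    return 1.0 / (1.0 + math.exp(-log_odds))


def retrieve_top(articles, p_it, p_other, k=2):
    """Score every article and return the k with the highest membership probability."""
    scored = [(it_membership_probability(a, p_it, p_other), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy model: the values are illustrative only.
P_IT = {"information technology": 0.30, "data mining": 0.20, "soil erosion": 0.001}
P_OTHER = {"information technology": 0.01, "data mining": 0.02, "soil erosion": 0.10}

ARTICLES = [
    "Data mining methods for information technology adoption",
    "Soil erosion under changing rainfall patterns",
    "A survey of medieval poetry",
]

ranked = retrieve_top(ARTICLES, P_IT, P_OTHER, k=2)
for probability, title in ranked:
    print(f"{probability:.3f}  {title}")
```

In this sketch an article that contains no known keyphrase keeps the prior probability, so in practice a cut-off on the score or on the number of retrieved articles would have to be chosen separately.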
Findings – Through the keyphrase extraction process, the authors extracted a set of IT keyphrases containing 1,497 terms for the probabilistic model. They evaluated the proposed retrieval approach against two alternatives: a query-based approach, in which articles are retrieved from WoS using a set of queries composed of a limited number of IT keywords, and a research area-based approach, in which articles are retrieved using WoS categories and research areas. The evaluation and comparison results show that the proposed approach generates more accurate results while retrieving more IT-related articles.
Research limitations/implications – Although this research is limited to the IT subject, the approach can be generalized to any subject. However, for multidisciplinary topics such as IT, special attention should be given to the keyphrase extraction phase. This research uses a bigram model; it can be extended to trigrams as well.
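To make the bigram versus trigram distinction concrete, the sketch below collects candidate n-grams from a small set of texts. It is only plain n-gram counting with a frequency threshold, not the authors' keyphrase extraction technique, which the abstract does not specify. The function name, the `min_count` threshold and the sample texts are assumptions introduced for illustration.

```python
import re
from collections import Counter


def candidate_keyphrases(texts, n=2, min_count=2):
    """Count word n-grams across texts and keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer).
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


# Hypothetical sample texts.
DOCS = [
    "Cloud computing services for small firms",
    "Security of cloud computing services",
]

bigrams = candidate_keyphrases(DOCS, n=2)   # the bigram setting
trigrams = candidate_keyphrases(DOCS, n=3)  # the trigram extension
print(bigrams)
print(trigrams)
```

Moving from `n=2` to `n=3` only changes the window length here; a real extension would also need to decide how overlapping bigram and trigram candidates are reconciled.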
Originality/value – This paper introduces an integrated approach for retrieving IT-related documents from a collection of scientific documents. The approach has two main phases: building a model that represents the topic IT, and retrieving documents based on that model. The model is based on a set of keyphrases extracted from a collection of IT articles. The extraction technique does not rely on Term Frequency-Inverse Document Frequency (TF-IDF), since almost all of the articles in the collection share the same set of keyphrases. In addition, a probabilistic membership score is defined to retrieve the IT articles from a collection of scientific articles.
 

Citation:

Mohebi, Azadeh, Mehri Sedighi, and Zahra Zargaran. 2017. Subject-based retrieval of scientific documents, case study: Retrieval of Information Technology scientific articles. Library Review 66 (6/7): 549-569.

Scientific periodical article
