Title
|
Web pages classification: An effective approach based on text mining techniques
|
Research type
|
Presented paper
|
Keywords
|
classification, data mining, machine learning, natural language processing, text mining
|
Abstract
|
Some web pages on the Internet contain important content that remains useful for a long time or even indefinitely. Other web pages are valuable only for a short period. Automatically classifying these two types of web pages by their content is difficult, yet it is an important task for improving the performance of search engines and web page recommender engines. In this project, web pages were classified into two categories using machine learning algorithms. Natural language processing and text mining techniques were first used for text pre-processing. Appropriate information was then extracted from the texts, and finally the web pages were classified with machine learning algorithms. Compared to other approaches, this project focuses mainly on the text pre-processing stage, and new strategies were presented to fill this gap. The results indicate that the proposed approach performed better than the other approaches.
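
The three stages named in the abstract (text pre-processing, information extraction, classification) can be illustrated with a minimal sketch. The abstract does not name the paper's actual pre-processing strategies, features, or classifiers, so everything below is an assumption made for illustration: a hypothetical stop-word list, bag-of-words counts as the extracted information, a multinomial Naive Bayes classifier swapped in as a stand-in, and the hypothetical labels `long_lived` and `short_lived` on toy data.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's own pre-processing is not specified.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on", "this", "at"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, docs, labels):
        self.word_counts = {}                 # label -> Counter of token counts
        self.class_counts = Counter(labels)   # label -> number of documents
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = preprocess(doc)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = preprocess(doc)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data with hypothetical labels for the two categories.
train_docs = [
    "A complete guide to baking sourdough bread at home",
    "How to learn the basics of linear algebra",
    "Breaking news: election results announced tonight",
    "Weather forecast for this weekend: storm warning",
]
train_labels = ["long_lived", "long_lived", "short_lived", "short_lived"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict("Storm warning announced for tonight")
print(prediction)
```

On this toy data the new text shares its vocabulary only with the `short_lived` examples, so that label is chosen. A real system would need far more data and a more careful pre-processing stage, which is where the abstract says the project concentrates its effort.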
|
Researchers
|
میثم روستائی (second author), سید معین باباپور (first author)
|