Polish Keyword Extractor

A dedicated keyword extraction method for the Polish language.

Project information

Polish Keyword Extractor (PKE) is a keyword extraction method dedicated to the Polish language and inspired by the RAKE and KEA algorithms. It includes a Polish lemmatizer, Part-of-Speech (POS) filters, two candidate selection methods, and two candidate evaluation methods: a statistical function or a naive Bayes classifier.

PKE starts by splitting the text into sentences, each represented as a sequence of words. The words are then normalized (lemmatized) and tagged with their Part-of-Speech. Next, candidate selection finds a finite set of potentially significant phrases. Finally, the candidates are scored by the chosen evaluation model (classifier or statistical function), and a limited number of candidates with the highest scores are returned.
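The pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not PKE's actual code. The Polish lemmatizer and POS tagger are replaced by a hypothetical toy lexicon (`TOY_LEXICON`). The POS filter that keeps only nouns and adjectives (`ALLOWED_POS`) is assumed. Candidates are selected as maximal runs of words that pass the filter. Scoring uses the RAKE degree/frequency statistic as one example of a "statistical function". The naive Bayes classifier variant is not shown, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# HYPOTHETICAL toy lexicon standing in for a real Polish lemmatizer and
# POS tagger: surface form -> (lemma, coarse POS tag).
TOY_LEXICON = {
    "ekstrakcja": ("ekstrakcja", "noun"),
    "słów": ("słowo", "noun"),
    "słowa": ("słowo", "noun"),
    "kluczowych": ("kluczowy", "adj"),
    "kluczowe": ("kluczowy", "adj"),
    "wspiera": ("wspierać", "verb"),
    "wyszukiwanie": ("wyszukiwanie", "noun"),
    "dokumentów": ("dokument", "noun"),
    "dokumentu": ("dokument", "noun"),
    "opisują": ("opisywać", "verb"),
    "treść": ("treść", "noun"),
}

# ASSUMED POS filter: only nouns and adjectives may form candidate phrases.
ALLOWED_POS = {"noun", "adj"}


def split_sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tag(sentence):
    """Tokenize a sentence and map each word to (lemma, POS)."""
    words = re.findall(r"\w+", sentence.lower())
    # Words missing from the toy lexicon keep their surface form and an
    # "unknown" tag, so the POS filter treats them as phrase delimiters.
    return [TOY_LEXICON.get(w, (w, "unknown")) for w in words]


def select_candidates(tagged_sentences):
    """Candidates are maximal runs of lemmas whose POS passes the filter."""
    candidates = []
    for tagged in tagged_sentences:
        current = []
        for lemma, pos in tagged:
            if pos in ALLOWED_POS:
                current.append(lemma)
            elif current:
                candidates.append(tuple(current))
                current = []
        if current:
            candidates.append(tuple(current))
    return candidates


def score_candidates(candidates):
    """RAKE-style statistic: word score = degree / frequency,
    phrase score = sum of its word scores."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in candidates:
        for word in phrase:
            freq[word] += 1
            # Co-occurrences within the phrase, the word itself included.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in set(candidates)}


def extract_keywords(text, top_k=3):
    """Run the whole pipeline and return the top_k (phrase, score) pairs."""
    tagged = [tag(s) for s in split_sentences(text)]
    scores = score_candidates(select_candidates(tagged))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


text = ("Ekstrakcja słów kluczowych wspiera wyszukiwanie dokumentów. "
        "Słowa kluczowe opisują treść dokumentu.")
print(extract_keywords(text))
# [('ekstrakcja słowo kluczowy', 8.0), ('słowo kluczowy', 5.0), ('treść dokument', 4.0)]
```

Because candidates are built from lemmas, the inflected forms "słów kluczowych" and "Słowa kluczowe" both count toward the same words, which is why lemmatization runs before candidate selection.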