Open resources and instruments for Ukrainian NLP

https://github.com/brown-uk/dict_uk - the Large Electronic Dictionary of Ukrainian (VESUM). It contains more than 416 thousand lemmas and is constantly updated. The dictionary records the inflection of each word; marks non-standard word forms and their alternatives; covers abbreviations and contractions; includes some alternative orthographic norms; incorporates a large database of proper names; and is synchronized with the Ukrainian gazetteer, including place names introduced after decommunization. It uses a very compact system for marking inflectional types and tags, which makes it easy to update and regroup existing words. It also contains data on some rare and colloquial forms, e.g. uncontracted adjectives (гарная) and the colloquial variant of the infinitive (поїхать).
https://github.com/brown-uk/nlp_uk - a tool for processing Ukrainian text, based on the VESUM dictionary and the LanguageTool engine. Supports tokenization, lemmatization, POS analysis and basic disambiguation. Includes a usage example in Python 3.
https://github.com/brown-uk/corpus - BRUK (BrUC), a balanced 1-million-word corpus of modern Ukrainian; its morphological ambiguity has yet to be resolved.
https://github.com/lang-uk - a part of BRUK annotated for named entities, along with a pre-built model for automatic named-entity annotation (people, organizations, locations and others); the UberText corpus; various gazetteers; word vectors; a simple tokenizer (splits text into paragraphs, sentences and tokens); and other useful resources.
https://github.com/UniversalDependencies/UD_Ukrainian-IU/tree/master - a dependency treebank for Ukrainian in the Universal Dependencies (UD) format.
https://github.com/kmike/pymorphy2 - a morphological analyzer without disambiguation; Ukrainian is supported through an older version of VESUM.
https://stanfordnlp.github.io/stanza/ - the Stanford library for language processing; supports Ukrainian with models based on the UD treebank listed above. Provides models for tokenization, lemmatization, POS tagging and syntactic analysis.
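
Several of the resources above (the UD treebank, and tools trained on it) work with the CoNLL-U format: one token per line, ten tab-separated columns, a blank line between sentences. As a rough illustration of how such data can be read without any third-party library, here is a minimal Python sketch. The function names (read_conllu, upos_counts) and the sample sentence are made up for this example and are not taken from the treebank or from any of the listed tools.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in the order defined by Universal Dependencies.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dicts from CoNLL-U formatted lines."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            # Skip lines that do not have exactly ten columns.
            continue
        token = dict(zip(FIELDS, cols))
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token["id"] or "." in token["id"]:
            continue
        sentence.append(token)
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


def upos_counts(sentences: Iterable[List[Dict[str, str]]]) -> Counter:
    """Count universal POS tags over all tokens."""
    return Counter(tok["upos"] for sent in sentences for tok in sent)


# Hand-written sample for illustration only (not copied from the treebank).
SAMPLE = """# sent_id = demo-1
# text = Я читаю книгу.
1\tЯ\tя\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tчитаю\tчитати\tVERB\t_\t_\t0\troot\t_\t_
3\tкнигу\tкнига\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

if __name__ == "__main__":
    sentences = list(read_conllu(SAMPLE.splitlines()))
    for tok in sentences[0]:
        print(tok["form"], tok["lemma"], tok["upos"], tok["deprel"])
    print(upos_counts(sentences))
```

To read a real file, pass an open file object instead of the sample, for instance read_conllu(open(path, encoding="utf-8")), where path points to a .conllu file downloaded from the treebank repository.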

UkrNLP-Corpora: Ukrainian CLARIN Knowledge Centre
