Open resources and tools for Ukrainian NLP

  • https://github.com/brown-uk/dict_uk — the Large Electronic Dictionary of Ukrainian (VESUM) contains more than 416 thousand lemmas and is continuously updated. It includes inflection information for its words; marks non-standard word forms and their alternatives; covers abbreviations and contractions; includes information on some alternative orthographic norms; contains a large database of proper names; is synchronized with the Ukrainian gazetteer, including place names that appeared after decommunization; uses a very compact system of inflection-type markers and tags that makes existing words easy to update and regroup; and includes some rare and colloquial forms, e.g. uncontracted adjectives (гарная) and the colloquial infinitive variant (поїхать).
  • https://github.com/brown-uk/nlp_uk — a tool for processing Ukrainian text, based on the VESUM dictionary and the LanguageTool engine. It supports tokenization, lemmatization, POS tagging, and basic disambiguation, and includes an example implementation in Python 3.
  • https://github.com/brown-uk/corpus — BRUK (BrUC), a balanced one-million-word corpus of modern Ukrainian with morphological ambiguity resolved.
  • https://github.com/lang-uk — a part of BRUK annotated for named entities, together with a model for automatic named-entity annotation (people, organizations, locations, and others); the UberText corpus, various gazetteers, word vectors, a simple tokenizer (splitting text into paragraphs, sentences, and tokens), and other useful tools.
  • https://github.com/UniversalDependencies/UD_Ukrainian-IU/tree/master — a dependency treebank for Ukrainian
  • https://github.com/kmike/pymorphy2 — a morphological analyzer without disambiguation; Ukrainian is supported using an older version of VESUM.
  • https://stanfordnlp.github.io/stanza/ — the Stanford NLP library; it supports Ukrainian using the UD treebank listed above and provides models for tokenization, lemmatization, POS tagging, and syntactic (dependency) parsing.
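As a rough illustration of what the tokenizers and dictionary-based morphological analyzers listed above do, here is a minimal, standard-library-only Python sketch. It is not the API of nlp_uk, the lang-uk tokenizer, pymorphy2, or Stanza. The regular expressions, the function names `tokenize` and `analyze`, and the tiny `LEXICON` with its simplified tag strings are hypothetical and were made up for this example. Like pymorphy2, `analyze` returns every candidate parse and makes no attempt at disambiguation.

```python
import re

# Hypothetical toy lexicon in the spirit of a VESUM-style dictionary: each
# word form maps to every (lemma, tag) pair it can have. Entries and tag
# strings are simplified illustrations, not real VESUM data or its tag set.
LEXICON = {
    "мила": [
        ("милий", "adj:f:nom"),    # 'dear' (feminine)
        ("мило", "noun:n:gen"),    # 'soap' (genitive singular)
        ("мити", "verb:past:f"),   # 'washed' (feminine past)
    ],
    "мати": [
        ("мати", "noun:f:nom"),    # 'mother'
        ("мати", "verb:inf"),      # 'to have'
    ],
    "гарная": [("гарний", "adj:f:nom:uncontracted")],  # rare uncontracted form
    "поїхать": [("поїхати", "verb:inf:colloquial")],    # colloquial infinitive
}

# Words may contain apostrophes or hyphens (м'ясо, будь-який); every other
# character that is neither a word character nor whitespace is its own token.
TOKEN_RE = re.compile(r"\w+(?:['’ʼ-]\w+)*|[^\w\s]")
# Naive sentence boundary: whitespace after ., !, ? or …
# (abbreviations such as "т. д." would be split incorrectly).
SENT_SPLIT_RE = re.compile(r"(?<=[.!?…])\s+")


def tokenize(text):
    """Split text into paragraphs, each a list of sentences of tokens."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        [TOKEN_RE.findall(s) for s in SENT_SPLIT_RE.split(p) if s]
        for p in paragraphs
    ]


def analyze(token):
    """Return every candidate (lemma, tag) pair; no disambiguation."""
    return LEXICON.get(token.lower(), [(token.lower(), "unknown")])


if __name__ == "__main__":
    for paragraph in tokenize("Мама мила раму. Три!\n\nм'ясо"):
        for sentence in paragraph:
            for token in sentence:
                print(token, analyze(token))
```

For the form мила, the sketch returns three analyses (adjective, noun, and verb). Choosing the right one from context is the disambiguation step mentioned above for nlp_uk and BRUK, and this toy lookup deliberately leaves it out.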
UkrNLP-Corpora: Ukrainian CLARIN Knowledge Centre
