GRAC

The General Regionally Annotated Corpus of Ukrainian (GRAC) is a large, representative collection of texts in Standard Ukrainian, made accessible through a sophisticated interface. With this interface, users can search for words, grammatical forms and their combinations; using subcorpora, they can restrict a search to specific time spans, registers, text types or regions, among other criteria. Query results can be sorted, balanced samples can be extracted, and various statistics can be collected.

GRAC is a large reference corpus of Standard Ukrainian. It covers actual usage across a wide range of time periods and registers and is intended for linguistic research into the grammar, lexicon, history and sociolinguistics of written Standard Ukrainian, as well as for use in preparing dictionaries and grammars. It is regionally annotated in the sense that, for most texts, it records where the text was published or written, or where its authors come from. In our view, this information is very important because of the complicated history of Ukrainian, with its multiple and changing standards.

The corpus can be used for advanced study of the language as well as for writing textbooks, learner’s dictionaries and exercises based on examples from real texts, taking into account frequencies, collocations and similar data.

The corpus is not prescriptive; it seeks to represent Modern Ukrainian in actual usage across a wide time span and a wide range of geographic, stylistic and other variation. It contains more than 150 thousand texts by about 35 thousand authors, written between 1816 and the present day.

Please cite GRAC

Maria Shvedova, Ruprecht von Waldenfels, Sergey Yarygin, Andriy Rysin, Vasyl Starko, Tymofij Nikolajenko et al. (2017-2024): GRAC: General Regionally Annotated Corpus of Ukrainian. Electronic resource: Kyiv, Lviv, Jena. Available at uacorpus.org.
