The regional markup of the corpus is based on the contemporary administrative structure of Ukraine. This is partly for pragmatic reasons: administrative borders are clearly defined and can be looked up in standard sources. Although the administrative structure does not necessarily reflect the dialectal landscape of Ukraine, this choice also has a sociolinguistic dimension, since the administrative regions are socioeconomic and cultural entities of some relevance that are typically oriented towards the same centers. The administrative regions are grouped into macroregions: the Western, Eastern, Central, Southern, and Northern areas. Kyiv, as the capital whose residents come from many different regions, is treated as a separate macroregion. The macroregions are formed with the Ukrainian dialects in mind: the North covers most of the territory of the Northern (Polissya) dialects, the West covers the Southwestern dialects, and the South, East, and Center correspond to the Steppe, Slobozhanshchyna, and Dnieper dialect groups, respectively.
The figures below show how our texts are distributed across these macroregions in the corpus overall (Fig. 1) and over time (Fig. 2). The Western macroregion and Kyiv are represented by the largest numbers of texts (46% and 32% of the tokens, respectively); each of the other macroregions accounts for 7% of the tokens or less.
Macroregion | Tokens | Share of tokens, %
W (West) | 172,303,252 | 46
KYV (Kyiv) | 118,565,515 | 32
E (East) | 26,624,696 | 7
C (Center) | 23,900,708 | 6
S (South) | 16,903,552 | 5
N (North) | 12,944,789 | 3
Figure 1: Composition of GRAC by macroregions
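The percentages in Figure 1 are consistent with each macroregion's share of the sum of the six token counts (371,242,512 tokens), rounded to whole numbers. The following is a minimal Python sketch that recomputes them, assuming this is how the column was derived; the `TOKENS` dictionary and the `shares` function are illustrative names, not part of GRAC.

```python
# Token counts per macroregion, copied from Figure 1.
TOKENS = {
    "W": 172_303_252,
    "KYV": 118_565_515,
    "E": 26_624_696,
    "C": 23_900_708,
    "S": 16_903_552,
    "N": 12_944_789,
}


def shares(counts):
    """Return each macroregion's share of the summed token counts, in percent."""
    total = sum(counts.values())
    return {region: 100 * n / total for region, n in counts.items()}


if __name__ == "__main__":
    for region, pct in shares(TOKENS).items():
        # Rounding to whole numbers reproduces the % column of Figure 1.
        print(f"{region:>3}: {pct:5.1f}% (rounded: {round(pct)})")
```

Because of rounding, the whole-number percentages add up to 99 rather than 100.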
Figure 2: Distribution of tokens by macroregions and years
Media texts (newspapers, news sites on the web) are annotated with the region where the respective outlet was published. Other texts are annotated with the region where the author (or, for a translated text, the translator) was born, studied, or lived for more than ten years.
The regional annotation is thus generally tied to the author of a text, where an author is known. A single text can belong to several regional subcorpora if the author or the translator was born, studied, or lived for a long time in different regions. During annotation, biographical information from all available sources is evaluated so that the regional annotation reflects the author's Ukrainian linguistic biography as closely as possible.
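The assignment criteria above can be sketched as follows. This is only an illustration of the stated rules, not GRAC's actual annotation tooling: the `Person` and `Text` classes, their fields, and the LOCCODE-style region strings in the example are hypothetical, and the sketch assumes that a translated text takes the translator's regions instead of the author's.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Person:
    """Hypothetical biographical record of an author or translator."""
    born_in: Set[str] = field(default_factory=set)
    studied_in: Set[str] = field(default_factory=set)
    years_lived: Dict[str, float] = field(default_factory=dict)  # region -> years


@dataclass
class Text:
    """Hypothetical text record; `translator` is set only for translations."""
    is_media: bool
    media_region: Optional[str] = None  # region where the newspaper or site appeared
    author: Optional[Person] = None
    translator: Optional[Person] = None


def person_regions(p: Person) -> Set[str]:
    """Regions where a person was born, studied, or lived for more than ten years."""
    long_stays = {r for r, years in p.years_lived.items() if years > 10}
    return p.born_in | p.studied_in | long_stays


def text_regions(t: Text) -> Set[str]:
    """All regional subcorpora a text belongs to (possibly several, possibly none)."""
    if t.is_media:
        return {t.media_region} if t.media_region else set()
    # Assumption: for a translation, the translator's biography replaces the author's.
    person = t.translator if t.translator is not None else t.author
    return person_regions(person) if person is not None else set()


if __name__ == "__main__":
    writer = Person(
        born_in={"UA-W-LVV"},
        studied_in={"UA-KYV-KYV"},
        years_lived={"UA-E-HRK": 12, "UA-S-ODE": 3},
    )
    print(sorted(text_regions(Text(is_media=False, author=writer))))
```

In this example the writer's text belongs to three regional subcorpora; the three-year stay in Odesa oblast does not count.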
Approximately 85.5% of GRAC v.10 is annotated by region. Texts created in Ukraine that are assigned to exactly one macroregion make up 60% of the GRAC v.10 corpus.
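Both figures are shares of the corpus. Assuming they are measured in tokens, and reading "one macroregion" as "exactly one macroregion, which is Ukrainian", they could be computed along the lines below. The document list is invented toy data, not GRAC figures, and `annotation_shares` is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple

# DOC.MACROREGION values used for Ukraine.
UA_MACROREGIONS = {"W", "E", "C", "S", "N", "KYV"}


def annotation_shares(docs: Iterable[Tuple[int, Set[str]]]) -> Tuple[float, float]:
    """Return (% of tokens with any regional annotation,
    % of tokens from texts with exactly one Ukrainian macroregion).

    Each document is a (token_count, set_of_macroregions) pair.
    """
    total = annotated = single_ua = 0
    for tokens, macros in docs:
        total += tokens
        if macros:
            annotated += tokens
        if len(macros) == 1 and macros <= UA_MACROREGIONS:
            single_ua += tokens
    if total == 0:
        return 0.0, 0.0
    return 100 * annotated / total, 100 * single_ua / total


if __name__ == "__main__":
    toy_docs = [
        (500, {"W"}),         # one Ukrainian macroregion
        (300, {"KYV", "E"}),  # annotated, but with two macroregions
        (100, {"Z"}),         # diaspora text
        (100, set()),         # no regional annotation
    ]
    print(annotation_shares(toy_docs))  # (90.0, 50.0)
```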
For regional text markup, GRAC uses the attributes DOC.COUNTRY, DOC.MACROREGION (North, West, South, East, Center, Kyiv; see Fig. 3), DOC.REGION, and DOC.LOCCODE. For convenience, DOC.LOCCODE combines all the regional attributes into a single code: for example, DOC.COUNTRY = “UA”, DOC.MACROREGION = “C”, and DOC.REGION = “CRK” give DOC.LOCCODE = “UA-C-CRK”.
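Because DOC.LOCCODE is built from the other three attributes, it can be reconstructed or checked mechanically. Here is a minimal sketch for the three-part Ukrainian codes, assuming the parts are joined with hyphens in the order country, macroregion, region, as in the example; `make_loccode`, `is_consistent`, and the dictionary representation of a document's attributes are hypothetical.

```python
from typing import Dict


def make_loccode(country: str, macroregion: str, region: str) -> str:
    """Join DOC.COUNTRY, DOC.MACROREGION and DOC.REGION into a DOC.LOCCODE string."""
    parts = (country, macroregion, region)
    if not all(parts) or any("-" in p for p in parts):
        raise ValueError(f"invalid attribute values: {parts!r}")
    return "-".join(parts)


def is_consistent(doc: Dict[str, str]) -> bool:
    """Check that a document's DOC.LOCCODE agrees with its other regional attributes."""
    expected = make_loccode(doc["DOC.COUNTRY"], doc["DOC.MACROREGION"], doc["DOC.REGION"])
    return doc["DOC.LOCCODE"] == expected


if __name__ == "__main__":
    doc = {
        "DOC.COUNTRY": "UA",
        "DOC.MACROREGION": "C",
        "DOC.REGION": "CRK",
        "DOC.LOCCODE": "UA-C-CRK",
    }
    print(make_loccode("UA", "C", "CRK"), is_consistent(doc))
```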
Figure 3: Macroregions of Ukraine in GRAC
DOC.LOCCODE for Ukraine:
UA-C-CRK — Cherkasy oblast
UA-C-KRV — Kirovohrad oblast
UA-C-KVS — Kyiv oblast
UA-C-PLT — Poltava oblast
UA-E-HRK — Kharkiv oblast
UA-E-SUM — Sumy oblast
UA-KYV-KYV — Kyiv
UA-N-CRG — Chernihiv oblast
UA-N-RVN — Rivne oblast
UA-N-VLN — Volyn oblast
UA-N-ZHT — Zhytomyr oblast
UA-S-DNC — Donetsk oblast
UA-S-DNP — Dnipropetrovsk oblast
UA-S-HRS — Kherson oblast
UA-S-KRM — Crimea
UA-S-LGN — Luhansk oblast
UA-S-MKL — Mykolaiv oblast
UA-S-ODE — Odesa oblast
UA-S-ZPR — Zaporizhia oblast
UA-W-CRV — Chernivtsi oblast
UA-W-HML — Khmelnytskyi oblast
UA-W-IFR — Ivano-Frankivsk oblast
UA-W-LVV — Lviv oblast
UA-W-TRN — Ternopil oblast
UA-W-VNC — Vinnytsia oblast
UA-W-ZKR — Zakarpattia oblast
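Since the macroregion is the second component of each code, the list above can be regrouped by macroregion directly from the codes. Here is a short sketch; the code list is copied from above, and `regions_by_macroregion` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# DOC.LOCCODE values for Ukraine, copied from the list above.
UKRAINE_LOCCODES = """
UA-C-CRK UA-C-KRV UA-C-KVS UA-C-PLT
UA-E-HRK UA-E-SUM
UA-KYV-KYV
UA-N-CRG UA-N-RVN UA-N-VLN UA-N-ZHT
UA-S-DNC UA-S-DNP UA-S-HRS UA-S-KRM UA-S-LGN UA-S-MKL UA-S-ODE UA-S-ZPR
UA-W-CRV UA-W-HML UA-W-IFR UA-W-LVV UA-W-TRN UA-W-VNC UA-W-ZKR
""".split()


def regions_by_macroregion(codes: List[str]) -> Dict[str, List[str]]:
    """Group the region part of each UA-<macroregion>-<region> code by macroregion."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for code in codes:
        country, macroregion, region = code.split("-")
        if country != "UA":
            raise ValueError(f"not a Ukrainian code: {code!r}")
        groups[macroregion].append(region)
    return dict(groups)


if __name__ == "__main__":
    for macro, regions in regions_by_macroregion(UKRAINE_LOCCODES).items():
        print(f"{macro}: {len(regions)} -> {', '.join(regions)}")
```

The resulting groups are Center (4 codes), East (2), Kyiv (1), North (4), South (8), and West (7).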
Besides the macroregions above, the annotation distinguishes the countries of the Ukrainian diaspora (the United States, Canada, Poland, Germany, the UK, France, etc.). DOC.LOCCODE for the Ukrainian diaspora starts with D, followed by V for post-Soviet countries (DOC.MACROREGION = “V”) or Z for other countries (DOC.MACROREGION = “Z”). The third code specifies the country. For the neighboring countries Russia, Poland, and Czechoslovakia, a fourth code is available to specify the area further (for example, D-V-RU-KBN for Kuban).
DOC.LOCCODE for the Ukrainian diaspora:
D-V-BY — Belarus
D-V-GE — Georgia (country)
D-V-KZ — Kazakhstan
D-V-MLD — Moldova
D-V-RU — Russia
D-V-RU-KBN — Kuban
D-V-RU-SSL — Eastern Slobozhanshchyna
D-V-TKM — Turkmenistan
D-Z-AR — Argentina
D-Z-AT — Austria
D-Z-AU — Australia
D-Z-BE — Belgium
D-Z-BR — Brazil
D-Z-CA — Canada
D-Z-CH — Switzerland
D-Z-CZE — Czech Republic
D-Z-CZE-SVK — Czechoslovakia (before 1992)
D-Z-DE — Germany
D-Z-EET — Estonia
D-Z-ES — Spain
D-Z-FR — France
D-Z-GB — United Kingdom
D-Z-IL — Israel
D-Z-IT — Italy
D-Z-LT — Lithuania
D-Z-LV — Latvia
D-Z-PL — Poland
D-Z-PL-HLM — Kholm region
D-Z-RO — Romania
D-Z-SRB — Serbia
D-Z-SVK — Slovakia
D-Z-SWE — Sweden
D-Z-USA — United States
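The structure of DOC.LOCCODE (a prefix, a macroregion, a region or country code, and an optional fourth code) makes it straightforward to decompose a code into its parts, for example when filtering texts by macroregion or country. Here is a minimal Python sketch, assuming the codes always follow the patterns listed above; `LocCode`, `parse_loccode`, and the validation rules are illustrative, not part of GRAC.

```python
from typing import NamedTuple, Optional

UA_MACROREGIONS = {"W", "E", "C", "S", "N", "KYV"}
DIASPORA_MACROREGIONS = {"V", "Z"}           # V: post-Soviet countries, Z: other countries
FOURTH_CODE_COUNTRIES = {"RU", "PL", "CZE"}  # countries with an optional fourth code


class LocCode(NamedTuple):
    prefix: str                    # "UA" for Ukraine, "D" for the diaspora
    macroregion: str               # W/E/C/S/N/KYV, or V/Z for the diaspora
    area: str                      # oblast code (Ukraine) or country code (diaspora)
    subarea: Optional[str] = None  # optional fourth code, e.g. "KBN" in D-V-RU-KBN


def parse_loccode(code: str) -> LocCode:
    """Split a DOC.LOCCODE string into its parts and check it against the scheme."""
    parts = code.strip().split("-")
    if len(parts) not in (3, 4) or not all(parts):
        raise ValueError(f"expected 3 or 4 non-empty parts: {code!r}")
    prefix, macroregion, area = parts[:3]
    if prefix == "UA":
        if len(parts) != 3 or macroregion not in UA_MACROREGIONS:
            raise ValueError(f"invalid Ukrainian code: {code!r}")
    elif prefix == "D":
        if macroregion not in DIASPORA_MACROREGIONS:
            raise ValueError(f"invalid diaspora code: {code!r}")
        if len(parts) == 4 and area not in FOURTH_CODE_COUNTRIES:
            raise ValueError(f"no fourth code defined for {area!r}: {code!r}")
    else:
        raise ValueError(f"unknown prefix: {code!r}")
    return LocCode(*parts)


if __name__ == "__main__":
    for c in ["UA-KYV-KYV", "D-Z-USA", "D-V-RU-KBN", "D-Z-CZE-SVK"]:
        print(c, parse_loccode(c))
```

With the parsed code, selecting all diaspora texts from post-Soviet countries amounts to checking `prefix == "D" and macroregion == "V"`.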