Access Geirfan

The Geirfan wordlist is a curated list of 500 of the most frequent words in the Welsh language, designed for learners at A1/A2 levels of proficiency. It was developed using an innovative combination of corpus-based methods (drawing on data from the CorCenCC corpus) and expert-led introspection and reflection, an approach that can be replicated and adapted for any other language context. This document details the approach used to compile the list. The list itself is provided in the Appendices, as follows:

  • Appendix A: the most frequent 750 words from CorCenCC
  • Appendix B: the basic 500-word list, without additions
  • Appendix C: the working list of additions, as an alphabetical list

Click here to request a copy of Geirfan. Details on how to cite Geirfan are available here. Details of a publication relevant to the creation of Geirfan can be found here.

https://geirfan.cymru/

The website Geirfan was developed in parallel with our frequency lists, to demonstrate the data’s potential in creating teaching materials. Geirfan’s initial 500-word target is based on our wordlist derived from CorCenCC, refined and augmented by the input of Welsh-language tutors and other language experts. The illustrative examples provided in Geirfan’s dictionary entries are also derived from CorCenCC, and this project’s frequency data is used to ensure the automatic selection of illustrative sentences featuring high-frequency vocabulary wherever possible. The frequency data also contributes to identifying the collocates, phrases, and idioms listed in Geirfan’s entries, so that the dictionary’s users can be given useful information about the linguistic forms they are most likely to encounter in using Welsh day-to-day.
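
The idea of preferring illustrative sentences that feature high-frequency vocabulary can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Geirfan's actual selection method: the function names, the mean-rank scoring rule, the `unknown_rank` penalty, and the toy rank values are all hypothetical, and the simple token matching ignores Welsh initial mutations (so a mutated form of a headword would not be matched).

```python
from typing import Dict, List, Optional

# Characters stripped from the edges of each token (apostrophes are kept,
# so contracted forms such as "mae'r" stay intact).
PUNCTUATION = ".,;:!?\"()"


def tokenise(sentence: str) -> List[str]:
    """Split on whitespace, trim edge punctuation, and lowercase."""
    tokens = (t.strip(PUNCTUATION).lower() for t in sentence.split())
    return [t for t in tokens if t]


def mean_rank(tokens: List[str], rank_of: Dict[str, int], unknown_rank: int) -> float:
    """Average frequency rank of the tokens; lower means more frequent words."""
    if not tokens:
        return float(unknown_rank)
    return sum(rank_of.get(t, unknown_rank) for t in tokens) / len(tokens)


def pick_example(
    headword: str,
    candidates: List[str],
    rank_of: Dict[str, int],
    unknown_rank: int = 10_000,  # hypothetical penalty for words missing from the list
) -> Optional[str]:
    """Return the candidate containing the headword with the lowest mean rank."""
    scored = []
    for sentence in candidates:
        tokens = tokenise(sentence)
        if headword.lower() in tokens:
            scored.append((mean_rank(tokens, rank_of, unknown_rank), sentence))
    if not scored:
        return None
    return min(scored)[1]


# Toy data: these ranks are invented for illustration and are NOT CorCenCC values.
toy_ranks = {"y": 1, "yn": 2, "mae'r": 3, "tŷ": 25, "ci": 40, "gweld": 60, "milfeddyg": 900}
toy_sentences = ["Mae'r ci yn y tŷ.", "Mae'r milfeddyg yn gweld y ci."]
print(pick_example("ci", toy_sentences, toy_ranks))
```

With the toy values above, the first sentence is chosen because every word in it has a low (frequent) rank, whereas the second contains one comparatively rare word. A production system would more plausibly match on lemmas rather than surface tokens.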

Access word frequency lists

A range of word frequency lists from the CorCenCC corpus (Yr Amliadur) are available here. These include:

  • Top 100 words in CorCenCC (rank ordered list)
  • Top 1000 words in CorCenCC (ordered alphabetically)
  • Top 100 lemmas in CorCenCC (rank ordered list)
  • Top 1000 lemmas in CorCenCC (ordered alphabetically)
  • Top 100 lemmas in CorCenCC (open-class words only)
  • Top 1000 words in CorCenCC (open-class words only; ordered alphabetically)
  • Top 500 nouns in CorCenCC (rank ordered list)
  • Top 500 verbs in CorCenCC (rank ordered list)
  • Top 500 adjectives in CorCenCC (rank ordered list)
  • Top 50 adverbs in CorCenCC (rank ordered list)
  • Top 50 interjections in CorCenCC (rank ordered list)
  • Top 100 open-class words in the written component of CorCenCC (rank ordered list)
  • Top 100 open-class words in the spoken component of CorCenCC (rank ordered list)
  • Top 100 open-class words in the e-language component of CorCenCC (rank ordered list)

Click here to request a copy of the full frequency lists. These frequency lists include those listed above, in addition to the following:

  • All frequency data, sorted alphabetically (Excel file)
  • All frequency data, in frequency order (Excel file)
  • The most frequent 5000 words, with separate sheets for each 500-word frequency band (Excel file)
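
For readers who want to reproduce the 500-word banding from a rank-ordered list themselves, the arithmetic can be sketched as follows. This is a minimal sketch, not a description of how the distributed Excel file was produced: the function names are hypothetical, and it assumes only that the input is a list of words already sorted from most to least frequent.

```python
from typing import Dict, List

BAND_SIZE = 500  # each band covers 500 consecutive ranks


def band_of(rank: int, band_size: int = BAND_SIZE) -> int:
    """Map a 1-based frequency rank to its 1-based band number."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return (rank - 1) // band_size + 1


def split_into_bands(ranked_words: List[str], band_size: int = BAND_SIZE) -> Dict[int, List[str]]:
    """Group a rank-ordered word list into consecutive frequency bands."""
    bands: Dict[int, List[str]] = {}
    for rank, word in enumerate(ranked_words, start=1):
        bands.setdefault(band_of(rank, band_size), []).append(word)
    return bands
```

Under this scheme, ranks 1 to 500 fall in band 1, ranks 501 to 1000 in band 2, and so on, so a 5000-word list yields ten bands.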

These word frequency lists show which words and lemmas are used most often in the Welsh language, both in general and within or across specific modes of communication. Details on how to cite these lists are available here.