CEFR Rating
When importing a new resource, we look up the frequency and rank of each word in it. For example, the word "the" has a frequency of 0.0537 in English. This means that about 5% of all words in an average text are "the"! This also makes "the" the number one English word (rank 1). From this rank, we estimate a CEFR level for each word.
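To make the idea of frequency and rank concrete, here is a minimal sketch. It builds a small frequency table from a toy token list instead of looking words up in a real frequency list, so the function name `frequency_table` and the sample sentence are assumptions made up for illustration, not our actual data or code.

```python
from collections import Counter

def frequency_table(tokens):
    """Return {word: (relative_frequency, rank)}, where rank 1 is the most frequent word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # Sort by descending count; break ties alphabetically so ranks are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {
        word: (count / total, rank)
        for rank, (word, count) in enumerate(ordered, start=1)
    }

# Toy corpus, invented for illustration only.
tokens = "the cat saw the dog and the bird".split()
table = frequency_table(tokens)

frequency, rank = table["the"]
print(f"the: frequency={frequency:.3f}, rank={rank}")
```

In this tiny sample "the" makes up 3 of 8 tokens, which is far more than the roughly 5% seen in real English text. A realistic table needs a large corpus behind it.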
We start at A1 with really common words and put words into higher categories (A2, B1, etc.) as their frequency gets lower. For example, words like "comprise" or "assess" land in the C2 level. From this, we can compute a frequency distribution, which tells us how the text is made up in terms of CEFR words.
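The mapping from rank to level and the resulting distribution could be sketched roughly like this. The rank cutoffs in `RANK_CUTOFFS` are hypothetical placeholders chosen for illustration, not the thresholds we actually use, and the names `cefr_level` and `level_distribution` are made up for this example.

```python
from bisect import bisect_left
from collections import Counter

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical cutoffs (assumption): the highest rank still counted as
# A1, A2, B1, B2 and C1. Anything rarer falls into C2.
RANK_CUTOFFS = [500, 1000, 2000, 4000, 8000]

def cefr_level(rank):
    """Map a frequency rank (1 = most common word) to an estimated CEFR level."""
    # bisect_left gives the index of the first cutoff that is >= rank.
    return LEVELS[bisect_left(RANK_CUTOFFS, rank)]

def level_distribution(ranks):
    """Return the share of words per CEFR level for a list of word ranks."""
    ranks = list(ranks)
    if not ranks:
        return {level: 0.0 for level in LEVELS}
    counts = Counter(cefr_level(rank) for rank in ranks)
    return {level: counts[level] / len(ranks) for level in LEVELS}

# Made-up ranks for the words of a short text.
sample_ranks = [1, 1, 350, 900, 12000]
distribution = level_distribution(sample_ranks)
print(distribution)
```

With these placeholder cutoffs, three of the five sample words count as A1, one as A2 and one as C2. Changing the cutoffs shifts the distribution, so they would have to be tuned against texts with a known level.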
We then sprinkle some AI into the mix, which gives us back the overall CEFR rating of the text and also takes the grammatical structures used into account.