- Why can’t we say that English has the biggest vocabulary?
- Why can’t we count words in the language?
- Problem with morphemes
- Problems that lexicographers face compiling a dictionary
- Corpora
- Representative and well-balanced collections of texts.
- Additional information on the properties of texts
- History of British lexicography
- Electronic dictionaries
- Classification of dictionaries
- Object of description
- Hierarchical vs. Non-hierarchical relationships within the lexicon
- Terminology of lexicology
- Anglo-Saxon and Celtic part of the English wordstock
- Peculiarities of Latin and Greek borrowings
- Stratification of the English vocabulary
- How do words change their meanings?
- Lexicology vs Lexicography
- ‘A dictionary’ and other related terms
- The organisation of a dictionary entry
- History of lexicography
- History of American and Russian lexicography
Representative and well-balanced collections of texts.
A corpus contains all the types of written and oral texts present in the language (various genres of fiction, journalistic, academic and business texts, as well as dialectal and sociolectal texts).
Additional information on the properties of texts
This is achieved by means of annotation.
Main users: linguists and non-linguists (researchers in literature, history and other humanities, teachers, journalists, writers and anyone else interested in the language).
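To make the idea of annotation concrete, here is a minimal sketch of how an annotated text might be represented. It assumes two common layers: text-level metadata (genre, year) and word-level labels (lemma, part of speech). The class names, field names, tag labels and sample sentence are all hypothetical and do not reproduce the format of any particular corpus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Token:
    form: str   # the word as it appears in the text
    lemma: str  # its dictionary form
    pos: str    # part-of-speech label (hypothetical tag set)


@dataclass
class AnnotatedText:
    title: str
    genre: str    # text-level annotation
    year: int     # text-level annotation
    tokens: list  # word-level annotation: a list of Token objects


# A tiny invented sample, annotated by hand for illustration.
sample = AnnotatedText(
    title="Sample sentence",
    genre="fiction",
    year=2005,
    tokens=[
        Token("The", "the", "DET"),
        Token("dogs", "dog", "NOUN"),
        Token("were", "be", "VERB"),
        Token("barking", "bark", "VERB"),
    ],
)


def lemma_counts(text, pos=None):
    """Count lemmas in a text, optionally keeping one part of speech only."""
    return Counter(t.lemma for t in text.tokens if pos is None or t.pos == pos)


verb_lemmas = lemma_counts(sample, pos="VERB")
print(verb_lemmas)
```

With such labels in place, a user can search by dictionary form or part of speech rather than by exact spelling, so that "were" is found under "be" and "barking" under "bark".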
The Oxford English Corpus
The corpus contains over 2.5 billion words of real 21st-century English; this is the largest lexical corpus in the world.
It allows researchers to track and record the very latest developments in the language.
It represents all types of English, from literary novels and specialist journals to everyday newspapers and magazines, and even the language of blogs, emails, and Internet message boards.
It also contains language from all parts of the world: from Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore, and South Africa.
The British National Corpus
A 100-million-word collection of samples of written and spoken language.
The written part of the BNC (90%) includes extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, school and university essays, among many other kinds of text.
The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different age groups, regions and social classes in a demographically balanced way) and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins.
The Corpus of Contemporary American English
The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English, and the only large and balanced corpus of American English.
The corpus contains more than 560 million words of text (20 million words for each year from 1990 to 2017) and is divided equally (about 20% each) among spoken, fiction, popular magazine, newspaper, and academic texts.
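The figures quoted for COCA can be checked with simple arithmetic. The sketch below assumes exactly 20 million words per year and a perfectly equal split among the five listed genres, which gives one fifth of the total to each; the real counts are only approximately this ("more than 560 million"). The variable names are illustrative.

```python
WORDS_PER_YEAR = 20_000_000
FIRST_YEAR, LAST_YEAR = 1990, 2017
GENRES = ["spoken", "fiction", "popular magazines", "newspapers", "academic"]

# 1990 through 2017 inclusive is 28 years.
years = LAST_YEAR - FIRST_YEAR + 1
total_words = years * WORDS_PER_YEAR

# An equal split among five genres gives each one fifth of the total.
share_per_genre = 1 / len(GENRES)
words_per_genre = total_words // len(GENRES)

print(f"{years} years x {WORDS_PER_YEAR:,} = {total_words:,} words")
print(f"{share_per_genre:.0%} per genre = {words_per_genre:,} words")
```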
National Corpus of Russian
The National Corpus of Russian consists primarily of original prose texts representing the Russian literary language, but it also includes translated works (aligned in parallel with their originals), poetic texts, and texts representing non-literary forms of modern Russian: spoken (public and non-public speech) and dialectal.
The main corpus of texts representing the Russian literary language can be divided into two main groups, each with its own features: modern written texts (mid-20th to early 21st century) and early texts (mid-18th to mid-20th century).
Modern writing:
modern drama, memoirs and biographical literature, magazine journalism and literary criticism, newspaper journalism and news, scientific, popular-science and educational texts, religious and philosophical texts, engineering texts, official business and legal texts, everyday texts (including texts not intended for publication: personal correspondence, diaries, etc.)
