International Corpus Linguistics Research Unit (ICLRU)

The International Corpus Linguistics Research Unit focuses on the empirical investigation of linguistic data collected in naturalistic situations in the field. It supports researchers working on a number of languages in a range of domains.

Corpus material gathered by researchers attached to the Unit includes:

  1. The Beeching Corpus of Spoken French [K1] (155,000 words).
  2. Parallel (translation) corpora [K2] (zip file), including aligned translations between English and French, German, Spanish and Vietnamese in specialist areas such as Accounting, Cancer and Oncology, Legal, Official documentation, Physiotherapy and Renewable Energy.
  3. Small Corpora of Mosetén and Pirahã (spoken in the Bolivian Andes and Northwestern Brazil, respectively). Contact:
  4. Corpus of British and American song lyrics (500,000 words). Contact:
  5. Corpus of British and American Political Speeches (1 million words). Contact:
  6. The Bristol Corpus of Role-play dialogues (36,000 words).
  7. The Bristol Corpus of Learner Language, co-ordinated by Jeanine Treffers-Daller, now Director of the Centre for Literacy and Multilingualism, University of Reading. Email
  8. The John Turlik Corpus of Learner Language: Introduction to data and John Turlik Corpus (zip file).

Much of the work of researchers allied to the Unit is devoted to the study of language variation and change, language acquisition, intercultural communication, metaphor, stylometry and translation.
