The goal of the project presented on these pages is to develop the Russian Reference Corpus, amounting to approximately 100 million words and covering a wide range of text types and registers in modern Russian. In essence, it is an attempt to create a Russian equivalent of the British National Corpus. The corpus format is based on the XML version of the TEI scheme and on the EAGLES recommendations.
The typology for encoding text types is also based on the EAGLES guidelines. A short description of the project's aims and methods is available in:
Methods and tools for development of the Russian Reference Corpus. To appear in D. Archer, A. Wilson, P. Rayson (eds.), Corpus Linguistics Around the World. Amsterdam: Rodopi. (Available in PDF.)
The English version of the site is under development.
The pilot version of the corpus can be accessed at http://ruscorpora.ru (Russian only, powered by Yandex) and from the corpus collection page (powered by Leeds CQP). The latter includes not only the Reference Corpus but also fiction, newspapers and a large Russian Internet corpus. You can also check the frequency list of modern Russian.
See my corpus linguistics page for a description of current corpus linguistics activities within the Centre for Translation Studies.