Forty years of working with corpora: from Ibsen to Twitter, and beyond

How to Cite

Hofland, Knut, Paul Meurer, and Andrew Salway. 2013. “Forty Years of Working With Corpora: From Ibsen to Twitter, and Beyond”. Bergen Language and Linguistics Studies 3 (1).


We provide an overview of forty years of work with language corpora by the research group that started in 1972 as the Norwegian Computing Centre for the Humanities. A brief history highlights major corpora and tools that have been developed in numerous collaborations, including corpora of literature, dialect recordings, learner language, parallel texts, newspaper articles, blog posts and tweets. Current activities are also described, with a focus on corpus analysis tools, treebanks and social media analysis.

Keywords: corpus building; corpus analysis tools; treebanks; social media analysis

Copyright (c) 2013 Knut Hofland, Paul Meurer, Andrew Salway


This work is licensed under a Creative Commons Attribution 3.0 International License.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).

Bergen Open Access Publishing