The Bangor Autoglosser automatically glosses (POS-tags) CHAT files in Welsh, Spanish and English.
The code (licensed under the GPL v3) is available on GitHub.
A new version, Autoglosser2, is now available, focussed on written Welsh. This version has tidier code and is much faster (22,000 glosses/minute); see the manual for details.
For publications about the autoglosser, see the publications page.
The databundle referred to in the ISB8 presentation is available here.
bilingualism@bangor.ac.uk
The Siarad corpus
The Patagonia corpus
The Miami corpus
The support of the Arts and Humanities Research Council (AHRC), the Economic and Social Research Council (ESRC), the Higher Education Funding Council for Wales (HEFCW) and the Welsh Government is gratefully acknowledged.