Friday, 21 June 2019

Cornish corpus word clouds with the assistance of the Institute of the Czech National Corpus

Intro

Kwords is an online tool from the Institute of the Czech National Corpus, available at kwords.korpus.cz. It is optimised for Czech and English, but it can be used for other languages if you upload your own reference text.
I have fed it the traditional Cornish texts (originally prepared in digital form by Keith Syed of KDL), stripped of everything except the Cornish text itself. They are in my Bitbucket repository.
My method has been to use Origo Mundi as the reference text and compare everything else to it. Normally you would use a much more extensive reference corpus, but this doesn't really exist for Cornish. There is an option to exclude certain non-content words such as pronouns, prepositions, conjunctions and numbers, but this is only available for Czech and English.
The word clouds below show the keywords that the software has detected in each text by comparing the frequency of each word within the text to its frequency within the reference corpus (in this case, Origo Mundi).
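To make the idea of frequency comparison concrete, here is a minimal sketch in Python. It assumes a log-likelihood (G2) score, which is a common keyness measure in corpus linguistics; I am not claiming this is exactly what Kwords computes. The function names and the naive tokeniser are my own inventions for illustration, and real Cornish texts would need more careful handling of punctuation and apostrophes.

```python
import math
from collections import Counter

# Characters trimmed from the edges of each token (an assumption, not a Kwords rule).
PUNCT = ".,;:!?\"'()[]"

def tokenize(text):
    """Naive tokeniser: split on whitespace, trim punctuation, lowercase."""
    tokens = []
    for raw in text.split():
        word = raw.strip(PUNCT).lower()
        if word:
            tokens.append(word)
    return tokens

def log_likelihood(a, b, n1, n2):
    """G2 score for a word seen a times in the target (n1 tokens)
    and b times in the reference (n2 tokens)."""
    expected_target = n1 * (a + b) / (n1 + n2)
    expected_reference = n2 * (a + b) / (n1 + n2)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2 * g2

def keywords(target_text, reference_text, top=10):
    """Return up to `top` (word, score) pairs for words that are relatively
    more frequent in the target than in the reference, highest score first."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    n1 = sum(target.values())
    n2 = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / n1 > b / n2:  # keep only over-represented words
            scored.append((word, log_likelihood(a, b, n1, n2)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]
```

In use, you would read one play into `target_text` and Origo Mundi into `reference_text`, then call `keywords(target_text, reference_text)`. The higher-scoring words are the ones a word cloud would draw larger.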

Passyon agan Arloedh

Kommolenn Ger-alhwedh - Keyword cloud

drehevyskallawuthylhanethapertherodesfolbileniskilagalilatrygystewlysbyttegynskentrowmesterglywaskevysbarthkemmeraslaghysdyghtysgowsasyesuwodhyapaynysjustismariayntredhayndellmaylliyudasgrowsgewsisedhewonpilatyesuskristpederleverisethesavonesskrifysladerworthybisodywvynnaensbedhesveukemmyssynsikrowsvynnasjowlfeubennliesgorryswelasieurmaythkorfnynsythvoshidrepanorthdhedhaymabmeurdhevunnganshagdellpurnnymarnaragmayhayndhewarollywbysow

Bewnans Meryasek

Kommolenn Ger-alhwedh - Keyword cloud

nessaglywisongrassyesKernowsyghtVeryasekplesyesTewdardaderBretengwylvosemperouryurldhragonSylvesterWolkommgrassyesturantwordhiderMariaMeryasekYesudhyarlydhipysibeuKristdisongenowghsynsysjowlgalloesekDheommalelliesbynythayredihwipowwirpurythsursertandenvymeurnivynnmaywnydhymmoeusmardhynlemmynydhenadhbosgansnynowdhymmhebevthhabysArloedhdellvydhtydharagolldhiswarhagDyw

Gwreans an Bys

Kommolenn Ger-alhwedh - Keyword cloud

wondryspryvkevyskynthordyrbythkwethedrekkryswordhiHemmapaynysLusiferserpontdrefennDerHennaderorwtradhywelesomaythragdhoHwidoutwelprysKaymmeslelRakhennavrasHaommajyhennapuroovpubAdamgwrysdhymmovyyweustyvynnnyvydhdhymmdhemarbysnnamayhaghaowgansevTaswrathynhebdhisollDywdharag

Passio Christ

Kommolenn Ger-alhwedh - Keyword cloud

VabAnnastothgrowsloselLeverewghNazareHennatreysPederIouynCrystMyghternThasjustisYedhewonIhesuworthGodhedhywdhykoedhMablowenagwasmaravosHahwimeurdensertannebdhodhoydadredhevmarnypanlemmynywgansbosynwarragnnamahagnidellArloedhwrabysdheolldhisowthmayDywdhahahebvydh

Resurrectio Domini

Kommolenn Ger-alhwedh - Keyword cloud

VernonaprysongwelasdasserghikrysdhasserghiPilatMyghternMariadasserghysThomasCrystIhesubedhhedhywsevvonesdhygowdheuthMabkorfveujoynevvosmarowythmeurhwisurevArloedhnypurmardreniydhymmonanlemmynywtyyndelldhegansragbysowmahavydhthhaghebolldha

Pregothow Tregear

Kommolenn Ger-alhwedh - Keyword cloud

IreneusPERkontrariglorimenyakolonnowYndellmaAwgustinusyaIDENIMagesskrifapartdiskwedhysfolyaynnonmatervnderstondingiibeghosowYnwedhauncientsakramentepystylKatholikQUIAHOCMEhollAUTEMjerytapowerADesonlyverapostlysdyskansQUODMatthewVTsuffrainsubstansYowanndheragkaraDEIstatnaturverinenaderSEDCORPUSyndellmawarbynndhiapanaonlyRomEglosQUIDEerellawtoritaPowlkatholikSkryptorNONchaptraPedergeryowESTINSavyourETeglosfatellKristwrellaworthresonbaraSanshennanodhoYesusskrifystebelhonanusiwrugpogoescherytarannYesuesavosveuaralllieskynsaSpyryskorflelkowsagandhiworthythkighemmaHarinynsoniyllymadreevmaagahennaynwedhonanreshwihaywdhennebdenybosnagansmarynowragollhagdhpaneuswranybys

Tolkien

Kommolenn Ger-alhwedh - Keyword cloud

fydhavgeverdybavtybigilanowodhowskilaallasButterburlammaswydhhwedhelfenesterhedhisomglyweslieskweythwrussworefanomglywasyowynkrybdreylyaspalsgeryowdisliwesensavonpellderesedhasskeusvoesprederusowrEndBriBagbreowglywasElfowhobytlentgoedhasstevelldhosKoesdiwetthaayrgriasyeynGoldberriniwlWeltiwedhwovynnasmartesenerellvrekoselvirassevisvythdrefennhynsMerrikeverPypynShayrdiworthSamBylboTomMesFrodonebesesapoleverisEnaYthmesneppythdheuthbeukleromarnaskothowthEvnansdarasgasagolowfordhmiresethhowldosarallythsevelortovoshwirevIhonanimoyhirwosaorthveunynsvyagaHahidresgulhagmosdreNynsesbosleYgansYmahayndhekynsunnresyenadhndamartaowpanYnnanymanipurdhymmragbys

Skeul an Yeth 1

Kommolenn Ger-alhwedh - Keyword cloud

PamskavesovskolgothlestrierhwegynnowkuntellesredyagathPedersalowmartesenpoyntesensbravDyganasglewavtraowmebylstampyeynfeusikmynysennesonspronterfolennyethrudhnessawodhestaFatlahwedhelgwerthysYowannNosporangadoramarikarrjikadorkonvedhesEsaKernewekavonJoripeswarkeurnebonanNebeserellkernewekOttommamawMariastevellgourrybkiUsiHemmOttenaMrgegingewerlowarthFatellvoeskarregloslyverdybavYwplegEusNagPleDaPyugensostelesengesysPiwYthesedhaRokothYmaanwoeshelPythnowydhpraswarnynshiesachidarasmeslowrNynsohevelnynsythoHiNieurarghansywgwinhemmaKemmerenamahirgenesdalemmynNymardelleusevnawarngansydheowynhadha

Solempnyta

Kommolenn Ger-alhwedh - Keyword cloud

GovvythSowsMesPowyethShakespeareHenrySowsnektavesnebesponynsoleommaNynsvymaywragnyyndhehaow
Screenshot of the full output for Bewnans Meryasek

 

Applications

There are a number of applications of this kind of analysis, illustrated in the slides of a talk given at the Workshop on Quantitative Text Analysis for the Humanities and Social Sciences at Brown University in April 2016.
Comparison of annual addresses by Gustav Husák against a current corpus as the reference, versus against a communist newspaper as the reference. Slides: Václav Cvrček & Masako Fidler

Future plans are presented as something that everyone "wants" rather than something we "will do". Slides: Václav Cvrček & Masako Fidler

Slides: Václav Cvrček


We can see that choosing a different reference corpus leads to different keywords being noted by the software as important.
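The effect of the reference corpus can be shown with a toy sketch. The token lists below are invented English stand-ins, not real data, and the function names are hypothetical. The comparison is a bare relative-frequency test, not the statistic any particular tool uses.

```python
from collections import Counter

def relative_freq(tokens):
    """Map each word to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def overrepresented(target_tokens, reference_tokens):
    """Words whose relative frequency in the target exceeds that in the reference."""
    target = relative_freq(target_tokens)
    reference = relative_freq(reference_tokens)
    return sorted(w for w, f in target.items() if f > reference.get(w, 0.0))

# Invented toy data: the same target compared against two different references.
target = "plan plan work people".split()
ref_a = "plan plan plan work".split()   # reference already full of "plan"
ref_b = "people people work".split()    # reference already full of "people"

print(overrepresented(target, ref_a))  # ['people']
print(overrepresented(target, ref_b))  # ['plan']
```

The target text is identical in both calls. Only the reference changes, and with it the word that stands out.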
With Cornish we only have a small corpus available, so our future plan is that, in the next five-year plan, we increase the output of Cornish.

Gwren ni ynkressya agan eskorrans geryow Kernewek! Yn-rag kowetha yn unnveredh kuntellek! Gyllyn!
(Let us increase our output of Cornish words! Forward, comrades, in collective unity! We can!)