Monday 29 August 2016

Translation memory in Cornish - effect of using all bigrams and trigrams

Here is the effect of the two options in kovtreylyansGUI.py for matching bigrams and trigrams between the input text and the bilingual corpus sentences: match all of them, or only those that contain at least one word that is not a stopword. The stopwords are taken from the Python NLTK stopwords corpus (item 67 in NLTK data), a list of common words that are unlikely to carry much semantic content.
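The difference between the two options can be sketched as a simple filter over n-gram tuples. This is a minimal illustration, not code from kovtreylyansGUI.py: the function names are hypothetical, and the stopword set is a short stand-in for the real NLTK list, which would normally be loaded with `nltk.corpus.stopwords.words("english")` after downloading the stopwords corpus.

```python
# Hypothetical stand-in for NLTK's English stopword list (the real list is much longer).
STOPWORDS = {"there", "is", "a", "an", "the", "on", "in", "of", "and", "it"}

def has_content_word(ngram, stopwords=STOPWORDS):
    """True if at least one token in the n-gram is not a stopword."""
    return any(tok not in stopwords for tok in ngram)

def select_ngrams(ngrams, use_all, stopwords=STOPWORDS):
    """Return the n-grams to match against the corpus.

    use_all=True  -> every n-gram (the 'all' option)
    use_all=False -> only n-grams with at least one non-stopword
    """
    if use_all:
        return list(ngrams)
    return [g for g in ngrams if has_content_word(g, stopwords)]

bigrams = [("there", "is"), ("is", "a"), ("a", "red"), ("red", "sun")]
print(select_ngrams(bigrams, use_all=False))  # [('a', 'red'), ('red', 'sun')]
print(select_ngrams(bigrams, use_all=True))
```

With the non-stopword option, "there is" and "is a" are dropped because every word in them is a stopword; "a red" survives because "red" is not.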

Input sentence is "There is a red sun on Proxima b"

Full output - the non-stopwords option


trigrams for input sentence are:
[('there', 'is', 'a'), ('is', 'a', 'red'), ('a', 'red', 'sun'), ('red', 'sun', 'on'), ('sun', 'on', 'proxima'), ('on', 'proxima', 'b'), ('proxima', 'b', '.')]

bigrams for input sentence are:
[('there', 'is'), ('is', 'a'), ('a', 'red'), ('red', 'sun'), ('sun', 'on'), ('on', 'proxima'), ('proxima', 'b'), ('b', '.')]
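The trigram and bigram lists above can be reproduced with a short sketch. It assumes the input actually ended with a full stop (the '.' token suggests so) and uses a simplified regex tokenizer in place of NLTK's; the helper names are hypothetical, and `nltk.util.ngrams` does the same windowing job as the `ngrams` helper here.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into words and punctuation marks.

    Simplified stand-in for nltk.word_tokenize (assumption: the real script
    uses NLTK, which also splits contractions such as "isn't").
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())

def ngrams(tokens, n):
    """All contiguous n-token windows, in order."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = tokenize("There is a red sun on Proxima b.")
print(ngrams(tokens, 3))
print(ngrams(tokens, 2))
```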

Listing N-grams with a minimum of 1 non-stopword each:
Common trigrams:

Common bigrams:
Eus pluvenn rudh genes, mar pleg?     --  Have you a red pen, please?        

(a red)

Ass yns teg, pennow an menydhyow y'n  --  How beautiful the tops of the      
howlsedhes, an howl rudh a-ughta.     --  mountains are in the sunset, the red
                                      --  sun above them.                    

(red sun)
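The matching step behind this output can be sketched as intersecting the input n-grams with the n-grams of each English corpus sentence and printing the hits in brackets. This is an illustrative sketch under assumptions: the corpus is two pairs copied from the output above, the tokenizer is simplified, and the function names are hypothetical rather than taken from kovtreylyansGUI.py.

```python
import re

def tokenize(text):
    """Lower-cased words and punctuation (simplified stand-in for nltk.word_tokenize)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def ngrams(tokens, n):
    """Contiguous n-token windows."""
    return list(zip(*(tokens[i:] for i in range(n))))

def common_ngrams(input_ngrams, english, n):
    """N-grams of an English corpus sentence that also occur in the input.

    Every occurrence is kept, so an n-gram that appears twice in the corpus
    sentence is reported twice.
    """
    wanted = set(input_ngrams)
    return [g for g in ngrams(tokenize(english), n) if g in wanted]

# Two (Cornish, English) pairs copied from the output above.
corpus = [
    ("Eus pluvenn rudh genes, mar pleg?", "Have you a red pen, please?"),
    ("Ass yns teg, pennow an menydhyow y'n howlsedhes, an howl rudh a-ughta.",
     "How beautiful the tops of the mountains are in the sunset, "
     "the red sun above them."),
]

# Input bigrams that survive the non-stopword filter.
input_bigrams = [("a", "red"), ("red", "sun"), ("sun", "on"),
                 ("on", "proxima"), ("proxima", "b"), ("b", ".")]

results = []
for kernewek, english in corpus:
    found = common_ngrams(input_bigrams, english, 2)
    if found:
        results.append((kernewek, found))
        print(f"{kernewek}  --  {english}")
        print(", ".join("(" + " ".join(g) + ")" for g in found))
```

In the program's own output, a sentence listed under common trigrams does not appear to be repeated under common bigrams; the sketch does not try to reproduce that behaviour.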

Additional output using the 'all' option


Other N grams containing only stopwords:
Common trigrams:
Mes yma karr a-rag an chi.            --  But there is a car in front of the
                                      --  house.                            

(there is a)

Sur, yma lyver war an voes.           --  Certainly there is a book on the  
                                      --  table.                            

(there is a)

Yma avon vras ha hir yn Almayn.       --  There is a large, long river in   
                                      --  Germany.                          

(there is a)

Yma toll down yn kres an fordh.       --  There is a deep hole in the centre
                                      --  of the road.                      

(there is a)

Yma kador gesys y'n hel.              --  There is a chair left in the hall.

(there is a)

Yma fordh pur gul ryb an pras.        --  There is a very narrow road beside
                                      --  the field.                        

(there is a)

Usi! Hag yma lost hir ryb an hel      --  Yes! And there is a long queue    
ynwedh.                               --  beside the hall too.              

(there is a)

Yma unn chi a-dryv an sinema; chi Mr  --  There is a certain house behind the
Pollglas yw ev.                       --  cinema; it's Mr Pollglas's house. 

(there is a)

Kemmer an kowl ma, yma bollas ragos.  --  Take this soup, there is a bowlful
                                      --  for you.                          

(there is a)

Yma bownder verr ha kul ynter an      --  There is a short, narrow lane     
dhew bras vras.                       --  between the two big fields.       

(there is a)

Yma krys ow kregi war benn an gweli.  --  There is a shirt hanging on the end
                                      --  of the bed.                       

(there is a)

Yma ki owth hartha a-ves.             --  There is a dog barking outside.   

(there is a)

Yma nown dhe'n vebyon. An vamm a      --  The boys are hungry. Mother will  
vynn ri nebes boes dhedha. Mes eus    --  give them some food. But is there 
boes lowr y'n yeynell? Eus. Yma meur  --  enough food in the refrigerator?  
a vara hag amanenn gesys hwath        --  Yes! There is a lot of bread and  
ynwedh.                               --  butter still left as well.        

(there is a)

Yma tren skav dhe dhiw eur marnas     --  There is a fast train at          
teyr mynysenn warn ugens.             --  twenty-three minutes to two.      

(there is a)


Common bigrams:
Yma mebyl gesys ena y'n chi:          --  There is furniture left in the    
kador-vregh, kador, gweder ha         --  house: an arm-chair, chair, mirror
lestrier mes nyns eus moes ena.       --  and dresser but there isn't a table
                                      --  there.                            

(there is), (there is)

Hemm yw pluvenn.                  --  This is a pen.                      (is a)
Henn yw chi.                      --  That is a house.                    (is a)
An drehevyans na yw eglos.        --  That building is a church.          (is a)
Henn yw kenter.                   --  That is a nail.                     (is a)
An drehevyans a-rag an chi yw         --  The building in front of the house
karrji.                               --  is a garage.                      

(is a)

An dra na yw pluvenn.             --  That object is a pen.               (is a)
Honn yw Kernewes.                 --  That is a Cornish woman.            (is a)
Hemm yw aval hweg.                --  This is a sweet apple.              (is a)
An avon Tamer yw avon vras, down.     --  The River Tamar is a large, deep  
                                      --  river                             

(is a)

Hemm yw koes bras.                --  This is a big wood.                 (is a)
Hel an Dre yw drehevyans teg.         --  The town hall is a fine building. 

(is a)

An gour yw den da, dell dybav.        --  The husband is a good person, I   
                                      --  think.                            

(is a)

Gwydhelek yw yeth keltek          --  Irish is a Celtic language.         (is a)
Lyver berr yw lyver da, dell dybav.   --  A short book is a good book, I    
                                      --  think.                            

(is a)

An chi a-ji dhe'n koes yw chi         --  The house in the wood is a new    
nowydh.                               --  house.                            

(is a)

Gwreg an gour ma yw Kernewes dha.     --  This man's wife is a good         
                                      --  Cornishwoman.                     

(is a)

Kres an koes yw le kosel.             --  The middle of the wood is a quiet 
                                      --  place.                            

(is a)

Noy Mr Turner yw maw bras.            --  Mr Turner's nephew is a big boy.  

(is a)

Pow Frynk yw pow pur vras ha pur      --  France is a very big and beautiful
deg.                                  --  country.                          

(is a)

Onan yw brithel.                  --  One is a mackerel.                  (is a)
Hemm yw kerdh hir mes brav yw         --  This is a long walk but it's grand.

(is a)

Gour kloppek yw ev.               --  He is a lame man.                   (is a)
An Gresenn Gernewek yw le da ha dhe   --  The Cornish Centre is a good place
les yw hi rag tus Kernow.             --  and it is useful for the people of
                                      --  Cornwall.                         

(is a)

Yma dew dhen ha dew ugens y'n         --  There are forty-two people in the 
kuntelles. Hemm yw niver da rag       --  meeting. This is a good number for a
kuntelles a'n par ma.                 --  meeting of this kind.             

(is a)

Broder Androw yw pronter yn unn       --  Andrew's brother is a vicar in a  
eglos.                                --  certain church.                   

(is a)

Agan kesva yw onan dha.           --  Our association is a good one.      (is a)
Dyskador yw, dell glewav.         --  He is a teacher, I hear.            (is a)
Ammeth yw tra vras yn Kernow.         --  Agriculture is a big affair in    
                                      --  Cornwall.                         

(is a)

Pow pell yw Ejyp ha tir bras yw       --  Egypt is a distant country and it is
ynwedh.                               --  large also.                       

(is a)

Nag eus! Nyns eus karrji ena.         --  No! There isn't a garage there.   

(there is)

Eus, sur!                       --  There is, certainly!              (there is)
Nag eus!                        --  There isn't!                      (there is)
Yma nebonan a-ji dhe'n eglos na.      --  There is someone inside that church.

(there is)

Yma neppyth a-ragh an chi ma.         --  There is something in front of this
                                      --  house.                            

(there is)

Nyns eus kenter omma, dell hevel.     --  There is no nail here, it seems.  

(there is)

Eus! Yma an eglos ryb hel an dre      --  There is! The church is beside the
                                      --  town hall.                        

(there is)

Eus, dell hevel.                --  There is, it seems.               (there is)
Nag eus! Nyns eus tra omma.           --  There isn't. There's nothing here.

(there is)

Eus gwin gesys? Eus! Yma gwin y'n     --  Is there (any) wine left? There is!
gegin.                                --  There's wine in the kitchen.      

(there is)

Yma nebonan y'n gegin lemmyn.         --  There is someone in the kitchen now.

(there is)

Eus. Hi a wra glaw lemmyn.      --  There is. It's raining now.       (there is)
Piw yw Mr Lock ytho? Ottena! An gour  --  Who is Mr Lock then? Look! that man
ena yw Mr Lock, an gour hir na.       --  there is Mr. Lock, that tall man. 

(there is)

Nyns eus le gesys yn kres an dre      --  There isn't a place left in the town
lemmyn.                               --  centre now.                       

(there is)

Nyns eus nebonan gesys yn hel an      --  There is no-one left in the town  
dre.                                  --  hall.                             

(there is)

Nyns eus arghans lowr rag boes.       --  There is not enough money for food.

(there is)

Yma unn karr a-rag an chi.            --  There is one car in front of the  
                                      --  house.                            

(there is)

Yma unn garrek pur vras ena.          --  There is one very large rock there.

(there is)

Eus! Yma boes war blat y'n gegin      --  Yes! There is food on a plate in the
                                      --  kitchen.                          

(there is)

Nyns eus kyttrin dhe Druru kyns       --  There is no bus to Truro before four
peder eur.                            --  o'clock.                          

(there is)

Yma po korev po gwin gans an goen.    --  There is beer or wine with the    
                                      --  dinner.                           

(there is)

Yma spas lowr y'n le na lemmyn.       --  There is enough room in that place
                                      --  now.                              

(there is)

Y'n hel (yma onan).             --  In the hall (there is one).       (there is)
Eus! yma, dell hevel.           --  Yes! There is, it seems.          (there is)
An vamm re worras an kinyow war an    --  Mother has put the dinner on the  
voes lemmyn. Kynsa yma kowl onyon.    --  table now. First there is onion   
                                      --  soup.                             

(there is)

Eus nebes koffi gesys ragov? Nag      --  Is there a little coffee left for 
eus. Yma te hepken.                   --  me? No. There is tea only.        

(there is)

Re dhiwedhes os, ow howeth, ha nyns   --  You are too late, my friend, and  
eus hanafas a goffi ragos.            --  there is no cup of coffee for you.

(there is)

Nyns eus karr ow tos.           --  There is no car coming.           (there is)
Nyns eus nebonan ow kelwel.     --  There is no one calling.          (there is)
Eus!                            --  There is!                         (there is)
Nag eus!                        --  There is not!                     (there is)
Joy re skrifas dhodho mes nyns eus    --  Joy has written to him but there is
gorthyp hwath.                        --  no reply yet.                     

(there is)

Ottena an kok mes nyns eus den ynno.  --  There is the fishing boat but     
                                      --  there's no one in it.             

(there is)

Nyns eus karr vyth y'n fordh.         --  There is no car at all on the road.

(there is)

Nyns eus gesys kestenenn y'n koes,    --  There isn't a chestnut tree left in
dell dybav.                           --  the wood, I think.                

(there is)

Prag y tybydh yndella? Drefenn nag    --  Why do you think so? Because there
eus ken fordh dhe dybi.               --  is no other way to think.         

(there is)


With the 'all' option there is clearly a large number of matches to the trigram "there is a" and to the bigrams it contains, "there is" and "is a". For a long input text, the number of such matches can become very large.
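To see why the stopword-only n-grams swamp the results, a sketch can count matches per input bigram over a handful of English sentences taken from the output above. The stopword set is again a small hypothetical stand-in for NLTK's list, and the tokenizer is simplified.

```python
import re
from collections import Counter

# Small hypothetical stand-in for NLTK's English stopword list.
STOPWORDS = {"there", "is", "a", "on", "this", "that", "in", "the"}

def tokenize(text):
    """Lower-cased words and punctuation (simplified tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def ngrams(tokens, n):
    """Contiguous n-token windows."""
    return list(zip(*(tokens[i:] for i in range(n))))

def count_matches(input_ngrams, sentences, n):
    """How many times each input n-gram occurs across the corpus sentences."""
    wanted = set(input_ngrams)
    counts = Counter()
    for sentence in sentences:
        counts.update(g for g in ngrams(tokenize(sentence), n) if g in wanted)
    return counts

# English sides of a few corpus sentences from the output above.
sentences = [
    "There is a dog barking outside.",
    "This is a pen.",
    "That is a house.",
    "There is someone in the kitchen now.",
    "Have you a red pen, please?",
]

input_bigrams = ngrams(tokenize("There is a red sun on Proxima b."), 2)
counts = count_matches(input_bigrams, sentences, 2)
stop_only = sum(c for g, c in counts.items() if all(t in STOPWORDS for t in g))

print(counts.most_common())
print(f"{stop_only} of {sum(counts.values())} matches use only stopwords")
```

Even on five sentences, five of the six matches come from "is a" and "there is"; over a full corpus and a long input the same pattern repeats for every common function-word sequence.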

Notice that for the sentence
Yma mebyl gesys ena y'n chi:          --  There is furniture left in the    
kador-vregh, kador, gweder ha         --  house: an arm-chair, chair, mirror
lestrier mes nyns eus moes ena.       --  and dresser but there isn't a table
                                      --  there.                            

(there is), (there is)

the bigram "there is" is matched twice: once for "There is furniture" and once for "there isn't". NLTK tokenized "isn't" as "is n't" because, grammatically speaking, it is two words.
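The effect of splitting "isn't" can be illustrated with a simplified tokenizer that implements only the "n't" rule. NLTK's `word_tokenize` applies this rule (through its Treebank-style tokenizer) along with many others; the helper names below are hypothetical.

```python
import re

def split_negation(token):
    """Split a trailing "n't" off a word ("isn't" -> "is", "n't").

    Only this one Treebank-style rule is implemented; NLTK's tokenizer
    handles many more cases.
    """
    m = re.fullmatch(r"(\w+)(n't)", token)
    return [m.group(1), m.group(2)] if m else [token]

def raw_tokens(text):
    """Lower-cased words (apostrophes kept inside words) and punctuation."""
    return re.findall(r"[\w']+|[^\w\s]", text.lower())

def tokenize(text):
    """Raw tokens with "n't" split off, roughly as NLTK does."""
    tokens = []
    for tok in raw_tokens(text):
        tokens.extend(split_negation(tok))
    return tokens

def count_bigram(tokens, bigram):
    """Number of times the bigram occurs in the token list."""
    return list(zip(tokens, tokens[1:])).count(bigram)

sentence = ("There is furniture left in the house: an arm-chair, chair, "
            "mirror and dresser but there isn't a table there.")

with_split = count_bigram(tokenize(sentence), ("there", "is"))
without_split = count_bigram(raw_tokens(sentence), ("there", "is"))
print(with_split, without_split)  # 2 1
```

Without the split, "isn't" stays a single token and only "There is furniture" matches.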
