I have developed the translation memory software a little further as part of my
TaklowKernewek tools.
It now has a GUI:
[Screenshot] Using only bigrams and trigrams from the corpus that contain at least one non-stopword (based on the NLTK stopwords corpus).
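The stopword filter can be sketched roughly as follows. This is only an illustration, not the TaklowKernewek code: the function names `ngrams` and `content_ngrams` are hypothetical, and a small hard-coded `STOPWORDS` set stands in for the NLTK stopwords corpus so that the example runs without any downloads.

```python
# Stand-in stopword list (assumption); the tool itself uses the NLTK stopwords corpus.
STOPWORDS = {"is", "the", "a", "an", "of", "in", "on", "and", "to"}


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def content_ngrams(tokens, n, stopwords=STOPWORDS):
    """Keep only the n-grams that contain at least one non-stopword."""
    return [g for g in ngrams(tokens, n) if any(w not in stopwords for w in g)]


tokens = "the cat is on the mat".split()
print(content_ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'is'), ('the', 'mat')]
```

Bigrams made up entirely of stopwords, such as ('is', 'on') here, are dropped, so they can no longer match large numbers of unrelated sentences.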
[Screenshot] Showing all bigrams and trigrams outputs a long list of sentences containing ('is', 'the').
[Screenshot] Sentences in the corpus that have multiple trigrams in common with the input are ranked highest, and likewise for bigrams.
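One way such a ranking could work is sketched below. The scoring scheme is an assumption rather than the method the tool actually uses: each corpus sentence gets a pair (shared trigrams, shared bigrams), and sentences are sorted by that pair in descending order. The names `ngram_set` and `rank_matches` are hypothetical.

```python
def ngram_set(tokens, n):
    """Return the set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def rank_matches(query, corpus):
    """Rank corpus sentences by shared trigrams, then by shared bigrams."""
    q_tokens = query.lower().split()
    q_tri, q_bi = ngram_set(q_tokens, 3), ngram_set(q_tokens, 2)
    scored = []
    for sentence in corpus:
        tokens = sentence.lower().split()
        score = (len(q_tri & ngram_set(tokens, 3)),
                 len(q_bi & ngram_set(tokens, 2)))
        if score > (0, 0):  # skip sentences with nothing in common
            scored.append((score, sentence))
    # Tuples compare element by element, so trigram matches dominate.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sentence for _, sentence in scored]


corpus = [
    "a dog sat on the rug",
    "the cat sat on the mat",
    "birds fly south",
]
for sentence in rank_matches("the cat sat on the floor", corpus):
    print(sentence)
```

In this toy corpus, "the cat sat on the mat" shares three trigrams with the input and is listed first, "a dog sat on the rug" shares one and comes second, and "birds fly south" is left out.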
[Screenshot] The output after improving the text wrapping so that longer output sentences are split across lines.
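For illustration, wrapping long sentences before display can be done with Python's standard `textwrap` module. This is a sketch only; the helper name `wrap_sentence` and the width of 60 characters are assumptions, and the GUI may handle wrapping differently.

```python
import textwrap


def wrap_sentence(sentence, width=60):
    """Break a long sentence into lines of at most `width` characters."""
    return textwrap.fill(sentence, width=width)


long_sentence = (
    "Sentences in the corpus that contain multiple trigrams in common "
    "with the input are ranked highest, and similarly with bigrams."
)
print(wrap_sentence(long_sentence, width=40))
```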