Programs: Natural Language Processing
In natural language processing quite a lot depends on which language you are analysing, so some of my algorithms may work better on Polish texts (although I'm not sure about this). Anyway, that shouldn't stop you from trying ;) Be careful about the encoding: depending on the program, text is read as UTF-8, ISO-8859-2, or your system's default encoding.
- Lab 1: text statistics - counting the number of occurrences of each word in a text
- stat3.c (6 KB) - C version
- stat.phps (481 B) - PHP version
- stat.pl (407 B) - Perl version
- stat.py (499 B) - Python version
- TextStatistics.java (1.7 KB) - Java version
- pjn-lab1.zip (5.7 KB) - all files
- Lab 4: recognising the language of a text using N-grams, and generating random texts
- pjn-lab4.zip (7.6 KB) - C#/Mono
- Lab 5: spell checker using different algorithms (Levenshtein, N-gram, Soundex and Metaphone)
- pjn-lab5.zip (667 KB) - Java version
- Everything: nlp-all-en.zip (676.9 KB)
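
If you just want a feel for what the labs are about before downloading anything, here is a minimal Python sketch of three of the ideas: word counting (Lab 1), a character N-gram language guess (Lab 4) and Levenshtein distance (Lab 5). It is not taken from the archives above; the function names (`word_counts`, `char_ngrams`, `guess_language`, `levenshtein`) and the simple overlap score are my assumptions for illustration, and the real programs may work quite differently.

```python
import re
from collections import Counter


def word_counts(text):
    """Lab 1 idea: count how often each word occurs (case-insensitive)."""
    # In Python 3, \w is Unicode-aware for str, so Polish letters are kept.
    return Counter(re.findall(r"\w+", text.lower()))


def char_ngrams(text, n=3):
    """Lab 4 idea: frequency profile of character n-grams."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def guess_language(text, samples, n=3):
    """Pick the language whose sample text shares the most n-grams with text.

    `samples` maps a language name to a sample text in that language.
    The overlap score is a crude assumption, good enough for a demo only.
    """
    profile = char_ngrams(text, n)

    def overlap(lang):
        reference = char_ngrams(samples[lang], n)
        return sum(min(count, reference[gram]) for gram, count in profile.items())

    return max(samples, key=overlap)


def levenshtein(a, b):
    """Lab 5 idea: edit distance where insert, delete and substitute cost 1."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]
```

For example, `word_counts("Ala ma kota, a kot ma Alę")["ma"]` gives 2 and `levenshtein("kitten", "sitting")` gives 3. A spell checker could then suggest the dictionary words with the smallest distance to a misspelled word; the language guess needs much longer sample texts than a sentence or two to be of any real use.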