GitHub - davstott/davsFilters

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
README		README
charCountGraph.py		charCountGraph.py
charFreqCount.py		charFreqCount.py
charFreqCreate.py		charFreqCreate.py
dav.py		dav.py
dav2.py		dav2.py
dav3.py		dav3.py
day1.html		day1.html
englishFreqs		englishFreqs
map2.gif		map2.gif
map3.gif		map3.gif
merge.py		merge.py
ngram.py		ngram.py
ngramDump.txt		ngramDump.txt
ngramGenerate.py		ngramGenerate.py
ngramLoad.py		ngramLoad.py
ngramQuery.py		ngramQuery.py
otherFreqs		otherFreqs
pickledCharFreqs		pickledCharFreqs

Repository files navigation

Dav's initial wanderings after the Intro to AI course, finally put into a repos to keep things tidy and sensible. Expect incomplete randomness.

It feels possible to use less ram with the merge sort

The ngram code is starting to come together a little bit, although I've only trained it with one of Google's files so far. Using 2-grams doesn't give much contrast between English and French, but limiting it to 3 and 4 does better

echo "the queen sometimes likes pie" | python ngramQuery.py
echo "en mon vacance, je m'apelle Dav" | python ngramQuery.py


charCountGraph.py filename1 filename2 ... filenameN
  uses matplotlib to overlay line graphs of the character frequencies in a list of filenames or a single standard input.  Useful to compare between languages.

charFreqCreate.py
  reads the data files englishFreqs and otherFreqs and creates a python structure containing letter frequencies by language, stored in the data file pickledCharFreqs

charFreqCount.py filename1 filename2
  using the letter frequency counts stored in pickledCharFreqs, reads all the data from all the filenames specified and returns a relative probability score for each language. 
  The scores are not in any particular units (well, technically natural log postieror probabilities) so just use the numbers to sort the language names, but a higher number means a greater likelihood that the provided text matches any particular language.