Tuesday, 13 December 2016
The Common Ground Algorithm - A Possible Remedy for Filter Bubbles
Saturday, 27 February 2016
FizzBuzz
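def FizzBuzz():
    # Python 3: print the numbers 1 to 100, substituting Fizz, Buzz or FizzBuzz
    for i in range(1, 101):
        # join() takes a single iterable, so both parts go in one tuple
        word = ''.join(('Fizz' if i % 3 == 0 else '',
                        'Buzz' if i % 5 == 0 else ''))
        print(i if word == '' else word)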
Thursday, 22 October 2015
Integrating Java with Python the Easy Way
Tuesday, 9 June 2015
Emily Has Moved
Tuesday, 26 May 2015
Introducing Emily - my latest Fantastical Device
Emily is a semantic recommendation system for blogs that I've been working on. If you give it an Atom or RSS feed from a blog, it will create a feed of items from other blogs that hopefully match your interests.
It does this by using significant associations between words to infer your interests. Suppose a randomly chosen sentence from your blog has a probability P(A) of containing word A, and a probability P(B) of containing word B. If there were no relationship between the words, we would expect the probability of a sentence containing both words to be P(AB) = P(A)P(B). If the relationship between the words carries significant information, they will co-occur more frequently than this, and we can quantify this with an entropy, H = log2 P(AB) - log2 P(A) - log2 P(B), which is positive when the words co-occur more often than chance would predict.
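A minimal sketch of this calculation in Python might look like the following. It is not Emily's actual code: the function name `association_strengths`, the whitespace tokenisation and the sample sentences are all assumptions made for illustration, and probabilities are simply estimated as the fraction of sentences containing a word or pair of words.

```python
import math
from collections import Counter
from itertools import combinations


def association_strengths(sentences):
    """Return {(word_a, word_b): H} for every pair that co-occurs in a sentence.

    H = log2 P(AB) - log2 P(A) - log2 P(B), where each probability is the
    fraction of sentences containing the word (or both words).
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # Naive tokenisation (an assumption): lower-case, split on whitespace.
        words = set(sentence.lower().split())
        word_counts.update(words)
        # Sort so that each pair is always stored in the same order.
        pair_counts.update(combinations(sorted(words), 2))

    strengths = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        strengths[(a, b)] = math.log2(p_ab) - math.log2(p_a) - math.log2(p_b)
    return strengths


# Made-up sample data, purely for illustration.
sentences = [
    "python makes parsing easy",
    "python makes scripting easy",
    "conlang grammar is hard",
    "conlang lexicon is large",
]
strengths = association_strengths(sentences)
```

In this toy data, "python" and "makes" each appear in half of the sentences and always appear together, so P(AB) = 0.5 while P(A)P(B) = 0.25, giving H = 1 bit.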
Emily uses the strengths of these associations to calculate the similarity between two blogs. Then, if you post an article that makes your blog more similar to somebody else's blog than it was before, that article is recommended to them.
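The recommendation rule can be sketched as below. The post does not say how the similarity itself is computed, so the cosine similarity used here is purely an assumption, as are the names `similarity` and `should_recommend` and the sample data. Each blog is represented as a dictionary mapping word pairs to association strengths.

```python
import math


def similarity(blog_a, blog_b):
    """Cosine similarity between two {word_pair: strength} dicts.

    This measure is an assumption for illustration, not Emily's actual one.
    """
    shared = set(blog_a) & set(blog_b)
    dot = sum(blog_a[pair] * blog_b[pair] for pair in shared)
    norm_a = math.sqrt(sum(v * v for v in blog_a.values()))
    norm_b = math.sqrt(sum(v * v for v in blog_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_recommend(author_before, author_after, reader):
    """True if the new article moved the author's blog closer to the reader's."""
    return similarity(author_after, reader) > similarity(author_before, reader)


# Made-up association strengths, purely for illustration.
reader = {("grammar", "verb"): 1.5, ("lexicon", "root"): 2.0}
author_before = {("python", "script"): 1.0}
author_after = {("python", "script"): 1.0, ("grammar", "verb"): 1.2}

recommend = should_recommend(author_before, author_after, reader)
```

Here the new article introduces an association the reader's blog also has, so the similarity rises from zero and the article would be recommended.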
This has been an interesting project for me. I've learned about Google App Engine, PubSubHubbub and Atom. What I need now is for people to try it out. I'm looking forward to when Emily starts finding things for me.
Tuesday, 17 June 2014
NoSQL for Conlangers
In his blog, fellow conlanger Wm Annis writes that the best database format for dictionaries is text.
All his points are valid, but at one point he says "The standard is SQL", and that got me thinking. I've done a fair bit of work with SQL, and can do scary things with it, but I wouldn't choose to use it. It's inflexible and clunky. You have to decide your schema in advance, and if your requirements change at a later date, you have no choice but to rebuild entire tables. Anything more complex than a simple one-to-one relationship requires a second table and a join. SQL basically expects you to fit your data to the model, when what you need is to fit the model to your data. Using an ORM like SQLAlchemy doesn't help - it's just a layer of abstraction on top of an inherently clunky system.
For a good dictionary system, you need the flexibility of a NoSQL database. One popular system, which I've done a lot of work with, is MongoDB. This stores documents in a JSON-like format, so a dictionary entry might look like this:
{"word":"kitab",
 "definitions":[{"pos":"noun",
                 "definition":"book"}],
 "inflections":{"plural":{"nominative":"kutuub"}},
 "related":["muktib","kataaba"]}
If a field exists for some words but not others, you only need to put it in the relevant entries. If a field is variable length, you can store it in an array. One slight disadvantage is that cross-referencing between entries can be a little tricky.
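To make the optional fields and the cross-referencing issue concrete, here is a small sketch that uses plain Python dictionaries as a stand-in for a document store. It does not use MongoDB or its API. The second entry and its definition, the `by_word` index and the `related_entries` helper are all hypothetical.

```python
import json

# Hypothetical entries; the field names follow the example above.
entries = [
    {
        "word": "kitab",
        "definitions": [{"pos": "noun", "definition": "book"}],
        "inflections": {"plural": {"nominative": "kutuub"}},
        "related": ["muktib", "kataaba"],
    },
    {
        # No "inflections" or "related" here: the fields are simply omitted.
        # The definition is made up for this illustration.
        "word": "kataaba",
        "definitions": [{"pos": "verb", "definition": "write"}],
    },
]

# Index by headword so that cross-references can be resolved by hand.
by_word = {entry["word"]: entry for entry in entries}


def related_entries(word):
    """Return the stored entries that `word` refers to, skipping missing ones."""
    names = by_word[word].get("related", [])
    return [by_word[name] for name in names if name in by_word]


print(json.dumps(related_entries("kitab"), indent=2))
```

The "related" field only holds headwords, so following a cross-reference takes a second lookup, and a reference to a word with no entry of its own (here "muktib") has to be handled explicitly.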
Another possibility is ZODB. This is an object persistence system for Python objects. In many ways it's similar to MongoDB, but there's one important difference. If a member of a stored object is itself an object that inherits from persistent.Persistent, what is stored in the parent object is a reference to that object. Cross-referencing is therefore completely transparent. The only small disadvantage is that it's Python-specific, but unless you really need to write your dictionary software in a different language, that shouldn't be a big problem.
You might also want to consider a graph database like Neo4j. This stores data as a network of nodes and edges, like this
kitab-[:MEANS]->book
kitab-[:PLURAL]->kutuub-[:MEANS]->books
In theory, this is the most flexible form of database. I wouldn't say it was easy to learn or use, though.
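The node-and-edge model itself is simple to sketch in plain Python, even if a real graph database is harder to use. The following stores the edges from the example above as triples and follows a path of edge labels. It is only an illustration of the data model, not Neo4j or its query language, and the `follow` helper is hypothetical.

```python
# Edges as (source, label, target) triples, mirroring the example above.
edges = [
    ("kitab", "MEANS", "book"),
    ("kitab", "PLURAL", "kutuub"),
    ("kutuub", "MEANS", "books"),
]

# Adjacency index: node -> label -> list of target nodes.
graph = {}
for source, label, target in edges:
    graph.setdefault(source, {}).setdefault(label, []).append(target)


def follow(node, *labels):
    """Follow a path of edge labels from `node`, returning every node reached."""
    frontier = [node]
    for label in labels:
        frontier = [
            target
            for current in frontier
            for target in graph.get(current, {}).get(label, [])
        ]
    return frontier


# The meaning of the plural of "kitab": PLURAL first, then MEANS.
plural_meanings = follow("kitab", "PLURAL", "MEANS")
```

Adding a new kind of relationship only means adding triples with a new label, which is where the flexibility comes from.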
There are plenty of other NoSQL databases. These are just the ones I'd use, but I think they're all more suitable for dictionary software than SQL. Whichever you choose, do make sure you have a human-readable backup.