Tuesday, 13 December 2016
The Common Ground Algorithm - A Possible Remedy for Filter Bubbles
Saturday, 27 February 2016
FizzBuzz
def FizzBuzz():
    # Classic FizzBuzz: for 1 to 100, print 'Fizz' for multiples of 3,
    # 'Buzz' for multiples of 5, 'FizzBuzz' for both, and the number otherwise.
    for i in xrange(1, 101):
        word = ''.join(('Fizz' if i % 3 == 0 else '',
                        'Buzz' if i % 5 == 0 else ''))
        print i if word == '' else word
Thursday, 22 October 2015
Integrating Java with Python the Easy Way
Tuesday, 9 June 2015
Emily Has Moved
Thursday, 4 June 2015
Developing Emily - Revision 24: Porting to OpenShift. AppEngine wasn't suitable for the computationally intense parts of Emily.
Modify /trunk/Emily.py
Modify /trunk/EmilyBlogModel.py
Modify /trunk/EmilyTreeNode.py
Modify /trunk/emily.js
Porting to OpenShift. AppEngine wasn't suitable for the computationally intense parts of Emily.
from Subversion commits to project emily-found-a-thing on Google Code http://ift.tt/1G9GWoV
via IFTTT
Thursday, 21 May 2015
Developing Emily - Revision 23: Ready to launch
Modify /trunk/Emily.py
Modify /trunk/EmilyBlogModel.py
Modify /trunk/EmilyTreeNode.py
Add /trunk/emily.js
Ready to launch
from Subversion commits to project emily-found-a-thing on Google Code http://ift.tt/1IN7SNv
via IFTTT
Tuesday, 17 June 2014
NoSQL for Conlangers
In his blog, fellow-conlanger +Wm Annis writes that the best database format for dictionaries is text.
All his points are valid, but at one point he says "The standard is SQL", and that got me thinking. I've done a fair bit of work with SQL, and can do scary things with it, but I wouldn't choose to use it. It's inflexible and clunky. You have to decide your schema in advance, and if your requirements change at a later date, you have no choice but to rebuild entire tables. Anything more complex than a simple one-to-one relationship requires a second table and a join. SQL basically expects you to fit your data to the model, when what you need is to fit the model to your data. Using an ORM like SQLAlchemy doesn't help - it's just a layer of abstraction on top of an inherently clunky system.
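To make the joins point concrete, here's a minimal sketch using Python's built-in sqlite3 (the table and column names are invented for illustration): a word with more than one definition already forces a second table and a join.
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# The schema has to be decided up front; multiple definitions per word
# already need their own table
cur.execute('CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)')
cur.execute('CREATE TABLE definitions (word_id INTEGER, pos TEXT, definition TEXT)')

cur.execute("INSERT INTO words (word) VALUES ('kitab')")
cur.execute("INSERT INTO definitions VALUES (?, 'noun', 'book')", (cur.lastrowid,))

# Even a simple lookup of a word with its definitions needs a join
cur.execute('SELECT w.word, d.pos, d.definition '
            'FROM words w JOIN definitions d ON d.word_id = w.id')
print(cur.fetchall())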
For a good dictionary system, you need the flexibility of a NoSQL database. One popular system that I've done a lot of work with is MongoDB. This stores documents in JSON format, so a dictionary entry might look like this:
{"word":"kitab",
"definitions":[{"pos":"noun",
"definition":"book"]},
"inflections":{"plural":{"nominative":"kutuub"}},
"related":["muktib","kataaba"]}
If a field exists for some words but not others, you only need to put it in the relevant entries. If a field is variable length, you can store it in an array. One slight disadvantage is that cross-referencing between entries can be a little tricky.
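For what it's worth, here's a minimal sketch of storing and querying such an entry with PyMongo (assuming a local MongoDB server; the database and collection names are placeholders):
from pymongo import MongoClient

client = MongoClient()                     # assumes MongoDB running locally
entries = client['conlang']['dictionary']  # placeholder database/collection names

# Only the fields this particular word needs are present
entries.insert_one({
    'word': 'kitab',
    'definitions': [{'pos': 'noun', 'definition': 'book'}],
    'inflections': {'plural': {'nominative': 'kutuub'}},
    'related': ['muktib', 'kataaba'],
})

# Look the word up directly, or query inside nested fields with dot notation
print(entries.find_one({'word': 'kitab'}))
print(entries.find_one({'inflections.plural.nominative': 'kutuub'}))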
Another possibility is ZODB. This is an object persistence system for Python objects. In many ways it's similar to MongoDB, but there's one important difference. If a member of a stored object is itself an object that inherits from Persistent, what is stored in the parent object is a reference to that object. Cross-referencing is therefore completely transparent. The only small disadvantage is that it's Python-specific, but unless you really need to write your dictionary software in a different language, that shouldn't be a big problem.
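Here's a minimal sketch of the same idea with ZODB (the Entry class, file name and glosses are invented for illustration). Because Entry inherits from persistent.Persistent, the related list holds references to the other entries themselves, so following a cross-reference is just an attribute access.
import persistent
import transaction
import ZODB, ZODB.FileStorage

class Entry(persistent.Persistent):
    def __init__(self, word, definition):
        self.word = word
        self.definition = definition
        self.related = []   # will hold other Entry objects, stored as references

db = ZODB.DB(ZODB.FileStorage.FileStorage('dictionary.fs'))
root = db.open().root()

kitab = Entry('kitab', 'book')
kataaba = Entry('kataaba', 'to write')
kitab.related.append(kataaba)   # a reference to the other entry, not a copy
root['kitab'] = kitab
root['kataaba'] = kataaba
transaction.commit()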
You might also want to consider a graph database like Neo4j. This stores data as a network of nodes and edges, like this:
kitab-[:MEANS]->book
kitab-[:PLURAL]->kutuub-[:MEANS]->books
In theory, this is the most flexible form of database. I wouldn't say it was easy to learn or use, though.
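If you do want to try it, here's a rough sketch using the official Python driver (assuming a local Neo4j server; the node labels, relationship types and connection details are placeholders I've made up for illustration):
from neo4j import GraphDatabase

# Connection details are placeholders for a local server
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))

with driver.session() as session:
    # Words and glosses become nodes; MEANS and PLURAL become relationships
    session.run('MERGE (w:Word {form: $word}) '
                'MERGE (g:Gloss {text: $gloss}) '
                'MERGE (w)-[:MEANS]->(g)',
                word='kitab', gloss='book')
    session.run('MERGE (w:Word {form: $word}) '
                'MERGE (p:Word {form: $plural}) '
                'MERGE (w)-[:PLURAL]->(p)',
                word='kitab', plural='kutuub')

driver.close()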
There are plenty of other NoSQL databases - these are just the ones I'd use - but I think any of them is more suitable for dictionary software than SQL. But do make sure you have a human-readable backup.
Saturday, 26 April 2014
Experimenting with IFTTT
I've just started trying out IFTTT. Partly this is because the Feedburner feed for this blog has needed manual prompting to update my Twitter feed, but also because I'm investigating using it to post automatically to Blogger on behalf of my friends at Speculative Grammarian.
To do this, I'm using a feed from one of my Google Code projects. It's a semantic recommendation system I've been working on. I call it Emily, because it finds things (or at least, it will do when it's up and running). Code updates from the project should be appearing here.
Wednesday, 5 March 2014
One of my Fantastical Devices is on PyPI
Run sudo pip install Markov and try it out. If you feel you can help me improve it, contact me and I can add you to the Google Code project.
Monday, 24 June 2013
A Couple of my Fantastical Devices
Saturday, 9 February 2013
Custom Sorting For Conlangs again
class CustomSorter(object):
    def __init__(self, alphabet):
        # alphabet is a list of letters, possibly multi-character (e.g. 'ch'),
        # in the desired sort order
        self.alphabet = alphabet

    def __call__(self, word):
        # Convert a word into a list of alphabet indices,
        # suitable for use as a sort key
        head, tail = self.separate(word)
        key = [self.alphabet.index(head)]
        if len(tail):
            key.extend(self(tail))
        return key

    def separate(self, word):
        # Split off the longest leading letter, skipping any
        # characters that aren't in the alphabet
        candidates = self.Candidates(word)
        while candidates == []:
            word = word[1:]
            candidates = self.Candidates(word)
        candidates.sort(key=len)
        head = candidates.pop()
        tail = word[len(head):]
        return head, tail

    def Candidates(self, word):
        # All alphabet letters that the word starts with
        return [letter for letter in self.alphabet if word.startswith(letter)]
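A quick usage sketch (the alphabet and word list here are invented examples): an instance works as the key argument to sorted, so the digraph 'ch' sorts as a single letter after 'c'.
alphabet = ['a', 'b', 'c', 'ch', 'd', 'e', 'i', 'k', 't', 'u']
words = ['chiku', 'cada', 'bata', 'achi']

# Each word becomes a list of alphabet indices, so 'ch' counts as one letter
print(sorted(words, key=CustomSorter(alphabet)))
# ['achi', 'bata', 'cada', 'chiku']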
Saturday, 28 July 2012
Custom sorting for conlangs
class CustomSorter(object):
    def __init__(self, alphabet):
        # alphabet is a list of letters, possibly multi-character,
        # in the desired sort order
        self.alphabet = alphabet

    def __call__(self, word1, word2):
        # Comparator: negative if word1 sorts first, positive if word2
        # sorts first, zero if the words are equal
        comp = 0
        if word1 == '' and word2 == '':
            comp = 0
        elif word1 == '':
            comp = -1
        elif word2 == '':
            comp = 1
        else:
            head1, tail1 = self.separate(word1)
            head2, tail2 = self.separate(word2)
            if head1 == head2:
                comp = self(tail1, tail2)
            else:
                comp = self.alphabet.index(head1) - self.alphabet.index(head2)
        return comp

    def separate(self, word):
        # Split off the longest leading letter, skipping any
        # characters that aren't in the alphabet
        candidates = self.Candidates(word)
        while candidates == []:
            word = word[1:]
            candidates = self.Candidates(word)
        candidates.sort(key=len)
        head = candidates.pop()
        tail = word[len(head):]
        return head, tail

    def Candidates(self, word):
        # All alphabet letters that the word starts with
        return [letter for letter in self.alphabet if word.startswith(letter)]
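A quick usage sketch for this version (again with an invented alphabet and word list): under Python 2 the instance is passed as the cmp argument to sorted; under Python 3 you'd wrap it with functools.cmp_to_key instead.
alphabet = ['a', 'b', 'c', 'ch', 'd', 'e', 'i', 'k', 't', 'u']
words = ['chiku', 'cada', 'bata', 'achi']

# Python 2: sorted() takes a comparator directly
print(sorted(words, cmp=CustomSorter(alphabet)))
# ['achi', 'bata', 'cada', 'chiku']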