Showing posts with label python. Show all posts

Tuesday, 13 December 2016

The Common Ground Algorithm - A Possible Remedy for Filter Bubbles

People have a tendency towards Confirmation Bias, whereby they seek out things that confirm their existing opinions and avoid things that challenge them. On social networks and recommendation systems, this can lead to the development of a filter bubble, whereby their sources of information come to be structured around what they already believe. This, of course, acts as an obstacle to healthy discussion between people of differing opinions, and causes their positions to become ever more deeply entrenched and polarised. Instead of seeing those with whom they differ as being decent people who have something of value to offer them, and who may be persuadable on some of their differences, people start seeing their opponents as the enemy.

To prevent this, people need something that will put them in touch with people with whom they have generally opposing viewpoints. Of course, we can't just confront people with contrary opinions - this will risk provoking hostile reactions. What we need is to show people what they have in common with those whose opinions are different, so that they can build trust and begin to interact in a healthy way.

As an attempt to do this, I present The Common Ground Algorithm. This uses a combination of topic modelling and sentiment analysis to characterise a user's opinions. It then finds people whose opinions are generally opposed to theirs, and identifies the topics on which they share common ground, recommending posts where they agree on something with people they disagree with in general. I've coded up a reference implementation in Python, and am releasing it under the MIT Licence to encourage its use and further development.
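The matching step can be sketched like this (the users, topics, and scores below are invented for illustration; in the reference implementation the scores come from the topic-modelling and sentiment-analysis stages):

```python
# Each user is characterised by a sentiment score per topic, in [-1, 1].
# The names and numbers here are invented for illustration.
users = {
    'alice': {'economy': 0.8, 'climate': 0.9, 'sport': 0.6},
    'bob':   {'economy': -0.7, 'climate': -0.8, 'sport': 0.7},
}

def agreement(a, b):
    """Mean product of sentiments over shared topics; negative = opposed."""
    shared = set(a) & set(b)
    return sum(a[t] * b[t] for t in shared) / len(shared)

def common_ground(a, b):
    """Topics on which two users' sentiments point the same way."""
    return sorted(t for t in set(a) & set(b) if a[t] * b[t] > 0)

a, b = users['alice'], users['bob']
if agreement(a, b) < 0:         # generally opposed...
    print(common_ground(a, b))  # ...yet they agree on: ['sport']
```

Posts on the common-ground topics are then the ones worth recommending across the divide.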

Saturday, 27 February 2016

FizzBuzz

def FizzBuzz():
    for i in xrange(1, 101):
        word = ('Fizz' if i % 3 == 0 else '') + ('Buzz' if i % 5 == 0 else '')
        print i if word == '' else word

Thursday, 22 October 2015

Integrating Java with Python the Easy Way

I have an idea for something I want to build, which will involve a speech recognition component, written in Java, and a Hidden Markov Model, written in Python. So that means I have to integrate components written in two different languages. What's the best way of doing it?

One way would be to run Python on the JVM. There is a Python implementation for the JVM, Jython, but from what I've heard it's painfully slow. Since I'm aiming for something as close to real time as possible, it's unlikely to meet my needs.

It did occur to me that there could be a faster way to run Python on the JVM. Pypy is a self-hosting, JIT-compiled implementation of Python, which is much faster than the reference implementation. If its code generation phase were modified to emit Java Bytecode, then Pypy could run on the JVM. This approach, which I call Jypy, would be a worthwhile project for somebody who knows Java Bytecode. Unfortunately, I'm not that person.

However, I then thought about the architecture of my project. I'd already realised that it would have to be organised as a number of concurrent processes, communicating via pipes. I then realised that meant that I didn't need to run Python on the JVM at all. The Java and Python components could each run in their own processes, and didn't need to share any resources. The only integration I needed was pipes. You know the sense of delight when you realise that something complicated is actually simple? That's how I felt when I worked that out.
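On the Python side, the pipes approach can be sketched with the subprocess module. The child process here is a stand-in one-liner that echoes its input in upper case; in the real system it would be the Java component (something like `java -jar recogniser.jar`, a hypothetical command for illustration):

```python
import subprocess
import sys

# Any program that reads lines on stdin and writes lines on stdout can sit
# on the other end of the pipe. A tiny Python one-liner plays that role here.
child = subprocess.Popen(
    [sys.executable, '-c',
     'import sys\n'
     'for line in sys.stdin:\n'
     '    sys.stdout.write(line.upper())'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

# Send a line down the pipe and read the child's reply.
out, _ = child.communicate('hello over the pipe\n')
print(out)  # HELLO OVER THE PIPE
```

Because the protocol is just lines of text over stdin/stdout, neither side needs to know what language the other is written in.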

Tuesday, 9 June 2015

Emily Has Moved

As those of you who've tried out my semantic recommendation system, Emily, will have noticed, it didn't work. The reason was, I'd used the wrong cloud platform. Google App Engine isn't meant for anything that needs as much computation as Emily does, so I've ported Emily to OpenShift. This has the advantage that it gives me much more control over how I write the code, and I can use things like MongoDB and multiprocessing. Let's try this again!

Thursday, 4 June 2015

Developing Emily - Revision 24: Porting to OpenShift

Changed Paths:
    Modify    /trunk/Emily.py
    Modify    /trunk/EmilyBlogModel.py
    Modify    /trunk/EmilyTreeNode.py
    Modify    /trunk/emily.js

Porting to OpenShift. AppEngine wasn't suitable for the computationally intense parts of Emily.

from Subversion commits to project emily-found-a-thing on Google Code http://ift.tt/1G9GWoV
via IFTTT

Thursday, 21 May 2015

Developing Emily - Revision 23: Ready to launch

Changed Paths:
    Modify    /trunk/Emily.py
    Modify    /trunk/EmilyBlogModel.py
    Modify    /trunk/EmilyTreeNode.py
    Add    /trunk/emily.js

Ready to launch

from Subversion commits to project emily-found-a-thing on Google Code http://ift.tt/1IN7SNv
via IFTTT

Tuesday, 17 June 2014

NoSQL for Conlangers

In his blog, fellow-conlanger +Wm Annis writes that the best database format for dictionaries is text.

All his points are valid, but at one point he says "The standard is SQL", and that got me thinking. I've done a fair bit of work with SQL, and can do scary things with it, but I wouldn't choose to use it. It's inflexible and clunky. You have to decide your schema in advance, and if your requirements change at a later date, you have no choice but to rebuild entire tables. Anything more complex than a simple one-to-one relationship requires a second table and a join. SQL basically expects you to fit your data to the model, when what you need is to fit the model to your data. Using an ORM like SQLAlchemy doesn't help - it's just a layer of abstraction on top of an inherently clunky system.

For a good dictionary system, you need the flexibility of a NoSQL database. One popular system, that I've done a lot of work with, is MongoDB. This stores documents in JSON format, so a dictionary entry might look like this:

{"word": "kitab",
 "definitions": [{"pos": "noun",
                  "definition": "book"}],
 "inflections": {"plural": {"nominative": "kutuub"}},
 "related": ["muktib", "kataaba"]}

If a field exists for some words but not others, you only need to put it in the relevant entries. If a field is variable length, you can store it in an array. One slight disadvantage is that cross-referencing between entries can be a little tricky.
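A quick way to see the shape of such an entry is to load it with Python's json module (a MongoDB driver such as pymongo would hand you the same nested structure as a Python dict):

```python
import json

# The dictionary entry from the example above, parsed into nested
# Python dicts and lists.
entry = json.loads('''{
  "word": "kitab",
  "definitions": [{"pos": "noun", "definition": "book"}],
  "inflections": {"plural": {"nominative": "kutuub"}},
  "related": ["muktib", "kataaba"]
}''')

# Optional fields are just missing keys; variable-length fields are lists.
print(entry["inflections"]["plural"]["nominative"])  # kutuub
print(len(entry["related"]))                         # 2
```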

Another possibility is ZODB. This is an object persistence system for Python objects. In many ways it's similar to MongoDB, but there's one important difference. If a member of a stored object is itself an object that inherits from persistent.Persistent, what is stored in the parent object is a reference to that object. Cross-referencing is therefore completely transparent. The only small disadvantage is that it's Python-specific, but unless you really need to write your dictionary software in a different language, that shouldn't be a big problem.

You might also want to consider a graph database like Neo4j. This stores data as a network of nodes and edges, like this:

(kitab)-[:MEANS]->(book)
(kitab)-[:PLURAL]->(kutuub)-[:MEANS]->(books)

In theory, this is the most flexible form of database. I wouldn't say it was easy to learn or use, though.

There are plenty of other NoSQL databases; these are just the ones I'd use. I think they're all more suitable for dictionary software than SQL, though. But do make sure you have a human-readable backup.

Saturday, 26 April 2014

Experimenting with IFTTT

I've just started trying out IFTTT. Partly this is because the Feedburner feed for this blog has needed manual prompting to update my Twitter feed, but also because I'm investigating using it to post automatically to Blogger on behalf of my friends at Speculative Grammarian.

To do this, I'm using a feed from one of my Google Code projects. It's a semantic recommendation system I've been working on. I call it Emily, because it finds things (or at least, it will do when it's up and running). Code updates from the project should be appearing here.

Wednesday, 5 March 2014

One of my Fantastical Devices is on PyPI

I've mentioned in previous posts that I've been working on a Python library for Hidden Markov Models. I've been encouraged to put this up on the Python Package Index, so, after a little while spent getting the hang of registering and uploading a project, here it is. It's alpha, of course, so there are probably plenty of bugs to be found in it, but if you want to play with something I've made, all you have to do is type
sudo pip install Markov
and try it out. If you feel you can help me improve it, contact me and I can add you to the Google Code project.

Monday, 24 June 2013

A Couple of my Fantastical Devices

With the recent news about the Voynich Manuscript, as mentioned in my last post, I thought it opportune to share a couple of pieces of code I'd written. First off, as I mentioned earlier, a couple of years ago I wrote a Python implementation of Montemurro and Zanette's algorithm for calculating the entropy of words in documents. If you're interested in using the technique yourself, you may want to have a look. Secondly, my own attempts to uncover the syntax use a Python library for Hidden Markov Models that I created. It probably still has a few bugs in it, but it's attracted a bit of interest online, and I'm hoping to develop it further. So, if you're at all interested in AI, computational linguistics, or analytics, please have a look at these. Feedback is welcome, as is anybody who wishes to contribute further to these projects.

Saturday, 9 February 2013

Custom Sorting For Conlangs again

I've just revised that Python code I posted a while back for sorting lists of strings in customized alphabetical orders. I realized that it would be more efficient to implement it as a key function (which is evaluated once per item) rather than a cmp function (which is evaluated for each pair of items). Fortunately Python compares lists in a similar way to strings, which makes this possible.
class CustomSorter(object):
    def __init__(self,alphabet):
        self.alphabet=alphabet

    def __call__(self,word):
        head,tail=self.separate(word)
        if head=='':
            return []
        key=[self.alphabet.index(head)]
        if len(tail):
            key.extend(self(tail))
        return key

    def separate(self,word):
        candidates=self.Candidates(word)
        while word and candidates==[]:
            word=word[1:]
            candidates=self.Candidates(word)
        if candidates==[]:
            return '',''
        candidates.sort(key=len)
        head=candidates.pop()
        tail=word[len(head):]
        return head,tail

    def Candidates(self,word):
        return [letter for letter in self.alphabet if word.startswith(letter)]
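As a quick check of the key-function idea, here's a standalone sketch of the same technique (the Spanish-like alphabet with the digraph 'ch', and the words, are invented examples):

```python
def make_key(alphabet):
    """Build a sort key over a custom alphabet; digraphs match greedily."""
    def key(word):
        ks = []
        while word:
            matches = [a for a in alphabet if word.startswith(a)]
            if not matches:              # ignore characters not in the alphabet
                word = word[1:]
                continue
            head = max(matches, key=len)  # greedy: prefer 'ch' over 'c'
            ks.append(alphabet.index(head))
            word = word[len(head):]
        return ks
    return key

alphabet = ['a', 'b', 'c', 'ch', 'd', 'e', 'i', 'n', 'o', 't']
words = ['chico', 'dato', 'cena']
print(sorted(words, key=make_key(alphabet)))  # ['cena', 'chico', 'dato']
```

Note that 'chico' sorts after 'cena' because 'ch' counts as a single letter placed after 'c' in this alphabet.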

Saturday, 28 July 2012

Custom sorting for conlangs

Sorting a list of words into alphabetical order is usually a trivially easy task. But if your words are in a conlang, alphabetical order might be different from usual. Here's a Python class that, when instantiated with a list of strings (alphabet), creates a callable object that can be used as the cmp argument of Python's list.sort() method. alphabet can contain digraphs, in which case matching is greedy, and the CustomSorter will ignore any characters not found in alphabet, which is useful for separating pairs of characters that might otherwise resemble digraphs. If you're using Python 3, you'll have to wrap the CustomSorter in functools.cmp_to_key. (Hope preformatted text works)
class CustomSorter(object):
    def __init__(self,alphabet):
        self.alphabet=alphabet

    def __call__(self,word1,word2):
        comp=0
        if word1=='' and word2=='':
            comp=0
        elif word1=='':
            comp=-1
        elif word2=='':
            comp=1
        else:
            head1,tail1=self.separate(word1)
            head2,tail2=self.separate(word2)
            if head1==head2:
                comp=self(tail1,tail2)
            elif head1=='':
                comp=-1
            elif head2=='':
                comp=1
            else:
                comp=self.alphabet.index(head1)-self.alphabet.index(head2)
        return comp

    def separate(self,word):
        candidates=self.Candidates(word)
        while word and candidates==[]:
            word=word[1:]
            candidates=self.Candidates(word)
        if candidates==[]:
            return '',''
        candidates.sort(key=len)
        head=candidates.pop()
        tail=word[len(head):]
        return head,tail

    def Candidates(self,word):
        return [letter for letter in self.alphabet if word.startswith(letter)]
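In Python 3, where the cmp argument is gone, the wrapping looks like this. A stripped-down single-character comparator stands in for the full class, with an invented alphabet:

```python
from functools import cmp_to_key

# Minimal comparator over a custom single-character alphabet
# (the alphabet and words here are invented for illustration).
def make_cmp(alphabet):
    def compare(w1, w2):
        k1 = [alphabet.index(c) for c in w1 if c in alphabet]
        k2 = [alphabet.index(c) for c in w2 if c in alphabet]
        return (k1 > k2) - (k1 < k2)  # cmp protocol: -1, 0 or 1
    return compare

alphabet = ['o', 'n', 'm', 'l']
print(sorted(['lomo', 'mono', 'nol'], key=cmp_to_key(make_cmp(alphabet))))
# ['nol', 'mono', 'lomo']
```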