The Fantastical Devices of Pete the Mad Scientist: internet

Showing posts with label internet. Show all posts

Tuesday, 13 December 2016

The Common Ground Algorithm - A Possible Remedy for Filter Bubbles

People have a tendency towards Confirmation Bias, whereby they seek out things that confirm their existing opinions and avoid things that challenge them. On social networks and recommendation systems, this can lead to the development of a filter bubble, whereby their sources of information come to be structured around what they already believe. This, of course, acts as an obstacle to healthy discussion between people of differing opinions, and causes their positions to become ever more deeply entrenched and polarised. Instead of seeing those with whom they differ as being decent people who have something of value to offer them, and who may be persuadable on some of their differences, people start seeing their opponents as the enemy. To prevent this, people need something that will put them in touch with people with whom they have generally opposing viewpoints. Of course, we can't just confront people with contrary opinions - this will risk provoking hostile reactions. What we need is to show people what they have in common with those whose opinions are different, so that they can build trust and begin to interact in a healthy way. As an attempt to do this, I present The Common Ground Algorithm. This uses a combination of topic modelling and sentiment analysis to characterise a user's opinions. It then finds people whose opinions are generally opposed to theirs, and identifies the topics on which they share common ground, recommending posts where they agree on something with people they disagree with in general. I've coded up a reference implementation in Python, and am releasing it under the MIT Licence to encourage its use and further development.

Wednesday, 17 June 2015

The Bootstrap Problem

A post on Data Community DC discusses Why You Should Not Build a Recommendation Engine. The main point is that recommendation engines need a lot of data to work properly, and you're unlikely to have that when you start out.

I know the feeling. In a previous job I created a recommendation engine for a business communication system. It used tags on the content and user behaviour to infer the topics that the user was most likely to be interested in, and recommend content accordingly. Unfortunately, my testbed was my employer's own instance of the product, and the company was a start-up that was too small to need its own product. I never really got a handle on how well it worked.

This brings me to Emily. Emily isn't a product. It's a personal portfolio project. I had an idea for a recommendation system that would infer users' interests from content they posted in blogs, and recommend similar content. The problem is, the content it recommends comes from the other users, so at its current early stage of operation, it doesn't have much to recommend. The more people use it, the better it will become, but what's the incentive to be an early adopter?

What I seem to have at the moment is a recommendation engine that needs somebody to recommend it.

Tuesday, 9 June 2015

Emily Has Moved

As those of you who've tried out my semantic recommendation system, Emily, will have noticed, it didn't work. The reason was, I'd used the wrong cloud platform. Google App Engine isn't meant for anything that needs as much computation as Emily does, so I've ported Emily to OpenShift. This has the advantage that it gives me much more control of how I write the code, and I can use things like MongoDB and multiprocessing. Let's try this again!

Tuesday, 26 May 2015

Introducing Emily - my latest Fantastical Device

Emily is a semantic recommendation system for blogs that I've been working on. If you give it an Atom or RSS feed from a blog, it will create a feed of items from other blogs that hopefully match your interests.

It does this by using significant associations between words to infer your interests. Suppose a randomly-chosen sentence from your blog has a probability P(A) of containing word A, and a probability P(B) of containing word B. If there were no relationship between the words, we would expect the probability of a sentence containing both words to be P(AB)=P(A)P(B). If there is significant information contained in the relationship between the words, they will cooccur more frequently than this, and we can quantify this with an entropy, H=log2 P(AB) - log2 P(A) - log2 P(B)

Emily uses the strengths of these associations to calculate the similarity between two blogs. Then, if you post an article that makes your blog more similar to somebody else's blog than it was before, that article is recommended to them.

This has been an interesting project for me. I've learned about Google App Engine, pubsubhubbub and Atom. What I need now is for people to try it out. I'm looking forward to when Emily starts finding things for me.

Thursday, 28 April 2011

Delicious has a new owner

As you can see, I'm a user of the social bookmarking site Delicious, or to use its original name, del.icio.us. It was previously owned by Yahoo!, which announced last December that they didn't want it anymore. This led to panic amongst Delicious users, as a rumour went round that the site was going to close. Yahoo! quickly issued a statement that they weren't closing it, they were selling it, and now the new owners have been announced - AVOS, a company founded by the users of YouTube. Hopefully they might develop the site a bit more - Delicious has had the feel of a resource with untapped potential for a while.

And while we're at it, can we have the cool URL back please?