Wednesday, March 22, 2017

Thoughtflow.io - connecting people by sentence sentiment clustering

To present the idea and implementation of Thoughtflow, I'll start off with a question. What do an Overwatch server outage rage forum, a terrorist nest, a gathering place for Bruce Lee quote enthusiasts, a puppy death cry-over, a chat group full of people who just farted, an interface for organizing an ad hoc hiking trip or swingers party, and any other place you can imagine where resonating thoughts meet and empathy flourishes have in common? Let me cut the BS and get to the answer.

Thoughtflow is designed to extract sentiment (meaning) from sentences and group them by similarity. Think of a search bar: you type in your thought as a sentence, and you get grouped up with others who typed in analogous sentences.

One level deeper: Thoughtflow uses machine intelligence to turn human-readable text into a numerical vector, which then addresses a point in a high-dimensional space. In this metric space, distances between vectors can be defined and used to tie nearby ones together, and so to establish groups of related sentences - that is, to cluster them.
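To make "distance between sentence vectors" concrete, here is a minimal sketch. The three-dimensional vectors are made up for illustration (a real encoder outputs far more dimensions), and Euclidean distance is an assumption on my part, chosen only because it is the simplest true metric.

```python
import math

# Made-up 3-dimensional "sentence vectors" (illustrative values only;
# they do not come from any real encoder).
vectors = {
    "my dog died today": [0.9, 0.1, 0.0],
    "I lost my puppy": [0.8, 0.2, 0.1],
    "anyone up for a hike?": [0.0, 0.9, 0.4],
}


def nearest(query, vectors):
    """Return the other sentence whose vector is closest to the query's."""
    others = (s for s in vectors if s != query)
    # math.dist computes the Euclidean distance between two points.
    return min(others, key=lambda s: math.dist(vectors[query], vectors[s]))


closest = nearest("my dog died today", vectors)
print(closest)
```

Sentences with similar meaning should land close together, so the two dog-related sentences pair up while the hiking one stays far away.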

Outline of Thoughtflow: the web UI accepts a sentence and forwards it to Skip-Thought, which spits out the encoded sentence as a numerical vector. The sentence vector is consumed by DBSCAN, which groups it by semantic similarity with the other live sentences/thoughts in the system. Within a group, users can post on the wall, comment, or vote - at the moment.
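The clustering step of that outline can be sketched roughly as follows. This is a bare-bones, pure-Python DBSCAN written for illustration, not Thoughtflow's actual code: the input vectors are made-up stand-ins for Skip-Thought output, and the `eps` and `min_samples` values and the Euclidean distance are all assumptions. A real system would more likely call a library implementation.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks ungrouped points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing
    return labels


# Hypothetical sentence vectors: three sad-dog thoughts, two hiking
# thoughts, and one thought that resembles nothing else.
thoughts = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.00, 0.90, 0.40],
    [0.10, 0.80, 0.50],
    [0.50, 0.50, 5.00],
]

labels = dbscan(thoughts, eps=0.3, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

A nice property for this use case is that DBSCAN does not need the number of groups up front, and a thought with no close neighbors is simply left ungrouped (label -1) until similar ones show up.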

So, how could you turn a sentence into a vector while holding onto its meaning? Or, to start with a simpler problem, how could you turn a single word into a vector in the same manner? That's what word2vec models are for! Here's how they work.