Sunday, December 24, 2017

AI in Recruitment: Word2Vec Opens up Interesting Possibilities

CVs and job descriptions are text-heavy, and they share all the challenges of natural language: multiple ways of describing the same concept, ambiguity, synonyms and so on. Meaningful interpretation of text requires extracting this knowledge in a machine-understandable form. Similar problems exist in speech recognition, machine translation and conversational systems like Siri, among others.

AI systems that process images work on a rich, high-dimensional representation: a vector of values for each pixel, arranged in a two-dimensional image. Most of the information needed to recognise an image is already present in that raw representation. However, most text processing systems use a "bag of words" representation of text, in which each word is represented by an arbitrary ID. For example, Infosys and TCS may be represented as, say, ID75698 and ID98603. These IDs do not capture the contextual relationship between the two words, which recruiters or jobseekers would otherwise understand and use.
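To see why arbitrary IDs lose the relationship between words, here is a minimal sketch, assuming a toy two-line "corpus" and IDs assigned in order of first appearance (both hypothetical). Each word becomes a one-hot vector, and any two different words then have a cosine similarity of exactly zero, so Infosys is no closer to TCS than to any other word.

```python
import math

def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def one_hot(word, vocab):
    """Represent a word as a vector with a single 1.0 at its ID position."""
    vec = [0.0] * len(vocab)
    vec[vocab[word]] = 1.0
    return vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy text: two near-identical job lines differing only in the company.
tokens = "java developer at Infosys java developer at TCS".split()
vocab = build_vocab(tokens)
print(vocab)  # {'java': 0, 'developer': 1, 'at': 2, 'Infosys': 3, 'TCS': 4}
print(cosine(one_hot("Infosys", vocab), one_hot("TCS", vocab)))  # 0.0
```

The ID numbers themselves are meaningless: swapping them would change nothing, which is exactly why this representation cannot express that both words name IT services companies.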

Latent Semantic Analysis (LSA) is a technique that condenses counts of co-occurring words into a smaller number of topics or concepts. It has been shown that LSA recognises a shallow kind of topical similarity and does not work well where subtle semantic relationships between words are present.
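A small sketch of the LSA idea, assuming NumPy and a hypothetical four-document toy corpus: build a term-document count matrix, take a truncated singular value decomposition (SVD), and compare terms in the reduced "topic" space. Infosys and TCS never appear in the same document, so their raw count vectors are orthogonal, yet LSA places them together because they share co-occurring words. Note that it also places them right alongside "java" and "developer", which is the shallow, topical kind of similarity described above.

```python
import numpy as np

# Hypothetical toy corpus: two IT job snippets and two healthcare snippets.
docs = [
    "java developer infosys",
    "java developer tcs",
    "nurse hospital",
    "nurse hospital care",
]
terms = sorted({w for d in docs for w in d.split()})
index = {t: i for i, t in enumerate(terms)}

# Term-document count matrix: rows are terms, columns are documents.
A = np.zeros((len(terms), len(docs)))
for j, doc in enumerate(docs):
    for w in doc.split():
        A[index[w], j] += 1

# Truncated SVD: keep only the k strongest latent "topics".
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vecs = U[:, :k] * s[:k]  # each row is a term in the k-dimensional topic space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(A[index["infosys"]], A[index["tcs"]]))                   # raw counts: 0.0
print(cosine(term_vecs[index["infosys"]], term_vecs[index["tcs"]]))   # close to 1.0
print(cosine(term_vecs[index["infosys"]], term_vecs[index["java"]]))  # also close to 1.0
```

In practice a library routine such as scikit-learn's `TruncatedSVD` would usually replace the hand-rolled `np.linalg.svd` call on large, sparse matrices.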

In contrast, predictive methods like Word2Vec learn a function that captures the salient statistical characteristics of sequences of words. This function associates each word with a continuous-valued vector that corresponds to a point in a feature space.
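One way to picture that learned function is as a lookup table from words to vectors plus a rule that scores how likely one word is to appear near another. The sketch below assumes the skip-gram formulation with a plain softmax over a hypothetical five-word vocabulary and a tiny, hypothetical vector size of 4. The vectors are random and untrained, so the probabilities it prints mean nothing yet; training would adjust them. Real Word2Vec implementations usually avoid the full softmax over the vocabulary by using negative sampling or hierarchical softmax.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

vocab = ["java", "developer", "at", "Infosys", "TCS"]  # hypothetical toy vocabulary
dim = 4  # hypothetical, deliberately tiny embedding size

# Skip-gram keeps two tables: one vector per word in its role as a centre word,
# and another per word in its role as a context word. Training adjusts both.
centre_vecs = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
context_vecs = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def context_probs(centre):
    """P(context | centre): softmax over the vocabulary of centre-context dot products."""
    scores = {w: dot(centre_vecs[centre], context_vecs[w]) for w in vocab}
    top = max(scores.values())  # subtract the max before exp() for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

probs = context_probs("Infosys")
print({w: round(p, 3) for w, p in probs.items()})
```

After training, the rows of `centre_vecs` are what is usually kept and called the word vectors.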

Word2Vec takes raw text as input. Training its skip-gram variant means finding, for each word, a vector representation that best predicts the words in a window around it. One can imagine that each dimension of that space corresponds to a semantic or grammatical characteristic of words.
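The "window of surrounding words" turns raw text into (centre word, context word) training pairs. Here is a minimal sketch, assuming simple whitespace tokenisation and a fixed window size; the function name `skipgram_pairs` and the sample line are hypothetical. Word2Vec implementations typically also vary the effective window size per word, which this sketch leaves out.

```python
def skipgram_pairs(tokens, window=2):
    """Return (centre, context) pairs for every word within `window` positions."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs

# Hypothetical CV fragment.
tokens = "senior java developer at Infosys Bangalore".split()
for pair in skipgram_pairs(tokens, window=1)[:4]:
    print(pair)
# ('senior', 'java')
# ('java', 'senior')
# ('java', 'developer')
# ('developer', 'java')
```

Training then nudges the vectors so that, for each pair, the centre word's vector scores its observed context word highly. Words that keep appearing in similar contexts, such as company names in job histories, end up with similar vectors.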

The hope is that similar words end up closer to each other in that space. For example, we may expect Infosys Technologies and TCS, both companies, to lie much closer to each other in this feature space than to unrelated words. That opens up new possibilities for AI in recruitment.
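"Closer in the feature space" is usually measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors that are invented purely for illustration and are not the output of any trained model. It shows how a nearest-neighbour query would surface TCS as the word most similar to Infosys.

```python
import math

# Hypothetical vectors invented for illustration only; a trained Word2Vec
# model would learn much longer vectors (often a few hundred dimensions) from text.
vectors = {
    "Infosys": [0.9, 0.8, 0.1],
    "TCS":     [0.85, 0.75, 0.2],
    "java":    [0.2, 0.9, 0.7],
    "nurse":   [-0.7, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def most_similar(word, topn=3):
    """Rank all other words by cosine similarity to `word`, highest first."""
    scores = [(w, cosine(vectors[word], v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

for w, score in most_similar("Infosys"):
    print(f"{w}: {score:.3f}")
```

With a real trained model, libraries such as gensim offer an equivalent query (`most_similar` on a model's word vectors), so a recruiter-facing system could, for instance, treat experience at one company as evidence of relevant experience at its nearest neighbours.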