CVs and jobs are text-heavy and share all the challenges
that exist with natural language: multiple ways of describing the same
concept, ambiguity, synonyms and so on. Meaningful interpretation of text requires
extracting this knowledge in a machine-understandable form. Similar
problems exist in, among others, speech recognition, machine translation and
conversational systems like Siri.
AI systems that process images work on a high-dimensional
vector representation of each pixel embedded in a two-dimensional image. Most
of the information needed to recognise images is present in that two-dimensional
grid of vectors. However, most text processing
systems use a “bag of words” representation of text, in which each word is
represented by an ID. For example, Infosys and TCS may be represented as, say, ID 75698
and ID 98603. Such systems do not use the contextual relationship between the two words,
which recruiters or jobseekers can readily understand and process.
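To make the limitation concrete, here is a minimal sketch of a bag-of-words encoding. The two sample sentences and the helper names `build_vocab` and `bag_of_words` are made up for illustration; real systems use much larger vocabularies, but the idea is the same.

```python
def build_vocab(documents):
    """Assign an arbitrary integer ID to each distinct token."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(doc, vocab):
    """Return {word_id: count}; word order and context are discarded."""
    counts = {}
    for token in doc.lower().split():
        if token in vocab:
            word_id = vocab[token]
            counts[word_id] = counts.get(word_id, 0) + 1
    return counts


# Hypothetical sample text, not real job data.
docs = [
    "java developer at infosys",
    "java developer at tcs",
]
vocab = build_vocab(docs)
bows = [bag_of_words(d, vocab) for d in docs]

# "infosys" and "tcs" are simply two different integers; nothing in the
# representation says that both are IT services companies.
print(vocab["infosys"], vocab["tcs"])

# The only overlap the representation can see is exact shared words.
shared_ids = set(bows[0]) & set(bows[1])
print(shared_ids)
```

The two sentences overlap only on the words they literally share. The IDs for Infosys and TCS are as unrelated as the IDs for any other pair of words.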
Latent Semantic Analysis is a technique that condenses the
statistical counts of co-occurring words into topics or concepts. It has been
shown that Latent Semantic Analysis recognises a shallow kind of topical
similarity but does not work well where subtle semantic relationships between words
are present.
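For readers who like to see the condensation step written down: in its usual formulation, Latent Semantic Analysis applies a truncated singular value decomposition to a matrix of word counts. The symbols below are notation chosen here for illustration, with X a term-by-document count matrix and k the number of topics kept.

```latex
X \approx U_k \, \Sigma_k \, V_k^{\top},
\qquad
X \in \mathbb{R}^{m \times n},\;
U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k},\;
k \ll \min(m, n)
```

Each row of U_k Σ_k can be read as a word described by its weights on k topics, which is where the topical notion of similarity comes from.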
In contrast, predictive methods like Word2Vec learn a
function that captures the salient statistical characteristics of the
distribution of sequences of words. The function associates each word with a
continuous-valued vector representation that corresponds to a point in a
feature space.
Word2Vec takes raw text as input, and the Word2Vec
model (skip-gram) is trained to arrive at vector representations of words that best
predict a window of surrounding words. One can imagine that each dimension of
that space corresponds to a semantic or grammatical characteristic of words.
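The "window of surrounding words" can be illustrated with a small sketch that generates the (centre word, context word) pairs a skip-gram model would be trained to predict. The function name `skipgram_pairs`, the sample sentence and the window size are assumptions chosen for illustration; the sketch covers only the preparation of training pairs, not the training of the vectors.

```python
def skipgram_pairs(tokens, window=2):
    """Return (centre, context) pairs for every word within `window`
    positions of each centre word."""
    pairs = []
    for i, centre in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs


# Hypothetical sample sentence, not real job data.
tokens = "senior java developer at infosys".split()
pairs = skipgram_pairs(tokens, window=1)

for centre, context in pairs:
    print(centre, "->", context)
```

Words that keep appearing with the same neighbours, for example company names following "developer at", are pushed toward similar vectors during training.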
The hope is that similar words end up closer to each
other in that space; that is, we may expect Infosys Technologies and TCS, as
companies, to be much closer to each other in this feature space. That opens up new
possibilities for AI in recruitment.
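"Closer" is commonly measured with cosine similarity between word vectors. The sketch below uses tiny three-dimensional vectors whose values are made up purely for illustration; they are not the output of any trained model, and real Word2Vec vectors typically have a hundred or more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-picked vectors (NOT trained embeddings).
vectors = {
    "infosys": [0.9, 0.8, 0.1],
    "tcs": [0.8, 0.9, 0.2],
    "chef": [0.1, 0.2, 0.9],
}

sim_companies = cosine_similarity(vectors["infosys"], vectors["tcs"])
sim_unrelated = cosine_similarity(vectors["infosys"], vectors["chef"])

print("infosys vs tcs :", round(sim_companies, 3))
print("infosys vs chef:", round(sim_unrelated, 3))
```

With vectors like these, the two companies score higher against each other than either does against an unrelated word, which is the behaviour we hope a trained model will show.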
To learn more about Word2Vec, please see Getting Started with Word2Vec.
- Vivek Jain
Please also see my blog posts on (1) AI in Recruitment - Understanding Skills and Designations, (2) Story of Naukri Job Alerts, (3) AI in Recruitment - Do Job Descriptions Represent the Intent of the Recruiter? and (4) AI in Recruitment - Is Mumbai closer to Delhi than Agra?