Sunday, December 24, 2017

AI in Recruitment: Word2Vec Opens up Interesting Possibilities

CVs and job descriptions are text-heavy and share all the challenges of natural language: multiple ways of describing the same concept, ambiguity, synonyms, etc. Meaningful interpretation of text requires extracting this knowledge in a machine-understandable form. Similar problems exist in, among others, speech recognition, machine translation and conversational systems like Siri.

AI systems that process images work on a high-dimensional vector representation of each pixel in a two-dimensional image. Most of the information needed to recognise images is present in this two-dimensional array of vectors. However, most text processing systems use a “bag of words” representation of text, in which each word is represented by an ID. For example, Infosys and TCS may be represented as, say, ID75698 and ID98603. Such systems do not use the contextual relationship between the two words, which recruiters or jobseekers readily understand and process.
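To make the limitation concrete, here is a minimal sketch of what an ID-based representation implies. The three-word vocabulary and its IDs are invented for illustration; they are not taken from any real system.

```python
# Toy vocabulary: each word gets an arbitrary integer ID (IDs are made up).
vocab = {"infosys": 0, "tcs": 1, "agra": 2}

def one_hot(word):
    """Return the one-hot vector implied by an ID-based representation."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Any two different words have dot product 0, so "infosys" looks exactly
# as unrelated to "tcs" (another IT company) as it does to "agra" (a city).
sim_infosys_tcs = dot(one_hot("infosys"), one_hot("tcs"))
sim_infosys_agra = dot(one_hot("infosys"), one_hot("agra"))
```

Both similarities come out identical, which is the point: the IDs carry no notion of one word being closer to another.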

Latent Semantic Analysis is a technique which condenses the statistical counts of co-occurring words into topics or concepts. It has been shown that Latent Semantic Analysis recognises a shallow kind of topical similarity and does not work well where subtle semantic relationships between words are present.

In contrast, predictive methods like Word2Vec learn a function that captures the salient statistical characteristics of the distribution of sequences of words. The function associates each word with a continuous-valued vector representation that corresponds to a point in a feature space.

Word2Vec takes raw text as input. Training the Word2Vec model (skip-gram) arrives at vector representations of words that best predict a window of surrounding words. One can imagine that each dimension of that space corresponds to a semantic or grammatical characteristic of words.
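The "window of surrounding words" can be illustrated by generating the (center word, context word) pairs that a skip-gram model is trained to predict. This is only a sketch of the data preparation step, not of the training itself; the function name, the sample sentence and the window size are assumptions made for the example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical snippet of CV text, already lower-cased and tokenised.
sentence = "java developer at infosys bangalore".split()
pairs = skipgram_pairs(sentence, window=1)
```

With a window of 1, "infosys" is paired with "at" and "bangalore" but not with "java". Words that keep appearing in similar contexts are pushed towards similar vectors during training.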

The hope is that similar words end up closer to each other in that space; that is, we may expect Infosys Technologies and TCS, as companies, to be much closer to each other in this feature space. That opens up new possibilities for AI in recruitment.
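"Closer in feature space" is commonly measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors purely for illustration; they are not the output of a trained Word2Vec model, and real embeddings typically have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy vectors, NOT learned from data.
vectors = {
    "infosys": [0.9, 0.8, 0.1],
    "tcs": [0.8, 0.9, 0.2],
    "agra": [0.1, 0.2, 0.9],
}

sim_companies = cosine(vectors["infosys"], vectors["tcs"])
sim_mixed = cosine(vectors["infosys"], vectors["agra"])
```

With these toy values the two companies score much higher with each other than a company does with a city, which is the behaviour one hopes a trained model would show.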

Thursday, October 12, 2017

Naukri.com featured as an important case study in KrantiNation

Naukri.com has been featured as an important case study in KrantiNation for using Machine Learning. According to the book's author, Pranjal Sharma, Machine Learning is a key technology for the 4th Industrial Revolution.

For more details on the book, please see KrantiNation: India and the Fourth Industrial Revolution

Saturday, September 23, 2017

AI in Recruitment : Is Mumbai closer to Delhi than Agra?

Jobseekers prefer to work closer to home, in their native town or their current location. They may also prefer specific locations because there are more job opportunities in that city. For example, Mumbai is a hub for financial services and Bangalore for IT jobs. That said, IT companies now have centers across all major metros and even in small cities like Indore, Jaipur and Trivandrum.

Jobseekers are willing to move from (say) Agra to Delhi; however, it is hard for an organization to convince anyone to move from Delhi to Agra. The charm of a large metropolis, with its education, healthcare, entertainment and modern lifestyle, is attracting talent towards larger cities. It has become a one-way street.

As a recruiter (and hiring manager), when I look at a candidate, I ask: is he more likely to move to Mumbai from Delhi, or will he prefer to move to a location near Delhi, say Agra? Often, geographical distance does not represent user preferences. Unless there is some personal connection with a smaller town, or incentives are offered with a promise of a better location in the future, candidates are unwilling to move to a smaller city or town. (Note: Agra is also developing very fast; preferences can change in the future.)
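One way to picture this is a directional preference score derived from observed relocations rather than from distance. The sketch below is purely illustrative: the move counts are invented, and the function is a hypothetical example, not a description of how any production system works.

```python
# Hypothetical counts of observed relocations: (origin, destination) -> moves.
# The numbers are invented for illustration only.
moves = {
    ("Delhi", "Mumbai"): 120,
    ("Delhi", "Agra"): 5,
    ("Agra", "Delhi"): 90,
    ("Agra", "Mumbai"): 30,
}

def move_preference(origin, destination):
    """Share of relocations out of `origin` that went to `destination`."""
    total = sum(n for (src, _), n in moves.items() if src == origin)
    if total == 0:
        return 0.0  # no relocation data for this origin
    return moves.get((origin, destination), 0) / total

delhi_to_mumbai = move_preference("Delhi", "Mumbai")
delhi_to_agra = move_preference("Delhi", "Agra")
agra_to_delhi = move_preference("Agra", "Delhi")
```

Unlike distance, this score is not symmetric: with these made-up counts, Agra to Delhi scores far higher than Delhi to Agra, and Delhi to Mumbai scores higher than Delhi to Agra even though Agra is much nearer.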


Location may look like a simple "Yes" or "No", yet many variables come into play in the Indian context. Some jobseekers want to live close to family and some away from it. Preferences evolve as "the family" evolves and its needs change. A large number of jobseekers are willing to change location for the "better opportunity".

Location Preference Within a City
Yet, we see several employees depart because Gurgaon or Noida are too far from their current residence. Within a city, geographical distance or the daily commute is a major driver for employee satisfaction. An employee who was unhappy with his daily commute may eventually change the city itself (and not change his residence within the city).

AI Algorithm Must Understand the Preferences
The nuances of large versus small cities, distance within a city and personal preferences are all challenges for the AI algorithm to overcome.

- Vivek Jain

Please also see my blog post on (1) AI in Recruitment - Understanding Skills and Designations, (2) Story of Naukri Job Alerts, and (3) AI in Recruitment - Do Job Descriptions Represent the Intent of the Recruiter? 

Saturday, September 16, 2017

Naukri RMS - Nominated for IDC AP Digital Transformation Awards 2017

Naukri RMS received the IDC India Digital Transformational Award last month. Congratulations to the Naukri team, and thanks to IDC. Naukri RMS is the new-age Recruitment Management System which automates the recruitment process end-to-end, from Requisition to Offer. With over 3000 customers within three years of its launch, Naukri RMS has become the leader in this space.

Naukri RMS (earlier known as Naukri CSM) has been nominated for the Regional Awards - IDC AP Digital Transformation Awards 2017.



For more details on IDC Digital Awards, please visit - IDC Digital Summit 2017



Thursday, September 14, 2017

AI in Recruitment - Do Job Descriptions Represent the Intent of the Recruiter?

Job descriptions are an essential part of recruitment. Once a hiring manager creates a requisition and gets it approved, a recruiter will work with the hiring manager to create a job description. A job description has a dual purpose:

(1) it helps to attract jobseekers by pitching the unique attributes of the role for which the recruiter is hiring and the reasons why a jobseeker would like to work in the advertised role, and

(2) it enables the recruiter to specify what kind of candidates she is looking for, and jobseekers to know whether or not they are qualified for the requirement.

Job descriptions, however, may fail to deliver on these two promises.

Recruiters may not have a job description to begin with, and they end up writing it with sketchy details on what a person is expected to do. Often the requirement evolves as the hiring manager and the recruiter meet jobseekers. Once the recruiting team knows what kind of skills are available, and if no matching candidates are found for the given set of requirements, hiring managers may modify their requirements.

Will recruiters update the job descriptions and re-advertise the positions with the new and updated job descriptions? Sometimes, yes and sometimes, no. If there are sufficient candidates available in the already received "applies", the recruiting team may decide to rely on the existing candidates and not re-advertise the updated requirements.

Now, if the job description is very well documented and the recruiter has already hired against the same position earlier, we can expect the job descriptions to represent the intent of the recruiter. That said, the AI algorithm is typically built on historic job descriptions and the response of the recruiters (in aggregate) to applies, hence, some of the "ambiguity" in the recruiter response is already embedded in the AI algorithm. This "ambiguity" may not always be helpful to the recruiter.

-Vivek Jain

Note- Even if job descriptions completely represent the intent of the recruiter, does the AI algorithm completely understand what is specified by the recruiter in the job description?

Please also see my blog post on (1) AI in Recruitment - Understanding Skills and Designations, and (2) Story of Naukri Job Alerts

Monday, September 11, 2017

AI in Recruitment - Understanding Designations and Skills

Relevance of jobs for candidates and of candidates for recruiters is the most important challenge for AI in recruitment. Whether it is an Applicant Tracking System or a job portal, recruiters want an easy mechanism to identify the most relevant candidate. That said, only a recruiter knows what she wants. The AI algorithm only knows the job description which she shares with the system (there still exists a gap between what she wants and what the description says).

Over the last few years, this has been an area of major focus and attention for our team at Naukri.com. Here I will discuss some elements which are important in solving this challenge.

Challenge 1: Complexity of Indian Economy - No one sector or Industry dominates

India is a large country with several hundred industries and sectors, and with companies of varying size. Every organization has many unique roles and designations that employees carry. Even within the organized sector, we have more than a few thousand roles and maybe more than 50,000 designations. The AI algorithm needs to understand what each of these designations stands for.

Challenge 2: Creative Designations

Every organization is creative with designations, and internal designations are often created to balance organizational challenges and individual aspirations. In many companies, Software Developers carry designations like Software Engineer, SSE -1, SSE -2 or Member of Technical Staff. However, a few companies call their Quality Engineers Software Engineers.

Often designations are created to represent evolving role descriptions based on unique organizational requirements. For example, a few years ago, Mid-Office was created as a designation to distinguish teams from Front Office and Back Office. Similarly, we have seen new-age professions emerge, for example, Digital Marketing, SEO Specialist, Social Media Marketing Manager, Data Scientist and so on.

For a system to understand the requirement, the AI algorithm must first understand the designations, as well as the similar or related designations which other companies may use.

Challenge 3: Some Designations carry no information about role

Often designations carry no information about the specific domain or the role. Some jobseekers write designations as Vice President, Manager, Senior Manager, Officer etc.

Challenge 4: Skills, Regions, Divisions are part of Designations

Skills are also part of designations, and are often used to differentiate employees in the same role who have specialized skills or areas of responsibility. For example, Software Developer, C++ Developer, Java Developer, Senior Engineer- COBOL and so on. In the Sales function, we may have designations like Sales Regional Manager, Territory Manager - Bhopal, Area Sales Manager- Mangalore, Regional Manager - Paints and Specialty Chemicals etc. As we can observe, cities and business units have been appended to these designations to differentiate sales managers playing similar roles with special focus areas.
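A rough sketch of how such designations might be broken into a base role plus qualifiers is shown below. The lookup sets and the `parse_designation` helper are hypothetical and deliberately tiny; a real library of designations and skills would be far larger, and a naive hyphen split like this one would mishandle hyphenated names such as Mid-Office.

```python
# Hypothetical lookup sets; a real library would be far larger.
KNOWN_SKILLS = {"c++", "java", "cobol"}
KNOWN_CITIES = {"bhopal", "mangalore"}

def parse_designation(designation):
    """Split a designation into a base role plus skill, city and division qualifiers."""
    # Text before the first hyphen is the role; text after it is a qualifier.
    head, _, tail = designation.partition("-")
    tokens = head.split()
    result = {
        "role": " ".join(t for t in tokens if t.lower() not in KNOWN_SKILLS),
        "skills": [t for t in tokens if t.lower() in KNOWN_SKILLS],
        "city": None,
        "division": None,
    }
    qualifier = tail.strip()
    if qualifier.lower() in KNOWN_CITIES:
        result["city"] = qualifier
    elif qualifier.lower() in KNOWN_SKILLS:
        result["skills"].append(qualifier)
    elif qualifier:
        # Anything unrecognised is treated as a division or business unit.
        result["division"] = qualifier
    return result

examples = [
    "C++ Developer",
    "Senior Engineer- COBOL",
    "Territory Manager - Bhopal",
    "Regional Manager - Paints and Specialty Chemicals",
]
parsed = {d: parse_designation(d) for d in examples}
```

Once the qualifiers are separated out, "Territory Manager - Bhopal" and "Area Sales Manager- Mangalore" can be compared on their base roles rather than as unrelated strings.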

The challenge to disambiguate designations is not trivial as new designations are created on an ongoing basis. Skills are often used by jobseekers to distinguish themselves vis-a-vis other jobseekers.

The AI algorithm needs a library of Designations & Skills and their inter-relationships. Have we solved the matching problem with regard to designations and skill sets? Maybe, to a large extent. Yet there is scope for improvement and our effort continues. There are many other elements which play an important role in identifying relevant candidates, which I intend to talk about in later articles.

- Vivek Jain

Note - The challenge of overstated or understated skills is a conundrum which can only be solved by assessments. In my view, most jobseekers still faithfully represent what they know and what they don't know. Those who don't are typically eliminated through the assessment process. Often an expert recruiter will look at signals beyond the stated skills, for example, the educational institution from which the jobseeker graduated or the company the jobseeker is working in.

Also see my blog post on Story of Naukri Job Alerts.

Tuesday, June 20, 2017

My Keynote Presentation at Data Science Conclave 2017 in Chennai

I am sharing my keynote presentation at Data Science Conclave 2017 in Chennai. Thanks Rajesh for the invite.

1. Major improvements in the accuracy of speech recognition and image recognition open up a new field in human-computer interaction. With computers able to correctly interpret almost all interactions without direct contact with a keyboard or mouse, a major data source has opened up for Data Scientists to explore.
2. A system which is 80% accurate may not be usable; however, when accuracy crosses 95%, there is a major turnaround in large-scale adoption.
3. Self-driving cars will lead to major leaps in object recognition technologies: not just recognising previously known objects, but also anticipating and correctly handling unexpected ones.
4. In my view, there are four key dimensions of Data Science: Data, Domain Expertise, Machine Learning algorithms and Technology of Deployment. Value creation is possible across all of them. Better quality data and a higher volume of relevant, contextual data can create value, and domain expertise remains critical to the successful deployment of data science projects. Our focus on machine learning algorithms is important; however, value creation happens across all four dimensions.
5. We have seen a 5X increase in jobs which require machine learning and neural networks expertise.

Data Science is now mainstream and it is important for every organization to invest in Data Science and benefit from it.

https://www.slideshare.net/vjain99/data-science-conclave-keynote-presentation