ResumeRecommendationReview


Overview

The ResumeRecommendationReview system is a comprehensive solution designed to simplify and enhance the job application process for individuals seeking corporate positions. The system is divided into two main components, each tailored to address key challenges faced by job seekers:

  1. Company Recommendation: Using advanced matching algorithms, the system analyzes a user’s uploaded resume and compares it with job postings on LinkedIn. Based on this analysis, it identifies and recommends companies that align closely with the candidate’s qualifications, skills, and career aspirations.

  2. Resume Evaluation and Enhancement: For the recommended companies, the system conducts a detailed evaluation of the user’s resume. It highlights strengths, identifies areas for improvement, and provides actionable suggestions for tailoring the resume to better fit the expectations of target roles. This ensures candidates can present their qualifications in the most impactful way possible.

By integrating these two components, the ResumeRecommendationReview system streamlines the job application journey, empowering users to:

  • Discover job opportunities that best match their unique profile.

  • Optimize their resumes for maximum impact, increasing their chances of securing interviews and job offers.

Key Features:

  • CV/Resume Upload: Users begin by uploading their existing CV or resume in a supported file format (e.g., PDF). The system extracts relevant keywords, experiences, and skill sets to build a user profile.

  • Job Matching with LinkedIn Postings: The platform automatically scans LinkedIn job listings (and potentially other job boards) for roles that align with the user’s skill set and career interests. A matching algorithm ranks and recommends a list of the most relevant companies and positions for the candidate to consider.

  • Comparison & Evaluation (LLM-as-a-Judge): The system leverages a Large Language Model (LLM) to analyze the uploaded resume and specific job requirements. It evaluates the alignment between the user's experience and the job description, identifying strengths, skill gaps, and areas in need of improvement. Additionally, the system evaluates recommendation performance using cosine similarity to measure semantic alignment and NDCG (Normalized Discounted Cumulative Gain) to assess the ranking quality of the recommendations.

  • Automated Resume Enhancement: Based on the LLM evaluation, the system provides a detailed report highlighting sections that need modification. Suggested edits may include restructuring experience points, emphasizing relevant skills, or adding keywords that match the job posting’s expectations.



Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.

Data Preparation and Preprocessing

This section covers the data preparation and preprocessing steps required for the Resume Recommendation System. The key stages include:

  • Processing resume data (PDF)

  • Processing LinkedIn job postings

For the LinkedIn job postings data, this tutorial uses the dataset available on Kaggle: arshkon/linkedin-job-postings.

Using the raw data directly to build the recommendation system may lead to suboptimal performance. Therefore, the data is refined and preprocessed to focus specifically on recruitment-related information to enhance the accuracy and relevance of the recommendations.

Install and Import Required Libraries

Text Splitting Configuration

Set up configurations to divide the extracted text into manageable sizes, ensuring smooth processing:

Parameter Descriptions:

  • chunk_size: The maximum length of each text chunk, ensuring the text is divided into manageable sections.

  • chunk_overlap: The length of overlapping text between chunks, providing continuity and context for downstream tasks.

  • separators: The delimiters used to split the text, such as line breaks or punctuation, to optimize the splitting process.
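The configuration code itself is not reproduced in this text. As a rough, library-free illustration of how the three parameters interact, the sketch below cuts the text at the highest-priority separator found inside each window and starts the next chunk `chunk_overlap` characters before the previous cut. The function name `split_text` and the default values are assumptions chosen for illustration, not the tutorial's actual settings; in practice a library text splitter would normally do this job.

```python
def split_text(text, chunk_size=200, chunk_overlap=50,
               separators=("\n\n", "\n", ". ", " ")):
    """Split text into overlapping chunks of at most chunk_size characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Try separators in priority order; cut right after the last
            # occurrence of the first separator found inside the window.
            for sep in separators:
                cut = text.rfind(sep, start + chunk_overlap, end)
                if cut != -1:
                    end = cut + len(sep)
                    break
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back so the next chunk repeats the tail of this one.
        start = end - chunk_overlap
    return chunks


chunks = split_text("word " * 100, chunk_size=50, chunk_overlap=10)
print(len(chunks), [len(c) for c in chunks[:3]])
```

A larger `chunk_overlap` preserves more context across chunk boundaries at the cost of storing and embedding more duplicated text.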

Defining the Pydantic Model

In this section, we define a structured data model using Pydantic, which ensures validation and consistency in the data extracted from resumes. This model is critical for organizing key sections of a resume into a format that the system can analyze effectively.

Analyzing Interests in Resumes

The analyze_interests function is designed to extract and summarize the key areas of interest and research focus from a resume. It uses a Large Language Model (LLM) to process the resume text and provide a concise summary, helping to identify the candidate's academic and professional interests effectively.

Purpose

  • Extracts main areas of interest and research focus from the provided resume text.

  • Generates a brief summary (2-3 sentences) that highlights the candidate's academic and career patterns.

  • Focuses solely on interests and research areas to provide targeted insights.

Analyzing Career Fit in Resumes

The analyze_career_fit function evaluates a candidate's resume to recommend the most suitable job roles along with their respective fit scores. By leveraging a Large Language Model (LLM), this function identifies key areas of expertise and rates the candidate's suitability for various technical positions.

Purpose

  • Recommends job roles based on the candidate's skills, research background, and career trajectory.

  • Assigns a fit score (0.0 to 1.0) for each role, reflecting the candidate's alignment with the position.

Processing Resumes to Extract Key Job-Related Information

The process_resume function analyzes a resume file, extracting and processing key information relevant to job applications. It combines text extraction, interest analysis, and career fit evaluation to generate structured, weighted insights from the resume.

Function Overview

Purpose

  • Extract key job-related information from resumes in PDF format.

  • Use LLM analysis to evaluate the candidate's skills, experience, projects, achievements, and education.

  • Assign weights to each section based on relevance to the target job role.

Resume Processing Example

Here's an example of how to use the process_resume function to extract structured data from a resume:

LinkedIn Data Preprocessing

This step involves loading job posting data and extracting only the necessary details. The dataset used for this tutorial is sourced from Kaggle: arshkon/linkedin-job-postings. The following columns are retained:

  • company_name: The name of the company offering the job posting.

  • title: The title of the job being offered.

  • description: A detailed description of the job, including responsibilities, qualifications, and expectations.

  • max_salary: The maximum salary offered for the position.

  • med_salary: The median salary for the position, indicating the typical pay offered.

  • min_salary: The minimum salary offered for the position.

  • skills_desc: A list or summary of the required or preferred skills for the position.

  • work_type: The type of employment, such as full-time, part-time, or contract.

Purpose of These Columns

These selected columns are essential for processing job posting data. They allow the system to:

  • Extract relevant metadata for recommendation and filtering.

  • Match resumes to job postings based on skills and job details.

  • Provide users with clear and actionable job-related information.


Efficient CSV Reading with Encoding Detection

This function provides a robust way to read CSV files by dynamically handling encoding issues. CSV files often come in various encodings, and incorrect encoding can cause errors when reading the file. The function attempts to read the file with the most common encodings and falls back to a detection library if necessary.

Function: read_csv_with_encoding

Purpose

To efficiently read a CSV file while handling potential encoding issues, ensuring compatibility with a wide range of file encodings.


How It Works

  1. Attempt to Read with UTF-8: The function first tries to read the file using UTF-8 encoding, which is the most commonly used encoding.

    • If successful, the function returns the loaded DataFrame.

    • If a UnicodeDecodeError occurs, it proceeds to the next step.

  2. Encoding Detection with chardet: If UTF-8 fails, the function uses the chardet library to detect the file's encoding:

    • Reads the first 10KB of the file for faster detection.

    • Extracts the detected encoding from the result.

  3. Retry with Detected Encoding: The function attempts to read the file again using the detected encoding:

    • If successful, the DataFrame is returned.

    • If another UnicodeDecodeError occurs, it falls back to a common encoding.

  4. Fallback to CP949: If both UTF-8 and the detected encoding fail, the function defaults to CP949, an encoding commonly used for Korean-language files.
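The four steps above can be sketched with the standard library alone. This is not the tutorial's `read_csv_with_encoding`: the function below is a hypothetical stand-in named `read_csv_rows_with_fallback` that returns a list of dictionaries rather than a DataFrame, and it takes the detection step as an optional `detector` callable (an assumption made so the sketch has no third-party dependency). To reproduce the chardet step, one could pass `lambda raw: chardet.detect(raw)["encoding"]`.

```python
import csv
import io


def _try_decode(raw, encoding):
    """Return the decoded text, or None if the encoding does not fit."""
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return None


def read_csv_rows_with_fallback(path, detector=None, sample_size=10_000):
    """Read a CSV file as a list of dicts, trying UTF-8, a detected
    encoding (if a detector is supplied), and finally CP949."""
    with open(path, "rb") as f:
        raw = f.read()

    # Step 1: UTF-8 first.
    text = _try_decode(raw, "utf-8")

    # Steps 2 and 3: ask the detector, looking only at the first ~10 KB.
    if text is None and detector is not None:
        guess = detector(raw[:sample_size])
        if guess:
            text = _try_decode(raw, guess)

    # Step 4: last resort; a failure here raises UnicodeDecodeError.
    if text is None:
        text = raw.decode("cp949")

    return list(csv.DictReader(io.StringIO(text, newline="")))
```

Running detection only after UTF-8 fails keeps the common case fast, since most files never reach the detector.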

Text Cleaning Function

Here’s a utility function designed to clean and preprocess text data for better consistency and quality:

If there are any null values in the company name field, those entries are excluded. (While other fields may also have null values, this step focuses only on excluding records with null in the company name.)
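The cleaning function is not reproduced in this text, so the following is only a minimal sketch of what such a utility might do. The specific cleaning rules (decoding HTML entities, removing tags, collapsing whitespace) and the helper `drop_missing_company` are assumptions for illustration; the tutorial's own rules may differ.

```python
import html
import re


def clean_text(value):
    """Return a normalized string; missing values (None or NaN) become ""."""
    if value is None or value != value:  # NaN is the only value unequal to itself
        return ""
    text = html.unescape(str(value))      # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)  # replace HTML tags with a space
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return text.strip()


def drop_missing_company(rows):
    """Keep only records whose company_name is non-empty after cleaning."""
    return [row for row in rows if clean_text(row.get("company_name"))]


rows = [
    {"company_name": "Acme", "title": "  Data <b>Engineer</b> "},
    {"company_name": None, "title": "Unknown employer"},
]
kept = drop_missing_company(rows)
print([clean_text(r["title"]) for r in kept])
```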

Processing Job Postings Data

The process_job_postings function integrates and processes job information from a LinkedIn dataset to create structured documents for analysis or recommendation purposes.

This function takes a DataFrame of LinkedIn job postings and processes each entry into a standardized format, combining relevant details like company name, job title, required skills, and salary information.
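As an illustration of that standardized format, the sketch below builds one text document per posting from plain dictionaries that use the column names listed earlier. The function name `build_job_documents`, the `page_content`/`metadata` layout, and the exact line format are assumptions; the tutorial's `process_job_postings` operates on a DataFrame and may structure its output differently.

```python
def build_job_documents(rows):
    """Turn job-posting records (dicts) into text documents with metadata."""
    documents = []
    for row in rows:
        company = str(row.get("company_name") or "").strip()
        if not company:
            continue  # postings without a company name are excluded

        # Include only the salary figures that are actually present.
        salaries = [
            f"{label} {row[key]}"
            for label, key in (("min", "min_salary"),
                               ("median", "med_salary"),
                               ("max", "max_salary"))
            if row.get(key) not in (None, "")
        ]
        lines = [
            f"Company: {company}",
            f"Title: {row.get('title') or 'N/A'}",
            f"Skills: {row.get('skills_desc') or 'N/A'}",
            f"Salary: {', '.join(salaries) if salaries else 'N/A'}",
            f"Work type: {row.get('work_type') or 'N/A'}",
            f"Description: {row.get('description') or 'N/A'}",
        ]
        documents.append({
            "page_content": "\n".join(lines),
            "metadata": {"company": company, "title": row.get("title") or ""},
        })
    return documents
```

Keeping the company and title in the metadata, separate from the text, is what later allows filtering search results by those fields.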

Setting Up ChromaDB and Storing Data

Using ChromaDB for Storing and Retrieving Resume and Job Posting Data

In this section, we will explore how to use ChromaDB to store resume and job posting data as vector representations and perform similarity-based searches.

What is ChromaDB?

ChromaDB is a vector database that allows text data to be stored as embeddings, enabling efficient similarity-based searches. In our Resume Recommendation System, ChromaDB is used for the following purposes:

  • Vectorizing Text: Converting resume and job posting text into vector representations.

  • Efficient Similarity Search: Performing fast searches based on the similarity of embeddings.

  • Metadata-Based Search and Filtering: Enhancing search results with filters such as job title or company name.

Setup Steps

Preparing Required Libraries

Before starting, import the necessary libraries:

Roles of Each Library:

  • langchain_community.vectorstores: Provides integration with ChromaDB.

  • langchain_openai: Enables the use of OpenAI embedding models.

  • chromadb: Provides vector database functionality.

Initializing ChromaDB

Set up ChromaDB and create collections:

Why Use PersistentClient?

  • Permanent Data Storage: Ensures that data is not lost when the application or session ends.

  • Data Persistence Across Sessions: Allows the system to retain data for use in future queries without requiring re-upload or re-processing.

  • Ease of Backup and Recovery: Provides a reliable way to save and restore data for robustness and fault tolerance.

Storing Data

This step saves resume and job posting data into ChromaDB for efficient querying and management. Because the original dataset is very large, only 500 job postings are used.

Example of Job_documents_

Company Recommendation System

This section focuses on recommending companies that align with the candidate's resume and evaluates the recommendations using two key metrics:

  1. Cosine Similarity for Recommendation Evaluation:

    • Measures the similarity between the candidate's resume and the job posting.

    • A higher cosine similarity score indicates a stronger match between the candidate's profile and the company's job requirements.

  2. NDCG (Normalized Discounted Cumulative Gain) for Recommendation Evaluation:

    • Assesses the quality of the ranking of recommended companies.

    • A higher NDCG score signifies that the most relevant companies appear at the top of the recommendation list, reflecting better ranking performance.

Understanding the Scores

  • High Scores:

    • Indicate a strong alignment between the resume and the recommended companies (Cosine Similarity).

    • Demonstrate that the ranking system effectively prioritizes the most relevant companies (NDCG).

  • Low Scores:

    • Suggest weaker matches between the resume and job postings or suboptimal ranking of recommendations.

The goal is to achieve high scores in both metrics, ensuring accurate and effective company recommendations for the candidate.
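For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their lengths. The sketch below is a plain-Python version for illustration; the vectors are made-up toy values, not real resume or job embeddings, and returning 0.0 for a zero-length vector is a convention chosen here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: no direction, no similarity
    return dot / (norm_a * norm_b)


resume_vec = [0.2, 0.7, 0.1]   # toy values for illustration only
job_vec = [0.25, 0.6, 0.05]
print(round(cosine_similarity(resume_vec, job_vec), 3))
```

Because the measure depends only on direction, a long job description and a short resume can still score highly if their embeddings point the same way.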

Job Recommendation System with Weighted Similarity Search

This implementation utilizes a Job Recommendation System to match resumes with the most relevant job postings. By combining cosine similarity and weighted scoring, the system ensures accurate and tailored recommendations.


  • Personalized Matching: Matches resumes to job postings with high accuracy.

  • Flexible Scoring: Incorporates weighted factors to prioritize specific job attributes.

  • Enhanced Readability: Formats job descriptions for easy review.
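The exact weighting scheme is not spelled out in this text. One plausible reading, sketched below purely as an assumption, is that each resume section gets its own similarity to a job posting and the final score is the weighted average of those similarities. The names `weighted_score` and `rank_jobs`, the section names, and all numbers are hypothetical.

```python
def weighted_score(section_similarities, section_weights):
    """Weighted average of per-section similarity scores."""
    total_weight = sum(section_weights.get(s, 0.0) for s in section_similarities)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(sim * section_weights.get(section, 0.0)
                       for section, sim in section_similarities.items())
    return weighted_sum / total_weight


def rank_jobs(job_similarities, section_weights, top_n=5):
    """Return the top_n (job, score) pairs, best match first."""
    scored = [(job, weighted_score(sims, section_weights))
              for job, sims in job_similarities.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


weights = {"skills": 0.5, "experience": 0.3, "education": 0.2}
similarities = {
    "Job A": {"skills": 0.9, "experience": 0.5, "education": 0.5},
    "Job B": {"skills": 0.4, "experience": 0.9, "education": 0.9},
}
for job, score in rank_jobs(similarities, weights):
    print(job, round(score, 2))
```

With these weights, Job A ranks first even though Job B wins on two of three sections, because the skills section carries half of the total weight.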

Resume and Job Recommendation Evaluation System

This implementation introduces a comprehensive evaluation system for job recommendations based on resumes.

The system leverages Discounted Cumulative Gain (DCG) and Normalized Discounted Cumulative Gain (NDCG) to measure the quality of recommendations. Additionally, precision and recall metrics are calculated for further analysis.
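The standard definitions can be written compactly. With graded relevance rel_i for the item at rank i (starting from 1), DCG@k is the sum of rel_i / log2(i + 1) over the top k items, and NDCG@k divides that by the DCG of the same items in ideal (descending) order. The sketch below assumes relevance labels are already available and computes the ideal ordering from the given list only, which is a common simplification; how the tutorial derives its relevance labels is not shown here.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance scores."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Relevance of each recommended job, in the order it was recommended.
print(ndcg_at_k([3, 2, 1], k=3))            # already in ideal order
print(round(ndcg_at_k([1, 2, 3], k=3), 3))  # best item ranked last
```

NDCG rewards placing the most relevant jobs first, while precision and recall only count how many relevant jobs appear in the top k, regardless of order.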

Execute Evaluation

LLM-Based Resume Evaluation System

This section outlines the implementation of a system that uses a Large Language Model (LLM) to evaluate resumes by comparing them against job descriptions. The system provides actionable insights to improve resumes and assists in aligning candidates’ qualifications with job requirements.

What is LLM-as-a-Judge?

The LLM-as-a-Judge system leverages the advanced reasoning and natural language understanding capabilities of an LLM to serve as an impartial evaluator in the hiring process. By acting as a "judge," the LLM compares a candidate’s resume to job requirements, evaluates their alignment, and provides actionable feedback.

Key features of the LLM-as-a-Judge system include:

  • Contextual Understanding: It comprehends detailed job descriptions and resumes beyond simple keyword matching, enabling nuanced evaluations.

  • Feedback Generation: Provides insights into the candidate's strengths and areas for improvement.

  • Decision Support: Assists hiring managers or applicants by generating a recommendation on the candidate's suitability for the role.

This system bridges the gap between human evaluation and automated analysis, ensuring more accurate and tailored results in the recruitment process.


Functionalities

The LLM-as-a-Judge system provides the following functionalities:

  • Detailed Analysis: Analyzes resumes and job requirements in detail, identifying key qualifications and expectations.

  • Alignment Evaluation: Assesses how well the candidate's skills and experiences match the job requirements.

  • Strengths and Improvement Areas: Identifies the candidate's strengths and offers suggestions for improvement.

  • Role Suitability Recommendation: Provides a final recommendation on whether the candidate is a good fit for the role.


Key Components

1. CriterionEvaluation

The CriterionEvaluation class evaluates individual aspects of the resume based on predefined criteria:

  • score (int): A score from 1 to 5 representing the performance for a specific criterion.

  • reasoning (str): A detailed explanation of why the score was assigned.

  • evidence (List[str]): Specific elements from the resume that support the evaluation.

  • suggestions (List[str]): Targeted recommendations for improving the evaluated area.


2. DetailedEvaluation

The DetailedEvaluation class provides a comprehensive evaluation of the resume by aggregating results across multiple criteria:

  • technical_fit: Assessment of technical skills and their relevance to the job.

  • experience_relevance: Evaluation of how well the candidate’s work experience aligns with the role.

  • industry_knowledge: Examination of the candidate’s understanding of the target industry.

  • education_qualification: Review of academic background and certifications.

  • soft_skills: Analysis of interpersonal and communication skills.

  • overall_score (int): A total score (0-100) summarizing the resume's performance.

  • key_strengths (List[str]): Highlights of the resume's strongest areas.

  • improvement_areas (List[str]): Areas requiring enhancement for better alignment with the job.

  • final_recommendation (str): A conclusion on the candidate’s suitability for the position.
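The two models described above can be sketched as follows. To keep the example dependency-free it uses standard-library dataclasses with manual range checks as a stand-in; the tutorial defines these as Pydantic models, so the validation mechanics and the default values shown here are assumptions, while the field names and score ranges follow the descriptions above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CriterionEvaluation:
    score: int                       # 1 to 5
    reasoning: str
    evidence: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not 1 <= self.score <= 5:
            raise ValueError("score must be between 1 and 5")


@dataclass
class DetailedEvaluation:
    technical_fit: CriterionEvaluation
    experience_relevance: CriterionEvaluation
    industry_knowledge: CriterionEvaluation
    education_qualification: CriterionEvaluation
    soft_skills: CriterionEvaluation
    overall_score: int               # 0 to 100
    key_strengths: List[str]
    improvement_areas: List[str]
    final_recommendation: str

    def __post_init__(self):
        if not 0 <= self.overall_score <= 100:
            raise ValueError("overall_score must be between 0 and 100")
```

Validating ranges at construction time means a malformed LLM response fails loudly when it is parsed, instead of producing a misleading report later.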


3. LLMJudge

The LLMJudge class uses an LLM to evaluate resumes against job descriptions by analyzing criteria such as technical fit, experience relevance, and soft skills.

  • Responsibilities:

    • Processes resume text and job information.

    • Uses a structured prompt to guide the LLM in scoring and providing feedback.

    • Outputs a DetailedEvaluation object containing scores, evidence, and suggestions.

  • Features:

    • Dynamic prompt generation for precise LLM instructions.

    • Predefined evaluation criteria with customizable weights and descriptions.


4. ResumeEvaluationSystem

The ResumeEvaluationSystem orchestrates the entire resume evaluation process, from text extraction to generating improvement reports.

  • Responsibilities:

    • Processes resumes to extract clean text for analysis.

    • Selects the most relevant jobs for evaluation based on similarity scores.

    • Generates detailed reports summarizing the evaluation and suggestions.

  • Methods:

    • evaluate_with_recommendations: Evaluates a resume against the top n recommended jobs.

    • format_evaluation_report: Converts the DetailedEvaluation object into a readable report.


Example: Resume Evaluation Results


💡 Overall Score: 85/100


Evaluation Summary

  • 🔧 Technical Fit (30%): 4/5

    • Reasoning: Strong Python and SQL skills; lacks cloud experience.

    • Suggestions: Add cloud certifications like AWS.

  • 👔 Experience Relevance (25%): 4/5

    • Reasoning: Relevant projects but no measurable outcomes.

    • Suggestions: Quantify achievements (e.g., "Increased sales by 15%").

  • 🎯 Industry Knowledge (15%): 3/5

    • Reasoning: Limited mention of industry expertise.

    • Suggestions: Include domain-specific certifications or research.

  • 📚 Education Qualification (15%): 5/5

    • Reasoning: Relevant degree and certifications.

  • 🀝 Soft Skills (15%): 4/5

    • Reasoning: Leadership demonstrated; teamwork examples sparse.

    • Suggestions: Add examples of collaboration.


Recommendations

  • Key Strengths: Strong technical skills, leadership, relevant education.

  • Improvements: Add measurable outcomes, industry expertise, teamwork examples.

  • Final Recommendation: Highly suitable; minor revisions suggested.

LLM-Based Resume Evaluation System

This system leverages a Large Language Model (LLM) to evaluate resumes against job descriptions systematically. It provides detailed feedback based on predefined evaluation criteria, helping candidates understand their strengths, areas for improvement, and overall suitability for specific roles.

Execute Evaluation

LLM-Based Resume Revise System

This tutorial demonstrates how to create a system that evaluates and improves resumes using a Large Language Model (LLM).

The system provides actionable suggestions to optimize resumes for specific job descriptions, enhancing the candidate’s chances of securing a role.


Key Components

  1. EnhancementSuggestion Model: The EnhancementSuggestion model defines the structure for improvement suggestions:

  • section: The specific resume section being improved (e.g., "Skills" or "Work Experience").

  • current_content: The original content of the section.

  • improved_content: The suggested improvement for the section.

  • explanation: A detailed explanation of why the improvement is recommended.


  2. ResumeEnhancement Model: The ResumeEnhancement model provides a holistic improvement report:

  • improvements: A list of section-specific suggestions.

  • keyword_optimization: Suggested keywords to include in the resume for optimization.

  • general_suggestions: Overall suggestions for structure and presentation.

  • action_items: Practical, actionable items for the candidate to implement.


  3. ResumeEnhancementSystem: The ResumeEnhancementSystem class uses an LLM to analyze resumes and generate detailed, job-specific improvement suggestions. This system:

  • Accepts the resume text, job information, and evaluation results as inputs.

  • Produces a structured output aligning with the ResumeEnhancement model.

  • Focuses on realistic, actionable improvements tailored to the target job.

  4. IntegratedResumeSystem: The IntegratedResumeSystem combines evaluation and enhancement processes into a seamless workflow:

  • Step 1: The ResumeEvaluationSystem evaluates the resume against job requirements, providing initial scoring and feedback.

  • Step 2: The ResumeEnhancementSystem builds upon the evaluation results to generate actionable suggestions for improvement.

  • Step 3: A comprehensive improvement report is created, highlighting section-specific improvements, keyword optimizations, and general suggestions.
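The three steps can be sketched as a small orchestration function. The LLM-backed evaluation and enhancement systems are replaced here by caller-supplied callables, so `run_integrated_review`, its parameters, and the dictionary layout of the report are all hypothetical; only the order of the steps is taken from the description above.

```python
def run_integrated_review(resume_text, ranked_jobs, evaluate, enhance, top_n=3):
    """Evaluate and enhance a resume for the top_n ranked jobs.

    evaluate(resume_text, job) and enhance(resume_text, job, evaluation)
    are stand-ins for the LLM-backed evaluation and enhancement systems.
    """
    reports = []
    for job in ranked_jobs[:top_n]:
        evaluation = evaluate(resume_text, job)              # Step 1: score the resume
        enhancement = enhance(resume_text, job, evaluation)  # Step 2: suggest improvements
        reports.append({                                     # Step 3: assemble the report
            "job": job,
            "evaluation": evaluation,
            "enhancement": enhancement,
        })
    return reports


# Toy stand-ins so the workflow can be exercised without an LLM.
def fake_evaluate(resume_text, job):
    return {"overall_score": 80 if "Python" in resume_text else 50}


def fake_enhance(resume_text, job, evaluation):
    return {"action_items": [f"Tailor the summary to the {job} role"]}


reports = run_integrated_review("Python developer", ["Data Engineer", "Analyst"],
                                fake_evaluate, fake_enhance, top_n=1)
print(reports[0]["job"], reports[0]["evaluation"]["overall_score"])
```

Passing the evaluation result into the enhancement step is the key design point: suggestions can then target the specific weaknesses the evaluation found instead of being generated from the resume alone.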


Example Output Format

📋 Section-Specific Improvements

  • Section: Work Experience

    • Current: Managed team projects in retail operations.

    • Improved: Led cross-functional teams to increase sales by 15% within six months.

    • Reason: Emphasizes measurable outcomes and aligns with the leadership skills required for the target role.

Keyword Optimization

  • "Cross-functional leadership"

  • "Revenue growth"

  • "Data-driven decision-making"

💡 General Suggestions

  • Use consistent formatting for job titles and dates.

  • Highlight certifications relevant to the target job.

✅ Actionable Steps

  1. Update work experience details to emphasize achievements.

  2. Include certifications and training relevant to the role.

  3. Incorporate suggested keywords into the skills and summary sections.

Execute Evaluation

You can choose how many jobs to evaluate by changing the top_n value.
