ResumeRecommendationReview
Author: Ilgyun Jeong
Peer Review: Jaehun Choi, Dooil Kwak
Proofread: Juni Lee
This is a part of LangChain Open Tutorial
Overview
The ResumeRecommendationReview system is a comprehensive solution designed to simplify and enhance the job application process for individuals seeking corporate positions. The system is divided into two main components, each tailored to address key challenges faced by job seekers:
Company Recommendation: Using advanced matching algorithms, the system analyzes a user's uploaded resume and compares it with job postings on LinkedIn. Based on this analysis, it identifies and recommends companies that align closely with the candidate's qualifications, skills, and career aspirations.
Resume Evaluation and Enhancement: For the recommended companies, the system conducts a detailed evaluation of the user's resume. It highlights strengths, identifies areas for improvement, and provides actionable suggestions for tailoring the resume to better fit the expectations of target roles. This ensures candidates can present their qualifications in the most impactful way possible.
By integrating these two components, the ResumeRecommendationReview system streamlines the job application journey, empowering users to:
Discover job opportunities that best match their unique profile.
Optimize their resumes for maximum impact, increasing their chances of securing interviews and job offers.
Key Features:
CV/Resume Upload: Users begin by uploading their existing CV or resume in a supported file format (e.g., PDF). The system extracts relevant keywords, experiences, and skill sets to build a user profile.
Job Matching with LinkedIn Postings: The platform automatically scans LinkedIn job listings (and potentially other job boards) for roles that align with the user's skill set and career interests. A matching algorithm ranks and recommends a list of the most relevant companies and positions for the candidate to consider.
Comparison & Evaluation (LLM-as-a-Judge): The system leverages a Large Language Model (LLM) to analyze the uploaded resume and specific job requirements. It evaluates the alignment between the user's experience and the job description, identifying strengths, skill gaps, and areas in need of improvement. Additionally, the system evaluates the recommendation performance using cosine similarity to measure the semantic alignment and NDCG (Normalized Discounted Cumulative Gain) to assess the ranking quality of the recommendations.
Automated Resume Enhancement: Based on the LLM evaluation, the system provides a detailed report highlighting sections that need modification. Suggested edits may include restructuring experience points, emphasizing relevant skills, or adding keywords that match the job posting's expectations.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials. You can check out langchain-opentutorial for more details.
Data Preparation and Preprocessing
This section covers the data preparation and preprocessing steps required for the Resume Recommendation System. The key stages include:
Processing resume data (PDF)
Processing LinkedIn job postings
For the LinkedIn job postings data, this tutorial uses the dataset available on Kaggle: arshkon/linkedin-job-postings.
Using the raw data directly to build the recommendation system may lead to suboptimal performance. Therefore, the data is refined and preprocessed to focus specifically on recruitment-related information to enhance the accuracy and relevance of the recommendations.
Install and Import Required Libraries
Text Splitting Configuration
Set up configurations to divide the extracted text into manageable sizes, ensuring smooth processing:
Parameter Descriptions:
chunk_size: The maximum length of each text chunk, ensuring the text is divided into manageable sections.
chunk_overlap: The length of overlapping text between chunks, providing continuity and context for downstream tasks.
separators: The delimiters used to split the text, such as line breaks or punctuation, to optimize the splitting process.
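The configuration code itself is not reproduced in this text. As a rough, library-free illustration of what chunk_size and chunk_overlap control, the sketch below slides a fixed-size character window over the text. The function name split_with_overlap is hypothetical, and the sketch is a simplification: a separator-aware splitter such as LangChain's RecursiveCharacterTextSplitter also tries to break on the configured separators instead of at arbitrary character positions.

```python
def split_with_overlap(text: str, chunk_size: int = 20, chunk_overlap: int = 5) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "Python developer with five years of experience in data engineering."
chunks = split_with_overlap(sample, chunk_size=20, chunk_overlap=5)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk repeats the last chunk_overlap characters of the previous one, which is what lets downstream steps see context that would otherwise be cut at a chunk boundary.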
Defining the Pydantic Model
In this section, we define a structured data model using Pydantic, which ensures validation and consistency in the data extracted from resumes. This model is critical for organizing key sections of a resume into a format that the system can analyze effectively.
Analyzing Interests in Resumes
The analyze_interests function is designed to extract and summarize the key areas of interest and research focus from a resume. It uses a Large Language Model (LLM) to process the resume text and provide a concise summary, helping to identify the candidate's academic and professional interests effectively.
Purpose
Extracts main areas of interest and research focus from the provided resume text.
Generates a brief summary (2-3 sentences) that highlights the candidate's academic and career patterns.
Focuses solely on interests and research areas to provide targeted insights.
Analyzing Career Fit in Resumes
The analyze_career_fit function evaluates a candidate's resume to recommend the most suitable job roles along with their respective fit scores. By leveraging a Large Language Model (LLM), this function identifies key areas of expertise and rates the candidate's suitability for various technical positions.
Purpose
Recommends job roles based on the candidate's skills, research background, and career trajectory.
Assigns a fit score (0.0 to 1.0) for each role, reflecting the candidate's alignment with the position.
Processing Resumes to Extract Key Job-Related Information
The process_resume function analyzes a resume file, extracting and processing key information relevant to job applications. It combines text extraction, interest analysis, and career fit evaluation to generate structured, weighted insights from the resume.
Function Overview
Purpose
Extract key job-related information from resumes in PDF format.
Use LLM analysis to evaluate the candidate's skills, experience, projects, achievements, and education.
Assign weights to each section based on relevance to the target job role.
Resume Processing Example
Here's an example of how to use the process_resume function to extract structured data from a resume:
LinkedIn Data Preprocessing
This step involves loading job posting data and extracting only the necessary details. The dataset used for this tutorial is sourced from Kaggle: arshkon/linkedin-job-postings.
company_name: The name of the company offering the job posting.
title: The title of the job being offered.
description: A detailed description of the job, including responsibilities, qualifications, and expectations.
max_salary: The maximum salary offered for the position.
med_salary: The median salary for the position, a midpoint reference for the offered pay.
min_salary: The minimum salary offered for the position.
skills_desc: A list or summary of the required or preferred skills for the position.
work_type: The type of employment, such as full-time, part-time, or contract.
Purpose of These Columns
These selected columns are essential for processing job posting data. They allow the system to:
Extract relevant metadata for recommendation and filtering.
Match resumes to job postings based on skills and job details.
Provide users with clear and actionable job-related information.
Efficient CSV Reading with Encoding Detection
This function provides a robust way to read CSV files by dynamically handling encoding issues. CSV files often come in various encodings, and incorrect encoding can cause errors when reading the file. The function attempts to read the file with the most common encodings and falls back to a detection library if necessary.
Function: read_csv_with_encoding
Purpose
To read a CSV file efficiently while handling potential encoding issues, so that files saved in different encodings load without errors.
How It Works
Attempt to Read with UTF-8: The function first tries to read the file using UTF-8 encoding, which is the most commonly used encoding.
If successful, the function returns the loaded DataFrame.
If a UnicodeDecodeError occurs, it proceeds to the next step.
Encoding Detection with chardet: If UTF-8 fails, the function uses the chardet library to detect the file's encoding:
Reads the first 10KB of the file for faster detection.
Extracts the detected encoding from the result.
Retry with Detected Encoding: The function attempts to read the file again using the detected encoding:
If successful, the DataFrame is returned.
If another UnicodeDecodeError occurs, it falls back to a common encoding.
Fallback to CP949: If both UTF-8 and the detected encoding fail, the function defaults to CP949, an encoding commonly used for Korean-language files.
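The fallback chain described above can be sketched with the standard library alone. This is not the tutorial's read_csv_with_encoding, which returns a DataFrame and calls chardet directly. Here the function name read_csv_rows_with_encoding and the optional detect parameter are hypothetical, rows come back as dictionaries, and the detection step is a pluggable callable so the sketch runs without extra dependencies.

```python
import csv
import io
import os
import tempfile
from typing import Callable, Optional


def read_csv_rows_with_encoding(
    path: str,
    detect: Optional[Callable[[bytes], Optional[str]]] = None,
) -> list[dict]:
    """Read a CSV file, trying UTF-8, then a detected encoding, then CP949."""
    with open(path, "rb") as f:
        raw = f.read()

    candidates = ["utf-8"]
    if detect is not None:
        guessed = detect(raw[:10_000])  # sample roughly the first 10 KB only
        if guessed:
            candidates.append(guessed)
    candidates.append("cp949")  # final fallback

    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding: try the next candidate
        return list(csv.DictReader(io.StringIO(text, newline="")))
    raise ValueError(f"Could not decode {path} with any of {candidates}")


# Demo: a CP949-encoded file is not valid UTF-8, so the fallback is exercised.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("company_name,title\n테스트회사,데이터 엔지니어\n".encode("cp949"))
rows = read_csv_rows_with_encoding(tmp.name)
os.remove(tmp.name)
print(rows)
```

With chardet installed, passing detect=lambda sample: chardet.detect(sample)["encoding"] would plug in the detection step between the UTF-8 attempt and the CP949 fallback.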
Text Cleaning Function
Here's a utility function designed to clean and preprocess text data for better consistency and quality:
If there are any null values in the company name field, those entries are excluded. (While other fields may also have null values, this step focuses only on excluding records with null in the company name.)
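The cleaning rules used in the tutorial are not shown in this text, so the following is only a plausible sketch: it unescapes HTML entities, strips tags, collapses whitespace, and treats missing values as empty strings. The names clean_text and drop_missing_company are assumptions, and postings are represented as plain dictionaries instead of DataFrame rows.

```python
import html
import re


def clean_text(value) -> str:
    """Normalize a raw text field; the exact rules here are assumptions."""
    # Treat None and float NaN (how missing values often appear) as empty.
    if value is None or (isinstance(value, float) and value != value):
        return ""
    text = html.unescape(str(value))      # turn entities like &amp; back into characters
    text = re.sub(r"<[^>]+>", " ", text)  # drop HTML tags
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace and newlines
    return text.strip()


def drop_missing_company(rows: list[dict]) -> list[dict]:
    """Keep only postings whose company_name is present and non-blank."""
    return [row for row in rows if clean_text(row.get("company_name"))]


rows = [
    {"company_name": "Acme Corp", "description": "<p>Build  data\npipelines &amp; APIs</p>"},
    {"company_name": None, "description": "No company listed"},
]
kept = drop_missing_company(rows)
cleaned = clean_text(kept[0]["description"])
print(cleaned)
```

Only the company name decides whether a record is dropped; other empty fields simply become empty strings, matching the rule stated above.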
Processing Job Postings Data
The process_job_postings function integrates and processes job information from a LinkedIn dataset to create structured documents for analysis or recommendation purposes.
This function takes a DataFrame of LinkedIn job postings and processes each entry into a standardized format, combining relevant details like company name, job title, required skills, and salary information.
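As a minimal sketch of that standardization step, the code below turns one posting into a text body plus metadata. The helper names format_salary and build_job_document, the exact text layout, and the metadata keys are all assumptions; the page_content and metadata keys are chosen only to mirror the two fields of a LangChain Document.

```python
def format_salary(row: dict) -> str:
    """Describe the salary range, or say that it is not specified."""
    low, high = row.get("min_salary"), row.get("max_salary")
    if low is not None and high is not None:
        return f"{low:,.0f} - {high:,.0f}"
    return "Not specified"


def build_job_document(row: dict) -> dict:
    """Combine one posting's fields into a text body plus filterable metadata."""
    content = "\n".join([
        f"Company: {row['company_name']}",
        f"Title: {row['title']}",
        f"Skills: {row.get('skills_desc') or 'Not specified'}",
        f"Salary: {format_salary(row)}",
        f"Description: {row.get('description') or ''}",
    ])
    metadata = {"company": row["company_name"], "title": row["title"]}
    return {"page_content": content, "metadata": metadata}


posting = {
    "company_name": "Acme Corp",
    "title": "Data Engineer",
    "skills_desc": "Python, SQL",
    "min_salary": 90000,
    "max_salary": 120000,
    "description": "Build and maintain data pipelines.",
}
doc = build_job_document(posting)
print(doc["page_content"])
```

Keeping the company and title in metadata as well as in the text is what makes metadata-based filtering possible later, independent of the embedding.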
Setting Up ChromaDB and Storing Data
Using ChromaDB for Storing and Retrieving Resume and Job Posting Data
In this section, we will explore how to use ChromaDB to store resume and job posting data as vector representations and perform similarity-based searches.
What is ChromaDB?
ChromaDB is a vector database that allows text data to be stored as embeddings, enabling efficient similarity-based searches. In our Resume Recommendation System, ChromaDB is used for the following purposes:
Vectorizing Text: Converting resume and job posting text into vector representations.
Efficient Similarity Search: Performing fast searches based on the similarity of embeddings.
Metadata-Based Search and Filtering: Enhancing search results with filters like job title or company name.
Setup Steps
Preparing Required Libraries
Before starting, import the necessary libraries:
Roles of Each Library:
langchain_community.vectorstores: Provides integration with ChromaDB.
langchain_openai: Enables the use of OpenAI embedding models.
chromadb: Provides vector database functionality.
Initializing ChromaDB
Set up ChromaDB and create collections:
Why Use PersistentClient?
Permanent Data Storage: Ensures that data is not lost when the application or session ends.
Data Persistence Across Sessions: Allows the system to retain data for use in future queries without requiring re-upload or re-processing.
Ease of Backup and Recovery: Provides a reliable way to save and restore data for robustness and fault tolerance.
Storing Data
This step saves resume and job posting data into ChromaDB for efficient querying and management. The original dataset is very large, so this tutorial uses only 500 records.
Example of Job_documents_
Company Recommendation System
This section focuses on recommending companies that align with the candidate's resume and evaluates the recommendations using two key metrics:
Cosine Similarity for Recommendation Evaluation:
Measures the similarity between the candidate's resume and the job posting.
A higher cosine similarity score indicates a stronger match between the candidate's profile and the company's job requirements.
NDCG (Normalized Discounted Cumulative Gain) for Recommendation Evaluation:
Assesses the quality of the ranking of recommended companies.
A higher NDCG score signifies that the most relevant companies appear at the top of the recommendation list, reflecting better ranking performance.
Understanding the Scores
High Scores:
Indicate a strong alignment between the resume and the recommended companies (Cosine Similarity).
Demonstrate that the ranking system effectively prioritizes the most relevant companies (NDCG).
Low Scores:
Suggest weaker matches between the resume and job postings or suboptimal ranking of recommendations.
The goal is to achieve high scores in both metrics, ensuring accurate and effective company recommendations for the candidate.
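For reference, cosine similarity between two embedding vectors a and b is their dot product divided by the product of their lengths. The sketch below computes it in pure Python on toy three-dimensional vectors; the vectors are made up for illustration, and real embeddings from an embedding model have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here for an all-zero vector
    return dot / (norm_a * norm_b)


resume_vec = [1.0, 2.0, 0.0]
job_a_vec = [2.0, 4.0, 0.0]   # same direction as the resume vector
job_b_vec = [0.0, 0.0, 3.0]   # orthogonal to the resume vector

sim_a = cosine_similarity(resume_vec, job_a_vec)
sim_b = cosine_similarity(resume_vec, job_b_vec)
print(sim_a, sim_b)
```

Because the measure depends only on direction, job_a_vec scores as a near-perfect match even though it is twice as long as the resume vector, while the orthogonal job_b_vec scores zero.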
Job Recommendation System with Weighted Similarity Search
This implementation utilizes a Job Recommendation System to match resumes with the most relevant job postings. By combining cosine similarity and weighted scoring, the system ensures accurate and tailored recommendations.
Personalized Matching: Matches resumes to job postings with high accuracy.
Flexible Scoring: Incorporates weighted factors to prioritize specific job attributes.
Enhanced Readability: Formats job descriptions for easy review.
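The text does not show how the weights enter the score. One plausible reading, sketched below under that assumption, is a weighted average of per-section similarity scores. The function name weighted_score, the section names, the weights, and the similarity values are all hypothetical.

```python
def weighted_score(section_similarities: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-section similarity scores."""
    total_weight = sum(weights.get(section, 0.0) for section in section_similarities)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        similarity * weights.get(section, 0.0)
        for section, similarity in section_similarities.items()
    )
    return weighted_sum / total_weight


# Hypothetical weights: skills matter most, education least.
weights = {"skills": 0.4, "experience": 0.3, "projects": 0.2, "education": 0.1}

# Hypothetical per-section similarities between one resume and two postings.
jobs = {
    "Data Engineer": {"skills": 0.9, "experience": 0.7, "projects": 0.6, "education": 0.5},
    "Sales Manager": {"skills": 0.2, "experience": 0.4, "projects": 0.1, "education": 0.5},
}

ranked = sorted(jobs, key=lambda title: weighted_score(jobs[title], weights), reverse=True)
for title in ranked:
    print(title, round(weighted_score(jobs[title], weights), 2))
```

Dividing by the total weight keeps the score on the same 0 to 1 scale as the underlying similarities even if the weights do not sum to one.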
Resume and Job Recommendation Evaluation System
This implementation introduces a comprehensive evaluation system for job recommendations based on resumes.
The system leverages Discounted Cumulative Gain (DCG) and Normalized Discounted Cumulative Gain (NDCG) to measure the quality of recommendations. Additionally, precision and recall metrics are calculated for further analysis.
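In the standard definition, DCG sums each item's relevance divided by log2(rank + 1), and NDCG divides that by the DCG of the ideal (best possible) ordering. The sketch below implements these together with precision and recall at k. The function names and toy relevance grades are illustrative, and the ideal ordering is taken over the same returned list, which is one common convention and not necessarily the one the tutorial uses.

```python
import math


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: relevance discounted by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: list[float]) -> float:
    """DCG normalized by the DCG of the same items in ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def precision_recall_at_k(recommended: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision and recall computed over the first k recommendations."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Graded relevance of five recommended jobs, in the order they were ranked.
graded = [3, 2, 3, 0, 1]
score = ndcg(graded)
print(f"NDCG: {score:.3f}")

precision, recall = precision_recall_at_k(["A", "B", "C", "D"], {"A", "C", "E"}, k=3)
print(f"precision@3={precision:.2f}, recall@3={recall:.2f}")
```

A list that is already sorted by relevance scores exactly 1.0; any misordering, such as the grade-2 job sitting above a grade-3 job here, pulls the score below 1.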
Execute Evaluation
LLM-Based Resume Evaluation System
This section outlines the implementation of a system that uses a Large Language Model (LLM) to evaluate resumes by comparing them against job descriptions. The system provides actionable insights to improve resumes and assists in aligning candidates' qualifications with job requirements.
What is LLM-as-a-Judge?
The LLM-as-a-Judge system leverages the advanced reasoning and natural language understanding capabilities of an LLM to serve as an impartial evaluator in the hiring process. By acting as a "judge," the LLM compares a candidate's resume to job requirements, evaluates their alignment, and provides actionable feedback.
Key features of the LLM-as-a-Judge system include:
Contextual Understanding: It comprehends detailed job descriptions and resumes beyond simple keyword matching, enabling nuanced evaluations.
Feedback Generation: Provides insights into the candidate's strengths and areas for improvement.
Decision Support: Assists hiring managers or applicants by generating a recommendation on the candidate's suitability for the role.
This system bridges the gap between human evaluation and automated analysis, ensuring more accurate and tailored results in the recruitment process.
Functionalities
The LLM-as-a-Judge system provides the following functionalities:
Detailed Analysis: Analyzes resumes and job requirements in detail, identifying key qualifications and expectations.
Alignment Evaluation: Assesses how well the candidate's skills and experiences match the job requirements.
Strengths and Improvement Areas: Identifies the candidate's strengths and offers suggestions for improvement.
Role Suitability Recommendation: Provides a final recommendation on whether the candidate is a good fit for the role.
Key Components
1. CriterionEvaluation
The CriterionEvaluation class evaluates individual aspects of the resume based on predefined criteria:
score (int): A score from 1 to 5 representing the performance for a specific criterion.
reasoning (str): A detailed explanation of why the score was assigned.
evidence (List[str]): Specific elements from the resume that support the evaluation.
suggestions (List[str]): Targeted recommendations for improving the evaluated area.
2. DetailedEvaluation
The DetailedEvaluation class provides a comprehensive evaluation of the resume by aggregating results across multiple criteria:
technical_fit: Assessment of technical skills and their relevance to the job.
experience_relevance: Evaluation of how well the candidate's work experience aligns with the role.
industry_knowledge: Examination of the candidate's understanding of the target industry.
education_qualification: Review of academic background and certifications.
soft_skills: Analysis of interpersonal and communication skills.
overall_score (int): A total score (0-100) summarizing the resume's performance.
key_strengths (List[str]): Highlights of the resume's strongest areas.
improvement_areas (List[str]): Areas requiring enhancement for better alignment with the job.
final_recommendation (str): A conclusion on the candidate's suitability for the position.
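The class definitions themselves are not shown in this text. The sketch below mirrors the documented fields using standard-library dataclasses so that it runs without extra dependencies; it is a stand-in, not the tutorial's code. The range checks in __post_init__ are an assumption about sensible validation, and in a Pydantic model the same limits would more naturally be written as Field(ge=..., le=...) constraints.

```python
from dataclasses import dataclass, field


@dataclass
class CriterionEvaluation:
    score: int       # 1 to 5
    reasoning: str
    evidence: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)

    def __post_init__(self):
        if not 1 <= self.score <= 5:
            raise ValueError("score must be between 1 and 5")


@dataclass
class DetailedEvaluation:
    technical_fit: CriterionEvaluation
    experience_relevance: CriterionEvaluation
    industry_knowledge: CriterionEvaluation
    education_qualification: CriterionEvaluation
    soft_skills: CriterionEvaluation
    overall_score: int  # 0 to 100
    key_strengths: list[str]
    improvement_areas: list[str]
    final_recommendation: str

    def __post_init__(self):
        if not 0 <= self.overall_score <= 100:
            raise ValueError("overall_score must be between 0 and 100")


# Illustrative values only.
evaluation = DetailedEvaluation(
    technical_fit=CriterionEvaluation(4, "Strong Python and SQL skills; lacks cloud experience."),
    experience_relevance=CriterionEvaluation(4, "Relevant projects but no measurable outcomes."),
    industry_knowledge=CriterionEvaluation(3, "Limited mention of industry expertise."),
    education_qualification=CriterionEvaluation(5, "Relevant degree and certifications."),
    soft_skills=CriterionEvaluation(4, "Leadership demonstrated; teamwork examples sparse."),
    overall_score=85,
    key_strengths=["Strong technical skills", "Leadership"],
    improvement_areas=["Add measurable outcomes"],
    final_recommendation="Highly suitable; minor revisions suggested.",
)
print(evaluation.overall_score, evaluation.technical_fit.score)
```

Validating at construction time means a malformed LLM response (for example a criterion score of 6) fails loudly instead of flowing into the report.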
3. LLMJudge
The LLMJudge class uses an LLM to evaluate resumes against job descriptions by analyzing criteria such as technical fit, experience relevance, and soft skills.
Responsibilities:
Processes resume text and job information.
Uses a structured prompt to guide the LLM in scoring and providing feedback.
Outputs a DetailedEvaluation object containing scores, evidence, and suggestions.
Features:
Dynamic prompt generation for precise LLM instructions.
Predefined evaluation criteria with customizable weights and descriptions.
4. ResumeEvaluationSystem
The ResumeEvaluationSystem orchestrates the entire resume evaluation process, from text extraction to generating improvement reports.
Responsibilities:
Processes resumes to extract clean text for analysis.
Selects the most relevant jobs for evaluation based on similarity scores.
Generates detailed reports summarizing the evaluation and suggestions.
Methods:
evaluate_with_recommendations: Evaluates a resume against the top n recommended jobs.
format_evaluation_report: Converts the DetailedEvaluation object into a readable report.
Example: Resume Evaluation Results
Overall Score: 85/100
Evaluation Summary
Technical Fit (30%): 4/5
Reasoning: Strong Python and SQL skills; lacks cloud experience.
Suggestions: Add cloud certifications like AWS.
Experience Relevance (25%): 4/5
Reasoning: Relevant projects but no measurable outcomes.
Suggestions: Quantify achievements (e.g., "Increased sales by 15%").
Industry Knowledge (15%): 3/5
Reasoning: Limited mention of industry expertise.
Suggestions: Include domain-specific certifications or research.
Education Qualification (15%): 5/5
Reasoning: Relevant degree and certifications.
Soft Skills (15%): 4/5
Reasoning: Leadership demonstrated; teamwork examples sparse.
Suggestions: Add examples of collaboration.
Recommendations
Key Strengths: Strong technical skills, leadership, relevant education.
Improvements: Add measurable outcomes, industry expertise, teamwork examples.
Final Recommendation: Highly suitable; minor revisions suggested.
LLM-Based Resume Evaluation System
This system leverages a Large Language Model (LLM) to evaluate resumes against job descriptions systematically. It provides detailed feedback based on predefined evaluation criteria, helping candidates understand their strengths, areas for improvement, and overall suitability for specific roles.
Execute Evaluation
LLM-Based Resume Revision System
This tutorial demonstrates how to create a system that evaluates and improves resumes using a Large Language Model (LLM).
The system provides actionable suggestions to optimize resumes for specific job descriptions, enhancing the candidate's chances of securing a role.
Key Components
EnhancementSuggestion Model
The EnhancementSuggestion model defines the structure for improvement suggestions:
section: The specific resume section being improved (e.g., "Skills" or "Work Experience").
current_content: The original content of the section.
improved_content: The suggested improvement for the section.
explanation: A detailed explanation of why the improvement is recommended.
ResumeEnhancement Model
The ResumeEnhancement model provides a holistic improvement report:
improvements: A list of section-specific suggestions.
keyword_optimization: Suggested keywords to include in the resume for optimization.
general_suggestions: Overall suggestions for structure and presentation.
action_items: Practical, actionable items for the candidate to implement.
ResumeEnhancementSystem
The ResumeEnhancementSystem class uses an LLM to analyze resumes and generate detailed, job-specific improvement suggestions. This system:
Accepts the resume text, job information, and evaluation results as inputs.
Produces a structured output aligning with the ResumeEnhancement model.
Focuses on realistic, actionable improvements tailored to the target job.
IntegratedResumeSystem
The IntegratedResumeSystem combines evaluation and enhancement processes into a seamless workflow:
Step 1: The ResumeEvaluationSystem evaluates the resume against job requirements, providing initial scoring and feedback.
Step 2: The ResumeEnhancementSystem builds upon the evaluation results to generate actionable suggestions for improvement.
Step 3: A comprehensive improvement report is created, highlighting section-specific improvements, keyword optimizations, and general suggestions.
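The three steps can be sketched as a small orchestration function. Everything here is hypothetical scaffolding: run_pipeline, the similarity key on each job, and the two stub callables stand in for the real classes, whose LLM calls are replaced by fixed return values so the example runs offline.

```python
from typing import Callable


def run_pipeline(
    resume_text: str,
    jobs: list[dict],
    evaluate: Callable[[str, dict], dict],
    enhance: Callable[[str, dict, dict], dict],
    top_n: int = 3,
) -> list[dict]:
    """Evaluate the resume against the top_n most similar jobs, then enhance."""
    ranked = sorted(jobs, key=lambda job: job["similarity"], reverse=True)[:top_n]
    reports = []
    for job in ranked:
        evaluation = evaluate(resume_text, job)              # Step 1: score and feedback
        enhancement = enhance(resume_text, job, evaluation)  # Step 2: suggestions built on the evaluation
        reports.append({                                     # Step 3: combined report
            "job": job["title"],
            "evaluation": evaluation,
            "enhancement": enhancement,
        })
    return reports


def stub_evaluate(resume_text: str, job: dict) -> dict:
    """Stand-in for the LLM evaluation; returns a fixed improvement area."""
    return {"improvement_areas": ["Add measurable outcomes"]}


def stub_enhance(resume_text: str, job: dict, evaluation: dict) -> dict:
    """Stand-in for the LLM enhancement; turns each improvement area into an action item."""
    return {"action_items": [f"Address: {area}" for area in evaluation["improvement_areas"]]}


jobs = [
    {"title": "Sales Manager", "similarity": 0.31},
    {"title": "Data Engineer", "similarity": 0.82},
    {"title": "ML Engineer", "similarity": 0.77},
]
reports = run_pipeline("resume text here", jobs, stub_evaluate, stub_enhance, top_n=2)
for report in reports:
    print(report["job"], report["enhancement"]["action_items"])
```

Passing the evaluation result into the enhancement step is the key design point: suggestions are grounded in the weaknesses the evaluation already identified instead of being generated from scratch.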
Example Output Format
Section-Specific Improvements
Section: Work Experience
Current: Managed team projects in retail operations.
Improved: Led cross-functional teams to increase sales by 15% within six months.
Reason: Emphasizes measurable outcomes and aligns with the leadership skills required for the target role.
Recommended Keywords
"Cross-functional leadership"
"Revenue growth"
"Data-driven decision-making"
General Suggestions
Use consistent formatting for job titles and dates.
Highlight certifications relevant to the target job.
Actionable Steps
Update work experience details to emphasize achievements.
Include certifications and training relevant to the role.
Incorporate suggested keywords into the skills and summary sections.
Execute Evaluation
You can choose how many jobs to evaluate by changing the top_n value.