Uploading a CV - Recruitier

Overview

Uploading a CV is the fastest way to add a candidate to Recruitier. When you upload a file, the platform’s AI reads the entire document, extracts structured data — name, title, location, skills with confidence scores, experience level, years of experience, education — and creates a candidate profile ready for you to review and confirm. The entire process, from file upload to a fully populated profile, typically takes under 10 seconds. The extraction is powered by Gemini 3 Flash Preview, which processes up to 50,000 characters of CV text (approximately 12,500 tokens) in a single pass, understanding context across the entire document rather than just scanning for keyword labels.

Supported File Formats

Recruitier accepts the following CV file formats, each processed by a specialized text extraction engine:

Format	Extension	Extraction Engine	Notes
PDF	`.pdf`	PyMuPDF (fitz)	Most common format. Works with text-based and formatted PDFs. Extracts text from all pages while preserving reading order. Password-protected PDFs are not supported.
Word Document	`.docx`	python-docx	Microsoft Word format. Text is extracted from both paragraphs and tables, so header-style layouts where the candidate name sits in a table cell are fully supported.
Plain Text	`.txt`	Direct read	Simple text files. Useful for pasting raw CV content when other formats are not available.

Scanned PDFs that contain only images (no selectable text) cannot be processed. The text extraction engine needs actual text content in the file; it does not perform OCR (optical character recognition). If a candidate’s CV is a scanned document, consider asking them for a digital version, converting it to a text-based PDF with an OCR tool first, or using the LinkedIn import option instead.

The legacy Word format (.doc) is sometimes listed as supported, but .docx is strongly preferred: python-docx, the Word extraction engine in the table above, reads only .docx files. If you encounter extraction issues with older .doc files, ask the candidate for a .docx or PDF version.

The Upload Wizard: Step by Step

The upload wizard has four steps. After completing the final step, AI job matching starts automatically.

Step 1: Upload CV

Navigate to the Candidates section and click the Upload CV button in the top right. A wizard dialog opens with two tabs at the top: Upload CV and LinkedIn URL.

To upload a CV, make sure the Upload CV tab is selected. You can either drag and drop a file onto the upload area or click to browse your computer. The maximum file size is configurable but typically set to 10 MB, which is more than enough for any standard CV.

Once the file is uploaded, Recruitier’s AI processes it automatically: it extracts text from the document, analyzes it with Gemini, geocodes the location, and extracts skills with confidence scores. This typically takes under 10 seconds. The wizard then advances to the next step.

You can also import a candidate from LinkedIn directly from this same wizard by switching to the LinkedIn URL tab. See Import from LinkedIn for details.

Step 2: Review Profile

After extraction, you are presented with the candidate’s name and job title for review. Both fields are pre-filled from the AI extraction but fully editable.

The wizard checks for duplicate names in your account. If a candidate with the same name already exists, you receive a notification and can adjust the name before proceeding.

Take a moment to verify the candidate’s name and title. These fields directly affect matching quality because the title influences the Title vector embedding (35% of semantic search weight) and helps the AI understand the candidate’s role and seniority level. A title like “Senior Python Developer” produces very different match results than “Software Engineer”.

Step 3: Review Skills

The AI extracts skills from the CV with confidence scores indicating how certain it is about each skill. Skills are displayed as interactive chips that you can toggle on or off. A color-coded confidence indicator shows high (green), medium (accent color), and low (red) confidence levels.

At this stage you can:

Toggle skills on or off by clicking them (confirmed skills show a checkmark)
Remove skills entirely by clicking the X button on each skill chip
Add missing skills by typing in the input field and pressing Enter or clicking the add button

A counter at the bottom shows how many skills are selected out of the total extracted. You can also skip this step entirely and go straight to location.

Skills confirmation is the critical gate for match quality. Skills carry 45% of the vector search weight — more than any other factor. Investing 30 seconds in skill review pays dividends in match relevance.

See Skills & Expertise for a detailed guide on managing skills effectively.

Step 4: Set Location

Configure the candidate’s location and search radius:

Location with autocomplete for precise geocoded coordinates (latitude/longitude). Start typing to see suggestions.
Search radius slider in kilometers (default 40 km, adjustable from 5 km to 200 km)

Once you click Start Matching, the wizard completes and you are redirected to the candidate detail page. AI job matching starts automatically in the background. A real-time progress indicator on the candidate page shows each matching stage as it executes (Analyzing, Searching, Processing, Scoring). You do not need to wait — you can continue working while matching runs.

Additional preferences like salary expectations, job type, and flexibility can be configured later from the candidate’s profile edit dialog. The wizard focuses on the essential fields needed to start matching.

See Location & Preferences for details on each setting and how they affect matching.

What Happens Behind the Scenes

When a CV is uploaded, the following processing chain executes in order:

Text Extraction

The file is parsed using the appropriate library:

PDF: PyMuPDF (fitz) extracts text from all pages, preserving reading order and handling multi-column layouts
DOCX: python-docx extracts text from both paragraph elements and table cells, ensuring CV layouts with table-based headers are fully captured
TXT: Direct file read with encoding detection

Table text extraction is particularly important for Word documents. Many professional CV templates place the candidate’s name, contact details, and key information in table cells that form the header area. Without table extraction, this critical data would be lost.
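The routing described above can be sketched in Python as follows. This is a minimal illustration, not Recruitier’s code. The function names, the `EXTRACTORS` table, and the TXT encoding fallback order (UTF-8, then Windows-1252, then Latin-1) are assumptions. The PyMuPDF and python-docx calls are those libraries’ public APIs. The imports are deferred so that the TXT path runs without either package installed.

```python
import io
from pathlib import Path


def extract_pdf(data: bytes) -> str:
    import fitz  # PyMuPDF; imported lazily so other formats work without it
    with fitz.open(stream=data, filetype="pdf") as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_docx(data: bytes) -> str:
    import docx  # python-docx; imported lazily for the same reason
    document = docx.Document(io.BytesIO(data))
    lines = [p.text for p in document.paragraphs]
    # CV templates often put the name and contact details in table cells,
    # so table text is collected as well as paragraph text.
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                lines.append(cell.text)
    return "\n".join(line for line in lines if line.strip())


def extract_txt(data: bytes) -> str:
    # Assumed fallback order; a real system may use a detection library instead.
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1")  # Latin-1 accepts any byte sequence


EXTRACTORS = {".pdf": extract_pdf, ".docx": extract_docx, ".txt": extract_txt}


def extract_text(filename: str, data: bytes) -> str:
    """Route a file to its extractor based on the (case-insensitive) extension."""
    suffix = Path(filename).suffix.lower()
    extractor = EXTRACTORS.get(suffix)
    if extractor is None:
        raise ValueError(f"Unsupported file type: {suffix or filename}")
    return extractor(data)
```

This sketch appends table text after all paragraphs rather than in document order. python-docx may also return a merged cell once for each grid cell it spans. A production extractor would need to handle both cases.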

AI Profile Extraction

The extracted text (truncated to 50,000 characters if needed) is sent to Gemini 3 Flash Preview along with a structured extraction prompt. The prompt instructs the AI to identify and return data in a specific JSON format covering all supported fields.

The AI uses contextual understanding to extract data that goes beyond simple pattern matching:

A mention of “5 years leading development teams” informs both the years of experience calculation and the experience level assignment
A certification like “AWS Certified Solutions Architect” is captured both as a skill and as education/certification data
Location strings like “Amsterdam area” or “Randstad” are recognized as geographic references
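To make the request and response shape concrete, here is a hedged sketch of how the truncation and JSON handling could look. The prompt wording, the JSON key names, and the `ExtractedProfile` fields are hypothetical. The call to Gemini itself is deliberately left out, so `parse_extraction` simply takes the model’s raw JSON string.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

MAX_CV_CHARS = 50_000  # truncation limit stated in this article


@dataclass
class ExtractedProfile:
    # Hypothetical subset of the extracted fields.
    name: str
    title: Optional[str] = None
    location: Optional[str] = None
    years_of_experience: Optional[float] = None
    experience_level: Optional[str] = None
    skills: list = field(default_factory=list)  # (skill, confidence) pairs


def build_extraction_input(cv_text: str) -> str:
    """Assemble a prompt (hypothetical wording) around the truncated CV text."""
    instructions = (
        "Return only a JSON object with the keys name, title, location, "
        "years_of_experience, experience_level and skills, where skills is a "
        "list of objects with name and confidence (0 to 1).\n\nCV:\n"
    )
    return instructions + cv_text[:MAX_CV_CHARS]


def parse_extraction(raw_json: str) -> ExtractedProfile:
    """Validate the model's JSON reply and convert it to a profile."""
    data = json.loads(raw_json)
    name = (data.get("name") or "").strip()
    if not name:
        # The name is the only strictly required field.
        raise ValueError("Name could not be extracted from the CV")
    skills = [
        (item["name"], float(item.get("confidence", 0.0)))
        for item in data.get("skills") or []
        if item.get("name")
    ]
    return ExtractedProfile(
        name=name,
        title=data.get("title"),
        location=data.get("location"),
        years_of_experience=data.get("years_of_experience"),
        experience_level=data.get("experience_level"),
        skills=skills,
    )
```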

Skill Extraction with Confidence

The AI identifies skills throughout the CV and assigns each a confidence score between 0 and 1. The score reflects how explicitly and prominently the skill appears:

A skill listed in a dedicated “Skills” section with emphasis gets 0.95+
A skill mentioned in a job description context gets 0.75-0.89
A skill inferred from a related technology or role gets 0.60-0.74
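A small helper shows how a 0 to 1 confidence score could be bucketed into the three colored levels shown in the wizard. The cut-offs at 0.90 and 0.75 are assumptions chosen to line up with the ranges above. The article does not state the thresholds the interface actually uses.

```python
def confidence_band(score: float) -> str:
    """Bucket a skill confidence score into a display level.

    The thresholds are assumed, not taken from Recruitier.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if score >= 0.90:
        return "high"    # green chip
    if score >= 0.75:
        return "medium"  # accent-colored chip
    return "low"         # red chip


for skill, score in [("python", 0.96), ("docker", 0.82), ("kubernetes", 0.64)]:
    print(skill, confidence_band(score))
```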

Skill Normalization

Extracted skills are mapped to canonical forms using the SKILL_ALIASES taxonomy. This taxonomy contains over 1,000 mappings from common variations to standardized skill names:

“JS”, “Javascript”, “JavaScript”, “JavaScript Programming” all map to “javascript”
“React.js”, “ReactJS”, “React” all map to “react”
“Amazon Web Services”, “AWS” both map to “aws”
“PostgreSQL”, “Postgres”, “psql” all map to “postgresql”

This normalization prevents duplicate skills and ensures that matching works correctly regardless of how a skill was written in the CV versus how it appears in job postings. The research backing this approach (IBM AAAI) showed a 29% improvement in Mean Reciprocal Rank when using skill taxonomy normalization.
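Alias normalization amounts to a lookup table plus an order-preserving de-duplication. The sketch below uses a tiny, hypothetical excerpt of a `SKILL_ALIASES`-style mapping, including the Dutch “Projectmanagement” example from the Power-User Tips. Matching variants on a lowercase, whitespace-collapsed key is an assumption.

```python
# Hypothetical excerpt; the real taxonomy holds 1,000+ mappings.
SKILL_ALIASES = {
    "js": "javascript",
    "javascript": "javascript",
    "javascript programming": "javascript",
    "react.js": "react",
    "reactjs": "react",
    "react": "react",
    "amazon web services": "aws",
    "aws": "aws",
    "postgresql": "postgresql",
    "postgres": "postgresql",
    "psql": "postgresql",
    "projectmanagement": "project management",
}


def normalize_skill(raw: str) -> str:
    """Map a raw skill string to its canonical form (unknown skills pass through)."""
    key = " ".join(raw.lower().split())
    return SKILL_ALIASES.get(key, key)


def normalize_skills(raw_skills):
    """Normalize every skill and drop duplicates while keeping first-seen order."""
    canonical = (normalize_skill(s) for s in raw_skills)
    return list(dict.fromkeys(c for c in canonical if c))
```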

Location Geocoding

If a location string is found in the CV, it is geocoded through a multi-step process:

Dutch city database check — The location is first compared against a comprehensive list of Dutch cities for fast, accurate resolution
Nominatim API fallback — If not found in the Dutch database, the location is sent to the Nominatim geocoding API
AI normalization — For ambiguous or non-standard location strings, AI-based normalization is used to resolve the location
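The three-step fallback can be sketched as a chain of lookups. In this illustration, the Dutch city table is a hypothetical three-entry excerpt with approximate coordinates. `ai_normalize` is a hypothetical hook standing in for the AI step. The Nominatim function calls the public `/search` endpoint, whose usage policy requires an identifying User-Agent. The lookups are passed in as parameters so the chain can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple

Geo = Tuple[float, float, str]  # (latitude, longitude, display name)

# Hypothetical excerpt of a Dutch city table (approximate coordinates).
DUTCH_CITIES = {
    "amsterdam": (52.3676, 4.9041, "Amsterdam"),
    "rotterdam": (51.9244, 4.4777, "Rotterdam"),
    "utrecht": (52.0907, 5.1214, "Utrecht"),
}


def nominatim_lookup(query: str) -> Optional[Geo]:
    """Query the public Nominatim search endpoint (makes a network call)."""
    url = "https://nominatim.openstreetmap.org/search?" + urllib.parse.urlencode(
        {"q": query, "format": "json", "limit": 1}
    )
    # Nominatim's usage policy requires an identifying User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "example-geocoder/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        results = json.load(resp)
    if not results:
        return None
    top = results[0]
    return float(top["lat"]), float(top["lon"]), top["display_name"]


def geocode(
    location: str,
    nominatim: Callable[[str], Optional[Geo]] = nominatim_lookup,
    ai_normalize: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[Geo]:
    key = location.strip().lower()
    # 1. Fast local lookup against the Dutch city table.
    if key in DUTCH_CITIES:
        return DUTCH_CITIES[key]
    # 2. Fall back to the Nominatim geocoder.
    result = nominatim(location)
    if result is not None:
        return result
    # 3. Let the (hypothetical) AI hook rewrite the string, then retry once.
    if ai_normalize is not None:
        normalized = ai_normalize(location)
        if normalized and normalized.strip().lower() != key:
            return geocode(normalized, nominatim, None)
    return None
```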

The geocoded result includes latitude, longitude, and a normalized location display name. These coordinates are essential for the distance-based scoring in the matching pipeline.
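For the distance part, a common approach is the haversine great-circle distance between the candidate’s and the job’s coordinates, compared with the candidate’s search radius (40 km by default). Treat this as an assumption. The article does not say which distance formula Recruitier uses, or whether the radius acts as a hard cut-off or feeds a sliding score.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(candidate, job, radius_km=40.0):
    """True if the job's (lat, lon) lies within the candidate's search radius."""
    return haversine_km(*candidate, *job) <= radius_km


amsterdam = (52.3676, 4.9041)
utrecht = (52.0907, 5.1214)
rotterdam = (51.9244, 4.4777)
print(round(haversine_km(*amsterdam, *utrecht)))
print(within_radius(amsterdam, utrecht), within_radius(amsterdam, rotterdam))
```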

Profile Creation and Cleanup

The candidate profile is created in the database with all extracted data. Before creation, any previously abandoned incomplete candidate profiles (from interrupted upload wizards) are automatically cleaned up.

The original CV file is stored alongside the candidate profile in the database as a binary (LargeBinary) field. You can download it at any time from the candidate detail page in its original format (PDF, DOCX, or TXT).
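Python’s built-in `sqlite3` module can illustrate the cleanup-then-create order and the binary CV column. Its BLOB columns play the role that SQLAlchemy’s `LargeBinary` type plays here. The table layout, the `is_complete` flag, and the per-account cleanup rule are hypothetical, chosen only to show the sequence. `AUTOINCREMENT` keeps SQLite from reusing the id of a deleted row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE candidates (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    is_complete INTEGER NOT NULL DEFAULT 0,  -- set to 1 when the wizard finishes
    cv_filename TEXT,
    cv_file BLOB                             -- original file bytes
)
"""


def create_candidate(conn, account_id, name, cv_filename, cv_bytes):
    """Drop abandoned incomplete profiles for this account, then insert the new one."""
    conn.execute(
        "DELETE FROM candidates WHERE account_id = ? AND is_complete = 0",
        (account_id,),
    )
    cur = conn.execute(
        "INSERT INTO candidates (account_id, name, cv_filename, cv_file) "
        "VALUES (?, ?, ?, ?)",
        (account_id, name, cv_filename, sqlite3.Binary(cv_bytes)),
    )
    conn.commit()
    return cur.lastrowid


def mark_complete(conn, candidate_id):
    conn.execute("UPDATE candidates SET is_complete = 1 WHERE id = ?", (candidate_id,))
    conn.commit()


def download_cv(conn, candidate_id):
    """Return (filename, original bytes), or None if the candidate no longer exists."""
    row = conn.execute(
        "SELECT cv_filename, cv_file FROM candidates WHERE id = ?", (candidate_id,)
    ).fetchone()
    return (row[0], bytes(row[1])) if row else None
```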

Handling Edge Cases

CV Text Is Too Short

If the extracted text contains fewer than 50 characters, the system returns an error. This usually indicates:

A scanned PDF without OCR (image-only content)
A corrupted file
A file that is not actually a CV (e.g., a blank template)

Solution: Try converting the file to a different format, re-save the PDF from the original application, or ask the candidate for a text-based version. If the candidate’s CV is genuinely a scanned document, use the LinkedIn import option instead.
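The minimum-length guard is a one-line check. The sketch below uses hypothetical names and strips surrounding whitespace before counting. Stripping first is an assumption about how the 50-character threshold is measured.

```python
MIN_CV_CHARS = 50  # threshold stated in this article


class CVTextTooShortError(ValueError):
    """Raised when extraction yields too little text to be a usable CV."""


def ensure_usable_text(text: str) -> str:
    cleaned = text.strip()
    if len(cleaned) < MIN_CV_CHARS:
        raise CVTextTooShortError(
            f"Only {len(cleaned)} characters extracted; the file may be an "
            "image-only scan, corrupted, or a blank template."
        )
    return cleaned
```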

Name Could Not Be Extracted

The candidate’s name is the only strictly required field. If the AI cannot determine a name from the CV content, the upload fails with an explanation. This can happen with:

Heavily formatted CVs where the name is embedded in an image
CVs in non-standard layouts where the name is not prominently placed
Anonymized CVs where the name has been intentionally redacted

Solution: If name extraction fails, try re-uploading the CV in a different file format (for example, a .docx instead of a PDF). Where the platform supports it, you can also enter the candidate’s name manually before uploading.

Duplicate Candidate Names

Recruitier checks for duplicate candidate names within your account. If you already have a candidate with the same name, you receive a conflict notification.

Solution: Either rename the new candidate to include a distinguishing detail (e.g., “Jan de Vries (Amsterdam)” vs. “Jan de Vries (Rotterdam)”), or update the existing profile with the new CV data.
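A duplicate check needs a comparison rule. The sketch below treats names as equal when they match after case-folding, accent removal, and whitespace collapsing. That rule and the helper names are assumptions, since the article only says that same-named candidates trigger a notification. With this rule, names disambiguated as in the solution above no longer collide.

```python
import unicodedata


def name_key(name: str) -> str:
    """Assumed comparison key: case-, accent- and whitespace-insensitive."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def find_duplicate(new_name, existing_names):
    """Return the first existing name that collides with new_name, else None."""
    key = name_key(new_name)
    for existing in existing_names:
        if name_key(existing) == key:
            return existing
    return None
```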

File Size Limits

Files that exceed the maximum size (typically 10 MB) are rejected before processing begins. Standard CVs are well within this limit, but CVs with embedded high-resolution images, portfolio samples, or embedded videos may exceed it.

Solution: Remove large images or embedded media before uploading. For portfolio-heavy candidates, strip the portfolio section and note it in the candidate summary instead.
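Oversized files are rejected before processing, so the check can run on the file name and byte count alone. In this hypothetical helper, “10 MB” is read as 10 * 1024 * 1024 bytes, which is an assumption. The allowed extensions come from the table of supported formats.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}
DEFAULT_MAX_BYTES = 10 * 1024 * 1024  # assumed reading of "10 MB"


def precheck_upload(filename: str, size_bytes: int,
                    max_bytes: int = DEFAULT_MAX_BYTES) -> Optional[str]:
    """Return an error message, or None if the file may go on to extraction."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"Unsupported file type: {ext or filename}"
    if size_bytes > max_bytes:
        mib = 1024 * 1024
        return (f"File is {size_bytes / mib:.1f} MB; "
                f"the limit is {max_bytes / mib:.0f} MB")
    return None
```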

Non-Standard CV Layouts

While the AI handles a wide variety of CV formats, extremely creative layouts (infographics, heavily designed PDFs with non-standard text flow) may produce lower-quality extractions. The text extraction follows the document’s reading order, which in heavily designed PDFs may not match the visual layout.

Solution: If a creative CV produces poor extraction results, ask the candidate for a plain-text or standard Word version. You can always store the creative CV as a reference while using the standard version for AI extraction.

Free Tier Candidate Limit

Free tier accounts are limited to 3 active candidates. If you have already reached this limit, the upload will be blocked until you either upgrade to a paid plan (Pro or Agency) or deactivate existing candidates.

Solution: Upgrade to Pro or Agency for unlimited candidates, or remove candidates you are no longer actively working with.
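The plan rule reduces to a small predicate. The plan identifiers are hypothetical. Treating any unrecognized plan like the free tier is an assumption.

```python
FREE_TIER_LIMIT = 3                  # active candidates allowed on the free tier
UNLIMITED_PLANS = {"pro", "agency"}  # hypothetical plan identifiers


def can_add_candidate(plan: str, active_candidates: int) -> bool:
    """Return True if another active candidate may be created on this plan."""
    if plan.lower() in UNLIMITED_PLANS:
        return True
    return active_candidates < FREE_TIER_LIMIT
```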

Tips for Best Results

Use Text-Based PDFs

CVs exported directly from Word, Google Docs, or any text editor produce the best extraction results. The AI can analyze the full content and extract data with high confidence. Scanned documents with image-only content cannot be processed at all.

Standard Layouts Work Best

While the AI handles a wide variety of formats, traditional CV layouts with clear section headings (Experience, Education, Skills) produce the most accurate extractions. The AI can identify skills from any part of the CV, but clearly labeled sections boost confidence scores.

Always Review Skills

The AI is good but not perfect. A quick 30-second review of extracted skills ensures your candidate is matched against the right opportunities. Since skills carry 45% of the vector search weight, this small investment has an outsized impact on match quality.

Verify Title and Experience Level

The candidate’s title influences the Title vector (35% of search weight) and the experience level affects which jobs are surfaced. Ensuring these are accurate is the second-highest-impact action after skill confirmation.

Advanced

The Full Extraction Pipeline in Detail

The CV upload pipeline is a carefully orchestrated sequence that balances speed with thoroughness. Here is what happens at each layer:

File Processing Layer: The system first identifies the file type by extension and routes it to the appropriate extraction engine. For PDFs, PyMuPDF processes each page in sequence, extracting text blocks and reassembling them in reading order. For DOCX files, python-docx iterates through both paragraph elements and table cells. This dual iteration is critical because many professional CV templates use tables for layout, placing the candidate’s name and contact details in table headers.

AI Processing Layer: The extracted text is sent to Gemini 3 Flash Preview with a carefully engineered prompt that specifies the exact JSON structure expected in return. The prompt includes instructions for handling edge cases like:

Multiple names (pick the most prominent one)
Multiple locations (prefer the current/most recent)
Ambiguous experience levels (use the years-of-experience heuristic: 0-2=junior, 3-5=medior, 6-10=senior, 11+=lead)
Skills listed in different languages (normalize to English canonical forms)
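The years-of-experience heuristic in the third bullet translates directly into code. The article does not specify how fractional years (for example, 2.5) are bucketed. This sketch assumes each band runs up to, but not including, the next band’s starting year.

```python
def level_from_years(years: float) -> str:
    """Experience level from years of experience: 0-2 junior, 3-5 medior,
    6-10 senior, 11+ lead (fractional years fall into the lower band)."""
    if years < 0:
        raise ValueError("years of experience cannot be negative")
    if years < 3:
        return "junior"
    if years < 6:
        return "medior"
    if years < 11:
        return "senior"
    return "lead"
```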

Post-Processing Layer: After the AI returns structured data, several normalization steps run:

Skills are normalized through the SKILL_ALIASES taxonomy (1,000+ mappings)
Location is geocoded through the GeocodingService (Dutch city list, Nominatim API, AI normalization)
Experience level is validated against years of experience
Duplicate or near-duplicate skills are removed
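Some near-duplicates can survive alias normalization, such as “Node.js” and “nodejs”. One simple way to catch them is to compare skills on a punctuation-free key and keep the higher-confidence entry. This is an illustrative technique, not necessarily the one Recruitier uses. Keeping `+` and `#` in the key stops “C++” and “C#” from collapsing into “C”.

```python
import re


def dedupe_skills(skills):
    """Collapse (name, confidence) pairs whose names differ only in case,
    spacing or punctuation, keeping the highest-confidence entry."""
    best = {}
    for name, confidence in skills:
        key = re.sub(r"[^a-z0-9+#]", "", name.lower())
        if key not in best or confidence > best[key][1]:
            best[key] = (name, confidence)
    return list(best.values())
```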

How CV Text Is Used Later in Matching

The CV text stored on the candidate profile is not just for display. It plays an active role in the matching pipeline:

Embedding generation: The CV text (along with the candidate’s title and skills) is used to generate the Experience vector — one of the three vectors used for semantic search. This vector captures broader professional context, domain knowledge, and work style that skills and titles alone cannot represent.
AI Scoring: During Stage 5 of the matching pipeline, up to 8,000 characters of the CV text are sent to the AI alongside each matched job’s description (up to 6,000 characters). The AI uses this to evaluate role fit, skills fit, experience fit, and secondary fit with specific evidence from the CV.
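The article does not spell out how the three vectors are combined. As an assumption-laden sketch, a weighted sum of cosine similarities shows what the percentages mean. The 45% (Skills) and 35% (Title) weights come from this documentation. The 20% for Experience is inferred only on the assumption that the three weights sum to 100%.

```python
import math

# Skills and Title weights are documented; Experience = 1 - 0.45 - 0.35 is assumed.
WEIGHTS = {"skills": 0.45, "title": 0.35, "experience": 0.20}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(candidate_vecs, job_vecs, weights=WEIGHTS):
    """Weighted sum of per-vector cosine similarities (hypothetical fusion rule)."""
    return sum(w * cosine(candidate_vecs[k], job_vecs[k]) for k, w in weights.items())
```

With this rule, a job that matches the candidate perfectly on title and experience but not at all on skills scores only 0.55. That is why confirming skills matters so much.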

The 50,000 character limit on CV text storage is generous enough for any standard CV (most CVs are 2,000-5,000 characters). However, for AI scoring, only the first 8,000 characters are used. This means the most important information should appear early in the CV. Fortunately, most CVs already follow this convention with name, title, summary, and recent experience appearing first.
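You can check the practical effect of the 8,000-character window for a given CV. This hypothetical helper applies the limits quoted above. It returns the slices the scoring step would see and reports how many CV characters fall outside the window.

```python
CV_SCORING_CHARS = 8_000   # CV characters sent to the AI scorer
JOB_SCORING_CHARS = 6_000  # job-description characters sent alongside


def scoring_window(cv_text: str, job_description: str):
    """Return (cv_slice, job_slice, cv_chars_cut) for the scoring step."""
    cv_slice = cv_text[:CV_SCORING_CHARS]
    job_slice = job_description[:JOB_SCORING_CHARS]
    return cv_slice, job_slice, max(0, len(cv_text) - CV_SCORING_CHARS)


cv = "Summary. " * 1_200  # 10,800 characters
_, _, cut = scoring_window(cv, "Senior Python Developer, Amsterdam")
print(f"{cut} characters of this CV would not reach the scorer")
```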

Connection to Other Features

Skills flow: Skills extracted during CV upload feed directly into the skill confirmation workflow. Once confirmed, they drive the Skills vector (45% weight) in matching.
Location flow: The geocoded location feeds into the Location & Preferences system. If the candidate provides a specific address, it can be refined using the Google Maps autocomplete picker later.
LinkedIn enrichment: If you later add a LinkedIn URL to a CV-uploaded candidate, the system can enrich the profile with additional data. However, a new import will not overwrite CV-extracted data — it supplements it.
Re-uploading a CV: You can update a candidate’s CV by uploading a new file. This triggers re-extraction and can update skills, title, and other fields. If skills have changed, the system detects this and can trigger re-matching.
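Plain dictionaries are enough to sketch the “supplement, don’t overwrite” rule and the skill-change check. The profile keys are hypothetical. Comparing skill sets is one plausible trigger for re-matching rather than a documented rule.

```python
def merge_linkedin(cv_profile: dict, linkedin_profile: dict) -> dict:
    """Fill gaps from LinkedIn without overwriting CV-extracted values."""
    merged = dict(cv_profile)  # shallow copy; the CV profile itself is untouched
    for key, value in linkedin_profile.items():
        if key == "skills":
            existing = merged.get("skills", [])
            # Append only skills the CV did not already provide.
            merged["skills"] = existing + [s for s in value if s not in existing]
        elif merged.get(key) in (None, "", []):
            merged[key] = value
    return merged


def skills_changed(old_skills, new_skills) -> bool:
    """True if the set of skills differs, ignoring order."""
    return set(old_skills) != set(new_skills)
```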

Power-User Tips

Batch your CV processing. While you cannot upload multiple CVs simultaneously, you can optimize your workflow by uploading a CV, quickly confirming the name and title (skip other fields for now), immediately confirming skills, setting basic location preferences, and then moving to the next candidate while matching runs in the background. Return later to refine individual profiles.

Use the summary field strategically. The AI extracts the candidate’s own professional summary, but you can edit this field to add recruiter context: “Looking for senior backend roles only, available from March 2026, prefers scale-ups over enterprise.” This context helps you remember candidate preferences that the AI cannot infer from the CV.

For multilingual CVs, the AI handles Dutch, English, and German CVs well. However, skills are always normalized to English canonical forms. If a CV is in Dutch and mentions “Projectmanagement”, it will be normalized to “project management” in the skills list.

Related

Confirm skills and expertise for accurate matching
Set location and preferences to target the right jobs
Understand how matching works to get the most out of the AI pipeline
Import from LinkedIn as an alternative to CV upload
