Overview
Uploading a CV is the fastest way to add a candidate to Recruitier. When you upload a file, the platform’s AI reads the entire document, extracts structured data (name, title, location, skills with confidence scores, experience level, years of experience, education), and creates a candidate profile ready for you to review and confirm. The entire process, from file upload to a fully populated profile, typically takes under 10 seconds. The extraction is powered by Gemini 3 Flash Preview, which processes up to 50,000 characters of CV text (approximately 12,500 tokens) in a single pass, understanding context across the entire document rather than just scanning for keyword labels.
Supported File Formats
Recruitier accepts the following CV file formats, each processed by a specialized text extraction engine:
| Format | Extension | Extraction Engine | Notes |
|---|---|---|---|
| PDF | .pdf | PyMuPDF (fitz) | Most common format. Works with text-based and formatted PDFs. Extracts text from all pages while preserving reading order. Password-protected PDFs are not supported. |
| Word Document | .docx | python-docx | Microsoft Word format. Text is extracted from both paragraphs and tables, so header-style layouts where the candidate name sits in a table cell are fully supported. |
| Plain Text | .txt | Direct read | Simple text files. Useful for pasting raw CV content when other formats are not available. |
The .doc format (legacy Word) is listed as supported, but .docx is strongly preferred. If you encounter extraction issues with older .doc files, ask the candidate for a .docx or PDF version.
The Upload Wizard: Step by Step
The upload wizard has four steps. After completing the final step, AI job matching starts automatically.
Step 1: Upload CV
Navigate to the Candidates section and click the Upload CV button in the top right. A wizard dialog opens with two tabs at the top: Upload CV and LinkedIn URL.
To upload a CV, make sure the Upload CV tab is selected. You can either drag and drop a file onto the upload area or click to browse your computer. The maximum file size is configurable but typically set to 10 MB, which is more than enough for any standard CV.
Once the file is uploaded, Recruitier’s AI processes it automatically: extracting text from the document, analyzing it with Gemini, geocoding the location, and extracting skills with confidence scores. This typically takes under 10 seconds. The wizard then advances to the next step.
Step 2: Review Profile
After extraction, you are presented with the candidate’s name and job title for review. Both fields are pre-filled from the AI extraction but fully editable.
The wizard checks for duplicate names in your account. If a candidate with the same name already exists, you receive a notification and can adjust the name before proceeding.
Step 3: Review Skills
The AI extracts skills from the CV with confidence scores indicating how certain it is about each skill. Skills are displayed as interactive chips that you can toggle on or off. A color-coded confidence indicator shows high (green), medium (accent color), and low (red) confidence levels.
At this stage you can:
- Toggle skills on or off by clicking them (confirmed skills show a checkmark)
- Remove skills entirely by clicking the X button on each skill chip
- Add missing skills by typing in the input field and pressing Enter or clicking the add button
Step 4: Set Location
Configure the candidate’s location and search radius:
- Location with autocomplete for precise geocoded coordinates (latitude/longitude). Start typing to see suggestions.
- Search radius slider in kilometers (default 40 km, adjustable from 5 km to 200 km)
See Location & Preferences for details on each setting and how they affect matching.
Additional preferences like salary expectations, job type, and flexibility can be configured later from the candidate’s profile edit dialog. The wizard focuses on the essential fields needed to start matching.
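The radius acts as a distance cutoff around the candidate’s geocoded coordinates. As a minimal sketch, assuming the cutoff is a straight-line (haversine) distance, the slider default and bounds translate to the code below; the function names are hypothetical, and Recruitier’s actual distance calculation is not documented here.

```python
import math
from typing import Optional, Tuple

DEFAULT_RADIUS_KM = 40
MIN_RADIUS_KM, MAX_RADIUS_KM = 5, 200
EARTH_RADIUS_KM = 6371.0


def clamp_radius(radius_km: Optional[float] = None) -> float:
    """Apply the wizard default and keep the value inside the slider range."""
    if radius_km is None:
        return DEFAULT_RADIUS_KM
    return max(MIN_RADIUS_KM, min(MAX_RADIUS_KM, radius_km))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(candidate: Tuple[float, float], job: Tuple[float, float],
                  radius_km: Optional[float] = None) -> bool:
    """Hypothetical filter: is the job inside the candidate's search radius?"""
    return haversine_km(*candidate, *job) <= clamp_radius(radius_km)
```

With these coordinates, Amsterdam (52.3676, 4.9041) and Rotterdam (51.9244, 4.4777) come out about 57 km apart, so a Rotterdam job falls outside the default 40 km radius of an Amsterdam-based candidate but inside a 60 km one.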
What Happens Behind the Scenes
When a CV is uploaded, the following processing chain executes in order:
Text Extraction
The file is parsed using the appropriate library:
- PDF: PyMuPDF (fitz) extracts text from all pages, preserving reading order and handling multi-column layouts
- DOCX: python-docx extracts text from both paragraph elements and table cells, ensuring CV layouts with table-based headers are fully captured
- TXT: Direct file read with encoding detection
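A minimal sketch of the per-format routing above, assuming the upload arrives as raw bytes plus its original filename. fitz.open, Document.needs_pass, Page.get_text, and python-docx’s Document, paragraphs, and tables are real library APIs; the UTF-8 then Windows-1252 fallback is a simple stand-in for the encoding detection mentioned above, and all function names are hypothetical. Multi-column reading-order handling is not reproduced. The third-party imports are deferred so the TXT path runs without PyMuPDF or python-docx installed.

```python
import io
from pathlib import Path


def extract_pdf(data: bytes) -> str:
    import fitz  # PyMuPDF
    with fitz.open(stream=data, filetype="pdf") as doc:
        if doc.needs_pass:
            raise ValueError("Password-protected PDFs are not supported")
        # get_text() returns each page's text layer; pages are joined in order.
        return "\n".join(page.get_text() for page in doc)


def extract_docx(data: bytes) -> str:
    from docx import Document  # python-docx
    doc = Document(io.BytesIO(data))
    parts = [p.text for p in doc.paragraphs]
    # Many CV templates put the name and contact details in table cells.
    # Note: table text is appended after body paragraphs, not in document order.
    for table in doc.tables:
        for row in table.rows:
            parts.extend(cell.text for cell in row.cells)
    return "\n".join(text for text in parts if text.strip())


def extract_txt(data: bytes) -> str:
    # Assumed fallback chain; "utf-8-sig" also strips a leading byte-order mark.
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1")  # latin-1 maps every byte, so it never fails


EXTRACTORS = {".pdf": extract_pdf, ".docx": extract_docx, ".txt": extract_txt}


def extract_text(filename: str, data: bytes) -> str:
    """Route the file to an extractor based on its (case-insensitive) extension."""
    ext = Path(filename).suffix.lower()
    extractor = EXTRACTORS.get(ext)
    if extractor is None:
        raise ValueError(f"Unsupported file type: {ext or '(no extension)'}")
    return extractor(data)
```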
AI Profile Extraction
The extracted text (truncated to 50,000 characters if needed) is sent to Gemini 3 Flash Preview along with a structured extraction prompt. The prompt instructs the AI to identify and return data in a specific JSON format covering all supported fields.
The AI uses contextual understanding to extract data that goes beyond simple pattern matching:
- A mention of “5 years leading development teams” informs both the years of experience calculation and the experience level assignment
- A certification like “AWS Certified Solutions Architect” is captured both as a skill and as education/certification data
- Location strings like “Amsterdam area” or “Randstad” are recognized as geographic references
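The handling around the model call can be sketched as two steps: truncate and build the prompt, then parse and validate the JSON reply. This assumes the model is asked to return bare JSON; the prompt wording, the field names, and the ExtractedProfile type are hypothetical, and the Gemini API call itself is omitted because its client code is not described here. The parser enforces the one strictly required field, the candidate’s name.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

MAX_CV_CHARS = 50_000  # longer CV text is truncated before the model call

# Hypothetical prompt; the production prompt is not published.
PROMPT_TEMPLATE = """Extract a candidate profile from the CV below.
Return only JSON with the keys: name, title, location, experience_level,
years_of_experience, education, and skills (a list of objects with
"name" and "confidence" between 0 and 1).

CV:
{cv_text}"""


@dataclass
class ExtractedProfile:
    name: str
    title: Optional[str] = None
    location: Optional[str] = None
    experience_level: Optional[str] = None
    years_of_experience: Optional[float] = None
    education: list = field(default_factory=list)
    skills: list = field(default_factory=list)  # (skill, confidence) pairs


def build_prompt(cv_text: str) -> str:
    return PROMPT_TEMPLATE.format(cv_text=cv_text[:MAX_CV_CHARS])


def parse_profile(raw_json: str) -> ExtractedProfile:
    data = json.loads(raw_json)
    name = (data.get("name") or "").strip()
    if not name:
        # The name is the only strictly required field.
        raise ValueError("Name could not be extracted from the CV")
    skills = [
        # Clamp confidence into [0, 1] in case the model drifts outside it.
        (s["name"], min(1.0, max(0.0, float(s.get("confidence", 0.0)))))
        for s in (data.get("skills") or [])
        if s.get("name")
    ]
    return ExtractedProfile(
        name=name,
        title=data.get("title"),
        location=data.get("location"),
        experience_level=data.get("experience_level"),
        years_of_experience=data.get("years_of_experience"),
        education=data.get("education") or [],
        skills=skills,
    )
```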
Skill Extraction with Confidence
The AI identifies skills throughout the CV and assigns each a confidence score between 0 and 1. The score reflects how explicitly and prominently the skill appears:
- A skill listed in a dedicated “Skills” section with emphasis gets 0.95+
- A skill mentioned in a job description context gets 0.75-0.89
- A skill inferred from a related technology or role gets 0.60-0.74
Skill Normalization
Extracted skills are mapped to canonical forms using the SKILL_ALIASES taxonomy. This taxonomy contains over 1,000 mappings from common variations to standardized skill names:
- “JS”, “Javascript”, “JavaScript”, “JavaScript Programming” all map to “javascript”
- “React.js”, “ReactJS”, “React” all map to “react”
- “Amazon Web Services”, “AWS” both map to “aws”
- “PostgreSQL”, “Postgres”, “psql” all map to “postgresql”
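Alias mapping boils down to a case-insensitive dictionary lookup followed by de-duplication. The entries below are only the examples listed above (the real SKILL_ALIASES taxonomy has over 1,000 mappings). Two behaviors are assumptions: unknown skills pass through lowercased, and when two raw skills collapse to the same canonical name, the higher confidence is kept.

```python
from typing import Dict, List, Tuple

# Illustrative subset of the SKILL_ALIASES taxonomy, keyed by lowercased alias.
SKILL_ALIASES: Dict[str, str] = {
    "js": "javascript",
    "javascript": "javascript",
    "javascript programming": "javascript",
    "react.js": "react",
    "reactjs": "react",
    "react": "react",
    "amazon web services": "aws",
    "aws": "aws",
    "postgresql": "postgresql",
    "postgres": "postgresql",
    "psql": "postgresql",
}


def normalize_skill(raw: str) -> str:
    key = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse spaces
    return SKILL_ALIASES.get(key, key)  # assumed: unknown skills pass through


def normalize_skills(skills: List[Tuple[str, float]]) -> Dict[str, float]:
    """Map skills to canonical names, keeping the highest confidence per skill."""
    result: Dict[str, float] = {}
    for raw, confidence in skills:
        canonical = normalize_skill(raw)
        result[canonical] = max(confidence, result.get(canonical, 0.0))
    return result
```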
Location Geocoding
If a location string is found in the CV, it is geocoded through a multi-step process:
- Dutch city database check — The location is first compared against a comprehensive list of Dutch cities for fast, accurate resolution
- Nominatim API fallback — If not found in the Dutch database, the location is sent to the Nominatim geocoding API
- AI normalization — For ambiguous or non-standard location strings, AI-based normalization is used to resolve the location
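The fallback order can be expressed as a chain of resolvers that stops at the first hit. In this sketch the Dutch city table holds two illustrative entries, and the Nominatim and AI steps are passed in as plain functions so it runs offline (in production the Nominatim step would query its search endpoint). The assumption that the AI step rewrites the location string and then retries the earlier lookups is ours; the names are hypothetical and do not reflect the GeocodingService’s actual interface.

```python
from typing import Callable, Dict, Optional, Tuple

Coords = Tuple[float, float]

# Illustrative subset; the real Dutch city list is far larger.
DUTCH_CITIES: Dict[str, Coords] = {
    "amsterdam": (52.3676, 4.9041),
    "rotterdam": (51.9244, 4.4777),
}


def lookup_dutch_city(location: str) -> Optional[Coords]:
    return DUTCH_CITIES.get(location.strip().lower())


def geocode(
    location: str,
    nominatim: Callable[[str], Optional[Coords]],
    ai_normalize: Callable[[str], Optional[str]],
) -> Optional[Coords]:
    # 1. Fast local lookup against the Dutch city list.
    coords = lookup_dutch_city(location)
    if coords:
        return coords
    # 2. Nominatim fallback for anything not in the local list.
    coords = nominatim(location)
    if coords:
        return coords
    # 3. Assumed AI step: rewrite an ambiguous string, then retry steps 1 and 2.
    normalized = ai_normalize(location)
    if normalized and normalized != location:
        return lookup_dutch_city(normalized) or nominatim(normalized)
    return None
```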
Profile Creation and Cleanup
The candidate profile is created in the database with all extracted data. Before creation, any previously abandoned incomplete candidate profiles (from interrupted upload wizards) are automatically cleaned up.
The original CV file is stored alongside the candidate profile in the database as a binary (LargeBinary) field. You can download it at any time from the candidate detail page in its original format (PDF, DOCX, or TXT).
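Profile creation, draft cleanup, and file storage can be sketched with Python’s built-in sqlite3 module, where a BLOB column plays the role of the SQLAlchemy LargeBinary field. The table layout, the status column that marks wizard drafts as incomplete, and the function names are all assumptions for illustration, not Recruitier’s schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS candidates (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- ids are never reused
    account_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    status TEXT NOT NULL,                  -- 'incomplete' until the wizard finishes
    cv_filename TEXT,
    cv_data BLOB                           -- original file bytes
)
"""


def create_candidate(conn, account_id, name, cv_filename, cv_data):
    """Delete this account's abandoned drafts, then insert the new draft."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "DELETE FROM candidates WHERE account_id = ? AND status = 'incomplete'",
            (account_id,),
        )
        cur = conn.execute(
            "INSERT INTO candidates (account_id, name, status, cv_filename, cv_data) "
            "VALUES (?, ?, 'incomplete', ?, ?)",
            (account_id, name, cv_filename, cv_data),
        )
    return cur.lastrowid


def complete_candidate(conn, candidate_id):
    """Mark the profile active once the last wizard step is confirmed."""
    with conn:
        conn.execute("UPDATE candidates SET status = 'active' WHERE id = ?", (candidate_id,))


def download_cv(conn, candidate_id):
    """Return (filename, original bytes), or None if the candidate no longer exists."""
    row = conn.execute(
        "SELECT cv_filename, cv_data FROM candidates WHERE id = ?", (candidate_id,)
    ).fetchone()
    return (row[0], bytes(row[1])) if row else None
```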
Handling Edge Cases
CV Text Is Too Short
If the extracted text contains fewer than 50 characters, the system returns an error. This usually indicates:
- A scanned PDF without OCR (image-only content)
- A corrupted file
- A file that is not actually a CV (e.g., a blank template)
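The length guard is a simple threshold check on the extracted text. The article does not say whether whitespace counts toward the 50 characters; this sketch assumes leading and trailing whitespace is ignored, so an image-only PDF that yields nothing but page breaks still fails. The function name is hypothetical.

```python
MIN_CV_CHARS = 50


def check_cv_text(text: str) -> str:
    """Return the trimmed text, or raise if too little text was extracted."""
    stripped = text.strip()  # assumption: surrounding whitespace is not counted
    if len(stripped) < MIN_CV_CHARS:
        raise ValueError(
            f"Only {len(stripped)} characters of text were extracted "
            f"(minimum {MIN_CV_CHARS}); the file may be an image-only scan, "
            "corrupted, or not a CV"
        )
    return stripped
```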
Name Could Not Be Extracted
The candidate’s name is the only strictly required field. If the AI cannot determine a name from the CV content, the upload fails with an explanation. This can happen with:
- Heavily formatted CVs where the name is embedded in an image
- CVs in non-standard layouts where the name is not prominently placed
- Anonymized CVs where the name has been intentionally redacted
Duplicate Candidate Names
Recruitier checks for duplicate candidate names within your account. If you already have a candidate with the same name, you receive a conflict notification.
Solution: Either rename the new candidate to include a distinguishing detail (e.g., “Jan de Vries (Amsterdam)” vs. “Jan de Vries (Rotterdam)”), or update the existing profile with the new CV data.
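A duplicate check only needs the names already stored in the account. This sketch assumes the comparison ignores case and repeated whitespace; how strict Recruitier’s own comparison is has not been documented, and the helper names are hypothetical.

```python
from typing import Iterable, Optional


def normalize_name(name: str) -> str:
    return " ".join(name.split()).casefold()


def find_duplicate(new_name: str, existing_names: Iterable[str]) -> Optional[str]:
    """Return the existing name that conflicts with new_name, if any."""
    target = normalize_name(new_name)
    for name in existing_names:
        if normalize_name(name) == target:
            return name  # this is where the conflict notification would fire
    return None
```

Adding a distinguishing suffix such as “(Rotterdam)” changes the normalized name, so the check no longer reports a conflict.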
File Size Limits
Files that exceed the maximum size (typically 10 MB) are rejected before processing begins. Standard CVs are well within this limit, but CVs with embedded high-resolution images, portfolio samples, or videos may exceed it.
Solution: Remove large images or embedded media before uploading. For portfolio-heavy candidates, strip the portfolio section and note it in the candidate summary instead.
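Because the size check runs before any parsing, it can be a plain byte-count comparison against a configurable limit. This sketch assumes “10 MB” means 10 * 1024 * 1024 bytes; the constant and function names are hypothetical.

```python
DEFAULT_MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed interpretation of "10 MB"


def check_file_size(size_bytes: int, max_bytes: int = DEFAULT_MAX_UPLOAD_BYTES) -> None:
    """Raise before any text extraction if the upload is too large."""
    if size_bytes > max_bytes:
        raise ValueError(
            f"File is {size_bytes / (1024 * 1024):.1f} MB; the limit is "
            f"{max_bytes / (1024 * 1024):.0f} MB. Remove large images or embedded media."
        )
```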
Non-Standard CV Layouts
While the AI handles a wide variety of CV formats, extremely creative layouts (infographics, heavily designed PDFs with non-standard text flow) may produce lower-quality extractions. The text extraction follows the document’s reading order, which in heavily designed PDFs may not match the visual layout.
Solution: If a creative CV produces poor extraction results, ask the candidate for a plain-text or standard Word version. You can always store the creative CV as a reference while using the standard version for AI extraction.
Free Tier Candidate Limit
Free tier accounts are limited to 3 active candidates. If you have already reached this limit, the upload is blocked until you either upgrade to a paid plan (Pro or Agency) or deactivate existing candidates.
Solution: Upgrade to Pro or Agency for unlimited candidates, or deactivate candidates you are no longer actively working with.
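The free-tier rule counts only active candidates, which is why deactivating a candidate frees a slot. A minimal sketch, assuming plans are identified by plain strings and paid plans have no cap; the mapping and function are hypothetical.

```python
from typing import Dict, Optional

# None means unlimited.
PLAN_CANDIDATE_LIMITS: Dict[str, Optional[int]] = {"free": 3, "pro": None, "agency": None}


def can_add_candidate(plan: str, active_candidates: int) -> bool:
    """True if one more active candidate fits within the plan's limit."""
    limit = PLAN_CANDIDATE_LIMITS[plan.lower()]
    return limit is None or active_candidates < limit
```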
Tips for Best Results
Use Text-Based PDFs
CVs exported directly from Word, Google Docs, or any text editor produce the best extraction results. The AI can analyze the full content and extract data with high confidence. Scanned documents with image-only content cannot be processed at all.
Standard Layouts Work Best
While the AI handles a wide variety of formats, traditional CV layouts with clear section headings (Experience, Education, Skills) produce the most accurate extractions. The AI can identify skills from any part of the CV, but clearly labeled sections boost confidence scores.
Always Review Skills
The AI is good but not perfect. A quick 30-second review of extracted skills ensures your candidate is matched against the right opportunities. Since skills carry 45% of the vector search weight, this small investment has an outsized impact on match quality.
Verify Title and Experience Level
The candidate’s title influences the Title vector (35% of search weight) and the experience level affects which jobs are surfaced. Ensuring these are accurate is the second-highest-impact action after skill confirmation.
Advanced
The Full Extraction Pipeline in Detail
The CV upload pipeline is a carefully orchestrated sequence that balances speed with thoroughness. Here is what happens at each layer:
File Processing Layer: The system first identifies the file type by extension and routes it to the appropriate extraction engine. For PDFs, PyMuPDF processes each page in sequence, extracting text blocks and reassembling them in reading order. For DOCX files, python-docx iterates through both paragraph elements and table cells. This dual iteration is critical because many professional CV templates use tables for layout, placing the candidate’s name and contact details in table headers.
AI Processing Layer: The extracted text is sent to Gemini 3 Flash Preview with a carefully engineered prompt that specifies the exact JSON structure expected in return. The prompt includes instructions for handling edge cases like:
- Multiple names (pick the most prominent one)
- Multiple locations (prefer the current/most recent)
- Ambiguous experience levels (use the years-of-experience heuristic: 0-2=junior, 3-5=medior, 6-10=senior, 11+=lead)
- Skills listed in different languages (normalize to English canonical forms)
After the AI returns its JSON response, the extracted data is post-processed:
- Skills are normalized through the SKILL_ALIASES taxonomy (1,000+ mappings)
- Location is geocoded through the GeocodingService (Dutch city list, Nominatim API, AI normalization)
- Experience level is validated against years of experience
- Duplicate or near-duplicate skills are removed
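The years-of-experience heuristic maps directly to threshold checks, and the same function can back the validation step. Two details are assumptions: fractional years fall into the lower band (2.5 years counts as junior), and an AI-assigned level is kept only if it is within one band of the heuristic, so “5 years leading development teams” can still be senior while “lead” with 1 year is corrected. The function names are hypothetical.

```python
from typing import Optional

LEVELS = ("junior", "medior", "senior", "lead")


def level_from_years(years: float) -> str:
    """0-2 junior, 3-5 medior, 6-10 senior, 11+ lead."""
    if years < 3:
        return "junior"
    if years < 6:
        return "medior"
    if years < 11:
        return "senior"
    return "lead"


def validate_level(ai_level: Optional[str], years: Optional[float]) -> Optional[str]:
    if years is None:
        # Nothing to validate against; keep the AI label only if it is recognized.
        return ai_level if ai_level in LEVELS else None
    expected = level_from_years(years)
    # Assumed tolerance: accept the AI's label if it is at most one band away.
    if ai_level in LEVELS and abs(LEVELS.index(ai_level) - LEVELS.index(expected)) <= 1:
        return ai_level
    return expected
```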
How CV Text Is Used Later in Matching
The CV text stored on the candidate profile is not just for display. It plays an active role in the matching pipeline:
- Embedding generation: The CV text (along with the candidate’s title and skills) is used to generate the Experience vector, one of the three vectors used for semantic search. This vector captures broader professional context, domain knowledge, and work style that skills and titles alone cannot represent.
- AI Scoring: During Stage 5 of the matching pipeline, up to 8,000 characters of the CV text are sent to the AI alongside each matched job’s description (up to 6,000 characters). The AI uses this to evaluate role fit, skills fit, experience fit, and secondary fit with specific evidence from the CV.
The 50,000 character limit on CV text storage is generous enough for any standard CV (most CVs are 2,000-5,000 characters). However, for AI scoring, only the first 8,000 characters are used. This means the most important information should appear early in the CV. Fortunately, most CVs already follow this convention with name, title, summary, and recent experience appearing first.
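The scoring-stage limits amount to two prefix slices applied before each model call, which is why content order in the CV matters. The constants come from the text above; the function name and the dictionary it returns are hypothetical.

```python
from typing import Dict

CV_SCORING_CHARS = 8_000
JOB_SCORING_CHARS = 6_000


def scoring_inputs(cv_text: str, job_description: str) -> Dict[str, str]:
    """Keep only the leading part of each text for AI scoring."""
    return {
        "cv": cv_text[:CV_SCORING_CHARS],
        "job": job_description[:JOB_SCORING_CHARS],
    }
```

For a 12,000-character CV, the last 4,000 characters (often older roles) never reach the scoring model, even though all of it is stored on the profile.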
Connection to Other Features
- Skills flow: Skills extracted during CV upload feed directly into the skill confirmation workflow. Once confirmed, they drive the Skills vector (45% weight) in matching.
- Location flow: The geocoded location feeds into the Location & Preferences system. If the candidate provides a specific address, it can be refined using the Google Maps autocomplete picker later.
- LinkedIn enrichment: If you later add a LinkedIn URL to a CV-uploaded candidate, the system can enrich the profile with additional data. However, a new import will not overwrite CV-extracted data — it supplements it.
- Re-uploading a CV: You can update a candidate’s CV by uploading a new file. This triggers re-extraction and can update skills, title, and other fields. If skills have changed, the system detects this and can trigger re-matching.
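The “supplement, don’t overwrite” rule for LinkedIn enrichment and the skill-change check on re-upload can both be sketched over plain dictionaries. The field names and the policy of appending LinkedIn skills the CV did not list are assumptions; the article only states that CV-extracted values are kept and that changed skills can trigger re-matching.

```python
from typing import Iterable


def supplement_profile(cv_profile: dict, linkedin_data: dict) -> dict:
    """Fill only fields the CV left empty; never overwrite CV-extracted values."""
    merged = dict(cv_profile)
    for key, value in linkedin_data.items():
        if key != "skills" and merged.get(key) in (None, "", []):
            merged[key] = value
    # Assumed policy: append LinkedIn skills the CV did not already list.
    skills = list(merged.get("skills") or [])
    for skill in linkedin_data.get("skills") or []:
        if skill not in skills:
            skills.append(skill)
    merged["skills"] = skills
    return merged


def skills_changed(old_skills: Iterable[str], new_skills: Iterable[str]) -> bool:
    """True when a re-uploaded CV yields a different set of canonical skills."""
    return set(old_skills) != set(new_skills)
```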
Power-User Tips
Related
- Confirm skills and expertise for accurate matching
- Set location and preferences to target the right jobs
- Understand how matching works to get the most out of the AI pipeline
- Import from LinkedIn as an alternative to CV upload

