
LeBonCoin Scraper: Anti-Scraping Bypass & Data Extraction
Robust Python web scraper that extracts seller contacts from Leboncoin.fr, with a 100% phone extraction success rate in production tests (10 of 10 contacts), using ScrapFly's anti-scraping bypass.
Project Overview
The Challenge
Leboncoin.fr implements sophisticated anti-scraping protection that blocks automated requests and bot traffic, making traditional scraping approaches ineffective.
Phone numbers are only revealed after JavaScript execution and user interaction, requiring full browser rendering capabilities rather than simple HTML parsing.
Access to local listings requires French IP addresses due to geolocation restrictions, creating barriers for international scraping operations.
Frequent HTML structure changes and dynamic CSS classes require robust extraction strategies to maintain reliability over time.
Extraction success rate has to be balanced against API costs for scalable operations, which means optimizing the cost-per-contact metric.
Developed a production-grade web scraper for Leboncoin.fr, France's largest classified ads platform, achieving a 100% phone extraction success rate in production tests.
Implemented sophisticated anti-scraping bypass using ScrapFly's residential proxy network with French geolocation and JavaScript rendering capabilities to reveal dynamically-loaded phone numbers.
Designed a robust 3-tier fallback extraction strategy combining HTML selectors, phone link detection, and regex patterns to handle frequent HTML structure changes and ensure data capture reliability.
Optimized for cost efficiency with configurable limits and environment-based configuration, reaching a cost of $0.33 per 10 contacts while maintaining high success rates and scaling from light to heavy usage.
Technical Architecture
ScrapFly Integration Layer: Handles API authentication, anti-scraping protection bypass, and residential proxy configuration with French geolocation
Two-Phase Scraping Pipeline: Phase 1 collects ad URLs from search results, Phase 2 extracts detailed information including phone numbers
Multi-Method Phone Extraction: Implements a 3-tier fallback strategy using HTML selectors, phone links, and regex patterns with French format validation
Data Processing Pipeline: BeautifulSoup HTML parsing, price extraction that tolerates Unicode space characters, and structured JSON output
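The Unicode space handling in the price extraction step can be illustrated with a minimal sketch. It assumes prices arrive as strings such as "1 200 €", where the thousands separator may be a regular space, a no-break space (U+00A0), or a narrow no-break space (U+202F). The function name parse_price is hypothetical and is not taken from the project code.

```python
import re
from typing import Optional

# Space-like characters that may separate digit groups in a French price:
# regular space, no-break space (U+00A0), narrow no-break space (U+202F),
# thin space (U+2009). Python's \s already covers these for str patterns;
# they are listed explicitly to document the intent.
_SPACES = re.compile(r"[\s\u00a0\u202f\u2009]+")


def parse_price(raw: str) -> Optional[int]:
    """Return the price in whole euros, or None if the text has no digits.

    Hypothetical helper for illustration; any decimal part is ignored.
    """
    compact = _SPACES.sub("", raw)        # "1\u202f200\u00a0€" -> "1200€"
    match = re.search(r"\d+", compact)    # first run of digits
    return int(match.group()) if match else None
```

Returning None instead of raising keeps a single malformed price from aborting a whole run.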
Key Challenges & Solutions
Anti-Scraping Detection
Implemented ScrapFly's Anti Scraping Protection (ASP) bypass with residential proxy rotation, realistic browser headers, and an auto-retry mechanism to avoid detection and blocking.
JavaScript-Rendered Content
Enabled full JavaScript execution with a 3-second rendering wait, DOM readiness checks, and auto-scroll to trigger lazy-loaded phone numbers.
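The anti-detection and rendering settings described in the two items above map onto a small set of request parameters. The sketch below builds a request URL for ScrapFly's HTTP scrape endpoint using only the standard library. The parameter names (asp, country, proxy_pool, render_js, rendering_wait, auto_scroll, retry) follow ScrapFly's public API as commonly documented and should be checked against the current API reference. The helper name build_scrape_url is hypothetical, and the project itself may use the ScrapFly Python SDK instead.

```python
from urllib.parse import urlencode

SCRAPFLY_ENDPOINT = "https://api.scrapfly.io/scrape"


def build_scrape_url(api_key: str, target_url: str) -> str:
    """Build a ScrapFly scrape request URL (illustrative sketch only)."""
    params = {
        "key": api_key,
        "url": target_url,
        "asp": "true",                            # anti-scraping protection bypass
        "country": "fr",                          # French exit IP
        "proxy_pool": "public_residential_pool",  # residential proxies
        "render_js": "true",                      # run page JavaScript
        "rendering_wait": 3000,                   # wait 3 s after load (milliseconds)
        "auto_scroll": "true",                    # scroll to trigger lazy loading
        "retry": "true",                          # let the API retry failed scrapes
    }
    # urlencode percent-encodes the target URL so it survives as one parameter.
    return f"{SCRAPFLY_ENDPOINT}?{urlencode(params)}"
```

Keeping these settings in one place makes it easy to switch rendering off for pages that do not need it.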
Dynamic HTML Selectors
Built a 3-tier fallback extraction using data-qa-id attributes, tel: links, and regex patterns to handle frequent HTML structure changes.
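A minimal sketch of such a 3-tier fallback follows. It uses the standard-library HTMLParser rather than BeautifulSoup so that it runs without dependencies. The data-qa-id value in PHONE_QA_ID is a hypothetical placeholder, not the real attribute value used on the site, and the regex is one plausible pattern for French numbers rather than the project's own.

```python
import re
from html.parser import HTMLParser

PHONE_QA_ID = "adview_phone_number"  # hypothetical value, for illustration only
# French numbers: 0X XX XX XX XX, +33 X XX XX XX XX or 0033 X XX XX XX XX,
# with optional spaces, dots or hyphens between digit pairs.
PHONE_RE = re.compile(r"(?:\+33|0033|0)\s*[1-9](?:[\s.\-]*\d{2}){4}")


class _PhoneCollector(HTMLParser):
    """Collects candidate text for each tier in a single pass over the HTML."""

    def __init__(self):
        super().__init__()
        self.selector_text = []   # tier 1: text inside the data-qa-id element
        self.tel_links = []       # tier 2: targets of tel: links
        self.all_text = []        # tier 3: every text node
        self._target_tag = None
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self._target_tag is None and attr.get("data-qa-id") == PHONE_QA_ID:
            self._target_tag, self._depth = tag, 1
        elif tag == self._target_tag:
            self._depth += 1
        href = attr.get("href") or ""
        if tag == "a" and href.startswith("tel:"):
            self.tel_links.append(href[4:])

    def handle_endtag(self, tag):
        if tag == self._target_tag:
            self._depth -= 1
            if self._depth == 0:
                self._target_tag = None

    def handle_data(self, data):
        self.all_text.append(data)
        if self._target_tag is not None:
            self.selector_text.append(data)


def _first_phone(candidates):
    for text in candidates:
        match = PHONE_RE.search(text)
        if match:
            return re.sub(r"[\s.\-]", "", match.group())  # strip separators
    return None


def extract_phone(html):
    """Return (phone, tier_name) from the first tier that matches, else None."""
    collector = _PhoneCollector()
    collector.feed(html)
    tiers = (
        ("selector", collector.selector_text),
        ("tel_link", collector.tel_links),
        ("regex", [" ".join(collector.all_text)]),
    )
    for name, candidates in tiers:
        phone = _first_phone(candidates)
        if phone:
            return phone, name
    return None
```

Returning the tier name alongside the number makes it possible to log how often each fallback fires, which is an early signal that the page structure has changed.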
Cost Optimization
Implemented configurable limits, environment-based tuning, and an efficient two-phase approach to minimize API costs while maintaining quality.
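The interaction between environment-based limits and the two-phase approach can be sketched as below. The environment variable names (MAX_ADS, RENDERING_WAIT_MS), the ScraperConfig class, and the run_pipeline function are all hypothetical names chosen for illustration. The sketch assumes that the per-ad detail requests in Phase 2 account for most of the API cost, so the limit is applied there.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScraperConfig:
    max_ads: int             # cap on Phase 2 (detail page) requests
    rendering_wait_ms: int   # JavaScript rendering wait

    @classmethod
    def from_env(cls, env=None):
        """Read settings from the environment, falling back to defaults."""
        env = os.environ if env is None else env
        return cls(
            max_ads=int(env.get("MAX_ADS", "10")),
            rendering_wait_ms=int(env.get("RENDERING_WAIT_MS", "3000")),
        )


def run_pipeline(config, collect_urls, scrape_ad):
    """Phase 1 collects ad URLs; Phase 2 scrapes at most config.max_ads of them.

    collect_urls: callable returning an iterable of ad URLs (search results).
    scrape_ad:    callable taking one URL and returning a record or None.
    """
    urls = list(dict.fromkeys(collect_urls()))  # de-duplicate, keep order
    results = []
    for url in urls[: config.max_ads]:          # the cost-limiting step
        record = scrape_ad(url)
        if record is not None:
            results.append(record)
    return results
```

Passing the two phases in as callables keeps the cost cap testable without any network access.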
Data Validation
Created a comprehensive validation system with regex patterns, length checks, prefix validation, and duplicate detection to ensure data quality.
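As an illustration of these checks, the sketch below normalizes a French number to the national 10-digit form, validates length and prefix with a single regex, and drops duplicates. The function names are hypothetical, and the accepted prefixes (01 to 09) are an assumption about what the project treats as valid.

```python
import re
from typing import Iterable, List, Optional


def normalize_french_phone(raw: str) -> Optional[str]:
    """Return the number as 0XXXXXXXXX, or None if it is not a valid format."""
    digits = re.sub(r"[^\d+]", "", raw)   # keep digits and a leading "+"
    if digits.startswith("+33"):
        digits = "0" + digits[3:]          # +33 6... -> 06...
    elif digits.startswith("0033"):
        digits = "0" + digits[4:]          # 0033 6... -> 06...
    # Length check (10 digits) and prefix check (0 followed by 1-9) in one pattern.
    if not re.fullmatch(r"0[1-9]\d{8}", digits):
        return None
    return digits


def dedupe_valid(raws: Iterable[str]) -> List[str]:
    """Normalize each number, drop invalid ones, keep the first of each duplicate."""
    seen = set()
    unique = []
    for raw in raws:
        phone = normalize_french_phone(raw)
        if phone and phone not in seen:
            seen.add(phone)
            unique.append(phone)
    return unique
```

Normalizing before comparing means "06 12 34 56 78" and "+33612345678" are recognized as the same contact.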
Impact & Results
Achieved 100% phone extraction success rate in production tests (10/10 contacts)
Optimized execution time to 60 seconds for 10 contacts (about 6 seconds per contact)
Reduced cost to $0.33 per 10 contacts through efficient API usage
Enabled scalable operations from $4.50/month (light usage) to $30/month (heavy usage)
Open-sourced under the MIT license, with comprehensive documentation and validation reports
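For a rough sense of scale, the cost figures above can be turned into contacts per month. This is back-of-the-envelope arithmetic that assumes cost grows linearly at the measured $0.33 per 10 contacts, which the production test of 10 contacts does not by itself confirm.

```python
COST_PER_10_CONTACTS = 0.33                    # USD, from the production test
cost_per_contact = COST_PER_10_CONTACTS / 10   # 0.033 USD per contact


def contacts_for_budget(monthly_budget_usd: float) -> int:
    """Whole contacts affordable per month, assuming linear cost scaling."""
    return int(monthly_budget_usd / cost_per_contact)


light = contacts_for_budget(4.50)    # 136 contacts
heavy = contacts_for_budget(30.00)   # 909 contacts
```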
Key Features
- Anti-scraping protection bypass with ScrapFly ASP
- Residential French proxies for geolocation compliance
- JavaScript rendering for dynamic content extraction
- Multi-method phone extraction with 3-tier fallback
- Robust price parsing with Unicode space handling
- Environment-based configuration (.env support)
- Comprehensive error handling and logging
- Production-validated with detailed test reports
Project Details
Client
Personal Project
Timeline
2025
Role
Solo Developer
© 2025 Firas Jday. All rights reserved.