
Spider Chef 🕷️👨‍🍳

SpiderChef is a recipe-based web scraping tool that makes data extraction systematic and reproducible. You define a scraping procedure as a "recipe" made of sequential "steps", which keeps extraction workflows readable and maintainable.

                   /\
                  /  \
                 |  _ \                   _
                 | / \ \   .--,--        / \
                 |/   \ \  `.  ,.'      /   \
                 /     \ |  |___|  /\  /     \
                /|      \|  ~  ~  /  \/       \
        _______/_|_______\ (o)(o)/___/\_____   \
       /      /  |        (______)     \    \   \_
      /      /   |                      \    \
     /      /    |                       \    \
    /      /     |                        \    \
   /     _/      |                         \    \
  /             _|                          \    \_
_/                                           \_      

Features

  • Recipe-Based Architecture: Define extraction workflows as YAML recipes
  • Modular Step System: Build complex scraping logic from reusable components
  • Async Support: Handle both synchronous and asynchronous extraction steps
  • Type Safety: Fully typed for a better development experience
  • Extensible Design: Easily create custom steps for specialized extraction needs
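
To make the recipe and step ideas above concrete, here is a minimal sketch of how a sequence of synchronous and asynchronous steps could be chained. It uses only the standard library and is not SpiderChef's actual API: `Recipe`, `cook`, `split_words`, and `count_items` are hypothetical names chosen for illustration.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Union

# Hypothetical type for illustration; not SpiderChef's real API.
StepFn = Callable[[Any], Union[Any, Awaitable[Any]]]


@dataclass
class Recipe:
    """An ordered list of steps; each step receives the previous step's output."""

    steps: list[StepFn] = field(default_factory=list)

    async def cook(self, data: Any = None) -> Any:
        for step in self.steps:
            result = step(data)
            # Async steps return an awaitable; sync steps return a plain value.
            if inspect.isawaitable(result):
                result = await result
            data = result
        return data


def split_words(text: str) -> list[str]:
    """Synchronous step: split text on whitespace."""
    return text.split()


async def count_items(items: list[str]) -> int:
    """Asynchronous step: stands in for I/O-bound work such as a fetch."""
    await asyncio.sleep(0)
    return len(items)


def run_example() -> int:
    recipe = Recipe(steps=[split_words, count_items])
    return asyncio.run(recipe.cook("spider chef cooks data"))
```

Checking each result with `inspect.isawaitable` lets one runner accept both kinds of step, so a step author does not need to wrap synchronous functions in coroutines.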

Installation

# If you want to use the CLI (quotes keep shells such as zsh from expanding the brackets)
pip install "spiderchef[cli]"

# If you only need the library
pip install spiderchef

Why SpiderChef?

Traditional web scraping often involves writing complex, difficult-to-maintain code that mixes HTTP requests, parsing, and business logic. SpiderChef separates these concerns by:

  • Breaking extraction into discrete, reusable steps
  • Defining workflows as declarative recipes
  • Handling common extraction patterns with built-in steps
  • Making scraping procedures reproducible and maintainable
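
The separation described in this list can be sketched as a declarative recipe (plain data) that is resolved against a registry of reusable step functions. This is an illustration of the general pattern only, under stated assumptions: `STEP_REGISTRY`, `register`, `run_recipe`, the step names, and the recipe field names (`name`, `steps`, `type`) are all hypothetical and do not describe SpiderChef's own schema or API. The dictionary stands in for what a parsed recipe file might look like.

```python
import re
from typing import Any, Callable

# Hypothetical registry mapping step names to functions; not SpiderChef's API.
STEP_REGISTRY: dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that makes a function available to recipes under `name`."""

    def decorator(fn):
        STEP_REGISTRY[name] = fn
        return fn

    return decorator


@register("regex_all")
def regex_all(text: str, pattern: str) -> list[str]:
    """Return every match of `pattern` found in `text`."""
    return re.findall(pattern, text)


@register("join")
def join(items: list[str], separator: str = ",") -> str:
    """Join a list of strings into one string."""
    return separator.join(items)


# The shape a parsed recipe might take (field names are assumptions).
RECIPE = {
    "name": "price_extractor",
    "steps": [
        {"type": "regex_all", "pattern": r"\$(\d+\.\d{2})"},
        {"type": "join", "separator": " | "},
    ],
}


def run_recipe(recipe: dict, data: Any) -> Any:
    """Run each step in order, passing the previous output as input."""
    for spec in recipe["steps"]:
        params = {k: v for k, v in spec.items() if k != "type"}
        data = STEP_REGISTRY[spec["type"]](data, **params)
    return data
```

Because the recipe holds only step names and parameters, the same step functions can be reused across recipes, and a recipe can be changed without touching the parsing code.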

Whether you're scraping product data, monitoring prices, or extracting research information, SpiderChef helps you build structured, reliable data extraction pipelines.