Getting Started with SpiderChef
This guide gets you up and running with SpiderChef. It covers installation, basic CLI and library usage, and a simple example recipe that shows how SpiderChef works.
Installation
SpiderChef can be installed using pip:
CLI Usage
SpiderChef comes with a command-line interface that makes it easy to run recipes:
# Run a recipe
spiderchef cook recipes/example.yaml
# Create a new recipe template
spiderchef recipe new my_extraction
Basic Library Usage
Here's a simple example of using SpiderChef in your Python code:
import asyncio
from spiderchef import Recipe
# Import a recipe from a YAML file
recipe = Recipe.from_yaml('recipe_example.yaml')
# Run the recipe
asyncio.run(recipe.cook())
Your First Recipe
Here's a basic recipe example that fetches a webpage and extracts information:
base_url: https://example.com
name: ProductExtractor
steps:
  - type: fetch
    name: fetch_product_page
    page_type: text
    path: /products
    params:
      category: electronics
      sort: price_asc
  - type: regex
    name: extract_product_urls
    expression: '"(\/product\/[^"]+)"'
  - type: join_base_url
    name: format_urls
This recipe:
- Fetches the product listing page
- Extracts product URLs using a regex pattern
- Joins the extracted URLs with the base URL
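To make the data flow concrete, the three steps above can be approximated with the Python standard library alone. This is only a sketch of what each step plausibly does, not SpiderChef's actual implementation: the helper names (`build_fetch_url`, `extract_product_paths`, `join_base_url`) and the sample HTML are hypothetical, and the way SpiderChef combines `base_url`, `path`, and `params` internally is an assumption.

```python
import re
from urllib.parse import urlencode, urljoin

BASE_URL = "https://example.com"  # base_url from the recipe


def build_fetch_url(base_url, path, params):
    """Assumed behavior of the fetch step: base URL + path + query string."""
    return f"{urljoin(base_url, path)}?{urlencode(params)}"


def extract_product_paths(html):
    """Apply the recipe's regex; the capture group holds the quoted path."""
    return re.findall(r'"(\/product\/[^"]+)"', html)


def join_base_url(base_url, paths):
    """Assumed behavior of join_base_url: resolve each path against the base URL."""
    return [urljoin(base_url, path) for path in paths]


# Hypothetical page content standing in for the fetched response.
sample_html = (
    '<a href="/product/123">Phone</a> '
    '<a href="/product/456">Laptop</a> '
    '<a href="/about">About</a>'
)

fetch_url = build_fetch_url(
    BASE_URL, "/products", {"category": "electronics", "sort": "price_asc"}
)
paths = extract_product_paths(sample_html)
urls = join_base_url(BASE_URL, paths)

print(fetch_url)
print(urls)
```

Because the regex only matches quoted strings that start with `/product/`, the `/about` link is ignored, and each remaining path is turned into an absolute URL in the last step.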
Next Steps
Now that you have a basic understanding of SpiderChef, you can:
- Learn about the Recipe Format
- Explore available Step Types
- Discover how to use Variables in your recipes
- Create Custom Steps for specialized extraction needs