Use crawl when you already know the exact URLs you want processed.

What crawl is good at

Crawl is a strong fit for:
  • controlled ingestion of a known page list
  • generating normalized result content
  • link discovery and page inspection
  • feeding retrieval or indexing pipelines

Request examples

curl "$BULKGRID_BASE_URL/api/v1/crawl" \
  -H 'Content-Type: application/json' \
  -H "x-api-key: $BULKGRID_API_KEY" \
  -d '{
    "urls": [
      "https://example.com/docs",
      "https://example.com/pricing"
    ],
    "strategy": "lexical",
    "options": {
      "formats": ["markdown", "cleanHtml", "links"],
      "timeout": 30000,
      "blockAds": true,
      "useInteractions": true,
      "waitForImages": false
    }
  }'
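For applications that build the request programmatically, the curl call above can be mirrored with a small payload builder. This is a sketch, not an official client: the field names ("urls", "strategy", "options", "formats") come from the request body shown above, while the helper name and its defaults are assumptions.

```python
import json

# Hypothetical helper mirroring the curl example; field names are taken
# from the documented request body, everything else is an assumption.
def build_crawl_payload(urls, formats=("markdown",), timeout=30000,
                        strategy="lexical", **extra_options):
    options = {"formats": list(formats), "timeout": timeout}
    options.update(extra_options)  # e.g. blockAds, useInteractions
    return {"urls": list(urls), "strategy": strategy, "options": options}

payload = build_crawl_payload(
    ["https://example.com/docs", "https://example.com/pricing"],
    formats=["markdown", "cleanHtml", "links"],
    blockAds=True,
    useInteractions=True,
    waitForImages=False,
)
body = json.dumps(payload)  # serialized request body for the POST
```

Keeping payload construction in one place makes it easier to enforce defaults (such as the 30000 ms timeout) across every crawl your application creates.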

Important options

  • formats: choose outputs such as markdown, cleanHtml, rawHtml, and links
  • timeout: page timeout in milliseconds
  • waitAfterLoad: extra delay after page load
  • waitForSelector: wait for a selector before capture
  • screenshot: request screenshot capture
  • headers: send additional allowed request headers
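Because option typos tend to surface only as slow or empty crawls, it can help to validate the options object before sending it. The option and format names below come from this page; the specific type and range checks are assumptions about sensible client-side validation, not server behavior.

```python
# Illustrative client-side validation for the options listed above.
# Option and format names are from the docs; the checks are assumptions.
def validate_options(options):
    errors = []
    # timeout and waitAfterLoad are durations in milliseconds
    for ms_field in ("timeout", "waitAfterLoad"):
        value = options.get(ms_field)
        if value is not None and (not isinstance(value, int) or value < 0):
            errors.append(f"{ms_field} must be a non-negative integer (ms)")
    # waitForSelector should be a CSS selector string
    if "waitForSelector" in options and not isinstance(options["waitForSelector"], str):
        errors.append("waitForSelector must be a selector string")
    # reject formats the API does not document
    allowed = {"markdown", "cleanHtml", "rawHtml", "links"}
    unknown = set(options.get("formats", [])) - allowed
    if unknown:
        errors.append(f"unknown formats: {sorted(unknown)}")
    return errors
```

Running this before every request turns a silent misconfiguration into an immediate, descriptive error.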

Operational guidance

Request only the content formats you actually need. Each additional format adds downstream handling and widens the surface for inconsistent assumptions in client code.
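One way to apply this guidance is to derive the format list from the downstream use case rather than hard-coding a wide set. The format names are from this page; the use-case names and the mapping itself are illustrative assumptions.

```python
# Hypothetical mapping from downstream need to the minimal format set.
# Format names come from the docs; the mapping is an assumption.
FORMATS_BY_USE_CASE = {
    "rag_indexing": ["markdown"],   # retrieval pipelines want clean text
    "link_graph": ["links"],        # link discovery only needs link lists
    "archival": ["rawHtml"],        # archiving wants the unmodified page
}

def formats_for(use_cases):
    """Union of required formats, preserving first-seen order."""
    needed = []
    for case in use_cases:
        for fmt in FORMATS_BY_USE_CASE.get(case, []):
            if fmt not in needed:
                needed.append(fmt)
    return needed
```

A pipeline that only indexes and builds a link graph then requests two formats instead of four, which keeps result handling narrow and predictable.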

Workflow

  1. create the crawl run
  2. poll the run status
  3. list run results
  4. retrieve the result content your application needs
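The four steps above can be sketched as a single function. The stand-in client below exists only so the control flow runs offline; the real API's method names, status values, and response shapes are assumptions, not documented behavior.

```python
import time

class StubCrawlClient:
    """Stand-in for an HTTP client so the workflow logic can run offline.
    Method names and response shapes are assumptions for illustration."""
    def __init__(self):
        self._polls = 0
    def create_run(self, urls, options):
        return {"id": "run-1", "status": "queued"}
    def get_status(self, run_id):
        self._polls += 1
        return {"id": run_id,
                "status": "completed" if self._polls >= 2 else "running"}
    def list_results(self, run_id):
        return [{"url": "https://example.com/docs", "result_id": "r-1"}]
    def get_result(self, result_id, fmt):
        return {"result_id": result_id, "format": fmt, "content": "# Docs"}

def run_crawl(client, urls, options, poll_interval=0.0, max_polls=100):
    run = client.create_run(urls, options)                 # 1. create the run
    status = {"status": "unknown"}
    for _ in range(max_polls):                             # 2. poll status
        status = client.get_status(run["id"])
        if status["status"] in ("completed", "failed"):
            break
        time.sleep(poll_interval)
    if status["status"] != "completed":
        raise RuntimeError(f"run ended in state {status['status']}")
    results = client.list_results(run["id"])               # 3. list results
    return [client.get_result(r["result_id"], "markdown")  # 4. retrieve content
            for r in results]
```

Swapping the stub for a real HTTP client keeps the same loop; a production version would also add backoff between polls and a cap on total wait time.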