Extraction - Bulkgrid

Use extraction when you need structured fields, not just page content.

When extraction is the right tool

Extraction is a strong fit for:

company and product profiles
pricing or policy extraction
structured enrichment for downstream systems
repeatable data collection from public pages

Design the request carefully

Good extraction quality usually depends more on request design than on retry count. Keep the request:

narrow enough to be realistic
specific about what should be extracted
backed by a schema that downstream systems can actually use

Request examples

curl "$BULKGRID_BASE_URL/api/v1/extract" \
  -H 'Content-Type: application/json' \
  -H "x-api-key: $BULKGRID_API_KEY" \
  -d '{
    "urls": [
      "https://example.com",
      "https://example.com/pricing"
    ],
    "query": "Extract company name, product summary, and pricing details",
    "schema": {
      "type": "object",
      "properties": {
        "companyName": { "type": "string" },
        "productSummary": { "type": "string" },
        "pricing": { "type": "string" }
      },
      "required": ["companyName"]
    },
    "maxRetries": 3
  }'
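
The same request can be assembled from Python. The sketch below is a minimal illustration using only the standard library. The helper name build_extract_request and the placeholder base URL are hypothetical, and it assumes the endpoint accepts exactly the JSON body shown in the curl example. It builds the request without sending it, so the payload can be inspected first.

```python
import json
import urllib.request


def build_extract_request(base_url, api_key, urls, query, schema, max_retries=3):
    """Build (but do not send) a POST to /api/v1/extract.

    Hypothetical helper: the body fields mirror the curl example.
    """
    payload = {
        "urls": list(urls),
        "query": query,
        "schema": schema,
        "maxRetries": max_retries,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Same schema as the curl example.
schema = {
    "type": "object",
    "properties": {
        "companyName": {"type": "string"},
        "productSummary": {"type": "string"},
        "pricing": {"type": "string"},
    },
    "required": ["companyName"],
}

# Placeholder base URL and key; substitute your own values.
request = build_extract_request(
    "https://bulkgrid.invalid",
    "YOUR_API_KEY",
    ["https://example.com", "https://example.com/pricing"],
    "Extract company name, product summary, and pricing details",
    schema,
)
```

Sending it is then a call to urllib.request.urlopen(request). The shape of the response, including where the run ID appears, is not covered here and should be taken from the API reference.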

Workflow

1. submit the extraction request
2. store the run ID from the response
3. poll GET /api/v1/runs/{runId} until the run finishes
4. fetch GET /api/v1/runs/{runId}/results
5. read extraction_data from each result record
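
The polling part of this workflow can be sketched as a small loop. This is an illustration under stated assumptions, not the documented client behavior: it assumes the run object exposes a status field, that "completed" and "failed" are terminal values, and that the results endpoint returns either a list of records or an object with a results list. None of these names are confirmed by this page; only the two paths and extraction_data are. The fetch_json argument is a hypothetical callable that performs an authenticated GET and returns parsed JSON.

```python
import time

# Assumed status values; check the API reference for the real ones.
DONE_STATUSES = {"completed"}
FAILED_STATUSES = {"failed"}


def collect_extractions(run_id, fetch_json, poll_seconds=2.0, max_polls=60,
                        sleep=time.sleep):
    """Poll a run until it finishes, then return its extraction_data values."""
    for _ in range(max_polls):
        run = fetch_json(f"/api/v1/runs/{run_id}")
        status = run.get("status")  # assumed field name
        if status in DONE_STATUSES:
            break
        if status in FAILED_STATUSES:
            raise RuntimeError(f"run {run_id} ended with status {status!r}")
        sleep(poll_seconds)
    else:
        # The loop ran out of attempts without seeing a terminal status.
        raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")

    results = fetch_json(f"/api/v1/runs/{run_id}/results")
    # Assumed envelope: accept a bare list or {"results": [...]}.
    records = results.get("results", []) if isinstance(results, dict) else results
    return [r["extraction_data"] for r in records if "extraction_data" in r]
```

Passing the HTTP call in as fetch_json keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.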

Common quality problems

the schema asks for data the source does not contain
the query is too broad
the page requires interaction or access patterns (for example, a login or a click to reveal content) that the request does not account for
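
One way to spot the first problem is to count how often each schema property comes back empty. The sketch below is a hypothetical helper, not part of the Bulkgrid API; it assumes each extraction_data value is a flat dictionary keyed by the schema's property names. A property that is empty in nearly every record is a hint that the source pages do not contain it.

```python
from collections import Counter


def count_empty_fields(schema, records):
    """Return {property: number of records where it is missing or empty}."""
    counts = Counter()
    for name in schema.get("properties", {}):
        for record in records:
            value = record.get(name)
            if value is None or value == "":
                counts[name] += 1
    return dict(counts)


schema = {
    "type": "object",
    "properties": {
        "companyName": {"type": "string"},
        "pricing": {"type": "string"},
    },
}
# Illustrative records, not real output.
records = [
    {"companyName": "Acme", "pricing": ""},
    {"companyName": "Beta"},
]
empty = count_empty_fields(schema, records)
```

Here pricing is empty in both illustrative records while companyName is always filled, which would suggest dropping pricing from the schema or pointing the request at pages that actually list it.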

Recommendation

Start with the smallest schema that delivers value. Expand later once the output is stable.
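
As a sketch of this approach, the helper below grows a schema in steps without touching the version already in use. The name extend_schema is hypothetical, and the property names are taken from the request example above.

```python
import copy


def extend_schema(schema, new_properties, also_required=()):
    """Return a copy of schema with extra properties; the input is not modified."""
    expanded = copy.deepcopy(schema)
    expanded.setdefault("properties", {}).update(new_properties)
    required = expanded.setdefault("required", [])
    for name in also_required:
        if name not in required:
            required.append(name)
    return expanded


# Start with the one field that must be present.
minimal_schema = {
    "type": "object",
    "properties": {"companyName": {"type": "string"}},
    "required": ["companyName"],
}

# Add optional fields once the minimal output is stable.
expanded_schema = extend_schema(
    minimal_schema,
    {
        "productSummary": {"type": "string"},
        "pricing": {"type": "string"},
    },
)
```

New fields are left optional by default, so records that lack them still satisfy the schema while the wider extraction is being evaluated.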
