Bulkgrid supports manual recrawls directly on sources.

Source modes matter

A source can operate in one of two modes:
  • discover
  • selected_pages
The mode matters because a manual recrawl behaves differently in each one.

Recrawl behavior

Discover mode

Bulkgrid creates a deep crawl run from the source identifier and source crawl configuration. The current implementation applies deep crawl defaults such as:
  • includeExternal: false
  • scoreThreshold: 0
  • normalizeScores: true
  • restrictToStartPath: true
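
As a rough illustration of how the defaults above might combine with a source's crawl configuration, consider the sketch below. The helper name `build_deep_crawl_request`, the `startUrl` and `options` keys, and the choice to let source crawl config values override the defaults are all assumptions made for the example, not confirmed Bulkgrid behavior.

```python
from typing import Any, Dict, Optional

# Deep crawl defaults as listed in this section.
DEEP_CRAWL_DEFAULTS: Dict[str, Any] = {
    "includeExternal": False,
    "scoreThreshold": 0,
    "normalizeScores": True,
    "restrictToStartPath": True,
}


def build_deep_crawl_request(
    identifier: str, crawl_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: build a deep crawl request for a discover-mode source."""
    # Assumption: values from the source crawl config win over the defaults.
    options = {**DEEP_CRAWL_DEFAULTS, **(crawl_config or {})}
    # "startUrl" and "options" are illustrative key names only.
    return {"startUrl": identifier, "options": options}
```

Building the options as a new dictionary leaves the defaults untouched, so each recrawl starts from the same baseline.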

Selected pages mode

Bulkgrid creates a crawl run from the currently known selected document URLs. If no selected URLs are stored yet, Bulkgrid falls back to the source identifier. If neither exists, recrawl fails.
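
The fallback order described above can be sketched as follows. The function name `resolve_recrawl_urls` and the `RecrawlError` type are hypothetical; only the order of the checks comes from this section.

```python
from typing import List, Optional, Sequence


class RecrawlError(Exception):
    """Hypothetical error raised when a source has nothing to recrawl."""


def resolve_recrawl_urls(
    selected_urls: Sequence[str], identifier: Optional[str]
) -> List[str]:
    """Pick the URLs for a manual recrawl of a selected_pages source."""
    # Prefer the currently known selected document URLs.
    if selected_urls:
        return list(selected_urls)
    # No selected URLs stored yet: fall back to the source identifier.
    if identifier:
        return [identifier]
    # Neither exists, so the recrawl fails.
    raise RecrawlError("source has no selected URLs and no identifier")
```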

Source refresh configuration

Source creation and update schemas support:
  • crawl interval values such as hourly, daily, weekly, monthly, and custom
  • custom interval minutes
  • crawl config fields such as include and exclude paths
Manual recrawl also links the new run back to the source as a source run with crawl type manual, so the run history stays connected to the source lifecycle.
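
To make the interval options concrete, here is a minimal sketch of resolving a crawl interval to minutes and of the record that ties a manual run back to its source. The helper names, the record keys (`sourceId`, `runId`, `crawlType`), and the 30-day length for monthly are assumptions for illustration; the section only confirms the interval names, the custom minutes field, and the manual crawl type.

```python
from typing import Dict, Optional

# Minutes per named interval. Treating "monthly" as 30 days is an assumption.
INTERVAL_MINUTES: Dict[str, int] = {
    "hourly": 60,
    "daily": 24 * 60,
    "weekly": 7 * 24 * 60,
    "monthly": 30 * 24 * 60,
}


def interval_in_minutes(crawl_interval: str, custom_minutes: Optional[int] = None) -> int:
    """Hypothetical helper: resolve a crawl interval value to minutes."""
    if crawl_interval == "custom":
        # A custom interval only makes sense with an explicit minute count.
        if custom_minutes is None or custom_minutes <= 0:
            raise ValueError("custom interval requires a positive minute count")
        return custom_minutes
    if crawl_interval not in INTERVAL_MINUTES:
        raise ValueError(f"unknown crawl interval: {crawl_interval!r}")
    return INTERVAL_MINUTES[crawl_interval]


def make_manual_source_run(source_id: str, run_id: str) -> Dict[str, str]:
    """Hypothetical record linking a new run to its source with crawl type manual."""
    return {"sourceId": source_id, "runId": run_id, "crawlType": "manual"}
```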

Monitoring surfaces

Today you can monitor a source through:
  • source status
  • source runs
  • source document listings
  • source change listings
The source status endpoint currently returns whether the source is crawling plus the latest status, total size, and total item count.
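
A client might model the status response along these lines. This is only a sketch: the field names, the `summarize` helper, and the example status value `completed` are hypothetical, since this section names the values returned but not their keys or formats.

```python
from dataclasses import dataclass


@dataclass
class SourceStatus:
    # Field names are illustrative; only the four values come from the section.
    is_crawling: bool
    latest_status: str
    total_size: int
    total_items: int


def summarize(status: SourceStatus) -> str:
    """Render a one-line summary suitable for a log or dashboard."""
    state = "crawling" if status.is_crawling else "idle"
    return (
        f"{state}, latest status {status.latest_status}, "
        f"{status.total_items} items, total size {status.total_size}"
    )


# Example with made-up values.
example = SourceStatus(is_crawling=False, latest_status="completed", total_size=2048, total_items=12)
print(summarize(example))
```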