Source management is one of the highest-leverage parts of a Bulkgrid integration. Poorly drawn source boundaries lead to noisy retrieval, wasted crawl work, and operational problems that are harder to debug.

What a source is in Bulkgrid

A source is a first-class product object, not just a URL passed into a one-off crawl. The current source API supports domain-type sources with configuration such as:
  • identifier
  • label
  • visibility
  • source_mode
  • crawl_config
  • crawl_interval
  • custom_interval_minutes
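
As a rough illustration, the fields above could be modeled on the client side as follows. This is a minimal sketch, not the Bulkgrid schema: the field names come from the list above, but the types, the example values (such as "private"), and the `SourceConfig` class and `to_payload` helper are assumptions made for the example.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SourceConfig:
    """Client-side model of a domain source (hypothetical; types are assumed)."""

    identifier: str                      # e.g. the site URL
    label: str                           # human-readable name
    visibility: str                      # allowed values are not listed in this guide
    source_mode: str                     # "discover" or "selected_pages"
    crawl_config: Dict[str, Any] = field(default_factory=dict)
    crawl_interval: Optional[str] = None
    custom_interval_minutes: Optional[int] = None

    def to_payload(self) -> Dict[str, Any]:
        """Return a dict of the fields that are set, dropping None values."""
        return {k: v for k, v in asdict(self).items() if v is not None}


source = SourceConfig(
    identifier="https://docs.example.com",
    label="Product docs",
    visibility="private",                # assumed value, for illustration only
    source_mode="discover",
)
payload = source.to_payload()
```

Keeping unset fields out of the payload lets the service apply its own defaults, whatever those are.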

Source modes

Current source modes include:
  • discover
  • selected_pages
These modes affect how recrawl behaves and how customers should think about source scope.

Source lifecycle surfaces

For a source, the product supports:
  • source details
  • source status
  • source folders
  • source documents
  • source changes
  • source runs
  • manual recrawl
The public API exposes these as dedicated endpoints, not as one overloaded source response.
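
One way to reflect that split in client code is to give each surface its own path. The sketch below is illustrative only: the route shapes, the `SURFACES` table, and the `surface_path` helper are hypothetical and are not taken from the Bulkgrid API reference, so check the real routes before relying on them.

```python
from urllib.parse import quote

# Hypothetical path suffixes, one per lifecycle surface listed above.
# The real Bulkgrid routes may be named differently.
SURFACES = {
    "details": "",
    "status": "/status",
    "folders": "/folders",
    "documents": "/documents",
    "changes": "/changes",
    "runs": "/runs",
    "recrawl": "/recrawl",
}


def surface_path(source_id: str, surface: str) -> str:
    """Build the (assumed) relative path for one surface of one source."""
    if surface not in SURFACES:
        raise ValueError(f"unknown surface: {surface!r}")
    # Percent-encode the ID so it is safe as a single path segment.
    return f"/sources/{quote(source_id, safe='')}{SURFACES[surface]}"
```

A client built this way fetches only the surface it needs, for example status during polling and runs during debugging.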

What to decide before you crawl

Domain Scope

Decide which domains are in scope and which are never allowed.

Path Rules

Define which paths are included and which should always be excluded.
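
It can help to write the domain and path decisions down as an executable check before translating them into crawl settings. The following is a minimal local sketch, assuming simple host matching and path-prefix rules; `ScopeRules` and its fields are hypothetical and do not represent the format of Bulkgrid's crawl_config.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlsplit


@dataclass
class ScopeRules:
    """Planning aid for source scope (hypothetical; not a Bulkgrid object)."""

    allowed_hosts: List[str]
    blocked_hosts: List[str] = field(default_factory=list)
    include_prefixes: List[str] = field(default_factory=lambda: ["/"])
    exclude_prefixes: List[str] = field(default_factory=list)

    def allows(self, url: str) -> bool:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        path = parts.path or "/"
        # Blocked hosts always lose; otherwise the host must be allowed.
        if host in self.blocked_hosts or host not in self.allowed_hosts:
            return False
        # Exclusions take priority over inclusions.
        if any(path.startswith(p) for p in self.exclude_prefixes):
            return False
        return any(path.startswith(p) for p in self.include_prefixes)


rules = ScopeRules(
    allowed_hosts=["docs.example.com"],
    include_prefixes=["/guides/", "/api/"],
    exclude_prefixes=["/api/internal/"],
)
```

Running a handful of known URLs through `rules.allows(...)` is a cheap way to confirm the intended boundary before any crawl work is spent on it.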

Document Links

Decide whether linked documents belong in the same ingestion flow.

Knowledge Boundaries

Keep support, marketing, and internal knowledge separated when needed.

Practical recommendation

Start with a small, high-value source boundary, and expand only after you validate retrieval quality and operational behavior. For domain sources, Bulkgrid normalizes the identifier to the URL origin when it creates the source. That means https://docs.example.com/foo and https://docs.example.com/bar both resolve to https://docs.example.com and are treated as the same source root, so a path in the identifier does not create a separate source.
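
To see which identifiers would collapse into one source root, the origin can be computed locally with the Python standard library. This is an approximation under stated assumptions: the `source_origin` helper is hypothetical, and Bulkgrid's exact normalization rules (for example, how it handles default ports) are not described in this guide.

```python
from urllib.parse import urlsplit


def source_origin(url: str) -> str:
    """Return scheme://host[:port] for a URL, dropping path, query, and fragment."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    origin = f"{parts.scheme}://{parts.hostname}"  # hostname is already lowercased
    if parts.port is not None:
        origin += f":{parts.port}"  # explicit ports are kept as written
    return origin


same_root = (
    source_origin("https://docs.example.com/foo")
    == source_origin("https://docs.example.com/bar")
)
```

Comparing origins this way before creating sources helps avoid registering the same root twice under different paths.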