Source Management

Source management is one of the highest-leverage parts of a Bulkgrid integration. Poor source boundaries create noisy retrieval, wasted crawl work, and harder operational debugging.

What a source is in Bulkgrid

A source is a first-class product object, not just a URL passed into a one-off crawl. The current source API supports domain-type sources with configuration such as:

identifier
label
visibility
source_mode
crawl_config
crawl_interval
custom_interval_minutes
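
As a rough illustration, the fields above can be modeled on the client side before they are sent to the API. This is a minimal sketch, not Bulkgrid's schema: the class name `SourceConfig`, the `to_payload` helper, the field types, and which fields are optional are all assumptions. Only the field names and the two source modes come from this page.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Optional

# The two modes named in this guide; any other value is rejected locally.
KNOWN_SOURCE_MODES = {"discover", "selected_pages"}


@dataclass
class SourceConfig:
    """Hypothetical client-side model of a domain source's configuration."""

    identifier: str
    label: str
    visibility: str  # allowed values are not listed here; check the API reference
    source_mode: str
    crawl_config: dict[str, Any] = field(default_factory=dict)  # shape is an assumption
    crawl_interval: Optional[str] = None  # type is an assumption
    custom_interval_minutes: Optional[int] = None  # type is an assumption

    def __post_init__(self) -> None:
        # Fail early on a mode this guide does not mention.
        if self.source_mode not in KNOWN_SOURCE_MODES:
            raise ValueError(f"unknown source_mode: {self.source_mode!r}")

    def to_payload(self) -> dict[str, Any]:
        # Drop optional fields that were never set.
        return {k: v for k, v in asdict(self).items() if v is not None}


config = SourceConfig(
    identifier="https://docs.example.com",
    label="Product docs",
    visibility="private",  # placeholder value
    source_mode="discover",
)
payload = config.to_payload()
```

Validating `source_mode` locally catches typos before a request is made. The other fields are passed through unchanged because their allowed values are not described on this page.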

Source modes

Current source modes include:

discover
selected_pages

The mode you choose affects how recrawls behave and how you should think about the scope of the source.

Source lifecycle surfaces

For a source, the product supports:

source details
source status
source folders
source documents
source changes
source runs
manual recrawl

The public API exposes these as dedicated endpoints, not as one overloaded source response.

What to decide before you crawl

Domain Scope

Decide which domains are in scope and which are never allowed.

Path Rules

Define which paths are included and which should always be excluded.

Document Links

Decide whether linked documents belong in the same ingestion flow.

Knowledge Boundaries

Keep support, marketing, and internal knowledge separated when needed.
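
One way to make the domain and path decisions concrete is to write them down as an explicit scope check and run candidate URLs through it before configuring a source. The sketch below is a local planning aid only. `ScopeRules` and `in_scope` are hypothetical names, and the rule format is not Bulkgrid's `crawl_config` format. It assumes simple prefix matching, with exclusions taking precedence over inclusions.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ScopeRules:
    """Hypothetical description of a source boundary (not a Bulkgrid type)."""

    allowed_hosts: frozenset[str]
    include_prefixes: tuple[str, ...] = ("/",)
    exclude_prefixes: tuple[str, ...] = ()


def in_scope(url: str, rules: ScopeRules) -> bool:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Domain scope: anything outside the allowed hosts is never crawled.
    if host not in rules.allowed_hosts:
        return False
    path = parts.path or "/"
    # Path rules: exclusions win over inclusions.
    if any(path.startswith(prefix) for prefix in rules.exclude_prefixes):
        return False
    return any(path.startswith(prefix) for prefix in rules.include_prefixes)


rules = ScopeRules(
    allowed_hosts=frozenset({"docs.example.com"}),
    include_prefixes=("/guides/", "/api/"),
    exclude_prefixes=("/api/internal/",),
)

for candidate in (
    "https://docs.example.com/guides/start",
    "https://docs.example.com/api/internal/keys",
    "https://blog.example.com/guides/start",
):
    print(candidate, in_scope(candidate, rules))
```

Running a sample of real URLs through a check like this shows pages that would be pulled in or left out by accident before any crawl work is spent.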

Practical recommendation

Start with a small, high-value source boundary, and expand it only after you have validated retrieval quality and operational behavior.

For domain sources, Bulkgrid normalizes the identifier to the URL origin when it creates the source. That means https://docs.example.com/foo and https://docs.example.com/bar resolve to the same source root, https://docs.example.com.
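
To see ahead of time which source root a URL will map to, you can approximate origin normalization locally. This is a minimal sketch using Python's standard library. `url_origin` is a hypothetical helper, and Bulkgrid's exact rules (for example, how it treats default ports) are not described here, so treat the result as an approximation.

```python
from urllib.parse import urlsplit


def url_origin(url: str) -> str:
    """Return scheme://host[:port] for an absolute URL (approximation)."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    origin = f"{parts.scheme}://{parts.hostname}"
    # Keep an explicit port if one was given; path, query, and fragment are dropped.
    if parts.port is not None:
        origin += f":{parts.port}"
    return origin


a = url_origin("https://docs.example.com/foo")
b = url_origin("https://docs.example.com/bar")
print(a == b)
```

Because both URLs share an origin, registering either one should point at the same source root. Use path rules, not separate identifiers, to narrow what is crawled under that root.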
