- what content is being ingested
- how that content is grouped
- how processing is tracked
- what output is produced
- how those outputs are consumed later
Core objects
- Sources: the origin of content Bulkgrid processes.
- Collections: retrieval and access boundaries for grouped content.
- Runs: the top-level record for asynchronous work.
- Results: per-item outputs produced by a run.
Sources
A source is the content origin Bulkgrid processes. In practical terms, a source is usually one of these:
- a public website or site section
- a known list of URLs
- a starting URL for deep crawl
- a document discovered during processing
Defining a source boundary means deciding:
- which domains should be included
- which paths should be excluded
- which source types are allowed in a given workflow
- whether the source is stable enough for production retrieval
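
The boundary decisions above can be sketched as a small filter. This is an illustrative sketch only: `SourceScope`, its field names, and the source type labels are hypothetical and are not part of any documented Bulkgrid API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class SourceScope:
    """Hypothetical source boundary; not a Bulkgrid API type."""

    include_domains: set[str]
    exclude_path_prefixes: list[str] = field(default_factory=list)
    # Assumed label for illustration; real source type names may differ.
    allowed_source_types: set[str] = field(default_factory=lambda: {"website"})

    def allows(self, url: str, source_type: str = "website") -> bool:
        """Return True if the URL falls inside this source boundary."""
        if source_type not in self.allowed_source_types:
            return False
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if host not in self.include_domains:
            return False
        path = parsed.path or "/"
        return not any(path.startswith(p) for p in self.exclude_path_prefixes)


scope = SourceScope(
    include_domains={"docs.example.com"},
    exclude_path_prefixes=["/archive/", "/drafts/"],
)
```

With this scope, `scope.allows("https://docs.example.com/guide/intro")` is true, while URLs under `/archive/` or on other domains are rejected before any processing starts.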
Collections
A collection is the boundary used to group content for retrieval and access control.
Collections matter because most teams do not want one undifferentiated search corpus. They want to separate knowledge by product, workflow, audience, or trust level.
Typical collection patterns:
- public documentation
- internal operations knowledge
- support content
- product-specific content domains
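
To illustrate how a collection acts as a retrieval and access boundary, here is a minimal sketch. The `Collection` type, the `audience` labels, and the collection names are assumptions made for the example, not Bulkgrid identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """Hypothetical collection record; field names are illustrative."""

    name: str
    audience: str  # assumed labels: "public" or "internal"


COLLECTIONS = [
    Collection("public-docs", "public"),
    Collection("internal-ops", "internal"),
    Collection("support", "public"),
]


def searchable_collections(collections, caller_audiences):
    """Return names of the collections a caller is allowed to search."""
    return [c.name for c in collections if c.audience in caller_audiences]
```

A caller limited to the `"public"` audience would only ever search `public-docs` and `support`, so internal operations content never enters that retrieval path.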
Runs
A run is the top-level record for asynchronous work. Runs are created for workflows such as:
- extraction
- crawl
- deep crawl
- run-based API operations
A run record typically tracks:
- status
- timestamps
- URL scope
- progress counters
- error fields
- retry state
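
The fields listed above could be modeled roughly as follows. This is a sketch under stated assumptions: the `Run` class, its attribute names, and the status values are hypothetical and may not match the actual run schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Assumed status values; the real set of run states may differ.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


@dataclass
class Run:
    """Hypothetical run record mirroring the tracked fields."""

    id: str
    kind: str  # e.g. "extraction", "crawl", "deep_crawl"
    status: str = "queued"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    finished_at: Optional[datetime] = None
    url_scope: list[str] = field(default_factory=list)
    items_total: int = 0
    items_done: int = 0
    items_failed: int = 0
    error: Optional[str] = None
    retry_count: int = 0

    def is_finished(self) -> bool:
        """True once the run has reached a terminal status."""
        return self.status in TERMINAL_STATUSES

    def progress(self) -> float:
        """Fraction of items processed, counting both successes and failures."""
        if self.items_total == 0:
            return 0.0
        return (self.items_done + self.items_failed) / self.items_total


run = Run(id="run_1", kind="crawl", status="running",
          items_total=4, items_done=1, items_failed=1)
```

Because runs are asynchronous, a client would typically re-read a record like this until `is_finished()` returns true, rather than waiting on the request that created it.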
Results
Results are the per-item outputs of a run. A single run can produce many results. A result usually represents one processed page, document, or item-level output. Results can include:
- URL and title
- status code
- extraction output
- generated content references
- screenshot-related data
- error information for that item
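
Since each result carries its own error information, a consumer can separate item-level failures from successes without failing the whole run. The sketch below assumes a hypothetical `Result` shape; the field names are illustrative, not a documented schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    """Hypothetical per-item result; fields follow the list above."""

    url: str
    title: Optional[str] = None
    status_code: Optional[int] = None
    extraction: Optional[dict] = None
    content_refs: list[str] = field(default_factory=list)
    screenshot: Optional[dict] = None
    error: Optional[str] = None


def partition(results):
    """Split results into (succeeded, failed) based on the error field."""
    succeeded = [r for r in results if r.error is None]
    failed = [r for r in results if r.error is not None]
    return succeeded, failed


results = [
    Result(url="https://docs.example.com/a", title="A", status_code=200),
    Result(url="https://docs.example.com/b", status_code=500, error="fetch failed"),
]
ok, bad = partition(results)
```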
How the concepts fit together
Practical rule
Customers should think about the model in this order:
- define the source boundary
- decide which collection the content belongs to
- create the run
- monitor results
- consume only the result outputs your application actually needs
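
The last step, consuming only the outputs an application needs, can be as simple as projecting each result down to a few fields. The sketch below assumes results arrive as plain dictionaries with hypothetical keys; the actual response shape may differ.

```python
def project(results, fields):
    """Keep only the requested keys from each result dictionary."""
    return [{k: r[k] for k in fields if k in r} for r in results]


# Hypothetical result payloads, for illustration only.
raw_results = [
    {"url": "https://docs.example.com/a", "title": "A",
     "status_code": 200, "screenshot": {"ref": "shot-1"}},
    {"url": "https://docs.example.com/b", "title": "B", "status_code": 200},
]

# An application that only indexes titles does not need screenshots.
slim = project(raw_results, ["url", "title"])
```

Dropping unused fields early keeps downstream storage and indexing tied to what the application actually reads.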