Core Concepts - Bulkgrid

Bulkgrid is easier to adopt once the core concepts are clear. Customers usually care about five things:

what content is being ingested
how that content is grouped
how processing is tracked
what output is produced
how those outputs are consumed later

Core objects

Sources

The origin of the content that Bulkgrid processes.

Collections

Retrieval and access boundaries for grouped content.

Runs

The top-level record for asynchronous work.

Results

Per-item outputs produced by a run.

Sources

A source is the origin of the content that Bulkgrid processes. In practice, a source is usually one of the following:

a public website or site section
a known list of URLs
a starting URL for deep crawl
a document discovered during processing

Customers usually think about sources in terms of scope and trust:

which domains should be included
which paths should be excluded
which source types are allowed in a given workflow
whether the source is stable enough for production retrieval
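The scope questions above (which domains to include, which paths to exclude) can be captured as a small allow/deny check applied to each candidate URL. This is a minimal sketch, assuming a source scope is expressed as a set of included domains plus a list of excluded path prefixes; `SourceScope` and `allows` are hypothetical names for illustration, not part of a documented Bulkgrid API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class SourceScope:
    """Hypothetical scope definition for a source (not a Bulkgrid API type)."""
    include_domains: set[str]
    exclude_path_prefixes: list[str] = field(default_factory=list)

    def allows(self, url: str) -> bool:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        # A host matches if it equals an included domain or is a subdomain of one.
        in_domain = any(
            host == d or host.endswith("." + d) for d in self.include_domains
        )
        if not in_domain:
            return False
        path = parsed.path or "/"
        # Excluded path prefixes take precedence over domain inclusion.
        return not any(path.startswith(p) for p in self.exclude_path_prefixes)


scope = SourceScope({"docs.example.com"}, ["/internal/"])
print(scope.allows("https://docs.example.com/guide"))       # True
print(scope.allows("https://docs.example.com/internal/x"))  # False
print(scope.allows("https://blog.example.com/post"))        # False
```

Letting exclusions override inclusions keeps the rule easy to reason about: a path is never ingested once it is explicitly excluded, even if its domain is in scope.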

Collections

A collection is the boundary used to group content for retrieval and access control. Collections matter because most teams do not want a single undifferentiated search corpus; they want to separate knowledge by product, workflow, audience, or trust level. Typical collection patterns:

public documentation
internal operations knowledge
support content
product-specific content domains
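Because a collection acts as a retrieval and access boundary, a consumer typically searches only the collections it is allowed to see. The sketch below assumes each document carries a single collection name and that the caller supplies its allowed collections; `Document`, `visible_documents`, and the collection names are hypothetical and only illustrate the boundary idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    collection: str  # hypothetical collection name, e.g. "public-docs"


def visible_documents(docs, allowed_collections):
    """Return only the documents whose collection the caller may read."""
    allowed = set(allowed_collections)
    return [d for d in docs if d.collection in allowed]


corpus = [
    Document("https://example.com/docs/start", "public-docs"),
    Document("https://wiki.internal/runbook", "internal-ops"),
    Document("https://example.com/support/faq", "support"),
]

# A public-facing assistant searches public docs and support content only.
public_view = visible_documents(corpus, ["public-docs", "support"])
print([d.url for d in public_view])
# ['https://example.com/docs/start', 'https://example.com/support/faq']
```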

Runs

A run is the top-level record for asynchronous work. Runs are created for workflows such as:

extraction
crawl
deep crawl
run-based API operations

Each run tracks operational state such as:

status
timestamps
URL scope
progress counters
error fields
retry state
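The operational state listed above maps naturally onto a single record. This is a minimal sketch of such a record, assuming hypothetical field names and status values ("queued", "running", "completed", "failed", "cancelled"); Bulkgrid's actual run schema may name or structure these differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical status values; the real status names may differ.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


@dataclass
class Run:
    run_id: str
    workflow: str                      # e.g. "extraction", "crawl", "deep_crawl"
    url_scope: list[str]
    status: str = "queued"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    finished_at: Optional[datetime] = None
    items_total: int = 0               # progress counters
    items_succeeded: int = 0
    items_failed: int = 0
    error: Optional[str] = None        # run-level error, if any
    retry_count: int = 0               # retry state
    max_retries: int = 3

    def progress(self) -> float:
        """Fraction of items processed, counting both successes and failures."""
        if self.items_total == 0:
            return 0.0
        return (self.items_succeeded + self.items_failed) / self.items_total

    def is_terminal(self) -> bool:
        return self.status in TERMINAL_STATUSES

    def can_retry(self) -> bool:
        return self.status == "failed" and self.retry_count < self.max_retries


run = Run("run_123", "crawl", ["https://example.com/docs/"],
          status="running", items_total=40, items_succeeded=30, items_failed=2)
print(f"{run.progress():.0%}")  # 80%
print(run.is_terminal())        # False
```

Counting failed items toward progress means a run can reach 100% while still reporting per-item errors, which is why the counters and the error fields are tracked separately.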

Results

Results are the per-item outputs of a run. A single run can produce many results. A result usually represents one processed page, document, or other item. Results can include:

URL and title
status code
extraction output
generated content references
screenshot-related data
error information for that item
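Since one run can produce many results, a common first step when monitoring is to summarize them by status code and pull out per-item errors. The sketch assumes a hypothetical `Result` shape with a subset of the fields listed above; it is illustrative and does not reflect a documented Bulkgrid response format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    url: str
    title: Optional[str]
    status_code: Optional[int]
    extraction: Optional[dict] = None  # extraction output, if any
    error: Optional[str] = None        # item-level error, if any


def summarize(results):
    """Count results per status code and map failing URLs to their errors."""
    by_status = Counter(r.status_code for r in results)
    errors = {r.url: r.error for r in results if r.error}
    return by_status, errors


results = [
    Result("https://example.com/a", "A", 200, {"text": "..."}),
    Result("https://example.com/b", "B", 200, {"text": "..."}),
    Result("https://example.com/missing", None, 404, error="Not Found"),
]
by_status, errors = summarize(results)
print(dict(by_status))  # {200: 2, 404: 1}
print(errors)           # {'https://example.com/missing': 'Not Found'}
```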

How the concepts fit together

Practical rule

Customers should think about the model in this order:

define the source boundary
decide which collection the content belongs to
create the run
monitor results
consume only the result outputs your application actually needs
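The order above can be sketched end to end against an in-memory stand-in. Everything here is hypothetical: `FakeBulkgridClient`, its methods (`create_run`, `get_status`, `list_results`), the status strings, and the scope rule are illustrative assumptions, not Bulkgrid's real client or API. The point is the sequence: scope the sources, pick a collection, create the run, poll until it finishes, then keep only the fields the application uses.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeBulkgridClient:
    """In-memory stand-in for a Bulkgrid client; every method here is hypothetical."""
    runs: dict = field(default_factory=dict)

    def create_run(self, workflow, urls, collection):
        run_id = f"run_{len(self.runs) + 1}"
        self.runs[run_id] = {"workflow": workflow, "urls": list(urls),
                             "collection": collection, "status": "running", "polls": 0}
        return run_id

    def get_status(self, run_id):
        run = self.runs[run_id]
        run["polls"] += 1
        # Pretend the run completes on the second status check.
        if run["polls"] >= 2:
            run["status"] = "completed"
        return run["status"]

    def list_results(self, run_id):
        urls = self.runs[run_id]["urls"]
        return [{"url": u, "title": f"Page {i}", "status_code": 200,
                 "extraction": {"text": "..."}, "screenshot": None}
                for i, u in enumerate(urls, start=1)]


def ingest(client, urls, collection, poll_seconds=0.0):
    # 1. Source boundary: keep only URLs under a documented prefix (hypothetical rule).
    scoped = [u for u in urls if u.startswith("https://example.com/docs/")]
    # 2-3. Assign the collection and create the run.
    run_id = client.create_run("crawl", scoped, collection=collection)
    # 4. Monitor the run until it reaches a terminal state.
    status = client.get_status(run_id)
    while status not in {"completed", "failed"}:
        time.sleep(poll_seconds)
        status = client.get_status(run_id)
    if status == "failed":
        raise RuntimeError(f"{run_id} failed")
    # 5. Consume only the result fields the application needs.
    return [(r["url"], r["title"]) for r in client.list_results(run_id)
            if r["status_code"] == 200]


client = FakeBulkgridClient()
pages = ingest(client, ["https://example.com/docs/a", "https://example.com/blog/b"],
               "public-docs")
print(pages)  # [('https://example.com/docs/a', 'Page 1')]
```

Dropping unused fields such as `extraction` and `screenshot` at the last step keeps downstream storage and indexing limited to what the application actually reads.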
