What crawl is good at
Crawl is a strong fit for:
- controlled ingestion of a known page list
- generating normalized result content
- link discovery and page inspection
- feeding retrieval or indexing pipelines
Request examples
Important options
- formats: choose outputs such as markdown, cleanHtml, rawHtml, and links
- timeout: page timeout in milliseconds
- waitAfterLoad: extra delay after page load
- waitForSelector: wait for a selector before capture
- screenshot: request screenshot capture
- headers: send additional allowed request headers
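As a rough illustration of how these options might be combined, the sketch below assembles a request body in Python. The option keys (formats, timeout, waitAfterLoad, waitForSelector, screenshot, headers) come from the list above; the helper name `build_crawl_request`, the top-level `urls` and `options` layout, and the Python parameter names are assumptions made for the example, not the documented request schema.

```python
from typing import Optional


def build_crawl_request(
    urls: list[str],
    formats: list[str],
    timeout_ms: Optional[int] = None,
    wait_after_load: Optional[int] = None,
    wait_for_selector: Optional[str] = None,
    screenshot: bool = False,
    headers: Optional[dict[str, str]] = None,
) -> dict:
    """Assemble a crawl request body.

    The "urls"/"options" nesting is a hypothetical layout; only the
    option key names are taken from the documentation.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    if not formats:
        raise ValueError("at least one output format is required")

    # Only include options that were explicitly set, so the request
    # stays as narrow as the caller intended.
    options: dict = {"formats": list(formats)}
    if timeout_ms is not None:
        options["timeout"] = timeout_ms  # page timeout in milliseconds
    if wait_after_load is not None:
        options["waitAfterLoad"] = wait_after_load  # unit not stated in the docs
    if wait_for_selector is not None:
        options["waitForSelector"] = wait_for_selector
    if screenshot:
        options["screenshot"] = True
    if headers:
        options["headers"] = dict(headers)

    return {"urls": list(urls), "options": options}


request = build_crawl_request(
    ["https://example.com/docs"],
    formats=["markdown", "links"],
    timeout_ms=30000,
    wait_for_selector="main",
)
```

Leaving unset options out of the body, rather than sending defaults, keeps the request limited to what the application actually asked for.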
Operational guidance
Ask only for the content formats you need. Wider output sets mean more downstream handling and more room for inconsistent assumptions in client code.
Workflow
- create the crawl run
- poll the run status
- list run results
- retrieve the result content your application needs
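The four steps above can be sketched as a small polling loop. This is a minimal illustration written against a hypothetical client interface: the `CrawlClient` protocol, its method names, and the status strings `"completed"` and `"failed"` are all assumptions for the example, so substitute whatever your actual client or HTTP wrapper exposes.

```python
import time
from typing import Callable, Protocol


class CrawlClient(Protocol):
    """Hypothetical client interface; method names are illustrative only."""

    def create_run(self, request: dict) -> str: ...
    def get_run_status(self, run_id: str) -> str: ...
    def list_results(self, run_id: str) -> list[str]: ...
    def get_result_content(self, run_id: str, result_id: str, fmt: str) -> str: ...


def run_crawl(
    client: CrawlClient,
    request: dict,
    fmt: str,
    poll_interval: float = 2.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Create a run, wait for it to finish, and fetch one format per result."""
    # 1. Create the crawl run.
    run_id = client.create_run(request)

    # 2. Poll the run status until it reaches a terminal state.
    #    The "completed"/"failed" values are assumed, not documented.
    deadline = time.monotonic() + max_wait
    while True:
        status = client.get_run_status(run_id)
        if status == "completed":
            break
        if status == "failed":
            raise RuntimeError(f"crawl run {run_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl run {run_id} still '{status}' after {max_wait}s")
        sleep(poll_interval)

    # 3. List the run results, then
    # 4. retrieve only the content format the application needs.
    return {
        result_id: client.get_result_content(run_id, result_id, fmt)
        for result_id in client.list_results(run_id)
    }
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays, and the `max_wait` deadline prevents a stuck run from blocking the caller indefinitely.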