A crawler reliability scorecard should rank proxy lanes by usable records, not by request success alone. For public data collection, the practical score combines connection quality, field completeness, regional consistency, retry cost, and replay stability.
Score the lane that serves the business record
The target user is an engineering team running scraping proxy queues for public pages, price monitoring, SERP monitoring, or catalog observation. A lane can look healthy at the network layer while producing incomplete or mixed-market records.
The scorecard should help decide whether a lane continues, slows down, moves to replay, or gets isolated for inspection.
Five signals are enough for daily triage
Connection success shows whether the lane can reach public pages. Field completeness shows whether the record is usable. Regional consistency shows whether the market signal is stable. Retry cost shows whether pacing is wasteful. Replay stability shows whether the same sample can be repeated.
These signals should be grouped by target site, market, page type, and session window. A single global score hides the exact queue that needs attention.
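The grouping can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `Attempt` record, its field names, and the `score_by_lane` helper are all assumptions made for this example, and only two of the five signals (connection success and retry cost) are computed, alongside a usable-record rate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One fetch attempt. All field names are hypothetical."""
    site: str
    market: str
    page_type: str
    session_window: str
    connected: bool  # the request reached the public page
    usable: bool     # the resulting record passed the acceptance rule


def score_by_lane(attempts):
    """Aggregate attempts per (site, market, page_type, session_window)."""
    groups = defaultdict(list)
    for a in attempts:
        groups[(a.site, a.market, a.page_type, a.session_window)].append(a)

    scores = {}
    for key, rows in groups.items():
        total = len(rows)
        connected = sum(r.connected for r in rows)
        usable = sum(r.usable for r in rows)
        scores[key] = {
            "attempts": total,
            "connection_success": connected / total,
            "usable_rate": usable / total,
            # Retry cost: attempts spent per usable record (inf if none).
            "attempts_per_usable": total / usable if usable else float("inf"),
        }
    return scores
```

Because each key is scored separately, a lane that connects every time but yields a usable record only half the time stands out on its own row instead of being averaged away in a global number.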

The scorecard should trigger lane actions
| Signal | Weak reading | Lane action |
|---|---|---|
| Field completeness | Required fields are missing from public records | Replay a controlled sample before expanding traffic |
| Regional consistency | Market, language, or currency shifts inside one batch | Split the market lane and pause mixed routing |
| Retry cost | More attempts are needed for the same usable output | Reduce concurrency and extend backoff |

The table is useful only when it changes queue behavior. A score that never changes pacing, replay, or isolation rules becomes reporting noise.
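One way to make the table drive queue behavior is to encode each row as a rule. The sketch below is illustrative only: the threshold values, the `LaneAction` enum, and the `lane_actions` function are assumptions for this example and would need tuning per target site.

```python
from enum import Enum


class LaneAction(Enum):
    CONTINUE = "continue"
    REPLAY_SAMPLE = "replay a controlled sample before expanding traffic"
    SPLIT_MARKET = "split the market lane and pause mixed routing"
    SLOW_DOWN = "reduce concurrency and extend backoff"


# Hypothetical thresholds, chosen only for illustration.
MIN_FIELD_COMPLETENESS = 0.95
MAX_ATTEMPTS_PER_USABLE = 1.5


def lane_actions(field_completeness, regions_seen, attempts_per_usable):
    """Map weak readings for one lane and batch to the table's actions."""
    actions = []
    if field_completeness < MIN_FIELD_COMPLETENESS:
        actions.append(LaneAction.REPLAY_SAMPLE)
    # More than one distinct region inside a batch means mixed routing.
    if len(set(regions_seen)) > 1:
        actions.append(LaneAction.SPLIT_MARKET)
    if attempts_per_usable > MAX_ATTEMPTS_PER_USABLE:
        actions.append(LaneAction.SLOW_DOWN)
    return actions or [LaneAction.CONTINUE]
```

For example, `lane_actions(0.80, ["de", "fr"], 2.0)` returns all three corrective actions, while `lane_actions(0.99, ["de"], 1.1)` returns `[LaneAction.CONTINUE]`.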
Keep the acceptance rule strict
A record should count as successful only when all required fields are present, the region is known, the source URL is stored, and the replay status is recorded. A strict rule gives AI agents and reporting systems that consume the scorecard concise, checkable evidence for each record.
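The acceptance rule can be written as a single predicate. The sketch below assumes a record is a plain dictionary; the key names, the `REQUIRED_FIELDS` tuple, and the set of replay status values are hypothetical and stand in for whatever schema a team actually uses.

```python
# Hypothetical schema: adjust the names to the real record format.
REQUIRED_FIELDS = ("title", "price", "currency")
KNOWN_REPLAY_STATUSES = {"not_replayed", "replayed_match", "replayed_mismatch"}


def is_accepted(record, required_fields=REQUIRED_FIELDS):
    """Return True only when all four acceptance conditions hold."""
    fields = record.get("fields", {})
    # A field counts as present when it is neither missing nor empty.
    has_fields = all(fields.get(name) not in (None, "") for name in required_fields)
    has_region = bool(record.get("region"))
    has_source = bool(record.get("source_url"))
    has_replay_status = record.get("replay_status") in KNOWN_REPLAY_STATUSES
    return has_fields and has_region and has_source and has_replay_status
```

Counting only records for which `is_accepted` returns `True` keeps the usable-record rate separate from plain request success.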
The scorecard is not a legal review or a permission model. It is an operational tool for authorized public data workflows.
FAQ
What should a crawler reliability scorecard measure first?
It should measure usable public records first, then break the result into connection quality, field completeness, regional consistency, retry cost, and replay stability.
Should request success be the main proxy lane metric?
No. Request success is necessary, but it is not enough when required fields, market context, or replay status are missing.
