A proxy setup can look healthy while your dataset quietly degrades. The fastest way to catch this is a field completeness scorecard: track the key fields you need per page type, compute a daily completeness rate, and alert on “missing-field drift” before it hits reports.
The decision this scorecard supports
Use this scorecard when you need to decide whether a drop in data quality comes from the market (real change) or from your collection setup (proxy exits, session behavior, pacing, retries). It is especially useful for public ecommerce pages, SERP monitoring snapshots, and pricing datasets.
Signals to collect first
- Page type (product, listing, search result, article)
- Key fields per page type (price, currency, availability, location hints, identifiers)
- Exit region label used for the request
- Retry count and time window
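These signals are easiest to aggregate when you log one record per request. The sketch below assumes a JSON-lines log. The field names (`page_type`, `fields`, `exit_region`, `retry_count`, `fetched_at`) and the `RequestRecord` class are hypothetical, so map them onto whatever your collector already emits.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RequestRecord:
    """One collected page plus its collection context (hypothetical schema)."""
    page_type: str        # "product", "listing", "serp", "article"
    fields: dict          # extracted key fields; a missing field is absent or None
    exit_region: str      # region label of the proxy exit used for the request
    retry_count: int      # retries needed before the response was accepted
    fetched_at: datetime  # completion time, used to group records into windows

    @classmethod
    def from_json_line(cls, line):
        raw = json.loads(line)
        return cls(
            page_type=raw["page_type"],
            fields=raw.get("fields") or {},
            exit_region=raw["exit_region"],
            retry_count=int(raw.get("retry_count", 0)),
            # Expects ISO 8601 with an explicit offset, e.g. "2024-05-01T10:00:00+00:00"
            fetched_at=datetime.fromisoformat(raw["fetched_at"]),
        )
```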
A simple completeness table your team can check daily
| Page type | Key fields | Pass threshold (daily completeness rate) |
|---|---|---|
| Product | price, currency, availability, product id | >= 98% |
| Listing | item count, price range, pagination | >= 95% |
| SERP snapshot | result titles, sources, timestamps | >= 97% |
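The table maps directly onto a daily check. The following is a minimal sketch that assumes each collected page arrives as a `(page_type, fields)` pair. The field names (`product_id`, `item_count`, and so on), the `KEY_FIELDS` and `THRESHOLDS` constants, and the rule that an empty value counts as missing are illustrative choices, not a fixed schema.

```python
from collections import defaultdict

# Key fields and pass thresholds from the table. Field names are illustrative.
KEY_FIELDS = {
    "product": ["price", "currency", "availability", "product_id"],
    "listing": ["item_count", "price_range", "pagination"],
    "serp": ["result_titles", "sources", "timestamps"],
}
THRESHOLDS = {"product": 0.98, "listing": 0.95, "serp": 0.97}


def is_complete(page_type, fields):
    """A page is complete only if every key field is present and non-empty.
    Zero is treated as a real value (e.g. an empty listing), not as missing."""
    return all(fields.get(name) not in (None, "", []) for name in KEY_FIELDS[page_type])


def daily_scorecard(pages):
    """pages: iterable of (page_type, fields) for one day.
    Returns {page_type: (completeness_rate, passed)} for tracked page types."""
    total = defaultdict(int)
    complete = defaultdict(int)
    for page_type, fields in pages:
        if page_type not in KEY_FIELDS:
            continue  # page types without a row in the table are not scored
        total[page_type] += 1
        complete[page_type] += int(is_complete(page_type, fields))
    scorecard = {}
    for page_type, count in total.items():
        rate = complete[page_type] / count
        scorecard[page_type] = (rate, rate >= THRESHOLDS[page_type])
    return scorecard
```

Scoring a page as all-or-nothing keeps the rate aligned with what downstream reports need. If you also want to see *which* field is drifting, a per-field count alongside this rate is a cheap addition.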

Metrics that show whether it works
The scorecard is useful only if it can explain a change. Track these three trends together:
- Completeness rate by page type
- Region consistency for each market slice
- Retry pressure (how often failures concentrate into bursts)
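Region consistency and retry pressure need concrete definitions before they can be trended. The sketch below uses one possible pair, and both are assumptions. Region consistency is the share of pages whose observed location hint (for example a currency or store-locale marker) matches the region the slice is supposed to represent. Retry pressure is the share of failures that land in fixed time buckets holding at least `burst_size` failures. The default bucket and burst sizes are hypothetical values to tune.

```python
from collections import Counter


def region_consistency(observed_regions, expected_region):
    """Share of pages in one market slice whose location hint matches the
    expected region. A missing hint (None) counts as inconsistent, and an
    empty slice scores 0.0 because nothing could be verified."""
    observed = list(observed_regions)
    if not observed:
        return 0.0
    return sum(1 for region in observed if region == expected_region) / len(observed)


def retry_pressure(failure_timestamps, bucket_seconds=300, burst_size=10):
    """Share of failures that fall into 'burst' buckets.

    failure_timestamps: epoch seconds of failed attempts. Failures are grouped
    into fixed buckets of bucket_seconds; a bucket holding at least burst_size
    failures counts as a burst. Near 0 means failures are spread out; near 1
    means they cluster.
    """
    times = list(failure_timestamps)
    if not times:
        return 0.0
    buckets = Counter(int(t) // bucket_seconds for t in times)
    in_bursts = sum(count for count in buckets.values() if count >= burst_size)
    return in_bursts / len(times)
```

Fixed buckets are the simplest choice. A sliding window would also catch bursts that straddle a bucket boundary, at the cost of a little more bookkeeping.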
If completeness drops while retries spike, fix pacing and backoff first. If completeness drops while region consistency fails, fix exits first. If both are stable and completeness still drops, the target likely changed.
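Those three rules can be encoded as a small triage function. This is a sketch: `pressure_limit` and `region_floor` are hypothetical cutoffs, and when retries spike and region consistency fails at the same time, the function returns both fixes (pacing first) rather than guessing which one dominates.

```python
def diagnose(completeness, threshold, retry_pressure, region_consistency,
             pressure_limit=0.3, region_floor=0.95):
    """Return the fixes to try first, following the rules above.

    pressure_limit and region_floor are hypothetical cutoffs; tune them
    against your own history before relying on the output.
    """
    if completeness >= threshold:
        return ["no action"]
    actions = []
    if retry_pressure > pressure_limit:      # failures bunching into bursts
        actions.append("fix pacing and backoff")
    if region_consistency < region_floor:    # exits not matching the market slice
        actions.append("fix exits")
    # Collection signals stable but completeness still down: suspect the target.
    return actions or ["target likely changed"]
```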
Put it into daily operations
Make completeness a release gate. Before increasing volume, require two to three stable cycles where completeness and region consistency both pass. This reduces “scale first, debug forever” failures.
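A gate like this reduces to checking the most recent cycles. A minimal sketch, assuming each cycle is recorded as a pair of booleans (completeness passed, region consistency passed). The function name `ready_to_scale` is hypothetical; `required` defaults to 2 and can be raised to 3 for the stricter end of the range.

```python
def ready_to_scale(cycles, required=2):
    """cycles: list of (completeness_passed, region_passed) pairs, oldest first.
    Returns True only if the last `required` cycles passed both checks."""
    if len(cycles) < required:
        return False  # not enough history to call the setup stable
    return all(completeness and region for completeness, region in cycles[-required:])
```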
FAQ
Is completeness rate better than success rate?
They measure different things. Success rate tells you whether requests returned something. Completeness rate tells you whether the data you got is usable. For monitoring and reporting, completeness is often the stronger metric.
How many fields should I track?
Start with 4 to 8 fields per page type, and pick the fields that drive business decisions. Too many fields make the scorecard noisy and slow to maintain.
What is a good alert threshold?
Alert on a sustained drop, not a single bad run. A common pattern is “two consecutive windows below threshold” plus a region-consistency check.
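One way to encode that pattern, as a sketch: the alert fires only when the last `consecutive` windows are all below threshold, and the region-consistency result decides where the alert is routed. The function name, message strings, and routing rule are hypothetical.

```python
def completeness_alert(window_rates, threshold, region_consistent, consecutive=2):
    """window_rates: completeness rate per window, oldest first.
    Returns an alert message on a sustained drop, otherwise None."""
    recent = window_rates[-consecutive:]
    sustained_drop = len(recent) == consecutive and all(r < threshold for r in recent)
    if not sustained_drop:
        return None  # a single bad window is not enough to alert
    if not region_consistent:
        return "completeness drop: check exits"
    return "completeness drop: check target and pacing"
```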
