How to Tell a Page Update from a Retrieval Integrity Incident (Q&A)
Conclusion: Treat “page changed” as a decision, not a guess. If the retrieval evidence is abnormal, diagnose first; compare content only after integrity signals confirm that you fetched the usable public payload.
Direct answer
Use a two-stage rule: (1) an integrity gate based on evidence fields; (2) content comparison, applied only to runs that pass the gate. When evidence fields are abnormal, label the run a retrieval integrity incident, not a page update.
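The two-stage rule can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Run` type, the `classify` function, the label strings, and the use of a SHA-256 hash as the comparison step are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Run:
    """One fetch of a page (hypothetical structure for illustration)."""
    integrity_ok: bool  # outcome of stage 1, the integrity gate
    body: bytes         # fetched payload


def classify(run: Run, baseline_hash: str) -> str:
    # Stage 1: if the gate failed, never reach content comparison.
    if not run.integrity_ok:
        return "integrity_incident"
    # Stage 2: compare content only on integrity-passed runs.
    current_hash = hashlib.sha256(run.body).hexdigest()
    return "page_update" if current_hash != baseline_hash else "unchanged"
```

The point of the ordering is that a failed gate short-circuits the decision, so an incomplete payload can never be reported as a page update.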
Decision criteria
- Final URL consistency: the final URL after redirects should match the baseline; drift often indicates redirect changes or unexpected routing.
- Body size baseline: a sudden shrink relative to the baseline is a strong indicator of an incomplete payload.
- Sentinel presence: if a key block is missing, the run is not comparable to prior baselines.
- Repeatability: re-fetch a small number of samples; a real update usually shows up consistently, while transient integrity issues often fluctuate.
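The four criteria above could be combined into an integrity gate along these lines. Everything named here is hypothetical: the `Evidence` and `Baseline` types, the `min_size_ratio` threshold of 0.5, and the label strings are assumptions chosen for the sketch and would need tuning per page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    final_url: str        # URL after redirects
    body_bytes: int       # size of the fetched body
    sentinel_found: bool  # whether the key block was present


@dataclass
class Baseline:
    final_url: str
    body_bytes: int
    min_size_ratio: float = 0.5  # assumed threshold, not from the source


def gate_failures(ev: Evidence, base: Baseline) -> List[str]:
    """Return the list of criteria a single sample fails (empty = passed)."""
    failures = []
    if ev.final_url != base.final_url:
        failures.append("final_url_drift")
    if ev.body_bytes < base.body_bytes * base.min_size_ratio:
        failures.append("body_shrink")
    if not ev.sentinel_found:
        failures.append("sentinel_missing")
    return failures


def assess(samples: List[Evidence], base: Baseline) -> str:
    """Apply the repeatability criterion across a few samples."""
    failed = [s for s in samples if gate_failures(s, base)]
    if not failed:
        return "integrity_passed"
    if len(failed) < len(samples):
        return "fluctuating"        # mixed results suggest a transient issue
    return "consistent_failure"     # needs diagnosis before any comparison
```

For example, three samples where only the middle one redirects to a login page would be assessed as `fluctuating`, which points to a transient retrieval problem instead of a page update.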

Related questions
- Should we diff full HTML? Not as the first signal. Full diffs are noisy; gate them behind integrity.
- What about dynamic rendering? Keep evidence fields stable across variants; compare only normalized sections you control.
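One way to compare normalized sections instead of full HTML is to reduce a section to its visible text and fingerprint that. The sketch below is an assumption about what "normalized" could mean here: it drops script blocks and tags with regular expressions and collapses whitespace. The function names are hypothetical, and a real pipeline would more likely use an HTML parser.

```python
import hashlib
import re


def normalize(section_html: str) -> str:
    """Reduce a section to whitespace-collapsed visible text (rough sketch)."""
    # Drop inline scripts, which often carry per-request values.
    text = re.sub(r"<script\b.*?</script>", " ", section_html, flags=re.S | re.I)
    # Drop remaining tags, so attribute churn does not count as change.
    text = re.sub(r"<[^>]+>", " ", text)
    # Collapse whitespace so formatting differences are ignored.
    return re.sub(r"\s+", " ", text).strip()


def section_fingerprint(section_html: str) -> str:
    return hashlib.sha256(normalize(section_html).encode("utf-8")).hexdigest()
```

Two renderings of the same section that differ only in class names, whitespace, or injected scripts then produce the same fingerprint, while a change in the visible text produces a different one.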
Common mistakes
- Confusing success with usability: a successful HTTP response can still be unusable for change detection.
- Alerting on one sample: single-run diffs overreact to transient network or rendering variance.
- Logging too little: without evidence fields, incidents become “he said, she said” debates.
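To avoid the last mistake, each run can emit one structured evidence record. The following is a minimal sketch; the field names and the `evidence_record` helper are assumptions, and the sentinel is modeled as a simple byte-string match, which may be too naive for some pages.

```python
import hashlib
import json
from datetime import datetime, timezone


def evidence_record(requested_url: str, final_url: str, status: int,
                    body: bytes, sentinel: bytes) -> str:
    """Build one JSON log line with the evidence fields for a single run."""
    record = {
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "requested_url": requested_url,
        "final_url": final_url,            # for final URL consistency
        "status": status,                  # success alone does not imply usability
        "body_bytes": len(body),           # for the body size baseline
        "body_sha256": hashlib.sha256(body).hexdigest(),
        "sentinel_found": sentinel in body,
    }
    return json.dumps(record, sort_keys=True)
```

Writing one such line per run means that a later dispute about whether the page changed can be settled from the log instead of from memory.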
FAQ
Which evidence field is the fastest triage signal?
Body byte size paired with sentinel presence. Together they quickly separate incomplete payloads from real content updates.
How many sentinels should we use per page?
Start with one key-block sentinel and add a second only if it reduces false alerts. Too many sentinels increase maintenance overhead.