TB Facility Register v.5

Form Documentation

Ministry of Health Malawi
ScanForm by QED
Note

This is live documentation. It is automatically regenerated each time the underlying XLSForm, dbt pipeline, or validation configuration changes. The content on every tab always reflects the currently deployed form.


About this form

The TB Facility Register (v.5) is a paper-based data collection instrument used at health facilities in Malawi to record the registration, diagnosis, treatment, and outcomes of tuberculosis patients. It is issued by the Ministry of Health and digitised using ScanForm — an optical character recognition (OCR) platform developed by QED that converts photographed paper forms into structured electronic data.

Each physical form is printed on an 840 × 297 mm wide-format sheet (equivalent to two A3 landscape pages side by side) and accommodates 5 patient records per page, arranged in rows. The left page captures patient identity, registration, diagnosis, and initial laboratory results; the right page captures follow-up smear results, drug susceptibility testing, treatment outcomes, HIV and ART details, and post-TB lung disease programme data.

Form at a glance
Programme MW-TB
Version v.5
XLSForm release 2026-03-02-v1.0.0
Country Malawi
Issuing body Ministry of Health
Paper size 840 × 297 mm (two A3 landscape pages)
Patients / page 5 rows
Total variables 56 per patient row
Mandatory fields 11
Conditional fields 9
Digitisation ScanForm OCR (QED.ai)

What data does this form collect?

The register follows a TB patient from first contact through to final outcome. The sections below summarise the thematic groups of data collected.

Data sections on the TB Facility Register
61 variables total across 12 thematic sections
Section | Fields | Key Variables
Patient Identity | 3 | Full name, address, phone number
Registration | 3 | Registration date, Participant ID, TB registration number
Demographics | 3 | Age (years or months), sex
🛡️ Risk Group & Referral | 4 | Occupation / risk group (8 categories including miners), referral source
💊 Treatment | 5 | Regimen (2RHZE/4RH, BPalM, BPal), site of disease, treatment history, diagnosis method
🔬 Initial Lab Results | 10 | LAM, Smear, Xpert, Culture: dates and MTB detected/not detected results
🔬 Follow-up Smears | 9 | 2-month, 5-month, 6-month smear dates and graded results
🧻 Imaging & DST | 4 | X-ray result, DST result (RIF resistance), dates
📊 Treatment Outcome | 4 | Outcome (6 categories), outcome date, 2nd line escalation, DOT support
🔴 HIV & ART | 7 | HIV status, test timing, ARV start time, ART ID, CPT status and start date
🫁 Post-TB / PRP | 3 | PRP enrolment, PTLD diagnosis method, PRP outcome
📝 Comments & Admin | 6 | Free-text comments, book/page numbers, discard and error flags
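
The Demographics section records age in either years or months, so analyses usually need one common unit. The sketch below shows one way to normalise the two fields. The function name and the idea of passing the age unit as a single letter (Y or M) are assumptions made for illustration; they are not taken from the deployed pipeline.

```python
def age_in_years(value: int, unit: str) -> float:
    """Convert an age recorded in years (Y) or months (M) to years.

    Hypothetical helper: the real variable names and unit encoding
    may differ from what is assumed here.
    """
    if value < 0:
        raise ValueError("age cannot be negative")
    unit = unit.strip().upper()
    if unit == "Y":
        return float(value)
    if unit == "M":
        return value / 12
    raise ValueError(f"unknown age unit: {unit!r}")
```

Rejecting unknown units outright, rather than guessing, keeps a misread letter box from silently turning months into years.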

How ScanForm digitises this form

ScanForm uses Optical Character Recognition (OCR) to convert photographed paper forms into structured data. Each field on the paper is assigned an OCR model tuned to the type of input expected.

OCR input types on this form
Each field type uses a dedicated recognition model
Input Type | How OCR Processes It | Example Fields
Digit boxes 🔢 | Each box is recognised independently as a digit 0–9 using the `int` model. | Age, smear result, registration date
Letter boxes 🔡 | Each box is recognised as A–Z. Restricted alphabets reduce errors for known value sets. | Age unit (Y/M), ID codes
Oval bubbles | The `select_one_or_zero` model detects whether each oval has an X crossing. At most one positive per question. | Sex, HIV status, treatment outcome
Pre-printed text 🏷️ | Some boxes contain fixed printed characters (e.g. MTB-, ART-, 20). These are not OCR-scanned. | TB registration number prefix, year century
Handwriting zones ✏️ | Free-text lines are not OCR-processed. They are captured as images for human review. | Full name, address, comments
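
The "at most one positive per question" rule for oval bubbles can be sketched as follows. This is only an illustration of the rule as described above, not ScanForm's actual `select_one_or_zero` model: the function signature, the boolean per-oval input, and the example option labels are all assumptions.

```python
from typing import Optional, Sequence, Tuple


def select_one_or_zero(
    options: Sequence[str], marked: Sequence[bool]
) -> Tuple[Optional[str], bool]:
    """Resolve one oval-bubble question.

    `marked[i]` says whether oval i was detected as crossed (assumed to
    come from the OCR step). Returns (selected_option, multiple_marked).
    """
    if len(options) != len(marked):
        raise ValueError("options and marked must have the same length")
    chosen = [opt for opt, is_marked in zip(options, marked) if is_marked]
    if len(chosen) == 1:
        return chosen[0], False
    # No oval crossed: nothing selected. Two or more crossed: ambiguous,
    # so no value is returned and the question is flagged instead.
    return None, len(chosen) > 1
```

Returning the flag separately lets a blank answer (valid for an optional question) be told apart from a multiple-oval answer, which the validation layers treat as a problem.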

Data quality: a three-layer system

Data quality is enforced at three distinct stages, each acting as a safety net for the next.

Three-layer data quality system
# | Layer | When | Who Acts | What Happens
1 | 📱 In-phone validation | At photo capture, before submission | Field worker | 26 key fields checked per row. Missing required fields and multiple ovals trigger an immediate on-screen alert. Worker corrects the paper form and retakes the photo.
2 | 🗄️ Pipeline DQA checks | After submission, on each pipeline run | Data manager | 41 fields checked across 5 check types, including integer parse, date parse, multiple-answer selection, and clinical logic. Errors exclude records; warnings flag them for review.
3 | 📊 Dashboard monitoring | Ongoing, daily or monthly | Supervisor / MoH | 25 programme indicators surfaced in dashboards. Trends, outliers and facility comparisons make systemic data quality issues visible at programme level.
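
To make the pipeline layer more concrete, here is a minimal sketch of how integer-parse, date-parse, and clinical-logic checks could produce errors and warnings for one patient row. Everything in it is assumed for illustration: the field names (`age`, `reg_day`, `out_day`, and so on), the specific clinical rule, and the severity given to each check are hypothetical and are not the deployed DQA configuration.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Finding = Tuple[str, str]  # (severity, message)


def parse_int(raw: str) -> Optional[int]:
    """Integer parse check: None if the text is not an integer."""
    try:
        return int(raw.strip())
    except ValueError:
        return None


def parse_date(day: str, month: str, year: str) -> Optional[date]:
    """Date parse check: None if the parts do not form a real date."""
    try:
        return date(int(year), int(month), int(day))
    except ValueError:
        return None


def run_checks(row: Dict[str, str]) -> List[Finding]:
    findings: List[Finding] = []
    if parse_int(row.get("age", "")) is None:
        findings.append(("error", "age: not a valid integer"))
    registered = parse_date(
        row.get("reg_day", ""), row.get("reg_month", ""), row.get("reg_year", "")
    )
    if registered is None:
        findings.append(("error", "registration date: not a valid date"))
    # The outcome date is treated as optional: check it only if filled in.
    parts = [row.get(k, "") for k in ("out_day", "out_month", "out_year")]
    if any(p.strip() for p in parts):
        outcome = parse_date(*parts)
        if outcome is None:
            findings.append(("error", "outcome date: not a valid date"))
        elif registered is not None and outcome < registered:
            # Hypothetical clinical-logic rule, reported as a warning.
            findings.append(("warning", "outcome date precedes registration date"))
    return findings


def is_excluded(findings: List[Finding]) -> bool:
    """Errors exclude the record; warnings only flag it for review."""
    return any(severity == "error" for severity, _ in findings)
```

Keeping severity as data on each finding means the same list can drive both record exclusion and a review queue.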

Documentation tabs

Use the navigation bar to explore all aspects of this form. Each tab is described below.

Documentation site map
Tab | Primary Audience | Contents
🏠 Home | All | This page. Form summary, data collection overview, and orientation guide.
📋 Variables | Data managers, researchers | Complete field-by-field reference: variable names, input types, required status, conditional logic, OCR models, and constraints for all 56 variables.
🔀 Data Pipeline | Technical staff, analysts | How raw OCR output flows through 5 dbt transformation layers (Base → Pre-Clean → Clean → Intermediary → Refinery) to produce analysis-ready tables.
📱 In-Phone Validation | Field supervisors | The 26 automated checks that run instantly on the field worker’s phone when a page is photographed. Errors prompt immediate on-site correction.
📊 DQA Checks | Data managers | The 5 server-side data quality check blocks that run after submission, including integer validation, date validation, multiple-selection detection, and clinical logic.
📈 Dashboard Indicators | Programme managers, MoH | 25 actionable indicators derivable from this form, with numerators, denominators, disaggregations, and recommended visualisations.
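
As an illustration of what an indicator with a numerator, a denominator, and a disaggregation looks like in practice, the sketch below computes a proportion per group from cleaned rows. The indicator used in the example (share of patients with a recorded HIV status, by sex) and the field names `sex` and `hiv_status` are hypothetical; the actual 25 indicators are defined on the Dashboard Indicators tab.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Mapping

Row = Mapping[str, str]


def indicator_by_group(
    rows: Iterable[Row],
    group_key: str,
    in_numerator: Callable[[Row], bool],
) -> Dict[str, float]:
    """Return numerator / denominator for each value of `group_key`.

    Every row counts towards its group's denominator; rows for which
    `in_numerator` is true also count towards the numerator.
    """
    numerators: Dict[str, int] = defaultdict(int)
    denominators: Dict[str, int] = defaultdict(int)
    for row in rows:
        group = row.get(group_key) or "missing"
        denominators[group] += 1
        if in_numerator(row):
            numerators[group] += 1
    return {g: numerators[g] / denominators[g] for g in denominators}


# Example with made-up rows: share of patients with a recorded HIV status.
example_rows = [
    {"sex": "F", "hiv_status": "positive"},
    {"sex": "F", "hiv_status": ""},
    {"sex": "M", "hiv_status": "negative"},
]
by_sex = indicator_by_group(example_rows, "sex", lambda r: bool(r.get("hiv_status")))
```

Rows with no value for the grouping field are collected under "missing" rather than dropped, so gaps in the disaggregation variable remain visible.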

Correct completion instructions

These instructions are printed on the back of the physical form.

Filling out

  • Use the square boxes for numbers and letters.
  • Write exactly one letter or digit per box.
  • Use CAPITAL letters only.
  • Cross an oval bubble with an X to select an option.
  • Questions marked with an asterisk (*) are mandatory.

Correcting mistakes

  • Shade the incorrect answer completely.
  • Write the correct answer close to it, outside the original box.
  • Do not use correction fluid.

Example: If W was written incorrectly, shade it and write the correct letter nearby.

Treatment history codes
Code | Full Meaning
New | New case, never previously treated
Relapse | Previously treated, cured or completed treatment
RALF | Return after lost to follow-up
Fail | Treatment after treatment failure
Other | Previous treatment outcome unknown
Unknown | Treatment history unknown
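
When working with exported data, the codes in the table above can be expanded with a simple lookup. The sketch below copies the codes and meanings from the table; the function name and the case-insensitive matching are assumptions added for convenience, not part of the form specification.

```python
# Codes and meanings copied from the table above.
TREATMENT_HISTORY = {
    "New": "New case, never previously treated",
    "Relapse": "Previously treated, cured or completed treatment",
    "RALF": "Return after lost to follow-up",
    "Fail": "Treatment after treatment failure",
    "Other": "Previous treatment outcome unknown",
    "Unknown": "Treatment history unknown",
}

# Case-insensitive index (an assumption, to tolerate "ralf" or "NEW").
_BY_LOWER = {code.lower(): meaning for code, meaning in TREATMENT_HISTORY.items()}


def describe_treatment_history(code: str) -> str:
    """Return the full meaning of a treatment history code."""
    key = code.strip().lower()
    if key not in _BY_LOWER:
        raise KeyError(f"unknown treatment history code: {code!r}")
    return _BY_LOWER[key]
```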

About ScanForm

ScanForm is a paper-based data digitisation platform developed by QED.ai for use in international development and public health programmes. Forms are designed in an extended XLSForm standard (XLS-ScanForm), printed and used in the field in the usual way, then photographed with a smartphone. The ScanForm mobile app applies OCR models to recognise handwritten digits, letters, and marked ovals, and immediately validates the captured data before uploading to the cloud.

Note

Document version: Auto-generated from XLSForm MW-TB_TB_Register.xlsx (release 2026-03-02-v1.0.0). For questions about this documentation contact the QED data team.