TB Facility Register v.5
Form Documentation
This is live documentation. It is automatically regenerated each time the underlying XLSForm, dbt pipeline, or validation configuration changes. The content on every tab always reflects the currently deployed form.
About this form
The TB Facility Register (v.5) is a paper-based data collection instrument used at health facilities in Malawi to record the registration, diagnosis, treatment, and outcomes of tuberculosis patients. It is issued by the Ministry of Health and digitised using ScanForm, an optical character recognition (OCR) platform developed by QED.ai that converts photographed paper forms into structured electronic data.
Each physical form is printed on an 840 × 297 mm wide-format sheet (a double-A4 landscape spread) and accommodates 5 patient records per page, arranged in rows. The left page captures patient identity, registration, diagnosis, and initial laboratory results; the right page captures follow-up smear results, drug susceptibility testing, treatment outcomes, HIV and ART details, and post-TB lung disease programme data.
| Form at a glance | |
|---|---|
| Programme | MW-TB |
| Version | v.5 |
| XLSForm release | 2026-03-02-v1.0.0 |
| Country | Malawi |
| Issuing body | Ministry of Health |
| Paper size | 840 × 297 mm (double-A4 landscape) |
| Patients / page | 5 rows |
| Total variables | 56 per patient row |
| Mandatory fields | 11 |
| Conditional fields | 9 |
| Digitisation | ScanForm OCR (QED.ai) |
What data does this form collect?
The register follows a TB patient from first contact through to final outcome. The sections below summarise the thematic groups of data collected.
| Data sections on the TB Facility Register | |||
| 61 variables total across 12 thematic sections | |||
| | Section | Fields | Key Variables |
|---|---|---|---|
| | Patient Identity | 3 | Full name, address, phone number |
| | Registration | 3 | Registration date, Participant ID, TB registration number |
| | Demographics | 3 | Age (years or months), sex |
| 🛡️ | Risk Group & Referral | 4 | Occupation / risk group (8 categories including miners), referral source |
| 💊 | Treatment | 5 | Regimen (2RHZE/4RH, BPalM, BPal), site of disease, treatment history, diagnosis method |
| 🔬 | Initial Lab Results | 10 | LAM, Smear, Xpert, Culture — dates and MTB detected/not detected results |
| 🔬 | Follow-up Smears | 9 | 2-month, 5-month, 6-month smear dates and graded results |
| 🧻 | Imaging & DST | 4 | X-ray result, DST result (RIF resistance), dates |
| 📊 | Treatment Outcome | 4 | Outcome (6 categories), outcome date, 2nd line escalation, DOT support |
| 🔴 | HIV & ART | 7 | HIV status, test timing, ARV start time, ART ID, CPT status and start date |
| 🫁 | Post-TB / PRP | 3 | PRP enrolment, PTLD diagnosis method, PRP outcome |
| 📝 | Comments & Admin | 6 | Free-text comments, book/page numbers, discard and error flags |
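The section counts in the table can be cross-checked with a few lines of Python. This is only a minimal sketch: the dictionary name `SECTION_FIELDS` is an illustrative assumption, and the numbers are copied from the Fields column above.

```python
# Field counts per thematic section, copied from the table above.
SECTION_FIELDS = {
    "Patient Identity": 3,
    "Registration": 3,
    "Demographics": 3,
    "Risk Group & Referral": 4,
    "Treatment": 5,
    "Initial Lab Results": 10,
    "Follow-up Smears": 9,
    "Imaging & DST": 4,
    "Treatment Outcome": 4,
    "HIV & ART": 7,
    "Post-TB / PRP": 3,
    "Comments & Admin": 6,
}

total_sections = len(SECTION_FIELDS)
total_fields = sum(SECTION_FIELDS.values())
print(total_sections, total_fields)  # 12 61
```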
How ScanForm digitises this form
ScanForm uses Optical Character Recognition (OCR) to convert photographed paper forms into structured data. Each field on the paper is assigned an OCR model tuned to the type of input expected.
| OCR input types on this form | |||
| Each field type uses a dedicated recognition model | |||
| Input Type | | How OCR Processes It | Example Fields |
|---|---|---|---|
| Digit boxes | 🔢 | Each box is recognised independently as a digit 0–9 using the `int` model. | Age, smear result, registration date |
| Letter boxes | 🔡 | Each box is recognised as A–Z. Restricted alphabets reduce errors for known value sets. | Age unit (Y/M), ID codes |
| Oval bubbles | ⚪ | The `select_one_or_zero` model detects whether each oval has an X crossing. At most one positive per question. | Sex, HIV status, treatment outcome |
| Pre-printed text | 🏷️ | Some boxes contain fixed printed characters (e.g. MTB-, ART-, 20). These are not passed through OCR. | TB registration number prefix, century digits of the year |
| Handwriting zones | ✏️ | Free-text lines are not OCR-processed. They are captured as images for human review. | Full name, address, comments |
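The rules in the table can be illustrated with a minimal Python sketch. It is not ScanForm's implementation: the function names `resolve_select_one_or_zero` and `assemble_digits`, the per-oval boolean inputs, the `M`/`F` option labels, and the use of `None` for an empty box are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def resolve_select_one_or_zero(
    marks: Sequence[bool], options: Sequence[str]
) -> Optional[str]:
    """Return the crossed option, or None if no oval is crossed.

    Raises ValueError when more than one oval is crossed, mirroring the
    "at most one positive per question" rule.
    """
    if len(marks) != len(options):
        raise ValueError("one mark is expected per option")
    selected = [opt for opt, marked in zip(options, marks) if marked]
    if len(selected) > 1:
        raise ValueError(f"multiple ovals crossed: {selected}")
    return selected[0] if selected else None


def assemble_digits(
    boxes: Sequence[Optional[int]], preprinted: str = ""
) -> Optional[str]:
    """Join per-box digit predictions behind any pre-printed characters.

    Returns None if any box is empty, since a partially filled value
    cannot be interpreted safely.
    """
    if any(d is None for d in boxes):
        return None
    if any(not 0 <= d <= 9 for d in boxes):
        raise ValueError("each box must hold a single digit 0-9")
    return preprinted + "".join(str(d) for d in boxes)


# Hypothetical inputs: the second oval is crossed, and two handwritten
# digits follow the pre-printed century "20".
sex = resolve_select_one_or_zero([False, True], ["M", "F"])
year = assemble_digits([2, 6], preprinted="20")
print(sex, year)  # F 2026
```

Keeping the pre-printed characters out of the recognition step means the handwritten boxes are the only source of OCR uncertainty in the assembled value.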
Data quality: a three-layer system
Data quality is enforced at three distinct stages, each acting as a safety net for the next.
| Three-layer data quality system | ||||
| # | Layer | When | Who Acts | What Happens |
|---|---|---|---|---|
| 1 | 📱 In-phone validation | At photo capture, before submission | Field worker | 26 key fields checked per row. Missing required fields and multiple marked ovals trigger an immediate on-screen alert. The worker corrects the paper form and retakes the photo. |
| 2 | 🗄️ Pipeline DQA checks | After submission, on each pipeline run | Data manager | 41 fields checked across 5 check types, including integer parse, date parse, multiple-answer selection, and clinical logic. Errors exclude the record; warnings flag it for review. |
| 3 | 📊 Dashboard monitoring | Ongoing, daily or monthly | Supervisor / MoH | 25 programme indicators surfaced in dashboards. Trends, outliers and facility comparisons make systemic data quality issues visible at programme level. |
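The error and warning behaviour of the pipeline layer can be sketched in Python as follows. This is an illustration under stated assumptions, not the deployed checks: the field names (`age`, `registration_date`, `outcome_date`), the ISO date format, the helper names, and the choice to grade the clinical-logic example as a warning are all hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
Finding = Tuple[str, str]  # (severity, message); severity is "error" or "warning"
Check = Callable[[Record], Optional[Finding]]

DATE_FMT = "%Y-%m-%d"  # assumed storage format, not taken from the form


def integer_check(field: str) -> Check:
    def run(rec: Record) -> Optional[Finding]:
        value = rec.get(field, "")
        if value and not (value.isascii() and value.isdigit()):
            return ("error", f"{field}: {value!r} is not an integer")
        return None
    return run


def date_check(field: str) -> Check:
    def run(rec: Record) -> Optional[Finding]:
        value = rec.get(field, "")
        if not value:
            return None
        try:
            datetime.strptime(value, DATE_FMT)
        except ValueError:
            return ("error", f"{field}: {value!r} is not a valid date")
        return None
    return run


def outcome_not_before_registration(rec: Record) -> Optional[Finding]:
    """Clinical-logic example: an outcome should not precede registration."""
    try:
        registered = datetime.strptime(rec["registration_date"], DATE_FMT)
        outcome = datetime.strptime(rec["outcome_date"], DATE_FMT)
    except (KeyError, ValueError):
        return None  # missing or unparseable dates are left to date_check
    if outcome < registered:
        return ("warning", "outcome_date precedes registration_date")
    return None


def triage(records: List[Record], checks: List[Check]):
    """Split records into kept (possibly with warnings) and excluded (any error)."""
    kept, excluded = [], []
    for rec in records:
        findings = [f for f in (check(rec) for check in checks) if f is not None]
        has_error = any(severity == "error" for severity, _ in findings)
        (excluded if has_error else kept).append((rec, findings))
    return kept, excluded


CHECKS = [
    integer_check("age"),
    date_check("registration_date"),
    date_check("outcome_date"),
    outcome_not_before_registration,
]
rows = [
    {"age": "34", "registration_date": "2026-01-10", "outcome_date": "2026-07-15"},
    {"age": "3A", "registration_date": "2026-01-10", "outcome_date": ""},
    {"age": "51", "registration_date": "2026-02-01", "outcome_date": "2026-01-20"},
]
kept, excluded = triage(rows, CHECKS)
print(len(kept), len(excluded))  # 2 1
```

In this sketch the third row stays in `kept` but carries a warning, so a data manager can review it without the record being dropped.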
Documentation tabs
Use the navigation bar to explore all aspects of this form. Each tab is described below.
| Documentation site map | |||
| | Tab | Primary Audience | Contents |
|---|---|---|---|
| 🏠 | Home | All | This page. Form summary, data collection overview, and orientation guide. |
| 📋 | Variables | Data managers, researchers | Complete field-by-field reference: variable names, input types, required status, conditional logic, OCR models, and constraints for all 56 variables. |
| 🔀 | Data Pipeline | Technical staff, analysts | How raw OCR output flows through 5 dbt transformation layers (Base → Pre-Clean → Clean → Intermediary → Refinery) to produce analysis-ready tables. |
| 📱 | In-Phone Validation | Field supervisors | The 26 automated checks that run instantly on the field worker’s phone when a page is photographed. Errors prompt immediate on-site correction. |
| 📊 | DQA Checks | Data managers | The 5 server-side data quality check blocks that run after submission, including integer validation, date validation, multiple-selection detection, and clinical logic. |
| 📈 | Dashboard Indicators | Programme managers, MoH | 25 actionable indicators derivable from this form, with numerators, denominators, disaggregations, and recommended visualisations. |
Correct completion instructions
These instructions are printed on the back of the physical form.
Filling out
- Use the square boxes for numbers and letters.
- Write exactly one letter or digit per box.
- Use CAPITAL letters only.
- Cross an oval bubble with an X to select an option.
- Questions marked with an asterisk (*) are mandatory.
Correcting mistakes
- Shade the incorrect answer completely.
- Write the correct answer close to it, outside the original box.
- Do not use correction fluid.
Example: If W was written incorrectly, shade it and write the correct letter nearby.
Treatment history codes
| Code | Full Meaning |
|---|---|
| New | New case — never previously treated |
| Relapse | Previously treated — cured or completed treatment |
| RALF | Return after lost to follow-up |
| Fail | Treatment after treatment failure |
| Other | Previous treatment outcome unknown |
| Unknown | Treatment history unknown |
About ScanForm
ScanForm is a paper-based data digitisation platform developed by QED.ai for use in international development and public health programmes. Forms are designed in an extended XLSForm standard (XLS-ScanForm), printed and used in the field in the usual way, then photographed with a smartphone. The ScanForm mobile app applies OCR models to recognise handwritten digits, letters, and marked ovals, and immediately validates the captured data before uploading to the cloud.
Document version: Auto-generated from XLSForm MW-TB_TB_Register.xlsx (release 2026-03-02-v1.0.0). For questions about this documentation contact the QED data team.