# HPM01 Developer Handover

Updated: 2026-05-15
Workspace: `D:\Projects\AZ`
Repository: `https://github.com/Badmin-on/LC_M`
Production: `https://hpm01.pages.dev`

## 1. Mission

HPM01 is a Korean lung-cancer evidence monitoring dashboard. Its job is to separate what can be confirmed from public official data from what requires HIRA/NHIS customized-data approval.

The project must support 2025/2026 monitoring, but it must not fabricate hospital-level lung cancer new-patient counts. Public APIs provide hospital metadata, national/monthly cancer registration trends, screening capacity, claims aggregates, and treatment-environment context. Hospital-level C33/C34 new-patient counts still require approved customized data.

## 2. Current Key Finding

As of 2026-05-15, the latest public NHIS new-cancer special-case registration file used by this project covers 2023-01 through 2025-11. Public 2026 lung cancer new-patient counts and hospital-level counts are not available as direct official open data.

Use this interpretation:

- NHIS public file: national monthly C33/C34 new special-case registration trend.
- National Cancer Registry/KOSIS: annual official cancer incidence anchor.
- Statistics Korea/KOSIS: annual mortality companion indicator.
- HIRA public APIs: hospital metadata and infrastructure context.
- NHIS screening APIs: screening facility capacity proxy.
- HIRA/NHIS treatment files and MFDS/HIRA drug APIs: treatment-environment context.
- HIRA customized research data: primary route for hospital-level C33/C34 new visit-patient counts.
- NHIS customized DB: secondary route for new special-case registration analysis.

## 2.1 Wave 1 Treatment-Environment OpenAPI (2026-05-15, on-demand)

Three new Cloudflare Pages Functions endpoints connect the dashboard to the treatment-environment OpenAPIs that the earlier commit `6d0fa8c` registered as sources. They run only when a user clicks the "갱신" (Refresh) button on the corresponding panel; no cron is configured.

```text
functions/api/mfds/anticancer.js          MFDS drug product approval (data.go.kr 15095677)
functions/api/hira/drug-ingredient.js     HIRA drug ingredient/efficacy (data.go.kr 15021027)
functions/api/nhis/screening.js           NHIS screening facility search (data.go.kr 15154419, 2026 replacement)
```

`wrangler.toml` adds the corresponding base URL variables. All three endpoints share the existing `HIRA_SERVICE_KEY` (data.go.kr general key). If the live data.go.kr URL shown for an approved API differs from the default, override it in the Cloudflare Pages env:

```text
MFDS_DRUG_APPROVAL_API_BASE
HIRA_DRUG_INGREDIENT_API_BASE
NHIS_SCREENING_SEARCH_API_BASE
```

The dashboard adds a `#treatment` section with three cards (MFDS / HIRA drug ingredient / NHIS screening). Each card has a search input and a "갱신" (Refresh) button. Results are kept in browser memory only: nothing is committed to the repository automatically, and nothing refreshes on a schedule.
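A minimal sketch of how one of these proxies might resolve its upstream URL, assuming a non-empty Pages env value overrides the `wrangler.toml` default and that `HIRA_SERVICE_KEY` holds the decoded (raw) data.go.kr key. The `example.invalid` defaults and the `pageNo`/`numOfRows` query names are placeholders, not the real `wrangler.toml` values, and `buildUpstreamUrl` is a hypothetical helper rather than the code in `functions/_shared/`.

```javascript
// Sketch only: placeholder defaults, not the real wrangler.toml values.
const DEFAULT_BASES = {
  MFDS_DRUG_APPROVAL_API_BASE: "https://example.invalid/mfds/drug-approval",
  HIRA_DRUG_INGREDIENT_API_BASE: "https://example.invalid/hira/drug-ingredient",
  NHIS_SCREENING_SEARCH_API_BASE: "https://example.invalid/nhis/screening-search",
};

// A non-empty Pages env value overrides the default base URL.
function resolveBase(env, varName) {
  const override = (env[varName] || "").trim();
  if (override !== "") return override;
  if (!(varName in DEFAULT_BASES)) throw new Error(`no default for ${varName}`);
  return DEFAULT_BASES[varName];
}

// Meant to run inside the Function, so the key stays server-side.
function buildUpstreamUrl(env, varName, query = {}) {
  const key = (env.HIRA_SERVICE_KEY || "").trim();
  if (key === "") throw new Error("HIRA_SERVICE_KEY is not configured");
  const url = new URL(resolveBase(env, varName));
  // URLSearchParams percent-encodes values, so this assumes the decoded (raw) key.
  url.searchParams.set("serviceKey", key);
  for (const [name, value] of Object.entries(query)) {
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(name, String(value));
    }
  }
  return url.toString();
}

// Example: no override is set, so the placeholder default is used.
const exampleUrl = buildUpstreamUrl(
  { HIRA_SERVICE_KEY: "abc" },
  "NHIS_SCREENING_SEARCH_API_BASE",
  { pageNo: 1, numOfRows: 10 },
);
// exampleUrl === "https://example.invalid/nhis/screening-search?serviceKey=abc&pageNo=1&numOfRows=10"
```

Empty query values are dropped so that a blank search input does not send an empty filter upstream.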

## 2.2 Wave 2 Treatment-Environment Expansion (2026-05-15, on-demand)

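Wave 2 adds six more Cloudflare Pages Functions endpoints, one shared helper, and four new "치료 환경" (treatment-environment) cards. None of the endpoints runs on a schedule; the wired ones are called only when the operator clicks the "갱신" (Refresh) button on the corresponding card.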

```text
functions/_shared/file-proxy.js                Generic CSV/text fileDownload helper (EUC-KR + UTF-8 auto-decode).
functions/api/hira/cancer-burden.js            HIRA cancer-disease medical cost file (data.go.kr 15072788).
functions/api/hira/disease-provider.js         HIRA 4-digit disease by provider type file (data.go.kr 15089614).
functions/api/nhis/anticancer-stats.js         NHIS cancer × antineoplastic ingredient (data.go.kr 15139129, odcloud OpenAPI).
functions/api/nhis/treatment-action.js         NHIS cancer-patient major treatment action (data.go.kr 15143824, odcloud OpenAPI).
functions/api/nhis/screening-detail.js         NHIS screening facility detail (data.go.kr 15154392).
functions/api/nhis/screening-code.js           NHIS screening facility code dictionary (data.go.kr 15154423).
```

`wrangler.toml` adds the corresponding env vars. `HIRA_CANCER_BURDEN_FILE_URL` and `HIRA_DISEASE_PROVIDER_FILE_URL` are intentionally empty in source — the operator must paste the live `data.go.kr/cmm/cmm/fileDownload.do?atchFileId=...&fileDetailSn=...` URL into Cloudflare Pages env. The `atchFileId` rotates whenever the publisher republishes the file, so the env value should be checked when the dataset version changes.

```text
NHIS_SCREENING_DETAIL_API_BASE
NHIS_SCREENING_CODE_API_BASE
NHIS_ANTICANCER_STATS_API_BASE
NHIS_TREATMENT_ACTION_API_BASE
HIRA_CANCER_BURDEN_FILE_URL          (must be set per-environment; empty default)
HIRA_DISEASE_PROVIDER_FILE_URL       (must be set per-environment; empty default)
```

Dashboard cards added to `#treatment`:

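- 암 상병별 진료 burden (medical-care burden by cancer diagnosis; HIRA 15072788): file proxy with a client-side C34 filter.
- 4단 상병 × 요양기관 (4-digit diagnosis × care-provider type; HIRA 15089614): file proxy with a client-side C34 filter.
- 시도별 항암제 사용 (anticancer drug use by province/metropolitan city; NHIS 15139129): odcloud proxy with a client-side `폐` ("lung") filter.
- 주요 치료행위 (major treatment actions; NHIS 15143824): odcloud proxy with a client-side `폐` ("lung") filter.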
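The client-side filters on these four cards can be sketched as follows. Assumptions: file-proxy rows reach the browser as header-keyed objects, odcloud payloads carry their rows in a `data` array (the usual odcloud response shape), and both helper names are hypothetical rather than functions in `src/app.js`.

```javascript
// Rows from file-proxy are assumed to be header-keyed objects.
// Keeps rows where any cell starts with the ICD-10 prefix, e.g. "C34" or "C341".
function filterByIcdPrefix(rows, prefix) {
  const wanted = prefix.toUpperCase();
  return rows.filter((row) =>
    Object.values(row).some((cell) => String(cell).trim().toUpperCase().startsWith(wanted)),
  );
}

// odcloud responses are assumed to carry their rows in a `data` array.
// Keeps rows where any string cell contains the keyword, e.g. "폐" (lung).
function filterOdcloudByKeyword(payload, keyword) {
  const rows = payload && Array.isArray(payload.data) ? payload.data : [];
  return rows.filter((row) =>
    Object.values(row).some((cell) => typeof cell === "string" && cell.includes(keyword)),
  );
}
```

Matching on any cell is deliberately loose: a `폐` keyword can also match non-cancer labels containing the same syllable, so filtering on a dedicated diagnosis-code column is safer when a dataset provides one.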

The two NHIS screening detail/code endpoints exist as proxy routes only. The dashboard search card calls the search endpoint; detail/code can be wired into a follow-up "facility profile" view in a later wave.

Limitations and operator notes:

- file-proxy caps each response at 800 rows. Larger files must be paginated by the operator (for example, by serving each part behind a different env URL) or pre-aggregated.
- The Cloudflare Function does not parse `.xlsx`. If the publisher offers only `.xlsx`, the operator must convert it to CSV and host the CSV behind the env URL.
- The odcloud endpoints pass `HIRA_SERVICE_KEY` as `serviceKey`; the generated odcloud endpoints accept the same data.go.kr general key.
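The decode-and-cap step can be sketched as follows. This is an illustration, not the code in `functions/_shared/file-proxy.js`: it assumes strict UTF-8 is tried first with EUC-KR as the fallback, uses a deliberately minimal CSV splitter (no newlines inside quoted fields), and needs a runtime whose `TextDecoder` supports the `euc-kr` label (Node.js builds with full ICU do).

```javascript
const MAX_ROWS = 800; // cap stated in this handover

// Try strict UTF-8 first; if the bytes are not valid UTF-8, fall back to EUC-KR.
function decodeText(bytes) {
  try {
    return { encoding: "utf-8", text: new TextDecoder("utf-8", { fatal: true }).decode(bytes) };
  } catch {
    return { encoding: "euc-kr", text: new TextDecoder("euc-kr").decode(bytes) };
  }
}

// Minimal CSV line splitter: handles double-quoted fields and "" escapes,
// but not newlines inside quoted fields.
function splitCsvLine(line) {
  const cells = [];
  let cur = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { cur += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else cur += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { cells.push(cur); cur = ""; }
    else cur += ch;
  }
  cells.push(cur);
  return cells;
}

// Decodes, parses, and keeps at most `maxRows` data rows (header excluded).
function parseCappedCsv(bytes, maxRows = MAX_ROWS) {
  const { encoding, text } = decodeText(bytes);
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  const header = splitCsvLine(lines[0] || "");
  const body = lines.slice(1);
  const rows = body.slice(0, maxRows).map((line) => {
    const cells = splitCsvLine(line);
    return Object.fromEntries(header.map((h, i) => [h, cells[i] ?? ""]));
  });
  return { encoding, totalRows: body.length, truncated: body.length > maxRows, rows };
}
```

Reporting `totalRows` and `truncated` alongside the capped rows tells the operator when a file needs splitting or pre-aggregation.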

## 2.3 Wave 2.1 — Auto-resolve fileDownload + Proxy Indicator Expansion (2026-05-15)

Two improvements followed operator feedback ("URL은 직접 업데이트 가능하지 않아? 추가 proxy 지표가 더 있지 않을까?", i.e. "Can't the URL be updated directly? Aren't there more proxy indicators we could add?"):

### Auto-resolve fileDownload (file-proxy.js)

`functions/_shared/file-proxy.js` now supports two resolution modes:

1. **dataset-id mode (preferred)** — `config.datasetIdEnv` supplies a numeric dataset id (e.g. `15072788`). The proxy fetches `data.go.kr/data/{id}/fileData.do`, scrapes the live `atchFileId` and `fileDetailSn` from the HTML (three regex patterns: anchor href, `fn_fileDataDown(...)` JS handler, `data-*` attributes), and downloads the rotating `fileDownload.do` URL. Operators do not need to update env when the publisher republishes.
2. **explicit-url mode (fallback)** — `config.fileEnv` supplies the full URL. Used when scraping fails (HTML structure changed) or the dataset is on a different host.
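A sketch of the scraping step, for illustration only. The three regexes are hypothetical stand-ins for the anchor-href, `fn_fileDataDown(...)`, and `data-*` shapes named above (the real page markup and the real patterns in `file-proxy.js` are not reproduced here), and the `https://www.data.go.kr` host is an assumption.

```javascript
// Hypothetical patterns for the three shapes described above.
const PATTERNS = [
  // 1. anchor href: ...fileDownload.do?atchFileId=FILE_...&fileDetailSn=1
  /fileDownload\.do\?atchFileId=([A-Za-z0-9_]+)&(?:amp;)?fileDetailSn=(\d+)/,
  // 2. JS handler: fn_fileDataDown('FILE_...', '1')
  /fn_fileDataDown\(\s*['"]([A-Za-z0-9_]+)['"]\s*,\s*['"]?(\d+)['"]?/,
  // 3. data-* attributes: data-atch-file-id="..." ... data-file-detail-sn="..."
  /data-atch-file-id=["']([A-Za-z0-9_]+)["'][^>]*data-file-detail-sn=["'](\d+)["']/,
];

// Returns the first match in pattern order, or null when none matches.
function extractFileIds(html) {
  for (const [index, pattern] of PATTERNS.entries()) {
    const m = html.match(pattern);
    if (m) return { atchFileId: m[1], fileDetailSn: m[2], pattern: index + 1 };
  }
  return null;
}

function fileDownloadUrl({ atchFileId, fileDetailSn }) {
  const url = new URL("https://www.data.go.kr/cmm/cmm/fileDownload.do"); // assumed host
  url.searchParams.set("atchFileId", atchFileId);
  url.searchParams.set("fileDetailSn", fileDetailSn);
  return url.toString();
}

// Fetches the dataset page and resolves the current download URL.
async function resolveFromDatasetId(datasetId, fetchImpl = fetch) {
  const page = await fetchImpl(`https://www.data.go.kr/data/${datasetId}/fileData.do`);
  if (!page.ok) throw new Error(`dataset page HTTP ${page.status}`);
  const ids = extractFileIds(await page.text());
  if (!ids) throw new Error("atchFileId/fileDetailSn not found; set the explicit *_FILE_URL fallback");
  return { ...ids, url: fileDownloadUrl(ids) };
}
```

Returning which pattern matched makes it easier to notice when the page markup drifts toward a different shape.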

Wave 2 file endpoints are now configured for dataset-id mode by default:

```text
HIRA_CANCER_BURDEN_DATASET_ID    = "15072788"   (auto-resolves)
HIRA_DISEASE_PROVIDER_DATASET_ID = "15089614"   (auto-resolves)
HIRA_CANCER_BURDEN_FILE_URL      = ""           (manual fallback if scrape breaks)
HIRA_DISEASE_PROVIDER_FILE_URL   = ""           (manual fallback if scrape breaks)
```

The proxy response includes `mode: "auto-resolved" | "explicit-url"` and the resolved `atchFileId/fileDetailSn` for transparency.
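The resolution order can be sketched like this, assuming the `config.datasetIdEnv` / `config.fileEnv` names above and an injected `scrape` function (hypothetical) that resolves to `{ atchFileId, fileDetailSn, url }` or throws. When neither env value resolves, the sketch throws instead of returning an empty result.

```javascript
// Dataset-id mode first, explicit URL as fallback. `scrape` is injected so the
// order can be shown without network access.
async function resolveFileSource(env, config, scrape) {
  const datasetId = (env[config.datasetIdEnv] || "").trim();
  const explicitUrl = (env[config.fileEnv] || "").trim();
  let scrapeError = null;
  if (datasetId) {
    try {
      const ids = await scrape(datasetId);
      return { mode: "auto-resolved", datasetId, ...ids };
    } catch (err) {
      scrapeError = err.message; // keep the reason for the response payload
    }
  }
  if (explicitUrl) return { mode: "explicit-url", url: explicitUrl, scrapeError };
  throw new Error(
    `Neither ${config.datasetIdEnv} nor ${config.fileEnv} resolved a file` +
      (scrapeError ? ` (${scrapeError})` : ""),
  );
}
```

Carrying `scrapeError` into an `explicit-url` response shows the operator that the fallback was used because scraping broke, not because no dataset id was set.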

### Additional new-incidence proxy indicators (sources.json)

Five more channels were registered as proxy candidates for newly diagnosed lung cancer patients beyond the official NHIS special-case registration trend:

| ID | Type | Proxy meaning |
|---|---|---|
| `hira_drug_prescription_stats` | HIRA prescription stats | First-line targeted therapy initiation (EGFR/ALK TKIs etc.) ≈ new patient flow |
| `cris_lung_cancer_trials` | CRIS clinical trial registry | Active enrollment volume = academic-hospital lung cancer activity |
| `mohw_hospice_palliative_registry` | MoHW hospice statistics | End-of-life lung cancer flow (downstream survivorship/care) |
| `mohw_cancer_patient_medical_aid_program` | MoHW medical-aid enrollment | New low-income lung cancer applicants (independent incidence proxy) |
| `kepri_pm25_air_quality_openapi` | Air Korea PM2.5 OpenAPI | Long-term PM2.5 exposure = lung cancer risk driver overlay |

As of Wave 2.1 these channels are not yet wired to dashboard endpoints. Wave 3 candidates: pick the highest-value one or two (likely `hira_drug_prescription_stats` as a direct incidence proxy and `kepri_pm25_air_quality_openapi` as a risk overlay) and add proxy endpoints and cards.

## 2.4 Wave 3 — HIRA Anticancer Prescription Card (2026-05-15)

The strongest proxy for newly diagnosed lung cancer patients, first-line treatment initiation (EGFR/ALK/ROS1 TKIs and platinum doublets), is now wired as the eighth treatment-environment card.

```text
functions/api/hira/drug-prescription.js   HIRA drug prescription statistics file proxy.
```

`wrangler.toml` adds `HIRA_DRUG_PRESCRIPTION_DATASET_ID` and `HIRA_DRUG_PRESCRIPTION_FILE_URL`. Both are empty in source: the operator must search data.go.kr for the exact prescription dataset (e.g. quarterly antineoplastic prescription statistics) and register its dataset id. Once the id is set, auto-resolve handles the rotating `atchFileId`.

The card is marked with ★ in the dashboard to highlight its proxy strength versus the other treatment-environment indicators.

## 2.5 Wave 4 — Co-prescription Indirect Indicators (2026-05-15)

Operator insight: lung cancer chemotherapy is almost always co-prescribed with supportive medications and procedures (antiemetics, G-CSF, central venous catheters, bone-metastasis agents). Tracking these gives an indirect new-incidence proxy independent of the NHIS special-case registration release cycle.

```text
functions/api/hira/co-rx-antiemetic.js     5-HT3 + NK1 antagonist (★★★ chemo cycle 1 proxy)
functions/api/hira/co-rx-gcsf.js           Pegfilgrastim/filgrastim (★★★ cycle volume proxy)
functions/api/hira/co-rx-port.js           Central venous catheter procedure (★★★ IV chemo init)
functions/api/hira/co-rx-bone-mets.js      Zoledronic acid/denosumab (★★ stage IV proxy)
docs/indirect-indicator-methodology.md     Statistical framework, weights, limits
```

Each endpoint reuses `file-proxy.js` with dataset-id auto-resolve. The operator must register the matching data.go.kr dataset id in the Pages env (`HIRA_ANTIEMETIC_DATASET_ID`, etc.).

A new `#co-rx` panel groups the four cards and adds a Composite Proxy Index (CPI) note. Suggested weights (operator-tunable):

```text
CPI = 0.35*antiemetic + 0.30*port + 0.20*gcsf + 0.15*bone_mets
```
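The formula does not say how the four components are put on a common scale. A minimal sketch, assuming each component is first indexed to its own baseline period (baseline = 100) so that counts of very different magnitudes can be weighted together; the component keys mirror the formula and `compositeProxyIndex` is a hypothetical helper.

```javascript
// Suggested weights from the formula above; operator-tunable, summing to 1.0.
const CPI_WEIGHTS = { antiemetic: 0.35, port: 0.3, gcsf: 0.2, bone_mets: 0.15 };

// Assumption: each component is indexed to its own baseline period (= 100).
function indexToBaseline(value, baseline) {
  if (!(baseline > 0)) throw new Error("baseline must be positive");
  return (100 * value) / baseline;
}

// Weighted sum of the indexed components; throws if any component is missing.
function compositeProxyIndex(current, baseline, weights = CPI_WEIGHTS) {
  let cpi = 0;
  for (const [name, weight] of Object.entries(weights)) {
    if (current[name] == null || baseline[name] == null) {
      throw new Error(`missing component: ${name}`);
    }
    cpi += weight * indexToBaseline(current[name], baseline[name]);
  }
  return cpi;
}
```

With every component at its baseline the index is 100; doubling antiemetic volume alone moves it to 135 (0.35 × 200 + 0.65 × 100). The result remains a proxy indicator, not a patient count.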

Joining CPI with the Wave 2 4-digit-disease × provider-type stats yields a hospital-class intensity ranking proxy for lung cancer treatment activity.

Three additional source groups were registered for future waves: HIRA DUR co-prescription metadata, the MFDS KAERS adverse-event registry, and references to the four co-prescription datasets.

Remaining Wave 4+ candidates (deferred until requested):

- Premedication for taxanes (diphenhydramine + famotidine + dexamethasone — ★★★).
- EGFR-TKI side-effect drugs (loperamide, doxycycline — ★★).
- Pleurodesis / bronchoscopy device codes (★).
- CRIS lung cancer clinical trial endpoint + card.
- Air Korea PM2.5 endpoint + map overlay (regional risk driver).
- MoHW hospice/medical-aid program endpoints + cards.
- NHIS screening detail/code wired into the search card as a "facility profile" expansion.
- KOSIS cancer registry / cause-of-death OpenAPI fetcher into `data/raw/annual_incidence_lung_cancer.json` and `mortality_lung_cancer.json`.
- Persistent caching of OpenAPI/file responses (Cloudflare KV or commit-back via GitHub Actions).

## 3. Phase A Updates In The Current Working Tree

The current uncommitted work contains a Phase A Executive Dashboard expansion.

Important files changed or added:

```text
index.html
src/app.js
src/styles.css
scripts/build-data.mjs
scripts/fetch-cancer-stats.mjs
data/sources.json
data/processed/lung-cancer-trend.json
data/raw/annual_incidence_lung_cancer_skeleton.json
data/raw/regional_lung_cancer_skeleton.json
data/raw/mortality_lung_cancer_skeleton.json
data/raw/screening_capacity_skeleton.json
docs/source-expansion-phase-a.md
```

`scripts/build-data.mjs` now writes schema v2 with:

```text
freshness
alerts
sparkline
heatmap
annualIncidence
regional
mortality
screening
years
monthly
```
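A small guard that a validation step could apply to `data/processed/lung-cancer-trend.json`, checking only that the ten top-level schema v2 sections listed above are present. This is a sketch; whether `npm run validate` checks these keys, and how it reports failures, is not documented here.

```javascript
// The ten top-level sections listed above for schema v2.
const SCHEMA_V2_KEYS = [
  "freshness", "alerts", "sparkline", "heatmap", "annualIncidence",
  "regional", "mortality", "screening", "years", "monthly",
];

// Returns the schema v2 sections missing from a parsed lung-cancer-trend.json.
function missingSchemaV2Keys(doc) {
  if (doc === null || typeof doc !== "object" || Array.isArray(doc)) {
    throw new TypeError("expected a JSON object");
  }
  return SCHEMA_V2_KEYS.filter((key) => !Object.prototype.hasOwnProperty.call(doc, key));
}
```

For example, `missingSchemaV2Keys(JSON.parse(fs.readFileSync("data/processed/lung-cancer-trend.json", "utf8")))` returns an empty array when every section is present.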

`scripts/fetch-cancer-stats.mjs` is intentionally conservative. It creates official-data templates and capture guidance. It does not silently create values.

## 4. Source Expansion

`data/sources.json` now includes the original HIRA/NHIS hospital and trend sources plus extra 2025/2026 monitoring channels:

- NHIS screening facility replacement APIs: `15154419`, `15154392`, `15154423`.
- HIRA cancer disease medical cost statistics.
- HIRA 4-digit disease by provider-type statistics.
- NHIS cancer disease and antineoplastic ingredient treatment information.
- NHIS cancer patient major treatment action statistics.
- MFDS drug product approval API.
- HIRA drug ingredient and efficacy information API.

These support patient-environment and treatment-environment monitoring. They do not replace the customized-data route for hospital-level new-patient counts.

## 5. Data Files

Current raw official input:

```text
data/raw/nhis_new_cancer_by_disease_20251130.csv
```

Skeletons waiting for official values:

```text
data/raw/annual_incidence_lung_cancer_skeleton.json
data/raw/regional_lung_cancer_skeleton.json
data/raw/mortality_lung_cancer_skeleton.json
data/raw/screening_capacity_skeleton.json
```

Processed dashboard data:

```text
data/processed/lung-cancer-trend.json
data/processed/source-status.json
data/processed/hira-cache/manifest.json
```

When official values are available, create non-skeleton files:

```text
data/raw/annual_incidence_lung_cancer.json
data/raw/regional_lung_cancer.json
data/raw/mortality_lung_cancer.json
data/raw/screening_capacity.json
```

Then set each row's `status` to `official` and run `npm run build:data`.
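How the build might choose between the skeleton and the official file, and derive per-source labels like those in the build summary (`annual=partial`, `regional=pending`), can be sketched as below. The file-preference rule and the `coverageLabel` thresholds are assumptions; in particular the label for fully official data (`"official"` here) is hypothetical.

```javascript
// Prefer the official (non-skeleton) file once it exists.
function pickRawFile(baseName, existingPaths) {
  const official = `data/raw/${baseName}.json`;
  return existingPaths.has(official) ? official : `data/raw/${baseName}_skeleton.json`;
}

// Hypothetical labels: "pending" with no official rows, "partial" with some,
// and (assumed) "official" when every row is official.
function coverageLabel(rows) {
  const officialRows = rows.filter((row) => row.status === "official").length;
  if (officialRows === 0) return "pending";
  return officialRows === rows.length ? "official" : "partial";
}
```

In a real script, `existingPaths` could be built with `fs.existsSync` over the four expected paths.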

## 6. API And Secrets

Cloudflare Pages must have:

```text
HIRA_SERVICE_KEY
```

GitHub Actions should also have the same secret:

```text
Repository > Settings > Secrets and variables > Actions > New repository secret
Name: HIRA_SERVICE_KEY
```

Do not commit the key.

Implemented Cloudflare Pages Functions:

```text
/api/hira/hospitals
/api/hira/medical-detail
/api/hira/clinic-top5
/api/hira/pharmacies
/api/diagnostics/upstream
```

Source files:

```text
functions/_shared/hira-proxy.js
functions/api/hira/hospitals.js
functions/api/hira/medical-detail.js
functions/api/hira/clinic-top5.js
functions/api/hira/pharmacies.js
functions/api/diagnostics/upstream.js
```

## 7. Known API Behavior

Cloudflare-to-`apis.data.go.kr` calls can time out even when invalid-key probes return quickly. The app therefore supports a cache fallback for HIRA public API results.

Cache script:

```text
scripts/fetch-hira-cache.mjs
```

Output folder:

```text
data/processed/hira-cache/
```

Run locally:

```powershell
cd D:\Projects\AZ
$env:HIRA_SERVICE_KEY="YOUR_KEY_HERE"
npm run fetch:hira-cache
```
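A sketch of the timeout-then-cache pattern described at the start of this section, assuming an `AbortController` timeout on the live call and a caller-supplied `loadCache` (hypothetical) that reads a snapshot from `data/processed/hira-cache/` and resolves to `null` when none exists. The 8-second default timeout is an arbitrary placeholder.

```javascript
// Live call with a timeout, then a cached snapshot as fallback.
async function fetchWithCacheFallback(url, loadCache, { timeoutMs = 8000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`upstream HTTP ${res.status}`);
    return { source: "live", data: await res.json() };
  } catch (err) {
    const cached = await loadCache();
    if (cached == null) throw err; // nothing to fall back to
    return { source: "cache", data: cached, reason: err.message };
  } finally {
    clearTimeout(timer);
  }
}
```

Returning `source` lets the caller show whether a result came from the live API or from the cache.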

## 8. Local Commands

```powershell
cd D:\Projects\AZ
npm install
npm run build:data
npm run check:sources
npm run validate
npm run start
```

Local static URL:

```text
http://localhost:4173
```

Note: the local static server does not execute Cloudflare Pages Functions. Test live HIRA proxy behavior on the deployed Pages URL or with a Pages-compatible runtime.

## 9. Current Verification Snapshot

Recent local checks before this handover:

```text
npm run build:data
npm run validate
```

Expected build summary:

```text
Schema v2 | freshness=stale | alerts=1 | annual=partial | regional=pending | mortality=pending | screening=pending
```

`freshness=stale` is expected because the public NHIS file stops at 2025-11 while the current date is 2026-05-15.
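The lag behind `freshness=stale` is simple month arithmetic: 2025-11 to 2026-05 is (2026 - 2025) × 12 + (5 - 11) = 6 months. A sketch of the check, assuming a hypothetical three-month staleness threshold and a hypothetical `"fresh"` label; the real cut-off in `scripts/build-data.mjs` is not documented here.

```javascript
// Whole months between the last covered month ("YYYY-MM") and a reference date.
function monthsBehind(lastCovered, today) {
  const [year, month] = lastCovered.split("-").map(Number);
  return (today.getUTCFullYear() - year) * 12 + (today.getUTCMonth() + 1 - month);
}

// Hypothetical threshold and labels; not taken from build-data.mjs.
const STALE_AFTER_MONTHS = 3;

function freshnessLabel(lastCovered, today, staleAfter = STALE_AFTER_MONTHS) {
  return monthsBehind(lastCovered, today) > staleAfter ? "stale" : "fresh";
}
```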

## 10. Hospital-Level Methodology

Recommended HIRA customized-data definition:

```text
Disease: ICD-10 C33/C34
Analysis period: target year/months
Index date: first C33/C34 claim in analysis period
Wash-out: no C33/C34 claim in prior 5 years
Aggregation: approved institution or institution group by year/month
```

Recommended output layer:

```text
year
month
institution_id_or_group
provider_type
region
new_lung_cancer_visit_patients
new_outpatient_patients
new_inpatient_patients
new_anti_cancer_treatment_start_patients
radiation_treatment_patients
surgery_patients
source_status
```
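To make the definition concrete, here is a sketch of the index-date and wash-out logic plus the year/month/institution aggregation, as it might run inside an approved analysis environment on customized claims. The claim fields (`patientId`, `institution`, `date` as `YYYY-MM-DD`, `icd`) are hypothetical; the real HIRA customized-data layout will differ, and the wash-out is only valid if the extract covers the five years before the analysis period.

```javascript
const LUNG_PREFIXES = ["C33", "C34"];
const WASHOUT_YEARS = 5;

const isLungClaim = (claim) =>
  LUNG_PREFIXES.some((prefix) => claim.icd.toUpperCase().startsWith(prefix));

// ISO dates (YYYY-MM-DD) compare correctly as strings, so the wash-out start
// is the index date with its year shifted back.
function shiftYears(isoDate, years) {
  return String(Number(isoDate.slice(0, 4)) + years).padStart(4, "0") + isoDate.slice(4);
}

// Index claim = first C33/C34 claim in the period; keep it only if the patient
// has no C33/C34 claim in the WASHOUT_YEARS before the index date.
function newPatientIndexClaims(claims, periodStart, periodEnd) {
  const byPatient = new Map();
  for (const claim of claims) {
    if (!isLungClaim(claim)) continue;
    if (!byPatient.has(claim.patientId)) byPatient.set(claim.patientId, []);
    byPatient.get(claim.patientId).push(claim);
  }
  const indexClaims = [];
  for (const list of byPatient.values()) {
    list.sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
    const index = list.find((c) => c.date >= periodStart && c.date <= periodEnd);
    if (!index) continue;
    const washoutStart = shiftYears(index.date, -WASHOUT_YEARS);
    const hasPriorClaim = list.some((c) => c.date >= washoutStart && c.date < index.date);
    if (!hasPriorClaim) indexClaims.push(index);
  }
  return indexClaims;
}

// Aggregates index claims into the year / month / institution output layer.
function countNewPatients(indexClaims) {
  const rows = new Map();
  for (const c of indexClaims) {
    const year = Number(c.date.slice(0, 4));
    const month = Number(c.date.slice(5, 7));
    const key = `${year}|${month}|${c.institution}`;
    if (!rows.has(key)) {
      rows.set(key, { year, month, institution_id_or_group: c.institution, new_lung_cancer_visit_patients: 0 });
    }
    rows.get(key).new_lung_cancer_visit_patients += 1;
  }
  return [...rows.values()];
}
```

Only `new_lung_cancer_visit_patients` is derived here; the outpatient/inpatient and treatment-start columns would need additional claim attributes from the approved extract.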

## 11. Next Work

1. Add automated watcher for the NHIS new-cancer public file update.
2. Add KOSIS fetcher for annual incidence and mortality once the KOSIS key/table parameters are confirmed.
3. Add NHIS screening facility replacement API fetcher.
4. Add treatment-environment panel using MFDS/HIRA drug master and NHIS antineoplastic ingredient statistics.
5. Add a custom-data import path under `data/raw/custom/` after HIRA/NHIS approvals.
6. Re-run `npm run check:sources`, `npm run build:data`, and `npm run validate`, then commit and push.

## 12. Guardrails

- Never label HIRA public hospital API output as hospital-level lung cancer new-patient counts.
- Never fill pending 2026 or hospital-level values with estimates unless the UI labels them explicitly as scenario/model output.
- Keep public counts, custom-data counts, and proxy indicators visually separate.
- Keep API keys server-side only.
