Modeling Micronutrient Intake in India
1. Introduction
This document explains how we model and predict household-level micronutrient intake, in both absolute and energy-adjusted forms, using data from the Household Consumer Expenditure Survey (HCES) 2011–12 and 2023–24. Intake is benchmarked against ICMR-NIN age-sex-specific nutrient requirements for each household member. The framework uses semi-parametric Generalized Additive Models (GAMs)1 within a two-stage hurdle structure2 to quantify how much of each essential micronutrient Indian households obtain from their diets, how intake varies across income levels and demographic groups, and what share of the population falls below nutritional requirements. The prevalence of inadequacy is computed at the person level using demographic-specific EAR/RDA values, then aggregated to the household for modeling.
1.1 What Do We Estimate?
The analysis produces four families of estimates for each micronutrient, each available in both absolute (unadjusted) and energy-adjusted variants:
Probability of positive intake: What share of the population obtains any amount of a given micronutrient from their diet?
Conditional intake level: Among households with positive intake, how much of the micronutrient (in the relevant unit per day) is consumed per Adult Female Equivalent (AFE)?
Unconditional expected intake: Combining the two above — the population-average daily intake including zero-intake households — across expenditure deciles and demographic groups.
Prevalence of inadequacy: What fraction of the population has usual intake below their individual nutrient requirement, computed using person-specific EAR and RDA values derived from each household member's age-sex profile?
The absolute view shows what households actually consume; the energy-adjusted view scales intake to each household's caloric requirement (holding diet composition constant), isolating whether the diet is compositionally adequate versus simply reflecting total food quantity. Both views are produced for intake and prevalence of inadequacy. The Shannon Diversity Index, being inherently scale-free, requires no energy adjustment.
In addition, two supplementary analyses are run for each micronutrient: intake excluding cereals (to assess dietary source diversity) and the Shannon Diversity Index of food sources contributing to micronutrient intake.
1.2 Micronutrients Analyzed
The analysis covers 9 micronutrients, benchmarked against ICMR-NIN dietary requirements3 for an adult woman (55 kg, moderately active, non-pregnant and non-lactating):
| Micronutrient | Unit | Description |
|---|---|---|
| Iron | mg/day | Essential for oxygen transport; deficiency causes anaemia |
| Folate | µg/day | Critical for cell division and neural tube development |
| Zinc | mg/day | Supports immune function and wound healing |
| Vitamin B1 (Thiamine) | mg/day | Required for carbohydrate metabolism |
| Vitamin B2 (Riboflavin) | mg/day | Involved in energy production and cell function |
| Vitamin B3 (Niacin) | mg/day | Supports metabolism and DNA repair |
| Vitamin B6 | mg/day | Required for amino acid metabolism |
| Vitamin C | mg/day | Antioxidant; supports immune function |
| Calcium | mg/day | Essential for bone health |
Inadequacy prevalence is computed for all micronutrients that have defined EAR values in the ICMR-NIN reference tables. While the dashboard displays benchmarks for a reference adult woman (55 kg, moderately active, non-pregnant and non-lactating), the underlying prevalence calculation uses person-specific EAR and RDA values for each household member (Section 5).
| Micronutrient | EAR | RDA | Units |
|---|---|---|---|
| Iron | 15 | 29 | mg |
| Vitamin B9 (Folate) | 180 | 220 | µg |
| Zinc | 11 | 13.2 | mg |
| Vitamin B1 | 1.4 | 1.7 | mg |
| Vitamin B2 | 2 | 2.4 | mg |
| Vitamin B3 | 12 | 14 | mg |
| Vitamin B6 | 1.6 | 1.9 | mg |
| Vitamin C | 55 | 65 | mg |
| Calcium | 800 | 1000 | mg |
Source: ICMR-NIN Recommended Dietary Allowances for Indians (2024). Reference values shown are for an adult woman, 55 kg, moderately active, and non-lactating. The prevalence of inadequacy analysis (Section 5) uses person-specific EAR and RDA values derived from each household member's age-sex profile.
2. Data Construction
2.1 From Food Quantities to Micronutrient Intake
Micronutrient intake is not directly observed in the HCES. Instead, it is derived by combining three data sources:
- Household food quantities from the HCES (amount of each food item consumed over the reference period)
- Food composition tables4 that map food items to micronutrient content per unit weight
- Reference period data to convert consumption from weekly/monthly reporting windows to daily values
For each household $i$, food item $j$, and micronutrient $m$, the daily household-level micronutrient intake is:

$$Q_{ijm} = \frac{q_{ij}\, c_{jm}}{T_j}$$

where $q_{ij}$ is the quantity of food $j$ consumed by household $i$ over the reference period, $c_{jm}$ is the micronutrient content of food $j$ per unit weight, and $T_j$ is the reference period in days for food item $j$. Micronutrient content values are drawn from the Indian Food Composition Tables (IFCT) published by ICMR-NIN4, supplemented by Vijayakumar et al.5.

Total daily household intake of micronutrient $m$ sums across all food sources, grouped into 10 food categories (cereals & millets, green leafy vegetables, other vegetables, roots & tubers (excluding potatoes), fruits, milk & milk products, fats & oils, oilseeds & nuts, pulses & beans, and flesh foods (eggs/fish/meat)):

$$Q_{im} = \sum_{j} Q_{ijm}$$
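The construction described in this section can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the project's pipeline: the data frames `food` and `comp`, their column names, and the iron contents are hypothetical placeholders (they are not IFCT values), and composition is assumed to be expressed per 100 g.

```r
# Hypothetical toy inputs: one row per household x food item.
food <- data.frame(
  hhid     = c(1, 1, 2),
  item     = c("rice", "spinach", "rice"),
  qty_g    = c(7000, 600, 30000),  # grams consumed over the reference period
  ref_days = c(7, 7, 30)           # reference period T_j in days
)

# Hypothetical iron content per 100 g (placeholder numbers, not IFCT figures).
comp <- data.frame(
  item             = c("rice", "spinach"),
  iron_mg_per_100g = c(0.7, 2.7)
)

# Item-level daily intake: quantity x content / reference period.
m <- merge(food, comp, by = "item")
m$iron_mg_day <- m$qty_g / 100 * m$iron_mg_per_100g / m$ref_days

# Household total: sum over all food items.
hh_iron <- aggregate(iron_mg_day ~ hhid, data = m, FUN = sum)
hh_iron
```

Dividing by the item-specific reference period before summing matters because items reported over 7-day and 30-day windows would otherwise be added on different time scales.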
2.2 Adult Female Equivalent (AFE) Scaling
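To make intake comparable across households of different sizes and demographic compositions, we express intake per AFE using energy-based equivalence scales:

$$\text{Intake per AFE}_{im} = \frac{Q_{im}}{\text{HH Size(AFE)}_i}$$

where $Q_{im}$ is the total daily household intake of micronutrient $m$ and $\text{HH Size(AFE)}_i$ is the sum of the AFE scale factors of all members of household $i$.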
The energy-based AFE scale converts each household member’s energy requirement to the equivalent number of adult females, based on ICMR-NIN age-sex-specific energy requirements.3 This produces a per-person measure that accounts for the varying nutritional needs within a household.
The energy requirements and corresponding AFE scale factors used in this analysis are shown in Table 1. The reference category is the adult woman (moderate activity), whose requirement of 2,130 kcal/day defines one AFE unit.
| Demographic Profile | Activity Level | Energy Requirement (kcal/day) | AFE Scale Factor |
|---|---|---|---|
| Child 0 to 12 mo | — | 595 | 0.28 |
| Child 1 to 3 yrs | — | 1,110 | 0.52 |
| Child 4 to 6 yrs | — | 1,360 | 0.64 |
| Child 7 to 9 yrs | — | 1,700 | 0.80 |
| Girls 10 to 12 yrs | — | 2,060 | 0.97 |
| Boys 10 to 12 yrs | — | 2,220 | 1.04 |
| Girls 13 to 15 yrs | — | 2,400 | 1.13 |
| Boys 13 to 15 yrs | — | 2,860 | 1.34 |
| Girls 16 to 18 yrs | — | 2,500 | 1.17 |
| Boys 16 to 18 yrs | — | 3,320 | 1.56 |
| Adult women (reference) | Moderate | 2,130 | 1.00 |
| Adult women (lactating) | Moderate | 2,690 | 1.26 |
| Adult men | Moderate | 2,710 | 1.27 |
For example, a household comprising one adult man, one adult woman, and one child aged 4–6 would have an AFE household size of 1.27 + 1.00 + 0.64 = 2.91, rather than a simple headcount of 3. Dividing total household intake by this AFE size yields a per-AFE measure that is comparable across households of different demographic compositions.
Implementation Details
The AFE scale is constructed at the individual level from the HCES person-level roster, which records each household member's age and gender. The assignment proceeds in three steps:
Step 1 — Lactation status imputation. The HCES does not directly record lactation status. We proxy it using the presence of a child under 12 months in the household. If at least one child aged <1 is present, the youngest woman aged 19–49 in the household is classified as lactating and assigned the higher energy requirement of 2,690 kcal/day (= 2,130 base + 560 lactation increment, where 560 is the average of the ICMR-NIN increments of 600 kcal for 0–6 months and 520 kcal for 6–12 months postpartum). If there are n children under 1, then up to n women (ordered youngest-first) are assigned lactating status. Remaining adult women, as well as all women aged 50 and above, receive the non-pregnant, non-lactating requirement of 2,130 kcal/day.
Step 2 — Energy requirement assignment. Each household member is mapped to an age-sex-specific energy requirement from Table 1 using the following rules: children are classified solely by age (the infant category uses the average of the ICMR 0–6 month and 6–12 month values); adolescents aged 10–18 are further stratified by gender; and all adults are assigned the moderate activity level. Adult men (and transgender individuals coded as gender = 3) receive the adult male requirement of 2,710 kcal/day regardless of age.
Step 3 — AFE conversion and household aggregation. Each member's AFE scale factor is computed as AFEk = Ereqk / 2,130. The household AFE size is then the sum across all members: HH Size(AFE)i = ∑k AFEk. An individual's within-household energy share is sharek = AFEk / HH Size(AFE)i, which is used in subsequent micronutrient allocation steps.
The R implementation is shown below:
```r
library(dplyr)  # requires dplyr >= 1.1.0 for `.by` and `.default`

HCES2023_AFE_energy <- HCES2023_level02 %>%
  dplyr::select(hhid, person_sno, gender, age) %>%
  # Sort so that women within a household are ordered youngest-first
  arrange(hhid, gender, age) %>%
  # Flag infants, then count them per household
  mutate(under_1_a = ifelse(age < 1, 1, 0)) %>%
  group_by(hhid) %>%
  mutate(
    child_under_1   = max(under_1_a),
    n_child_under_1 = sum(under_1_a)
  ) %>%
  ungroup() %>%
  # Candidates for imputed lactation status: women aged 19-49
  # (the variable name says 18_50, but the rule below is 19 <= age < 50)
  mutate(
    women_18_50 = dplyr::case_when(
      (age >= 19 & age < 50) & gender == "2" ~ 1,
      .default = 0
    )
  ) %>%
  # seq ranks candidates youngest-first within each household
  group_by(hhid, women_18_50) %>%
  mutate(seq = row_number()) %>%
  ungroup() %>%
  # Step 2: age-sex-specific energy requirement (kcal/day) from Table 1
  mutate(
    energy_requirement = dplyr::case_when(
      age < 1 ~ 595,
      age >= 1 & age <= 3 ~ 1110,
      age >= 4 & age <= 6 ~ 1360,
      age >= 7 & age <= 9 ~ 1700,
      (age >= 10 & age <= 12) & (gender %in% c("1", "3")) ~ 2220,
      (age >= 10 & age <= 12) & (gender == "2") ~ 2060,
      (age >= 13 & age <= 15) & (gender %in% c("1", "3")) ~ 2860,
      (age >= 13 & age <= 15) & (gender == "2") ~ 2400,
      (age >= 16 & age <= 18) & (gender %in% c("1", "3")) ~ 3320,
      (age >= 16 & age <= 18) & (gender == "2") ~ 2500,
      (age >= 19) & (gender %in% c("1", "3")) ~ 2710,
      (age >= 19) & (gender == "2") & (child_under_1 == 0) ~ 2130,
      # Step 1: the youngest n women aged 19-49 are treated as lactating
      (age >= 19 & age < 50) & (gender == "2") &
        (child_under_1 == 1) & (seq <= n_child_under_1) ~ 2690,
      (age >= 19 & age < 50) & (gender == "2") &
        (child_under_1 == 1) & (seq > n_child_under_1) ~ 2130,
      (age >= 50) & (gender == "2") ~ 2130
    )
  ) %>%
  # Step 3: AFE factor and within-household energy share
  mutate(
    AFE_energy   = energy_requirement / 2130,
    share_energy = AFE_energy / sum(AFE_energy),
    .by = hhid
  )
```
2.3 Intake Without Cereals
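For each micronutrient, a parallel variable is constructed that excludes the cereal contribution:

$$Q^{\text{nc}}_{im} = Q_{im} - Q^{\text{cereals}}_{im}$$

where $Q_{im}$ is total daily household intake of micronutrient $m$ and $Q^{\text{cereals}}_{im}$ is the part contributed by the cereals & millets category.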
This decomposition is important because cereals dominate Indian diets and can mask micronutrient source diversity. For iron, for example, a high total intake may reflect heavy cereal consumption (with low bioavailability) rather than diverse dietary sources.
2.4 Shannon Diversity Index of Food Sources
To capture the diversity of dietary sources contributing to micronutrient intake, we compute the Shannon Diversity Index6 across the 10 food categories:

$$H_i = -\sum_{j=1}^{10} p_{ij} \ln p_{ij}$$

where $p_{ij} = Q_{ijm} / \sum_k Q_{ikm}$ is the share of micronutrient $m$ that household $i$ derives from food category $j$. Zero-share categories are excluded from the sum. $H_i = 0$ when all intake comes from a single food group, and $H_i = \ln(10) \approx 2.30$ when intake is equally distributed across all 10 categories.
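As a small illustration of the index and its two boundary cases, here is a base-R sketch. The function name `shannon_index` is ours, and the input is assumed to be a non-negative vector of one household's intake by food category with at least one positive entry.

```r
# Shannon diversity of micronutrient sources for one household.
# q: non-negative intake contributed by each food category.
shannon_index <- function(q) {
  p <- q / sum(q)   # shares across categories
  p <- p[p > 0]     # zero-share categories are dropped from the sum
  -sum(p * log(p))
}

h_single <- shannon_index(c(5, rep(0, 9)))  # all intake from one category
h_even   <- shannon_index(rep(1, 10))       # equal shares across 10 categories
c(single = h_single, even = h_even, ln10 = log(10))
```

A household with zero total intake would produce `NaN` here, so such households need separate handling before the index is computed.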
2.5 Consumption Indicators and Sample Restrictions
The analysis is restricted to households with cooking arrangements (excluding those coded as “no cooking”), since micronutrient intake from purchased/consumed food is meaningful only for households that prepare meals. A binary consumption indicator is defined for each model variant:

$$D_i = \mathbb{1}[Y_i > 0]$$

where $Y_i$ is the outcome of the variant in question for household $i$.
2.6 Energy Adjustment
Absolute micronutrient intake is strongly correlated with total food consumption: households that eat more food mechanically obtain more of every nutrient. To disentangle diet composition from diet quantity, we apply a nutrient density scaling approach13,14 that projects each household's intake onto its caloric requirement.
Scaling factor
For each household $i$, we compute a scaling factor as the ratio of the household's total energy requirement (summed across all members using ICMR-NIN age-sex-specific requirements) to its reported total caloric intake:

$$SF_i = \frac{\sum_k E^{\text{req}}_k}{E^{\text{reported}}_i}$$

where $E^{\text{req}}_k$ is the energy requirement of member $k$ and $E^{\text{reported}}_i$ is the household's total reported caloric intake. A scaling factor greater than 1 indicates the household reports consuming fewer calories than its members require (the common case, consistent with well-documented energy under-reporting); a scaling factor less than 1 indicates over-reporting.
Energy-adjusted intake
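The energy-adjusted household micronutrient intake is then:

$$Q^{\text{adj}}_{im} = SF_i \times Q_{im}$$

where $SF_i$ is the household's scaling factor and $Q_{im}$ is its reported daily intake of micronutrient $m$.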
This is mathematically equivalent to computing the nutrient density of the diet (micronutrient per calorie) and multiplying by the household's caloric requirement. The counterfactual it answers is: if this household consumed exactly its required calories while maintaining the same diet composition, how much of each micronutrient would it obtain?
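The equivalence between the scaling-factor form and the nutrient-density form can be checked numerically. The sketch below uses the Table 1 energy requirements for an adult man, an adult woman, and a child aged 4–6; the reported calories and iron intake are made-up illustrative values, and all variable names are ours.

```r
# Energy requirements (kcal/day) for an adult man, adult woman, child 4-6 (Table 1)
e_req      <- c(2710, 2130, 1360)
e_reported <- 5500   # hypothetical reported household calories per day
iron_hh    <- 30     # hypothetical household iron intake, mg/day

# Scaling-factor form: required energy / reported energy
sf       <- sum(e_req) / e_reported
iron_adj <- sf * iron_hh

# Nutrient-density form: (mg per kcal) x required kcal
density      <- iron_hh / e_reported
iron_adj_alt <- density * sum(e_req)

c(sf = sf, iron_adj = iron_adj, iron_adj_alt = iron_adj_alt)
```

Here the household reports fewer calories than it requires, so the scaling factor exceeds 1 and the adjusted intake is larger than the reported intake.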
Interpretation
The dashboard presents both views as complementary lenses for policy:
- Absolute (unadjusted): What households actually consume. This is the view relevant for assessing whether nutrient needs are being met in practice.
- Energy-adjusted: What households would consume at their required caloric intake, holding diet composition constant. This isolates whether shortfalls reflect insufficient food quantity (an income/food security problem) or poor diet quality (a diversification/education problem).
3. Model Specification
3.1 Overview: Three Model Variants per Micronutrient
For each micronutrient, three parallel hurdle models are estimated:
| Variant | Outcome | Purpose |
|---|---|---|
| Main | Total intake per AFE (all food sources) | Primary intake estimate + inadequacy assessment |
| Without cereals | Total intake per AFE excluding cereals | Measures non-cereal dietary quality |
| Shannon | Shannon Diversity Index of food sources | Captures dietary source diversity |
Each variant consists of two sub-models (probability and quantity), producing six GAM models per micronutrient per survey round. The full model suite is estimated twice — once on energy-adjusted data and once on unadjusted data — with the logit participation model and Shannon models shared (since the sign of intake and proportional shares are invariant to energy scaling).
3.2 The Hurdle Model Framework
Each model variant follows the same two-stage hurdle structure.2,7 Let $Y_i$ denote the outcome for household $i$ (intake, intake-without-cereals, or Shannon index). The expected value is:

$$E[Y_i] = \Pr(Y_i > 0) \times E[Y_i \mid Y_i > 0]$$
The two components are estimated separately:
Part 1 — Probability model (logit sub-model): Estimated on all households using a quasi-binomial GAM with logit link.
Part 2 — Quantity model (positive sub-model): Estimated only on households with Yi > 0 using either a log-normal or Gamma(log) GAM. Model selection between the two families is based on AIC comparison.
3.3 GAM Specification
Both sub-models share the same semi-parametric GAM predictor structure1 (illustrated here for the main variant):
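Logit sub-model (probability of positive intake):

$$\operatorname{logit} \Pr(Y_i > 0) = \eta_{L,i}$$

Quantity sub-model (conditional mean for positive values):

$$g\big(E[Y_i \mid Y_i > 0]\big) = \eta_{Q,i}$$

Each linear predictor $\eta$ is the sum of an intercept and the smooth and random-effect terms listed in Section 3.3.1, where $x = \log(\text{MPCE}_{\text{real, AFE}})$ is log real monthly per capita expenditure in AFE terms, and $g(\cdot)$ is the log link for both log-normal and Gamma models.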
3.3.1 Smooth Terms and Random Effects
The predictor structure includes:
| Term | Type | Purpose |
|---|---|---|
| f(x) | Thin-plate spline | Baseline expenditure–intake curve |
| f(x, state) | Factor-smooth interaction (bs = "fs") | State-specific expenditure curves |
| f(x, child) | Factor-smooth interaction | Curves for households with/without children |
| f(x, female_headed) | Factor-smooth interaction | Curves by household head gender |
| f(x, social) | Factor-smooth interaction | Curves by social group (caste) |
| f(x, rel) | Factor-smooth interaction | Curves by religion |
| s(sector) | Random intercept (bs = "re") | Rural vs. urban shift |
| s(nss_region) | Random intercept | NSS region effect |
| s(reg_sector) | Random intercept | Region × sector interaction |
| s(social) | Random intercept | Social group intercept |
| s(rel) | Random intercept | Religion intercept |
| s(child) | Random intercept | Children-in-household intercept |
| s(female_headed) | Random intercept | Female-headed household intercept |
All factor-smooth interactions use m = 1 (first-order penalty), which allows group-specific curves to deviate from the population mean with a roughness penalty. This hierarchical structure provides adaptive regularisation: groups with limited data are shrunk toward the population curve, while data-rich groups are allowed to deviate more freely.8
3.4 Estimation
All models are estimated using mgcv::bam() with the following settings1:
| Parameter | Value | Rationale |
|---|---|---|
| method | "fREML" | Fast REML for large-sample smooth parameter estimation |
| discrete | TRUE | Discretized covariate method for datasets > 100,000 observations9 |
| gamma | 1.4 | Extra penalty on effective degrees of freedom to prevent overfitting |
| gc.level | 2 | Aggressive garbage collection to manage memory |
| select | TRUE | Allows smooth terms to be penalized to zero (automatic variable selection)10 |
Probability model: family = quasibinomial(link = "logit"). Quasi-likelihood allows for over/under-dispersion relative to the binomial.
Quantity model: Two candidate families are estimated and the one with lower AIC is selected:
- Gamma: family = Gamma(link = "log") with select = TRUE
- Log-normal: family = gaussian() applied to log(Y) (no select since the Gaussian family has fixed dispersion)
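To make the estimation settings concrete, the following sketch fits a reduced two-part model on simulated data with `mgcv::bam()`. It is an illustration under assumptions, not the production specification: the data are simulated, only one factor-smooth (`state`) and one random intercept (`sector`) are included, survey weights are omitted, and the helper `fit_bam()` is a hypothetical wrapper of ours. The Jacobian term added to the log-normal AIC (so that both AICs refer to the density of Y rather than log Y) is also our assumption; the text above does not say how the comparison is put on a common scale.

```r
library(mgcv)

set.seed(42)
n <- 3000
d <- data.frame(
  x      = rnorm(n, mean = 8, sd = 0.5),   # stand-in for log real MPCE
  state  = factor(sample(paste0("s", 1:4), n, replace = TRUE)),
  sector = factor(sample(c("rural", "urban"), n, replace = TRUE))
)
d$pos <- rbinom(n, 1, plogis(1.5 + 1.2 * (d$x - 8)))   # any intake?
mu    <- exp(1 + 0.6 * (d$x - 8))
d$y   <- d$pos * rgamma(n, shape = 4, scale = mu / 4)  # intake, zero if pos == 0

# Reduced predictor: global smooth + state factor-smooth + sector random intercept
rhs <- "~ s(x) + s(x, state, bs = 'fs', m = 1) + s(sector, bs = 're')"

# Hypothetical wrapper applying the settings from the table above
fit_bam <- function(lhs, family, data, select = TRUE) {
  bam(as.formula(paste(lhs, rhs)), family = family, data = data,
      method = "fREML", discrete = TRUE, gamma = 1.4, select = select)
}

fit_logit <- fit_bam("pos", quasibinomial(link = "logit"), d)
d_pos     <- d[d$y > 0, ]
fit_gamma <- fit_bam("y", Gamma(link = "log"), d_pos)
fit_lnorm <- fit_bam("log(y)", gaussian(), d_pos, select = FALSE)

# AIC on the scale of y; the log-normal fit needs the Jacobian of the log transform
aic_gamma <- AIC(fit_gamma)
aic_lnorm <- AIC(fit_lnorm) + 2 * sum(log(d_pos$y))

# Hurdle combination: E[Y] = Pr(Y > 0) * E[Y | Y > 0]
p_hat  <- predict(fit_logit, newdata = d, type = "response")
mu_hat <- predict(fit_gamma, newdata = d, type = "response")
ey_hat <- p_hat * mu_hat
```

Because the data are simulated from a Gamma distribution, the Gamma family would be expected to be preferred here; with real intake data either family may win.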
3.5 Survey Weights
To account for the complex survey design, observations are weighted using per-capita survey weights:
where wi is the original survey weight and the denominator normalizes by the mean of the numerator, producing weights that average to 1.
4. Prediction Grid Construction
4.1 The Geographic Standardization Problem
Raw group means (e.g., average iron intake by religion) confound the focal effect with geographic and demographic composition. A religious group concentrated in states with higher cereal consumption will mechanically show different intake patterns even if there is no causal effect of religion on intake.
The prediction grid isolates focal effects by constructing counterfactual populations where non-focal variables are held at a standardized distribution while the focal variable and expenditure vary naturally.
4.2 Grid Construction by Comparison Type
The grid construction depends on which grouping variable is focal:
4.2.1 State Comparisons (group_vars = c("nss", "state_code"))
Each state retains its actual geographic structure (NSS regions, rural/urban mix). Demographics are standardized to the national distribution.
- Geography: Keep actual state × region × sector structure with observed population weights
- Demographics: Crossed with national demographic distribution (social group, religion, children, household head gender)
- Weights: wgrid = wgeo × wdemo
4.2.2 NSS Region Comparisons (group_vars = c("nss", "nss_region"))
Two options are available. The standardized version gives all regions the same rural/urban mix; the unstandardized version keeps each region’s actual sector distribution.
- Standardized: National sector distribution applied uniformly; demographics standardized nationally
- Unstandardized: Keep actual region × sector mix; standardize only demographics
4.2.3 Sector Comparisons (group_vars = c("nss", "sector"))
The national state-region distribution is standardized so that the pure rural–urban effect is isolated:
- Geography: National distribution of state × region applied equally to both sectors
- Demographics: Standardized nationally
4.2.4 Demographic Comparisons (e.g., religion, social group, child status)
Full geographic standardization: the national distribution of states, regions, and sectors is applied uniformly to all demographic groups, isolating the effect of the focal variable.
4.3 Expenditure Binning
For each group, households are assigned to expenditure deciles using group-specific cutpoints from the pre-computed MPCE distribution:
Within each decile, the group-specific mean MPCE is used as the representative expenditure level (on log scale) for prediction. An “Overall” bin uses the population-weighted mean MPCE for each group.
4.4 Seasonal Model Variation
For the micronutrient models, the seasonal variable is excluded from the prediction grid (season = "FALSE"), unlike the food consumption models. This is because micronutrient intake aggregates across all food sources and seasonal variation is absorbed into the expenditure–intake relationship.
5. Prevalence of Inadequacy
5.1 Person-Level Approach Using Household Demographics
A key feature of this analysis is that the prevalence of inadequacy is not computed from a single reference EAR/RDA for an adult woman. Instead, it exploits the full demographic composition of each household (the age, sex, and physiological profile of every member) to derive a household-specific probability of inadequacy that reflects the actual requirements of the people consuming the food.11,12
The procedure has three stages: (1) allocate household intake to individual members, (2) evaluate each member's probability of inadequacy against their person-specific requirement distribution, and (3) aggregate back to the household level for use as a GAM outcome.
5.2 Requirement Distributions
For each household member $k$ with age-sex profile $p$, the ICMR-NIN reference tables3 provide a person-specific $EAR_p$ and $RDA_p$. The standard deviation of the requirement distribution is derived from the relationship $RDA = EAR + 2 \times \sigma_{\text{req}}$:

$$\sigma_{\text{req},p} = \frac{RDA_p - EAR_p}{2}$$
For most nutrient-profile combinations, requirements are assumed to follow a normal distribution. The exception is iron for menstruating women and adolescent girls (adult women and girls aged 13–18), where requirements follow a log-normal distribution because menstrual iron losses are highly variable and right-skewed:15
The complete set of person-specific EAR and RDA values used in this analysis is shown in Figure 5.1. These reference values, drawn from the ICMR-NIN 2024 guidelines3, define the requirement distributions against which each household member's allocated intake is evaluated, and cover all thirteen demographic profiles. Note the substantially higher iron RDA for menstruating females (girls 13–18, adult women) relative to their EAR, reflecting the log-normal requirement distribution discussed above.
Figure 5.1 — ICMR-NIN EAR and RDA by Demographic Profile and Micronutrient
Source: ICMR-NIN (2024). Each segment represents one micronutrient. Iron for menstruating females uses a log-normal requirement distribution; all others normal.
5.3 Individual Probability of Inadequacy
Household intake is allocated to individual members in proportion to each member's energy requirement (following the intra-household allocation approach of Smith & Subandoro, 200716):

$$Y_k = \text{share}_k \times Y^{\text{hh}}_i$$

where $\text{share}_k$ is member $k$'s within-household energy share (Section 2.2) and the household total $Y^{\text{hh}}_i$ can be either the raw (absolute) or energy-adjusted intake (Section 2.6), producing separate prevalence estimates for each variant. The share weights sum to one within each household by construction.
The individual probability of inadequacy is then computed by evaluating each member's allocated intake against their person-specific requirement distribution:

$$\Pr(\text{inadequate}_k) = \Pr(R_k > Y_k) = 1 - F_{R_k}(Y_k)$$

where $R_k$ is member $k$'s requirement and $F_{R_k}$ is the CDF of its distribution (Section 5.2). This gives the probability that a random draw from member $k$'s requirement distribution exceeds their allocated intake, that is, the probability that member $k$ is nutrient-inadequate given what the household reports consuming.
5.4 Household Aggregation
The household-level probability of inadequacy is the unweighted average across all members:

$$\pi_i = \frac{1}{n_i} \sum_{k=1}^{n_i} \Pr(\text{inadequate}_k)$$

where $n_i$ is the number of members in household $i$. This household-level probability is computed separately for the energy-adjusted and unadjusted intake allocations. Boundary values (exactly 0 or 1) are squeezed into the open interval $(\varepsilon, 1 - \varepsilon)$ with $\varepsilon = 10^{-6}$ for compatibility with quasi-binomial regression.17
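The three stages (allocate, evaluate, aggregate) can be sketched for a single household as follows. This is a simplified illustration for the normal-requirement case only. The energy requirements come from Table 1 and the adult-woman zinc EAR/RDA from Section 1.2; the EAR/RDA values for the man and the child, and the household intake, are hypothetical placeholders rather than ICMR-NIN figures.

```r
hh <- data.frame(
  member = c("adult man", "adult woman", "child 4-6"),
  e_req  = c(2710, 2130, 1360),  # kcal/day, Table 1
  ear    = c(14, 11, 4.5),       # man/child values are hypothetical
  rda    = c(17, 13.2, 5.5)      # man/child values are hypothetical
)
intake_hh <- 24                  # hypothetical household zinc intake, mg/day

# Stage 1: allocate household intake by energy share
share <- hh$e_req / sum(hh$e_req)
y_k   <- share * intake_hh

# Stage 2: P(requirement > allocated intake) under a normal requirement
sd_req <- (hh$rda - hh$ear) / 2
p_inad <- pnorm(y_k, mean = hh$ear, sd = sd_req, lower.tail = FALSE)

# Stage 3: unweighted household average, squeezed away from 0 and 1
eps   <- 1e-6
pi_hh <- min(max(mean(p_inad), eps), 1 - eps)

data.frame(hh["member"], allocated = round(y_k, 2), p_inad = round(p_inad, 3))
pi_hh
```

A member whose allocated intake equals their EAR has a probability of inadequacy of exactly 0.5, which matches the interpretation of the EAR as the median requirement.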
5.5 Modeling Prevalence as a GAM Outcome
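The household-level prevalence $\pi_i$ is then used as the response variable in a quasi-binomial GAM with logit link, using the same GAM specification (Section 3.3) as the intake models:

$$\operatorname{logit}\big(E[\pi_i]\big) = \eta_i$$

where $\eta_i$ has the same smooth and random-effect structure as the intake predictors.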
This allows the prevalence of inadequacy to vary smoothly across expenditure levels and demographic groups, with full posterior uncertainty quantification via the same simulation machinery (Section 6). Separate prevalence models are estimated for the energy-adjusted and unadjusted variants.
6. Posterior Simulation and Uncertainty Propagation
The analysis propagates uncertainty from two sources: (1) model uncertainty in the estimated smooth functions (coefficient uncertainty and dispersion parameter uncertainty), and (2) sampling uncertainty from the complex survey design. The approach exploits the Bayesian interpretation of penalized splines, where smoothing penalties correspond to improper Gaussian priors on the spline coefficients, yielding an approximate multivariate normal posterior for the coefficient vector.1
6.1 Step 1: Covariance Matrix Validation
Before drawing coefficient vectors, the posterior covariance matrix $V_p = \operatorname{Cov}(\hat\beta)$ is checked for positive definiteness, that is, whether all of its eigenvalues are strictly positive:

$$\lambda_{\min}(V_p) > 0$$

If the check fails, two repair strategies are available: (a) nearPD: project onto the nearest positive-definite matrix in the Frobenius norm (via Matrix::nearPD()); (b) jitter: add a small diagonal perturbation $V + \varepsilon I$.
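A minimal base-R sketch of the check and of the jitter repair is given below. The function names, the relative tolerance, and the rule of growing the perturbation tenfold per iteration are assumptions made for illustration; the nearPD strategy would instead call `Matrix::nearPD()`.

```r
# TRUE if the symmetrised matrix has all eigenvalues above a relative tolerance
is_pd <- function(V, tol = 1e-8) {
  ev <- eigen((V + t(V)) / 2, symmetric = TRUE, only.values = TRUE)$values
  min(ev) > tol * max(abs(ev))
}

# Jitter repair: add eps * I, increasing eps until the check passes
repair_jitter <- function(V, eps = 1e-8) {
  V <- (V + t(V)) / 2
  while (!is_pd(V)) {
    V   <- V + eps * diag(nrow(V))
    eps <- eps * 10
  }
  V
}

V_bad <- matrix(c(1, 1, 1, 1), nrow = 2)  # singular: eigenvalues 2 and 0
V_fix <- repair_jitter(V_bad)
c(before = is_pd(V_bad), after = is_pd(V_fix))
```

The jitter changes the matrix only on the diagonal and by a very small amount, which is usually preferable to a projection when the failure is due to rounding rather than a genuinely ill-posed model.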
6.2 Step 2: Draw Coefficient Vectors
Coefficient draws are sampled from the approximate posterior:

$$\beta^{(s)} \sim \mathcal{N}(\hat\beta, V_p), \qquad s = 1, \dots, M$$
Separate draw matrices are generated for each sub-model: BL (M × pL) for the logit model and BQ (M × pQ) for the quantity model. This is done independently for all three variants (main, without-cereals, Shannon), yielding 6 draw matrices per micronutrient per survey round.
6.3 Step 3: Draw Dispersion Parameters
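The quantity model's dispersion parameter is drawn from its sampling distribution:
Log-normal model: The residual variance $\sigma^2$ has a scaled inverse-chi-squared posterior:

$$(\sigma^{(s)})^2 = \frac{\nu\, \hat\sigma^2}{X^{(s)}}, \qquad X^{(s)} \sim \chi^2_\nu$$

The log-normal bias correction for each draw is $\delta^{(s)} = 0.5 \times (\sigma^{(s)})^2$.
Gamma model: The dispersion $\phi$ has a similar scaled distribution:

$$\phi^{(s)} = \frac{\nu\, \hat\phi}{X^{(s)}}, \qquad X^{(s)} \sim \chi^2_\nu$$

where $\nu$ is the residual degrees of freedom from the fitted model.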
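Steps 2 and 3 can be sketched in base R as below. The point estimates, covariance matrix, and degrees of freedom are toy values chosen for illustration. With a fitted mgcv model one would typically take them from `coef(fit)`, `vcov(fit)`, and the model's residual degrees of freedom, and `mgcv::rmvn()` could replace the explicit Cholesky step.

```r
set.seed(1)
M <- 1000                                    # number of posterior draws

# Toy stand-ins for the fitted coefficients and their posterior covariance
beta_hat <- c(0.5, 1.2)
Vp       <- matrix(c(0.04, 0.01, 0.01, 0.09), nrow = 2)

# Multivariate normal draws: rows of Z %*% chol(Vp) have covariance Vp
Z <- matrix(rnorm(M * length(beta_hat)), nrow = M)
B <- sweep(Z %*% chol(Vp), 2, beta_hat, "+")  # M x p draw matrix

# Scaled inverse-chi-squared draws for the log-normal residual variance
nu          <- 500     # toy residual degrees of freedom
sigma2_hat  <- 0.3     # toy residual variance estimate
sigma2_draw <- nu * sigma2_hat / rchisq(M, df = nu)
delta       <- 0.5 * sigma2_draw              # per-draw bias correction

round(colMeans(B), 3)
round(cov(B), 3)
```

Each row of `B` is one coefficient vector; pairing row `s` of `B` with element `s` of `sigma2_draw` keeps the coefficient and dispersion draws aligned through the later steps.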
6.4 Step 4: Chunked Household-Level Predictions
The prediction grid is processed in blocks (default: 5,000 rows) to manage memory. For each block:
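Linear predictors via the lpmatrix:

$$\eta_L^{(s)} = X_L\, \beta_L^{(s)}, \qquad \eta_Q^{(s)} = X_Q\, \beta_Q^{(s)}$$

Transform to response scale:
- Probability: $p^{(s)} = \operatorname{logit}^{-1}(\eta_L^{(s)})$
- Conditional mean (log-normal): $\mu^{(s)} = \exp(\eta_Q^{(s)} + \delta^{(s)})$
- Conditional mean (Gamma): $\mu^{(s)} = \exp(\eta_Q^{(s)})$
- Unconditional expected intake: $E[Y]^{(s)} = p^{(s)} \times \mu^{(s)}$
Posterior predictive draws for individual-level intake:
- Log-normal: $Y_{\text{pos}}^{(s)} \sim \text{LogNormal}(\eta_Q^{(s)}, \sigma^{(s)})$
- Gamma: $Y_{\text{pos}}^{(s)} \sim \text{Gamma}(\text{shape}^{(s)}, \text{scale} = \mu^{(s)} \times \phi^{(s)})$, with $\text{shape}^{(s)} = 1/\phi^{(s)}$ so that the mean is $\mu^{(s)}$
- Unconditional: $Z \sim \text{Bernoulli}(p^{(s)})$; $Y_{\text{uncond}}^{(s)} = Z \times Y_{\text{pos}}^{(s)}$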
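The block-level computation for the log-normal case can be sketched as follows. All inputs are toy stand-ins: `X_L` and `X_Q` play the role of the matrices returned by `predict(fit, newdata, type = "lpmatrix")`, and `B_L`, `B_Q`, `sigma2` play the role of the draws from Steps 2 and 3.

```r
set.seed(2)
M <- 4                                     # draws (tiny, for display)
X_L <- cbind(1, c(7.5, 8.0, 8.5))          # 3 grid rows x 2 coefficients
X_Q <- X_L

B_L    <- cbind(rnorm(M, -8, 0.1), rnorm(M, 1.1, 0.01))   # M x p logit draws
B_Q    <- cbind(rnorm(M, -2, 0.1), rnorm(M, 0.6, 0.01))   # M x p quantity draws
sigma2 <- 0.3 * 500 / rchisq(M, df = 500)                 # variance draws

# Linear predictors for every grid row and every draw (rows x M)
eta_L <- X_L %*% t(B_L)
eta_Q <- X_Q %*% t(B_Q)

# Response-scale quantities
p  <- plogis(eta_L)
mu <- exp(sweep(eta_Q, 2, 0.5 * sigma2, "+"))   # adds delta^(s) column-wise
ey <- p * mu                                    # unconditional expectation

# Posterior predictive draws; sdlog is constant within a draw (column)
sd_mat   <- rep(sqrt(sigma2), each = nrow(eta_Q))
y_pos    <- matrix(rlnorm(length(eta_Q), meanlog = eta_Q, sdlog = sd_mat),
                   nrow = nrow(eta_Q))
z        <- matrix(rbinom(length(p), 1, p), nrow = nrow(p))
y_uncond <- z * y_pos
```

Working with whole matrices rather than looping over draws is what makes the lpmatrix approach fast: one matrix product yields the linear predictor for all grid rows and all draws at once.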
Inadequacy CDF evaluation (if enabled): For each of the K = 50 requirement draws, the CDF of the positive-intake distribution is evaluated at Rk, then averaged and combined with the zero-probability term.
6.5 Step 5: Weighted Aggregation to Group × Decile
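Within each block, household-level draws are accumulated into group-level weighted averages using normalized within-group weights:

$$\tilde w_i = \frac{w_i}{\sum_{j \in g(i)} w_j}$$

where $g(i)$ is the group × decile cell containing grid row $i$, so that the weights sum to one within each cell.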
The accumulator matrices (G × M) are pre-allocated for: probability, conditional mean, unconditional expected intake, posterior predictive (positive and unconditional), and inadequacy prevalence. Block-level contributions are added via weighted cross-products.
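The cross-product accumulation can be illustrated with a toy example of three grid rows, two groups, and two draws. Variable names are ours; the key assumption is that weights are normalized over the whole group before the grid is split into blocks, so that block contributions can simply be added.

```r
group <- factor(c("A", "A", "B"))
w     <- c(2, 1, 4)                                  # grid weights
draws <- matrix(c(10, 20, 30,                        # draw 1 for rows 1..3
                  12, 18, 33), nrow = 3)             # draw 2 for rows 1..3

# Normalise weights within group, then spread them over an indicator matrix
w_norm <- w / ave(w, group, FUN = sum)
W      <- model.matrix(~ group - 1) * w_norm         # n x G

# One shot: G x M matrix of weighted group means per draw
acc <- crossprod(W, draws)

# Same result accumulated over two blocks (rows 1-2, then row 3)
acc_blocks <- crossprod(W[1:2, , drop = FALSE], draws[1:2, , drop = FALSE]) +
              crossprod(W[3, , drop = FALSE],   draws[3, , drop = FALSE])
acc
```

Because the accumulator has only G × M entries, memory use stays constant no matter how many grid rows are streamed through in blocks.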
6.6 Step 6: Survey Standard Errors
Plug-in predictions at the coefficient point estimates are computed for each household in the original survey data. A complex survey design object is created:
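```r
library(survey)

# `hh_pred` is a placeholder name for the household-level data frame that
# holds the plug-in predictions together with psu, strata, and wts
des <- svydesign(ids = ~psu, strata = ~strata, weights = ~wts,
                 nest = TRUE, data = hh_pred)
```
Survey standard errors are obtained for each group × decile cell via svyby() with svymean(), separately for: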
- p̂ (consumption probability), μ̂ (conditional mean), E[Y] (unconditional mean)
- Pr(inadequacy) (inadequacy prevalence, main model only)
- All three model variants (main, without-cereals, Shannon)
The lonely PSU adjustment (survey.lonely.psu = "adjust") ensures stable variance estimates when a stratum contains only one PSU.
6.7 Step 7: Injecting Sampling Uncertainty
Survey standard errors are combined with model draws using scale-appropriate transformations:
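Probability draws (logit scale, via delta method):

$$\tilde p^{(s)} = \operatorname{logit}^{-1}\big(\operatorname{logit}(p^{(s)}) + \varepsilon^{(s)}\big), \qquad \sigma_{\text{logit}} = \frac{SE(\hat p)}{\hat p\,(1 - \hat p)}$$

where $\varepsilon^{(s)} \sim N(0, \sigma_{\text{logit}}^2)$. Values are clipped to $[10^{-6}, 1 - 10^{-6}]$.
Quantity and intake draws (log scale, mean-preserving):

$$\tilde\mu^{(s)} = \mu^{(s)} \exp\big(\varepsilon^{(s)} - 0.5\,\sigma_{\log}^2\big), \qquad \varepsilon^{(s)} \sim N(0, \sigma_{\log}^2), \quad \sigma_{\log} = SE / \text{mean}$$

The bias correction $-0.5\,\sigma_{\log}^2$ ensures $E[\exp(\varepsilon - 0.5\sigma^2)] = 1$, so the noise is mean-preserving.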
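Both noise-injection transforms can be checked with a short simulation. The standard errors, the coefficient of variation, and the constant model draws used below are hypothetical values chosen only to make the behaviour visible.

```r
set.seed(3)
n_draws <- 1e5

# Quantity scale: multiplicative, mean-preserving noise
mu_draws  <- rep(12, n_draws)     # constant model draws, to isolate the noise
sigma_log <- 0.10                 # hypothetical CV = SE / mean
eps_q     <- rnorm(n_draws, 0, sigma_log)
mu_noisy  <- mu_draws * exp(eps_q - 0.5 * sigma_log^2)

# Probability scale: additive noise on the logit scale (delta method SE)
p_hat       <- 0.30
se_p        <- 0.02               # hypothetical survey SE of p_hat
sigma_logit <- se_p / (p_hat * (1 - p_hat))
p_noisy     <- plogis(qlogis(p_hat) + rnorm(n_draws, 0, sigma_logit))
p_noisy     <- pmin(pmax(p_noisy, 1e-6), 1 - 1e-6)

c(mean_mu = mean(mu_noisy), sd_mu = sd(mu_noisy),
  mean_p  = mean(p_noisy),  sd_p  = sd(p_noisy))
```

The mean of `mu_noisy` stays at the original level while its spread reflects the survey CV; without the `-0.5 * sigma_log^2` term the mean would drift upward by a factor of `exp(0.5 * sigma_log^2)`.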
Inadequacy draws use the logit-scale transformation (same as probability draws), since inadequacy prevalence is a proportion.
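Unconditional posterior predictive draws with sampling uncertainty are constructed by combining the noise-injected probability draws with the original posterior predictive quantity draws:

$$\tilde Y_{\text{uncond}}^{(s)} = \tilde Z^{(s)} \times Y_{\text{pos}}^{(s)}, \qquad \tilde Z^{(s)} \sim \text{Bernoulli}(\tilde p^{(s)})$$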
6.8 Step 8: Summary Statistics
For each group × decile × model variant, the following summaries are extracted from the M posterior draws:
| Statistic | Definition |
|---|---|
| Mean | $\frac{1}{M} \sum_{s} \theta^{(s)}$ |
| Median | 50th percentile of $\{\theta^{(s)}\}$ |
| 95% credible interval | [2.5th, 97.5th] percentiles of $\{\theta^{(s)}\}$ |
Summaries are computed separately for: intake (main), inadequacy prevalence (main only), intake without cereals, and Shannon diversity.
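The summaries in the table above amount to a few quantile calls per cell. A minimal sketch follows; the function name `summarise_draws` is hypothetical, and it assumes `quantile()`'s default interpolation.

```r
# Summarise a vector of M posterior draws for one group x decile cell
summarise_draws <- function(theta) {
  c(mean   = mean(theta),
    median = median(theta),
    lower  = unname(quantile(theta, 0.025)),
    upper  = unname(quantile(theta, 0.975)))
}

# Toy check on the integers 1..101
s <- summarise_draws(1:101)
s

# Applied row-wise to a G x M matrix of draws (one row per cell)
draw_mat <- rbind(cell1 = 1:101, cell2 = 2 * (1:101))
cell_summaries <- t(apply(draw_mat, 1, summarise_draws))
cell_summaries
```

Reporting both mean and median is useful for intake outcomes because the draws are typically right-skewed on the response scale.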
7. Computational Implementation
7.1 Batch Processing Architecture
The full analysis covers 10 micronutrients × 7 grouping variables = 70 jobs, where each job runs both survey rounds (HCES 2011–12 and 2023–24). Within each job, the three model variants (main, without-cereals, Shannon) are processed sequentially, sharing the same prediction grid.
The batch processor uses future_lapply() for parallel execution across items and grouping variables:
```r
batch_generate_mn_figure_data(
  items = c("iron", "folate", "zinc", ...),
  grouping_vars = c("state_code", "sector", "rel", ...),
  base_dir = base_dir,
  n_workers = NULL  # auto-detected
)
```
7.2 Adaptive Resource Management
Worker count is determined by the minimum of three constraints:
The system resource utility detects physical and logical cores, available memory, and recommends the optimal worker count.
7.3 Output File Organization
Each micronutrient gets its own directory with per-grouping-variable output files:
```
~/data/bam_models/micronutrient/AFE_energy/<nutrient>/
├── <nutrient>_data_list.RData    # Raw intake data
├── <nutrient>_model.RData        # Fitted hurdle models (6 per round)
├── data_mn_intake_<nutrient>_sector.RData
├── data_mn_intake_<nutrient>_rel.RData
├── data_mn_intake_<nutrient>_social.RData
└── ... (7 grouping variables)
```
Each output file contains summary statistics (mean, median, 95% CI) for all model variants, ready for visualization.
8. Visualization
8.1 Plot Structure
Figures display micronutrient intake (or inadequacy prevalence) across expenditure deciles, stratified by demographic group. Each plot contains:
- Expenditure curves with 95% credible intervals (ribbons) for each group
- Key points at the bottom decile, overall mean, and top decile
- Mean labels with group-specific values annotated via ggrepel
- Reference lines for EAR and RDA benchmarks (intake plots)
- Title showing the micronutrient name, EAR, and RDA values
For state-level comparisons, states are grouped by geographic region (Northern, Southern, Eastern, Western, North-Eastern, Central), each rendered as a separate panel.
8.2 Output Formats
Plots are produced as:
- Individual ggplot objects for flexible composition
- Combined multi-panel images using magick for publication
- Excel spreadsheets with the underlying data and variable descriptions
Glossary
| Term | Definition |
|---|---|
| AFE (Adult Female Equivalent) | Household size measure scaled by ICMR-NIN age-sex-specific requirements; converts each member to equivalent adult females |
| BAM | mgcv::bam(): Big Additive Models, the mgcv routine for fitting GAMs to very large datasets; uses fast REML and discretized covariates |
| Coefficient of Variation (CV) | SE / mean; used to convert standard errors to the log scale (σlog = CV) |
| Delta Method | Approximation for the variance of a transformed variable: $\operatorname{Var}(g(\theta)) \approx [g'(\theta)]^2 \operatorname{Var}(\theta)$ |
| EAR | Estimated Average Requirement — the median nutrient requirement; intake below EAR indicates a >50% probability of inadequacy. In this analysis, person-specific EAR values from ICMR-NIN are used for each household member's age-sex profile |
| Energy Adjustment (Nutrient Density Scaling) | Scaling household micronutrient intake by the ratio of required to reported energy, holding diet composition constant. Equivalent to multiplying nutrient density (per kcal) by the caloric requirement. Follows the FAO nutrient density framework (1998) and Vossenaar et al. (2020) |
| Factor-Smooth Interaction | bs = "fs" in mgcv; allows each level of a factor to have its own smooth curve of a continuous predictor, with a shared penalty |
| fREML | Fast Restricted Maximum Likelihood — efficient method for estimating smoothing parameters in large-sample GAMs |
| Geographic Standardization | Holding the geographic (state/region/sector) distribution constant across comparison groups to isolate focal effects |
| Hurdle Model | Two-part model: (1) probability of positive outcome, (2) distribution of positive values; allows structural zeros |
| ICMR-NIN | Indian Council of Medical Research – National Institute of Nutrition; source of dietary requirements and food composition data |
| Inadequacy Prevalence | Average probability across household members that individual intake falls below person-specific requirements; uses age-sex-specific EAR/RDA from ICMR-NIN for each member, with normal distribution (or log-normal for iron in menstruating females) |
| Intra-Household Allocation | Distribution of household-level intake to individual members in proportion to their energy requirements (Smith & Subandoro, 2007) |
| lpmatrix | Linear predictor matrix X such that η = Xβ; enables vectorized computation of predictions across all simulation draws |
| Mean-Preserving Noise | Multiplicative noise $\exp(\varepsilon - 0.5\sigma^2)$ with E[·] = 1; injects variance without shifting the mean |
| Posterior Predictive Draw | Simulated observation combining parameter uncertainty (coefficient draws) with observation-level variability (distributional draws) |
| Prediction Grid | Counterfactual population with standardized non-focal variables; used to compute comparable group-level estimates |
| RDA | Recommended Dietary Allowance — nutrient intake sufficient for 97.5% of individuals; equals EAR + 2 × σreq (this analysis uses 2 rather than 1.96) |
| Scaling Factor (SF) | Ratio of a household's total energy requirement (summed across members) to its reported caloric intake; SF > 1 indicates energy under-reporting (the common case) |
| Shannon Diversity Index | H = −Σ pj ln(pj); measures evenness of micronutrient sources across food categories |
Data Sources
| Source | Description |
|---|---|
| HCES 2011–12 | Household Consumer Expenditure Survey, NSS 68th Round (NSSO) |
| HCES 2023–24 | Household Consumer Expenditure Survey (MoSPI) |
| ICMR-NIN Requirements | Nutrient requirements (EAR, RDA) by age-sex profile, from nin_requirements.dta; provides person-specific EAR and RDA for each micronutrient across all demographic profiles |
| Household Demographic Roster | Individual-level records of household members with age, sex, physiological profile, and ICMR-NIN energy requirements; from HCES20XX_AFE.RData |
| Food Composition Tables | Micronutrient content per food item, from ICMR-NIN Indian Food Composition Tables |
| General Price Index | State-level price deflators for converting nominal MPCE to constant 2011–12 prices |