Methodology Note

Modeling Micronutrient Intake in India

GAM-Based Estimation of Nutrient Intake, Energy Adjustment, Dietary Diversity, and Prevalence of Inadequacy

Dr Shamika Ravi (Member, EAC to PM) & Dr Mudit Kapoor (CECFEE, EPU, ISI-Delhi Center)
February 2026

1. Introduction

This document explains how we model and predict micronutrient intake at the household level, in both absolute and energy-adjusted forms, using data from the Household Consumer Expenditure Survey (HCES) 2011–12 and 2023–24. Intake is benchmarked against ICMR-NIN age-sex-specific nutrient requirements for each household member. The framework uses semi-parametric Generalized Additive Models (GAMs)¹ within a two-stage hurdle structure² to quantify how much of each essential micronutrient Indian households obtain from their diets, how intake varies across income levels and demographic groups, and what share of the population falls below nutritional requirements. The prevalence of inadequacy is computed at the person level using demographic-specific EAR/RDA values, then aggregated to the household for modeling.

1.1 What Do We Estimate?

The analysis produces four families of estimates for each micronutrient, each available in both absolute (unadjusted) and energy-adjusted variants:

Probability of positive intake: What share of the population obtains any amount of a given micronutrient from their diet?
Conditional intake level: Among households with positive intake, how much of the micronutrient (in the relevant unit per day) is consumed per Adult Female Equivalent (AFE)?
Unconditional expected intake: The product of the two quantities above, that is, the population-average daily intake including zero-intake households, reported across expenditure deciles and demographic groups.
Prevalence of inadequacy: What fraction of the population has usual intake below their individual nutrient requirement, computed using person-specific EAR and RDA values derived from each household member's age-sex profile?

The absolute view shows what households actually consume. The energy-adjusted view scales intake to each household's caloric requirement while holding diet composition constant, which separates the adequacy of diet composition from the effect of total food quantity. Both views are produced for intake and prevalence of inadequacy. The Shannon Diversity Index, being inherently scale-free, requires no energy adjustment.

In addition, two supplementary analyses are run for each micronutrient: intake excluding cereals (to assess dietary source diversity) and the Shannon Diversity Index of food sources contributing to micronutrient intake.

1.2 Micronutrients Analyzed

The analysis covers 9 micronutrients and macronutrients, benchmarked against ICMR-NIN dietary requirements³ for an adult woman (55 kg, moderately active, non-pregnant and non-lactating):

Micronutrient	Unit	Description
Iron	mg/day	Essential for oxygen transport; deficiency causes anaemia
Folate	µg/day	Critical for cell division and neural tube development
Zinc	mg/day	Supports immune function and wound healing
Vitamin B1 (Thiamine)	mg/day	Required for carbohydrate metabolism
Vitamin B2 (Riboflavin)	mg/day	Involved in energy production and cell function
Vitamin B3 (Niacin)	mg/day	Supports metabolism and DNA repair
Vitamin B6	mg/day	Required for amino acid metabolism
Vitamin C	mg/day	Antioxidant; supports immune function
Calcium	mg/day	Essential for bone health

Inadequacy prevalence is computed for all micronutrients that have defined EAR values in the ICMR-NIN reference tables. While the dashboard displays benchmarks for a reference adult woman (55 kg, moderately active, non-pregnant and non-lactating), the underlying prevalence calculation uses person-specific EAR and RDA values for each household member (Section 5).

Micronutrient	EAR	RDA	Units
Iron	15	29	mg
Vitamin B9 (Folate)	180	220	µg
Zinc	11	13.2	mg
Vitamin B1	1.4	1.7	mg
Vitamin B2	2	2.4	mg
Vitamin B3	12	14	mg
Vitamin B6	1.6	1.9	mg
Vitamin C	55	65	mg
Calcium	800	1000	mg

Source: ICMR-NIN Recommended Dietary Allowances for Indians (2024). Reference values shown are for an adult woman, 55 kg, moderately active, non-pregnant and non-lactating. The prevalence of inadequacy analysis (Section 5) uses person-specific EAR and RDA values derived from each household member's age-sex profile.

2. Data Construction

2.1 From Food Quantities to Micronutrient Intake

Micronutrient intake is not directly observed in the HCES. Instead, it is derived by combining three data sources:

Household food quantities from the HCES (amount of each food item consumed over the reference period)
Food composition tables⁴ that map food items to micronutrient content per unit weight
Reference period data to convert consumption from weekly/monthly reporting windows to daily values

For each household i, food item j, and micronutrient m, the daily household-level micronutrient intake is:

Q_{i j m} = \frac{q_{i j} \times c_{j m}}{T_{j}}

where q_ij is the quantity of food j consumed by household i over the reference period, c_jm is the micronutrient content of food j per unit weight, and T_j is the reference period in days for food item j. Micronutrient content values are drawn from the Indian Food Composition Tables (IFCT) published by ICMR-NIN⁴, supplemented by Vijayakumar et al.⁵.

Total daily household intake of micronutrient m is the sum across all food sources. Food items are grouped into 10 food categories (cereals & millets, green leafy vegetables, other vegetables, roots & tubers (excluding potatoes), fruits, milk & milk products, fats & oils, oilseeds & nuts, pulses & beans, and flesh foods (eggs/fish/meat)), so in the sum below j indexes categories rather than individual items:

Y_{i}^{hh} = \sum_{j = 1}^{10} Q_{i j m}
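The two formulas above can be sketched in a few lines of base R. This is a minimal illustration on toy data: the data frame, column names, quantities, and iron contents are all hypothetical and are not HCES or IFCT values.

```r
# Toy inputs; every name and number below is illustrative only.
food <- data.frame(
  hhid     = c(1, 1, 2),
  item     = c("rice", "spinach", "rice"),
  qty_kg   = c(30, 1.4, 15),   # q_ij: quantity over the reference period
  ref_days = c(30, 7, 30),     # T_j: reference period in days
  stringsAsFactors = FALSE
)
iron_mg_per_kg <- c(rice = 7, spinach = 27)   # c_jm: hypothetical iron content

# Q_ijm = q_ij * c_jm / T_j: daily iron intake contributed by each item
food$iron_mg_day <- food$qty_kg * iron_mg_per_kg[food$item] / food$ref_days

# Y_i^hh: sum over food sources within each household
hh_iron <- tapply(food$iron_mg_day, food$hhid, sum)
```

Dividing by the item-specific reference period before summing matters because items reported over 7 and 30 days would otherwise be added on different time scales.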

2.2 Adult Female Equivalent (AFE) Scaling

To make intake comparable across households of different sizes and demographic compositions, we express intake per AFE using energy-based equivalence scales:

Y_{i} = \frac{Y_{i}^{hh}}{\text{HH Size (AFE Energy)}_{i}}

The energy-based AFE scale converts each household member’s energy requirement to the equivalent number of adult females, based on ICMR-NIN age-sex-specific energy requirements.³ This produces a per-person measure that accounts for the varying nutritional needs within a household.

The energy requirements and corresponding AFE scale factors used in this analysis are shown in Table 1. The reference category is the adult woman (moderate activity), whose requirement of 2,130 kcal/day defines one AFE unit.

Table 1: ICMR-NIN Energy Requirements and AFE Scale Factors
Demographic Profile	Activity Level	Energy Requirement (kcal/day)	AFE Scale Factor
Child 0 to 12 mo	—	595	0.28
Child 1 to 3 yrs	—	1,110	0.52
Child 4 to 6 yrs	—	1,360	0.64
Child 7 to 9 yrs	—	1,700	0.80
Girls 10 to 12 yrs	—	2,060	0.97
Boys 10 to 12 yrs	—	2,220	1.04
Girls 13 to 15 yrs	—	2,400	1.13
Boys 13 to 15 yrs	—	2,860	1.34
Girls 16 to 18 yrs	—	2,500	1.17
Boys 16 to 18 yrs	—	3,320	1.56
Adult women (reference)	Moderate	2,130	1.00
Adult women (lactating)	Moderate	2,690	1.26
Adult men	Moderate	2,710	1.27

For example, a household comprising one adult man, one adult woman, and one child aged 4–6 would have an AFE household size of 1.27 + 1.00 + 0.64 = 2.91, rather than a simple headcount of 3. Dividing total household intake by this AFE size yields a per-AFE measure that is comparable across households of different demographic compositions.
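The worked example can be reproduced as follows. This is a small sketch: the energy requirements come from Table 1, while the household iron intake of 30 mg/day is a hypothetical value, and the rounding to two decimals is done only to match the table (the scale factors need not be rounded in practice).

```r
afe_ref_kcal <- 2130   # adult woman, moderate activity (Table 1)
members_kcal <- c(adult_man = 2710, adult_woman = 2130, child_4_6 = 1360)

# AFE scale factor per member, rounded as displayed in Table 1
afe <- round(members_kcal / afe_ref_kcal, 2)

hh_size_afe  <- sum(afe)                   # 1.27 + 1.00 + 0.64
hh_iron_mg   <- 30                         # hypothetical household intake, mg/day
iron_per_afe <- hh_iron_mg / hh_size_afe   # intake per AFE
```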

Implementation Details

The AFE scale is constructed at the individual level from the HCES person-level roster, which records each household member's age and gender. The assignment proceeds in three steps:

Step 1 — Lactation status imputation. The HCES does not directly record lactation status. We proxy it using the presence of a child under 12 months in the household. If at least one child aged <1 is present, the youngest woman aged 19–49 in the household is classified as lactating and assigned the higher energy requirement of 2,690 kcal/day (= 2,130 base + 560 lactation increment, where 560 is the average of the ICMR-NIN increments of 600 kcal for 0–6 months and 520 kcal for 6–12 months postpartum). If there are n children under 1, then up to n women (ordered youngest-first) are assigned lactating status. Remaining adult women, as well as all women aged 50 and above, receive the non-pregnant, non-lactating requirement of 2,130 kcal/day.

Step 2 — Energy requirement assignment. Each household member is mapped to an age-sex-specific energy requirement from Table 1 using the following rules: children are classified solely by age (the infant category uses the average of the ICMR 0–6 month and 6–12 month values); adolescents aged 10–18 are further stratified by gender; and all adults are assigned the moderate activity level. Adult men (and transgender individuals coded as gender = 3) receive the adult male requirement of 2,710 kcal/day regardless of age.

Step 3 — AFE conversion and household aggregation. Each member's AFE scale factor is computed as AFE_k = E^req_k / 2,130. The household AFE size is then the sum across all members: HH Size(AFE)_i = ∑_k AFE_k. An individual's within-household energy share is share_k = AFE_k / HH Size(AFE)_i, which is used in subsequent micronutrient allocation steps.

The R implementation is shown below:

library(dplyr)  # .by and .default below require dplyr >= 1.1.0

# Person-level roster: one row per household member.
HCES2023_AFE_energy <- HCES2023_level02 %>%
  dplyr::select(hhid, person_sno, gender, age) %>%
  # Sort by age within gender so eligible women are numbered youngest-first below.
  arrange(hhid, gender, age) %>%
  mutate(under_1_a = ifelse(age < 1, 1, 0)) %>%
  group_by(hhid) %>%
  mutate(
    child_under_1   = max(under_1_a),  # 1 if the household has any infant
    n_child_under_1 = sum(under_1_a)   # number of infants in the household
  ) %>%
  ungroup() %>%
  mutate(
    # Flags women aged 19-49 (gender code "2"), despite the variable name.
    women_18_50 = dplyr::case_when(
      (age >= 19 & age < 50) & gender == "2" ~ 1,
      .default = 0
    )
  ) %>%
  # Within household and flag, seq = 1 is the youngest flagged woman.
  group_by(hhid, women_18_50) %>%
  mutate(seq = row_number()) %>%
  ungroup() %>%
  mutate(
    # Energy requirement in kcal/day (Table 1).
    energy_requirement = dplyr::case_when(
      age < 1                                          ~ 595,
      age >= 1  & age <= 3                             ~ 1110,
      age >= 4  & age <= 6                             ~ 1360,
      age >= 7  & age <= 9                             ~ 1700,
      (age >= 10 & age <= 12) & (gender %in% c("1","3")) ~ 2220,
      (age >= 10 & age <= 12) & (gender == "2")        ~ 2060,
      (age >= 13 & age <= 15) & (gender %in% c("1","3")) ~ 2860,
      (age >= 13 & age <= 15) & (gender == "2")        ~ 2400,
      (age >= 16 & age <= 18) & (gender %in% c("1","3")) ~ 3320,
      (age >= 16 & age <= 18) & (gender == "2")        ~ 2500,
      (age >= 19) & (gender %in% c("1","3"))           ~ 2710,
      (age >= 19) & (gender == "2") & (child_under_1 == 0) ~ 2130,
      # Up to n_child_under_1 of the youngest women aged 19-49 are treated as lactating.
      (age >= 19 & age < 50) & (gender == "2") &
        (child_under_1 == 1) & (seq <= n_child_under_1)   ~ 2690,
      (age >= 19 & age < 50) & (gender == "2") &
        (child_under_1 == 1) & (seq > n_child_under_1)    ~ 2130,
      (age >= 50) & (gender == "2")                    ~ 2130
    )
  ) %>%
  mutate(
    AFE_energy   = energy_requirement / 2130,     # adult female equivalents
    share_energy = AFE_energy / sum(AFE_energy),  # within-household energy share
    .by = hhid
  )

2.3 Intake Without Cereals

For each micronutrient, a parallel variable is constructed that excludes the cereal contribution:

Y_{i}^{wo} = \frac{Y_{i}^{hh} - Q_{i, cereal, m}}{\text{HH Size (AFE Energy)}_{i}}

This decomposition is important because cereals dominate Indian diets and can mask micronutrient source diversity. For iron, for example, a high total intake may reflect heavy cereal consumption (with low bioavailability) rather than diverse dietary sources.

2.4 Shannon Diversity Index of Food Sources

To capture the diversity of dietary sources contributing to micronutrient intake, we compute the Shannon Diversity Index⁶ across the 10 food categories:

H_{i} = - \sum_{j = 1}^{10} p_{i j} \ln p_{i j}

where p_ij = Q_ijm / ∑_k Q_ikm is the share of micronutrient m that household i derives from food category j. Zero-share categories are excluded from the sum. H_i = 0 when all intake comes from a single food group, and H_i = ln(10) ≈ 2.30 when intake is equally distributed across all 10 categories.
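A minimal sketch of the index as defined above, written as a base R function. The function name and the handling of households with zero total intake (returning NA) are assumptions made for this illustration.

```r
# Shannon Diversity Index over category-level intakes q (length 10 in this note).
shannon_index <- function(q) {
  if (sum(q) <= 0) return(NA_real_)  # assumption: undefined when total intake is zero
  p <- q / sum(q)                    # shares p_ij
  p <- p[p > 0]                      # zero-share categories are excluded
  -sum(p * log(p))
}

h_single <- shannon_index(c(5, rep(0, 9)))  # all intake from one category
h_even   <- shannon_index(rep(1, 10))       # equal shares across 10 categories
```

The two calls reproduce the limiting cases stated in the text: zero for a single source and ln(10) for an even split.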

2.5 Consumption Indicators and Sample Restrictions

The analysis is restricted to households with cooking arrangements (excluding those coded as “no cooking”), since micronutrient intake from purchased/consumed food is meaningful only for households that prepare meals. A binary consumption indicator is defined for each model variant:

D_{i} = \begin{cases} 1 & \text{if } Y_{i} > 0 \\ 0 & \text{otherwise} \end{cases}

2.6 Energy Adjustment

Absolute micronutrient intake is strongly correlated with total food consumption: households that eat more food mechanically obtain more of every nutrient. To disentangle diet composition from diet quantity, we apply a nutrient density scaling approach¹³,¹⁴ that projects each household's intake onto its caloric requirement.

Scaling factor

For each household i, we compute a scaling factor as the ratio of the household's total energy requirement (summed across all members using ICMR-NIN age-sex-specific requirements) to its reported total caloric intake:

{SF}_{i} = \frac{\sum_{k = 1}^{n_{i}} E_{k}^{req}}{E_{i}^{reported}}

where E^req_k is the energy requirement of member k and E^reported_i is the household's total reported caloric intake. A scaling factor greater than 1 indicates the household reports consuming fewer calories than its members require (the common case, consistent with well-documented energy under-reporting); a scaling factor less than 1 indicates over-reporting.

Energy-adjusted intake

The energy-adjusted household micronutrient intake is then:

Y_{i}^{adj} = Y_{i}^{hh} \times {SF}_{i}

This is mathematically equivalent to computing the nutrient density of the diet (micronutrient per calorie) and multiplying by the household's caloric requirement. The counterfactual it answers is: if this household consumed exactly its required calories while maintaining the same diet composition, how much of each micronutrient would it obtain?
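The equivalence between the scaling-factor form and the nutrient-density form can be checked numerically. In this sketch the member energy requirements are taken from Table 1, while the reported household calories and iron intake are hypothetical values.

```r
e_req      <- c(2710, 2130, 1360)  # member requirements, kcal/day (Table 1)
e_reported <- 5000                 # hypothetical reported household kcal/day
y_hh       <- 28                   # hypothetical household iron intake, mg/day

# Scaling-factor form: Y_adj = Y_hh * SF, with SF = sum(E_req) / E_reported
sf    <- sum(e_req) / e_reported
y_adj <- y_hh * sf

# Nutrient-density form: (mg per kcal) * required kcal
density   <- y_hh / e_reported
y_adj_alt <- density * sum(e_req)
```

Here the scaling factor exceeds 1, so the household reports fewer calories than its members require and the adjusted intake is higher than the absolute intake.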

Interpretation

The dashboard presents both views as complementary lenses for policy:

Absolute (unadjusted): What households actually consume. This is the view relevant for assessing whether nutrient needs are being met in practice.
Energy-adjusted: What households would consume at their required caloric intake, holding diet composition constant. This isolates whether shortfalls reflect insufficient food quantity (an income/food security problem) or poor diet quality (a diversification/education problem).

Note: The Shannon Diversity Index is invariant to energy adjustment because it depends only on the proportional shares of micronutrient sources across food categories, not on absolute intake levels. It is therefore computed only once per household.

3. Model Specification

3.1 Overview: Three Model Variants per Micronutrient

For each micronutrient, three parallel hurdle models are estimated:

Variant	Outcome	Purpose
Main	Total intake per AFE (all food sources)	Primary intake estimate + inadequacy assessment
Without cereals	Total intake per AFE excluding cereals	Measures non-cereal dietary quality
Shannon	Shannon Diversity Index of food sources	Captures dietary source diversity

Each variant consists of two sub-models (probability and quantity), producing six GAM models per micronutrient per survey round. The full model suite is estimated twice — once on energy-adjusted data and once on unadjusted data — with the logit participation model and Shannon models shared (since the sign of intake and proportional shares are invariant to energy scaling).

3.2 The Hurdle Model Framework

Each model variant follows the same two-stage hurdle structure.²,⁷ Let Y_i denote the outcome for household i (intake, intake-without-cereals, or Shannon index). The expected value is:

E [Y_{i}] = \Pr (Y_{i} > 0) \times E [Y_{i} | Y_{i} > 0] = p_{i} \times μ_{i}

The two components are estimated separately:

Part 1 — Probability model (logit sub-model): Estimated on all households using a quasi-binomial GAM with logit link.

Part 2 — Quantity model (positive sub-model): Estimated only on households with Y_i > 0 using either a log-normal or Gamma(log) GAM. Model selection between the two families is based on AIC comparison.

3.3 GAM Specification

Both sub-models share the same semi-parametric GAM predictor structure¹ (illustrated here for the main variant):

Logit sub-model (probability of positive intake):

logit (p_{i}) = β_{0} + f (x) + \sum_{k} f_{k} (x, z_{k}) + \text{random effects}

Quantity sub-model (conditional mean for positive values):

g (μ_{i}) = β_{0} + f (x) + \sum_{k} f_{k} (x, z_{k}) + \text{random effects}

where x = log(MPCE_{real, AFE}) is log real monthly per capita expenditure in AFE terms, and g(·) is the log link for both log-normal and Gamma models.

3.3.1 Smooth Terms and Random Effects

The predictor structure includes:

Term	Type	Purpose
f(x)	Thin-plate spline	Baseline expenditure–intake curve
f(x, state)	Factor-smooth interaction (`bs = "fs"`)	State-specific expenditure curves
f(x, child)	Factor-smooth interaction	Curves for households with/without children
f(x, female_headed)	Factor-smooth interaction	Curves by household head gender
f(x, social)	Factor-smooth interaction	Curves by social group (caste)
f(x, rel)	Factor-smooth interaction	Curves by religion
s(sector)	Random intercept (`bs = "re"`)	Rural vs. urban shift
s(nss_region)	Random intercept	NSS region effect
s(reg_sector)	Random intercept	Region × sector interaction
s(social)	Random intercept	Social group intercept
s(rel)	Random intercept	Religion intercept
s(child)	Random intercept	Children-in-household intercept
s(female_headed)	Random intercept	Female-headed household intercept

All factor-smooth interactions use m = 1 (first-order penalty), which allows group-specific curves to deviate from the population mean with a roughness penalty. This hierarchical structure provides adaptive regularisation — groups with limited data are shrunk toward the population curve, while data-rich groups are allowed to deviate more freely.⁸

3.4 Estimation

All models are estimated using mgcv::bam() with the following settings¹:

Parameter	Value	Rationale
`method`	`"fREML"`	Fast REML for large-sample smooth parameter estimation
`discrete`	`TRUE`	Discretized covariate method for datasets > 100,000 observations⁹
`gamma`	`1.4`	Extra penalty on effective degrees of freedom to prevent overfitting
`gc.level`	`2`	Aggressive garbage collection to manage memory
`select`	`TRUE`	Allows smooth terms to be penalized to zero (automatic variable selection)¹⁰

Probability model: family = quasibinomial(link = "logit") — quasi-likelihood allows for over/under-dispersion relative to the binomial.

Quantity model: Two candidate families are estimated and the one with lower AIC is selected:

Gamma: family = Gamma(link = "log") with select = TRUE
Log-normal: family = gaussian() applied to log(Y) (no select since the Gaussian family has fixed dispersion)
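The settings above can be illustrated with a reduced two-part fit on simulated data. This is only a sketch: the data, variable names, and coefficients are invented, the predictor keeps just one factor-smooth term and one random intercept rather than the full structure of Section 3.3.1, and only the Gamma candidate for the quantity model is shown.

```r
library(mgcv)

set.seed(1)
n <- 2000
d <- data.frame(
  x      = rnorm(n, 8, 0.5),  # stands in for log real MPCE per AFE
  state  = factor(sample(paste0("s", 1:5), n, replace = TRUE)),
  sector = factor(sample(c("rural", "urban"), n, replace = TRUE))
)
d$pos <- rbinom(n, 1, plogis(-12 + 1.8 * d$x))                     # D_i
d$y   <- d$pos * rgamma(n, shape = 4, scale = exp(0.3 * d$x) / 4)  # intake

# Part 1: probability of positive intake, all households
fit_logit <- bam(
  pos ~ s(x) + s(x, state, bs = "fs", m = 1) + s(sector, bs = "re"),
  data = d, family = quasibinomial(link = "logit"),
  method = "fREML", discrete = TRUE, gamma = 1.4, select = TRUE
)

# Part 2: conditional mean, households with positive intake only
fit_gamma <- bam(
  y ~ s(x) + s(x, state, bs = "fs", m = 1) + s(sector, bs = "re"),
  data = d[d$y > 0, ], family = Gamma(link = "log"),
  method = "fREML", discrete = TRUE, gamma = 1.4, select = TRUE
)

# Hurdle expectation: E[Y] = Pr(Y > 0) * E[Y | Y > 0]
p_hat  <- predict(fit_logit, newdata = d, type = "response")
mu_hat <- predict(fit_gamma, newdata = d, type = "response")
ey_hat <- p_hat * mu_hat
```

Survey weights (Section 3.5) would enter through the `weights` argument of `bam()`; they are omitted here to keep the example short.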

3.5 Survey Weights

To account for the complex survey design, observations are weighted using per-capita survey weights:

w_{i}^{p c} = \frac{w_{i} \times {FDQ HH Size}_{i}}{\bar{w}}

where w_i is the original survey weight and the denominator normalizes by the mean of the numerator, producing weights that average to 1.
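As a small sketch of this normalization, with hypothetical weights and household sizes (the vector names are illustrative):

```r
w       <- c(120, 80, 200)  # hypothetical original survey weights w_i
hh_size <- c(4, 2, 5)       # hypothetical household sizes

w_raw <- w * hh_size           # numerator: weight times household size
w_pc  <- w_raw / mean(w_raw)   # divide by the mean of the numerator
```

Because the denominator is the mean of the numerator, the resulting weights average to 1 regardless of the scale of the original weights.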

4. Prediction Grid Construction

4.1 The Geographic Standardization Problem

Raw group means (e.g., average iron intake by religion) confound the focal effect with geographic and demographic composition. A religious group concentrated in states with higher cereal consumption will mechanically show different intake patterns even if there is no causal effect of religion on intake.

The prediction grid isolates focal effects by constructing counterfactual populations where non-focal variables are held at a standardized distribution while the focal variable and expenditure vary naturally.

4.2 Grid Construction by Comparison Type

The grid construction depends on which grouping variable is focal:

4.2.1 State Comparisons (group_vars = c("nss", "state_code"))

Each state retains its actual geographic structure (NSS regions, rural/urban mix). Demographics are standardized to the national distribution.

Geography: Keep actual state × region × sector structure with observed population weights
Demographics: Crossed with national demographic distribution (social group, religion, children, household head gender)
Weights: w_grid = w_geo × w_demo
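The crossing of geography with a standardized demographic distribution can be sketched as a Cartesian join. The tables, column names, and weights below are hypothetical; only the rule w_grid = w_geo × w_demo comes from the text.

```r
# Observed geographic cells for one state, with population shares
geo <- data.frame(
  region = c("r1", "r1", "r2"),
  sector = c("rural", "urban", "rural"),
  w_geo  = c(0.5, 0.2, 0.3)
)

# National demographic distribution (a single variable, for brevity)
demo <- data.frame(
  social = c("a", "b"),
  w_demo = c(0.7, 0.3)
)

# by = NULL gives the Cartesian product of the two tables
grid <- merge(geo, demo, by = NULL)
grid$w_grid <- grid$w_geo * grid$w_demo
```

If both sets of input weights sum to one, the grid weights also sum to one, so weighted averages over the grid need no further renormalization.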

4.2.2 NSS Region Comparisons (group_vars = c("nss", "nss_region"))

Two options are available. The standardized version gives all regions the same rural/urban mix; the unstandardized version keeps each region’s actual sector distribution.

Standardized: National sector distribution applied uniformly; demographics standardized nationally
Unstandardized: Keep actual region × sector mix; standardize only demographics

4.2.3 Sector Comparisons (group_vars = c("nss", "sector"))

The national state-region distribution is standardized so that the pure rural–urban effect is isolated:

Geography: National distribution of state × region applied equally to both sectors
Demographics: Standardized nationally

4.2.4 Demographic Comparisons (e.g., religion, social group, child status)

Full geographic standardization: the national distribution of states, regions, and sectors is applied uniformly to all demographic groups, isolating the effect of the focal variable.

4.3 Expenditure Binning

For each group, households are assigned to expenditure deciles using group-specific cutpoints from the pre-computed MPCE distribution:

b_{i} = 1 + \text{findInterval} ({MPCE}_{i}, \{c_{10}, c_{20}, \dots, c_{90}\})

Within each decile, the group-specific mean MPCE is used as the representative expenditure level (on log scale) for prediction. An “Overall” bin uses the population-weighted mean MPCE for each group.
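A minimal sketch of the binning rule for a single group, on simulated expenditure. Two choices here are assumptions made for illustration: the cutpoints are unweighted sample quantiles (the note uses a pre-computed MPCE distribution), and the representative level is taken as the log of the within-bin mean.

```r
set.seed(42)
mpce <- rlnorm(1000, meanlog = 8, sdlog = 0.6)  # simulated MPCE for one group

# Cutpoints c_10, ..., c_90
cuts <- quantile(mpce, probs = seq(0.1, 0.9, by = 0.1))

# b_i = 1 + findInterval(MPCE_i, cuts), giving bins 1..10
bin <- 1 + findInterval(mpce, cuts)

# Representative expenditure per decile, on the log scale
log_mpce_rep <- log(tapply(mpce, bin, mean))
```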

4.4 Seasonal Model Variation

For the micronutrient models, the seasonal variable is excluded from the prediction grid (season = "FALSE"), unlike the food consumption models. This is because micronutrient intake aggregates across all food sources and seasonal variation is absorbed into the expenditure–intake relationship.

5. Prevalence of Inadequacy

5.1 Person-Level Approach Using Household Demographics

A key feature of this analysis is that the prevalence of inadequacy is not computed from a single reference EAR/RDA for an adult woman. Instead, it uses the full demographic composition of each household (the age, sex, and physiological profile of every member) to derive a household-specific probability of inadequacy that reflects the actual requirements of the people consuming the food.¹¹,¹²

The procedure has three stages: (1) allocate household intake to individual members, (2) evaluate each member's probability of inadequacy against their person-specific requirement distribution, and (3) aggregate back to the household level for use as a GAM outcome.

5.2 Requirement Distributions

For each household member k with age-sex profile p, the ICMR-NIN reference tables³ provide a person-specific EAR_p and RDA_p. The standard deviation of the requirement distribution is derived from the relationship RDA = EAR + 2 × σ_req:

σ_{p} = \frac{{RDA}_{p} - {EAR}_{p}}{2}

For most nutrient-profile combinations, requirements are assumed to follow a normal distribution. The exception is iron for menstruating women and adolescent girls (adult women and girls aged 13–18), where requirements follow a log-normal distribution because menstrual iron losses are highly variable and right-skewed:¹⁵

σ_{p, log} = \frac{ln ({RDA}_{p}) - ln ({EAR}_{p})}{2}

The complete set of person-specific EAR and RDA values used in this analysis is shown in Figure 5.1 below. These reference values, drawn from the ICMR-NIN 2024 guidelines³, define the requirement distributions against which each household member's allocated intake is evaluated, and cover all thirteen demographic profiles. Note the substantially higher iron RDA for menstruating females (girls 13–18, adult women) relative to their EAR, reflecting the log-normal requirement distribution discussed above.

Figure 5.1 — ICMR-NIN EAR and RDA by Demographic Profile and Micronutrient

Source: ICMR-NIN (2024). Each segment represents one micronutrient. Iron for menstruating females uses a log-normal requirement distribution; all others normal.

5.3 Individual Probability of Inadequacy

Household intake is allocated to individual members in proportion to each member's energy requirement (following the intra-household allocation approach of Smith & Subandoro, 2007¹⁶):

Y_{k} = Y_{i}^{hh} \times \frac{E_{k}^{req}}{\sum_{j = 1}^{n_{i}} E_{j}^{req}}

where the household total Y^hh_i can be either the raw (absolute) or energy-adjusted intake (Section 2.6), producing separate prevalence estimates for each variant. The share weights sum to one within each household by construction.

The individual probability of inadequacy is then computed by evaluating each member's allocated intake against their person-specific requirement distribution:

π_{k} = \begin{cases} 1 - Φ (Y_{k}; {EAR}_{p}, σ_{p}^{2}) & \text{Normal case} \\ 1 - F_{LN} (Y_{k}; \ln {EAR}_{p}, σ_{p, log}^{2}) & \text{Iron, menstruating} \end{cases}

This gives the probability that a random draw from member k’s requirement distribution exceeds their allocated intake — that is, the probability that member k is nutrient-inadequate given what the household reports consuming.
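The allocation and probability steps can be sketched with base R distribution functions. The function name and the household iron intake are hypothetical; the adult-woman EAR and RDA values are those listed in Section 1.2, and the energy requirements come from Table 1.

```r
# Probability that a requirement drawn for this profile exceeds intake y.
prob_inadequate <- function(y, ear, rda, lognormal = FALSE) {
  if (lognormal) {
    sd_log <- (log(rda) - log(ear)) / 2          # sigma_{p,log}
    1 - plnorm(y, meanlog = log(ear), sdlog = sd_log)
  } else {
    sd_req <- (rda - ear) / 2                    # sigma_p
    1 - pnorm(y, mean = ear, sd = sd_req)
  }
}

# Allocate household iron intake by energy-requirement share
y_hh  <- 30                                      # hypothetical, mg/day
e_req <- c(man = 2710, woman = 2130, child_4_6 = 1360)
y_k   <- y_hh * e_req / sum(e_req)

# Iron for the adult woman: log-normal requirement, EAR 15 mg, RDA 29 mg
pi_woman_iron <- prob_inadequate(y_k[["woman"]], ear = 15, rda = 29, lognormal = TRUE)

# Zinc for an adult woman consuming exactly the RDA: normal requirement
pi_zinc_at_rda <- prob_inadequate(13.2, ear = 11, rda = 13.2)
```

By construction, an intake equal to the EAR gives a probability of 0.5 in both cases, and an intake equal to the RDA gives roughly 0.023.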

5.4 Household Aggregation

The household-level probability of inadequacy is the unweighted average across all members:

π_{i} = \frac{1}{n_{i}} \sum_{k = 1}^{n_{i}} π_{k}

This household-level probability is computed separately for the energy-adjusted and unadjusted intake allocations. Boundary values (exactly 0 or 1) are squeezed into the open interval (ε, 1−ε) with ε = 10⁻⁶ for compatibility with quasi-binomial regression.¹⁷
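A short sketch of the aggregation and boundary handling. The member probabilities are hypothetical, and implementing the squeeze as a simple clamp to [ε, 1 − ε] is an assumption about how the boundary adjustment is carried out.

```r
# Clamp proportions away from exactly 0 and 1
squeeze <- function(p, eps = 1e-6) pmin(pmax(p, eps), 1 - eps)

pi_k <- c(0.87, 0.10, 1.00)   # hypothetical member-level probabilities
pi_i <- squeeze(mean(pi_k))   # unweighted household average, then squeeze

# A household in which every member is certainly inadequate
pi_all <- squeeze(mean(c(1, 1)))
```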

5.5 Modeling Prevalence as a GAM Outcome

The household-level prevalence π_i is then used as the response variable in a quasi-binomial GAM with logit link, using the same GAM specification (Section 3.3) as the intake models:

logit (π_{i}) = f (log MPCE) + smooth terms + random effects

This allows the prevalence of inadequacy to vary smoothly across expenditure levels and demographic groups, with full posterior uncertainty quantification via the same simulation machinery (Section 6). Separate prevalence models are estimated for the energy-adjusted and unadjusted variants.

Comparison with the standard EAR cut-point method. The conventional approach¹¹ treats all individuals as having the same requirement (that of a reference adult woman) and asks whether the household's per-AFE intake falls below this single threshold. Our approach instead recognizes that a household with young children, adolescent girls, and adult men has a different mix of requirements than a household of adult women alone. By evaluating each member against their own EAR/RDA and averaging, we obtain a prevalence estimate that reflects the actual demographic composition of the household. This is particularly important for nutrients like iron, where requirements vary dramatically by age, sex, and menstrual status.

6. Posterior Simulation and Uncertainty Propagation

The analysis propagates uncertainty from two sources: (1) model uncertainty in the estimated smooth functions (coefficient uncertainty and dispersion parameter uncertainty), and (2) sampling uncertainty from the complex survey design. The approach exploits the Bayesian interpretation of penalized splines, where smoothing penalties correspond to improper Gaussian priors on the spline coefficients, yielding an approximate multivariate normal posterior for the coefficient vector.¹

6.1 Step 1: Covariance Matrix Validation

Before drawing coefficient vectors, the posterior covariance matrix V_p = Cov(β̂) is checked for positive definiteness:

λ_{min} (V_{p}) > 10^{- 10}

If the check fails, two repair strategies are available: (a) nearPD — project onto the nearest positive-definite matrix in the Frobenius norm (via Matrix::nearPD()); (b) jitter — add a small diagonal perturbation V + εI.
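The check and the two repair strategies might be wrapped as below. This is a sketch under stated assumptions: the function name is hypothetical, the matrix is symmetrized before the eigenvalue check, and the jitter size is chosen as |λ_min| + ε so that the repaired matrix is guaranteed to pass, which is one possible reading of "a small diagonal perturbation".

```r
repair_cov <- function(V, tol = 1e-10, strategy = c("nearPD", "jitter"), eps = 1e-8) {
  strategy <- match.arg(strategy)
  V <- (V + t(V)) / 2                      # remove numerical asymmetry
  lam_min <- min(eigen(V, symmetric = TRUE, only.values = TRUE)$values)
  if (lam_min > tol) return(V)             # already positive definite
  if (strategy == "nearPD") {
    as.matrix(Matrix::nearPD(V)$mat)       # nearest positive-definite matrix
  } else {
    V + (abs(lam_min) + eps) * diag(nrow(V))   # V + eps * I style jitter
  }
}

V_bad   <- matrix(1, 2, 2)                 # singular: eigenvalues 2 and 0
V_fixed <- repair_cov(V_bad, strategy = "jitter")
lam_fixed <- min(eigen(V_fixed, symmetric = TRUE, only.values = TRUE)$values)
```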

6.2 Step 2: Draw Coefficient Vectors

Coefficient draws are sampled from the approximate posterior:

β^{(s)} \sim N (\hat{β}, V_{p}), s = 1, \dots, M

Separate draw matrices are generated for each sub-model: B_L (M × p_L) for the logit model and B_Q (M × p_Q) for the quantity model. This is done independently for all three variants (main, without-cereals, Shannon), yielding 6 draw matrices per micronutrient per survey round.
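One way to generate such a draw matrix in base R is through a Cholesky factor of V_p. The function name and the toy inputs are hypothetical; for a fitted mgcv model the inputs would be `coef(fit)` and `vcov(fit)`.

```r
# Returns an M x p matrix whose rows are draws from N(beta_hat, Vp).
draw_coefs <- function(beta_hat, Vp, M) {
  p <- length(beta_hat)
  R <- chol(Vp)                        # upper triangular, Vp = t(R) %*% R
  Z <- matrix(rnorm(M * p), M, p)      # standard normal draws
  sweep(Z %*% R, 2, beta_hat, "+")     # rows have covariance t(R) %*% R = Vp
}

set.seed(1)
beta_hat <- c(1, -2)
Vp <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
B  <- draw_coefs(beta_hat, Vp, M = 20000)
```

With draws stored as rows, the linear predictor for a design matrix X is `X %*% t(B)`, which matches the (n × M) layout used in Step 4.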

6.3 Step 3: Draw Dispersion Parameters

The quantity model’s dispersion parameter is drawn from its sampling distribution:

Log-normal model: The residual variance σ² has a scaled inverse-chi-squared posterior:

σ^{(s)} = \sqrt{\frac{{\hat{σ}}^{2} \times ν}{χ_{ν}^{2}}}

The log-normal bias correction for each draw is δ^(s) = 0.5 × (σ^(s))².

Gamma model: The dispersion φ has a similar scaled distribution:

φ^{(s)} = \frac{\hat{φ} \times ν}{χ_{ν}^{2}}, {shape}^{(s)} = 1 / φ^{(s)}

where ν is the residual degrees of freedom from the fitted model.
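Both sets of draws follow directly from the formulas above. The point estimates, degrees of freedom, and number of draws in this sketch are hypothetical.

```r
set.seed(1)
M  <- 1000   # number of posterior draws
nu <- 500    # residual degrees of freedom (hypothetical)

# Log-normal model: sigma^(s) = sqrt(sigma2_hat * nu / chi2_nu)
sigma2_hat <- 0.36
sigma_s <- sqrt(sigma2_hat * nu / rchisq(M, df = nu))
delta_s <- 0.5 * sigma_s^2            # log-normal bias correction per draw

# Gamma model: phi^(s) = phi_hat * nu / chi2_nu, shape^(s) = 1 / phi^(s)
phi_hat <- 0.25
phi_s   <- phi_hat * nu / rchisq(M, df = nu)
shape_s <- 1 / phi_s
```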

6.4 Step 4: Chunked Household-Level Predictions

The prediction grid is processed in blocks (default: 5,000 rows) to manage memory. For each block:

Linear predictors via the lpmatrix:

η_{L} = X_{L} B_{L}^{T} \quad (n \times M), \qquad η_{Q} = X_{Q} B_{Q}^{T} \quad (n \times M)

Transform to response scale:

Probability: p^(s) = logit⁻¹(η_L)
Conditional mean (log-normal): μ^(s) = exp(η_Q + δ^(s))
Conditional mean (Gamma): μ^(s) = exp(η_Q)
Unconditional expected intake: E[Y]^(s) = p^(s) × μ^(s)

Posterior predictive draws for individual-level intake:

Log-normal: Y_pos^(s) ~ LogNormal(η_Q^(s), σ^(s))
Gamma: Y_pos^(s) ~ Gamma(shape^(s), scale = μ^(s) × φ^(s))
Unconditional: Z ~ Bernoulli(p^(s)); Y_uncond^(s) = Z × Y_pos^(s)

Inadequacy CDF evaluation (if enabled): For each of K = 50 requirement draws R_k (k = 1, …, K), the CDF of the positive-intake distribution is evaluated at R_k. The K values are then averaged and combined with the zero-probability term.
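The transformations for one block can be sketched with plain matrix operations, shown here for the log-normal case. All inputs are simulated stand-ins: in practice X_L and X_Q would come from `predict(..., type = "lpmatrix")`, and B_L, B_Q, and the σ draws from Steps 2 and 3.

```r
set.seed(1)
n <- 4; M <- 3                                   # block rows, posterior draws
X_L <- cbind(1, rnorm(n)); X_Q <- cbind(1, rnorm(n))
B_L <- matrix(rnorm(M * 2), M, 2)                # logit coefficient draws
B_Q <- matrix(rnorm(M * 2, sd = 0.1), M, 2)      # quantity coefficient draws
sigma_s <- c(0.50, 0.60, 0.55)                   # one sigma draw per column

eta_L <- X_L %*% t(B_L)                          # n x M
eta_Q <- X_Q %*% t(B_Q)                          # n x M

p_s  <- plogis(eta_L)                                  # probability
mu_s <- exp(sweep(eta_Q, 2, 0.5 * sigma_s^2, "+"))     # exp(eta_Q + delta^(s))
ey_s <- p_s * mu_s                                     # unconditional mean

# Posterior predictive draws; matrices are recycled element by element
sd_mat <- matrix(sigma_s, n, M, byrow = TRUE)
y_pos  <- matrix(rlnorm(n * M, meanlog = eta_Q, sdlog = sd_mat), n, M)
z      <- matrix(rbinom(n * M, 1, p_s), n, M)
y_unc  <- z * y_pos
```

`sweep()` is used because each column (draw) has its own bias correction δ^(s); adding the vector directly would recycle it down the rows instead.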

6.5 Step 5: Weighted Aggregation to Group × Decile

Within each block, household-level draws are accumulated into group-level weighted averages using normalized within-group weights:

{\tilde{w}}_{i} = \frac{w_{i}}{\sum_{j \in g} w_{j}}

The accumulator matrices (G × M) are pre-allocated for: probability, conditional mean, unconditional expected intake, posterior predictive (positive and unconditional), and inadequacy prevalence. Block-level contributions are added via weighted cross-products.
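The accumulation can be sketched as a weighted cross-product per block. The function and variable names are hypothetical, and the group weight totals are assumed to be computed once over the full grid before blocks are processed.

```r
# Adds one block's contribution to a G x M accumulator.
accumulate_block <- function(acc, draws, w, g, group_totals) {
  n <- nrow(draws)
  W <- matrix(0, n, nrow(acc))
  # W[i, g_i] = w_i / sum of weights in group g_i; zero elsewhere
  W[cbind(seq_len(n), g)] <- w / group_totals[g]
  acc + crossprod(W, draws)              # t(W) %*% draws, G x M
}

set.seed(1)
draws <- matrix(rnorm(8), 4, 2)          # 4 grid rows, M = 2 draws
w <- c(1, 3, 2, 2)                       # grid weights
g <- c(1, 1, 2, 2)                       # group index per row
group_totals <- as.numeric(tapply(w, g, sum))

acc <- matrix(0, 2, 2)
for (rows in list(1:2, 3:4)) {           # two blocks of two rows
  acc <- accumulate_block(acc, draws[rows, , drop = FALSE],
                          w[rows], g[rows], group_totals)
}

# Direct weighted means for comparison
direct <- rbind(colSums(draws[1:2, ] * w[1:2]) / group_totals[1],
                colSums(draws[3:4, ] * w[3:4]) / group_totals[2])
```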

6.6 Step 6: Survey Standard Errors

Plug-in predictions at the coefficient point estimates are computed for each household in the original survey data. A complex survey design object is created:

des <- svydesign(ids = ~psu, strata = ~strata, weights = ~wts, nest = TRUE)

Survey standard errors are obtained for each group × decile cell via svyby() with svymean(), separately for:

p̂ (consumption probability), μ̂ (conditional mean), E[Y] (unconditional mean)
Pr(inadequacy) (inadequacy prevalence, main model only)
All three model variants (main, without-cereals, Shannon)

The lonely PSU adjustment (survey.lonely.psu = "adjust") ensures stable variance estimates when a stratum contains only one PSU.

6.7 Step 7: Injecting Sampling Uncertainty

Survey standard errors are combined with model draws using scale-appropriate transformations:

Probability draws (logit scale, via delta method):

σ_{logit} = \frac{SE (\bar{p})}{\bar{p} (1 - \bar{p})}, {\tilde{p}}^{(s)} = {logit}^{- 1} [logit (p^{(s)}) + ε^{(s)}]

where ε^(s) ~ N(0, σ_logit²). Values are clipped to [10⁻⁶, 1 − 10⁻⁶].

Quantity and intake draws (log scale, mean-preserving):

σ_{log} = \frac{SE (\bar{q})}{\bar{q}}, {\tilde{q}}^{(s)} = q^{(s)} \times \exp (ε^{(s)} - 0.5 σ_{log}^{2})

The bias correction −0.5σ_log² ensures E[exp(ε − 0.5σ²)] = 1, so the noise is mean-preserving.

Inadequacy draws use the logit-scale transformation (same as probability draws), since inadequacy prevalence is a proportion.

Unconditional posterior predictive draws with sampling uncertainty are constructed by combining the noise-injected probability draws p̃_svy^(s) (the p̃^(s) defined above) with the original positive-intake posterior predictive draws Y_pos^(s):

Y_{uncond}^{(s)} = Z_{svy}^{(s)} \times Y_{pos}^{(s)}, Z_{svy}^{(s)} \sim Bernoulli ({\tilde{p}}_{svy}^{(s)})
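
In code, this combination amounts to one Bernoulli draw per simulation, multiplied by the matching positive-intake draw. The sketch below uses hypothetical inputs for both sets of draws.

```r
# Sketch only: unconditional posterior predictive draws for one cell.
set.seed(6)
M <- 5000

p_tilde <- plogis(rnorm(M, qlogis(0.8), 0.1))  # hypothetical noise-injected probabilities
y_pos   <- rgamma(M, shape = 4, rate = 0.4)    # hypothetical positive-intake draws

# One Bernoulli indicator per draw, with draw-specific success probability
z_svy    <- rbinom(M, size = 1, prob = p_tilde)
y_uncond <- z_svy * y_pos

print(c(share_zero = mean(y_uncond == 0), mean_uncond = mean(y_uncond)))
```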

6.8 Step 8: Summary Statistics

For each group × decile × model variant, the following summaries are extracted from the M posterior draws:

Statistic	Definition
Mean	(1/M) Σ_s θ^(s)
Median	50th percentile of {θ^(s)}
95% credible interval	[2.5th, 97.5th] percentiles of {θ^(s)}

Summaries are computed separately for: intake (main), inadequacy prevalence (main only), intake without cereals, and Shannon diversity.
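
A minimal sketch of the summary step, assuming the draws for each cell are stored as rows of a matrix (`theta` and the cell labels are hypothetical):

```r
# Sketch only: mean, median and 95% credible interval from M draws per cell.
set.seed(7)
M <- 1000

# Three hypothetical group x decile cells, one row each, M draws per row
theta <- matrix(rnorm(3 * M, mean = c(8, 10, 12), sd = 1), nrow = 3,
                dimnames = list(c("cell_1", "cell_2", "cell_3"), NULL))

summ <- data.frame(
  mean   = rowMeans(theta),
  median = apply(theta, 1, median),
  lower  = apply(theta, 1, quantile, probs = 0.025),  # 2.5th percentile
  upper  = apply(theta, 1, quantile, probs = 0.975)   # 97.5th percentile
)
print(summ)
```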

7. Computational Implementation

7.1 Batch Processing Architecture

The full analysis covers 10 micronutrients × 7 grouping variables = 70 jobs, where each job runs both survey rounds (HCES 2011–12 and 2023–24). Within each job, the three model variants (main, without-cereals, Shannon) are processed sequentially, sharing the same prediction grid.

The batch processor uses future_lapply() for parallel execution across items and grouping variables:

batch_generate_mn_figure_data(
  items = c("iron", "folate", "zinc", ...),
  grouping_vars = c("state_code", "sector", "rel", ...),
  base_dir = base_dir,
  n_workers = NULL   # auto-detected
)

7.2 Adaptive Resource Management

Worker count is determined by the minimum of three constraints:

W_{\text{opt}} = \min\left(n_{\text{cores}} - 2,\ \frac{\text{RAM} - 4\ \text{GB}}{2\ \text{GB/job}},\ n_{\text{jobs}}\right)

The system resource utility detects the number of physical and logical cores and the available memory, then recommends the optimal worker count.
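
The worker-count rule can be sketched as a small helper. The function name `optimal_workers`, its argument names, the rounding down of the memory term, and the floor of one worker are assumptions made for this example. The core count could come from `parallel::detectCores()`, while available RAM is passed in directly because base R has no portable way to query it.

```r
# Sketch only: minimum of the core, memory and job-count constraints.
optimal_workers <- function(n_cores, ram_gb, n_jobs,
                            reserve_cores = 2, reserve_gb = 4, gb_per_job = 2) {
  by_cores  <- n_cores - reserve_cores                   # leave 2 cores free
  by_memory <- floor((ram_gb - reserve_gb) / gb_per_job) # 2 GB per job after 4 GB reserve
  # Assumed safeguard: never return fewer than one worker
  max(1L, as.integer(min(by_cores, by_memory, n_jobs)))
}

optimal_workers(n_cores = 16, ram_gb = 64, n_jobs = 70)  # cores bind: 14
optimal_workers(n_cores = 8,  ram_gb = 12, n_jobs = 70)  # memory binds: 4
optimal_workers(n_cores = 2,  ram_gb = 4,  n_jobs = 70)  # floor applies: 1
```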

7.3 Output File Organization

Each micronutrient gets its own directory with per-grouping-variable output files:

~/data/bam_models/micronutrient/AFE_energy/<nutrient>/
   ├── <nutrient>_data_list.RData          # Raw intake data
   ├── <nutrient>_model.RData              # Fitted hurdle models (6 per round)
   ├── data_mn_intake_<nutrient>_sector.RData
   ├── data_mn_intake_<nutrient>_rel.RData
   ├── data_mn_intake_<nutrient>_social.RData
   └── ... (7 grouping variables)

Each output file contains summary statistics (mean, median, 95% CI) for all model variants, ready for visualization.

8. Visualization

8.1 Plot Structure

Figures display micronutrient intake (or inadequacy prevalence) across expenditure deciles, stratified by demographic group. Each plot contains:

Expenditure curves with 95% credible intervals (ribbons) for each group
Key points at the bottom decile, overall mean, and top decile
Mean labels with group-specific values annotated via ggrepel
Reference lines for EAR and RDA benchmarks (intake plots)
Title showing the micronutrient name, EAR, and RDA values

For state-level comparisons, states are grouped by geographic region (Northern, Southern, Eastern, Western, North-Eastern, Central), each rendered as a separate panel.

8.2 Output Formats

Plots are produced as:

Individual ggplot objects for flexible composition
Combined multi-panel images using magick for publication
Excel spreadsheets with the underlying data and variable descriptions

Glossary

Term	Definition
AFE (Adult Female Equivalent)	Household size measure scaled by ICMR-NIN age-sex-specific requirements; converts each member to equivalent adult females
BAM	`mgcv::bam()`, the mgcv routine for fitting GAMs to very large datasets ("Big Additive Model"); uses fast REML and discretized covariates
Coefficient of Variation (CV)	SE / mean; used to convert standard errors to the log scale (σ_log = CV)
Delta Method	Approximation for the variance of a transformed variable: Var(g(θ)) ≈ [g′(θ)]² Var(θ)
EAR	Estimated Average Requirement — the median nutrient requirement; intake below EAR indicates a >50% probability of inadequacy. In this analysis, person-specific EAR values from ICMR-NIN are used for each household member's age-sex profile
Energy Adjustment (Nutrient Density Scaling)	Scaling household micronutrient intake by the ratio of required to reported energy, holding diet composition constant. Equivalent to multiplying nutrient density (per kcal) by the caloric requirement. Follows the FAO nutrient density framework (1998) and Vossenaar et al. (2020)
Factor-Smooth Interaction	`bs = "fs"` in mgcv — allows each level of a factor to have its own smooth curve of a continuous predictor, with a shared penalty
fREML	Fast Restricted Maximum Likelihood — efficient method for estimating smoothing parameters in large-sample GAMs
Geographic Standardization	Holding the geographic (state/region/sector) distribution constant across comparison groups to isolate focal effects
Hurdle Model	Two-part model: (1) probability of positive outcome, (2) distribution of positive values; allows structural zeros
ICMR-NIN	Indian Council of Medical Research – National Institute of Nutrition; source of dietary requirements and food composition data
Inadequacy Prevalence	Average probability across household members that individual intake falls below person-specific requirements; uses age-sex-specific EAR/RDA from ICMR-NIN for each member, with a normal requirement distribution (log-normal for iron in menstruating females)
Intra-Household Allocation	Distribution of household-level intake to individual members in proportion to their energy requirements (Smith & Subandoro, 2007)
lpmatrix	Linear predictor matrix X such that η = Xβ; enables vectorized computation of predictions across all simulation draws
Mean-Preserving Noise	Multiplicative noise exp(ε − 0.5σ²) with E[·] = 1; injects variance without shifting the mean
Posterior Predictive Draw	Simulated observation combining parameter uncertainty (coefficient draws) with observation-level variability (distributional draws)
Prediction Grid	Counterfactual population with standardized non-focal variables; used to compute comparable group-level estimates
RDA	Recommended Dietary Allowance — nutrient intake sufficient for 97.5% of individuals; equals EAR + 2 × σ_req (this analysis uses 2 rather than 1.96)
Scaling Factor (SF)	Ratio of a household's total energy requirement (summed across members) to its reported caloric intake; SF > 1 indicates energy under-reporting (the common case)
Shannon Diversity Index	H = −Σ p_j ln(p_j); measures evenness of micronutrient sources across food categories

Data Sources

Source	Description
HCES 2011–12	Household Consumer Expenditure Survey, NSS 68th Round (NSSO)
HCES 2023–24	Household Consumer Expenditure Survey (MoSPI)
ICMR-NIN Requirements	Nutrient requirements (EAR, RDA) by age-sex profile, from `nin_requirements.dta`; provides person-specific EAR and RDA for each micronutrient across all demographic profiles
Household Demographic Roster	Individual-level records of household members with age, sex, physiological profile, and ICMR-NIN energy requirements; from `HCES20XX_AFE.RData`
Food Composition Tables	Micronutrient content per food item, from ICMR-NIN Indian Food Composition Tables
General Price Index	State-level price deflators for converting nominal MPCE to constant 2011–12 prices

References

[1] Wood, S.N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). Chapman and Hall/CRC.

[2] Cragg, J.G. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica, 39(5), 829–844.

[3] ICMR-NIN (2024). Recommended Dietary Allowances and Estimated Average Requirements for Indians. Indian Council of Medical Research – National Institute of Nutrition, Hyderabad.

[4] Longvah, T., Ananthan, R., Bhaskarachary, K. & Venkaiah, K. (2017). Indian Food Composition Tables. National Institute of Nutrition, Hyderabad.

[5] Vijayakumar, A., Dubasi, H.B., Awasthi, A. & Jaacks, L.M. (2024). Development of an Indian Food Composition Database. Current Developments in Nutrition, 8(7), 103790.

[6] Shannon, C.E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.

[7] Mullahy, J. (1998). Much ado about two: reconsidering retransformation and the two-part model in health econometrics. Journal of Health Economics, 17(3), 247–281.

[8] Gelman, A. (2006). Multilevel (hierarchical) modeling: What it can and can't do. Technometrics, 48(3), 432–435.

[9] Wood, S.N., Goude, Y. & Shaw, S. (2015). Generalized additive models for large data sets. Journal of the Royal Statistical Society: Series C, 64(1), 139–155.

[10] Li, R. & Liang, H. (2008). Variable selection in semiparametric regression modeling. Annals of Statistics, 36(1), 261–286.

[11] Institute of Medicine (2000). Dietary Reference Intakes: Applications in Dietary Assessment. National Academies Press. The EAR cut-point method assumes that the intake distribution and the requirement distribution are independent, and that the requirement distribution is approximately symmetric.

[12] Beaton, G.H. (1994). Criteria of an adequate diet. In M.E. Shils, J.A. Olson & M. Shike (Eds.), Modern Nutrition in Health and Disease (8th ed., pp. 1491–1505). Lea & Febiger.

[13] FAO/WHO (1998). Preparation and Use of Food-Based Dietary Guidelines. WHO Technical Report Series No. 880. Expresses nutrient requirements as densities per 1,000 kcal to define compositional adequacy of diets assuming sufficient energy is consumed.

[14] Vossenaar, M., Doak, C.M., et al. (2020). Nutrient Density as a Dimension of Dietary Quality: Findings of the Nutrient Density Approach in a Multi-Center Evaluation. Nutrients, 12(6), 1792. Formalizes the "critical nutrient density" framework: nutrient requirement / energy requirement.

[15] Institute of Medicine (2001). Dietary Reference Intakes for Vitamin A, Vitamin K, Arsenic, Boron, Chromium, Copper, Iodine, Iron, Manganese, Molybdenum, Nickel, Silicon, Vanadium, and Zinc. National Academies Press. Documents the log-normal distribution of iron requirements for menstruating women.

[16] Smith, L.C. & Subandoro, A. (2007). Measuring Food Security Using Household Expenditure Surveys. Food Security in Practice Technical Guide Series No. 3. International Food Policy Research Institute (IFPRI). Proportional allocation of household intake to members based on energy requirement shares.

[17] Smithson, M. & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1), 54–71.