Methodology Note

Modeling Micronutrient Intake in India

GAM-Based Estimation of Nutrient Intake, Energy Adjustment, Dietary Diversity, and Prevalence of Inadequacy
Dr Shamika Ravi (Member, EAC to PM) & Dr Mudit Kapoor (CECFEE, EPU, ISI-Delhi Center)
February 2026

1. Introduction

This document explains how we model and predict household-level micronutrient intake, in both absolute and energy-adjusted forms, using data from the Household Consumer Expenditure Survey (HCES) 2011–12 and 2023–24. Intake is benchmarked against ICMR-NIN age-sex-specific nutrient requirements for each household member. The framework uses semi-parametric Generalized Additive Models (GAMs) [1] within a two-stage hurdle structure [2] to quantify how much of each essential micronutrient Indian households obtain from their diets, how intake varies across income levels and demographic groups, and what share of the population falls below nutritional requirements. The prevalence of inadequacy is computed at the person level using demographic-specific EAR/RDA values and then aggregated to the household level for modeling.

1.1 What Do We Estimate?

The analysis produces four families of estimates for each micronutrient, each available in both absolute (unadjusted) and energy-adjusted variants:

  1. Probability of positive intake: What share of the population obtains any amount of a given micronutrient from their diet?

  2. Conditional intake level: Among households with positive intake, how much of the micronutrient (in the relevant unit per day) is consumed per Adult Female Equivalent (AFE)?

  3. Unconditional expected intake: The product of the two quantities above, i.e. the population-average daily intake including zero-intake households, across expenditure deciles and demographic groups.

  4. Prevalence of inadequacy: What fraction of the population has usual intake below their individual nutrient requirement, computed using person-specific EAR and RDA values derived from each household member's age-sex profile?

The absolute view shows what households actually consume; the energy-adjusted view scales intake to each household's caloric requirement while holding diet composition constant, which separates the compositional adequacy of the diet from the total quantity of food eaten. Both views are produced for intake and for prevalence of inadequacy. The Shannon Diversity Index is scale-free and therefore requires no energy adjustment.

In addition, two supplementary analyses are run for each micronutrient: intake excluding cereals (to assess dietary source diversity) and the Shannon Diversity Index of food sources contributing to micronutrient intake.

1.2 Micronutrients Analyzed

The analysis covers the nine micronutrients listed below, benchmarked against ICMR-NIN dietary requirements [3] for an adult woman (55 kg, moderately active, non-lactating):

Micronutrient | Unit | Description
Iron | mg/day | Essential for oxygen transport; deficiency causes anaemia
Folate | µg/day | Critical for cell division and neural tube development
Zinc | mg/day | Supports immune function and wound healing
Vitamin B1 (Thiamine) | mg/day | Required for carbohydrate metabolism
Vitamin B2 (Riboflavin) | mg/day | Involved in energy production and cell function
Vitamin B3 (Niacin) | mg/day | Supports metabolism and DNA repair
Vitamin B6 | mg/day | Required for amino acid metabolism
Vitamin C | mg/day | Antioxidant; supports immune function
Calcium | mg/day | Essential for bone health

Inadequacy prevalence is computed for all micronutrients that have defined EAR values in the ICMR-NIN reference tables. While the dashboard displays benchmarks for a reference adult woman (55 kg, moderately active, non-pregnant and non-lactating), the underlying prevalence calculation uses person-specific EAR and RDA values for each household member (Section 5).

Micronutrient | EAR | RDA | Units
Iron | 15 | 29 | mg
Vitamin B9 (Folate) | 180 | 220 | mcg
Zinc | 11 | 13.2 | mg
Vitamin B1 | 1.4 | 1.7 | mg
Vitamin B2 | 2 | 2.4 | mg
Vitamin B3 | 12 | 14 | mg
Vitamin B6 | 1.6 | 1.9 | mg
Vitamin C | 55 | 65 | mg
Calcium | 800 | 1000 | mg

Source: ICMR-NIN Recommended Dietary Allowances for Indians (2024). Reference values shown are for an adult woman, 55 kg, moderately active, and non-lactating. The prevalence of inadequacy analysis (Section 5) uses person-specific EAR and RDA values derived from each household member's age-sex profile.

2. Data Construction

2.1 From Food Quantities to Micronutrient Intake

Micronutrient intake is not directly observed in the HCES. Instead, it is derived by combining three data sources:

  1. Household food quantities from the HCES (amount of each food item consumed over the reference period)
  2. Food composition tables4 that map food items to micronutrient content per unit weight
  3. Reference period data to convert consumption from weekly/monthly reporting windows to daily values

For each household i, food item j, and micronutrient m, the daily household-level micronutrient intake is:

$$Q_{ij}^{m} = \frac{q_{ij} \times c_{j}^{m}}{T_{j}}$$

where $q_{ij}$ is the quantity of food $j$ consumed by household $i$ over the reference period, $c_{j}^{m}$ is the content of micronutrient $m$ in food $j$ per unit weight, and $T_{j}$ is the reference period in days for food item $j$. Micronutrient content values are drawn from the Indian Food Composition Tables (IFCT) published by ICMR-NIN [4], supplemented by Vijayakumar et al. [5].

Total daily household intake of micronutrient m sums across all food sources, grouped into 10 food categories (cereals & millets, green leafy vegetables, other vegetables, roots & tubers (excluding potatoes), fruits, milk & milk products, fats & oils, oilseeds & nuts, pulses & beans, and flesh foods (eggs/fish/meat)):

$$Y_{i}^{hh} = \sum_{j=1}^{10} Q_{ij}^{m}$$
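A minimal base-R sketch of the two formulas above, assuming a long-format table with one row per household × food item. The column names (hhid, category, qty, content_per_unit, ref_days) and all values are hypothetical; they are not HCES or IFCT field names.

```r
# Hypothetical long-format consumption data: one row per household x food item.
# qty              = quantity consumed over the item's reference period (q_ij)
# content_per_unit = micronutrient content per unit weight (c_j^m)
# ref_days         = reference period in days (T_j)
food <- data.frame(
  hhid             = c(1, 1, 1, 2, 2),
  category         = c("cereals", "pulses", "milk", "cereals", "fruits"),
  qty              = c(30, 7, 28, 42, 3.5),
  content_per_unit = c(4, 6, 0.2, 4, 1),
  ref_days         = c(30, 7, 7, 30, 7)
)

# Daily intake contributed by each item: Q_ij^m = q_ij * c_j^m / T_j
food$Q <- food$qty * food$content_per_unit / food$ref_days

# Total daily household intake Y_i^hh: sum over the household's food items
Y_hh <- tapply(food$Q, food$hhid, sum)
Y_hh
```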

2.2 Adult Female Equivalent (AFE) Scaling

To make intake comparable across households of different sizes and demographic compositions, we express intake per AFE using energy-based equivalence scales:

$$Y_{i} = \frac{Y_{i}^{hh}}{\text{HH Size}^{(\text{AFE Energy})}_{i}}$$

The energy-based AFE scale converts each household member’s energy requirement to the equivalent number of adult females, based on ICMR-NIN age-sex-specific energy requirements [3]. This produces a per-person measure that accounts for the varying nutritional needs within a household.

The energy requirements and corresponding AFE scale factors used in this analysis are shown in Table 1. The reference category is the adult woman (moderate activity), whose requirement of 2,130 kcal/day defines one AFE unit.

Table 1: ICMR-NIN Energy Requirements and AFE Scale Factors
Demographic Profile | Activity Level | Energy Requirement (kcal/day) | AFE Scale Factor
Child 0 to 12 mo | – | 595 | 0.28
Child 1 to 3 yrs | – | 1,110 | 0.52
Child 4 to 6 yrs | – | 1,360 | 0.64
Child 7 to 9 yrs | – | 1,700 | 0.80
Girls 10 to 12 yrs | – | 2,060 | 0.97
Boys 10 to 12 yrs | – | 2,220 | 1.04
Girls 13 to 15 yrs | – | 2,400 | 1.13
Boys 13 to 15 yrs | – | 2,860 | 1.34
Girls 16 to 18 yrs | – | 2,500 | 1.17
Boys 16 to 18 yrs | – | 3,320 | 1.56
Adult women (reference) | Moderate | 2,130 | 1.00
Adult women (lactating) | Moderate | 2,690 | 1.26
Adult men | Moderate | 2,710 | 1.27

For example, a household comprising one adult man, one adult woman, and one child aged 4–6 would have an AFE household size of 1.27 + 1.00 + 0.64 = 2.91, rather than a simple headcount of 3. Dividing total household intake by this AFE size yields a per-AFE measure that is comparable across households of different demographic compositions.
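The arithmetic of this example can be reproduced in a few lines of R. This is a sketch: the named vector and the 40 mg/day household intake are illustrative, and the two-decimal rounding matches Table 1, whereas the Step 3 calculation described below keeps the unrounded ratios.

```r
# Energy requirements (kcal/day, Table 1) for the example household:
# one adult man, one non-lactating adult woman, one child aged 4-6.
energy_req <- c(adult_man = 2710, adult_woman = 2130, child_4_6 = 1360)

# AFE scale factor: requirement relative to the 2,130 kcal/day reference woman
afe <- energy_req / 2130

hh_size_afe       <- sum(round(afe, 2))  # 1.27 + 1.00 + 0.64 = 2.91 (Table 1 rounding)
hh_size_afe_exact <- sum(afe)            # 6200 / 2130, about 2.911 (unrounded)

# Per-AFE intake for a hypothetical household iron intake of 40 mg/day
Y_hh      <- 40
Y_per_afe <- Y_hh / hh_size_afe_exact
```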

Implementation Details

The AFE scale is constructed at the individual level from the HCES person-level roster, which records each household member's age and gender. The assignment proceeds in three steps:

Step 1 — Lactation status imputation. The HCES does not directly record lactation status. We proxy it using the presence of a child under 12 months in the household. If at least one child aged <1 is present, the youngest woman aged 19–49 in the household is classified as lactating and assigned the higher energy requirement of 2,690 kcal/day (= 2,130 base + 560 lactation increment, where 560 is the average of the ICMR-NIN increments of 600 kcal for 0–6 months and 520 kcal for 6–12 months postpartum). If there are n children under 1, then up to n women (ordered youngest-first) are assigned lactating status. Remaining adult women, as well as all women aged 50 and above, receive the non-pregnant, non-lactating requirement of 2,130 kcal/day.

Step 2 — Energy requirement assignment. Each household member is mapped to an age-sex-specific energy requirement from Table 1 using the following rules: children are classified solely by age (the infant category uses the average of the ICMR 0–6 month and 6–12 month values); adolescents aged 10–18 are further stratified by gender; and all adults are assigned the moderate activity level. Adult men (and transgender individuals coded as gender = 3) receive the adult male requirement of 2,710 kcal/day regardless of age.

Step 3 — AFE conversion and household aggregation. Each member's AFE scale factor is computed as $\text{AFE}_{k} = E_{k}^{req} / 2130$. The household AFE size is then the sum across all members: $\text{HH Size}^{(\text{AFE})}_{i} = \sum_{k} \text{AFE}_{k}$. An individual's within-household energy share is $\text{share}_{k} = \text{AFE}_{k} / \text{HH Size}^{(\text{AFE})}_{i}$, which is used in subsequent micronutrient allocation steps.

The R implementation is shown below:

library(dplyr)

# Person-level roster: one row per household member.
# gender codes: "1" = male, "2" = female, "3" = transgender.
HCES2023_AFE_energy <- HCES2023_level02 %>%
  dplyr::select(hhid, person_sno, gender, age) %>%
  # Sort so that, within each household and gender, members are youngest-first
  arrange(hhid, gender, age) %>%
  # Step 1: flag infants (age < 1) and count them in each household
  mutate(under_1_a = ifelse(age < 1, 1, 0)) %>%
  group_by(hhid) %>%
  mutate(
    child_under_1   = max(under_1_a),   # 1 if the household has any infant
    n_child_under_1 = sum(under_1_a)    # number of infants
  ) %>%
  ungroup() %>%
  # Women aged 19-49 are the candidates for imputed lactation status
  mutate(
    women_18_50 = dplyr::case_when(
      (age >= 19 & age < 50) & gender == "2" ~ 1,
      .default = 0
    )
  ) %>%
  # seq ranks candidate women youngest-first within each household
  group_by(hhid, women_18_50) %>%
  mutate(seq = row_number()) %>%
  ungroup() %>%
  # Step 2: age-sex-specific energy requirement in kcal/day (Table 1)
  mutate(
    energy_requirement = dplyr::case_when(
      age < 1                                          ~ 595,
      age >= 1  & age <= 3                             ~ 1110,
      age >= 4  & age <= 6                             ~ 1360,
      age >= 7  & age <= 9                             ~ 1700,
      (age >= 10 & age <= 12) & (gender %in% c("1","3")) ~ 2220,
      (age >= 10 & age <= 12) & (gender == "2")        ~ 2060,
      (age >= 13 & age <= 15) & (gender %in% c("1","3")) ~ 2860,
      (age >= 13 & age <= 15) & (gender == "2")        ~ 2400,
      (age >= 16 & age <= 18) & (gender %in% c("1","3")) ~ 3320,
      (age >= 16 & age <= 18) & (gender == "2")        ~ 2500,
      (age >= 19) & (gender %in% c("1","3"))           ~ 2710,
      (age >= 19) & (gender == "2") & (child_under_1 == 0) ~ 2130,
      # Lactating: the first n_child_under_1 candidate women (youngest-first)
      (age >= 19 & age < 50) & (gender == "2") &
        (child_under_1 == 1) & (seq <= n_child_under_1)   ~ 2690,
      # Remaining candidate women in households with an infant
      (age >= 19 & age < 50) & (gender == "2") &
        (child_under_1 == 1) & (seq > n_child_under_1)    ~ 2130,
      (age >= 50) & (gender == "2")                    ~ 2130
    )
  ) %>%
  # Step 3: AFE factor and within-household energy share
  mutate(
    AFE_energy   = energy_requirement / 2130,
    share_energy = AFE_energy / sum(AFE_energy),
    .by = hhid
  )

2.3 Intake Without Cereals

For each micronutrient, a parallel variable is constructed that excludes the cereal contribution:

$$Y_{i}^{wo} = \frac{Y_{i}^{hh} - Q_{i,\text{cereal}}^{m}}{\text{HH Size}^{(\text{AFE Energy})}_{i}}$$

This decomposition is important because cereals dominate Indian diets and can mask micronutrient source diversity. For iron, for example, a high total intake may reflect heavy cereal consumption (with low bioavailability) rather than diverse dietary sources.
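A small numerical illustration of this point, with invented category-level iron intakes for one household, comparing the total per-AFE intake with the without-cereals measure and the cereal share of the total.

```r
# Hypothetical daily iron intake (mg) of one household by food category
Q <- c(cereals = 18.0, pulses = 4.5, green_leafy_veg = 2.0, other = 1.5)
hh_size_afe <- 2.91   # AFE household size (e.g. the Table 1 example household)

Y_per_afe    <- sum(Q) / hh_size_afe                     # Y_i (all sources)
Y_wo_cereals <- (sum(Q) - Q[["cereals"]]) / hh_size_afe  # Y_i^wo
share_cereal <- Q[["cereals"]] / sum(Q)                  # share of iron from cereals
```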

2.4 Shannon Diversity Index of Food Sources

To capture the diversity of dietary sources contributing to micronutrient intake, we compute the Shannon Diversity Index [6] across the 10 food categories:

$$H_{i} = -\sum_{j=1}^{10} p_{ij} \ln p_{ij}$$

where $p_{ij} = Q_{ij}^{m} / \sum_{k} Q_{ik}^{m}$ is the share of micronutrient $m$ that household $i$ derives from food category $j$. Zero-share categories are excluded from the sum. $H_{i} = 0$ when all intake comes from a single food group, and $H_{i} = \ln(10) \approx 2.30$ when intake is equally distributed across all 10 categories.
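A short R function implementing the index as defined above; it assumes the input is the vector of intake contributed by each food category and requires positive total intake (zero-intake households are handled by the hurdle's probability part). The function name is illustrative.

```r
# Shannon diversity of the food sources of one micronutrient.
# Q: intake contributed by each of the 10 food categories.
shannon_index <- function(Q) {
  stopifnot(sum(Q) > 0)
  p <- Q / sum(Q)        # p_ij: share of intake from each category
  p <- p[p > 0]          # zero-share categories are excluded from the sum
  -sum(p * log(p))
}

shannon_index(c(10, rep(0, 9)))   # all intake from one group: 0
shannon_index(rep(1, 10))         # equal shares: 2.302585 (= log(10))
```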

2.5 Consumption Indicators and Sample Restrictions

The analysis is restricted to households with cooking arrangements (excluding those coded as “no cooking”), since micronutrient intake from purchased/consumed food is meaningful only for households that prepare meals. A binary consumption indicator is defined for each model variant:

$$D_{i} = \begin{cases} 1 & \text{if } Y_{i} > 0 \\ 0 & \text{otherwise} \end{cases}$$

2.6 Energy Adjustment

Absolute micronutrient intake is strongly correlated with total food consumption: households that eat more food mechanically obtain more of every nutrient. To disentangle diet composition from diet quantity, we apply a nutrient density scaling approach [13, 14] that projects each household's intake onto its caloric requirement.

Scaling factor

For each household i, we compute a scaling factor as the ratio of the household's total energy requirement (summed across all members using ICMR-NIN age-sex-specific requirements) to its reported total caloric intake:

$$SF_{i} = \frac{\sum_{k=1}^{n_{i}} E_{k}^{req}}{E_{i}^{reported}}$$

where $E_{k}^{req}$ is the energy requirement of member $k$ and $E_{i}^{reported}$ is the household's total reported caloric intake. A scaling factor greater than 1 indicates the household reports consuming fewer calories than its members require (the common case, consistent with well-documented energy under-reporting); a scaling factor less than 1 indicates over-reporting.

Energy-adjusted intake

The energy-adjusted household micronutrient intake is then:

$$Y_{i}^{adj} = Y_{i}^{hh} \times SF_{i}$$

This is mathematically equivalent to computing the nutrient density of the diet (micronutrient per calorie) and multiplying by the household's caloric requirement. The counterfactual it answers is: if this household consumed exactly its required calories while maintaining the same diet composition, how much of each micronutrient would it obtain?
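The equivalence can be checked numerically. In this sketch the household's energy requirements, reported calories and iron intake are hypothetical; the code computes the scaling factor and confirms that scaling intake equals nutrient density times the caloric requirement.

```r
# Hypothetical household: members' energy requirements (kcal/day),
# reported caloric intake, and unadjusted iron intake (mg/day).
E_req      <- c(2710, 2130, 1360)
E_reported <- 5000
Y_hh       <- 40

SF    <- sum(E_req) / E_reported   # scaling factor SF_i (> 1: under-reporting)
Y_adj <- Y_hh * SF                 # energy-adjusted intake Y_i^adj

# Equivalent view: nutrient density (mg per kcal) times caloric requirement
density   <- Y_hh / E_reported
Y_adj_alt <- density * sum(E_req)
stopifnot(isTRUE(all.equal(Y_adj, Y_adj_alt)))
```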

Interpretation

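The dashboard presents both views as complementary lenses for policy: the absolute view shows what households actually consume, while the energy-adjusted view shows what they would obtain if they met their caloric requirement with their current diet composition.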

Note: The Shannon Diversity Index is invariant to energy adjustment because it depends only on the proportional shares of micronutrient sources across food categories, not on absolute intake levels. It is therefore computed only once per household.

3. Model Specification

3.1 Overview: Three Model Variants per Micronutrient

For each micronutrient, three parallel hurdle models are estimated:

Variant | Outcome | Purpose
Main | Total intake per AFE (all food sources) | Primary intake estimate + inadequacy assessment
Without cereals | Total intake per AFE excluding cereals | Measures non-cereal dietary quality
Shannon | Shannon Diversity Index of food sources | Captures dietary source diversity

Each variant consists of two sub-models (probability and quantity), producing six GAM models per micronutrient per survey round. The full model suite is estimated twice — once on energy-adjusted data and once on unadjusted data — with the logit participation model and Shannon models shared (since the sign of intake and proportional shares are invariant to energy scaling).

3.2 The Hurdle Model Framework

Each model variant follows the same two-stage hurdle structure [2, 7]. Let $Y_{i}$ denote the outcome for household $i$ (intake, intake without cereals, or Shannon index). The expected value is:

$$E[Y_{i}] = \Pr(Y_{i} > 0) \times E[Y_{i} \mid Y_{i} > 0] = p_{i} \times \mu_{i}$$

The two components are estimated separately:

Part 1 — Probability model (logit sub-model): Estimated on all households using a quasi-binomial GAM with logit link.

Part 2 — Quantity model (positive sub-model): Estimated only on households with Yi > 0 using either a log-normal or Gamma(log) GAM. Model selection between the two families is based on AIC comparison.
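A base-R sketch of the two-part structure on simulated data, with glm() and lm() standing in for the GAMs. One detail is an assumption of this sketch rather than something stated above: a Gaussian model fitted to log(y) and a Gamma model fitted to y have AICs on different response scales, so the log-normal AIC is shifted by the Jacobian term 2 Σ log y_i before the two are compared.

```r
set.seed(1)
n   <- 2000
x   <- rnorm(n)                                    # stand-in for log real MPCE
pos <- rbinom(n, 1, plogis(1 + 0.8 * x))           # does the household have intake?
y   <- ifelse(pos == 1, rlnorm(n, meanlog = 1 + 0.5 * x, sdlog = 0.6), 0)
dat <- data.frame(x = x, y = y, pos = as.integer(y > 0))

# Part 1: probability of positive intake, all households
m_logit <- glm(pos ~ x, family = quasibinomial(link = "logit"), data = dat)

# Part 2: two candidate quantity models, positive households only
dpos    <- dat[dat$y > 0, ]
m_lnorm <- lm(log(y) ~ x, data = dpos)                         # log-normal
m_gamma <- glm(y ~ x, family = Gamma(link = "log"), data = dpos)

# Express the log-normal AIC on the scale of y (Jacobian of log) before comparing
aic_lnorm <- AIC(m_lnorm) + 2 * sum(log(dpos$y))
aic_gamma <- AIC(m_gamma)
chosen    <- if (aic_lnorm < aic_gamma) "lognormal" else "gamma"

# E[Y] = Pr(Y > 0) * E[Y | Y > 0], evaluated at x = 0
nd <- data.frame(x = 0)
p  <- unname(predict(m_logit, newdata = nd, type = "response"))
mu <- if (chosen == "lognormal") {
  unname(exp(predict(m_lnorm, newdata = nd) + 0.5 * sigma(m_lnorm)^2))
} else {
  unname(predict(m_gamma, newdata = nd, type = "response"))
}
EY <- p * mu
```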

3.3 GAM Specification

Both sub-models share the same semi-parametric GAM predictor structure [1] (illustrated here for the main variant):

Logit sub-model (probability of positive intake):

$$\text{logit}(p_{i}) = \beta_{0} + f(x) + \sum_{k} f_{k}(x, z_{k}) + \text{random effects}$$

Quantity sub-model (conditional mean for positive values):

$$g(\mu_{i}) = \beta_{0} + f(x) + \sum_{k} f_{k}(x, z_{k}) + \text{random effects}$$

where $x = \log(\text{MPCE}_{\text{real, AFE}})$ is log real monthly per capita expenditure in AFE terms, and $g(\cdot)$ is the log link for both log-normal and Gamma models.

3.3.1 Smooth Terms and Random Effects

The predictor structure includes:

Term | Type | Purpose
f(x) | Thin-plate spline | Baseline expenditure–intake curve
f(x, state) | Factor-smooth interaction (bs = "fs") | State-specific expenditure curves
f(x, child) | Factor-smooth interaction | Curves for households with/without children
f(x, female_headed) | Factor-smooth interaction | Curves by household head gender
f(x, social) | Factor-smooth interaction | Curves by social group (caste)
f(x, rel) | Factor-smooth interaction | Curves by religion
s(sector) | Random intercept (bs = "re") | Rural vs. urban shift
s(nss_region) | Random intercept | NSS region effect
s(reg_sector) | Random intercept | Region × sector interaction
s(social) | Random intercept | Social group intercept
s(rel) | Random intercept | Religion intercept
s(child) | Random intercept | Children-in-household intercept
s(female_headed) | Random intercept | Female-headed household intercept

All factor-smooth interactions use m = 1 (first-order penalty), which allows group-specific curves to deviate from the population curve subject to a roughness penalty. This hierarchical structure provides adaptive regularisation: groups with limited data are shrunk toward the population curve, while data-rich groups are allowed to deviate more freely [8].
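As a sketch, the terms in the table can be written as a single mgcv formula. Variable names follow the table, `y` is a placeholder response and `x` stands for log real MPCE per AFE; this illustrates the term structure and is not the pipeline's actual formula object. Creating a formula does not evaluate s(), so the snippet runs without loading mgcv.

```r
gam_formula <- y ~
  s(x) +                                    # baseline curve (thin-plate spline)
  s(x, state, bs = "fs", m = 1) +           # state-specific curves
  s(x, child, bs = "fs", m = 1) +           # with/without children
  s(x, female_headed, bs = "fs", m = 1) +   # household head gender
  s(x, social, bs = "fs", m = 1) +          # social group
  s(x, rel, bs = "fs", m = 1) +             # religion
  s(sector, bs = "re") + s(nss_region, bs = "re") + s(reg_sector, bs = "re") +
  s(social, bs = "re") + s(rel, bs = "re") +
  s(child, bs = "re") + s(female_headed, bs = "re")

length(attr(terms(gam_formula), "term.labels"))  # 13
```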

3.4 Estimation

All models are estimated using mgcv::bam() with the following settings [1]:

Parameter | Value | Rationale
method | "fREML" | Fast REML for large-sample smoothing parameter estimation
discrete | TRUE | Discretized covariate method for datasets > 100,000 observations [9]
gamma | 1.4 | Extra penalty on effective degrees of freedom to prevent overfitting
gc.level | 2 | Aggressive garbage collection to manage memory
select | TRUE | Allows smooth terms to be penalized to zero (automatic variable selection) [10]

Probability model: family = quasibinomial(link = "logit") — quasi-likelihood allows for over/under-dispersion relative to the binomial.

Quantity model: Two candidate families are estimated, a log-normal model (Gaussian on log intake) and a Gamma model with log link, and the one with lower AIC is selected.

3.5 Survey Weights

To account for the complex survey design, observations are weighted using per-capita survey weights:

$$w_{i}^{pc} = \frac{w_{i} \times \text{FDQ} \times \text{HH Size}_{i}}{\bar{w}}$$

where $w_{i}$ is the original survey weight and $\bar{w}$, the mean of the numerator across households, normalizes the weights so that they average to 1.
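A minimal sketch of Sections 3.4 and 3.5 in R: a helper for the per-capita weights and a wrapper that passes the listed settings to mgcv::bam(). The function names and the assumption that the weights are stored in a column called `wpc` are hypothetical; FDQ enters exactly as in the formula above, without further interpretation.

```r
# Per-capita survey weights normalised to mean 1.
# w = original survey weight, fdq = the FDQ factor, hh_size = household size.
per_capita_weights <- function(w, fdq, hh_size) {
  num <- w * fdq * hh_size
  num / mean(num)
}

# Fit one hurdle part with the bam() settings of Section 3.4.
# 'data' is assumed to contain a 'wpc' column of per-capita weights, e.g.
# fit_part(f, dat, quasibinomial(link = "logit")) or Gamma(link = "log").
fit_part <- function(formula, data, family) {
  mgcv::bam(formula, data = data, family = family, weights = wpc,
            method = "fREML", discrete = TRUE, gamma = 1.4,
            gc.level = 2, select = TRUE)
}
```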

4. Prediction Grid Construction

4.1 The Geographic Standardization Problem

Raw group means (e.g., average iron intake by religion) confound the focal effect with geographic and demographic composition. A religious group concentrated in states with higher cereal consumption will mechanically show different intake patterns even if there is no causal effect of religion on intake.

The prediction grid isolates focal effects by constructing counterfactual populations where non-focal variables are held at a standardized distribution while the focal variable and expenditure vary naturally.

4.2 Grid Construction by Comparison Type

The grid construction depends on which grouping variable is focal:

4.2.1 State Comparisons (group_vars = c("nss", "state_code"))

Each state retains its actual geographic structure (NSS regions, rural/urban mix). Demographics are standardized to the national distribution.

4.2.2 NSS Region Comparisons (group_vars = c("nss", "nss_region"))

Two options are available. The standardized version gives all regions the same rural/urban mix; the unstandardized version keeps each region’s actual sector distribution.

4.2.3 Sector Comparisons (group_vars = c("nss", "sector"))

The national state-region distribution is standardized so that the pure rural–urban effect is isolated.

4.2.4 Demographic Comparisons (e.g., religion, social group, child status)

Full geographic standardization: the national distribution of states, regions, and sectors is applied uniformly to all demographic groups, isolating the effect of the focal variable.
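A sketch of the fully standardized case in base R: the national distribution of state × sector cells is computed once and then crossed with every level of the focal variable, so all groups carry identical geographic weights. The household table, its column names and the two-religion example are hypothetical, and expenditure bins (Section 4.3) would be attached to each group afterwards.

```r
# Hypothetical household-level data with survey weights
hh <- data.frame(
  rel    = c("A", "A", "A", "B", "B", "B"),
  state  = c("S1", "S1", "S2", "S2", "S2", "S3"),
  sector = c("rural", "urban", "rural", "rural", "urban", "urban"),
  w      = c(3, 1, 2, 2, 1, 1)
)

# 1. National distribution of the non-focal geography (state x sector cells)
geo <- aggregate(w ~ state + sector, data = hh, FUN = sum)
geo$geo_share <- geo$w / sum(geo$w)
geo$w <- NULL

# 2. Cross every level of the focal variable with that same distribution
focal <- data.frame(rel = unique(hh$rel))
grid  <- merge(focal, geo, by = NULL)   # Cartesian product

# Each focal group now carries identical geographic weights summing to 1
tapply(grid$geo_share, grid$rel, sum)
```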

4.3 Expenditure Binning

For each group, households are assigned to expenditure deciles using group-specific cutpoints from the pre-computed MPCE distribution:

$$b_{i} = 1 + \texttt{findInterval}\left(\text{MPCE}_{i},\ \{c_{10}, c_{20}, \ldots, c_{90}\}\right)$$

Within each decile, the group-specific mean MPCE is used as the representative expenditure level (on log scale) for prediction. An “Overall” bin uses the population-weighted mean MPCE for each group.
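A base-R sketch of the binning for one group. The MPCE values and weights are simulated, unweighted quantiles stand in for the pre-computed cutpoints, and using survey-weighted decile means is an assumption of this sketch (the text specifies weighting only for the "Overall" bin).

```r
set.seed(42)
mpce <- rlnorm(1000, meanlog = 8, sdlog = 0.5)   # hypothetical MPCE, one group
w    <- runif(1000, 0.5, 1.5)                    # hypothetical survey weights

# Group-specific cutpoints c10, ..., c90 (unweighted quantiles as a stand-in)
cuts <- quantile(mpce, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)

# b_i = 1 + findInterval(MPCE_i, cutpoints): decile index 1..10
b <- 1 + findInterval(mpce, cuts)

# Representative expenditure per decile (weighted mean, used on the log scale)
rep_mpce <- sapply(split(seq_along(mpce), b),
                   function(idx) weighted.mean(mpce[idx], w[idx]))
log_rep  <- log(rep_mpce)

# "Overall" bin: population-weighted mean MPCE for the group
log_overall <- log(weighted.mean(mpce, w))
```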

4.4 Seasonal Model Variation

For the micronutrient models, the seasonal variable is excluded from the prediction grid (season = "FALSE"), unlike the food consumption models. This is because micronutrient intake aggregates across all food sources and seasonal variation is absorbed into the expenditure–intake relationship.

5. Prevalence of Inadequacy

5.1 Person-Level Approach Using Household Demographics

A key feature of this analysis is that the prevalence of inadequacy is not computed from a single reference EAR/RDA for an adult woman. Instead, it exploits the full demographic composition of each household (the age, sex, and physiological profile of every member) to derive a household-specific probability of inadequacy that reflects the actual requirements of the people consuming the food [11, 12].

The procedure has three stages: (1) allocate household intake to individual members, (2) evaluate each member's probability of inadequacy against their person-specific requirement distribution, and (3) aggregate back to the household level for use as a GAM outcome.

5.2 Requirement Distributions

For each household member $k$ with age-sex profile $p$, the ICMR-NIN reference tables [3] provide a person-specific $\text{EAR}_{p}$ and $\text{RDA}_{p}$. The standard deviation of the requirement distribution is derived from the relationship $\text{RDA} = \text{EAR} + 2\sigma_{req}$:

$$\sigma_{p} = \frac{\text{RDA}_{p} - \text{EAR}_{p}}{2}$$

For most nutrient-profile combinations, requirements are assumed to follow a normal distribution. The exception is iron for menstruating women and adolescent girls (adult women and girls aged 13–18), whose requirements follow a log-normal distribution because menstrual iron losses are highly variable and right-skewed [15]:

$$\sigma_{p,\log} = \frac{\ln(\text{RDA}_{p}) - \ln(\text{EAR}_{p})}{2}$$
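The two standard deviations are simple functions of EAR and RDA. This sketch evaluates them for the reference adult woman's folate and iron values from the table in Section 1.2; the function names are illustrative.

```r
# Requirement SD implied by RDA = EAR + 2 * sigma (normal case)
req_sd <- function(ear, rda) (rda - ear) / 2

# Log-scale SD for log-normally distributed requirements (iron, menstruating)
req_sd_log <- function(ear, rda) (log(rda) - log(ear)) / 2

req_sd(180, 220)     # folate: 20 mcg
req_sd_log(15, 29)   # iron: about 0.33 on the log scale
```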

The complete set of person-specific EAR and RDA values used in this analysis, covering thirteen demographic profiles, is shown in Figure 5.1 below. These reference values, drawn from the ICMR-NIN 2024 guidelines [3], define the requirement distributions against which each household member's allocated intake is evaluated. Note the substantially higher iron RDA relative to EAR for menstruating females (girls 13–18, adult women), reflecting the log-normal requirement distribution discussed above.

Figure 5.1 — ICMR-NIN EAR and RDA by Demographic Profile and Micronutrient

Source: ICMR-NIN (2024). Each segment represents one micronutrient. Iron for menstruating females uses a log-normal requirement distribution; all others are normal.

5.3 Individual Probability of Inadequacy

Household intake is allocated to individual members in proportion to each member's energy requirement, following the intra-household allocation approach of Smith & Subandoro (2007) [16]:

$$Y_{k} = Y_{i}^{hh} \times \frac{E_{k}^{req}}{\sum_{j=1}^{n_{i}} E_{j}^{req}}$$

where the household total $Y_{i}^{hh}$ can be either the raw (absolute) or energy-adjusted intake (Section 2.6), producing separate prevalence estimates for each variant. The share weights sum to one within each household by construction.

The individual probability of inadequacy is then computed by evaluating each member's allocated intake against their person-specific requirement distribution:

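$$\pi_{k} = \begin{cases} 1 - \Phi\left(Y_{k};\ \text{EAR}_{p},\ \sigma_{p}^{2}\right) & \text{normal case} \\ 1 - F_{LN}\left(Y_{k};\ \ln \text{EAR}_{p},\ \sigma_{p,\log}^{2}\right) & \text{iron, menstruating} \end{cases}$$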

This gives the probability that a random draw from member k’s requirement distribution exceeds their allocated intake — that is, the probability that member k is nutrient-inadequate given what the household reports consuming.
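A worked sketch of the allocation and the two-case probability for a hypothetical three-member household (adult man, non-lactating adult woman, child aged 4–6). Only the adult woman's iron EAR and RDA (15 and 29 mg) come from the table in Section 1.2; the other members' values are placeholders, not ICMR-NIN figures.

```r
Y_hh  <- 40                        # household iron intake, mg/day (hypothetical)
E_req <- c(2710, 2130, 1360)       # energy requirements, kcal/day (Table 1)
ear   <- c(11, 15, 8)              # EARs: man and child are placeholders
rda   <- c(19, 29, 13)             # RDAs: man and child are placeholders
lognormal <- c(FALSE, TRUE, FALSE) # iron requirement of the menstruating woman

# Allocate household intake in proportion to energy requirements
Y_k <- Y_hh * E_req / sum(E_req)

# pi_k = Pr(requirement > allocated intake)
pi_k <- ifelse(
  lognormal,
  1 - plnorm(Y_k, meanlog = log(ear), sdlog = (log(rda) - log(ear)) / 2),
  1 - pnorm(Y_k, mean = ear, sd = (rda - ear) / 2)
)
```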

5.4 Household Aggregation

The household-level probability of inadequacy is the unweighted average across all members:

$$\pi_{i} = \frac{1}{n_{i}} \sum_{k=1}^{n_{i}} \pi_{k}$$

This household-level probability is computed separately for the energy-adjusted and unadjusted intake allocations. Boundary values (exactly 0 or 1) are squeezed into the open interval $(\epsilon, 1-\epsilon)$ with $\epsilon = 10^{-6}$ for compatibility with quasi-binomial regression [17].
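A short sketch of the aggregation step. Clamping the mean to [ε, 1 − ε] is one simple way to implement the squeeze described above and is an assumption of this sketch; it leaves interior values unchanged.

```r
# Household-level probability: unweighted mean over members,
# then clamped to [eps, 1 - eps] for the quasi-binomial GAM.
household_inadequacy <- function(pi_k, eps = 1e-6) {
  pi_i <- mean(pi_k)
  min(max(pi_i, eps), 1 - eps)
}

household_inadequacy(c(0.05, 0.61, 0.38))  # 0.3466667
household_inadequacy(c(1, 1, 1))           # 0.999999
```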

5.5 Modeling Prevalence as a GAM Outcome

The household-level prevalence πi is then used as the response variable in a quasi-binomial GAM with logit link, using the same GAM specification (Section 3.3) as the intake models:

logit(πi) = f(log MPCE) + smooth terms + random effects

This allows the prevalence of inadequacy to vary smoothly across expenditure levels and demographic groups, with full posterior uncertainty quantification via the same simulation machinery (Section 6). Separate prevalence models are estimated for the energy-adjusted and unadjusted variants.

Comparison with the standard EAR cut-point method. The conventional approach [11] treats all individuals as having the same requirement (that of a reference adult woman) and asks whether the household's per-AFE intake falls below this single threshold. Our approach instead recognizes that a household with young children, adolescent girls, and adult men has a different mix of requirements than a household of adult women alone. By evaluating each member against their own EAR/RDA and averaging, we obtain a prevalence estimate that reflects the actual demographic composition of the household. This is particularly important for nutrients like iron, where requirements vary dramatically by age, sex, and menstrual status.

6. Posterior Simulation and Uncertainty Propagation

The analysis propagates uncertainty from two sources: (1) model uncertainty in the estimated smooth functions (coefficient uncertainty and dispersion parameter uncertainty), and (2) sampling uncertainty from the complex survey design. The approach exploits the Bayesian interpretation of penalized splines, where smoothing penalties correspond to improper Gaussian priors on the spline coefficients, yielding an approximate multivariate normal posterior for the coefficient vector [1].

6.1 Step 1: Covariance Matrix Validation

Before drawing coefficient vectors, the posterior covariance matrix Vp = Cov(β̂) is checked for positive definiteness:

$$\lambda_{\min}(V_{p}) > 10^{-10}$$

If the check fails, two repair strategies are available: (a) nearPD — project onto the nearest positive-definite matrix in the Frobenius norm (via Matrix::nearPD()); (b) jitter — add a small diagonal perturbation V + εI.
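A base-R sketch of the eigenvalue check with the jitter repair. The starting jitter of 1e-8 and the tenfold increase per attempt are assumptions of this sketch; Matrix::nearPD() would be the projection-based alternative and is not used here.

```r
# Return V unchanged if its smallest eigenvalue exceeds tol; otherwise add
# an increasing diagonal jitter eps * I until it does.
ensure_pd <- function(V, tol = 1e-10, eps = 1e-8, max_tries = 10) {
  for (i in seq_len(max_tries)) {
    lambda_min <- min(eigen(V, symmetric = TRUE, only.values = TRUE)$values)
    if (lambda_min > tol) return(V)
    V   <- V + eps * diag(nrow(V))
    eps <- eps * 10
  }
  stop("covariance matrix could not be made positive definite")
}

V_singular <- matrix(1, 2, 2)   # rank 1: smallest eigenvalue is 0
V_fixed    <- ensure_pd(V_singular)
```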

6.2 Step 2: Draw Coefficient Vectors

Coefficient draws are sampled from the approximate posterior:

$$\beta^{(s)} \sim N\left(\hat{\beta},\ V_{p}\right), \quad s = 1, \ldots, M$$

Separate draw matrices are generated for each sub-model: $B_{L}$ ($M \times p_{L}$) for the logit model and $B_{Q}$ ($M \times p_{Q}$) for the quantity model. This is done independently for all three variants (main, without-cereals, Shannon), yielding 6 draw matrices per micronutrient per survey round.
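A base-R sketch of one draw matrix using a Cholesky factor of V_p; the coefficient vector and covariance are hypothetical. mgcv also provides rmvn() for multivariate normal draws.

```r
# Draw M coefficient vectors from N(beta_hat, Vp); returns an M x p matrix.
draw_coefs <- function(beta_hat, Vp, M) {
  R <- chol(Vp)                                   # Vp = t(R) %*% R
  Z <- matrix(rnorm(M * length(beta_hat)), nrow = M)
  sweep(Z %*% R, 2, beta_hat, "+")                # rows have covariance Vp
}

set.seed(7)
beta_hat <- c(1.2, -0.4)
Vp <- matrix(c(0.04, 0.01, 0.01, 0.09), 2)
B  <- draw_coefs(beta_hat, Vp, M = 5000)
```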

6.3 Step 3: Draw Dispersion Parameters

The quantity model’s dispersion parameter is drawn from its sampling distribution:

Log-normal model: The residual variance σ² has a scaled inverse-chi-squared posterior:

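$$\sigma^{2(s)} = \hat{\sigma}^{2} \times \frac{\nu}{\chi^{2}_{\nu}}$$

The log-normal bias correction for each draw is $\delta^{(s)} = 0.5 \times \sigma^{2(s)}$.

Gamma model: The dispersion $\phi$ has a similar scaled distribution:

$$\phi^{(s)} = \hat{\phi} \times \frac{\nu}{\chi^{2}_{\nu}}, \qquad \text{shape}^{(s)} = 1/\phi^{(s)}$$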

where ν is the residual degrees of freedom from the fitted model.
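A base-R sketch of both dispersion draws using the scaled inverse-chi-squared form above. The point estimates (0.36 and 0.25) and ν = 500 are hypothetical.

```r
# Scaled inverse-chi-squared draws: theta_hat * nu / chi^2_nu
draw_dispersion <- function(theta_hat, nu, M) theta_hat * nu / rchisq(M, df = nu)

set.seed(11)
M  <- 4000
nu <- 500                                  # residual degrees of freedom

# Log-normal model: residual variance draws and bias correction
sigma2_s <- draw_dispersion(0.36, nu, M)
delta_s  <- 0.5 * sigma2_s

# Gamma model: dispersion draws and implied shape
phi_s   <- draw_dispersion(0.25, nu, M)
shape_s <- 1 / phi_s
```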

6.4 Step 4: Chunked Household-Level Predictions

The prediction grid is processed in blocks (default: 5,000 rows) to manage memory. For each block:

Linear predictors via the lpmatrix:

$$\eta_{L} = X_{L} B_{L}^{T} \ (n \times M), \qquad \eta_{Q} = X_{Q} B_{Q}^{T} \ (n \times M)$$

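Transform to response scale: $p^{(s)} = \text{logit}^{-1}(\eta_{L}^{(s)})$, and $\mu^{(s)} = \exp(\eta_{Q}^{(s)} + \delta^{(s)})$ for the log-normal model or $\mu^{(s)} = \exp(\eta_{Q}^{(s)})$ for the Gamma model.

Posterior predictive draws for individual-level intake: for each draw $s$, a positive intake value is simulated from the fitted quantity distribution (log-normal with log-scale variance $\sigma^{2(s)}$, or Gamma with shape $\text{shape}^{(s)}$ and mean $\mu^{(s)}$).

Inadequacy CDF evaluation (if enabled): For each of the $K = 50$ requirement draws $R_{k}$, the CDF of the positive-intake distribution is evaluated at $R_{k}$ and averaged over $k$. Because zero intake falls below any positive requirement, the result is combined with the zero-probability term as $\pi^{(s)} = (1 - p^{(s)}) + p^{(s)} \times \frac{1}{K} \sum_{k=1}^{K} F_{pos}^{(s)}(R_{k})$.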
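A base-R sketch of one block, assuming a log-normal quantity model. In practice the design matrices would come from mgcv's predict(..., type = "lpmatrix"); here X_L, X_Q, the coefficient draws and the requirement draws R_k are small hypothetical stand-ins, and the 5,000-row block is reduced to four rows.

```r
set.seed(3)
n <- 4; M <- 200; K <- 50

# Hypothetical lpmatrix rows (intercept + log MPCE) and coefficient draws
X_L <- cbind(1, c(-1, 0, 1, 2))
X_Q <- X_L
B_L <- cbind(rnorm(M, 1.0, 0.10), rnorm(M, 0.5, 0.05))   # M x p_L
B_Q <- cbind(rnorm(M, 2.5, 0.05), rnorm(M, 0.3, 0.02))   # M x p_Q
sigma2_s <- 0.36 * 500 / rchisq(M, df = 500)              # log-normal variance draws

# Linear predictors, one column per posterior draw (n x M)
eta_L <- X_L %*% t(B_L)
eta_Q <- X_Q %*% t(B_Q)

# Response scale: probability of positive intake and conditional mean
p_s  <- plogis(eta_L)
mu_s <- exp(sweep(eta_Q, 2, 0.5 * sigma2_s, "+"))        # log-normal bias correction

# Inadequacy: average the positive-intake CDF over K requirement draws,
# then add zero-intake households, which always fall below requirement
R_k    <- pmax(rnorm(K, mean = 15, sd = 7), 1e-8)         # hypothetical requirements
sd_mat <- matrix(sqrt(sigma2_s), n, M, byrow = TRUE)      # column s holds sigma^(s)
F_pos  <- matrix(0, n, M)
for (k in seq_len(K)) {
  F_pos <- F_pos + plnorm(R_k[k], meanlog = eta_Q, sdlog = sd_mat) / K
}
inad_s <- (1 - p_s) + p_s * F_pos
```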

6.5 Step 5: Weighted Aggregation to Group × Decile

Within each block, household-level draws are accumulated into group-level weighted averages using normalized within-group weights:

$$\tilde{w}_{i} = \frac{w_{i}}{\sum_{j \in g} w_{j}}$$

The accumulator matrices (G × M) are pre-allocated for: probability, conditional mean, unconditional expected intake, posterior predictive (positive and unconditional), and inadequacy prevalence. Block-level contributions are added via weighted cross-products.
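A base-R sketch of the accumulation with hypothetical groups, weights and draws. The within-group weights are normalised over the whole group before chunking, so block contributions can simply be summed; splitting the six rows into two blocks gives the same result as a single cross-product.

```r
set.seed(5)
n <- 6; M <- 100
group <- c("rural", "rural", "urban", "urban", "urban", "rural")
w     <- c(2, 1, 3, 1, 1, 2)                       # survey weights
draws <- matrix(rexp(n * M, rate = 0.1), n, M)     # household-level draws (n x M)

# Normalised within-group weights, computed over the full group
w_tilde <- w / ave(w, group, FUN = sum)

# n x G weight matrix: column g holds w_tilde for members of group g, else 0
G_levels <- sort(unique(group))
W <- sapply(G_levels, function(g) ifelse(group == g, w_tilde, 0))

# Accumulate block contributions via weighted cross-products (G x M)
acc    <- matrix(0, length(G_levels), M, dimnames = list(G_levels, NULL))
blocks <- split(seq_len(n), rep(1:2, each = 3))    # two blocks of three rows
for (rows in blocks) {
  acc <- acc + crossprod(W[rows, , drop = FALSE], draws[rows, , drop = FALSE])
}
```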

6.6 Step 6: Survey Standard Errors

Plug-in predictions at the coefficient point estimates are computed for each household in the original survey data. A complex survey design object is created:

des <- survey::svydesign(ids = ~psu, strata = ~strata, weights = ~wts, nest = TRUE,
                         data = hh_preds)  # hh_preds: household plug-in predictions (placeholder name)

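Survey standard errors are obtained for each group × decile cell via svyby() with svymean(), separately for the probability of positive intake, the conditional mean, unconditional expected intake, and inadequacy prevalence.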

The lonely PSU adjustment (survey.lonely.psu = "adjust") ensures stable variance estimates when a stratum contains only one PSU.

6.7 Step 7: Injecting Sampling Uncertainty

Survey standard errors are combined with model draws using scale-appropriate transformations:

Probability draws (logit scale, via delta method):

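$$\sigma_{\text{logit}} = \frac{SE(\bar{p})}{\bar{p}(1 - \bar{p})}, \qquad \tilde{p}^{(s)} = \text{logit}^{-1}\left[\text{logit}(p^{(s)}) + \varepsilon^{(s)}\right]$$

where $\varepsilon^{(s)} \sim N(0, \sigma_{\text{logit}}^{2})$. Values are clipped to $[10^{-6}, 1 - 10^{-6}]$.

Quantity and intake draws (log scale, mean-preserving):

$$\sigma_{\log} = \frac{SE(\bar{q})}{\bar{q}}, \qquad \tilde{q}^{(s)} = q^{(s)} \times \exp\left(\varepsilon^{(s)} - 0.5\,\sigma_{\log}^{2}\right)$$

The bias correction $-0.5\,\sigma_{\log}^{2}$ ensures $E[\exp(\varepsilon - 0.5\sigma^{2})] = 1$ for $\varepsilon \sim N(0, \sigma^{2})$, so the noise is mean-preserving.

Inadequacy draws use the logit-scale transformation (same as probability draws), since inadequacy prevalence is a proportion.

Unconditional posterior predictive draws with sampling uncertainty are constructed by combining the noise-injected probability draws with the original posterior predictive quantity draws:

$$Y_{uncond}^{(s)} = Z_{svy}^{(s)} \times Y_{pos}^{(s)}, \qquad Z_{svy}^{(s)} \sim \text{Bernoulli}\left(\tilde{p}_{svy}^{(s)}\right)$$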
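A base-R sketch of the three transformations above. The model draws, plug-in estimates (p̄ = 0.6, q̄ = 12) and survey standard errors are all hypothetical values chosen for illustration.

```r
set.seed(9)
M <- 2000
p_s   <- rbeta(M, 60, 40)              # model draws of a group probability
q_s   <- rlnorm(M, log(12), 0.05)      # model draws of a group mean intake
p_bar <- 0.6;  se_p <- 0.015           # plug-in survey estimate and its SE
q_bar <- 12;   se_q <- 0.4

# Probability: delta-method SD on the logit scale, noise added, then clipped
sd_logit <- se_p / (p_bar * (1 - p_bar))
p_tilde  <- plogis(qlogis(p_s) + rnorm(M, 0, sd_logit))
p_tilde  <- pmin(pmax(p_tilde, 1e-6), 1 - 1e-6)

# Quantity: log-scale SD (the CV) with mean-preserving multiplicative noise
sd_log  <- se_q / q_bar
q_tilde <- q_s * exp(rnorm(M, 0, sd_log) - 0.5 * sd_log^2)

# Unconditional predictive draws: Bernoulli(p_tilde) times positive-intake draws
y_pos    <- rlnorm(M, log(12), 0.6)    # hypothetical posterior predictive draws
z_svy    <- rbinom(M, 1, p_tilde)
y_uncond <- z_svy * y_pos
```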

6.8 Step 8: Summary Statistics

For each group × decile × model variant, the following summaries are extracted from the M posterior draws:

Statistic | Definition
Mean | $(1/M) \sum_{s} \theta^{(s)}$
Median | 50th percentile of $\{\theta^{(s)}\}$
95% credible interval | [2.5th, 97.5th] percentiles of $\{\theta^{(s)}\}$

Summaries are computed separately for: intake (main), inadequacy prevalence (main only), intake without cereals, and Shannon diversity.

7. Computational Implementation

7.1 Batch Processing Architecture

The full analysis covers 10 micronutrients × 7 grouping variables = 70 jobs, where each job runs both survey rounds (HCES 2011–12 and 2023–24). Within each job, the three model variants (main, without-cereals, Shannon) are processed sequentially, sharing the same prediction grid.

The batch processor uses future_lapply() for parallel execution across items and grouping variables:

batch_generate_mn_figure_data(
  items = c("iron", "folate", "zinc", ...),
  grouping_vars = c("state_code", "sector", "rel", ...),
  base_dir = base_dir,
  n_workers = NULL   # auto-detected
)

7.2 Adaptive Resource Management

Worker count is determined by the minimum of three constraints:

$$W_{opt} = \min\left(\frac{n_{cores}}{2},\ \frac{\text{RAM} - 4\ \text{GB}}{2\ \text{GB/job}},\ n_{jobs}\right)$$

The system resource utility detects physical and logical cores, available memory, and recommends the optimal worker count.
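A base-R sketch of the worker rule. The reserved 4 GB and 2 GB per job follow the formula above; rounding down to whole workers, the minimum of one worker, and passing available RAM in rather than detecting it are assumptions of this sketch. The function name is hypothetical.

```r
# Worker count as the minimum of three constraints (cores, memory, jobs)
optimal_workers <- function(ram_gb, n_jobs,
                            n_cores = parallel::detectCores(logical = FALSE)) {
  if (is.na(n_cores)) n_cores <- 1
  w <- min(n_cores / 2, (ram_gb - 4) / 2, n_jobs)
  max(1L, as.integer(floor(w)))     # at least one whole worker
}

optimal_workers(ram_gb = 64, n_jobs = 70, n_cores = 16)  # 8
```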

7.3 Output File Organization

Each micronutrient gets its own directory with per-grouping-variable output files:

~/data/bam_models/micronutrient/AFE_energy/<nutrient>/
   ├── <nutrient>_data_list.RData          # Raw intake data
   ├── <nutrient>_model.RData              # Fitted hurdle models (6 per round)
   ├── data_mn_intake_<nutrient>_sector.RData
   ├── data_mn_intake_<nutrient>_rel.RData
   ├── data_mn_intake_<nutrient>_social.RData
   └── ... (7 grouping variables)

Each output file contains summary statistics (mean, median, 95% CI) for all model variants, ready for visualization.

8. Visualization

8.1 Plot Structure

Figures display micronutrient intake (or inadequacy prevalence) across expenditure deciles, stratified by demographic group. Each plot contains:

For state-level comparisons, states are grouped by geographic region (Northern, Southern, Eastern, Western, North-Eastern, Central), each rendered as a separate panel.

8.2 Output Formats

Plots are produced as:

Glossary

AFE (Adult Female Equivalent): Household size measure scaled by ICMR-NIN age-sex-specific requirements; converts each member to an equivalent number of adult females.
BAM: mgcv::bam(), the mgcv function for fitting generalized additive models to very large datasets ("big additive models"); here fitted with fast REML and discretized covariates.
Coefficient of Variation (CV): SE / mean; used to convert standard errors to the log scale, since by the delta method σ_log ≈ CV.
Delta Method: Approximation for the variance of a transformed variable: Var(g(θ)) ≈ [g′(θ)]² Var(θ).
EAR: Estimated Average Requirement, the median nutrient requirement; usual intake below the EAR implies a >50% probability of inadequacy. In this analysis, person-specific EAR values from ICMR-NIN are used for each household member's age-sex profile.
Energy Adjustment (Nutrient Density Scaling): Scaling household micronutrient intake by the ratio of required to reported energy, holding diet composition constant. Equivalent to multiplying nutrient density (per kcal) by the caloric requirement. Follows the FAO/WHO nutrient density framework (1998) and Vossenaar et al. (2020).
Factor-Smooth Interaction: bs = "fs" in mgcv; allows each level of a factor to have its own smooth curve of a continuous predictor, with a shared penalty.
fREML: Fast Restricted Maximum Likelihood, an efficient method for estimating smoothing parameters in large-sample GAMs.
Geographic Standardization: Holding the geographic (state/region/sector) distribution constant across comparison groups to isolate focal effects.
Hurdle Model: Two-part model: (1) probability of a positive outcome, (2) distribution of positive values; allows structural zeros.
ICMR-NIN: Indian Council of Medical Research – National Institute of Nutrition; source of dietary requirements and food composition data.
Inadequacy Prevalence: Average probability, across household members, that individual intake falls below the person-specific requirement; uses age-sex-specific EAR/RDA values from ICMR-NIN for each member and assumes normally distributed requirements (log-normal for iron in menstruating females).
Intra-Household Allocation: Distribution of household-level intake to individual members in proportion to their energy requirements (Smith & Subandoro, 2007).
lpmatrix: Linear predictor matrix X such that η = Xβ; enables vectorized computation of predictions across all simulation draws.
Mean-Preserving Noise: Multiplicative noise exp(ε − 0.5σ²) with ε ~ N(0, σ²), so that E[exp(ε − 0.5σ²)] = 1; injects variance without shifting the mean.
Posterior Predictive Draw: Simulated observation combining parameter uncertainty (coefficient draws) with observation-level variability (distributional draws).
Prediction Grid: Counterfactual population with standardized non-focal variables; used to compute comparable group-level estimates.
RDA: Recommended Dietary Allowance, the intake sufficient to meet the requirements of nearly all (97–98%) healthy individuals; equals EAR + 2σ_req. This analysis uses 2 rather than 1.96, so under a normal requirement distribution the RDA covers about 97.7% rather than exactly 97.5% of individuals.
Scaling Factor (SF): Ratio of a household's total energy requirement (summed across members) to its reported caloric intake; SF > 1 indicates energy under-reporting (the common case).
Shannon Diversity Index: H = −Σ_j p_j ln(p_j), where p_j is the share of the micronutrient supplied by food category j; measures how evenly micronutrient sources are spread across food categories.
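The glossary entries for Intra-Household Allocation, EAR, RDA and Inadequacy Prevalence combine into a per-household calculation. The sketch below is a simplified illustration, not the authors' code: household_inadequacy() is a hypothetical name, hh_intake is assumed to be total daily household intake in the same unit as the EAR, requirements are treated as normal with σ_req = (RDA − EAR)/2, and the log-normal case for iron in menstruating females is not covered.

```r
# Hypothetical sketch: allocate household intake to members by energy share,
# then average each member's probability that requirement exceeds intake.
household_inadequacy <- function(hh_intake, energy_req, ear, rda) {
  share      <- energy_req / sum(energy_req)  # proportional allocation
  member_int <- hh_intake * share             # intake attributed to each member
  sigma_req  <- (rda - ear) / 2               # from RDA = EAR + 2 * sigma_req
  # P(requirement > intake) with requirement ~ N(EAR, sigma_req^2)
  p_inadequate <- pnorm(member_int, mean = ear, sd = sigma_req,
                        lower.tail = FALSE)
  mean(p_inadequate)                          # household average across members
}

# Two members with equal energy needs, each allocated exactly their EAR
p_half <- household_inadequacy(20, energy_req = c(2000, 2000),
                               ear = c(10, 10), rda = c(14, 14))
```

A member allocated exactly the EAR contributes a probability of 0.5; one allocated exactly the RDA contributes about 0.023.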

Data Sources

HCES 2011–12: Household Consumer Expenditure Survey, NSS 68th Round (NSSO).
HCES 2023–24: Household Consumer Expenditure Survey (MoSPI).
ICMR-NIN Requirements: Nutrient requirements (EAR, RDA) by age-sex profile, from nin_requirements.dta; provides person-specific EAR and RDA values for each micronutrient across all demographic profiles.
Household Demographic Roster: Individual-level records of household members with age, sex, physiological profile, and ICMR-NIN energy requirements, from HCES20XX_AFE.RData.
Food Composition Tables: Micronutrient content per food item, from the ICMR-NIN Indian Food Composition Tables.
General Price Index: State-level price deflators for converting nominal MPCE to constant 2011–12 prices.
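Deflating MPCE is a single division once each household is matched to its state's index. This sketch assumes the index is expressed with 2011–12 = 100 in every state; deflate_mpce(), the state codes and the index values are hypothetical.

```r
# Hypothetical state-level price index (2011-12 = 100 assumed)
price_index <- data.frame(state_code = c("S01", "S02"),
                          index      = c(150, 160))

# Convert nominal MPCE to constant 2011-12 prices
deflate_mpce <- function(mpce_nominal, state_code, index_table) {
  idx <- index_table$index[match(state_code, index_table$state_code)]
  if (anyNA(idx)) stop("missing price index for some state codes")
  mpce_nominal / (idx / 100)
}

real <- deflate_mpce(c(3000, 4800), c("S01", "S02"), price_index)
```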

References

[1] Wood, S.N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). Chapman and Hall/CRC.
[2] Cragg, J.G. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica, 39(5), 829–844.
[3] ICMR-NIN (2024). Recommended Dietary Allowances and Estimated Average Requirements for Indians. Indian Council of Medical Research – National Institute of Nutrition, Hyderabad.
[4] Longvah, T., Ananthan, R., Bhaskarachary, K. & Venkaiah, K. (2017). Indian Food Composition Tables. National Institute of Nutrition, Hyderabad.
[5] Vijayakumar, A., Dubasi, H.B., Awasthi, A. & Jaacks, L.M. (2024). Development of an Indian Food Composition Database. Current Developments in Nutrition, 8(7), 103790.
[6] Shannon, C.E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.
[7] Mullahy, J. (1998). Much ado about two: reconsidering retransformation and the two-part model in health econometrics. Journal of Health Economics, 17(3), 247–281.
[8] Gelman, A. (2006). Multilevel (hierarchical) modeling: What it can and can't do. Technometrics, 48(3), 432–435.
[9] Wood, S.N., Goude, Y. & Shaw, S. (2015). Generalized additive models for large data sets. Journal of the Royal Statistical Society: Series C, 64(1), 139–155.
[10] Li, R. & Liang, H. (2008). Variable selection in semiparametric regression modeling. Annals of Statistics, 36(1), 261–286.
[11] Institute of Medicine (2000). Dietary Reference Intakes: Applications in Dietary Assessment. National Academies Press. The EAR cut-point method assumes that the intake distribution and the requirement distribution are independent, and that the requirement distribution is approximately symmetric.
[12] Beaton, G.H. (1994). Criteria of an adequate diet. In M.E. Shils, J.A. Olson & M. Shike (Eds.), Modern Nutrition in Health and Disease (8th ed., pp. 1491–1505). Lea & Febiger.
[13] FAO/WHO (1998). Preparation and Use of Food-Based Dietary Guidelines. WHO Technical Report Series No. 880. Expresses nutrient requirements as densities per 1,000 kcal to define compositional adequacy of diets assuming sufficient energy is consumed.
[14] Vossenaar, M., Doak, C.M., et al. (2020). Nutrient Density as a Dimension of Dietary Quality: Findings of the Nutrient Density Approach in a Multi-Center Evaluation. Nutrients, 12(6), 1792. Formalizes the "critical nutrient density" framework: nutrient requirement / energy requirement.
[15] Institute of Medicine (2001). Dietary Reference Intakes for Vitamin A, Vitamin K, Arsenic, Boron, Chromium, Copper, Iodine, Iron, Manganese, Molybdenum, Nickel, Silicon, Vanadium, and Zinc. National Academies Press. Documents the log-normal distribution of iron requirements for menstruating women.
[16] Smith, L.C. & Subandoro, A. (2007). Measuring Food Security Using Household Expenditure Surveys. Food Security in Practice Technical Guide Series No. 3. International Food Policy Research Institute (IFPRI). Proportional allocation of household intake to members based on energy requirement shares.
[17] Smithson, M. & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1), 54–71.