Stratified Sampling: Software (Stata, R)
PUBHBIO 7225 Lecture 8
Outline
Topics
- Stata for Stratified Samples
- R for Stratified Samples
Activities
- 8.1 Stata/R for Stratified Samples
Assignments
- Peer Evaluation of Problem Set 2 due Tuesday 9/23/2025 11:59pm via Carmen
- Quiz 2 due Thursday 9/25/2025 11:59pm via Carmen
- Group Progress Report due Thursday 9/25/2025 11:59pm via Carmen (only one group member needs to upload this)
Stata Commands for Stratified Samples
Select a stratified random sample of \(n_h=50\) in each stratum:
sort STRATVAR IDVAR // STRATVAR = variable defining strata, IDVAR = record ID
set seed 217
by STRATVAR: sample 50, count
Select a stratified random sample by taking 50% of each stratum:
sort STRATVAR IDVAR
set seed 531
by STRATVAR: sample 50
Tell Stata about the sample design for a stratified sample:
svyset [pweight=WEIGHTVAR], fpc(N_h) strata(STRATVAR)
// WEIGHTVAR = variable containing sampling weights
// N_h = variable that contains finite population sizes in each stratum
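As a reminder of what these two variables hold, here is the standard weight calculation for a stratified random sample. The stratum size of 400 is an assumed value used only for illustration. With \(N_h\) units in stratum \(h\) and \(n_h\) of them sampled, every sampled unit in that stratum gets the weight
\[
w_h = \frac{N_h}{n_h}, \qquad \text{e.g. } N_h = 400,\ n_h = 50 \ \Rightarrow\ w_h = \frac{400}{50} = 8 .
\]
The weights within a stratum sum to \(n_h \cdot w_h = N_h\), and the sampling fraction behind the finite population correction is \(n_h / N_h = 1 / w_h\).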
With the survey design set, use the same estimation commands as for an SRS, prefixed with svy: (e.g., svy: mean)
R Commands for Stratified Samples
Select a stratified random sample of \(n_h=50\) in each stratum:
library(dplyr)  # provides %>%, group_by(), slice_sample()
set.seed(217)
DF <- POPDATA %>% group_by(STRATVAR) %>% slice_sample(n = 50)
# STRATVAR = variable defining strata
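To make the equal-allocation draw concrete, here is a minimal sketch on a made-up population frame. The frame contents, the stratum labels A/B/C, the stratum sizes, and the object name STRATUM.SUMMARY are assumptions for illustration; the other names follow the placeholders used in these slides. The sketch also records \(N_h\) before sampling and computes the weight afterwards, since both are needed later for the design specification.

```r
library(dplyr)

# Hypothetical population frame: three strata with 100, 200, and 300 records
POPDATA <- data.frame(
  IDVAR    = 1:600,
  STRATVAR = rep(c("A", "B", "C"), times = c(100, 200, 300))
)

set.seed(217)
DF <- POPDATA %>%
  group_by(STRATVAR) %>%
  mutate(N_h = n()) %>%               # stratum population size, recorded before sampling
  slice_sample(n = 50) %>%            # SRS of 50 records within each stratum
  mutate(WEIGHTVAR = N_h / n()) %>%   # sampling weight w_h = N_h / n_h
  ungroup()

# One row per stratum: population size, weight, and realized sample size
STRATUM.SUMMARY <- DF %>% count(STRATVAR, N_h, WEIGHTVAR, name = "n_h")
print(STRATUM.SUMMARY)
```

With equal sample sizes and unequal stratum sizes, the weights differ by stratum (here 2, 4, and 6), and they sum to the population size of 600.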
Select a stratified random sample by taking 50% of each stratum:
set.seed(217)
DF <- POPDATA %>% group_by(STRATVAR) %>% slice_sample(prop = 0.5)
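The same made-up frame can illustrate the 50% draw. Again, the frame contents, stratum labels, stratum sizes, and the name STRATUM.SUMMARY are assumptions for illustration. The weight is computed from the realized sample size rather than fixed at 2, because slice_sample() rounds the per-group count down when prop times the group size is not a whole number.

```r
library(dplyr)

# Hypothetical population frame: three strata with 100, 200, and 300 records
POPDATA <- data.frame(
  IDVAR    = 1:600,
  STRATVAR = rep(c("A", "B", "C"), times = c(100, 200, 300))
)

set.seed(217)
DF <- POPDATA %>%
  group_by(STRATVAR) %>%
  mutate(N_h = n()) %>%               # stratum population size, recorded before sampling
  slice_sample(prop = 0.5) %>%        # SRS of 50% of the records within each stratum
  mutate(WEIGHTVAR = N_h / n()) %>%   # sampling weight w_h = N_h / n_h (realized n_h)
  ungroup()

# One row per stratum: population size, weight, and realized sample size
STRATUM.SUMMARY <- DF %>% count(STRATVAR, N_h, WEIGHTVAR, name = "n_h")
print(STRATUM.SUMMARY)
```

Here the sample sizes are 50, 100, and 150, and every sampled record has weight 2: sampling the same fraction in every stratum gives a self-weighting sample.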
Tell R about the sample design for a stratified sample:
library(survey)  # provides svydesign() and svymean()
DESIGN.OBJECT <- svydesign(id = ~1, weights = ~WEIGHTVAR, fpc = ~N_h,
                           strata = ~STRATVAR, data = DF)
# WEIGHTVAR = variable containing sampling weights
# N_h = variable that contains finite population sizes in each stratum
With the survey design object created, use the same estimation commands as for an SRS (e.g., svymean())
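A minimal end-to-end sketch of the design specification followed by an estimate is shown here. The tiny data set, the outcome variable Y, its values, and the object name EST are all made up for illustration; only the svydesign() call mirrors the one above.

```r
library(survey)

# Hypothetical stratified sample: 2 strata, 4 sampled records each
DF <- data.frame(
  STRATVAR = rep(c("A", "B"), each = 4),
  N_h      = rep(c(20, 40), each = 4),       # stratum population sizes
  Y        = c(3, 5, 4, 6, 10, 12, 11, 9)    # made-up outcome values
)
DF$WEIGHTVAR <- DF$N_h / 4                   # w_h = N_h / n_h with n_h = 4

DESIGN.OBJECT <- svydesign(id = ~1, weights = ~WEIGHTVAR, fpc = ~N_h,
                           strata = ~STRATVAR, data = DF)

# Estimated population mean of Y with its standard error
EST <- svymean(~Y, DESIGN.OBJECT)
print(EST)
```

The point estimate is the weighted mean of Y. For this toy data it is (20 × 4.5 + 40 × 10.5) / 60 = 8.5, where 4.5 and 10.5 are the two stratum sample means.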
Activity 8.1
Stata/R for Stratified Samples