Simple Random Sampling: Software (Stata, R)

PUBHBIO 7225 Lecture 5

Outline

Topics

  • Stata for SRS
  • R for SRS

Activities

  • 5.1 Stata/R for SRS


Assignments

  • Quiz 1 due Thursday 9/11/2025 11:59pm via Carmen
  • Problem Set 2 due Thursday 9/18/2025 11:59pm via Carmen

Stata Commands for SRS

Select an SRS of size 100:

set seed 72947
sample 100, count

Select an SRS that is 15% of the population:

set seed 72947
sample 15

Create a new variable called N that contains the value 10,000 for all records:

generate N = 10000
* can abbreviate with:
gen N = 10000

Tell Stata about the sample design for an SRS:

svyset [pweight=WEIGHTVAR], fpc(N)
* WEIGHTVAR = variable containing sampling weights
* N = variable that contains finite population size

(Some) Stata Commands for Survey Estimation

Must first run the svyset command. Variables are y, x1, x2.

Estimate a mean:

svy: mean y

Estimate a proportion:

svy: proportion x1
* can abbreviate with:
svy: prop x1

Estimate a total:

svy: total y

Linear and logistic regression:

* linear regression (y on x1 and x2)
svy: regress y x1 x2
* logistic regression (x1 on x2); output is reported as odds ratios
svy: logistic x1 x2

R Commands for SRS

Select an SRS of size 100 (from a dataset called POPDATA):

library(dplyr) # provides %>% and slice_sample()
set.seed(72947)
DF <- POPDATA %>% slice_sample(n = 100)
# DF = resulting dataset with 100 observations

Select an SRS that is 15% of the population (from a dataset called POPDATA):

set.seed(72947)
DF <- POPDATA %>% slice_sample(prop = 0.15)
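The same two draws can also be sketched in base R, without dplyr. This is only an illustration and not part of the original slides: POPDATA here is a simulated population of 10,000 records, and the `id` and `y` columns are hypothetical. It relies on `sample()` drawing without replacement by default.

```r
# Hypothetical population of 10,000 records (illustration only)
set.seed(72947)
POPDATA <- data.frame(id = 1:10000, y = rnorm(10000, mean = 50, sd = 10))

# SRS of size 100: sample row indices without replacement (the default)
rows_n <- sample(nrow(POPDATA), size = 100)
DF <- POPDATA[rows_n, ]

# SRS that is 15% of the population: round the target size to a whole number
n15 <- round(0.15 * nrow(POPDATA))
DF15 <- POPDATA[sample(nrow(POPDATA), size = n15), ]

nrow(DF)    # 100
nrow(DF15)  # 1500
```

Because the draw is without replacement, no record can appear twice in either sample.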

Create a new variable called N that contains the value 10,000 for all records:

DF$N <- 10000
# or, tidyverse-style:
DF <- DF %>% mutate(N = 10000)

Tell R about the sample design for an SRS:

library(survey) # provides svydesign(), svymean(), svytotal(), svyglm()
DESIGN.OBJECT <- svydesign(id = ~1, weights = ~WEIGHTVAR, fpc = ~N, data = DF)
# WEIGHTVAR = variable containing sampling weights
# N = variable that contains finite population size
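The call above assumes WEIGHTVAR already exists in the data. Under an SRS of n units from N, each unit has the same inclusion probability n/N, so the sampling weight is N/n for every record. The sketch below builds that weight and the design object from scratch. It assumes the survey package is installed, and the population and its `y` column are simulated purely for illustration.

```r
library(survey)

# Hypothetical population of N = 10,000 with a made-up outcome y (illustration only)
set.seed(72947)
POPDATA <- data.frame(y = rnorm(10000, mean = 50, sd = 10))

# Draw an SRS of n = 100 using base R
DF <- POPDATA[sample(nrow(POPDATA), size = 100), , drop = FALSE]

# Population size and SRS sampling weight (N / n) on every record
DF$N <- nrow(POPDATA)
DF$WEIGHTVAR <- DF$N / nrow(DF)

# Declare the design: no clusters (id = ~1), weights, finite population correction
DESIGN.OBJECT <- svydesign(id = ~1, weights = ~WEIGHTVAR, fpc = ~N, data = DF)

sum(weights(DESIGN.OBJECT))  # weights sum to the population size, 10000
```

A quick check that the weights sum to N is a useful habit before running any estimation command.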

(Some) R Commands for Survey Estimation

Must first use svydesign() to make a survey design object. Variables are y, x1, x2.

Estimate a mean:

svymean(~y, design=DESIGN.OBJECT) # DESIGN.OBJECT = design object from svydesign()

Estimate a proportion:

# proportions at each level of the factor
svymean(~factor(x1), design=DESIGN.OBJECT)
# if x1 is coded 1/0 or TRUE/FALSE, svymean() on it directly gives the proportion of 1s (TRUEs)
svymean(~x1, design=DESIGN.OBJECT)

Estimate a total:

svytotal(~y, design=DESIGN.OBJECT)
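For an SRS with a finite population correction, the mean and total estimates and their standard errors follow the usual textbook formulas: SE(ybar) = sqrt((1 - n/N) * s^2 / n), and the total is N times the mean. The base-R sketch below works them out by hand on made-up numbers (the values of `y` and `N` are hypothetical). For an SRS design declared with weights N/n and fpc N, svymean() and svytotal() would be expected to agree with these values.

```r
# Hypothetical SRS of n = 5 values from a population of N = 50 (made-up numbers)
y <- c(2, 4, 6, 8, 10)
n <- length(y)
N <- 50

ybar <- mean(y)                             # estimated population mean: 6
se_ybar <- sqrt((1 - n / N) * var(y) / n)   # SE with fpc: sqrt(0.9 * 10 / 5) = sqrt(1.8)

total <- N * ybar                           # estimated population total: 300
se_total <- N * se_ybar                     # SE of the total: 50 * sqrt(1.8)

c(mean = ybar, se_mean = se_ybar, total = total, se_total = se_total)
```

The factor (1 - n/N) is the finite population correction. It shrinks the standard error as the sample becomes a larger share of the population.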

Linear and logistic regression:

# linear regression (y on x1 and x2)
svyglm(y ~ x1 + x2, design=DESIGN.OBJECT)
# logistic regression (x1 on x2); quasibinomial() avoids a warning about non-integer successes
svyglm(x1 ~ x2, family=quasibinomial(link="logit"), design=DESIGN.OBJECT)
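The sketch below shows one way to inspect a fitted logistic model. Stata's `logistic` command displays odds ratios, while svyglm() reports coefficients on the log-odds scale, so they are exponentiated here to compare. It assumes the survey package is installed, and the data are simulated purely for illustration. The family is quasibinomial(), which the survey package documentation suggests for weighted binomial models; point estimates match those from binomial().

```r
library(survey)

# Hypothetical SRS of n = 200 from a population of N = 10,000 (simulated values)
set.seed(72947)
DF <- data.frame(x2 = rnorm(200))
DF$x1 <- rbinom(200, size = 1, prob = plogis(0.5 * DF$x2))  # binary outcome
DF$N <- 10000
DF$WEIGHTVAR <- DF$N / nrow(DF)

DESIGN.OBJECT <- svydesign(id = ~1, weights = ~WEIGHTVAR, fpc = ~N, data = DF)

# Logistic regression (x1 on x2)
FIT <- svyglm(x1 ~ x2, family = quasibinomial(), design = DESIGN.OBJECT)

summary(FIT)        # coefficients (log-odds) with design-based standard errors
exp(coef(FIT))      # odds ratios
exp(confint(FIT))   # confidence intervals on the odds-ratio scale
```

The same pattern (summary(), coef(), confint()) applies to the linear model fit with svyglm(y ~ x1 + x2, ...), without the exponentiation step.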

Activity 5.1

Stata/R for SRS