Strata.Rd
Stratified sampling with equal/unequal probabilities.
Strata(x, stratanames = NULL, size,
method = c("srswor", "srswr", "poisson", "systematic"),
pik, description = FALSE)
x: a data frame or a matrix; its number of rows is n, the population size.
stratanames: vector of stratification variables.
size: vector of stratum sample sizes (in the order in which the strata are given in the input data set).
method: method to select units; implemented are: a) simple random sampling without replacement ("srswor"), b) simple random sampling with replacement ("srswr"), c) Poisson sampling ("poisson"), d) systematic sampling ("systematic"). The default is "srswor".
pik: vector of inclusion probabilities or auxiliary information used to compute them; this argument is only used for unequal probability sampling (Poisson and systematic). If auxiliary information is provided, the function uses the inclusionprobabilities function to compute these probabilities. If the method is "srswr" and the sample size is larger than the population size, this vector is normalized to one.
description: if TRUE, a message is printed giving the number of selected units and the number of units in the population. The default is FALSE.
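When pik holds auxiliary information rather than probabilities, the usual conversion is proportional to size: unit k gets n * a_k / sum(a), any value above 1 is fixed at 1, and the remaining units are rescaled. The sketch below illustrates that idea in base R. The helper pik_from_size is hypothetical, written only for this illustration, and is not claimed to reproduce the inclusionprobabilities function exactly.

```r
# Hypothetical helper (not part of the package): inclusion probabilities
# proportional to a non-negative size variable a, for a fixed sample size n.
pik_from_size <- function(a, n) {
  stopifnot(all(a >= 0), n <= sum(a > 0))
  pik <- n * a / sum(a)
  # Units whose probability would exceed 1 are selected with certainty;
  # the remaining sample size is redistributed over the other units.
  while (any(pik > 1)) {
    certain <- pik >= 1
    pik[certain] <- 1
    pik[!certain] <- (n - sum(certain)) * a[!certain] / sum(a[!certain])
  }
  pik
}

a <- c(1, 2, 3, 4, 50)          # made-up auxiliary values
pik <- pik_from_size(a, n = 3)
pik                             # the last unit is taken with certainty
sum(pik)                        # sums to the sample size, up to rounding
```

The probabilities sum to the requested sample size, which is the property unequal probability designs such as systematic sampling rely on.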
The function produces an object, which contains the following information:
the identifier of the selected units.
the unit stratum.
the final unit inclusion probability.
# Example from An and Watts (New SAS procedures for Analysis of Sample Survey Data)
# generates artificial data (a 235 x 3 data frame with columns state, region, income).
# the variable "state" has 2 categories ('nc' and 'sc').
# the variable "region" has 3 categories (1, 2 and 3).
# the sampling frame is stratified by region within state.
# the income variable is randomly generated
m <- rbind(matrix(rep("nc",165), 165, 1, byrow=TRUE),
matrix(rep("sc", 70), 70, 1, byrow=TRUE))
m <- cbind.data.frame(m, c(rep(1, 100), rep(2,50), rep(3,15),
rep(1, 30), rep(2, 40)), 1000 * runif(235))
names(m) <- c("state", "region", "income")
# computes the population stratum sizes
table(m$region, m$state)
#>
#>      nc  sc
#>   1 100  30
#>   2  50  40
#>   3  15   0
# there are 5 cells with non-zero values
# one draws 5 samples (1 sample in each stratum)
# the sample stratum sizes are 10,5,10,4,6, respectively
# the method is 'srswor' (equal probability, without replacement)
s <- Strata(m, c("region", "state"), size=c(10, 5, 10, 4, 6), method="srswor")
# extracts the observed data
data.frame(income=m[s$id, "income"], s)
#> income state region income.1 stratum size id
#> 1.1 431.79468 nc 1 431.79468 1 10 1
#> 1.80 256.99471 nc 1 256.99471 1 10 80
#> 1.73 27.68708 nc 1 27.68708 1 10 73
#> 1.32 785.78417 nc 1 785.78417 1 10 32
#> 1.15 982.44097 nc 1 982.44097 1 10 15
#> 1.71 427.83172 nc 1 427.83172 1 10 71
#> 1.66 507.13136 nc 1 507.13136 1 10 66
#> 1.81 249.62570 nc 1 249.62570 1 10 81
#> 1.13 451.99868 nc 1 451.99868 1 10 13
#> 1.57 178.19258 nc 1 178.19258 1 10 57
#> 2.130 460.58674 nc 2 460.58674 2 5 130
#> 2.135 469.94421 nc 2 469.94421 2 5 135
#> 2.117 991.83105 nc 2 991.83105 2 5 117
#> 2.106 773.62990 nc 2 773.62990 2 5 106
#> 2.109 192.33168 nc 2 192.33168 2 5 109
#> 3.160 512.05001 nc 3 512.05001 3 10 160
#> 3.163 708.51129 nc 3 708.51129 3 10 163
#> 3.161 771.14369 nc 3 771.14369 3 10 161
#> 3.154 853.65094 nc 3 853.65094 3 10 154
#> 3.157 411.88021 nc 3 411.88021 3 10 157
#> 3.152 390.29183 nc 3 390.29183 3 10 152
#> 3.158 636.70828 nc 3 636.70828 3 10 158
#> 3.151 663.38892 nc 3 663.38892 3 10 151
#> 3.164 999.05023 nc 3 999.05023 3 10 164
#> 3.156 533.18085 nc 3 533.18085 3 10 156
#> 4.180 464.97943 sc 1 464.97943 4 4 180
#> 4.175 520.83494 sc 1 520.83494 4 4 175
#> 4.174 569.91573 sc 1 569.91573 4 4 174
#> 4.186 35.69785 sc 1 35.69785 4 4 186
#> 5.204 151.80241 sc 2 151.80241 5 6 204
#> 5.198 424.94331 sc 2 424.94331 5 6 198
#> 5.216 516.31473 sc 2 516.31473 5 6 216
#> 5.222 833.44388 sc 2 833.44388 5 6 222
#> 5.201 16.52919 sc 2 16.52919 5 6 201
#> 5.228 307.86605 sc 2 307.86605 5 6 228
# see the result using a contingency table
table(s$region, s$state)
#>
#>     nc sc
#>   1 10  4
#>   2  5  6
#>   3 10  0
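For readers who want to see what a stratified 'srswor' draw amounts to, the following base R sketch selects the same stratum sample sizes without calling Strata. It is an illustration under stated assumptions, not the function's implementation: the names frame, key, stratum, sizes and ids are made up here, and strata are numbered by first appearance in the data, which matches the order of the size vector used above.

```r
set.seed(1)  # only to make the sketch reproducible
frame <- data.frame(
  state  = c(rep("nc", 165), rep("sc", 70)),
  region = c(rep(1, 100), rep(2, 50), rep(3, 15), rep(1, 30), rep(2, 40)),
  income = 1000 * runif(235)
)

# Stratum number per row, in order of first appearance in the frame
key     <- paste(frame$state, frame$region)
stratum <- match(key, unique(key))
sizes   <- c(10, 5, 10, 4, 6)

# Within each stratum, draw the requested number of rows without replacement.
# sample.int() on the stratum length avoids the sample() pitfall when a
# stratum contains a single row.
rows_by_stratum <- split(seq_len(nrow(frame)), stratum)
ids <- unlist(
  Map(function(rows, n) rows[sample.int(length(rows), n)],
      rows_by_stratum, sizes),
  use.names = FALSE
)

smp <- frame[ids, ]
table(smp$region, smp$state)
```

The resulting table has the same cell counts as the one above, since the stratum sample sizes are fixed by design; only the selected units differ from run to run.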
# The same data as in Example 1
# the method is 'systematic' (unequal probability, without replacement)
# the inclusion probabilities are computed using the variable 'income'
s <- Strata(m,c("region", "state"), size=c(10, 5, 10, 4, 6),
method="systematic", pik=m$income)
# extracts the observed data
data.frame(income=m[s$id, "income"], s)
#> income state region income.1 stratum size id
#> 1.13 451.99868 nc 1 451.99868 1 10 13
#> 1.27 20.54442 nc 1 20.54442 1 10 27
#> 1.7 552.14334 nc 1 552.14334 1 10 7
#> 1.48 934.30059 nc 1 934.30059 1 10 48
#> 1.62 177.50997 nc 1 177.50997 1 10 62
#> 1.50 887.68151 nc 1 887.68151 1 10 50
#> 1.66 507.13136 nc 1 507.13136 1 10 66
#> 1.80 256.99471 nc 1 256.99471 1 10 80
#> 1.97 270.36405 nc 1 270.36405 1 10 97
#> 1.100 888.16006 nc 1 888.16006 1 10 100
#> 2.134 894.79894 nc 2 894.79894 2 5 134
#> 2.145 486.84010 nc 2 486.84010 2 5 145
#> 2.142 601.15380 nc 2 601.15380 2 5 142
#> 2.117 991.83105 nc 2 991.83105 2 5 117
#> 2.107 36.20353 nc 2 36.20353 2 5 107
#> 3.154 853.65094 nc 3 853.65094 3 10 154
#> 3.165 135.38640 nc 3 135.38640 3 10 165
#> 3.163 708.51129 nc 3 708.51129 3 10 163
#> 3.158 636.70828 nc 3 636.70828 3 10 158
#> 3.159 914.26539 nc 3 914.26539 3 10 159
#> 3.151 663.38892 nc 3 663.38892 3 10 151
#> 3.162 383.89029 nc 3 383.89029 3 10 162
#> 3.161 771.14369 nc 3 771.14369 3 10 161
#> 3.153 876.68424 nc 3 876.68424 3 10 153
#> 3.152 390.29183 nc 3 390.29183 3 10 152
#> 4.191 285.42066 sc 1 285.42066 4 4 191
#> 4.167 58.01234 sc 1 58.01234 4 4 167
#> 4.168 141.61709 sc 1 141.61709 4 4 168
#> 4.181 27.77380 sc 1 27.77380 4 4 181
#> 5.209 338.79154 sc 2 338.79154 5 6 209
#> 5.223 947.00049 sc 2 947.00049 5 6 223
#> 5.221 852.56993 sc 2 852.56993 5 6 221
#> 5.213 632.28781 sc 2 632.28781 5 6 213
#> 5.230 556.82973 sc 2 556.82973 5 6 230
#> 5.217 106.51764 sc 2 106.51764 5 6 217
# see the result using a contingency table
table(s$region, s$state)
#>
#>     nc sc
#>   1 10  4
#>   2  5  6
#>   3 10  0
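The 'systematic' method with unequal probabilities can be pictured as laying the inclusion probabilities end to end and picking the units hit by equally spaced points. The sketch below shows this textbook scheme for a single stratum in base R. The helper systematic_pik is hypothetical and is an assumption about the general technique, not a copy of the code Strata runs; it expects inclusion probabilities that sum to an integer sample size.

```r
# Hypothetical helper: systematic sampling with unequal inclusion
# probabilities pik, where sum(pik) is the (integer) sample size.
systematic_pik <- function(pik, u = runif(1)) {
  n   <- round(sum(pik))
  cum <- cumsum(pik)
  # Points u, u + 1, ..., u + n - 1; unit k is selected when a point
  # falls in the interval (cum[k - 1], cum[k]].
  points <- u + seq_len(n) - 1
  idx <- findInterval(points, c(0, cum), left.open = TRUE)
  # Guard against rounding in cumsum() pushing the last point past the end
  pmin(idx, length(pik))
}

pik <- c(0.2, 0.4, 0.6, 0.8, 1)   # made-up probabilities, sum is 3
systematic_pik(pik, u = 0.5)      # units 2, 4 and 5
```

Because the points are exactly one unit apart and no probability exceeds 1, each unit can be selected at most once, so the sample is drawn without replacement.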