d.pizza.Rd
An artificial dataset inspired by a similar dataset, pizza.sav, in Arbeitsbuch zur deskriptiven und induktiven Statistik by Toutenburg et al.
The dataset contains data from a pizza delivery service in London that delivers pizzas to three areas. Each record describes one order/delivery and its properties. A pizza is supposed to taste good if its temperature is high enough, say 45 degrees Celsius, so it might be interesting for the pizza delivery service to minimize the delivery time.
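As a minimal sketch of the question raised above, the snippet below flags deliveries that reach the 45 degree threshold and compares delivery times between the two groups. It uses a hand-typed copy of the first six records instead of the full dataset. The names `pizza6` and `hot`, and the reading of "high enough" as `>= 45`, are assumptions made for illustration.

```r
# First six records of d.pizza, typed in by hand so the sketch runs
# without loading any package (only the two columns needed here).
pizza6 <- data.frame(
  delivery_min = c(20.0, 19.6, 17.8, 37.3, 21.8, 48.7),
  temperature  = c(53.0, 56.4, 36.5, NA, 50.0, 27.0)
)

# Assumption: "high enough" is read as temperature >= 45 degrees Celsius.
# A missing temperature yields NA here, not FALSE.
pizza6$hot <- pizza6$temperature >= 45

# Share of hot deliveries among records with a known temperature.
share_hot <- mean(pizza6$hot, na.rm = TRUE)

# Mean delivery time per group; records where hot is NA are left out.
time_by_hot <- tapply(pizza6$delivery_min, pizza6$hot, mean)

print(share_hot)
print(time_by_hot)
```

With the full dataset loaded, the same lines should apply to `d.pizza` in place of `pizza6`.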
The dataset is designed to be as evil as possible. As far as the description is concerned, it should pose the same difficulties that we have to deal with in everyday life. It contains the most commonly used data types, such as numerics, factors, ordered factors, integers, logicals and a date. NAs are scattered everywhere, partly systematically, partly randomly (except in the index).
data(d.pizza)
A data frame with 1209 observations on the following 16 variables.
index
integer, indexing the records (no missing values here).
date
Date, the delivery date
week
numeric, the week number
weekday
numeric, the weekday (1 = Monday, ..., 7 = Sunday)
area
factor, the three London districts: Brent, Camden, Westminster
count
integer, the number of pizzas delivered
rabate
logical, TRUE if a discount has been given
price
numeric, the total price of delivered pizza(s)
operator
a factor with levels Allanah, Maria, Rhonda
driver
a factor with levels Carpenter, Carter, Taylor, Butcher, Hunter, Miller, Farmer
delivery_min
numeric, the delivery time in minutes (decimal)
temperature
numeric, the temperature of the pizza in degrees Celsius when delivered to the customer
wine_ordered
integer, 1 if wine was ordered, 0 if not
wine_delivered
integer, 1 if wine was delivered, 0 if not
wrongpizza
logical, TRUE if a wrong pizza was delivered
quality
ordered factor with levels low < medium < high, defining the quality of the pizza when delivered
The dataset contains NAs randomly scattered.
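Because missing values are part of the design, a first step is usually to count them. The following sketch shows one way to do that in base R, again on a hand-typed copy of the first six records. The object names `pizza6`, `na_per_column` and `n_complete` are hypothetical.

```r
# Three columns of the first six records of d.pizza, typed in by hand.
pizza6 <- data.frame(
  index       = 1:6,
  temperature = c(53.0, 56.4, 36.5, NA, 50.0, 27.0),
  quality     = factor(c("medium", "high", NA, NA, "medium", "low"),
                       levels = c("low", "medium", "high"),
                       ordered = TRUE)
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(pizza6))

# Number of records without any missing value.
n_complete <- sum(complete.cases(pizza6))

print(na_per_column)
print(n_complete)
```

Applied to the full data frame, these two calls would give the per-column NA counts and the number of complete cases.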
Toutenburg H, Schomaker M, Wissmann M, Heumann C (2009): Arbeitsbuch zur deskriptiven und induktiven Statistik. Springer, Berlin Heidelberg.
str(d.pizza)
#> 'data.frame': 1209 obs. of 16 variables:
#> $ index : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ date : Date, format: "2014-03-01" "2014-03-01" ...
#> $ week : num 9 9 9 9 9 9 9 9 9 9 ...
#> $ weekday : num 6 6 6 6 6 6 6 6 6 6 ...
#> $ area : Factor w/ 3 levels "Brent","Camden",..: 2 3 3 1 1 2 2 1 3 1 ...
#> $ count : int 5 2 3 2 5 1 4 NA 3 6 ...
#> $ rabate : logi TRUE FALSE FALSE FALSE TRUE FALSE ...
#> $ price : num 65.7 27 41 26 57.6 ...
#> $ operator : Factor w/ 3 levels "Allanah","Maria",..: 3 3 1 1 3 1 3 1 1 3 ...
#> $ driver : Factor w/ 7 levels "Butcher","Carpenter",..: 7 1 1 7 3 7 7 7 7 3 ...
#> $ delivery_min : num 20 19.6 17.8 37.3 21.8 48.7 49.3 25.6 26.4 24.3 ...
#> $ temperature : num 53 56.4 36.5 NA 50 27 33.9 54.8 48 54.4 ...
#> $ wine_ordered : int 0 0 0 0 0 0 1 NA 0 1 ...
#> $ wine_delivered: int 0 0 0 0 0 0 1 NA 0 1 ...
#> $ wrongpizza : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#> $ quality : Ord.factor w/ 3 levels "low"<"medium"<..: 2 3 NA NA 2 1 1 3 3 2 ...
head(d.pizza)
#> index date week weekday area count rabate price operator driver
#> 1 1 2014-03-01 9 6 Camden 5 TRUE 65.7 Rhonda Taylor
#> 2 2 2014-03-01 9 6 Westminster 2 FALSE 27.0 Rhonda Butcher
#> 3 3 2014-03-01 9 6 Westminster 3 FALSE 41.0 Allanah Butcher
#> 4 4 2014-03-01 9 6 Brent 2 FALSE 26.0 Allanah Taylor
#> 5 5 2014-03-01 9 6 Brent 5 TRUE 57.6 Rhonda Carter
#> 6 6 2014-03-01 9 6 Camden 1 FALSE 14.0 Allanah Taylor
#> delivery_min temperature wine_ordered wine_delivered wrongpizza quality
#> 1 20.0 53.0 0 0 FALSE medium
#> 2 19.6 56.4 0 0 FALSE high
#> 3 17.8 36.5 0 0 FALSE <NA>
#> 4 37.3 NA 0 0 FALSE <NA>
#> 5 21.8 50.0 0 0 FALSE medium
#> 6 48.7 27.0 0 0 FALSE low
Desc(d.pizza)
#> ──────────────────────────────────────────────────────────────────────────────
#> Describe d.pizza (data.frame):
#>
#> data frame: 1209 obs. of 16 variables
#> 917 complete cases (75.8%)
#>
#> Nr Class ColName NAs Levels
#> 1 int index .
#> 2 dat date 32 (2.6%)
#> 3 num week 32 (2.6%)
#> 4 num weekday 32 (2.6%)
#> 5 fac area 10 (0.8%) (3): 1-Brent, 2-Camden,
#> 3-Westminster
#> 6 int count 12 (1.0%)
#> 7 log rabate 12 (1.0%)
#> 8 num price 12 (1.0%)
#> 9 fac operator 8 (0.7%) (3): 1-Allanah, 2-Maria, 3-Rhonda
#> 10 fac driver 5 (0.4%) (7): 1-Butcher, 2-Carpenter,
#> 3-Carter, 4-Farmer, 5-Hunter, ...
#> 11 num delivery_min .
#> 12 num temperature 39 (3.2%)
#> 13 int wine_ordered 12 (1.0%)
#> 14 int wine_delivered 12 (1.0%)
#> 15 log wrongpizza 4 (0.3%)
#> 16 ord quality 201 (16.6%) (3): 1-low, 2-medium, 3-high
#>
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 1 - index (integer)
#>
#> length n NAs unique 0s mean meanCI'
#> 1'209 1'209 0 = n 0 605.00 585.30
#> 100.0% 0.0% 0.0% 624.70
#>
#> .05 .10 .25 median .75 .90 .95
#> 61.40 121.80 303.00 605.00 907.00 1'088.20 1'148.60
#>
#> range sd vcoef mad IQR skew kurt
#> 1'208.00 349.15 0.58 447.75 604.00 0.00 -1.20
#>
#> lowest : 1, 2, 3, 4, 5
#> highest: 1'205, 1'206, 1'207, 1'208, 1'209
#>
#> ' 95%-CI (classic)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 2 - date (Date)
#>
#> length n NAs unique
#> 1'209 1'177 32 31
#> 97.4% 2.6%
#>
#> lowest : 2014-03-01 (42), 2014-03-02 (46), 2014-03-03 (26), 2014-03-04 (19)
#> highest: 2014-03-28 (46), 2014-03-29 (53), 2014-03-30 (43), 2014-03-31 (34)
#>
#>
#> Weekday:
#>
#> Pearson's Chi-squared test (1-dim uniform):
#> X-squared = 79, df = 6, p-value = 6e-15
#>
#> level freq perc cumfreq cumperc
#> 1 Monday 144 12.2% 144 12.2%
#> 2 Tuesday 117 9.9% 261 22.2%
#> 3 Wednesday 134 11.4% 395 33.6%
#> 4 Thursday 147 12.5% 542 46.0%
#> 5 Friday 171 14.5% 713 60.6%
#> 6 Saturday 244 20.7% 957 81.3%
#> 7 Sunday 220 18.7% 1'177 100.0%
#>
#> Months:
#>
#> Pearson's Chi-squared test (1-dim uniform):
#> X-squared = 12947, df = 11, p-value <2e-16
#>
#> level freq perc cumfreq cumperc
#> 1 January 0 0.0% 0 0.0%
#> 2 February 0 0.0% 0 0.0%
#> 3 March 1'177 100.0% 1'177 100.0%
#> 4 April 0 0.0% 1'177 100.0%
#> 5 May 0 0.0% 1'177 100.0%
#> 6 June 0 0.0% 1'177 100.0%
#> 7 July 0 0.0% 1'177 100.0%
#> 8 August 0 0.0% 1'177 100.0%
#> 9 September 0 0.0% 1'177 100.0%
#> 10 October 0 0.0% 1'177 100.0%
#> 11 November 0 0.0% 1'177 100.0%
#> 12 December 0 0.0% 1'177 100.0%
#>
#> By days :
#>
#> level freq perc cumfreq cumperc
#> 1 2014-03-01 42 3.6% 42 3.6%
#> 2 2014-03-02 46 3.9% 88 7.5%
#> 3 2014-03-03 26 2.2% 114 9.7%
#> 4 2014-03-04 19 1.6% 133 11.3%
#> 5 2014-03-05 33 2.8% 166 14.1%
#> 6 2014-03-06 39 3.3% 205 17.4%
#> 7 2014-03-07 44 3.7% 249 21.2%
#> 8 2014-03-08 55 4.7% 304 25.8%
#> 9 2014-03-09 42 3.6% 346 29.4%
#> 10 2014-03-10 26 2.2% 372 31.6%
#> 11 2014-03-11 34 2.9% 406 34.5%
#> 12 2014-03-12 36 3.1% 442 37.6%
#> 13 2014-03-13 35 3.0% 477 40.5%
#> 14 2014-03-14 38 3.2% 515 43.8%
#> 15 2014-03-15 48 4.1% 563 47.8%
#> 16 2014-03-16 47 4.0% 610 51.8%
#> 17 2014-03-17 30 2.5% 640 54.4%
#> 18 2014-03-18 32 2.7% 672 57.1%
#> 19 2014-03-19 31 2.6% 703 59.7%
#> 20 2014-03-20 36 3.1% 739 62.8%
#> 21 2014-03-21 43 3.7% 782 66.4%
#> 22 2014-03-22 46 3.9% 828 70.3%
#> 23 2014-03-23 42 3.6% 870 73.9%
#> 24 2014-03-24 28 2.4% 898 76.3%
#> 25 2014-03-25 32 2.7% 930 79.0%
#> 26 2014-03-26 34 2.9% 964 81.9%
#> 27 2014-03-27 37 3.1% 1'001 85.0%
#> 28 2014-03-28 46 3.9% 1'047 89.0%
#> 29 2014-03-29 53 4.5% 1'100 93.5%
#> 30 2014-03-30 43 3.7% 1'143 97.1%
#> 31 2014-03-31 34 2.9% 1'177 100.0%
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 3 - week (numeric)
#>
#> length n NAs unique 0s mean meanCI'
#> 1'209 1'177 32 6 0 11.40 11.33
#> 97.4% 2.6% 0.0% 11.48
#>
#> .05 .10 .25 median .75 .90 .95
#> 9.00 10.00 10.00 11.00 13.00 13.00 13.00
#>
#> range sd vcoef mad IQR skew kurt
#> 5.00 1.33 0.12 1.48 3.00 -0.07 -1.01
#>
#>
#> value freq perc cumfreq cumperc
#> 1 9 88 7.5% 88 7.5%
#> 2 10 258 21.9% 346 29.4%
#> 3 11 264 22.4% 610 51.8%
#> 4 12 260 22.1% 870 73.9%
#> 5 13 273 23.2% 1'143 97.1%
#> 6 14 34 2.9% 1'177 100.0%
#>
#> ' 95%-CI (classic)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 4 - weekday (numeric)
#>
#> length n NAs unique 0s mean meanCI'
#> 1'209 1'177 32 7 0 4.44 4.33
#> 97.4% 2.6% 0.0% 4.56
#>
#> .05 .10 .25 median .75 .90 .95
#> 1.00 1.00 3.00 5.00 6.00 7.00 7.00
#>
#> range sd vcoef mad IQR skew kurt
#> 6.00 2.02 0.45 2.97 3.00 -0.34 -1.17
#>
#>
#> value freq perc cumfreq cumperc
#> 1 1 144 12.2% 144 12.2%
#> 2 2 117 9.9% 261 22.2%
#> 3 3 134 11.4% 395 33.6%
#> 4 4 147 12.5% 542 46.0%
#> 5 5 171 14.5% 713 60.6%
#> 6 6 244 20.7% 957 81.3%
#> 7 7 220 18.7% 1'177 100.0%
#>
#> ' 95%-CI (classic)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 5 - area (factor)
#>
#> length n NAs unique levels dupes
#> 1'209 1'199 10 3 3 y
#> 99.2% 0.8%
#>
#> level freq perc cumfreq cumperc
#> 1 Brent 474 39.5% 474 39.5%
#> 2 Westminster 381 31.8% 855 71.3%
#> 3 Camden 344 28.7% 1'199 100.0%
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 6 - count (integer)
#>
#> length n NAs unique 0s mean meanCI'
#> 1'209 1'197 12 8 0 3.44 3.36
#> 99.0% 1.0% 0.0% 3.53
#>
#> .05 .10 .25 median .75 .90 .95
#> 1.00 2.00 2.00 3.00 4.00 6.00 6.00
#>
#> range sd vcoef mad IQR skew kurt
#> 7.00 1.56 0.45 1.48 2.00 0.45 -0.36
#>
#>
#> value freq perc cumfreq cumperc
#> 1 1 108 9.0% 108 9.0%
#> 2 2 259 21.6% 367 30.7%
#> 3 3 300 25.1% 667 55.7%
#> 4 4 240 20.1% 907 75.8%
#> 5 5 152 12.7% 1'059 88.5%
#> 6 6 97 8.1% 1'156 96.6%
#> 7 7 34 2.8% 1'190 99.4%
#> 8 8 7 0.6% 1'197 100.0%
#>
#> ' 95%-CI (classic)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 7 - rabate (logical - dichotomous)
#>
#> length n NAs unique
#> 1'209 1'197 12 2
#> 99.0% 1.0%
#>
#> freq perc lci.95 uci.95'
#> FALSE 601 50.2% 47.4% 53.0%
#> TRUE 596 49.8% 47.0% 52.6%
#>
#> ' 95%-CI (Wilson)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 8 - price (numeric)
#>
#> length n NAs unique 0s mean meanCI'
#> 1'209 1'197 12 360 0 48.73 47.50
#> 99.0% 1.0% 0.0% 49.96
#>
#> .05 .10 .25 median .75 .90 .95
#> 13.99 23.98 30.98 46.76 63.18 78.83 87.12
#>
#> range sd vcoef mad IQR skew kurt
#> 125.54 21.63 0.44 23.40 32.20 0.50 0.11
#>
#> lowest : 8.79 (3), 9.59, 10.39 (2), 10.99 (11), 11.19 (2)
#> highest: 116.53, 123.39, 124.43, 129.55, 134.33
#>
#> ' 95%-CI (classic)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 9 - operator (factor)
#>
#> length n NAs unique levels dupes
#> 1'209 1'201 8 3 3 y
#> 99.3% 0.7%
#>
#> level freq perc cumfreq cumperc
#> 1 Rhonda 446 37.1% 446 37.1%
#> 2 Maria 388 32.3% 834 69.4%
#> 3 Allanah 367 30.6% 1'201 100.0%
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 10 - driver (factor)
#>
#> length n NAs unique levels dupes
#> 1'209 1'204 5 7 7 y
#> 99.6% 0.4%
#>
#> level freq perc cumfreq cumperc
#> 1 Carpenter 272 22.6% 272 22.6%
#> 2 Carter 234 19.4% 506 42.0%
#> 3 Taylor 204 16.9% 710 59.0%
#> 4 Hunter 156 13.0% 866 71.9%
#> 5 Miller 125 10.4% 991 82.3%
#> 6 Farmer 117 9.7% 1'108 92.0%
#> 7 Butcher 96 8.0% 1'204 100.0%
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 11 - delivery_min (numeric)
#>
#> length n NAs unique 0s mean meanCI'
#> 1'209 1'209 0 384 0 25.65 25.04
#> 100.0% 0.0% 0.0% 26.26
#>
#> .05 .10 .25 median .75 .90 .95
#> 10.40 11.60 17.40 24.40 32.50 40.42 45.20
#>
#> range sd vcoef mad IQR skew kurt
#> 56.80 10.84 0.42 11.27 15.10 0.61 0.10
#>
#> lowest : 8.8 (3), 8.9, 9.0 (3), 9.1 (5), 9.2 (3)
#> highest: 61.9, 62.7, 62.9, 63.2, 65.6
#>
#> ' 95%-CI (classic)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 12 - temperature (numeric)
#>
#> length n NAs unique 0s mean meanCI'
#> 1'209 1'170 39 375 0 47.94 47.37
#> 96.8% 3.2% 0.0% 48.51
#>
#> .05 .10 .25 median .75 .90 .95
#> 26.70 33.29 42.23 50.00 55.30 58.80 60.50
#>
#> range sd vcoef mad IQR skew kurt
#> 45.50 9.94 0.21 9.19 13.07 -0.84 0.05
#>
#> lowest : 19.3, 19.4, 20.0, 20.2 (2), 20.35
#> highest: 63.8, 64.1, 64.6, 64.7, 64.8
#>
#> ' 95%-CI (classic)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 13 - wine_ordered (integer - dichotomous)
#>
#> length n NAs unique
#> 1'209 1'197 12 2
#> 99.0% 1.0%
#>
#> freq perc lci.95 uci.95'
#> 0 1'010 84.4% 82.2% 86.3%
#> 1 187 15.6% 13.7% 17.8%
#>
#> ' 95%-CI (Wilson)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 14 - wine_delivered (integer - dichotomous)
#>
#> length n NAs unique
#> 1'209 1'197 12 2
#> 99.0% 1.0%
#>
#> freq perc lci.95 uci.95'
#> 0 1'034 86.4% 84.3% 88.2%
#> 1 163 13.6% 11.8% 15.7%
#>
#> ' 95%-CI (Wilson)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 15 - wrongpizza (logical - dichotomous)
#>
#> length n NAs unique
#> 1'209 1'205 4 2
#> 99.7% 0.3%
#>
#> freq perc lci.95 uci.95'
#> FALSE 1'122 93.1% 91.5% 94.4%
#> TRUE 83 6.9% 5.6% 8.5%
#>
#> ' 95%-CI (Wilson)
#>
#> ──────────────────────────────────────────────────────────────────────────────
#> 16 - quality (ordered, factor)
#>
#> length n NAs unique levels dupes
#> 1'209 1'008 201 3 3 y
#> 83.4% 16.6%
#>
#> level freq perc cumfreq cumperc
#> 1 low 156 15.5% 156 15.5%
#> 2 medium 356 35.3% 512 50.8%
#> 3 high 496 49.2% 1'008 100.0%
#>
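The `week` and `weekday` columns in the output above look consistent with the ISO 8601 week number and a Monday-based weekday derived from `date`. How the columns were actually produced is not stated, so the sketch below is an assumption and only shows how such values can be derived in base R. The object names are hypothetical.

```r
# Three dates from the range covered by the dataset.
dates <- as.Date(c("2014-03-01", "2014-03-03", "2014-03-31"))

# "%V" gives the ISO 8601 week of the year, "%u" the weekday with Monday = 1.
week    <- as.numeric(format(dates, "%V"))
weekday <- as.numeric(format(dates, "%u"))

print(data.frame(date = dates, week = week, weekday = weekday))
```

For 2014-03-01 this yields week 9 and weekday 6, matching the first record shown by `head(d.pizza)`.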