KnnImputation.Rd
Function that fills in all NA values using the k nearest neighbours of each case with NA values. By default it uses a weighted average (weighted by the distance to the case) of the neighbours' values to fill in the unknowns. If meth='median', it uses the median (or, for factors, the most frequent value) instead.
ImputeKnn(data, k = 10, scale = TRUE, meth = "weighAvg", distData = NULL)
A data frame with the data set
The number of nearest neighbours to use (defaults to 10)
Boolean setting whether the data should be scaled before finding the nearest neighbours (defaults to TRUE)
String indicating the method used to calculate the value to fill in each NA. Available values are 'median' or 'weighAvg' (the default).
Optionally you may specify here a data frame containing the data set
that should be used to find the neighbours. This is useful when
filling in NA values on a test set, where you should use only
information from the training set. This defaults to NULL, which means
that the neighbours will be searched in data.
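As an illustration of the train/test use of distData, here is a minimal sketch. It assumes that ImputeKnn() is provided by the DescTools package; the data frames train and test are synthetic and purely illustrative. The call is guarded so that the snippet does nothing when the package is not installed.

```r
# Assumption: ImputeKnn() is exported by the DescTools package
if (requireNamespace("DescTools", quietly = TRUE)) {
  set.seed(1)

  # Synthetic training set without missing values (illustrative only)
  train <- data.frame(x = rnorm(50), y = rnorm(50))

  # Synthetic test set with one NA in each of the first two rows
  test <- data.frame(x = c(0.1, NA, -0.4), y = c(NA, 0.3, 1.2))

  # Neighbours are searched among the rows of `train` only,
  # so no information from other test cases is used
  test_filled <- DescTools::ImputeKnn(test, k = 5, distData = train)
}
```

Only the rows of the first argument are returned, so test_filled has as many rows as test.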
This function uses the k-nearest neighbours to fill in the unknown (NA) values in a data set. For each case with any NA value it will search for its k most similar cases and use the values of these cases to fill in the unknowns.
If meth='median'
the function will use either the median (in
case of numeric variables) or the most frequent value (in case of
factors), of the neighbours to fill in the NAs. If
meth='weighAvg'
the function will use a weighted average of the
values of the neighbours. The weights are given by exp(-dist(k,x)),
where dist(k,x)
is the Euclidean distance between the case with
NAs (x) and the neighbour k.
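The two fill-in rules described above can be sketched in a few lines of base R. This is only an illustration of the arithmetic, not the function's internal code; neighbour_dist and neighbour_val are made-up values for a case with k = 3 neighbours.

```r
# Hypothetical Euclidean distances from the incomplete case x
# to each of its k = 3 nearest neighbours
neighbour_dist <- c(0.5, 1.0, 2.0)

# Hypothetical values of the variable to fill in, as observed
# in those three neighbours
neighbour_val <- c(10, 20, 40)

# Weights decay exponentially with distance: exp(-dist(k, x))
w <- exp(-neighbour_dist)

# meth = "weighAvg": distance-weighted average of the neighbours' values
imputed_avg <- sum(w * neighbour_val) / sum(w)

# meth = "median": plain median of the neighbours' values (numeric case)
imputed_median <- median(neighbour_val)
```

Because the weights shrink quickly with distance, the weighted average is pulled towards the values of the closest neighbours, whereas the median ignores distances once the k neighbours have been selected.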
A data frame without NA values
Torgo, L. (2010) Data Mining using R: learning with case studies, CRC Press (ISBN: 9781439810187).
cleanPizza <- ImputeKnn(d.pizza[, -2]) # no dates allowed
summary(cleanPizza)
#>      index           week          weekday               area
#>  Min.   :   1   Min.   : 9.00   Min.   :1.000   Brent      :480
#>  1st Qu.: 303   1st Qu.:10.00   1st Qu.:3.000   Camden     :346
#>  Median : 605   Median :11.00   Median :5.000   Westminster:383
#>  Mean   : 605   Mean   :11.41   Mean   :4.443
#>  3rd Qu.: 907   3rd Qu.:13.00   3rd Qu.:6.000
#>  Max.   :1209   Max.   :14.00   Max.   :7.000
#>
#>      count          rabate              price            operator
#>  Min.   :1.000   Length:1209        Min.   :  8.792   Allanah:370
#>  1st Qu.:2.000   Class :character   1st Qu.: 31.176   Maria  :391
#>  Median :3.000   Mode  :character   Median : 46.764   Rhonda :448
#>  Mean   :3.445                      Mean   : 48.734
#>  3rd Qu.:4.000                      3rd Qu.: 62.955
#>  Max.   :8.000                      Max.   :134.334
#>
#>        driver     delivery_min    temperature     wine_ordered
#>  Butcher  : 96   Min.   : 8.80   Min.   :19.30   Min.   :0.0000
#>  Carpenter:275   1st Qu.:17.40   1st Qu.:42.20   1st Qu.:0.0000
#>  Carter   :235   Median :24.40   Median :49.80   Median :0.0000
#>  Farmer   :118   Mean   :25.65   Mean   :47.84   Mean   :0.1561
#>  Hunter   :156   3rd Qu.:32.50   3rd Qu.:55.30   3rd Qu.:0.0000
#>  Miller   :125   Max.   :65.60   Max.   :64.80   Max.   :1.0000
#>  Taylor   :204
#>  wine_delivered    wrongpizza          quality
#>  Min.   :0.0000   Length:1209        low   :229
#>  1st Qu.:0.0000   Class :character   medium:458
#>  Median :0.0000   Mode  :character   high  :522
#>  Mean   :0.1361
#>  3rd Qu.:0.0000
#>  Max.   :1.0000
#>