This function provides a report showing all outlier values for each numerical field. For each field, it tries to determine automatically whether the distribution is Normal or Log-Normal by comparing the difference between mean and median in the untransformed normalized distribution with the same difference in the log-transformed normalized distribution.
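The selection rule described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the package's actual code: the helper name `flag_outliers`, its return columns, and the tie-breaking rule are assumptions, and it assumes strictly positive values so that the logarithm is defined.

```r
# Hypothetical helper, NOT part of HighFrequencyChecks: a sketch of the idea only.
flag_outliers <- function(x, sdval = 2) {
  x <- x[!is.na(x)]

  # Standardize the raw values and the log-transformed values.
  # Assumption: all values are strictly positive, so log() is defined.
  z_raw <- as.numeric(scale(x))
  z_log <- as.numeric(scale(log(x)))

  # In a symmetric distribution the mean and the median are close,
  # so the smaller gap suggests the better-fitting distribution type.
  gap_raw <- abs(mean(z_raw) - median(z_raw))
  gap_log <- abs(mean(z_log) - median(z_log))

  if (gap_log < gap_raw) {
    type <- "LogNormal"
    z <- z_log
  } else {
    type <- "Normal"
    z <- z_raw
  }

  # Keep the values lying more than sdval standard deviations from the mean.
  keep <- abs(z) > sdval
  data.frame(
    value = x[keep],
    score = z[keep],
    DistributionType = rep(type, sum(keep)),
    stringsAsFactors = FALSE
  )
}

# Usage on a made-up vector that is symmetric on the log scale.
example <- flag_outliers(2^(0:10), sdval = 1.5)
```

The standardized score plays the same role as the `values` column in the report, and lowering `sdval` flags more values.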

surveyOutliers(
  ds = NULL,
  enumeratorID = NULL,
  sdval = 2,
  reportingColumns = c(enumeratorID, uniqueID),
  enumeratorCheck = FALSE
)

Arguments

ds

dataset containing the survey (from kobo): data.frame

enumeratorID

name of the field where the enumerator ID is stored: string

sdval

(Optional, by default set to 2) number of standard deviations within which a value is considered acceptable: integer

reportingColumns

(Optional, by default built from the enumeratorID and the uniqueID) names of the columns from the dataset you want in the result: list of strings (c('col1','col2',...))

enumeratorCheck

(Optional, by default set to FALSE) specify if the report has to be displayed for each enumerator or not: boolean (TRUE/FALSE)

uniqueID

name of the field where the survey unique ID is stored: string

Value

dst same dataset as the input one, but with surveys marked for deletion if errors are found and delete=TRUE (or NULL)

ret_log list of the errors found (or NULL)

var a list of values (or NULL)

graph graphical representation of the results (or NULL)

Author

Yannick Pascaud

Examples

{
  ds <- HighFrequencyChecks::sample_dataset
  enumeratorID <- "enumerator_id"
  uniqueID <- "X_uuid"
  reportingColumns <- c(enumeratorID, uniqueID)
  sdval <- 2

  list[dst, ret_log, var, graph] <- surveyOutliers(ds = ds,
                                                   enumeratorID = enumeratorID,
                                                   sdval = sdval,
                                                   reportingColumns = reportingColumns,
                                                   enumeratorCheck = FALSE)
  head(ret_log, 10)
}
#>    enumerator_id                               X_uuid    values
#> 1          30003 ea2e587b-dd65-4d9c-9f0d-ca8cc3f877fd  3.021103
#> 2             96 31ca6635-07e8-44e8-afcb-e42928440931  2.048250
#> 3             13 ab916698-4dab-4919-ad70-3cd8390da3c6  2.048250
#> 4             13 4faffb4a-8e8c-4417-a8d0-aa4c36b1775d  2.241889
#> 5             91 9dfa9c28-714b-4efc-93f3-4f9fee3f88fe  2.423026
#> 6             42 25c51488-2ae2-4d9c-9bf8-653498c7e689  2.241889
#> 7          30007 a0724b86-fd95-4ab1-b4e3-77dd1157fdca  2.048250
#> 8            102 a8c524d3-6ae7-4018-9972-ed66fe04f397  2.992609
#> 9          10038 6b5d4b42-0c76-4917-a932-040f10fd7b44  2.048250
#> 10            18 3e5ede43-369c-40f6-b59b-c73a5525a81d -2.340188
#>                                                ind DistributionType
#> 1  consent_received.respondent_info.respondent_age        LogNormal
#> 2  consent_received.respondent_info.respondent_age        LogNormal
#> 3  consent_received.respondent_info.respondent_age        LogNormal
#> 4  consent_received.respondent_info.respondent_age        LogNormal
#> 5  consent_received.respondent_info.respondent_age        LogNormal
#> 6  consent_received.respondent_info.respondent_age        LogNormal
#> 7  consent_received.respondent_info.respondent_age        LogNormal
#> 8  consent_received.respondent_info.respondent_age        LogNormal
#> 9  consent_received.respondent_info.respondent_age        LogNormal
#> 10        consent_received.respondent_info.hh_size        LogNormal
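
Once the report is available, a quick tally of flagged values per enumerator can help decide where to follow up. The sketch below uses base R only. `ret_log_demo` is a hypothetical stand-in that copies the first five rows of the output above, so it runs without the package; with real results, `ret_log` would take its place.

```r
# Toy stand-in for ret_log: first five rows copied from the printed output.
ret_log_demo <- data.frame(
  enumerator_id = c(30003, 96, 13, 13, 91),
  values = c(3.021103, 2.048250, 2.048250, 2.241889, 2.423026),
  ind = rep("consent_received.respondent_info.respondent_age", 5),
  stringsAsFactors = FALSE
)

# Count flagged values per enumerator.
counts <- as.data.frame(
  table(enumerator_id = ret_log_demo$enumerator_id),
  responseName = "n_outliers"
)

# Sort so that enumerators with the most flagged values come first.
counts <- counts[order(-counts$n_outliers), ]
print(counts)
```

A high count does not by itself indicate a data-quality problem; it only points to enumerators whose submissions may deserve a closer look.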