This function provides a report showing all outlier values for each numerical field. It tries to determine the type of distribution automatically (Normal or Log-Normal) by comparing the difference between mean and median in the normalized untransformed data with the same difference in the normalized log-transformed data.
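The distribution check described above can be sketched in base R. This is a minimal illustration, not the package's actual code: the helper name `guess_distribution`, the use of z-score normalization, and the fallback to "Normal" for non-positive or constant data are all assumptions made for the example.

```r
# Hypothetical helper for illustration; not a function of HighFrequencyChecks.
guess_distribution <- function(x) {
  x <- x[!is.na(x)]
  # The log transform needs strictly positive values and the gap needs a
  # non-zero spread, so fall back to "Normal" otherwise (an assumption).
  if (length(x) < 2 || any(x <= 0) || sd(x) == 0) {
    return("Normal")
  }
  # |mean - median| of z-scored data equals |mean - median| / sd.
  gap <- function(v) abs(mean(v) - median(v)) / sd(v)
  # Pick the scale on which mean and median agree more closely.
  if (gap(log(x)) < gap(x)) "LogNormal" else "Normal"
}

guess_distribution(c(10, 20, 30, 40, 50))  # symmetric on the raw scale
guess_distribution(exp(1:5))               # symmetric on the log scale
```

The idea is that a symmetric distribution has its mean close to its median, so the scale with the smaller gap is taken as the better fit.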
surveyOutliers( ds = NULL, enumeratorID = NULL, sdval = 2, reportingColumns = c(enumeratorID, uniqueID), enumeratorCheck = FALSE )
Argument | Description |
---|---|
ds | dataset containing the survey (from Kobo): data.frame |
enumeratorID | name of the field where the enumerator ID is stored: string |
sdval | (Optional, defaults to 2) number of standard deviations within which the data is considered acceptable: integer |
reportingColumns | (Optional, by default built from enumeratorID and uniqueID) names of the dataset columns to include in the result: list of strings (c('col1','col2',...)) |
enumeratorCheck | (Optional, defaults to FALSE) whether the report is displayed for each enumerator: boolean (TRUE/FALSE) |
uniqueID | name of the field where the survey unique ID is stored: string |
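To make the role of `sdval` concrete, the following base R sketch flags values whose standardized score lies more than `sdval` standard deviations from the mean. The helper `flag_outliers` and the `ages` vector are hypothetical and only illustrate the thresholding idea; the package's own computation may differ.

```r
# Hypothetical helper for illustration; not a function of HighFrequencyChecks.
flag_outliers <- function(x, sdval = 2) {
  # Standardize the values, ignoring missing entries.
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  # Return the positions of values outside the accepted band.
  which(abs(z) > sdval)
}

ages <- c(25, 30, 28, 32, 27, 29, 31, 26, 30, 95)  # made-up data
flag_outliers(ages, sdval = 2)  # flags the last value
flag_outliers(ages, sdval = 3)  # a wider band flags nothing here
```

A larger `sdval` widens the accepted band and therefore reports fewer values.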
Value | Description |
---|---|
dst | the input dataset, with surveys marked for deletion if errors are found and delete=TRUE (or NULL) |
ret_log | list of the errors found (or NULL) |
var | a list of values (or NULL) |
graph | graphical representation of the results (or NULL) |
Yannick Pascaud
{
  ds <- HighFrequencyChecks::sample_dataset
  enumeratorID <- "enumerator_id"
  uniqueID <- "X_uuid"
  reportingColumns <- c(enumeratorID, uniqueID)
  sdval <- 2

  list[dst, ret_log, var, graph] <- surveyOutliers(ds = ds,
                                                   enumeratorID = enumeratorID,
                                                   sdval = sdval,
                                                   reportingColumns = reportingColumns,
                                                   enumeratorCheck = FALSE)
  head(ret_log, 10)
}
#>    enumerator_id                               X_uuid    values
#> 1          30003 ea2e587b-dd65-4d9c-9f0d-ca8cc3f877fd  3.021103
#> 2             96 31ca6635-07e8-44e8-afcb-e42928440931  2.048250
#> 3             13 ab916698-4dab-4919-ad70-3cd8390da3c6  2.048250
#> 4             13 4faffb4a-8e8c-4417-a8d0-aa4c36b1775d  2.241889
#> 5             91 9dfa9c28-714b-4efc-93f3-4f9fee3f88fe  2.423026
#> 6             42 25c51488-2ae2-4d9c-9bf8-653498c7e689  2.241889
#> 7          30007 a0724b86-fd95-4ab1-b4e3-77dd1157fdca  2.048250
#> 8            102 a8c524d3-6ae7-4018-9972-ed66fe04f397  2.992609
#> 9          10038 6b5d4b42-0c76-4917-a932-040f10fd7b44  2.048250
#> 10            18 3e5ede43-369c-40f6-b59b-c73a5525a81d -2.340188
#>                                                ind DistributionType
#> 1  consent_received.respondent_info.respondent_age        LogNormal
#> 2  consent_received.respondent_info.respondent_age        LogNormal
#> 3  consent_received.respondent_info.respondent_age        LogNormal
#> 4  consent_received.respondent_info.respondent_age        LogNormal
#> 5  consent_received.respondent_info.respondent_age        LogNormal
#> 6  consent_received.respondent_info.respondent_age        LogNormal
#> 7  consent_received.respondent_info.respondent_age        LogNormal
#> 8  consent_received.respondent_info.respondent_age        LogNormal
#> 9  consent_received.respondent_info.respondent_age        LogNormal
#> 10        consent_received.respondent_info.hh_size        LogNormal
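Once a log like the one above is available, a quick follow-up is to count flagged values per enumerator. The sketch below uses only base R on a small data frame, `ret_log_sample`, that copies the `enumerator_id` and `values` columns from the ten rows printed above. The object name is hypothetical, and the real `ret_log` contains more rows and columns.

```r
# Stand-in for the first ten rows of ret_log (values copied from the output above).
ret_log_sample <- data.frame(
  enumerator_id = c(30003, 96, 13, 13, 91, 42, 30007, 102, 10038, 18),
  values = c(3.021103, 2.048250, 2.048250, 2.241889, 2.423026,
             2.241889, 2.048250, 2.992609, 2.048250, -2.340188)
)

# Count flagged values per enumerator, most frequent first.
counts <- sort(table(ret_log_sample$enumerator_id), decreasing = TRUE)
head(counts, 3)
```

In this ten-row sample only enumerator 13 appears more than once; the same tally on the full log gives a rough view of which enumerators account for the most flagged values.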