Configure forms can be done both through the Kobobuilder interface or within Excel. using kobobuilder comes with Risk when the form is complex.

The second options is preferred as it allows for more controls.


The form needs to be defined using the xlsform format. XLSFORM is a form standard created to help simplify the authoring of forms in Excel. Authoring is done in a human readable format using a familiar tool that almost everyone knows - Excel. XLSFORMs provide a practical standard for sharing and collaborating on authoring forms. They are simple to get started with but allow for the authoring of complex forms by someone familiar with the syntax described below.

The XLSFORM is then converted to an X-Form, a popular open form standard, that allows you to author a form with complex functionality like skip logic in a consistent way across a number of web and mobile data collection platforms.

The documentation on xlsform is here. The same format is actually used by multiple data collection technological platform such as kobotoolbox(the platform used by UNHCR on, OpenDataKit or ONA.

Some tips

Good questionnaire design can save hours at latter stage for the person that will need to analyse the data…. do invest enough time for this step and allow for peer review before starting the testing phase !!

Manage your xls form

  • Make sure your file is saved in the .xls format and contains no spaces or special characters (‘‐’ and ‘_’ are allowed).

  • Make sure that your column headers are in lowercase (i.e. “label” or “name”, not “Label” or “Name”)

  • Make sure that your sheet names are appropriately named (i.e. “survey” not “Sheet 1”, “Survey” or “surveys”)

  • Make sure that the question names are unique and do not contain spaces or special characters (‘‐’ and ‘_’ are allowed).

  • By the end of the coding sessions, there will be several versions of the MS Xls files. It is crucial to separate each according to versions and also to make sure the names do not in clude any spaces or begin with numbers or special characters.

Variable Encoding

  • The “name” column must be within 32 characters, cannot begin with a number or any special character and cannot contain any spaces in between; otherwise ODK Validate will issue an error. The longer your name is and the more time you will have to spend searching for what that specific variable or choice is in your database. Also, Kobo doesn’t accept answer choice names that contain more than 32 characters and question/ variable names no longer than 30 characters

  • Please use a short meaningfull name for your variable rather than a code like Q1 or Q2. The name should mean something to you and should be clear as to what it refers to.

  • Having a consistent and harmonised naming system throughout your survey, with descriptive/meaningful and easy to remember names for your variables/questions and choices, will allow you to quickly find the variable(s) to be modified. On the contrary, should you have a non-harmonised naming system with names that are non-descriptive, it will take you much more time than expected to find the variables of interest and to modify them.

  • In order to anticipate the analysis with a statistical package like koboloadeR, limit variable names lenght to 12 characters and do not start a variable name with a number.

  • Avoid underscores («_») in the technical name of the variable. It can sometimes be recognised as a «separator» by certain statistical packages. Do not hesitate to rather use the «camelCase» (consists on writing a group of words in lowercase with the first letter of the lined words in uppercase).

  • Name should not contain any spaces, accentuation (“à è ô”) or special characters (“+”, “%”, “&”,“(“, “/”, etc.).

  • Names should not start with a number: You can use numbers in your names, but rather at the end or in the middle of your name so that it is instantly clearer as to what information it refers to.

More tips on variable name encoding are available here

Labeling questions

  • Numbering questions does not provide added value – during data entry enumerator are forced to respect the order any way. In addition, if you start numbering questions, any change in the questionnaire will then require to renumber all questions. It’s a good example of something that was important when people were using paper form but has no reason to be used anymore when moving to digital data collection.

  • Quick translation can be done with Google translate. Obviously the resulting translation needs to be carefully reviewed but it save the time to retype the whole text and does not prevent from the translation review described here. It’s also a good way to test the simplicity of the sentence used for the questions: if the artificial intelligence from Google translate can not decipher the questions, it means that it will be also probably difficult for the respondant. One test to confirm the question phrasing can also to translate and re-translate back in the original language in order to see if the meaning of the question is preserved.

  • When there’s a question like first reason, second reason, third reason or first source of income, second soure of income, third source of income, the bad practise is actually to have 3 separate select_one questions: aka first reason - then second reason, then thirst reason – First, from a cognitive point of view, respondents will have a hard time to rank among more than 5 options, second when recorded through 3 distinct variables, data will need a lot of coding to reshape the data in way where the analysis will make sense… – here the 2 options are either:

    • select_multiple with a constraint count on the number of modality you can select “not(count-selected(.) < 4” — often it’s simpler as very likely the ranking between options will be difficult to analyse

    • or go for a rating likert questions for each options – and then the ranking will be performed during the analysis through the sum of the rates for each respondent.

Questionnaire structure

  • All “begin group” types must have a corresponding “end group”. It is advised to name each of the begin group and end group “name” as well as “label” columns for easy identification of where a group starts and ends, e.g. "type– begingroup. Name – grp_agri_aman. Label – Aman Rice. T ype – end group. Name – endgrp_agri_aman. Label – no labels are required as this won’t show up in the form.

  • There can be a mother group with several child groups within, however each group essentially works like brackets which open, can remain open, close or close at the end. It can get rather complicated once several questions within a group are to be skipped.

  • DO NOT LEAVE A SPACE AFTER your “type” or “name” variables, e.g. “end group”. It is an invisible error that will cause the form to malfunction.

  • It is a good practice to keep naming the questions “label” as “Name of Household Head… Age of Respondent…Please specify what you mean by”Other“…and so on”. The same can be done for “name” variables but one must remember that they cannot start with an uppercase letter, e.g. “name_hhh” is allowed.

Make your questionnaire smart

  • Use hints as much as possible - a good example is to recall to your enumerator how to proceed with select question for instance two options are possible :
    • Read the list and select all (or only the right) answers
    • Do not read the list - wait for the answer to select the right answer(s)
  • The “note” and “calculate” type of questions can be really handy and must be used whenever there is a reason to check back with data entered previously. E.g. “type – calculate, name – calc_loan, calculation - ${loan_1}+{loan_2}+{loan_3}” with "type – note, name – chk_total_loan , label – Your breakdown of total loans ${loan_1} and ${loan_2} and ${loan_3} add up to ${calc_loan}. Please make sure they are equal to the amount entered before as total loan ${totalloan}.

  • Systematically add constraint to your numeric questions: age can not be negative or bigger than 120. Add also constraint on linked values: for instance if you know the number of children in a family, the number of boys can not be bigger than the total number of children. Do not forget to add constraint_message to help your enumerator to fix the issue!

  • When you do a calculation, beware of zero : To convert empty values to zero, use either the coalesce() function or the if() function. See the form logic functions reference below. "+ ‐ *" correspond to add, subtract, and multiply. Division however, is special, and you need to use the word “div” to do division.

  • The best types of constraint and relevance are usually the simplest. When typing the binding guidelines ( - design/binding/) can simplify the entire line of code. Also this is the most interesting and perhaps challenging part of writing questionnaires. Users are free to develop their own lines of code and put them to the test. For any help the Google ODK groups are more than sufficient.

  • If the questions need to be translated into a local language, the input must be in Unicode. Also it is imperative to make sure that questions in the base language are translated with the meaning intact. Difference in interpretation can cause severe errors in the quality of the data.

  • Systematically add the metadata field: type: ‘start’, ‘end’, ‘today’, ‘deviceid’, ‘imei’, ‘phonenumber’

  • To record admin level, use cascading drop down lists. Note that this does not work well whem more than 500 options. In this case, one option is to use select_external : but this will work only if data collection is done with Android client (not the web client).

  • Record correctly registration case number using a regex constraint where caret is the ASCII special character:
    • for case number: “regex(.,‘caret[0-9]{3}-[0-9]{2}[C][0-9]{5}$’)
    • and for Individual number “regex(.,‘caret[0-9]{3}-[0-9]{2}[I][0-9]{5}$’)
  • Record case number using bar code scanner whenever possible - rather than manual data entry.

  • Add constraint on email validation through the same regex: “regex(., ‘[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,4}’)”. This can be more specific if a specific email domain is expected.

  • Calculate years from date (age from date of birth) “round((decimal-date-time(today) − decimal − date − time({dob})) div 365.24, 0)”.

  • Validate past date (constraint) “decimal-date-time(.)<decimal-date-time(${today})”.

  • Use “or_other” whenever you have a select_one or select_multiple questions that may have additional modalities that you can not anticipate

  • If you have a select_multiple question with one modalites being “None of the options mentionned here”, use a contraint to ensure that respondant will not select this options with other: “not(count-selected(.) > 1 and selected(., ‘no’))

  • Review the appearance options : Beware some options (like ) are only for web client while other (like “quick”) are only for the android client.

Keep your xlsform legible

  • It is possible to separate different sections of the questionnaire using different cell colours. It does not make a difference for ODK, but makes the form much more navigable.

  • In the same idea, in Excel, freeze the top row cell and add automatic filters to quickly navigate through your questions (survey) and answers (choices).

  • In the choices sheet, a gap or (blank row) can be left after each set of answers. Again, it doe s not make a difference for ODK but makes it much easier for navigation.

Validate & Deploy the form

  • kobotoolbox has some built-in validation check & debugging check when a xlsform is uploaded. Sometimes better validation / debugging support can be obtained by other online validator such as

  • For review purpose, you may find usefull to have a beautifull transcript in word of your xlsfrom using PPP for Open Data Kit

  • Note for field deployment: if possible, users must try and keep a copy/image of the entire " ODK " folder inside the memory of the tablet and save it elsewhere for later reference. If planning requires users to clear out " ODK " and re insert a questionnaire for easy tracking, then this step becomes very a vital safe guard.

  • one point is also that as for dataviz, good questionnaire are the result of multiple persons looking at it – hence the importance of peer review – peer review is different from testing - it requires people with expertise in questionnaire design to review - this is among the support than can be requestioned from regional offices.

Support Forum

When faced with specific challenges, multiple online forums can be consulted (as xlsform is used on different platform):