Marcel Das is the director of CentERdata and professor of Econometrics and Data Collection at the department of Econometrics and OR.
In my previous column I introduced myself and referred to the enormous potential of Econometrics. There is a huge variety of applications in which econometric techniques help to find optimal solutions or answers to (scientific) questions. However, in many applied research projects econometric tools alone are not enough to find the answers. Something else is needed as well: data.
There are many different kinds of data sources used by econometricians: sales data, data from the stock market, data from surveys, data from experiments, register data, etc. The term big data in particular is very popular nowadays, and the more traditional researcher has to get used to this new phenomenon. The standard set-up of a research project is to first come up with research questions and hypotheses, followed by a search for suitable data. Once a model is developed and estimated on the basis of the data, conclusions can be drawn and the initial questions can be answered. With big data this process seems to be reversed entirely: one starts with (big) data, and from this data the research questions are formulated. Although I really see the potential of big and unstructured data, I have to get used to this idea.
Back to the more traditional types of data. You may have learned that it is important to first look at the quality of a dataset before you start using it to estimate your econometric model. Take this seriously. Make sure the data are useful for your particular project, and if the dataset has imperfections, try to correct them or deal with them. And do not underestimate the effort that goes into collecting (new) data. Data collection is not a straightforward activity. Numerous things need careful attention, such as the way in which the questions are formulated, presented, or asked. And when setting up an experiment it may be a good idea to first run a pilot study to make sure all elements of the experiment are optimized.
In your search for high-quality data you may be confronted with a less pleasant surprise: high-quality data come at a high price. In particular, if you want to collect new data through a survey or an experiment with ‘real’ respondents, you need a budget far beyond a student's means. Even the much smaller budget required to use administrative data sources from Statistics Netherlands is unaffordable for a student. However, there is (at least) one rich data source that is available to the academic community free of charge, including to you as a student: data from the LISS panel.
The LISS panel is administered by CentERdata (Tilburg University) and has been running since 2007. The panel consists of 5,000 households, comprising 7,500 individuals, and is based on a true probability sample of households drawn from the population register by Statistics Netherlands. Households that could not otherwise participate are provided with a computer and an Internet connection. Panel members complete an online questionnaire every month, lasting about 15 to 30 minutes in total, and are paid for each completed questionnaire. Part of the interview time in the LISS panel is reserved for the LISS Core Study. This longitudinal study is repeated yearly and is designed to follow changes in the life course and living conditions of the panel members. It covers the following topics: health, politics and values, religion and ethnicity, social integration and leisure, family and household, work and schooling, personality, and the economic situation (assets, income, and housing). Alongside the core study, many other projects have been (and are being) run in the LISS panel, on a huge variety of topics. All data ever collected are available to you. Again: free of charge.
Looking for data for a scientific project or a bachelor’s or master’s thesis? Have a look at what’s available in the LISS Data Archive: https://www.dataarchive.lissdata.nl/ and be positively surprised.
Text by: Marcel Das