Statistics Denmark’s New Sandbox: The Data Science Lab
Sensitive data used in research is stored in a secure section of Statistics Denmark, which also promotes ‘open science’, where researchers do not regard the datasets they have generated as their own.
The vision for the new Data Science Lab at Statistics Denmark (DST) is ambitious. It is about using new data sources and methods to produce more up-to-date statistics that better describe the society we live in.
“In many ways, we make statistics as we have done for 150 years. We mainly use numbers. We have started to analyse text and images, but in the digitised world, there are many more possibilities,” says Laust Hvas Mortensen, professor of epidemiology and head of DST’s Data Science Lab.
Today, researchers can access data for use in research through DST’s Research Service. But there’s a lot of data that either hasn’t been used before or is very sensitive, and projects that use such data move into the Data Science Lab, where data stays within the thick firewall of the building under extra security scrutiny. This could be pathology samples, images of tissues and eyes, or location data. Such data are more sensitive and legally more complicated, so they need to be ‘handheld’ and given extra protection before researchers can access them more widely via the Research Service.
“In the Data Science Lab, we practice working with the more sensitive and new weird data,” explains Laust Hvas Mortensen. “We don’t want to risk exposing some people’s data. Citizens need to be able to trust that their data won’t get out and be used for something it shouldn’t be used for.”
In other words, Danes’ personal data at DST should never be treated as, say, something to be sold, or used to make decisions on welfare benefits or tax issues, or to deny anyone insurance. One case that still haunts researchers who take privacy seriously is that of the Serum Institute, where data from pregnant women ended up in the US and was commercialised, contrary to what the women expected when they donated their data.
“We are extremely careful that data does not get out and harm citizens,” says Laust Hvas Mortensen.
All of the Novo Nordisk Foundation’s Challenge projects are located in the Data Science Lab. So is a large project on social networks, as well as a number of EU projects.
Open Science
One of Laust Hvas Mortensen’s great hopes is open science.
“Data sets that might be of value to other researchers are stored and shared with other researchers. We don’t sit on our datasets and capitalise on them. Some researchers simply think they own the datasets they have generated and see it as a competitive advantage to sit on them. We’d like to stop that.”
Today, researchers typically share their results with the outside world. But many don’t share the enriched datasets they’ve created in the process.
“But if these datasets are really based on everyone’s data it’s not actually the researchers’ data,” says Laust Hvas Mortensen. “It’s all our data.”
Photo: Branislav Nenin
All very good, but there is a human aspect that should not be forgotten! If a researcher has spent years of effort creating a dataset, as often happens, but has not yet finished exploring the research ideas behind it, which by itself may take years, then any request for open access to these data should be carefully coordinated with the researcher who created them, so that the research opportunities are not taken out of this researcher’s hands. Moreover, use of the data by outside researchers should imply that the researchers who created them are offered involvement in the new use of the data! If such mutual respect is disregarded, it leads to very counterproductive relationships, and hence to less good science than could have been produced.