There are many reasons why data journalism needs new scientific approaches, and we have discussed some of them at length. So far, however, we haven’t said much about the reverse claim, which is equally true: this journalistic project also advances science.
So why are our scientists so passionate about the new/s/leak project? And what kind of scientific challenges do we face?
/S/cience on a Mission
All our scientists agree that it’s an invaluable experience to work on real use cases, solving real-world problems and collecting extensive feedback from real users.
Learning how journalists work and researching new ways to help them would already be exciting enough on its own, but investigative journalism is far more than an intriguing application scenario: journalistic work has an essential social impact, and our software will help increase transparency not only for journalists but also for everyone who reads, watches or listens to their stories.
Genuine use cases come with genuine research challenges: for both the visual and the backend part of new/s/leak, we need to turn scientific prototypes into a scalable, user-friendly, big-data-proof application.
On the backend side, we therefore need techniques that speed up data processing without sacrificing quality, which in turn requires a lot of engineering with new frameworks and tools.
Of course we also need a way to keep the actual user interface clean and responsive, regardless of the (possibly huge) amount of data behind it. This is a core challenge tackled from the visualization side.
Interaction design is a challenge for new/s/leak in many ways. First, our visualization scientists are devoted to finding a smart interface that allows for intuitive user interaction. On the backend side, we need to integrate user interactions into the language processing pipeline (see our Requirements Analysis), because we want to enable users to define entities.
We also need to create possibilities for the users to interact collaboratively within the newsroom.
And, of course, we need to design our own interaction process for the interdisciplinary development of frontend and backend, and to translate between journalists and scientists. As with almost every project, the things that make new/s/leak exciting also bring additional challenges.
Scientists like to measure success in reproducible numbers. For example, we could rate an entity recognition algorithm by counting how many entities it recognizes correctly in a text in which human experts have marked all entities. This is useful because it lets you compare different systems and track the progress of your own approach. From a scientist’s point of view, we’d strive for such an evaluation strategy for new/s/leak, too, but it’s not that easy.
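To make the idea of "counting correctly recognized entities" concrete, here is a minimal sketch of the usual precision/recall/F1 scoring against an expert-annotated gold standard. It is not taken from the new/s/leak code base: the `Entity` tuple, the `score_entities` function and the example sentence are hypothetical, and it assumes the strictest scoring rule, where a prediction only counts if both its character span and its type label match an expert annotation exactly.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical annotation: character offsets plus an entity type label."""
    start: int
    end: int
    label: str


def score_entities(predicted: set[Entity], gold: set[Entity]) -> dict[str, float]:
    """Score system output against expert annotations (exact span and label match)."""
    true_positives = len(predicted & gold)  # entities found with correct span and type
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Example text: "Angela Merkel met Emmanuel Macron in Paris."
gold = {Entity(0, 13, "PER"), Entity(18, 33, "PER"), Entity(37, 42, "LOC")}
# The system finds both people but mislabels "Paris" as an organization.
predicted = {Entity(0, 13, "PER"), Entity(18, 33, "PER"), Entity(37, 42, "ORG")}

print({name: round(value, 2) for name, value in score_entities(predicted, gold).items()})
# {'precision': 0.67, 'recall': 0.67, 'f1': 0.67}
```

Exact matching is only one possible choice; evaluations sometimes also give partial credit for overlapping spans. Either way, the scheme depends on a fixed set of "correct" answers, which is exactly what a leak does not provide.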
We cannot simply count how often new/s/leak shows something relevant, because a single fact (or even a single sentence) from a leak is hardly ever relevant on its own. Nor can we reverse-engineer the problem: an article based on a leak does not come with a definitive list of text snippets that together make up all the information in the story. Rather than counting and comparing, we take an approach that our experts for graphical interactive systems also use regularly: we conduct user studies (and have already conducted some), and then ask questions that allow us to quantify success without relying on exact measures. Like the whole project, the definition of success needs to be scientifically grounded, but entirely user-centric.
For new/s/leak’s scientists (and, of course, for the journalists too), this project will be successful if the software helps many users discover information that matters. And all of us hope that the work on new/s/leak will be continued sustainably in follow-up projects.