New/s/leak at WissensWerte 2018

We present new/s/leak in a panel discussion at WissensWerte 2018, Germany’s most important dialogue forum for science journalists. On November 20, 2018, together with panelists from journalism, IT startups, and other universities, we will discuss how artificial intelligence contributes to journalistic work. In the case of new/s/leak, we employ machine learning to automatically extract relevant information such as named entities and keywords from texts. This enables us to create comprehensive interactive visualizations of large text collections, which help investigators explore them quickly.

The session description as well as the full conference program can be found here.

Paper accepted at SocInfo 2018 conference in St. Petersburg

Newsleak will be presented at the Social Informatics conference 2018, which takes place from 25 to 28 September in St. Petersburg, Russia. The conference paper is published in Springer's LNCS series (here). A preprint can be found here.

Abstract: Investigative journalism in recent years is confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multi-lingual data due to intensified global cooperation and communication in politics, business and civil society. Faced with these challenges, journalists are increasingly cooperating in international networks. To support such collaborations, we present the new version of new/s/leak 2.0, our open-source software for content-based searching of leaks. It includes three novel main features: 1) automatic language detection and language-dependent information extraction for 40 languages, 2) entity and keyword visualization for efficient exploration, and 3) decentral deployment for analysis of confidential data from various formats. We illustrate the new analysis capabilities with an exemplary case study.
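To make the first feature in the abstract more concrete, here is a minimal sketch of what "language detection followed by language-dependent extraction" can look like. It is not the new/s/leak implementation: the stopword lists, the function names, and the frequency-based keyword scoring are all assumptions chosen for illustration, and a real system would rely on trained models and much larger language resources.

```python
from collections import Counter
import re

# Hypothetical, tiny stopword lists for illustration only;
# they are not taken from new/s/leak.
STOPWORDS = {
    "en": {"the", "and", "of", "in", "to", "is", "was", "a"},
    "de": {"der", "die", "das", "und", "in", "ist", "war", "ein", "von"},
    "es": {"el", "la", "los", "y", "de", "en", "es", "fue", "un"},
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def detect_language(text):
    """Guess the language whose stopwords occur most often in the text."""
    tokens = tokenize(text)
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    return max(scores, key=scores.get)


def extract_keywords(text, top_n=3):
    """Return the guessed language and the most frequent content words."""
    lang = detect_language(text)
    # Language-dependent step: filter with the stopwords of the guessed language.
    content = [
        token for token in tokenize(text)
        if token not in STOPWORDS[lang] and len(token) > 2
    ]
    return lang, [word for word, _ in Counter(content).most_common(top_n)]
```

Calling `extract_keywords` on a document returns a language code together with a short keyword list, which is the kind of per-document metadata an exploration interface can then filter and visualize.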

Newsleak 2.0 pre-release software demo

Since the first version of Newsleak, a lot has been improved both behind the scenes and in the front-end of the software. We want to encourage journalists to try out a pre-release of Newsleak 2.0 on their own. For this purpose, we provide a software demonstration. The demo is populated with about 26,500 documents collected from Wikipedia in four languages (English, German, Hungarian and Spanish), mostly centered on the topic of World War II. The idea behind the demo is to show how the analysis capabilities let you quickly explore a large, multilingual collection.

For lazy clickers, we provide a YouTube video that walks through an exploratory analysis and filtering process, drilling down to some details of inner-Chinese political tensions during WW2.

Presentation at #EIJC18 & Dataharvest conference

This Saturday, we present new/s/leak at the European investigative journalism conference (EIJC). Here you can find the slides of our presentation about “Information Extraction and Visualisation for Investigative Journalism”.

If you are interested in trying new/s/leak with your own data, visit the GitHub page containing the Docker setup of our application.

In June, we will publish a detailed blog post on how to set up Hoover and Newsleak to analyze collections on your own machines.

Dataharvest Conference #EIJC18

From Thursday 24 to Sunday 27 May 2018, the EIJC 2018 conference (European Investigative Journalism Conference) will take place in Mechelen (Belgium). We, the newsleak project, will participate and discuss the requirements and needs of our target user group. You can find out all about the conference on this website:

Funding extension

We are happy to announce that the new/s/leak project receives some additional funding from the Volkswagen Stiftung. Until summer 2018, new/s/leak will be extended and refactored to achieve the following goals:

  • easy deployment for own usage
  • comprehensive and detailed documentation
  • improved user interface
  • improved information extraction (better keyterm extraction, named entity recognition, support of user dictionaries)
  • support for multiple languages (among others English, German, Spanish, French, Arabic, Chinese)

Follow the updates on this blog to see how far we got 🙂


new/s/leak @ VIP

Last week, new/s/leak had its academic debut in the visualization science community at the Visualization in Practice Workshop, co-located with the IEEE VIS 2016 conference.

Here is the paper documenting the software with a focus on visualization. Needless to say, it’s always fun to present new/s/leak and get more feedback:

Kathrin presenting new/s/leak

Thanks to everyone who came and visited us!


Paper accepted @ VIS 2016

Our paper “new/s/leak — A Tool for Visual Exploration of Large Text Document Collections in the Journalistic Domain” has been accepted for presentation at the poster session of the Visualization in Practice Workshop, which is part of the IEEE VIS 2016 conference. The workshop will take place in Baltimore, Maryland, USA on October 24-25.

VIS is one of the most important conferences in visualization science. new/s/leak fits perfectly in this year’s VIP workshop, the focus of which is design, development, distribution, and application of open source visualization and visual analytics software.

Meet us at the demo session in Baltimore!