Author Archives: Gregor Wiedemann

New/s/leak at WissensWerte 2018

We present new/s/leak at a panel discussion at WissensWerte 2018, Germany’s most important dialogue forum for science journalists. On November 20, 2018, together with panelists from journalism, IT startups, and other universities, we will discuss how artificial intelligence contributes to journalistic work. In the case of new/s/leak, we employ machine learning to automatically extract relevant information such as named entities and keywords from texts. This enables us to create comprehensive, interactive visualizations of large text collections, which support fast exploration for investigative purposes.

The session description as well as the full conference program can be found here.

Paper accepted at SocInfo 2018 conference in St. Petersburg

Newsleak will be presented at the Social Informatics conference 2018, which takes place from 25 to 28 September in St. Petersburg, Russia. The conference paper is published in Springer's LNCS series (here). A preprint can be found here.

Abstract: Investigative journalism in recent years is confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multi-lingual data due to intensified global cooperation and communication in politics, business and civil society. Faced with these challenges, journalists are increasingly cooperating in international networks. To support such collaborations, we present the new version of new/s/leak 2.0, our open-source software for content-based searching of leaks. It includes three novel main features: 1) automatic language detection and language-dependent information extraction for 40 languages, 2) entity and keyword visualization for efficient exploration, and 3) decentral deployment for analysis of confidential data from various formats. We illustrate the new analysis capabilities with an exemplary case study.

Newsleak 2.0 pre-release software demo

Since the first version of Newsleak, a lot has been improved behind the scenes as well as in the front-end of the software. We want to encourage journalists to try out a pre-release of Newsleak 2.0 on their own. For this, we provide a software demonstration. This demo is populated with ca. 26,500 documents collected from Wikipedia in four languages (English, German, Hungarian and Spanish) and mostly centered on the topic of World War II. The idea behind this demo is to show how the analysis capabilities let you quickly explore a large, multilingual collection.

For lazy clickers, we provide a YouTube video in which you can follow an exploratory analysis and filtering process that drills down to some details of inner-Chinese political tensions during WW2.

Presentation at #EIJC18 & Dataharvest conference

This Saturday, we present new/s/leak at the European Investigative Journalism Conference (EIJC). Here you can find the slides of our presentation about “Information Extraction and Visualisation for Investigative Journalism”.

If you are interested in trying new/s/leak with your own data, visit the GitHub page containing the Docker setup of our application.

In June, we will publish a detailed blog post on how to set up Hoover and Newsleak to analyze collections on your own machines.

Dataharvest Conference #EIJC18

From Thursday 24 to Sunday 27 May 2018, the EIJC 2018 conference (European Investigative Journalism Conference) will take place in Mechelen (Belgium). As the new/s/leak project, we will participate and discuss the requirements and needs of our target user group. You can find out all about the conference on this website:

Funding extension

We are happy to announce that the new/s/leak project receives some additional funding from the Volkswagen Stiftung. Until summer 2018, new/s/leak will be extended and refactored to achieve the following goals:

  • easy deployment for your own use
  • comprehensive and detailed documentation
  • improved user interface
  • improved information extraction (better key term extraction, named entity recognition, support for user dictionaries)
  • support for multiple languages (among others English, German, Spanish, French, Arabic, Chinese)

Follow the updates on this blog to see how far we got 🙂