The News Source Diversity Meter is a tool that analyses media companies’ content data, such as news archives and their metadata. It scans the text and identifies different sources and the information related to them (gender, job title, political leaning), and it provides simple lists and graphical presentations of who has been interviewed or cited in the media. The meter is used internally by media companies, and users only receive information about their own news data.

A demo version of the meter was developed for the Uutisraivaaja 2019 Media Innovation Challenge. At present, the meter identifies news sources (interviewees and cited persons) in the data, describes the gender distribution of sources, and reveals which political parties have their voices heard. The results can be viewed by topic and by the date or period of publication. Other search features are under development.


The News Source Diversity Meter is based on Natural Language Processing (NLP) technology. The text recognition in the demo version of the meter is based on the morphological analysis provided by libvoikko, a list of first names and last names from the Finnish Population Register Centre, and manually created analysis rules.
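The combination of name lists and hand-written rules can be illustrated with a minimal sketch. It is not the meter's actual implementation: the name lists below are tiny hypothetical stand-ins for the Population Register Centre data, the single rule (a known first name followed by a known last name in a sentence that contains a reporting verb) is an assumption made for illustration, and the morphological analysis that libvoikko provides is left out entirely, so only base forms are matched.

```python
import re

# Hypothetical stand-ins for the name lists; the real lists are far larger.
FIRST_NAMES = {"Maria", "Anna", "Juha", "Mikko"}
LAST_NAMES = {"Virtanen", "Korhonen", "Nieminen"}

# Hypothetical rule vocabulary: Finnish reporting verbs that signal a quote
# ("says", "tells", "states", "estimates").
REPORTING_VERBS = {"sanoo", "kertoo", "toteaa", "arvioi"}


def find_sources(text):
    """Return names that look like quoted sources, in order of appearance.

    A name counts as a source when a known first name is directly followed by
    a known last name and the same sentence contains a reporting verb.
    Inflected forms (e.g. "Virtasen") are NOT recognised here; handling them
    is what a morphological analyser is needed for.
    """
    sources = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"\w+", sentence)
        if not any(token.lower() in REPORTING_VERBS for token in tokens):
            continue
        for first, last in zip(tokens, tokens[1:]):
            if first in FIRST_NAMES and last in LAST_NAMES:
                sources.append(f"{first} {last}")
    return sources


if __name__ == "__main__":
    sample = (
        "Tutkija Maria Virtanen sanoo, että tilanne on vakava. "
        "Juha Korhonen käveli kotiin. "
        "Tilanne paranee, arvioi Mikko Nieminen."
    )
    print(find_sources(sample))
```

In the sample text, the second sentence mentions a person but contains no reporting verb, so that name is not treated as a source. A real rule set would need many more patterns, as well as lemmatisation of inflected names.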


The service identifies interviewees and cited persons in the data and creates statistics about their characteristics. The meter identifies the genders of interviewees based on their first names, and it extracts the job titles and political leanings associated with each name if they are mentioned in the text. The meter can categorise the results according to the date of publication or the topic of the article, which enables more precise searches. Searches can be performed on the entire dataset with one search term (such as who is interviewed or cited the most, or how the genders of interviewees are distributed in the entire media archive) or as combinations of search terms (such as the distribution of sources representing the various political parties in articles about immigration in 2016). The search results are mainly anonymous.
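The two kinds of search described above, a single term over the whole archive and a combination of terms, can be sketched as a filter followed by a count. This is only an illustration under stated assumptions: the `SourceMention` record, its field names, and the `distribution` function are hypothetical, and the sample rows are invented placeholders rather than real results.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceMention:
    """One identified source in one article (hypothetical record layout)."""
    name: str
    gender: Optional[str]  # None when the first name gives no answer
    party: Optional[str]   # None when no party is mentioned in the text
    topic: str
    published: date


def distribution(mentions, attribute, topic=None, year=None):
    """Return the share of each value of `attribute` among matching mentions.

    `topic` and `year` are optional filters; leaving both out searches the
    whole dataset. Mentions where the attribute is unknown are skipped.
    """
    selected = [
        m for m in mentions
        if (topic is None or m.topic == topic)
        and (year is None or m.published.year == year)
    ]
    counts = Counter(
        getattr(m, attribute) for m in selected
        if getattr(m, attribute) is not None
    )
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


# Invented placeholder data, for illustration only.
MENTIONS = [
    SourceMention("Source 1", "female", "Party A", "immigration", date(2016, 3, 1)),
    SourceMention("Source 2", "male", "Party B", "immigration", date(2016, 5, 12)),
    SourceMention("Source 3", "male", "Party A", "immigration", date(2016, 9, 30)),
    SourceMention("Source 4", "female", "Party A", "immigration", date(2016, 10, 2)),
    SourceMention("Source 5", "male", None, "immigration", date(2016, 11, 20)),
    SourceMention("Source 6", "male", "Party B", "economy", date(2016, 1, 15)),
    SourceMention("Source 7", "female", "Party A", "immigration", date(2017, 2, 1)),
    SourceMention("Source 8", "female", None, "economy", date(2017, 6, 9)),
]

if __name__ == "__main__":
    # Single search term: gender distribution in the whole archive.
    print(distribution(MENTIONS, "gender"))
    # Combined search terms: parties in immigration articles from 2016.
    print(distribution(MENTIONS, "party", topic="immigration", year=2016))
```

Because the function returns only aggregated shares and not names, it also reflects the point that search results can stay mainly anonymous.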

Many of the meter’s search functions are still under development. The reliability of the results also needs further improvement.


The reading habits and behaviour of media consumers are closely monitored. Digital distribution channels have provided media companies with a wealth of information on what the public consumes, which articles hold people’s attention, and what they spend their time on. Information about the public’s behaviour also helps to build various recommendation algorithms.

It has been suggested that a similar leap forward in digital technology should also be made in the field of journalistic content analysis as a whole (suomenlehdisto.fi). As society becomes increasingly pluralistic, the diversity and pluralism of journalism in particular have become significant goals in terms of politics and the self-understanding of the media. However, conducting automated analyses of journalistic content has proven complicated.

A study commissioned by the Finnish Ministry of Communications in 2018 found that the available indicators of media diversity reveal the most about the number of media outlets and the diversity of media owners and content providers, as well as the diversity of media consumption for the reasons mentioned above. Conversely, there was no data for measuring the diversity of content, so, ultimately, the proposal was for qualitative indicators based on limited datasets. The same problem has plagued the EU-led Media Pluralism Monitor (MPM), which is tasked with evaluating the risks to media diversity in each country. MPM evaluations were developed from 2012 to 2014 and have been carried out since then (since 2015 in Finland). On several occasions, it has been necessary to reduce the number of indicators focusing on journalistic content due to a lack of available data.

However, some innovations have arisen as more journalistic data has become available. One good example is the development towards binary gender equality among people appearing in the media. The most traditional player in this field is the Global Media Monitoring Project (GMMP), carried out since 1995. The project monitors the share of women and men among the people appearing in the media worldwide. The researchers conduct the evaluation manually using carefully defined material. A few years ago, Prognosis, a Swedish innovation, automated this calculation using an Equality Bot, which reports the proportions of women and men in various online media outlets every day. The first gender meters introduced for media companies themselves hit the news in 2018 (hs.fi). Examples of the most advanced gender meters include the Gender Gap Tracker, a Canadian project based on the same technology as the News Source Diversity Meter. Another similar project is the American Press Institute’s tool Source Matters, which supports automated, customizable source diversity tracking.

Tracking news sources is a daunting task because there are so many variables and the available data is often not commensurable. The AIJO project has produced a good overview of the complexity of the subject. According to the project, the diversity of sources can be approached through photographs (like JanetBot by the Financial Times), text (like our meter or the Gender Gap Tracker) or data collected by the journalists themselves (like the Dex tool by NPR). The fragmentation of the field means that everyone is working on the same challenge on their own – there are few providers of complete commercial solutions (see, however, Ceretai’s Diversity Dashboard). A separate question is how to integrate a good diversity tool into the journalistic culture of different newsrooms. (For example, the Ringier Group has succeeded here with its EqualVoice project.)

The idea behind the News Source Diversity Meter was to find out who can make their voice heard in journalism. Making the media’s source choices more visible makes them easier to develop. As such, it is not enough to know the proportion of women and men being interviewed. It is just as important – or even more so – to know which experts are invited to explain which topics, which bodies in society have their voices heard the most, and which political parties get to have a say on which issues. It is also relevant to distinguish how much weight is given to each source’s words. For example, does a person appear as one of the sources in a minor story or as the only interviewee in a long-form article?

Journalism research has repeatedly shown that the sources journalists choose to quote in their stories have a lot of influence on whose stories get told, how they are told and to whom they are told. That’s why tracking sources is so important.

A meter based on machine analysis has only a limited ability to answer such questions. Interpretations and conclusions will remain the domain of researchers and media professionals. As the American Press Institute puts it: “It’s not enough just to track source diversity. It also requires community listening, relationship building, time to develop new sources, training and coaching.”