24sata’s Google DNI project: “A historic moment, also for STYRIA”

Together with the Styria.AI data science team from Trikoder, and with the support of Styria Digital Development, 24sata recently went live with its Google Digital News Initiative project, rolling it out on its mobile sites as a first step. Hrvoje Dorešić (24sata) and Dominik Šafarić (Styria.AI) tell us what’s behind the brand’s success story with optimised content tagging, contextual recommendation and content personalisation.


What’s the project all about – in just a few words?
Dominik Šafarić: Within the project, our data science team developed three advanced services based on machine learning. The first is tag optimisation, which will ensure a standardised and optimised content-tagging process for everyone who produces media content for digital platforms. The platform will offer journalists the tags that best suit the topic at hand, which will greatly simplify both tagging and searching for content. The second is the contextual recommendation system, which will give all brands using the platform relevant recommendations based on the content the user is currently viewing, and can thereby offer the user a unique and complete experience of the topics covered. The third service, content personalisation, rounds out the idea by adapting content to users based on their interests and their behaviour on the platform itself.
Hrvoje Dorešić: Creating a personalised homepage for readers is something we’ve been experimenting with at 24sata since 2015. We showed that there is some potential, then we applied to Google DNI – and we got it! It was quite a ride to build that whole system, which involves collecting data from users, using that data to train machine learning models and presenting content every time a user lands on our homepage.
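
To make the contextual recommendation idea described above more concrete (suggesting content based on the article a user is currently viewing), here is a minimal sketch. The interview does not describe how the Styria.AI models work, so this is an illustrative stand-in rather than their method: it assumes a plain bag-of-words cosine similarity, and the function names and sample articles are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_similar(current_id: str, articles: Dict[str, str], k: int = 3) -> List[str]:
    """Return the ids of the k articles most similar to the one being viewed."""
    current = Counter(tokenize(articles[current_id]))
    scored = [
        (cosine(current, Counter(tokenize(text))), article_id)
        for article_id, text in articles.items()
        if article_id != current_id
    ]
    # Highest similarity first; ties are broken by article id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:k]]


# Hypothetical sample data, not taken from 24sata.
articles = {
    "a": "football match goal score",
    "b": "football goal league",
    "c": "election vote parliament",
    "d": "football vote",
}
similar_to_a = recommend_similar("a", articles, k=2)
```

A production system would likely use learned text representations rather than raw word counts, but the shape of the task is the same: score every candidate against the current article and return the top few.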


Focussing on personalisation – what’s your long-term goal?
Dorešić: To keep users coming to our website and finding relevant content there. Our main mission is to keep readers informed about the things they should know about, and personalisation will help people find the articles they’re interested in through specialised, customised content.


You went live with it a few weeks ago. Could you give us an insight?
Dorešić: Right now, we have five articles at the top of the 24sata website placed by editors; underneath, there are ten placed by the machine, and the rest is curated manually again. We’re starting small here so we can measure conversions for the articles provided by data science. We will relaunch all 24sata platforms in 2020: mobile, desktop and applications. All of them will have personalised homepages and a personalised feed below the article. So we have a few months left to make this algorithm work the way we want it to and to get better results, so we can implement it on the new site.
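
The homepage layout described above (five editor-placed articles, then ten machine-placed ones, then manual curation again) can be sketched as a simple merge. This is only an illustration under stated assumptions: the function name, the parameters and the rule of skipping duplicates are hypothetical and are not taken from the 24sata implementation.

```python
from typing import List


def compose_homepage(
    editor_picks: List[str],
    machine_picks: List[str],
    n_top: int = 5,
    n_machine: int = 10,
) -> List[str]:
    """Build the homepage order: editor top block, machine block, editor rest."""
    # The first n_top slots belong to the editors.
    page = list(editor_picks[:n_top])
    shown = set(page)

    # Fill up to n_machine slots with recommendations not already on the page.
    added = 0
    for article_id in machine_picks:
        if added == n_machine:
            break
        if article_id not in shown:
            page.append(article_id)
            shown.add(article_id)
            added += 1

    # The remainder of the page is editor-curated again.
    for article_id in editor_picks[n_top:]:
        if article_id not in shown:
            page.append(article_id)
            shown.add(article_id)
    return page


# Hypothetical article ids for illustration.
editor = ["e1", "e2", "e3", "e4", "e5", "e6", "e7"]
machine = ["r1", "e2", "r2", "r3"]
homepage = compose_homepage(editor, machine)
```

Keeping the machine block at a fixed position and size makes it straightforward to compare conversions for recommended articles against the editor-placed ones around them.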


How do journalists deal with that?
Dorešić: We have talked about it, and it is a controversial topic. What I can say is that machines still have quite a bit to learn before they are better than our editors. So many factors influence how you curate articles. But the end result of our project is a better experience and more time for editors to do journalism.


So what about the technical challenges?
Šafarić: The fundamental challenge was creating machine learning models that produce accurate recommendations: keyword, contextual and personalised article recommendations. Designing the models’ architecture, training them and validating the correctness of their recommendations were among the greatest challenges. From an engineering perspective, we had to create systems that are highly scalable, performant and fault-tolerant. As a result, the systems process roughly 800 million user clickstream events per month and provide the user with personalised article recommendations in 60 milliseconds on average!


Are there any plans to use the project for other STYRIA brands? 
Dorešić: If this works fine, I’m sure other brands will be able to use this technology. It’s ours. 


Is it actually unique?
Dorešić: We’re using the same kind of recommendation that Netflix, YouTube and others use. But it’s the first time that we have managed to close the circle: getting the data, building a personalisation model, training the model and putting it into production so users can see it. And it’s definitely the first time that anybody in the region has worked with personalised homepages. This is a historic moment, also for STYRIA. I’m sure data science and machine learning will play an essential role in journalism and will become more important in the years to come.
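
The loop described above (collect data, build a model, put it in front of users) can be illustrated with a deliberately simplified sketch. It is not the approach used in the project, which the interview does not detail: it assumes a user profile built from tag counts of clicked articles, and the function names, tags and article ids are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_profile(clicked_ids: List[str], article_tags: Dict[str, List[str]]) -> Counter:
    """Turn a user's click history into a tag-frequency profile."""
    profile: Counter = Counter()
    for article_id in clicked_ids:
        profile.update(article_tags.get(article_id, []))
    return profile


def rank_for_user(
    profile: Counter,
    candidates: List[str],
    article_tags: Dict[str, List[str]],
    k: int = 10,
) -> List[str]:
    """Rank candidate articles by how well their tags match the profile."""
    def score(article_id: str) -> int:
        return sum(profile[tag] for tag in article_tags.get(article_id, []))

    # Highest score first; ties are broken by article id for stable output.
    return sorted(candidates, key=lambda a: (-score(a), a))[:k]


# Hypothetical data for illustration.
tags = {
    "a1": ["sport", "football"],
    "a2": ["politics"],
    "a3": ["sport"],
    "a4": ["football", "transfer"],
    "a5": ["politics", "economy"],
}
profile = build_profile(["a1", "a3"], tags)
feed = rank_for_user(profile, ["a2", "a4", "a5"], tags, k=2)
```

Here the "training" step is just counting, which keeps the sketch short; the point is the closed circle of clicks feeding a profile that in turn decides what the user sees next.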

Last question: How do you personally feel now, after it went live?
Dorešić: We found that data science is something new and exciting. We had tremendous help from the SDD development team to make this work. There were challenges all along the way, but in the end we made it in time and we’re very happy. So we and STYRIA can be proud.
Šafarić: When we started the project, we were aware of the various challenges that might endanger its success. But our talented team has once again managed to push the boundaries of what has been seen so far, and we are extremely proud!


Picture (c) Igor Kralj/PIXSELL: Antonio Šarabok, Vedran Vekić, Matko Vrbanec, Marko Pranjić, Nikola Tucković, Josip Kaurinović (standing); Marija Paić, Petra Rebernjak, Dominik Šafarić, Roman Kosanović, Davor Škalec, Hrvoje Dorešić, Nikolina Vidonis (sitting, all from left to right)