In focus: the random Google News audio downloads

By James Cridland

At the beginning of March, Podnews noticed an unusually high number of downloads for our own podcast, Podnews Daily. Downloads per episode, normally around 2,500, suddenly jumped to around 40,000.

In a personal blog post, I wrote up what was going on in quite tedious technical detail. Examining the traffic led me to conclude that it was Google News’s audio briefings (“Hey, Google, play the latest news”), but that the downloads were coming only from Indonesia and Malaysia, and only with an Android phone’s generic Dalvik user-agent (like Apple’s AppleCoreMedia, it’s the platform’s default user-agent).
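As a rough illustration of that kind of filtering, here is a minimal Python sketch that counts requests with a generic Dalvik user-agent by country. The record shape (`country` and `user_agent` keys), the function names and the sample user-agent strings are all assumptions for illustration; they are not the format of the Podnews logs.

```python
import re
from collections import Counter

# Android's default HTTP user-agent begins with "Dalvik/<version>".
DALVIK_RE = re.compile(r"^Dalvik/\d")


def is_generic_dalvik(user_agent: str) -> bool:
    """True when the user-agent is Android's default, not an app-specific one."""
    return bool(DALVIK_RE.match(user_agent))


def dalvik_by_country(requests):
    """Count generic-Dalvik requests per country code."""
    counts = Counter()
    for request in requests:
        if is_generic_dalvik(request["user_agent"]):
            counts[request["country"]] += 1
    return counts


# Hypothetical sample records, not real log lines.
sample = [
    {"country": "ID", "user_agent": "Dalvik/2.1.0 (Linux; U; Android 13; hypothetical-device)"},
    {"country": "MY", "user_agent": "Dalvik/2.1.0 (Linux; U; Android 12; hypothetical-device)"},
    {"country": "AU", "user_agent": "ExamplePodcastApp/1.0"},
]

print(dalvik_by_country(sample))
```

A real analysis would also need a GeoIP lookup to turn IP addresses into countries; that step is assumed to have happened already here.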

Of the more than 32,000 subscribers to the Podnews newsletter, 67 are in Indonesia and 48 are in Malaysia. So why are almost 1 million people, seemingly, downloading our podcast in those countries?

Examining the traffic, it appears clear to me that these are automated downloads: the number of downloads per hour is constant, and Saturdays and Sundays show no slowdown in total downloads (even though I never publish a new episode on those days).
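One way to put a number on "constant downloads per hour" is the coefficient of variation of the hourly counts: close to zero means a flat, machine-like rate, while human listening usually rises and falls through the day. The sketch below is only an illustration under that assumption; the function names, timestamps and sample counts are hypothetical, and no particular threshold is implied.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def hourly_counts(timestamps):
    """Count downloads per clock hour from ISO 8601 timestamp strings.

    Hours with no downloads at all do not appear in the result.
    """
    buckets = Counter()
    for ts in timestamps:
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[hour] += 1
    return buckets


def coefficient_of_variation(counts):
    """Standard deviation divided by the mean; 0.0 means perfectly flat."""
    values = list(counts)
    average = mean(values)
    return pstdev(values) / average if average else 0.0


# Hypothetical hourly download counts over one day.
flat_day = [50] * 24
human_day = [5, 2, 1, 1, 2, 8, 30, 60, 80, 70, 50, 40,
             45, 40, 35, 30, 35, 40, 45, 40, 30, 20, 12, 8]

print(coefficient_of_variation(flat_day))
print(coefficient_of_variation(human_day))
```

With real logs, `coefficient_of_variation(hourly_counts(timestamps).values())` would give the figure for one day's traffic, and comparing weekdays with weekends would test the "no slowdown" observation.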

By the end of March, the show had 1.1 million downloads, and I had a hosting bill that was $200 higher than usual, in spite of mitigating the effect by lowering the audio quality of the file to 32kbps.
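To show why lowering the bitrate helps, here is a back-of-envelope sketch of the data transferred. Only the 32kbps figure and the 1.1 million downloads come from the numbers above; the five-minute episode length and the 128kbps comparison bitrate are hypothetical, and the sum ignores file metadata, partial downloads and the fact that the bitrate changed partway through the month.

```python
def audio_bytes(bitrate_kbps: float, duration_seconds: float) -> float:
    """Approximate audio size: kilobits per second -> bits -> bytes."""
    return bitrate_kbps * 1000 * duration_seconds / 8


def transfer_gb(bitrate_kbps: float, duration_seconds: float, downloads: int) -> float:
    """Total data served, in decimal gigabytes, if every download is complete."""
    return audio_bytes(bitrate_kbps, duration_seconds) * downloads / 1e9


DURATION_SECONDS = 5 * 60   # hypothetical episode length
DOWNLOADS = 1_100_000       # the month's total quoted above

for kbps in (32, 128):      # 128 is a hypothetical "before" bitrate
    print(f"{kbps} kbps: {transfer_gb(kbps, DURATION_SECONDS, DOWNLOADS):,.0f} GB")
```

Under those assumptions the transfer scales linearly with bitrate, so a quarter of the bitrate means a quarter of the data served.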

But is anyone else seeing odd behaviour from Google News?

We’re not the only ones

Sharon Taylor, EVP of Podcast and Content Delivery at Triton Digital, has also seen some strange behaviour. She tells Podnews:

Beginning in June/July last year, we began noticing some abnormal traffic patterns with the generic Dalvik User Agent that caused an increase in downloads for a handful of news podcasts for publishers in their core geographic region. We immediately kicked off in-depth investigations across our measurement and IVT teams, and found that the downloads were spread across a surge of unique IPUA combinations on a variety of mobile devices – many of which had accessed the content prior to the spike. However, the volume of consumption for these IAB defined listeners had surged substantially.

The large number of downloads occur with new episodes only and are spread across a large number of listeners, at different times of day, and in different regions of the country. These patterns are similar to the patterns we saw prior to the download spike, only in much larger numbers. This is why we have not labelled the activity as invalid traffic.

In our extensive analysis and testing process, Google News Assistant emerged as the likely source for these downloads. We informed the Google News team of this finding and they are conducting their own investigation into the matter.

We are grateful for Google’s continuing efforts to investigate this matter with us. We welcome additional insights from them into how their tech stack has changed in regards to their podcast delivery process, how that may be impacting Google News Feeds, and the scenarios in which a user would (or would not) get a podcast download of the latest episode – to gain clarity for us and the podcast industry at large.

While the investigation is ongoing, due to our role in the industry as a measurement provider, we have been removing the above traffic from the publishers affected in our public rankers until we have certainty on the cause from Google. In certain regions, we are working with the joint industry body or committee on a market-specific approach which may deviate from this.

We are committed to remaining the trusted source of podcast downloads for advertisers and the wider industry, which is why a log-based approach paired with machine learning and human analysis, working with relevant industry bodies, is best in class. We will continue closely monitoring the situation and will share our findings once investigations are concluded.

I asked questions

This looks similar to what I’m seeing; so I thought I’d ask a little more.

“Are these downloads coming from all countries, or just from one or two?”

ST: We’re seeing the downloads in general coming from several countries, however can only speak to the podcasts that we measure. Those publishers are located around the world, so as a whole across our data we are seeing spikes in several countries. But when we drill into the downloads on a per-publisher basis, the majority of each publishers’ download increase is almost entirely occurring in their home country (we also see some smaller spikes in secondary countries that speak the same language, for some specific publishers).

I have tried to discover whether Podnews is the only default provider in Malaysia or Indonesia in Google News. I’ve asked Google News, but they’ve not answered. I’ve asked anyone in Indonesia or Malaysia to get in touch to see what they’re seeing - asking in Podnews itself, and the audio version apparently downloaded by almost a million people in Malaysia and Indonesia. I’ve not had anyone from those two countries contact us. Which, by itself, is quite telling.

“Have you been able to ascertain that these downloads are coming from Google News?”

ST: We ran a definitive test with one of [our clients], where the Google News team removed the show on their side – the spike disappeared for a 48+ hour period. When the show was put back, the spike resumed.

As definitive as the Podnews approach of tagging the files with the download source, I think.

“Do you see this behaviour with the NEWEST episode only?”

ST: Yes, it’s newest episode only. Once a new episode is published, the large number of downloads don’t continue for the prior episode. Eg: if a news podcast only publishes new episodes on weekdays, the Friday episode will still have substantial volume over the Saturday and Sunday (not as much, but still a large number).

This is what I see, too. I suspect the “real” downloads taper off over the weekend, but what I consider to be automated downloads continue to be made. Triton have larger customers with considerable traffic, so things may be different.

“Are you seeing the same consistent number of downloads per hour from this traffic, like we are?”

ST: Downloads are occurring throughout the day, for affected shows. We also see downloads by time buckets (in minutes post publishing) to be very similar pre and post noticing the spike. That indicates to us that it’s not necessarily an auto download or pre-fetch of content.

Google says it’s investigating

I contacted Google’s Audio News Support on Mar 3. They said:

“We are now looking into this matter and we’ll immediately inform you once we receive further clarifications from the internal team regarding this issue.”

Google didn’t ask for any logs, or any information.

On Mar 6, I contacted them again with a little more information. They said:

“At this time, we are still in the process of gathering all the necessary details and conducting a thorough checking.”

Google didn’t ask for any logs, or any information.

On Mar 17, eleven days later, I contacted them again. I was concerned, and wrote that “This appears to be a bug, where Google is directing these phones to automatically download this news briefing, which is disrespectful towards the data costs of people in these two countries. The alternative is a security issue within Google itself. Both of these outcomes are now looking like a media story.”

Google responded:

“Please be informed that our internal team is actively working to fix the issue, and we have also escalated this matter.”

Google didn’t ask for any logs, or any information. It’s now Apr 7, and I’ve heard nothing further from the team.

I don’t mean to be uncharitable towards the Google News team, but if they haven’t asked for logfiles or any further detail, I do not believe they are sincere in their claim that they are actively investigating this issue.

Meanwhile, I’m $200 out of pocket.

These are IAB verified downloads

It’s worth noting that these are, according to the IAB Podcast Measurement Guidelines, entirely verified downloads. Each one comes from a different-looking phone (and a different user-agent); each one is from a residential IP address within either Malaysia or Indonesia. Most of them are complete downloads of the full audio file.
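To see why this traffic passes, consider a much-simplified sketch of the de-duplication that log-based measurement relies on: one download is counted per combination of IP address and user-agent, per episode, per day. This illustrates the general idea only. It is not the full IAB specification or any vendor's implementation, and the record fields and sample values are hypothetical (the IP addresses are from ranges reserved for documentation).

```python
def unique_downloads(requests):
    """Count one download per (date, episode, IP, user-agent) combination."""
    seen = set()
    for request in requests:
        seen.add((
            request["date"],
            request["episode"],
            request["ip"],
            request["user_agent"],
        ))
    return len(seen)


# Hypothetical records: the first two are the same phone fetching twice.
sample = [
    {"date": "2024-03-09", "episode": "ep1", "ip": "192.0.2.10",
     "user_agent": "Dalvik/2.1.0 (Linux; U; Android 13; hypothetical-device-a)"},
    {"date": "2024-03-09", "episode": "ep1", "ip": "192.0.2.10",
     "user_agent": "Dalvik/2.1.0 (Linux; U; Android 13; hypothetical-device-a)"},
    {"date": "2024-03-09", "episode": "ep1", "ip": "198.51.100.7",
     "user_agent": "Dalvik/2.1.0 (Linux; U; Android 12; hypothetical-device-b)"},
    {"date": "2024-03-09", "episode": "ep1", "ip": "203.0.113.99",
     "user_agent": "Dalvik/2.1.0 (Linux; U; Android 11; hypothetical-device-c)"},
]

print(unique_downloads(sample))  # 3: the repeated request collapses into one
```

Because every request in the spike arrives from its own residential IP address with its own device string, a filter like this has nothing to collapse, and each request counts as a separate listener.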

The IAB v2.2 guidelines do now state that “Anomalies, such as uncharacteristic spikes or drops in data, should be identified and metrics adjusted based on deeper investigation” - and it’s admirable that Triton Digital have done exactly that for their ranking data. Podnews doesn’t sell ads in our podcast based on total downloads, so this effect shouldn’t cause concern for advertisers.

Another reason to ask for better analytics

Finally, this is another reason to ask for better analytics for podcasting. The “download”, increasingly, doesn’t cut it. If we can get a decent sample size from podcast apps with some consumption data - followers, plays and completion data - then we’re in a significantly better position.

I’ll keep you updated on developments, assuming Google ever comes back to me. (I could just block those two countries, of course.) And, somewhere, I’ll find the $200 in extra hosting fees I’ve had to pay this month - although right now, my view is that I should just send the invoice to Google: it appears to be quite clearly their issue, after all.

James Cridland
James Cridland is the Editor of Podnews, a keynote speaker and consultant. He wrote his first podcast RSS feed in January 2005; and also launched the first live radio streaming app for mobile phones in the same year. He's worked in the audio industry since 1989.
