Things I learnt when self-hosting my own podcast
· Updated June 18, 2021 · By James Cridland · 6 minutes to read
I host my own podcast (it’s called Podnews podcasting news, a daily news briefing about podcasting and on-demand audio). I’ve learnt quite a lot about self-hosting it, so I thought I’d write down some of the detail.
Here’s my setup. I use a web server on Amazon Lightsail, which produces an RSS feed. I wrote the RSS feed generation script, which uses a database. I host the audio on Amazon S3.
Amazon Cloudfront is in front of both of these. Cloudfront is charged based on total bandwidth transferred.
Here are all the tools I use.
Learning: use a content delivery service for audio
I used to think that a content delivery network like Cloudfront wasn’t necessary for a podcast. Most people get their podcasts automatically downloaded overnight, for example, and fast speed or even a consistent transfer rate isn’t necessary if you’re downloading unattended.
Things have changed, though: most notably, the advent of Google Podcasts and smart speakers, both of which don’t download but rather stream podcast audio on-demand. Suddenly, it’s important to burst-deliver the first bit of a podcast as quickly as you can (to lessen the wait between hitting “play” and hearing the audio). Speed is now important for audio, so a CDN becomes useful.
Secondly, Cloudfront keeps its own logs, and can be configured to drop them all into an Amazon S3 bucket for analysis later. By piping all my traffic through Cloudfront, I have one simple logfile regardless of where all the stuff is.
Cloudfront’s “behaviours” allow you to direct different URL patterns to different origin servers, too. This allows me to switch, if I want to, from Amazon S3 to somewhere else to host the audio — and keep the URL the same. This has already been useful (in the early days, I was serving audio from the webserver, rather than S3).
For the US and Europe, where the majority of my traffic is, Cloudfront is a little cheaper in terms of bandwidth:
- Data transfer pricing for Cloudfront is $0.085 per GB in the US and Europe, plus $0.0075 (US) or $0.0090 (Europe) per 10,000 requests.
- Data transfer pricing for raw Amazon S3 is $0.09 per GB, plus $0.005 per 10,000 requests.
You can keep pricing down by restricting Cloudfront to US/EU only. Some areas are significantly more expensive.
The RSS feed
A podcast needs an RSS feed, of course, to function. This gets hit many, many times and is quite a large file. The RSS feed is therefore cached on Cloudfront and is normally fed to clients in a compressed format.
Podnews’s main RSS feed is served just under 15,000 times a day; a total of 2.74GB of data, costing roughly $0.24 per day to serve. It’s almost as expensive to serve the RSS feed as it is to serve the audio (below).
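Those figures roughly check out. A back-of-envelope sanity check, assuming the lower ($0.0075 per 10,000) request price from the list above:

```python
# Back-of-envelope check of the RSS feed's daily Cloudfront bill.
# Figures from the article; the request price assumes the lower
# ($0.0075 per 10,000) tier from the pricing list above.
GB_PER_DAY = 2.74           # compressed RSS transfer per day
REQUESTS_PER_DAY = 15_000   # daily feed fetches

bandwidth_cost = GB_PER_DAY * 0.085                 # $ per GB
request_cost = REQUESTS_PER_DAY / 10_000 * 0.0075   # $ per 10,000 requests

total = bandwidth_cost + request_cost
print(f"${total:.2f} per day")  # → $0.24 per day
```

Bandwidth dominates: the per-request charge is only about a penny a day.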
I produce a different version of the RSS feed for each user-agent (so that I can do some fancy monitoring). This is normally a bad thing for caching, but RSS is a little different, given that there’s a finite number of user-agents for RSS feeds. In a typical day, that feed sees 460 different user-agents, and 86% of my RSS feed requests are still served from Cloudfront’s cache.
The RSS stats page shows when podcast aggregator apps come to check on my RSS feed. Overcast, Google and PodcastAddict check roughly every four minutes.
Learning: use WebSub
The way RSS feeds work is that someone like Apple Podcasts comes along every so often to check whether I’ve just published a new episode. While I can influence the “every so often”, I’ve no actual control over when Apple, or anyone else, comes back to check. It’s very wasteful in terms of bandwidth, too.
WebSub is the computer equivalent of me telling you: “Stop asking me all the time whether I’ve something new for you. I’ll tell Bill over there, and Bill will give you a call, OK? Go give him your telephone number.”
Bill is a “hub” server. I link to Bill in my RSS feed. If you’re running a podcast app, you can just ask Bill to let you know (“subscribe”) when I’ve “published” something new to my RSS feed. I let Bill know as soon as I update it.
The upshot of this computer gobbledegook is that when I publish a new podcast, it appears on Google Podcasts instantly, since Google uses WebSub. Literally, I press the publish button in my own code, it informs the hub, I look at my phone, and there’s the new podcast.
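The publisher side of WebSub is just an HTTP POST to the hub. A minimal sketch using the standard library; the hub and feed URLs here are placeholders, not the real Podnews ones:

```python
# Publisher side of WebSub, sketched with the standard library. The hub and
# feed URLs are placeholders, not the real Podnews ones.
from urllib import parse, request

def publish_body(feed_url):
    """Form-encoded body telling the hub that a topic URL has new content."""
    return parse.urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode()

def notify_hub(hub_url, feed_url):
    """POST the ping; the hub then notifies every subscriber of feed_url."""
    req = request.Request(hub_url, data=publish_body(feed_url), method="POST")
    with request.urlopen(req) as resp:
        return resp.status

# e.g. notify_hub("https://pubsubhubbub.appspot.com/", "https://example.com/rss")
```

That one POST at publish time replaces thousands of speculative polls.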
I’ve written up more about WebSub: more and more podcast hosts are supporting it.
I also support PodPing, a service run by the Podcast Index, which does similar.
Learning: produce multiple pieces of audio
I produce two main versions of the podcast: one at 64kbps stereo AAC-HE at -16 LUFS, and one at 112kbps MP3, also at -16 LUFS.
Many apps get given the AAC version; some get the MP3 version.
I do this with the audio editor I use, Hindenburg Journalist Pro, which allows you to configure more than one publishing point, so these versions are produced automatically at the end of the production process. I attach cover images using a complicated bit of AppleScript.
AAC is supported by virtually everything that supports podcasts these days. AAC and -16 LUFS is what Apple wants; -16 LUFS is also what Google wants. Everyone’s happy. (I’ve written a thing about LUFS and loudness).
The MP3 is much higher bandwidth, and costs me significantly more to serve.
Additionally, I also produce a version of the podcast at 12kbps mono Opus, at -14 LUFS. This is used in the alternateEnclosure tag, and is selectable in a few different podcast apps. It’s there in case you pay for your bandwidth and want a fine-sounding version that won’t cost the earth to download.
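For those not using Hindenburg, an equivalent low-bandwidth Opus version could be made with ffmpeg. This sketch just builds the command (loudnorm to -14 LUFS, downmix to mono, libopus at 12kbps); the file names are placeholders, and this isn’t the actual production pipeline:

```python
# Illustrative ffmpeg command for the 12kbps mono Opus version; the real
# export comes from Hindenburg, so this is an equivalent, not the actual
# pipeline. File names are placeholders.
import shlex

def opus_command(src, dst):
    return [
        "ffmpeg", "-i", src,
        "-af", "loudnorm=I=-14",   # normalise to -14 LUFS
        "-ac", "1",                # mono
        "-c:a", "libopus",
        "-b:a", "12k",             # 12kbps
        dst,
    ]

print(shlex.join(opus_command("episode.wav", "episode.opus")))
```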
And in fact, I also produce another version of the podcast: an ad-free version with a little less tech-talk in it, for use by Apple’s Siri service and by Podcast Radio. That’s produced in fancy MP3 for Podcast Radio, and in AAC for Siri.
In total, in a typical day, I see 2,686 requests for audio, and a total of just over 3GB of data transfer. That’s about $0.26 per day.
Learning: stats are hard
Podcast stats are possible with a self-hosted solution, but they’re hard. Normal hit-based server log analysis won’t work, since it counts things that aren’t actual downloads of audio.
Option one is to use a redirect service. Podtrac, Blubrry or Chartable all produce these. I was using Chartable for some time: it was a simple and robust service.
Option two is to analyse the Cloudfront logs yourself. Every day I run a cronjob that queries Athena, pulling just yesterday’s podcast data into a CSV file. Here’s the current query:
```sql
SELECT lower(to_hex(md5(to_utf8(requestip)))) AS encoded_ip,
       referrer, date, time, uri, bytes, useragent, querystring
  FROM cloudfront_logs
 WHERE uri LIKE '/audio/pod%'
   AND method = 'GET'
   AND SUBSTR(uri, -3, 1) = 'm'
   AND "date" = current_date - interval '1' day
 ORDER BY time
```
This gives me a full list of all calls to audio. I’m querying 2.5GB of data, and the cost is about $0.015 each query.
I then store the resulting CSV file in an S3 bucket, and write its ID into a database.
Then, the podcast stats page is some ugly PHP that iterates through all the log lines, and does the counting and matching. It discards requests smaller than 750KB (roughly one minute of audio), which means it currently misses clients that request audio in 250KB chunks, for example. I’m not sure there are many of those, but I should check.
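Summing bytes per listener and episode, rather than discarding individual small requests, would catch those chunked clients. A minimal sketch of that counting, with made-up log rows standing in for the Athena CSV described above:

```python
import csv, io

# Sketch of download counting: sum bytes per listener/episode, so clients
# fetching in small chunks still count once they pass the threshold.
# The log rows are made up; the real input is the Athena CSV described above.
THRESHOLD = 750 * 1024   # ~one minute of audio

LOG = """encoded_ip,uri,bytes
aaa,/audio/pod1.mp3,900000
aaa,/audio/pod1.mp3,200000
bbb,/audio/pod1.mp3,300000
"""

totals = {}
for row in csv.DictReader(io.StringIO(LOG)):
    key = (row["encoded_ip"], row["uri"])   # one listener, one episode
    totals[key] = totals.get(key, 0) + int(row["bytes"])

downloads = sum(1 for n in totals.values() if n >= THRESHOLD)
print(downloads)  # → 1: aaa's chunks sum past the threshold; bbb's don't
```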
Hope that’s interesting to some. Use the ‘contact us’ page if you’ve questions.
James Cridland is the Editor of Podnews, a keynote speaker and consultant. He wrote his first podcast RSS feed in January 2005, and launched the first live radio streaming app for mobile phones in the same year. He’s worked in the audio industry since 1989.