Things I learnt when self-hosting my own podcast
Updated July 19, 2020 · By James Cridland
I host my own podcast (it’s called Podnews, a daily news briefing about podcast and on-demand). I’ve learnt quite a lot about self-hosting it. I thought I’d write down some detail.
Here’s my setup: I use a web server on Amazon EC2, which produces an RSS feed. (I wrote the RSS feed generation script, which uses a database). I host the audio on Amazon S3.
Amazon Cloudfront is in front of both of these. Cloudfront is charged based on total bandwidth transferred.
Here are all the tools I use.
Learning: use a content delivery service for audio
I used to think that a content delivery network like Cloudfront wasn’t necessary for a podcast. Most people get their podcasts automatically downloaded overnight, for example, and fast speed or even a consistent transfer rate isn’t necessary if you’re downloading unattended.
Things have changed, though: most notably, with the advent of Google Podcasts and smart speakers, neither of which downloads podcast audio; instead, they stream it on demand. Suddenly, it’s important to burst-deliver the first bit of a podcast as quickly as you can (to lessen the wait between hitting “play” and hearing the audio). Speed is now important for audio, so a CDN becomes useful.
Secondly, Cloudfront keeps its own logs, and can be configured to drop them all into an Amazon S3 bucket for analysis later. By piping all my traffic through Cloudfront, I have one simple logfile regardless of where the files are actually hosted, which makes analysis much simpler.
Cloudfront’s “behaviours” allow you to direct different URL patterns to different origin servers, too. This allows me to switch, if I want to, from Amazon S3 to somewhere else to host the audio — and keep the URL the same. This has already been useful (in the early days, I was serving audio from the webserver, rather than S3).
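To illustrate the idea behind behaviours, here is a minimal Python sketch of pattern-based routing. It is not Cloudfront's implementation or its configuration format: the patterns and origin names are hypothetical, and it only mimics the "first matching pattern wins, with a catch-all at the end" idea.

```python
from fnmatch import fnmatchcase

# Hypothetical behaviours, checked in order; the last one is the catch-all.
BEHAVIOURS = [
    ("/audio/*", "s3-audio-bucket"),
    ("*", "ec2-web-server"),
]


def origin_for(path: str) -> str:
    """Return the origin name for the first pattern that matches the path."""
    for pattern, origin in BEHAVIOURS:
        if fnmatchcase(path, pattern):
            return origin
    raise LookupError(f"no behaviour matches {path!r}")


print(origin_for("/audio/episode.m4a"))
print(origin_for("/feed.rss"))
```

Moving the audio somewhere else would then only mean changing the origin on the audio pattern; the public URL stays the same.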
Learning: you can set Cloudfront to US/EU edge servers only
Cloudfront has thousands of edge servers all over the world; but you can set Cloudfront to only use US and European edge servers. That can save money (some other countries are expensive), but also can save your servers from additional stress.
For a while, I set mine as US/EU servers only. Because my Cloudfront distribution includes the website, though, I switched to “all” countries in early 2020. I’ve not noticed a significant increase in bandwidth costs.
The RSS feed
A podcast needs an RSS feed, of course, to function. This gets hit many, many times and is quite a large file. The RSS feed is therefore cached on Cloudfront and is normally fed to clients in a compressed format.
Since the podcast is a timely one, the RSS feed will only be cached (on Cloudfront or other servers) for five minutes. Essentially this means that if there are 4,000 users on the same Cloudfront edge server all asking for the RSS feed, my own server only produces the file once. I suspect there’s a nicer way to do this.
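As a rough sketch of the five-minute caching and compression described above (this is not the actual Podnews code), the origin could send response headers along these lines. The function name and the choice to compress at the origin are assumptions for illustration; a CDN can also be set up to do the compression itself.

```python
import gzip

FEED_MAX_AGE_SECONDS = 5 * 60  # the five-minute cache lifetime described above


def build_feed_response(feed_xml: str, accepts_gzip: bool) -> tuple:
    """Return (headers, body) for an RSS feed response."""
    body = feed_xml.encode("utf-8")
    headers = {
        "Content-Type": "application/rss+xml; charset=utf-8",
        # Lets shared caches (such as a CDN) reuse the feed for five minutes.
        "Cache-Control": f"public, max-age={FEED_MAX_AGE_SECONDS}",
        # Tells caches to store compressed and uncompressed copies separately.
        "Vary": "Accept-Encoding",
    }
    if accepts_gzip:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```

With a header like this, each cache only needs to come back to the origin once every five minutes, however many clients ask it for the feed.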
I’m now rewriting parts of the RSS feed for certain podcast apps, for monitoring purposes. This sort of thing is almost impossible with a standard podcast host.
It’s easy, ish, to produce an RSS feed that’s invalid, and that can get you pulled from most of the major directories. Ask me how I know…
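One cheap safeguard is a basic check before publishing. The sketch below is only an assumption about what such a check might look like, not a full validator: it confirms that the feed is well-formed XML and that every item has an enclosure with the three attributes RSS 2.0 requires (url, length and type). The directories have further requirements of their own that it does not cover.

```python
import xml.etree.ElementTree as ET


def feed_problems(feed_xml: str) -> list:
    """Return a list of human-readable problems found in an RSS feed."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError as error:
        # For example, an unescaped "&" in an episode title ends up here.
        return [f"not well-formed XML: {error}"]
    problems = []
    for index, item in enumerate(root.iter("item"), start=1):
        enclosure = item.find("enclosure")
        if enclosure is None:
            problems.append(f"item {index}: no enclosure")
            continue
        for attribute in ("url", "length", "type"):
            if not enclosure.get(attribute):
                problems.append(f"item {index}: enclosure missing {attribute}")
    return problems
```

An empty list means the feed passed these particular checks, nothing more.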
Learning: use WebSub
The way RSS feeds work is that someone like Apple Podcasts comes along every so often to check whether I’ve just added a new podcast. While I can influence the “every so often”, I’ve no actual control over when Apple, or anyone else, comes back to check. It’s very wasteful in terms of bandwidth, too.
WebSub is the computer equivalent of me telling you: “Stop asking me all the time whether I’ve something new for you. I’ll tell Bill over there, and Bill will give you a call, OK?”
Bill is a “hub” server. I link to Bill in my RSS feed. If you’re running a podcast app, you can just ask Bill to let you know (“subscribe”) when I’ve “published” something new to my RSS feed. I let Bill know as soon as I update it.
The upshot of this computer gobbledegook is that when I publish a new podcast, it appears on Google Podcasts instantly, since Google uses WebSub. Literally, I press the publish button in my own code, it informs the hub, I look at my phone, and there’s the new podcast.
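For the curious, the "I let Bill know" step can be sketched in a few lines of Python. The WebSub specification leaves the exact publisher-to-hub message up to each hub; the form-encoded `hub.mode=publish` request below is a common convention among public hubs, so treat it as an assumption and check your hub's documentation. The URLs are hypothetical, and the code only builds the request rather than sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The feed itself advertises the hub with a link element such as:
#   <atom:link rel="hub" href="https://hub.example.com/" />


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build (but do not send) the POST that tells a hub the feed has changed."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URLs; pass the result to urllib.request.urlopen() to send it.
ping = build_publish_ping("https://hub.example.com/", "https://example.com/feed.rss")
print(ping.data.decode("ascii"))
```

The hub then does the work of notifying every subscriber, which is why the new episode shows up so quickly.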
I’ve written up more about WebSub: more and more podcast hosts are supporting it.
Learning: produce multiple pieces of audio
After a bit of thought, I produce two versions of the podcast: one at 80kbps stereo HE-AAC at -16 LUFS, and one at 256kbps stereo MP3 at -14 LUFS. The audio editor I use, Hindenburg Journalist Pro, allows you to configure more than one publishing point, and I therefore produce these both automatically at the end of the production process. If I was a little cleverer, I’d get Hindenburg to do the actual uploading to Amazon S3, but the fact that it doesn’t is probably helpful to be honest, given how many times I’ve overwritten my local copy by mistake.
Most people who get the podcast will get the AAC version. It sounds very good, at a considerably lower bitrate than an equivalent-sounding MP3 (128kbps, say) would need. AAC is supported by virtually everything that supports podcasts these days. AAC and -16 LUFS is what Apple wants; -16 LUFS is also what Google wants. Everyone’s happy. (I’ve written a thing about LUFS and loudness).
The MP3 has a much higher bitrate, and would cost me more than three times as much to serve (256kbps against 80kbps). It’s exclusively given to services that cache or transform my content, which are currently Amazon Alexa and Spotify. As chance would have it, they both also require -14 LUFS, a slightly louder output than the -16 LUFS that Apple requires.
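To put the two bitrates in concrete terms, here is some quick arithmetic in Python. The five-minute episode length is a hypothetical figure chosen for illustration, and the calculation ignores container overhead and metadata.

```python
def episode_size_bytes(bitrate_kbps: int, duration_seconds: int) -> int:
    """Approximate audio size: bitrate (kilobits per second) times duration, in bytes."""
    return bitrate_kbps * 1000 * duration_seconds // 8


DURATION_SECONDS = 5 * 60  # hypothetical episode length

for label, bitrate in (("AAC", 80), ("MP3", 256)):
    size_mb = episode_size_bytes(bitrate, DURATION_SECONDS) / 1_000_000
    print(f"{label} at {bitrate}kbps: about {size_mb:.1f} MB")
```

Since Cloudfront charges by data transferred, the cost of serving each version scales with these sizes.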
I’ve been able to give both Amazon and Spotify different RSS feeds to enable this. Amazon has a separate RSS feed, just containing one episode; Spotify uses the same RSS feed as everyone else, but with a query string identifying it as Spotify.
One more learning: add the iTunes block tag (<itunes:block>) on these additional feeds, otherwise Google Podcasts will find them.
Podcast stats are possible with a self-hosted solution, but are hard. Normal hit-based server log analysis won’t work, since it counts requests that aren’t actual downloads of audio.
Option one is to use a redirect service. Podtrac, Blubrry and Chartable all offer these, and I’m using Chartable’s solution for now.
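A feed generation script could pick the variant along the following lines. This is a sketch, not the Podnews code: the `client` query parameter name and the returned structure are hypothetical, though `<itunes:block>Yes</itunes:block>` is the documented form of the block tag.

```python
from urllib.parse import parse_qs, urlsplit

ITUNES_BLOCK_TAG = "<itunes:block>Yes</itunes:block>"


def feed_variant(request_url: str) -> dict:
    """Pick the audio format and any extra channel tags for a feed request."""
    query = parse_qs(urlsplit(request_url).query)
    client = query.get("client", ["default"])[0]  # hypothetical parameter name
    if client == "spotify":
        # Higher-bitrate MP3, and blocked so that directories skip this feed.
        return {"audio": "mp3", "extra_channel_tags": [ITUNES_BLOCK_TAG]}
    return {"audio": "aac", "extra_channel_tags": []}


print(feed_variant("https://example.com/feed.rss?client=spotify"))
```

Because the variant lives in the query string, everything else about the feed URL stays identical for every client.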
However, I’d prefer a more bespoke service, and am cooking one up. It turns out that if you dump all your Cloudfront logs into an Amazon S3 bucket, you can configure Amazon Athena to see these as a giant database, which you can do SQL queries against. (I’m querying 2.5GB of data, and the cost is about $0.015 each query). So, here’s a moderately compliant SQL statement (as long as you strip bots out of this):
SELECT count(*), uri, useragent
FROM cloudfront_logs
WHERE "date" BETWEEN DATE '2019-01-20' AND DATE '2019-04-20'
  AND uri LIKE '/audio/%'
  AND bytes > 750000
GROUP BY uri, useragent, requestip
ORDER BY useragent
…however, this won’t catch people grabbing the podcast in little chunks.
I’d like to work with others to produce more interesting reports. I think you could do some good reporting just using a clever SQL statement — with the one above, if there’s a separate table with some regex, I reckon you could get this looking rather prettier.
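As a starting point, here is a hedged Python sketch of both ideas: adding up the bytes from chunked requests before applying the threshold, and using a small table of regular expressions to turn user agents into app names. The row format mirrors the columns in the SQL statement above; the patterns are illustrative only, and a real table would be much longer. It also merges everything from one listener over the whole period, so a real report would probably group by day as well.

```python
import re
from collections import defaultdict

MIN_BYTES = 750_000  # same threshold as the SQL statement above

# Hypothetical lookup table of (pattern, app name) pairs.
APP_PATTERNS = [
    (re.compile(r"AppleCoreMedia|^Podcasts/"), "Apple Podcasts"),
    (re.compile(r"Spotify"), "Spotify"),
]


def app_name(useragent: str) -> str:
    """Map a raw user agent string to a friendlier app name."""
    for pattern, name in APP_PATTERNS:
        if pattern.search(useragent):
            return name
    return "Other"


def count_downloads(rows) -> dict:
    """Count downloads per (uri, app), merging chunked requests.

    Each row is a dict with requestip, useragent, uri and bytes keys.
    """
    bytes_seen = defaultdict(int)
    for row in rows:
        key = (row["requestip"], row["useragent"], row["uri"])
        bytes_seen[key] += row["bytes"]
    downloads = defaultdict(int)
    for (_ip, useragent, uri), total in bytes_seen.items():
        if total > MIN_BYTES:
            downloads[(uri, app_name(useragent))] += 1
    return dict(downloads)
```

Two 400,000-byte chunks from the same listener would count as one download here, where the SQL statement above would miss them both.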
And that’s the lot for now
But if you’ve any questions — since quite a few people ask about my systems — I’d love to help. There are comments just down there.
James Cridland is the Editor of Podnews, a keynote speaker and consultant. He wrote his first podcast RSS feed in January 2005; and also launched the first live radio streaming app for mobile phones in the same year. He's worked in the audio industry since 1989.