commit 5e7d8845d06ee71df63b8fa9713465c8fcf55274
parent 472821afd2da0104fb5c3ecd2e65f12909f27e85
Author: Andrew Laack <andrew@laack.co>
Date: Sat, 27 Sep 2025 19:49:33 -0500
Finished post
Diffstat:
1 file changed, 69 insertions(+), 16 deletions(-)
diff --git a/posts/sustainability-of-youtube.gmi b/posts/sustainability-of-youtube.gmi
@@ -2,38 +2,91 @@
## Context
-I dislike using cloud services because they may discontinue my service [1]. This concern, along with concerns about privacy [2], has led me to keep information and content I care about away from cloud services. This does make me wonder, how many people would be distraught about the loss of their content if YouTube terminated their accounts? This is not the topic today, nor is it something I can easily answer, but it is something I wonder about and would like others to consider.
+I dislike using cloud services because they may discontinue my service [1] or do something stupid [2] that negatively impacts me. These concerns, along with concerns about privacy [3], have led me to keep information and content I care about away from cloud services. This makes me wonder: how many people would be distraught about losing their content if YouTube terminated their accounts? This is not the topic today, nor is it something I can easily answer, but it is something I wonder about and would like others to consider.
-Similarly, I am skeptical of 'free' services. It's incorrect to say "if something is free, you are the product" because charity does exist, but when it comes to Google they aren't a charity. Their current model with YouTube is to have people upload videos to their site and show ads prior to some users watching said videos. There are also subscriptions and such, but this is the general strategy. Notice they do not purge content on a regular basis except for the exception of ToS violations. As such, there is a (nearly) monotonically increasing function that describes the storage requirements of YouTube. Along with this, there are hard limits for the density of information storage [3] and limited growth potential for the company to sustain their data storage. This leads to my question posed below.
+Similarly, I am skeptical of 'free' services. It's incorrect to say "if something is free, you are the product" because charity does exist, but when it comes to Google, they aren't a charity. Their current model with YouTube is to have people upload videos to their site and show ads to some users when they watch those videos. There are also paid subscriptions, but their primary monetization comes from ads. Importantly, they don't purge content on a regular basis, except in cases of ToS violations. As such, YouTube's storage requirements are a (nearly) monotonically increasing function of time. This motivates my question below.
## Question
-When will YouTube's storage costs exceed their revenue if they don't start purging old content?
+When will YouTube's storage costs exceed their revenue if they don't start purging old content, assuming their revenue does not increase over time?
-## Findings
+## How to Answer This Question
-YouTube states on their official blog there are over 20 million videos uploaded per day [4]. I can not find any believable metrics for how much data this is because so many of these SEO slop sites regurgitate the same numbers which don't correspond with YouTube's stated number of video uploads per day. While I don't trust YouTube very much, and they likely have incentives to inflate the numbers, they seem more trustworthy in this context as they are, in fact, the ones who are hosting the content. As such, I will accept this metric and try to work backwards to an approximate amount of data being saved to their servers.
+We need the following information to answer this question:
-### Scraping
+- What is YouTube's annual net profit?
+- How much new data does YouTube store each year?
+- How much does data storage cost?
-I wrote a simple python script [5] that used a curated list of popular Google Trends searches over the past few decades [6] that uses the YouTube search endpoint, sorting by most recently uploaded, to compile a list of ~7.65 million YouTube video URLs along with their duration.
+## YouTube's Profit
-While this seems to be a reasonable proxy for YouTube video lengths, there are some limitations of the approach such as:
+According to Alphabet's 2025 Q2 earnings release [4], YouTube ads generated $9.769 billion in revenue. Annualized, this is $39.076 billion, but this is only revenue, not net profit. If we assume YouTube's operating margin matches Alphabet's overall operating margin (32%), we find an approximate net profit of $12.50432 billion / year. Actual net profit could differ from this, but since we are concerned with how much data storage this could support, we don't need to factor in how it would be taxed.
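The annualization above is simple enough to check by hand. This short Python sketch makes its two assumptions explicit. First, all four quarters are assumed to match Q2 2025. Second, YouTube is assumed to share Alphabet's 32% operating margin. The constant and function names are illustrative only.

```python
# Sketch: annualize one quarter of YouTube ad revenue and apply an assumed margin.
QUARTERLY_AD_REVENUE = 9.769e9  # USD, YouTube ads revenue for 2025 Q2 [4]
ASSUMED_MARGIN = 0.32           # assumption: Alphabet-wide operating margin applies to YouTube


def annual_profit(quarterly_revenue: float, margin: float) -> float:
    """Return annualized profit, assuming four identical quarters."""
    return quarterly_revenue * 4 * margin


profit = annual_profit(QUARTERLY_AD_REVENUE, ASSUMED_MARGIN)
print(f"${profit / 1e9:.5f} billion / year")
```

Running it prints `$12.50432 billion / year`, matching the figure above.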
-- Only searching for top US terms
-- Small dataset size
-- YouTube likely imposing some amount of algorithmic filtering
-- The videos in questions are all public
-- ...
+## Storage Needs
-These limitations, and more, are flaws in my methodology that make them not entirely representative of all videos uploaded to the platform, but any approach will be imperfect without the data directly from YouTube.
+### Total Videos
+
+YouTube states on their official blog that over 20 million videos are uploaded per day [5]. While I don't trust YouTube very much, and they don't have many incentives to be honest on this topic, they seem more trustworthy in this context than the SEO slop sites that circulate conflicting figures, since they are, in fact, the ones hosting the content. As such, I will accept this metric.
+
+### Average Video Size
+
+I wrote a Python script that takes a curated list of popular Google Trends searches over the past few decades [6] and searches YouTube for recently uploaded videos matching each term. Running this script produced a list of ~7.65 million YouTube videos.
+
+Before continuing, I will list a few limitations of this approach:
+
+- YouTube likely imposes some amount of algorithmic filtering when sorting by 'recently uploaded'
+- The videos in question are all public (not inclusive of private/unlisted videos)
+- Less popular search terms may have a different distribution of video sizes
+
+These are the main flaws in my methodology, but any approach will be imperfect without being able to get the data directly from YouTube.
+
+From these ~7.65 million videos, I sampled 615,222 and used yt-dlp [7] to query every resolution and format YouTube will serve for each one.
+It seems unlikely to me that YouTube stores each of these resolutions on their servers, but I think it is very likely that YouTube is storing the highest resolution version they are willing to serve to users.
+
+Based on my findings, I propose a lower bound of ~396.17 MB / video on average. This assumes they only store the highest resolution version and generate all other versions in real time via transcoding (I am confident this isn't the case, but it provides a useful lower bound). I also propose an upper bound of ~1.44 GB / video on average, which assumes they store every resolution and format they serve for each video.
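The two bounds can be sketched in Python, assuming each video is described by an info dict like the one returned by `yt_dlp.YoutubeDL(...).extract_info(url, download=False)`. Such a dict contains a `formats` list whose entries may carry `height`, `filesize`, or `filesize_approx`. This is a hypothetical reconstruction, not the author's script (that lives on the git server [8]). The lower bound here also ignores the separate audio stream that typically accompanies YouTube's higher-resolution, video-only formats, which is one more reason to treat it as a floor. The sample dicts are hand-written stand-ins, not real data.

```python
from typing import Any, Iterable, Optional


def format_size(fmt: dict[str, Any]) -> Optional[int]:
    """Size in bytes of one format, preferring exact 'filesize' over 'filesize_approx'."""
    return fmt.get("filesize") or fmt.get("filesize_approx")


def size_bounds(info: dict[str, Any]) -> tuple[int, int]:
    """Return (lower, upper) storage estimates in bytes for one video's info dict."""
    # Ignore formats that report no size (e.g. storyboard images).
    sized = [f for f in info.get("formats", []) if format_size(f)]
    if not sized:
        raise ValueError("no formats with a known size")
    # Lower bound (assumption): only the highest-resolution format is stored.
    # Ties on height are broken by the larger file.
    best = max(sized, key=lambda f: (f.get("height") or 0, format_size(f)))
    # Upper bound (assumption): every served format is stored.
    return format_size(best), sum(format_size(f) for f in sized)


def average_bounds(infos: Iterable[dict[str, Any]]) -> tuple[float, float]:
    """Mean lower and upper bounds across a sample of videos."""
    pairs = [size_bounds(info) for info in infos]
    if not pairs:
        raise ValueError("empty sample")
    n = len(pairs)
    return sum(lo for lo, _ in pairs) / n, sum(hi for _, hi in pairs) / n


# Hypothetical, hand-written info dicts standing in for yt-dlp output.
sample = [
    {"formats": [
        {"format_id": "140", "height": None, "filesize": 3_000_000},       # audio only
        {"format_id": "18", "height": 360, "filesize_approx": 20_000_000},
        {"format_id": "137", "height": 1080, "filesize": 90_000_000},
        {"format_id": "sb0", "height": 45},                                 # no size reported
    ]},
    {"formats": [
        {"format_id": "18", "height": 360, "filesize": 10_000_000},
        {"format_id": "22", "height": 720, "filesize": 30_000_000},
    ]},
]
print(average_bounds(sample))
```

For this toy sample the script prints `(60000000.0, 76500000.0)`. Applied to the 615,222 sampled videos, the same averaging step would give per-video figures comparable to the ones above.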
+
+All of the code used for this is available on my git server [8].
+
+### Annual Storage Increase
+
+Multiplying these per-video sizes by YouTube's stated rate of 20 million uploads per day, we find:
+
+Lower bound:
+
+- 7.923 PB / Day
+- 2.89 EB / Year
+
+Upper bound:
+
+- 28.895 PB / Day
+- 10.547 EB / Year
+
+Note: These values may vary depending on rounding, but they should be similar to what anyone else would find.
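Here is a sketch of the multiplication, assuming decimal (SI) units and 365-day years. With the rounded 1.44 GB figure, the upper bound comes out to 28.8 PB / day and ~10.51 EB / year. The 28.895 PB / day figure above implies an unrounded average of about 1.445 GB / video, which is the kind of rounding difference the note refers to. The names are illustrative.

```python
UPLOADS_PER_DAY = 20_000_000           # YouTube's stated upload rate [5]
MB, GB, PB, EB = 1e6, 1e9, 1e15, 1e18  # decimal (SI) units


def storage_growth(bytes_per_video: float,
                   uploads_per_day: int = UPLOADS_PER_DAY) -> tuple[float, float]:
    """Return (PB added per day, EB added per year) for an average video size."""
    bytes_per_day = bytes_per_video * uploads_per_day
    return bytes_per_day / PB, bytes_per_day * 365 / EB


for label, size in [("lower", 396.17 * MB), ("upper", 1.44 * GB)]:
    per_day, per_year = storage_growth(size)
    print(f"{label}: {per_day:.3f} PB / day, {per_year:.3f} EB / year")
```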
+
+## Storage Cost by Volume
+
+GCP currently charges $26 / month for 1 TB of standard multi-region, US-based cloud storage [9]. If we assume the same 32% profit margin as before, Alphabet's own cost would be ~$17.68 / TB / month, or $212.16 / TB / year. I don't know if this is high or low relative to what they actually pay. YouTube requires quick access to many of their videos, but many others are likely retrieved infrequently and could be kept on cheaper storage. It also seems likely that Alphabet's cloud storage margins are higher than its company-wide average. Finally, these are only US storage prices, so costs could vary depending on the regions where this data is hosted. In any case, I think this is a fair estimate.
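Here is a minimal sketch of this cost estimate. As above, it assumes that stripping a 32% margin from the $26 / TB / month list price [9] approximates Alphabet's internal cost. The constant and function names are hypothetical.

```python
LIST_PRICE_PER_TB_MONTH = 26.0  # USD, GCP standard multi-region US storage [9]
ASSUMED_MARGIN = 0.32           # assumption: storage margin equals Alphabet's operating margin


def internal_cost_per_tb_year(list_price: float, margin: float) -> float:
    """Remove the assumed margin from a monthly list price and annualize it."""
    return list_price * (1 - margin) * 12


cost = internal_cost_per_tb_year(LIST_PRICE_PER_TB_MONTH, ASSUMED_MARGIN)
print(f"${cost:.2f} / TB / year")
```

Running it prints `$212.16 / TB / year`. A higher assumed margin, or a cheaper storage class for rarely watched videos, would lower this number and push the answer below further out.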
+
+## Answer to the Question
+
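+Given YouTube's approximate net profit of $12.50432 billion / year and an estimated cost of $212.16 / TB / year for cloud storage, we find their profits can support an additional ~58.94 EB of data.
+
+Because stored data keeps incurring costs every year it is kept, each year of uploads adds a fixed amount to the annual storage bill. Dividing the ~58.94 EB of headroom by the annual storage increase therefore gives the number of years until that bill consumes current profits.
+
+At the lower bound of 2.89 EB / year, YouTube's storage costs will surpass their current profits in ~20.39 years.
+
+At the upper bound of 10.547 EB / year, YouTube's storage costs will surpass their current profits in ~5.59 years.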
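This sketch reproduces the final step. It assumes profit stays flat, the per-TB price stays constant (real storage prices tend to fall), and the data already stored is covered by current profit. After n years, n × (annual increase) EB of new data is still on disk. So the yearly bill grows linearly, and the break-even point is the headroom divided by the annual increase. The names are illustrative.

```python
PROFIT_PER_YEAR = 12.50432e9  # USD / year, profit estimate above
COST_PER_TB_YEAR = 212.16     # USD / TB / year, storage cost estimate above
TB_PER_EB = 1_000_000         # decimal units

# EB of additional storage one year of profit can pay for (~58.94 EB).
SUPPORTED_EB = PROFIT_PER_YEAR / COST_PER_TB_YEAR / TB_PER_EB


def years_until_unprofitable(eb_added_per_year: float) -> float:
    """Solve n * eb_added_per_year = SUPPORTED_EB for n.

    Data uploaded in every past year is still stored, so the yearly
    storage bill grows linearly with the number of years n.
    """
    return SUPPORTED_EB / eb_added_per_year


for label, growth in [("lower", 2.89), ("upper", 10.547)]:
    print(f"{label} bound: ~{years_until_unprofitable(growth):.2f} years")
```

Running it prints `lower bound: ~20.39 years` and `upper bound: ~5.59 years`.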
## Conclusion
+These are very rough bounds, especially given how difficult it is to estimate the cost per TB / year of storing this data with their retrieval needs. Still, they suggest that within ~5.59 to ~20.39 years, YouTube will be forced to start purging old content to remain profitable at their current profit rate.
+
## Citations
=> https://killedbygoogle.com/ Examples of discontinued Google services
+=> https://arstechnica.com/gadgets/2024/05/google-cloud-accidentally-nukes-customer-account-causes-two-weeks-of-downtime/ Google Cloud accidentally deleting a customer account
=> https://www.gnu.org/proprietary/proprietary-surveillance.html Privacy concerns of using the cloud
-=> https://en.wikipedia.org/wiki/Bekenstein_bound Theoretical limit for information storage in a finite region of space
+=> https://abc.xyz/assets/cc/27/3ada14014efbadd7a58472f1f3f4/2025q2-alphabet-earnings-release.pdf Alphabet earnings
=> https://web.archive.org/web/20250911091711/https://blog.youtube/press/ Videos uploaded per day
-=> gemeini://blog.laack.co/python_script Python Script for YouTube
+=> https://trends.google.com/trending?geo=US&hl=en-US&hours=168 Trending Search Terms in the US
+=> https://github.com/yt-dlp/yt-dlp yt-dlp
+=> http://git.laack.co/blog/log.html Source code for this post
+=> https://cloud.google.com/storage/pricing#multi-regions Storage cost GCP