TLDR: The default 10,000-unit/day quota burns through in fewer than 30 naive user sessions. Three tricks pulled my per-user cost down ~50× and let me ship TubeVocab on the free tier.
When I started building TubeVocab — an ESL learning tool that turns any YouTube video into a clickable, vocab-learning interactive transcript — I assumed the YouTube Data API v3 would be the cheap, easy part. "It's Google. It scales. The free tier is generous." That kind of gut feeling.
I was wrong. The free tier is generous, but only if you understand how quota math actually works. Most public tutorials skip this. Here's what I learned the hard way.
The quota arithmetic nobody puts in the quickstart
Default daily quota: 10,000 units. Sounds like a lot.
Then you start reading the cost table and realize:
- search.list — 100 units per call. That's how you find a video by query.
- videos.list — 1 unit per call. That's how you fetch metadata once you have an ID.
- captions.list — 50 units. Lists the subtitle tracks available for a video.
- captions.download — 200 units. The actual subtitle data.
If your user-facing flow is "search a YouTube channel → pick a video → load subtitles → render the interactive player," you're looking at roughly 100 + 1 + 50 + 200 = 351 units per single user session. The 10,000 free units evaporate in 28 sessions/day.
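To make the math concrete, here is the same arithmetic as a quick snippet (unit costs straight from the list above):

const COST = { search: 100, videos: 1, captionsList: 50, captionsDownload: 200 };
const perSession = COST.search + COST.videos + COST.captionsList + COST.captionsDownload; // 351 units
const sessionsPerDay = Math.floor(10_000 / perSession); // 28 sessions, then you're done for the day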
That's not a side project. That's a 30-DAU launch and you're paying for quota expansion the next morning.
Three tricks that cut my per-user cost ~50×
1. Don't use search.list for known IDs
This sounds obvious in hindsight but it took me a week to see. If a user pastes a YouTube URL, the video ID is right there in the URL. Parse it. Skip search.list entirely.
// Bad: 100 units per pasted URL
const searchResult = await youtube.search.list({ q: pastedUrl, type: 'video', part: 'snippet' });
// Good: 0 units, regex the ID straight out of the URL
const id = pastedUrl.match(/(?:v=|youtu\.be\/)([\w-]{11})/)?.[1];
const video = await youtube.videos.list({ id, part: 'snippet,contentDetails' }); // 1 unit
This one change took the average pasted-URL flow from 351 units → 251 units.
2. Skip the official captions.* endpoints entirely
The captions.download endpoint costs 200 units per video AND requires OAuth (the user has to be the video owner). For non-owner subtitle access — i.e. the actual ESL use case — you need a different path.
The trick: YouTube serves the auto-generated and uploader-provided subtitles through an undocumented but stable XML endpoint that doesn't count against your quota at all. You can get the timed transcript via https://video.google.com/timedtext?lang=en&v=VIDEO_ID, parse the XML, and you're done. 0 quota units.
(Caveat: this endpoint is undocumented, so it can break. I have a fallback path that uses youtube-transcript-api style scraping. The combined approach gets ~95% subtitle hit rate without touching the official caption quota.)
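For illustration, here's a minimal sketch of the primary path. fetchTimedText is a hypothetical helper name of my own, and the regex parsing assumes the endpoint's legacy <text start dur> cue format, so treat it as a starting point rather than production code:

// Zero quota units. Returns null when no track exists so the scraping fallback can take over.
async function fetchTimedText(videoId, lang = 'en') {
  const res = await fetch(`https://video.google.com/timedtext?lang=${lang}&v=${videoId}`);
  const xml = await res.text();
  if (!xml) return null; // empty body means no subtitle track for this lang
  const cues = [];
  // Each cue looks like: <text start="1.23" dur="4.56">escaped text</text>
  for (const m of xml.matchAll(/<text start="([\d.]+)" dur="([\d.]+)"[^>]*>([\s\S]*?)<\/text>/g)) {
    cues.push({
      start: parseFloat(m[1]),
      dur: parseFloat(m[2]),
      text: m[3]
        .replace(/&amp;/g, '&')
        .replace(/&#39;/g, "'")
        .replace(/&quot;/g, '"')
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>'),
    });
  }
  return cues.length ? cues : null;
}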
After this, my "load subtitles" cost dropped from 250 → 1 unit per session.
3. Cache aggressively at the video-ID level
Every time someone watches a video on TubeVocab, the metadata + subtitle + thumbnail set is the same until the video itself changes. I run a per-video-ID cache (just SQLite, nothing fancier needed) with no expiry. Subsequent views of the same video cost zero quota, regardless of how many users watch it.
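Here's roughly what that cache layer looks like. This is a sketch assuming better-sqlite3; fetchFromApi is a hypothetical stand-in for the videos.list + transcript fetch described above:

import Database from 'better-sqlite3';

const db = new Database('tubevocab-cache.db');
db.exec(`CREATE TABLE IF NOT EXISTS videos (
  id      TEXT PRIMARY KEY,  -- YouTube video ID
  payload TEXT NOT NULL      -- JSON blob: metadata + parsed transcript
)`);

// Cache-or-fetch: only first-time-seen IDs ever touch the API.
async function loadVideo(videoId) {
  const row = db.prepare('SELECT payload FROM videos WHERE id = ?').get(videoId);
  if (row) return JSON.parse(row.payload); // 0 quota units
  const fresh = await fetchFromApi(videoId); // ~2 units: videos.list + transcript
  db.prepare('INSERT OR REPLACE INTO videos (id, payload) VALUES (?, ?)').run(videoId, JSON.stringify(fresh));
  return fresh;
}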
Once I had ~500 popular videos cached, my marginal cost per session was effectively zero. The quota is now spent only on first-time-seen videos.
What actually shipped
After these three optimizations:
- Average new-video session: ~2 units (videos.list + occasional fallback)
- Average cached-video session: 0 units
- Daily ceiling on the free tier: ~5,000 unique new videos/day before I'd need to start budgeting
That's enough headroom for the foreseeable lifetime of a side project.
If you're building anything in the YouTube + content-analysis space — vocabulary tools, accessibility, search, analytics — the playbook is roughly: assume search.list is poison, route around captions.*, and cache by video ID forever. The free tier becomes more than generous once you stop fighting it.
For context: I built TubeVocab using exactly this stack — it's a click-to-flashcard ESL tool that turns any YouTube video into vocabulary practice. The quota math was the single most underestimated technical risk of the whole project. Hope this saves someone a week.