YouTube Subtitles vs Transcript: What's the Difference?
"Subtitles," "captions," and "transcript" all refer to text versions of a video's audio — but they're not interchangeable. Each is built for a different job. Here's a quick guide to telling them apart, and how to get whichever one you need.
Subtitles
Subtitles are the short timed lines of text that appear on top of a video while it plays. Each subtitle entry typically covers 2-5 seconds of speech, is shown one or two lines at a time, and is then replaced as the audio moves on.
Subtitles originated as a translation tool, letting viewers follow a video in a language they don't speak. Strictly speaking, "subtitles" assume the viewer can hear the audio: they render the speech as text but don't usually annotate sound effects or identify speakers.
Format on disk: SRT (SubRip) or VTT (WebVTT). Each entry has a start time, an end time, and one or two short lines of text; SRT entries also carry a sequence number.
Best for: watching a video in a language you don't speak.
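If you've never opened one of these files, here's what the structure looks like in practice. This is a minimal Python sketch, not a production parser: it assumes well-formed SRT (a numeric index line, a `start --> end` timing line, then the text, with blank lines between entries), and the names `Cue`, `to_seconds`, and `parse_srt` are made up for illustration. Real-world files can include styling tags, position settings, or other quirks it doesn't handle.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


# HH:MM:SS,mmm (also accepts '.' before the milliseconds)
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp like '00:01:02,500' to seconds."""
    m = TIME_RE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, mins, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mins * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse SRT text into cues: index line, timing line, then text lines."""
    cues = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed entries
        start, end = lines[1].split("-->")
        # A cue's text may span two lines; join them with a space.
        cues.append(Cue(to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues


sample = """1
00:00:01,000 --> 00:00:03,500
Welcome back to the channel.

2
00:00:03,500 --> 00:00:06,000
Today we're looking at
subtitle formats.
"""

for cue in parse_srt(sample):
    print(cue.start, cue.end, cue.text)
```

Each parsed cue corresponds to one of the short on-screen lines described above.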
Closed captions
Captions look like subtitles but also describe audio the viewer may not be able to hear: speaker names ("[Alice]"), tone markers ("[laughs]"), and ambient sound ("[explosion]"). They're designed primarily for deaf and hard-of-hearing viewers, and they also help anyone who can't use the audio, such as someone watching with the sound off.
"Closed" means the captions can be toggled on or off (vs. "open" captions, which are burned into the video). On YouTube, the CC button toggles them.
Format on disk: Same as subtitles (SRT, VTT) with additional bracketed cues.
Best for: accessibility — and as a general "I want to read along while watching" option.
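Because the extra audio information in captions is usually bracketed, you can often recover just the spoken words by deleting the bracketed spans. A minimal sketch, assuming cues use square brackets as in the examples above; `strip_cues` is a hypothetical helper, and captions that mark speakers or music differently (for instance with `>>` or parentheses) would need extra rules.

```python
import re

# Matches a bracketed cue such as "[laughs]" plus any whitespace after it.
CUE_RE = re.compile(r"\[[^\]]*\]\s*")


def strip_cues(line: str) -> str:
    """Remove bracketed sound/speaker cues, leaving only the spoken words."""
    without = CUE_RE.sub("", line)
    # Collapse any doubled spaces left behind and trim the ends.
    return " ".join(without.split())


print(strip_cues("[Alice] I didn't expect that. [laughs]"))
print(repr(strip_cues("[explosion]")))  # a sound-only caption leaves nothing
```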
Transcript
A transcript is the full text of everything spoken in the video, in reading order, with timestamps optional. It's not timed to playback — it's meant to be read as a document, not viewed as an overlay.
Where subtitles split sentences into on-screen blocks of one or two lines, a transcript runs the text together in paragraphs so you can read at your own pace. Where captions add sound cues, a transcript usually contains just the spoken words (though formal interview transcripts do include speaker tags and editorial notes).
Format on disk: Plain text, Markdown, or PDF. Some tools include line-by-line timestamps; others don't.
Best for: reading, searching, quoting, translating, summarizing, citation.
The same source, three views
For any given YouTube video, all three views come from the same underlying audio. The differences are purely in how that audio is presented as text:
- Subtitles / captions are the speech split into 2-5 second chunks, timed to playback. Each chunk is short enough to fit on screen.
- Transcript is the same words, but reassembled into a document — paragraphs instead of timed lines.
This is why a good app like YouTube Translate can give you both: it pulls one source and offers a "subtitle" view (timed lines, synced to the video) and a "transcript" view (continuous reading mode), with one tap to switch between them.
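Here's a rough sketch of that reassembly: timed cues go in, transcript paragraphs come out, each tagged with an optional `MM:SS` timestamp. It's illustrative only. The rule for where one paragraph ends (a pause longer than two seconds between cues) is an assumption chosen for this example, not how YouTube Translate or any other app necessarily does it, and `Cue`, `to_transcript`, and `fmt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def to_transcript(cues: list[Cue], pause: float = 2.0) -> list[tuple[float, str]]:
    """Merge timed cues into paragraphs, each tagged with its start time.

    A new paragraph starts whenever the silence between two cues exceeds
    `pause` seconds (an illustrative heuristic, not a standard rule).
    """
    paragraphs: list[tuple[float, str]] = []
    current: list[str] = []
    para_start = 0.0
    prev_end = None
    for cue in cues:
        # A long gap since the previous cue closes the current paragraph.
        if prev_end is not None and cue.start - prev_end > pause and current:
            paragraphs.append((para_start, " ".join(current)))
            current = []
        if not current:
            para_start = cue.start
        current.append(cue.text)
        prev_end = cue.end
    if current:
        paragraphs.append((para_start, " ".join(current)))
    return paragraphs


def fmt(seconds: float) -> str:
    """Format seconds as MM:SS for an optional paragraph timestamp."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"


cues = [
    Cue(0.0, 2.5, "So the first thing to know"),
    Cue(2.5, 5.0, "is that captions are timed."),
    Cue(9.0, 12.0, "Transcripts, on the other hand,"),
    Cue(12.0, 14.0, "are meant to be read."),
]
for start, text in to_transcript(cues):
    print(f"[{fmt(start)}] {text}")
```

Dropping the `[MM:SS]` prefix gives the plain, timestamp-free style of transcript; keeping it gives the "line-by-line timestamps" style, at paragraph granularity.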
Get both subtitles AND transcripts from any video
YouTube Translate gives you a timed subtitle view and a continuous transcript view, side-by-side. Free on iOS and Android.
Which one do you want, in practice?
- Watching with the sound off? Subtitles or captions.
- Studying a foreign-language video? Subtitles to follow along + transcript to study after.
- Pulling a quote for an article or paper? Transcript. Subtitles fragment sentences mid-thought.
- Translating the whole video into your language? Transcript. Translating subtitle-by-subtitle produces choppy output.
- Searching for a specific phrase the speaker said? Transcript with timestamps. Subtitle search is awkward because a phrase can be split across two timed lines.
- Generating an AI summary? Transcript. Summaries built from subtitle chunks tend to miss context that crossed chunk boundaries.
- Building flashcards or notes from the video? Transcript.
Short version: subtitles are for watching, transcripts are for working.
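The search point is easy to demonstrate. In this minimal sketch (the helper `find_phrase` and the sample cues are hypothetical), cue texts are joined into one searchable string, and each match is mapped back to the start time of the cue where it begins. A phrase that straddles two cues is found this way, while checking each cue on its own misses it.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def find_phrase(cues: list[Cue], phrase: str) -> list[float]:
    """Return the start time of the cue where each match of `phrase` begins.

    Cue texts are joined into one lowercase string, so a phrase split across
    two cues is still found; `offsets` records where each cue's text starts.
    """
    offsets: list[int] = []
    parts: list[str] = []
    pos = 0
    for cue in cues:
        lowered = cue.text.lower()
        offsets.append(pos)
        parts.append(lowered)
        pos += len(lowered) + 1  # +1 for the joining space
    full = " ".join(parts)
    needle = phrase.lower()
    hits = []
    i = full.find(needle)
    while i != -1:
        # The cue containing position i is the last one starting at or before i.
        idx = bisect.bisect_right(offsets, i) - 1
        hits.append(cues[idx].start)
        i = full.find(needle, i + 1)
    return hits


cues = [
    Cue(10.0, 13.0, "The key idea is that"),
    Cue(13.0, 16.0, "transcripts are searchable."),
]
print(find_phrase(cues, "that transcripts"))  # the phrase spans both cues
# A naive cue-by-cue search misses the same phrase:
print(any("that transcripts" in c.text.lower() for c in cues))
```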
Auto-generated vs. manual
Both subtitles and transcripts can be either auto-generated by YouTube's speech recognition or uploaded by the creator. Manual versions are nearly always better:
- Proper punctuation and sentence breaks.
- Correct spelling of names, places, technical terms.
- Speaker identification on interviews.
- Notation of important non-speech audio when present.
On YouTube, the Subtitles/CC option in the Settings (gear) menu lists which tracks are available, and auto-generated tracks are labeled "(auto-generated)". If both exist, manual is the default.
When the video has neither
If YouTube has no subtitle track at all (the creator didn't upload one, and auto-generation failed or isn't available), you'll need AI transcription to get any text out of the video. Modern transcript apps handle this fallback automatically: Gemini-powered speech-to-text, for example, can generate a transcript from the audio in 1-3 minutes for most videos.
Closing thoughts
Subtitles, captions, and transcripts are three views of the same underlying audio — pick whichever one matches your job. For watching, subtitles. For reading, studying, quoting, translating, or summarizing, transcripts. A good app gives you both without making you choose.
Try YouTube Translate
Free on iOS and Android. Subtitles, transcripts, translation, AI summaries.