The First 500 Words: What 36 Classic Novel Openings Measure Like

We ran the openings of 36 public-domain novels through Bookshaper's pacing analyzer. Here's what the numbers actually say.

The Bookshaper Team

Methodology

Sample: the first 500 words of narrative prose from 36 public-domain novels (1719–1925), all originally written in English to keep a translator's style out of the numbers, sourced from Project Gutenberg. Instrument: the same rule-based analyzer that powers our free Pacing Analyzer. Sentence detection is approximate (abbreviations like "Mr." and ellipses can split a sentence early), and 500 words is a slice, not a whole book, so read these as directional, reproducible signals, not verdicts. The corpus list and the analysis script live in the Bookshaper repo so anyone can re-run them.

Measure, don't opine

There is no shortage of advice about how to open a novel. Start with action. Start with a hook. Keep your first sentence short and punchy. Open in scene, not summary. Most of it is opinion, repeated until it sounds like law.

We wanted numbers instead. So we took 36 public-domain novels (Defoe to Fitzgerald, 1719 to 1925), pulled the first 500 words of narrative from each, and ran them through the exact same analyzer that powers our free Pacing Analyzer. It is the same rule-based code a reader runs in their browser, applied to a corpus anyone can re-download from Project Gutenberg.

We're not here to tell you what a good opening is. We're here to report what these openings actually measure like, and let you draw your own conclusions. Here's what the data says.

Finding 1: openings don't have a 'normal' sentence length

Across the 36 openings, the average sentence runs about 24 words (mean 24.4, median 24.4). But the average is almost useless on its own, because the spread is enormous: from 10.9 words per sentence in the opening of Pride and Prejudice to 45.5 in Robinson Crusoe. That's a four-fold range in the very same task — writing the first page of a novel.

The more interesting signal is variation within each opening. Our analyzer scores rhythm as the standard deviation of sentence length relative to the mean (0–100); higher means more variety between long and short sentences. The openings averaged 73/100 — high. Several openings — Moby Dick, Great Expectations, Tom Sawyer, The Picture of Dorian Gray — scored a perfect 100. Strong openings rarely march in step; they mix a long, winding sentence against a short, flat one.
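A rough sketch of how a score like this could be computed, assuming it is the coefficient of variation (standard deviation divided by mean) mapped onto 0–100 and capped at 100. The `scale` parameter and the cap are assumptions for illustration; the exact mapping the analyzer uses isn't stated here.

```python
from statistics import mean, pstdev

def rhythm_score(sentence_lengths, scale=100.0):
    """Sentence-length variation as a 0-100 score.

    Computes standard deviation / mean, multiplies by `scale`
    (a hypothetical knob, not taken from the analyzer), and caps at 100.
    """
    if len(sentence_lengths) < 2:
        return 0
    avg = mean(sentence_lengths)
    if avg == 0:
        return 0
    variation = pstdev(sentence_lengths) / avg
    return min(100, round(variation * scale))

# Invented word counts per sentence, for illustration only.
uniform = [12, 12, 12, 12]   # every sentence the same length
mixed = [3, 41, 8, 27, 5]    # long sentences set against short ones

print(rhythm_score(uniform))  # 0
print(rhythm_score(mixed))    # 88
```

Under these assumptions, a passage of identical-length sentences scores 0, and any passage whose standard deviation reaches the mean hits the cap.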

The sentence-length mix bears this out. Averaged across the corpus, the sentences in an opening are 27% short (10 words or fewer), 35% medium (11–25), 22% long (26–40), and 16% very long (over 40). Nearly two-thirds of sentences are short-to-medium, but the long and very long ones are always there too: almost 40% of sentences, doing the heavy descriptive lifting.
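The bucketing can be sketched as below. It assumes the percentages are shares of sentences (not of words) and treats "very long" as anything over 40 words so the buckets don't overlap. The function names are made up for this example.

```python
from collections import Counter

BUCKETS = ("short", "medium", "long", "very long")

def length_bucket(word_count):
    """Assign one sentence to a length bucket by its word count."""
    if word_count <= 10:
        return "short"
    if word_count <= 25:
        return "medium"
    if word_count <= 40:
        return "long"
    return "very long"

def bucket_mix(sentence_lengths):
    """Percentage of sentences in each bucket, rounded to whole numbers."""
    total = len(sentence_lengths)
    if total == 0:
        return {name: 0 for name in BUCKETS}
    counts = Counter(length_bucket(n) for n in sentence_lengths)
    return {name: round(100 * counts[name] / total) for name in BUCKETS}

# Invented sentence lengths that sit on each bucket boundary.
print(bucket_mix([5, 10, 11, 25, 26, 40, 41, 60]))
# {'short': 25, 'medium': 25, 'long': 25, 'very long': 25}
```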

Takeaway you're free to ignore: the openings that 'work' aren't uniformly short or uniformly grand. They vary sentence length hard — that variance is the most consistent thing in the data.

Finding 2: most novels open in narration, not dialogue

This was the most lopsided result. The median opening is just 2% dialogue by word count, and 16 of the 36 novels — 44% — contain no dialogue at all in their first 500 words. The classic opening is a narrator setting a scene, not two characters talking.
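For reference, a dialogue share "by word count" can be approximated as the fraction of whitespace-separated words that fall inside double quotation marks. This is a minimal sketch under that assumption, not the analyzer's own code; it ignores single-quote dialogue and quotes left open across paragraphs.

```python
import re

# Matches a span in straight double quotes or in curly double quotes.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')

def dialogue_ratio(text):
    """Fraction of words that sit inside double quotation marks."""
    total_words = len(text.split())
    if total_words == 0:
        return 0.0
    quoted_words = sum(len(m.group(0).split()) for m in QUOTED.finditer(text))
    return quoted_words / total_words

# Invented sample text, not taken from the corpus.
sample = 'She looked up. "Are you coming?" he asked.'
print(dialogue_ratio(sample))                    # 0.375
print(dialogue_ratio("No one spoke that day."))  # 0.0
```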

But the average dialogue ratio is 12.9%, far above the median, because a vocal minority open mid-conversation and skew it. Pride and Prejudice is 74% dialogue in its first 500 words — the famous epigram about a single man in want of a wife gives way almost immediately to Mr. and Mrs. Bennet bickering. The Hound of the Baskervilles (64%), The Time Machine (57%), and Tom Sawyer (54%) all open the same way: drop the reader straight into talk.

So the distribution is bimodal. Most openings are pure narration; a smaller group are dialogue-forward from the first line. Almost nothing sits in the 'balanced' middle. If you open in dialogue, you're in a real but minority tradition — and notably, those dialogue-heavy openings also had the shortest average sentences (talk is clipped), which is why Pride and Prejudice and The Hound of the Baskervilles sit at the bottom of the sentence-length range.

Finding 3: the 'short punchy first line' is the exception, not the rule

The advice to open on a short, sharp sentence is real, and rarely followed. Setting aside the one-word detector artifact described in the note below, first-sentence length in our corpus ranged from 3 words (Moby Dick's "Call me Ishmael.") to 118 (the cascading "It was the best of times…" that opens A Tale of Two Cities). The median first sentence was 23 words. Only about a fifth of the novels opened on a sentence under 10 words.

The iconic short openers — "Call me Ishmael," "All children, except one, grow up" — are memorable precisely because they're unusual. Far more novels open on a long, subordinate-clause-laden sentence that sets a whole world in motion before the first period. Both work. Neither is the "correct" way to start.

(One honesty note, since rigor is the whole point here: our sentence detector is rule-based and splits on terminal punctuation, so an opening like "Mr. Utterson the lawyer…" gets clipped at the period after "Mr." — counting a spurious one-word first sentence. We left the artifact in rather than hand-cleaning the data; it's the same imprecision the live tool has, and we'd rather show it than hide it.)
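The artifact is easy to reproduce. The splitter below is a deliberately naive stand-in (split after `.`, `!`, or `?` followed by whitespace), not the analyzer's actual rule set, and the sample text is invented.

```python
import re

def naive_sentences(text):
    """Split after ., ! or ? followed by whitespace, with no abbreviation handling."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "Mr. Smith arrived late. Nobody noticed."
sentences = naive_sentences(sample)

print(sentences)                  # ['Mr.', 'Smith arrived late.', 'Nobody noticed.']
print(len(sentences[0].split()))  # 1
```

The period after "Mr." is treated as the end of a sentence, so the first "sentence" is one word long.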

Finding 4: sentences in openings have gotten shorter over time

Splitting the corpus by date, the twelve novels published before 1860 averaged 27.4 words per sentence across their openings; the twenty-four from 1860 onward averaged 22.9. That is a drift of about 4.5 words toward shorter, faster opening prose across the 18th-to-20th-century span, though with groups this small it is a directional signal rather than a proven trend.
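A quick consistency check on those two figures: weighting each era's average by its group size reproduces the 24.4-word corpus mean from Finding 1. That suggests the era figures are average sentence lengths across each opening, not first-sentence lengths. The variable names are just for this example.

```python
# Group sizes and averages as reported in the text.
early_count, early_avg = 12, 27.4   # novels published before 1860
late_count, late_avg = 24, 22.9     # novels published from 1860 onward

combined = (early_count * early_avg + late_count * late_avg) / (early_count + late_count)
print(round(combined, 1))  # 24.4
```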

It tracks with what you'd expect: the long periodic sentences of Defoe giving way to leaner later openings. Individual books still buck the trend, though. In the table below, Pride and Prejudice (1813) has the shortest average sentence at 10.9 words, while The Great Gatsby (1925) runs to 31.3. The trend may well continue past 1925, but public-domain limits stop our corpus there, which is itself worth stating plainly rather than pretending the sample is something it isn't.

The openings, book by book

A representative slice of the 36, oldest to newest. 'Avg sentence' is words per sentence; 'Rhythm' is sentence-length variation (0–100, higher = more varied); 'Dialogue' is the share of the first 500 words inside quotation marks.

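| Novel | Year | Avg sentence | Rhythm | Dialogue |
|---|---|---|---|---|
| Robinson Crusoe | 1719 | 45.5 | 80 | 0% |
| Pride and Prejudice | 1813 | 10.9 | 94 | 74% |
| Frankenstein | 1818 | 27.8 | 60 | 0% |
| Moby Dick | 1851 | 14.7 | 100 | 0% |
| A Tale of Two Cities | 1859 | 38.5 | 76 | 0% |
| Great Expectations | 1861 | 29.4 | 100 | 4% |
| Middlemarch | 1871 | 41.7 | 57 | 0% |
| The Adventures of Tom Sawyer | 1876 | 11.9 | 100 | 54% |
| The Picture of Dorian Gray | 1890 | 26.3 | 100 | 19% |
| The Time Machine | 1895 | 13.2 | 73 | 57% |
| Heart of Darkness | 1899 | 19.2 | 52 | 0% |
| The Hound of the Baskervilles | 1902 | 11.6 | 83 | 64% |
| Peter Pan | 1911 | 15.6 | 81 | 2% |
| The Great Gatsby | 1925 | 31.3 | 44 | 5% |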
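The gap between mean and median dialogue share described under Finding 2 also shows up in the 14 rows listed here. The sketch below uses only the Dialogue column from this slice, so its numbers differ from the corpus-wide figures (12.9% mean, 2% median).

```python
from statistics import mean, median

# Dialogue share (%) for the 14 listed rows, oldest to newest.
dialogue_pct = [0, 74, 0, 0, 0, 4, 0, 54, 19, 57, 0, 64, 2, 5]

slice_mean = mean(dialogue_pct)
slice_median = median(dialogue_pct)

print(f"mean {slice_mean:.1f}%, median {slice_median:.1f}%")
# mean 19.9%, median 3.0%
```

A handful of dialogue-heavy openings pull the mean far above the median, which is the same skew the full corpus shows.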

What you can do with this

Nothing here is a rule. But if you're staring at your own first page, the data offers a few honest reference points. Openings vary their sentence length hard — if yours is a wall of same-length sentences, that's the one pattern the canon almost never shows. Narration-led openings are the norm, so you're not obligated to open on dialogue; but if you do, you're in good, if smaller, company. And the 'short killer first line' is a real technique used by a memorable few, not a quota you have to hit.

The point of measuring isn't to find the formula. It's to replace 'an editor once told me' with something you can check. You can run this exact analysis on your own opening — same code, same metrics — in the free Pacing Analyzer, and see where your first 500 words land against this corpus.
