Polyphonic Piano Transcription Explained

A single sung melody is one note at a time, and software has been able to follow that for years. A piano is a different animal. It plays five, six, eight notes at once, with one hand holding a chord while the other moves a line through it, and the whole thing rings together into a single wash of sound. Teaching a computer to pull that apart into separate written notes is the core hard problem of music transcription, and piano is where the field measures itself.

This is a plain-language explainer of how polyphonic piano transcription actually works: what makes simultaneous notes so difficult, how a modern model reads a chord, and why a clean piano recording gives you the best result. No equations, just the ideas that matter.

What Polyphonic Transcription Means

Polyphonic means many voices. Polyphonic transcription is the job of taking audio in which several notes sound at the same time and writing every one of them down. Its opposite is monophonic transcription, a single line of one note at a time, which is far easier because there is never any question of how many notes are present. Our explainer on monophonic versus polyphonic transcription covers that split. Piano music is polyphony at full strength: dense chords, overlapping lines, and a sustain pedal that lets it all bleed together.

Why Simultaneous Notes Are So Hard

A single musical note is not a single frequency. It is a fundamental plus a stack of overtones, the harmonic series, and that stack is what gives the note its color. When you play one note, a system can find the fundamental and read the pitch. When you play several at once, their harmonic stacks land on top of each other, and the trouble begins.
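To make the harmonic stack concrete, here is a minimal Python sketch that lists the overtones of a note. It assumes equal temperament with A4 tuned to 440 Hz and an ideal harmonic series, where each overtone is an exact whole-number multiple of the fundamental. Real piano strings are slightly inharmonic, so their upper partials run a little sharp of these values. The function names are illustrative and not part of any library.

```python
def note_frequency(midi_note: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered fundamental frequency of a MIDI note (A4 = MIDI 69)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


def harmonic_series(midi_note: int, count: int = 8) -> list[float]:
    """Idealized harmonic stack: the fundamental times 1, 2, 3, ...

    Assumption: perfectly harmonic overtones. Real piano partials drift
    slightly sharp of these values because of string stiffness.
    """
    f0 = note_frequency(midi_note)
    return [k * f0 for k in range(1, count + 1)]


# A4 and its first three overtones.
print(harmonic_series(69, 4))  # [440.0, 880.0, 1320.0, 1760.0]
```

Playing a single A4 puts energy at all of these frequencies at once. A chord stacks several such lists, and that is where the overlap begins.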

Shared harmonics. Notes a fifth or an octave apart share many overtones, so the evidence for one note looks like the evidence for another.
The octave problem. Every harmonic of a note lines up with a harmonic of the note an octave below it, so the upper note can hide almost entirely inside the lower one. That is why phantom or misplaced octaves are the classic polyphonic error.
Counting the notes. Before naming pitches, the system has to decide how many notes are even sounding, and a soft inner voice can hide under a loud outer one.
Onsets in a blur. Working out exactly when each note starts is hard when several begin near the same instant and the pedal lets earlier notes ring through.
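The shared-harmonic and octave problems can be checked with a little arithmetic. The sketch below uses the same idealized assumptions: equal temperament, A4 = 440 Hz, and perfectly harmonic overtones. It counts how many of the upper note's first ten harmonics fall within an assumed 10-cent tolerance of some harmonic of the lower note. The tolerance and the function names are hypothetical choices made for illustration.

```python
import math


def note_frequency(midi_note: int) -> float:
    # Equal temperament, A4 (MIDI 69) = 440 Hz.
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def shared_harmonic_fraction(upper: int, lower: int, count: int = 10,
                             tolerance_cents: float = 10.0) -> float:
    """Fraction of the upper note's first `count` harmonics that land within
    `tolerance_cents` of some harmonic of the lower note (ideal harmonics)."""
    f_upper, f_lower = note_frequency(upper), note_frequency(lower)
    shared = 0
    for k in range(1, count + 1):
        # Where this partial sits, measured in multiples of the lower fundamental.
        ratio = k * f_upper / f_lower
        nearest = max(1, round(ratio))  # closest harmonic number of the lower note
        cents_off = abs(1200 * math.log2(ratio / nearest))
        if cents_off <= tolerance_cents:
            shared += 1
    return shared / count


print(shared_harmonic_fraction(72, 60))  # C5 over C4, an octave: 1.0
print(shared_harmonic_fraction(67, 60))  # G4 over C4, a fifth: 0.5
```

Every harmonic of the upper C lands exactly on an even-numbered harmonic of the lower C, so the octave scores 1.0. For the fifth, every second harmonic of G4 sits about 2 cents from a harmonic of C4, because the equal-tempered fifth is slightly narrower than a pure 3:2. So half of G4's harmonics are shared. A transcription system has to decide from evidence like this whether the upper note was actually played.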

How AI Reads a Chord

Modern systems treat this as a learning problem rather than a hand-built rulebook. A neural network is trained on a large amount of audio paired with the true notes, so it learns the actual timbre of a piano: how its harmonics behave, how loud and soft notes differ, how a struck note decays. To read a recording, the model slices the audio into many short frames and, for each frame, estimates which pitches are active and where notes begin. Those per-frame judgments are then stitched into discrete notes, each with a start, a pitch, and a duration, and laid out as notation. Because the model has heard so much real piano, it can use musical context (what came before, which notes tend to go together) to resolve the ambiguities that defeat a naive frequency analysis. It is closer to how a trained musician hears a chord than to a calculator reading frequencies.
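The stitching step can be sketched in a few lines. This minimal Python version assumes the model emits two per-frame probability grids, one for "this pitch is sounding" and one for "this pitch starts here." That mirrors the active-pitch and note-start estimates described above and the onset-plus-frame designs used in published piano models. The 0.5 threshold, the 32 ms hop between frames, the re-strike rule, and every name here are illustrative assumptions. None of them describes Songscription's actual model.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int       # MIDI pitch number
    start: float     # onset time in seconds
    duration: float  # length in seconds


def frames_to_notes(frame_probs, onset_probs, hop_seconds=0.032,
                    threshold=0.5, lowest_pitch=21):
    """Turn per-frame model outputs into discrete notes (hypothetical helper).

    frame_probs[t][p]: probability that pitch p is sounding in frame t.
    onset_probs[t][p]: probability that pitch p starts in frame t.
    Pitch index 0 maps to lowest_pitch (MIDI 21 = A0, the lowest piano key).
    """
    n_frames = len(frame_probs)
    n_pitches = len(frame_probs[0]) if n_frames else 0
    notes = []
    for p in range(n_pitches):
        start = None  # frame index where the current note began, if any
        # Loop one frame past the end so any still-sounding note gets closed.
        for t in range(n_frames + 1):
            active = t < n_frames and frame_probs[t][p] >= threshold
            onset = t < n_frames and onset_probs[t][p] >= threshold
            # End the current note when the pitch stops sounding or is re-struck.
            if start is not None and (not active or onset):
                notes.append(Note(lowest_pitch + p, start * hop_seconds,
                                  (t - start) * hop_seconds))
                start = None
            # Begin a note only where an onset is detected on an active frame.
            if start is None and onset and active:
                start = t
    notes.sort(key=lambda n: (n.start, n.pitch))
    return notes
```

In this sketch, a frame that looks active but has no detected onset never starts a note. That is one simple way to keep lingering, pedal-sustained energy from producing extra notes. Real systems use more careful decoding rules.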

Why Piano Is the Benchmark

Piano is the instrument the whole field has pushed hardest on, and for good reason. It is the densest common polyphony, its notes have a clear, consistent onset and a predictable decay, and there is a lot of data to learn from. That combination has made piano the most mature transcription target, and it is Songscription's most developed model too. A clean solo piano recording produces a result close to what a skilled human transcriber would write, including inner voices that are hard to pick out by ear. Our guides to transcribing piano music with AI and transcribing jazz piano chords show that maturity on real music.

What Affects Accuracy, and How to Help

The recording is the biggest lever. A clean, isolated, fairly dry solo piano gives the model the clearest evidence; heavy reverb and a heavy sustain pedal blur harmonies together and cause most of the errors you will see. If you played the piece on a digital piano, capturing the MIDI directly gives perfect note data and skips the audio problem entirely. After transcribing, the piano roll is the best place to check the dense spots. Each note is a separate block there, so an octave error or an extra note jumps out and is easy to drag into place. You can see that grid in our guide to turning audio into a piano roll, and our explainer on why AI transcription accuracy varies covers the rest of the factors. For the broader question of several different instruments at once, see our article "Can AI transcribe multiple instruments at once?" To try it on your own playing, start with piano transcription.

Frequently Asked Questions

What is polyphonic piano transcription?

Polyphonic piano transcription is the task of taking a recording in which many notes sound at the same time, the chords and overlapping lines a piano plays, and writing all of them down as notation. The polyphonic part means several pitches at once, as opposed to monophonic, where only one note sounds at a time. It is harder than transcribing a single melody because the notes share the same moment in time and their harmonics overlap, so the system has to decide how many notes are present and which pitches they are.

Why is transcribing chords harder than a single melody?

A single melody is one pitch at a time, so the system only has to track one fundamental frequency as it moves. A chord stacks several notes whose overtones overlap and reinforce each other, so the evidence for one note can look like the evidence for another. The system has to count how many notes are sounding and untangle which harmonic belongs to which pitch, all at the same instant. That counting-and-untangling under overlap is what makes polyphony the hard part of transcription.

What causes octave errors in transcription?

Octave errors happen because a note and the note an octave above it share so many harmonics. Every harmonic of the upper C lands on an even-numbered harmonic of the C below it, so a system can place a note an octave too high or too low, or add a phantom octave that was not played. Models reduce these errors by learning the timbre of the instrument and the musical context, but octaves remain one of the most common things to check and correct in a polyphonic transcription.

How can I get the most accurate polyphonic transcription?

Give the model the cleanest, most isolated piano recording you can. A clear solo piano take, with little reverb and no competing instruments, lets the model read the inner voices accurately. Heavy sustain pedal blurs harmonies together and is the most common source of errors, so a drier recording helps. If you played the piece on a digital piano, capturing the MIDI directly gives perfect note data and sidesteps the audio problem entirely. After transcribing, scan dense chords in the piano roll, where octave and extra-note errors are easy to spot and fix.
