It is one of the first things people ask about AI transcription: can it take a full song, with drums and bass and guitar and vocals all going at once, and write out every part? The honest answer is more interesting than a yes or no, because the question is actually two questions wearing one coat, and they have very different answers.
Short version: reading many notes at once from a single instrument is something AI does well, especially on piano. Pulling many different instruments apart from one mixed recording into separate parts is the hard frontier, and the practical way to handle it today is to separate first and transcribe each part. Here is what is really going on, and how to get readable notation from a multi-instrument recording right now.
Two Different Questions in One
When people say "multiple instruments at once," they usually mean one of two things, and the difference is everything.
- Many notes at the same time from one instrument, like a thick piano chord. This is polyphonic transcription, and it is a problem AI has gotten very good at.
- Many different instruments at the same time in one mixed recording, like a full band. This requires un-mixing the recording into separate sources before transcribing, which is genuinely hard.
Conflating the two is what makes the topic confusing. A tool can be excellent at the first and still find the second difficult, because they are not the same skill. Our explainer on monophonic versus polyphonic transcription covers the first axis in detail.
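One way to see that these are two separate axes is to count them separately. The sketch below is purely illustrative: the `Note` record and both helper functions are hypothetical names invented for this example, not part of any transcription tool. It measures how many notes overlap within one instrument (polyphony) and, independently, how many instruments are present.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Note(NamedTuple):
    """Hypothetical note record, used only for this illustration."""
    instrument: str
    pitch: str
    onset: float   # seconds
    offset: float  # seconds


def max_polyphony(notes: List[Note]) -> int:
    """Largest number of notes sounding at the same moment."""
    events = []
    for n in notes:
        events.append((n.onset, 1))    # a note starts
        events.append((n.offset, -1))  # a note ends
    # Sorting places an ending (-1) before a start (+1) at the same time,
    # so back-to-back notes are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def polyphony_per_instrument(notes: List[Note]) -> Dict[str, int]:
    """Group notes by instrument, then measure polyphony within each group."""
    by_instrument = defaultdict(list)
    for n in notes:
        by_instrument[n.instrument].append(n)
    return {name: max_polyphony(group) for name, group in by_instrument.items()}


# One instrument, many notes at once: a five-note piano chord.
piano_chord = [Note("piano", p, 0.0, 2.0) for p in ("C4", "E4", "G4", "B4", "D5")]

# Many instruments, one note each: a small band.
band = [
    Note("bass", "C2", 0.0, 2.0),
    Note("guitar", "E3", 0.0, 2.0),
    Note("vocals", "G4", 0.0, 2.0),
]

print(polyphony_per_instrument(piano_chord))  # {'piano': 5}
print(polyphony_per_instrument(band))         # {'bass': 1, 'guitar': 1, 'vocals': 1}
```

The piano chord is highly polyphonic but has one source. The band has three sources, each monophonic. A real full-band recording is usually both at once, which is why it needs both skills.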
Many Notes at Once: Polyphony
Reading a stack of notes that sound together is polyphonic transcription, and it is where modern AI shines. A piano playing a five-note chord with an inner line moving underneath is exactly the kind of dense, simultaneous music that used to take a trained ear ages to pick apart, and a good model reads it cleanly. So if your "multiple" means the rich, many-voiced playing of a single piano, the answer is a confident yes. Our deep dive on polyphonic piano transcription covers how that works and why piano is the strongest case for it.
Many Instruments at Once: The Harder Problem
A full-band recording mixes several instruments into one stream of sound, and their frequencies overlap and mask each other. To write each instrument on its own staff, a system first has to separate the mix back into its parts, then transcribe each one, then decide how to lay them out. Each of those steps adds error, and the masking means some detail is genuinely lost in the mix. This is why no tool turns an arbitrary full song into a flawless, fully separated orchestral score at the press of a button. It is an active research frontier, not a solved problem, and any honest tool will tell you so. What you can reliably get is covered next.
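To make the overlap concrete, here is a rough sketch that lists where the harmonics of two notes land on nearly the same frequency. It assumes idealized harmonics at exact integer multiples of the fundamental, which real instruments only approximate, and the 20-cent tolerance is an arbitrary choice for illustration. The function names are made up for this example.

```python
import math
from typing import List, Tuple


def harmonics(fundamental_hz: float, count: int) -> List[float]:
    """Idealized harmonic series: integer multiples of the fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]


def shared_partials(
    low_hz: float,
    high_hz: float,
    count: int = 10,
    tolerance_cents: float = 20.0,
) -> List[Tuple[float, float]]:
    """Pairs of harmonics (one per note) that fall within the tolerance."""
    pairs = []
    for a in harmonics(low_hz, count):
        for b in harmonics(high_hz, count):
            # Distance between the two frequencies in cents (1200 per octave).
            cents = abs(1200.0 * math.log2(a / b))
            if cents <= tolerance_cents:
                pairs.append((a, b))
    return pairs


# A bass note at 110 Hz and a guitar note an octave above it at 220 Hz.
octave = shared_partials(110.0, 220.0)
print(len(octave))  # 5

# The same bass note with a note a just fifth above it at 165 Hz.
fifth = shared_partials(110.0, 165.0)
print(len(fifth))   # 3
```

Wherever two partials coincide, the mixed recording holds a single peak of energy with no label saying how much came from each instrument. A separation system has to estimate that split, which is one source of the error described above.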
The Practical Workflow
There are two dependable ways to get readable notation from a multi-instrument recording today, and you pick by what you actually want.
- Want separate parts? Split the song into stems first, then transcribe each instrument on its own with the model that matches it. This gives the cleanest result per part, because each model gets an isolated source.
- Want one playable score? Transcribe the recording to a single condensed part, usually a piano reduction that gathers the melody, harmony, and bass onto a grand staff. This route is faster, and it is often all a player needs.
Our guide to transcribing multi-track audio to sheet music walks through the stem-by-stem route, and transcribing a full band recording covers the condensed-score route. Either way, you can start from any recording using audio to sheet music.
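The two routes can be sketched as a pair of small functions. This is an outline of the workflow's shape only, not Songscription's API: `separate`, `models`, and `reduction_model` are hypothetical callables you would supply yourself, and the toy stand-ins at the bottom exist only so the sketch runs.

```python
from typing import Callable, Dict, List

Audio = List[float]  # stand-in for raw audio samples
Part = List[str]     # stand-in for one transcribed part


def transcribe_separate_parts(
    mix: Audio,
    separate: Callable[[Audio], Dict[str, Audio]],
    models: Dict[str, Callable[[Audio], Part]],
) -> Dict[str, Part]:
    """Route 1: split the mix into stems, then run each stem through its matching model."""
    stems = separate(mix)
    parts: Dict[str, Part] = {}
    for instrument, stem in stems.items():
        model = models.get(instrument)
        if model is not None:  # skip stems that have no matching model
            parts[instrument] = model(stem)
    return parts


def transcribe_condensed(
    mix: Audio,
    reduction_model: Callable[[Audio], Part],
) -> Part:
    """Route 2: transcribe the whole mix to a single condensed part."""
    return reduction_model(mix)


# Toy stand-ins (assumptions for illustration; they do no real audio work).
def fake_separate(mix: Audio) -> Dict[str, Audio]:
    return {"bass": mix, "drums": mix, "kazoo": mix}


fake_models = {
    "bass": lambda stem: ["E1", "A1"],
    "drums": lambda stem: ["kick", "snare"],
}

parts = transcribe_separate_parts([0.0] * 4, fake_separate, fake_models)
print(sorted(parts))  # ['bass', 'drums']

reduction = transcribe_condensed([0.0] * 4, lambda mix: ["C major chord"])
print(reduction)      # ['C major chord']
```

Passing the separator and the models in as arguments keeps the routing logic independent of any particular tool, so the same outline holds whichever stem splitter or transcriber you use.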
Which Instruments Songscription Reads
Songscription transcribes with per-instrument models. Piano is the most mature and handles dense polyphony best. Newer models cover guitar, bass, and drums, along with melodic lines like trumpet, saxophone, and violin, while vocals remain experimental. The throughline is isolation: the cleaner and more separated an instrument is in the audio, the more accurately its model reads it, which is exactly why splitting a mix into stems before transcribing beats aiming one model at the whole band. For a single isolated part, our guides to transcribing a bass line and drums show the per-instrument approach in action.
Frequently Asked Questions
Can AI transcribe multiple instruments at once?
It depends what you mean. AI handles many notes at once from one instrument very well, which is what polyphonic transcription does on a piano. Pulling several different instruments apart from one mixed recording into separate written parts is much harder, and it is the frontier of the field. The practical answer today is to separate the recording into stems and transcribe each instrument, or transcribe the whole thing to a single condensed part like a piano reduction. Both give you readable notation; the second is faster, the first keeps the parts distinct.
What is the difference between polyphonic transcription and separating instruments?
Polyphonic transcription is reading several notes sounding at the same time from one source, like a piano chord, and writing them all down. Source separation is splitting a mixed recording into its individual instruments (the vocal, the bass, the guitar) before any transcription happens. They are different problems: one is about hearing a stack of notes, the other is about un-mixing a recording. Multi-instrument transcription usually needs both, which is why it is harder than transcribing a single clean instrument.
How do I transcribe a song with several instruments?
Two practical routes. If you want separate parts, split the song into stems first, then transcribe each instrument on its own with the matching model, which gives the cleanest result per part. If you want one playable score, transcribe the recording to a condensed arrangement, typically a piano reduction that gathers the melody, harmony, and bass onto a grand staff. Choose by your goal: distinct parts for a score, or a single part to play.
Which instruments can Songscription transcribe?
Piano is the most mature model and handles dense, polyphonic playing best. Newer models cover guitar, bass, and drums, along with melodic lines such as trumpet, saxophone, and violin, with vocals still experimental. The cleaner and more isolated an instrument is in the recording, the better any of these models reads it, which is why separating a mix into stems before transcribing each part produces a more accurate result than pointing one model at the full band.
