What Is Imaging In Audio? (and thoughts on why most headphones and IEMs suck at it)

What follows is a reflection of how I understand and use the term "imaging" in my writing. But as I will make clear quickly, it is not intended to be an authoritative post on the subject, so feel free to agree or disagree with my thoughts. Important terms are in bold.


Imaging. This is a piece of audio jargon that's frustrated me to no end, and it's not hard to see why. Like many terms slung around in the audio world, "imaging" has as many definitions as there are opinions, and one won't find a single concrete definition of what it constitutes. If we look up the Wikipedia definition (probably the most official one we're going to find), we see that imaging "refers to the aspect of sound recording and reproduction of stereophonic sound concerning the perceived spatial locations of the sound source(s), both laterally and in depth". Most listeners treat soundstage - a more straightforward term, the distance a transducer is able to create between the listener and the instruments in a track - as a distinct characteristic of sound, and it's easy to understand why. But give it a little more thought. By definition, the perception of soundstage is crafted by positioning; therefore, soundstage is a derivative of imaging.

The Measurable

Now that we've done some fancy semantics and established that imaging is closer to a catch-all term, let's bring that to the realm of the measurable. More specifically, let's talk about how frequency response interacts with our perception of imaging. There are a number of "tricks" that come to mind:
  • Dipping the upper-midrange frequencies, say ~3-4kHz, will often increase the perception of soundstage depth by pushing vocals further back in the mix.
  • Similarly, easing back the pinna compensation (the part of an IEM or headphone's frequency response that is traditionally elevated to account for natural bodily resonance peaks) can lead to a more diffused, distant presentation.
  • Dipping and elevating certain parts of the treble response can change how reverb is perceived and, by extension, the way the stage is imaged.
Heck, to the point of the above, treble in general is incredibly important to imaging! Consider what we hear every day. In real life, there are “reverb trails, room ambiance, and positional precision. When extension suffers, our ears can tell something is off. We hear high frequencies in room ambiance that we may not realize, and all instruments have interactions at that region and others as well” (quoted from Head-Fi user luisdent). It follows that a transducer whose treble extends audibly has a much, much better chance of creating the perception of soundstage. Indeed, I have yet to hear a transducer with poor treble extension that has good imaging, although the opposite does not always hold true.
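
To make the first trick a little more concrete, here is a minimal Python sketch of an upper-midrange dip built as a single peaking EQ filter, using the widely published Audio EQ Cookbook biquad form. The 48 kHz sample rate, 3.5 kHz center, -3 dB depth, and Q of 1.4 are hypothetical values picked purely for illustration (they are not a measured target or a recommendation), and the function names are made up for this example.

```python
import cmath
import math


def peaking_eq_coeffs(fs, f0, gain_db, q):
    """Return (b, a) biquad coefficients for a peaking EQ, normalized so a[0] == 1.

    Follows the Audio EQ Cookbook peaking-EQ form. A negative gain_db
    produces a dip centered on f0; a positive one produces a boost.
    """
    a_lin = 10.0 ** (gain_db / 40.0)      # square root of the linear gain
    w0 = 2.0 * math.pi * f0 / fs          # center frequency in radians/sample
    alpha = math.sin(w0) / (2.0 * q)      # bandwidth term
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_db_at(b, a, fs, freq):
    """Magnitude response of the biquad (in dB) at a single frequency."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    den = a[0] + a[1] * z_inv + a[2] * z_inv * z_inv
    return 20.0 * math.log10(abs(num / den))


# Hypothetical example values: a 3 dB dip at 3.5 kHz.
FS = 48000
b, a = peaking_eq_coeffs(FS, 3500.0, -3.0, 1.4)
print(round(gain_db_at(b, a, FS, 3500.0), 2))  # -3.0 at the center of the dip
```

The filter leaves the bass essentially untouched and only pulls down the region around the chosen center, which is the kind of localized change the first bullet describes. Whether a given dip actually reads as added depth is still a matter of listening, not of the math.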

The Intangible

That last comment is a good segue into the realm of the intangible. Why does the opposite not hold true; that is, why does a transducer with good treble extension not always have good imaging? It might prove fruitful to circle back to soundstage and some new terms such as positional accuracy, layering, and "holographic" imaging. Note that what follows below does not answer the question; rather, it is an interim placeholder for describing characteristics of sound that we have not been able to measure (or interpret) yet.
  • Positional accuracy is a term most readers are probably more familiar with because it aligns closely with the colloquial definition of imaging. This is the degree to which a transducer is able to localize instruments on the soundstage and, in turn, the degree to which a listener can pinpoint them.
  • Layering, often used interchangeably with the term separation, is the sense of physicality and space between instruments on the stage. Furthermore, it is indicative of the extent to which a transducer is able to give individual instruments a well-defined spot on the stage. You can see this overlaps with positional accuracy.
  • Holographic imaging is a term that's thrown around far too generously in my opinion. This is the perception that instruments - usually percussive ones - "float" on the soundstage. By extension, this plays into soundstage height and the way a transducer shapes the walls of the stage.

Transducers and Imaging

Finally, finally, let's bring imaging to the realm of transducers: IEMs, headphones, and speakers. There is an unspoken hierarchy in the audio world when it comes to imaging. IEMs are at the bottom of the barrel, and it's not hard to see why. While IEMs can have good positional accuracy and layering, their ability to shape the sense of space around a listener is decidedly limited to tricks in frequency response, acoustic chambers, and other "gimmicks". Even the best IEMs have soundstage height, width, and layering that pale compared to mid-tier headphones. But headphones also fall short when it comes to one aspect of imaging: soundstage depth. In fact, I'd say headphones tend to get the short end of the stick here even more than IEMs because they don't tend to use the aforementioned tricks!

The lauded king of soundstage, the HD800S, has zero soundstage depth to my ears. 

The problem with imaging in IEMs and headphones is that the sound is coming from two separate channels, each piped directly into one ear; it's not at all like what you'd hear in real life. To top it off, these miniature speakers and drivers are being placed very close to your ears. Suffice it to say imaging is a psychoacoustic illusion for headphones and IEMs in which your brain is doing the majority of the legwork. Of course, one could argue that perception is reality. But it's not hard to see why, positionally, instruments and vocals that occupy the center image seem to come from inside one's head. For this reason, I always get a chuckle out of reviews where an IEM or headphone supposedly has good soundstage "depth" or has more soundstage depth than it does width. Either a) said reviewer has low standards for soundstage depth, or b) said reviewer has never heard a speaker system. Indeed, when one hears a two-channel system, the perception of depth and that elusive "third speaker" coming from the center of the stage (see phantom center) becomes readily apparent. There's simply no comparison to hearing sonic information in real time, physically in front of you.
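
For readers unfamiliar with how a center image is placed in a two-channel mix in the first place, here is a small Python sketch of a constant-power (sine/cosine) pan law, a common mixing convention. It is an assumption that any given recording was panned this way; the function name and the -1 to +1 position scale are hypothetical choices for this example. It only shows how a source is split between the two channels, not how the ear and brain turn that split into a perceived location.

```python
import math


def constant_power_pan(position):
    """Split a mono source between left and right channels.

    position: -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Returns (left_gain, right_gain) with left**2 + right**2 == 1,
    so the total power stays constant as the source moves.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    theta = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    return math.cos(theta), math.sin(theta)


left, right = constant_power_pan(0.0)
print(round(left, 4), round(right, 4))  # 0.7071 0.7071: equal level in both channels
```

A centered source is simply the same signal at equal level in both channels. Over speakers, both ears hear both channels and the image forms between the speakers; over headphones or IEMs, each ear hears only its own channel, which is consistent with that same signal ending up in the middle of one's head.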


And there you have why I think most headphones and IEMs suck at imaging. Hopefully, this has also lent some further explanation to what is, at least in my opinion, one of the most misused and confusing words in audio. Now, that really points to a more innate problem with the audio hobby - a lack of consistency and set definitions between listeners - but it is an inevitable one that will no doubt persist to the end of time. At least now you know my side of the story on the matter.



  1. It's interesting how changing the frequency response between 1-5kHz changes the way the image is formed in a 2ch system. Sometimes half a dB here and there can make or break an otherwise very good setup. The same goes for tonality.


