An Experiment in English Phonology

We normally think of English as having voiceless stops /p t k/ and voiced stops /b d g/. The voiceless stops have aspirated allophones [p_h t_h k_h] that appear in certain positions, particularly at the beginning of a syllable. In this model of English pronunciation, aspiration is a redundant secondary feature of voiceless stops.

However, on the Conlang Mailing List, And Rosta proposed an alternative model: English actually has aspirated stops /p_h t_h k_h/ and unaspirated stops /b d g/. Voicing would then be a redundant secondary feature of the unaspirated stops; the [t] in "stop" would actually be a /d/ that has lost its voicing. And put forward a number of interesting theoretical arguments for this, but I thought the idea needed to be tested experimentally.

If the standard interpretation of English phonology is correct, English speakers should find voiced and voiceless stops easier to tell apart than aspirated and unaspirated stops. In And's interpretation, they should find aspirated and unaspirated stops easier to tell apart than voiced and voiceless.

Bengali distinguishes between plain (voiceless, unaspirated), aspirated, voiced and voiced aspirated (breathy voiced or murmured) stops. I asked a Bengali speaker to record a sample of 20 words, which you can listen to here, and to transcribe them in CXS. I then asked three volunteers from the Conlang Mailing List, all of them monoglot English speakers, to listen to the recording and transcribe their first impression of what they heard. Here are the results.

Original    Listener 1    Listener 2    Listener 3

For each stop and affricate in the sample, I then recorded which of the four categories it fell into, and how the volunteers identified it. The results are as follows.
                  Heard as
Actual            Plain  Aspirated  Voiced  Voiced aspirated  Other
Voiced aspirated  3      2          0       0                 1

From this we can see that English speakers correctly identify plain stops as voiceless 70% of the time. They almost always identify voiceless stops as plain, whether or not they are aspirated. Aspirated voiceless stops are always identified as voiceless. Voiced stops are almost always correctly identified, and voiced aspirated stops, which are alien to English, are never correctly identified.
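The percentages quoted above are row proportions of the confusion table: for each actual category, the share of all listener judgments that fell into each "Heard as" column. A minimal Python sketch of that arithmetic follows. It uses only the one row that survives, and it assumes the flattened digits "32001" are read as 3, 2, 0, 0, 1 in the column order Plain, Aspirated, Voiced, Voiced aspirated, Other; the helper name `row_proportions` is hypothetical.

```python
from fractions import Fraction

# Column order of the "Heard as" table.
CATEGORIES = ["Plain", "Aspirated", "Voiced", "Voiced aspirated", "Other"]


def row_proportions(counts):
    """Hypothetical helper: map each "Heard as" category to its share of one row."""
    total = sum(counts)
    return {cat: Fraction(n, total) for cat, n in zip(CATEGORIES, counts)}


# Assumption: the surviving row "Voiced aspirated32001" reads as 3, 2, 0, 0, 1.
voiced_aspirated = row_proportions([3, 2, 0, 0, 1])

# Share of judgments that matched the actual category ("correctly identified").
print(voiced_aspirated["Voiced aspirated"])  # 0
print(voiced_aspirated["Plain"])             # 1/2
```

Exact fractions are used so that the proportions in a row sum to exactly 1, with no floating-point rounding.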

These results are more consistent with voicing being the primary feature than aspiration. Sorry, And.


  1. I don't have any argument with your results (interesting study), and I don't mean to sidetrack this discussion, but you seem to know what you are talking about when it comes to such topics, so...

    Not being well versed in linguistics myself, I would like to ask your opinion on something not exactly related to the OP. In my conlang, I have consonants paired as "mouthy"/"throaty" - mostly this comes down to voicing, but not always. I have one drifter, though, that I have trouble placing, viz. the dental nonsibilant fricative.

    It is my feeling that English speakers, at least, are on the whole not very careful about voicing when they articulate the voiced "th", and that there is a common-sense reason for this: the sound takes deep voicing and significant airflow, so, it seems to me anyway, the air often comes out before the voicing starts.

    This seems to reinforce, in a small way, the fact that there was no good "place" for a voiced/voiceless pair for this sound in my phoneme inventory. Instead, "it" (whatever it may be) seems to me to pair better with /h/.

    So, what I'm getting at is this:

    * I think I need some sort of voiced version

    * I want it to be mostly "mouthy" if possible

    * It seems not too difficult to vibrate the tongue in the mouth instead of only vibrating the throat to achieve this "mouthiness", but I have no idea what phonetic symbol this would be, or whether it would be a practical sound to use as the main "th" sound in a simple human language

    Any hints? Does any of this make sense?

  2. You know, talking about this out loud, I'm thinking I should go back to my original plan, which was just to use the voiceless "th".

    The reason I wanted to get away from it, if I remember correctly, is that it seemed to take too much "preparation" as an initial sound, but it works so well as a final or connecting sound that it's hard to give up.

    Okay, I'm relinquishing your comments section :). If you happen to have some input on this, I'm still all ears, though. Thank you.

  3. ['anonymous' = And]

    This is interesting, loath though I am to be dragged from my linguistics armchair by the fell legions of empiricism...

    The hypothesis your experiment is based on is that the allophony of English phonological categories will affect subjects' ability to discriminate among phonetic categories, and hence that one can reason backwards from their ability to discriminate among phonetic categories to draw conclusions about the allophony of English phonological categories. I think that hypothesis is in fact pretty plausible, but the evidence it brings to bear on English phonology is pretty indirect. A more direct sort of evidence would have been gained if you'd recorded the Bengali speaker reading Bengali words that have English-like phonotactics and then presented them to the subjects as 'English nonsense words', and asked the subjects to spell the nonsense words.

    Aspiration is reliably encountered in the realization of /p,t,k/ only in (roughly speaking) the onsets of stressed syllables; elsewhere there are other phonetic cues, such as the duration of the preceding vowel. So even within your current experimental methodology, I think you should confine your attention to the onsets of stressed syllables. In your results we see initial kh heard as k and kh, ph heard as p, p heard as b by all, b heard as b, and t heard as d(`) by two and as t by one. There's also an initial k heard as d by two and as t by one. So of the 9 judgments on the 3 tokens of word-initial voiceless unaspirated stops, 7 were heard as voiced, i.e. as 'my theory' would predict.
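    The 7/9 figure counts listener judgments rather than tokens: 3 tokens, each heard by 3 listeners, give 9 judgments. A small Python sketch of the tally follows. The responses are taken from the comment above, but which listener gave which response is not stated, so the order within each list is arbitrary; treating the CXS retroflex "d`" as voiced alongside "d" is an assumption made to cover the "d(`)" responses.

```python
# Responses to the three word-initial voiceless unaspirated tokens, as reported
# above. Order within each list is arbitrary (listeners are not identified).
responses = {
    "p": ["b", "b", "b"],   # heard as b by all three
    "t": ["d", "d", "t"],   # heard as d(`) by two, as t by one
    "k": ["d", "d", "t"],   # heard as d by two, as t by one
}

# Symbols counted as voiced stops; "d`" is the retroflex d.
VOICED = {"b", "d", "d`", "g"}

heard_voiced = sum(r in VOICED for heard in responses.values() for r in heard)
total = sum(len(heard) for heard in responses.values())
print(f"{heard_voiced}/{total}")  # 7/9
```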

  4. On the other hand, why would a phonemic feature be more restricted in its distribution than a merely phonetic one?