While reading a book last summer, I became curious about how frequently the different letters appear in a text.
So I decided to count them… and not only that: I also wanted to compare the results between Spanish and English.
Both texts come from this blog: the English one is yesterday’s post, and the Spanish one is a post about housing from four months ago.
Here is the graphic with the results:
These are just two posts from my blog, so they are not a statistically significant sample and do not allow any definite conclusions. Still, this is what I found:
- In English the most common letter is “e” (13.5%), while in Spanish it is “a” (12.5%), closely followed by “e” (12.2%).
- In Spanish, vowels account for 46.3% of the letters, while in English they account for 39.9%; English uses more consonants.
- In English the most common consonant is “t” (9.8%… from “the”?), followed by “s”, while in Spanish it is “s” (7.4%… “ser”?), followed by “n”.
- The most striking differences I found were:
  - In Spanish we use “a” over 50% more often than in English.
  - Consonants we use far more in Spanish than in English include “d” (“de”?), “l” (“la”, “el”?) and “q” (“que”?).
  - Consonants used more in English than in Spanish include “h” (5 times more!… “the”?), “t” and “w” (practically unused in Spanish).
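The post doesn’t describe how the letters were counted. Below is a minimal Python sketch of one way to get percentages like the ones above; its choices are assumptions, not the method behind the chart: accented vowels are folded into plain ones (á → a, ü → u), ñ is kept as its own letter, and only a, e, i, o, u count as vowels.

```python
import unicodedata
from collections import Counter

VOWELS = set("aeiou")  # assumption: y and w are treated as consonants

def normalize(text: str) -> str:
    """Lowercase and strip accents (á -> a, ü -> u), but keep ñ as its own letter."""
    out = []
    for ch in unicodedata.normalize("NFC", text).lower():
        if ch == "ñ":
            out.append(ch)
        else:
            # Decompose (é -> e + combining accent) and drop the combining marks.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

def letter_frequencies(text: str) -> dict[str, float]:
    """Map each letter to its share of all letters in the text, in percent, most common first."""
    letters = [ch for ch in normalize(text) if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    return {letter: 100 * n / total for letter, n in Counter(letters).most_common()}

def vowel_share(freqs: dict[str, float]) -> float:
    """Total percentage of the letters that are vowels."""
    return sum(p for letter, p in freqs.items() if letter in VOWELS)

freqs = letter_frequencies("Más vale tarde que nunca")
print(round(freqs["a"], 1), round(vowel_share(freqs), 1))  # 20.0 45.0
```

Running the two blog posts through `letter_frequencies` and comparing the dictionaries side by side would give tables like the chart; how accents and ñ are handled will shift the Spanish numbers slightly.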
12 responses to “Most common letters in English and Spanish are…”
Vowels are supposed to be the links, and consonants need vowels nearby. There are far fewer vowels than consonants, so one would expect to see the former much higher up. It is interesting, then, to see that s, for example, is more common than i…
When I finished my thesis I did something similar, but with words instead of letters:
One of the most interesting things was looking at how unique the words were, that is, how many times each word is repeated. In my thesis, only 3% of the words appeared just once, which indicates that English has many obligatory constructions that repeat content.
It would be interesting to see how this compares across languages.
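The comment doesn’t say whether the 3% is measured over distinct words or over all words in the text, so this minimal Python sketch computes both. It assumes a “word” is a run of letters (apostrophes split words), which is a simplification.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # runs of Unicode letters; apostrophes split words

def singleton_shares(text: str) -> tuple[float, float]:
    """Return (share of distinct words seen once, share of all words that are such singletons)."""
    words = WORD.findall(text.lower())
    if not words:
        return 0.0, 0.0
    counts = Counter(words)
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts), singletons / len(words)

by_type, by_token = singleton_shares("the cat saw the dog and the dog ran")
print(f"{by_type:.0%} of distinct words, {by_token:.0%} of all words")
# 67% of distinct words, 44% of all words
```

Running the same function on comparable English and Spanish texts would give the cross-language comparison the comment asks about.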
I just realized I made the comment in Spanish… Sorry!
I just said that it is interesting that vowels are supposed to be the glue that puts consonants together, since consonants usually don’t like to be next to each other.
Therefore one would expect vowels to rank way higher than consonants. Yet it turns out that s is more common than i…
Now that I’ve read Luca’s response, it makes sense to see the h so high up in English.
In Spanish, n and s seem to be the consonants ranked above i.
In English, s, h and n seem to be the ones ranked above i.
Bonus: ‘aseo’ uses the most common letters in Spanish. In English, it would be ‘ateo’.
Bruno, I very much enjoyed your post about the PhD! You didn’t leave anything unmeasured.
Hahaha, very insightful, Bruno. I wonder if it reflects the respective national characters 😉
In English there are more examples of two consonants together which are really just one sound, like ‘ch’, ‘th’, ‘sh’, ‘ng’. Maybe the results for English and Spanish would be more similar if you counted these as just one consonant?
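A minimal Python sketch of that idea: words are split into units, and any of the listed digraphs counts as one unit instead of two letters. The digraph list is only the four from the comment, and the greedy left-to-right matching is an assumption.

```python
from collections import Counter

# Digraphs from the comment; an illustrative list, not a complete one.
DIGRAPHS = {"ch", "th", "sh", "ng"}

def units(word: str) -> list[str]:
    """Split a word left to right, merging any listed digraph into a single unit."""
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            out.append(word[i:i + 2])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

def unit_frequencies(text: str) -> dict[str, float]:
    """Share of each unit (letter or digraph) among all units, in percent."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        counts.update(units(letters))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {unit: 100 * n / total for unit, n in counts.most_common()}

print(units("thing"))  # ['th', 'i', 'ng']
print(unit_frequencies("the thing")["th"])  # 40.0
```

Greedy matching also merges pairs that are not one sound (the ‘th’ in “hothouse”, the ‘ng’ in “engage”), so the counts are only an approximation. Fewer consonant units also means the vowel share rises, which is the direction the comment predicts.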
I think the lack of a nice, round, sunny and satisfying ‘a’ sound in English is what will make Spanish win in the end ;).
Back in 1903, Arthur Conan Doyle made Sherlock Holmes state that the E is the most common letter in English:
Thanks for the link, David.
Those frequencies are more accurate, as they were measured over 40,000 words (the texts I used had between 500 and 700 words). Nevertheless, the overall spectrum of frequencies is not that different.
Interesting analysis; I am even surprised at how high the correlation is. If we considered the correlation between French and English, I am sure it would be even higher. Something like 60% of English words come from French. This dates back to the last time Britain was invaded, in 1066, by Guillaume (William the Conqueror, in the Norman Conquest), and until the end of the 14th century Norman French was the language of the ruling classes in England.
However, English is fundamentally based on old Germanic languages and, as a result, only 2 of the 100 most-used English words derive from French (or Latin). It is the Latin link that accounts for the similarity between English and Spanish reflected in your analysis.
Interestingly, despite this high percentage of Latin-derived words in English, only 2 of the 100 most commonly used words come from French (via Latin or Norman French): “number”, at around position 60 in the list, and “people” (or “person” in some lists), somewhere between 70 and 100. The list does differ depending on the source database, e.g. whether it is based on US English.
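The post doesn’t actually compute a correlation; the similarity is read off the chart. If one wanted to put a number on it, one common choice (an assumption here, not something the post or comment specifies) is Pearson’s r over the two letter-frequency tables. A minimal Python sketch:

```python
import math

def pearson(freq_a: dict[str, float], freq_b: dict[str, float]) -> float:
    """Pearson's r between two letter-frequency tables; a letter missing from one table counts as 0."""
    letters = sorted(set(freq_a) | set(freq_b))
    xs = [freq_a.get(letter, 0.0) for letter in letters]
    ys = [freq_b.get(letter, 0.0) for letter in letters]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a table is constant")
    return cov / (spread_x * spread_y)

# Toy numbers, not real letter frequencies: a perfectly proportional pair.
print(round(pearson({"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 4, "c": 6}), 3))  # 1.0
```

Fed with the English and Spanish tables (or an English and a French one), this gives a single figure to compare. On Python 3.10+, `statistics.correlation` computes the same coefficient from two equal-length lists.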
Thanks for the interesting historical point, Geoff. As you said, it is indeed surprising that only 2 words are among the 100 most used when 60% of the words come from French.
About the similarity in the letters… as my partner said some time ago, on holiday in Oslo, when I caught her reading a Norwegian newspaper: “it’s all Indo-European… it’s not that different” (she’s native/fluent/acquainted with 5 languages, though not Norwegian, yet).
Pingback: Writing without using “e” « The Blog by Javier
Very interesting points. I was wondering: is there any information about how often each letter appears as the first letter of a word in Spanish?
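Such figures can be computed from any Spanish text with a small change to the letter count: look only at the first letter of each word. A minimal Python sketch, assuming a word is a run of letters and that accented initials (as in “él”) are counted separately from their plain letters:

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # runs of Unicode letters

def first_letter_frequencies(text: str) -> dict[str, float]:
    """Share of words starting with each letter, in percent, most common first."""
    words = WORD.findall(text.lower())
    if not words:
        return {}
    counts = Counter(word[0] for word in words)
    return {letter: 100 * n / len(words) for letter, n in counts.most_common()}

print(first_letter_frequencies("la casa de la calle"))
# {'l': 40.0, 'c': 40.0, 'd': 20.0}
```

As with the letter counts in the post, a couple of short texts won’t give stable numbers; a large corpus would be needed for anything conclusive.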