Vol. 56 No. 3 (2008)
Research Article

Name Clustering on the Basis of Parental Preferences

Published 2008-09-01



Parents do not choose first names for their children at random. Using two large datasets, one for the UK and one for the Netherlands, covering the names of children born in the same family over a period of two decades, this paper seeks to identify clusters of names inferred entirely from common parental naming preferences. These name groups can be considered coherent sets of names that have a high probability of being found in the same family. Operational measures for the statistical association between names and clusters are developed, as well as a two-stage clustering technique. The name groups are subsequently merged into a limited set of grand clusters. The results show that clusters emerge along cultural, linguistic, or ethnic parental backgrounds, but also along characteristics inherent in names, such as clusters of names after flowers and gems for girls, abbreviated names for boys, or names ending in -y or -ie.
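The abstract does not state which association measure the paper uses. As an illustration only, the general idea of names being found together in a family more often than chance would predict can be sketched with a simple observed-to-expected ratio (often called lift). The function name `name_pair_lift`, the toy families, and the choice of lift itself are assumptions made for this sketch, not the paper's method.

```python
from collections import Counter
from itertools import combinations


def name_pair_lift(families):
    """Return {(name_a, name_b): lift} for every pair of names sharing a family.

    Lift is the observed number of families containing both names divided by
    the number expected if names were chosen independently. This is an
    illustrative stand-in, not the measure defined in the paper.
    """
    n_families = len(families)
    name_counts = Counter()   # families in which each name occurs
    pair_counts = Counter()   # families in which both names of a pair occur
    for names in families:
        unique = sorted(set(names))  # count each name once per family
        name_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    lift = {}
    for (a, b), together in pair_counts.items():
        expected = name_counts[a] * name_counts[b] / n_families
        lift[(a, b)] = together / expected
    return lift


# Hypothetical toy data: each inner list holds the children's names of one family.
families = [
    ["Rose", "Lily"],
    ["Rose", "Lily", "Tom"],
    ["Tom", "Ben"],
    ["Ben", "Tom"],
]
lift = name_pair_lift(families)
for pair, value in sorted(lift.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

Pairs with a ratio well above 1 are candidates for the same name group. A clustering step would then operate on these pairwise values, although the specific two-stage procedure is described in the paper itself rather than in this abstract.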

