## The Annals of Applied Probability

### The total path length of split trees

#### Abstract

We consider the model of random trees introduced by Devroye [SIAM J. Comput. 28 (1999) 409–432]. The model encompasses many important randomized algorithms and data structures. The pieces of data (items) are stored in a randomized fashion in the nodes of a tree. The total path length (the sum of the depths of the items) is a natural measure of the efficiency of the algorithm or data structure. Using renewal theory, we prove convergence in distribution of the suitably normalized total path length toward a distribution characterized uniquely by a fixed point equation. Our result covers, in a unified way, many data structures such as binary search trees, $m$-ary search trees, quad trees, median-of-$(2k+1)$ trees, and simplex trees.
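To make the quantity concrete, here is a minimal sketch for the simplest split tree named above, the binary search tree, where each node stores exactly one item. It is not the renewal-theory method of the paper; it only illustrates the definition. The function names are hypothetical, keys are assumed distinct, and the root is assumed to have depth 0. The sketch also evaluates the classical exact mean $2(n+1)H_n - 4n$ of the total path length under a uniformly random insertion order and checks it by brute-force enumeration for small $n$.

```python
import random
from fractions import Fraction
from itertools import permutations


def bst_total_path_length(keys):
    """Insert distinct keys into an unbalanced binary search tree, in the
    given order, and return the total path length: the sum of the depths of
    all keys, with the root at depth 0 (assumed convention)."""
    left, right = {}, {}  # child pointers, stored as key -> child key
    root = None
    total = 0
    for key in keys:
        if root is None:
            root = key
            continue
        node, depth = root, 1
        while True:
            children = left if key < node else right
            if node in children:
                node = children[node]  # descend one level
                depth += 1
            else:
                children[node] = key  # attach the new key as a leaf
                total += depth
                break
    return total


def expected_bst_path_length(n):
    """Exact mean total path length for a uniformly random insertion order
    of n keys: 2(n+1)H_n - 4n, where H_n is the n-th harmonic number."""
    h = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * h - 4 * n


def mean_by_enumeration(n):
    """Average total path length over all n! insertion orders (small n only)."""
    orders = list(permutations(range(n)))
    return Fraction(sum(bst_total_path_length(p) for p in orders), len(orders))


if __name__ == "__main__":
    # Monte Carlo estimate for a larger tree, compared with the exact mean.
    rng = random.Random(2012)
    n, trials = 1000, 100
    mean = sum(bst_total_path_length(rng.sample(range(n), n))
               for _ in range(trials)) / trials
    print(f"simulated mean {mean:.1f}, exact {float(expected_bst_path_length(n)):.1f}")
```

For the binary search tree the mean grows like $2n\ln n$ while the fluctuations are of order $n$, which is why the path length is centered and divided by $n$. In this special case the classical result of Régnier [49] and Rösler [50] states that $(L_n - \mathbb{E}L_n)/n$ converges in distribution to the unique law $X$ with $\mathbb{E}X = 0$ and finite variance satisfying $X \stackrel{d}{=} UX' + (1-U)X'' + 1 + 2U\ln U + 2(1-U)\ln(1-U)$, where $U$ is uniform on $[0,1]$ and $U$, $X'$, $X''$ are independent with $X'$, $X''$ distributed as $X$.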

#### Article information

Source
Ann. Appl. Probab., Volume 22, Number 5 (2012), 1745–1777.

Dates
First available in Project Euclid: 12 October 2012

https://projecteuclid.org/euclid.aoap/1350067985

Digital Object Identifier
doi:10.1214/11-AAP812

Mathematical Reviews number (MathSciNet)
MR3025680

Zentralblatt MATH identifier
1254.05037

Subjects
Primary: 05C05: Trees; 60C05: Combinatorial probability
Secondary: 68P05: Data structures

#### Citation

Broutin, Nicolas; Holmgren, Cecilia. The total path length of split trees. Ann. Appl. Probab. 22 (2012), no. 5, 1745–1777. doi:10.1214/11-AAP812. https://projecteuclid.org/euclid.aoap/1350067985

#### References

• [1] Asmussen, S. (2003). Applied Probability and Queues. Springer, New York.
• [2] Baeza-Yates, R. A. (1987). Some average measures in $m$-ary search trees. Inform. Process. Lett. 25 375–381.
• [3] Baker, A. (1990). Transcendental Number Theory, 2nd ed. Cambridge Univ. Press, Cambridge.
• [4] Bell, C. J. (1965). An investigation into the principles of the classification and analysis of data on an automatic digital computer. Ph.D. thesis, Leeds Univ.
• [5] Bergeron, F., Flajolet, P. and Salvy, B. (1992). Varieties of increasing trees. In CAAP’92 (Rennes, 1992). Lecture Notes in Computer Science 581 24–48. Springer, Berlin.
• [6] Broutin, N. and Devroye, L. (2006). Large deviations for the weighted height of an extended class of trees. Algorithmica 46 271–297.
• [7] Broutin, N., Devroye, L. and McLeish, E. (2008). Weighted height of random trees. Acta Inform. 45 237–277.
• [8] Broutin, N., Devroye, L., McLeish, E. and de la Salle, M. (2008). The height of increasing trees. Random Structures Algorithms 32 494–518.
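• [9] Bruhn, V. (1996). Eine Methode zur asymptotischen Behandlung einer Klasse von Rekursionsgleichungen mit einer Anwendung in der stochastischen Analyse des Quicksort-Algorithmus [A method for the asymptotic treatment of a class of recurrence equations, with an application to the stochastic analysis of the Quicksort algorithm]. Ph.D. thesis, Univ. Kiel.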
• [10] Chauvin, B. and Pouyanne, N. (2004). $m$-ary search trees when $m\ge27$: A strong asymptotics for the space requirements. Random Structures Algorithms 24 133–154.
• [11] Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23 493–507.
• [12] Coffman, E. G. and Eve, J. (1970). File structures using hashing functions. Comm. ACM 13 427–436.
• [13] Devroye, L. (1998). Universal limit laws for depths in random trees. SIAM J. Comput. 28 409–432.
• [14] Dobrow, R. P. and Fill, J. A. (1999). Total path length for random recursive trees. Combin. Probab. Comput. 8 317–333.
• [15] Drmota, M. (2009). The height of increasing trees. Ann. Comb. 12 373–402.
• [16] Drmota, M., Iksanov, A., Möhle, M. and Rösler, U. (2009). A limiting distribution for the number of cuts needed to isolate the root of a random recursive tree. Random Structures Algorithms 34 319–336.
• [17] Fill, J. A. and Janson, S. (2001). Approximating the limiting Quicksort distribution. Random Structures Algorithms 19 376–406.
• [18] Fill, J. A. and Janson, S. (2002). Quicksort asymptotics. J. Algorithms 44 4–28.
• [19] Finkel, R. A. and Bentley, J. L. (1974). Quad trees, a data structure for retrieval on composite keys. Acta Inform. 4 1–19.
• [20] Flajolet, P., Roux, M. and Vallée, B. (2010). Digital trees and memoryless sources: From arithmetics to analysis. In 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10). Discrete Mathematics and Theoretical Computer Science Proceedings AM 233–260. Assoc. Discrete Math. Theor. Comput. Sci., Nancy.
• [21] Fredkin, E. (1960). Trie memory. Comm. ACM 3 490–499.
• [22] Gut, A. (2009). Stopped Random Walks: Limit Theorems and Applications, 2nd ed. Springer, New York.
• [23] Hibbard, T. N. (1962). Some combinatorial properties of certain trees with applications to searching and sorting. J. Assoc. Comput. Mach. 9 13–28.
• [24] Hoare, C. A. R. (1962). Quicksort. Comput. J. 5 10–15.
• [25] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30.
• [26] Holmgren, C. (2012). Novel characteristics of split trees by use of renewal theory. Electron. J. Probab. 17 1–27.
• [27] Holmgren, C. (2010). Random records and cuttings in binary search trees. Combin. Probab. Comput. 19 391–424.
• [28] Holmgren, C. (2011). A weakly 1-stable distribution for the number of random records and cuttings in split trees. Adv. in Appl. Probab. 43 151–177.
• [29] Iksanov, A. and Möhle, M. (2007). A probabilistic proof of a weak limit law for the number of cuts needed to isolate the root of a random recursive tree. Electron. Commun. Probab. 12 28–35.
• [30] Jacquet, P. and Régnier, M. (1988). Normal limiting distribution for the size and the external path length of tries. Technical Report 827, INRIA-Rocquencourt.
• [31] Janson, S. (2006). Random cutting and records in deterministic and random trees. Random Structures Algorithms 29 139–179.
• [32] Janson, S. (2010). Renewal theory for the analysis of tries and strings: Extended abstract. In Proceedings of the International Conference on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA). Discrete Mathematics and Theoretical Computer Science Proceedings AM 427–438. Assoc. Discrete Math. Theor. Comput. Sci., Nancy.
• [33] Janson, S., Łuczak, T. and Ruciński, A. (2000). Random Graphs. Wiley, New York.
• [34] Kirschenhofer, P., Prodinger, H. and Szpankowski, W. (1989). On the variance of the external path length in a symmetric digital trie. Discrete Appl. Math. 25 129–143.
• [35] Kirschenhofer, P., Prodinger, H. and Szpankowski, W. (1994). Digital search trees again revisited: The internal path length perspective. SIAM J. Comput. 23 598–616.
• [36] Knuth, D. E. (1973). The Art of Computer Programming: Sorting and Searching, Vol. 3. Addison-Wesley, Reading, MA.
• [37] Lorden, G. (1970). On excess over the boundary. Ann. Math. Statist. 41 520–527.
• [38] Mahmoud, H. M. (1991). Limiting distributions for path lengths in recursive trees. Probab. Engrg. Inform. Sci. 5 53–59.
• [39] Mahmoud, H. M. and Pittel, B. (1989). Analysis of the space of search trees under the random insertion algorithm. J. Algorithms 10 52–75.
• [40] Meir, A. and Moon, J. W. (1970). Cutting down random trees. J. Aust. Math. Soc. 11 313–324.
• [41] Meir, A. and Moon, J. W. (1974). Cutting down recursive trees. Math. Biosci. 21 173–181.
• [42] Mohamed, H. and Robert, P. (2005). A probabilistic analysis of some tree algorithms. Ann. Appl. Probab. 15 2445–2471.
• [43] Mohamed, H. and Robert, P. (2010). Dynamic tree algorithms. Ann. Appl. Probab. 20 26–51.
• [44] Munsonius, G. O. (2011). On the asymptotic internal path length and the asymptotic Wiener index of random split trees. Electron. J. Probab. 16 1020–1047.
• [45] Neininger, R. and Rüschendorf, L. (1999). On the internal path length of $d$-dimensional quad trees. Random Structures Algorithms 15 25–41.
• [46] Neininger, R. and Rüschendorf, L. (2004). A general limit theorem for recursive algorithms and combinatorial structures. Ann. Appl. Probab. 14 378–418.
• [47] Pyke, R. (1965). Spacings (with discussion). J. Roy. Statist. Soc. Ser. B 27 395–449.
• [48] Rachev, S. T. and Rüschendorf, L. (1995). Probability metrics and recursive algorithms. Adv. in Appl. Probab. 27 770–799.
• [49] Régnier, M. (1989). A limiting distribution for quicksort. RAIRO Inform. Théor. Appl. 23 335–343.
• [50] Rösler, U. (1991). A limit theorem for “Quicksort”. RAIRO Inform. Théor. Appl. 25 85–100.
• [51] Rösler, U. (1992). A fixed point theorem for distributions. Stochastic Process. Appl. 42 195–214.
• [52] Rösler, U. (2001). On the analysis of stochastic divide and conquer algorithms. Algorithmica 29 238–261.
• [53] Schachinger, W. (2004). Concentration of size and path length of tries. Combin. Probab. Comput. 13 763–793.
• [54] Szpankowski, W. (2001). Average Case Analysis of Algorithms on Sequences. Wiley, New York.
• [55] Tan, K. H. and Hadjicostas, P. (1995). Some properties of a limiting distribution in Quicksort. Statist. Probab. Lett. 25 87–94.