## The Annals of Applied Probability

### Optimal mean-based algorithms for trace reconstruction

#### Abstract

In the (deletion-channel) trace reconstruction problem, there is an unknown $n$-bit source string $x$. An algorithm is given access to independent traces of $x$, where a trace is formed by deleting each bit of $x$ independently with probability $\delta$. The goal of the algorithm is to recover $x$ exactly (with high probability), while minimizing samples (number of traces) and running time.
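The channel described above can be simulated in a few lines. The following is a minimal sketch, not code from the paper: the function name `sample_trace`, the example string, and the seed are illustrative assumptions.

```python
import random

def sample_trace(x, delta, rng):
    """Pass the bit list x through a deletion channel with deletion rate delta."""
    # Each bit is dropped independently with probability delta;
    # the surviving bits keep their original relative order.
    return [bit for bit in x if rng.random() >= delta]

rng = random.Random(0)  # fixed seed so repeated runs give the same traces
x = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical source string
traces = [sample_trace(x, 0.3, rng) for _ in range(5)]
for t in traces:
    print(t)
```

Note that a trace records only which bits survived, not the positions they came from; this loss of alignment is what makes reconstruction hard.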

Previously, the best known algorithm for the trace reconstruction problem was due to Holenstein et al. [in Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms 389–398 (2008) ACM]; it uses $\exp(\widetilde{O}(n^{1/2}))$ samples and running time for any fixed $0<\delta<1$. It is also what we call a “mean-based algorithm,” meaning that it only uses the empirical means of the individual bits of the traces. Holenstein et al. also gave a lower bound, showing that any mean-based algorithm must use at least $n^{\widetilde{\Omega}(\log n)}$ samples.
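To make "empirical means of the individual bits" concrete, here is a hedged sketch of the statistic a mean-based algorithm would work from. It assumes bits in $\{0,1\}$ and that each trace is padded with zeros to length $n$; the name `empirical_mean_trace` and the parameters below are illustrative, not taken from the paper.

```python
import random

def empirical_mean_trace(x, delta, num_traces, rng):
    """Estimate E[y_j] for each position j, where y is a trace zero-padded to len(x)."""
    n = len(x)
    sums = [0] * n
    for _ in range(num_traces):
        # One trace: keep each bit independently with probability 1 - delta.
        trace = [bit for bit in x if rng.random() >= delta]
        # Positions past the end of the trace contribute 0 (zero padding).
        for j, bit in enumerate(trace):
            sums[j] += bit
    return [s / num_traces for s in sums]

rng = random.Random(1)  # fixed seed for repeatability
x = [1, 0, 1, 1, 0, 1]  # hypothetical source string
means = empirical_mean_trace(x, delta=0.4, num_traces=2000, rng=rng)
print([round(m, 3) for m in means])
```

A mean-based algorithm sees only a vector like `means` and never the individual traces, so its sample complexity is governed by how precisely these $n$ averages must be estimated to tell two candidate source strings apart.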

In this paper, we improve both of these results, obtaining matching upper and lower bounds for mean-based trace reconstruction. For any constant deletion rate $0<\delta<1$, we give a mean-based algorithm that uses $\exp(O(n^{1/3}))$ time and traces; we also prove that any mean-based algorithm must use at least $\exp(\Omega(n^{1/3}))$ traces. In fact, we obtain matching upper and lower bounds even for $\delta$ subconstant and $\rho:=1-\delta$ subconstant: when $(\log^{3}n)/n\ll\delta\leq1/2$ the bound is $\exp(-\Theta(\delta n)^{1/3})$, and when $1/\sqrt{n}\ll\rho\leq1/2$ the bound is $\exp(-\Theta(n/\rho)^{1/3})$.

Our proofs involve estimates for the maxima of Littlewood polynomials on complex disks. We show that these techniques can also be used to perform trace reconstruction with random insertions and bit-flips in addition to deletions. We also find a surprising result: for deletion probabilities $\delta>1/2$, the presence of insertions can actually help with trace reconstruction.
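A short calculation suggests why polynomial estimates enter. The following is a sketch under assumed notation (zero-indexed bits $x_0,\dots,x_{n-1}$, a trace $y$ padded with zeros to length $n$, and a complex variable $w$), not a statement quoted from the paper. Bit $x_k$ lands at position $j$ of the trace exactly when it survives and exactly $j$ of the $k$ bits before it survive, so

$$\mathbb{E}[y_j]=\rho\sum_{k=j}^{n-1}\binom{k}{j}\rho^{j}\delta^{k-j}x_k,\qquad\sum_{j=0}^{n-1}\mathbb{E}[y_j]\,w^{j}=\rho\sum_{k=0}^{n-1}x_k(\delta+\rho w)^{k},$$

where the second identity follows from the first by the binomial theorem. For two candidate source strings $x\neq x'$ in $\{0,1\}^n$, the difference of their mean traces is therefore encoded by a polynomial with coefficients $x_k-x'_k\in\{-1,0,1\}$ evaluated at $z=\delta+\rho w$. As $w$ ranges over the unit circle, $z$ traces a circle of radius $\rho$ centered at $\delta$, so lower bounds on the maximum modulus of such Littlewood-type polynomials on a disk translate into a gap between the two mean traces.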

#### Article information

Source
Ann. Appl. Probab., Volume 29, Number 2 (2019), 851–874.

Dates
Revised: April 2018
First available in Project Euclid: 24 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1548298932

Digital Object Identifier
doi:10.1214/18-AAP1394

Mathematical Reviews number (MathSciNet)
MR3910019

Zentralblatt MATH identifier
07047440

#### Citation

De, Anindya; O’Donnell, Ryan; Servedio, Rocco A. Optimal mean-based algorithms for trace reconstruction. Ann. Appl. Probab. 29 (2019), no. 2, 851–874. doi:10.1214/18-AAP1394. https://projecteuclid.org/euclid.aoap/1548298932

#### References

• [1] Andoni, A., Daskalakis, C., Hassidim, A. and Roch, S. (2012). Global alignment of molecular sequences via ancestral state reconstruction. Stochastic Process. Appl. 122 3852–3874.
• [2] Batu, T., Kannan, S., Khanna, S. and McGregor, A. (2004). Reconstructing strings from random traces. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms 910–918. ACM, New York.
• [3] Borwein, P. and Erdélyi, T. (1997). Littlewood-type problems on subarcs of the unit circle. Indiana Univ. Math. J. 46 1323–1346.
• [4] Borwein, P., Erdélyi, T. and Kós, G. (1999). Littlewood-type problems on $[0,1]$. Proc. Lond. Math. Soc. (3) 79 22–46.
• [5] Choffrut, C. and Karhumäki, J. (1997). Combinatorics of words. In Handbook of Formal Languages, Vol. 1 329–438. Springer, Berlin.
• [6] De, A., O’Donnell, R. and Servedio, R. A. (2017). Optimal mean-based algorithms for trace reconstruction. In STOC’17—Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 1047–1056. ACM, New York.
• [7] Dudík, M. and Schulman, L. J. (2003). Reconstruction from subsequences. J. Combin. Theory Ser. A 103 337–348.
• [8] Holenstein, T., Mitzenmacher, M., Panigrahy, R. and Wieder, U. (2008). Trace reconstruction with constant deletion probability and related results. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms 389–398. ACM, New York.
• [9] Janson, S. (2014). Tail bounds for sums of geometric and exponential variables. Available at http://www2.math.uu.se/~svante/papers/sjN14.pdf.
• [10] Kalashnik, V. V. (1973). Reconstruction of a word from its fragments. Computational Mathematics and Computer Science (Vychislitel’naya Matematika i Vychislitel’naya Tekhnika), Kharkov 4 56–57.
• [11] Kannan, S. and McGregor, A. (2005). More on reconstructing strings from random traces: Insertions and deletions. In IEEE International Symposium on Information Theory 297–301.
• [12] Krasikov, I. and Roditty, Y. (1997). On a reconstruction problem for sequences. J. Combin. Theory Ser. A 77 344–348.
• [13] Levenshtein, V. I. (2001). Efficient reconstruction of sequences from their subsequences or supersequences. J. Combin. Theory Ser. A 93 310–332.
• [14] Levenshtein, V. I. (2001). Efficient reconstruction of sequences. IEEE Trans. Inform. Theory 47 2–22.
• [15] Manvel, B., Meyerowitz, A., Schwenk, A., Smith, K. and Stockmeyer, P. (1991). Reconstruction of sequences. Discrete Math. 94 209–219.
• [16] McGregor, A., Price, E. and Vorotnikova, S. (2014). Trace reconstruction revisited. In Algorithms—ESA 2014. Lecture Notes in Computer Science 8737 689–700. Springer, Heidelberg.
• [17] Mitzenmacher, M. (2009). A survey of results for deletion channels and related synchronization channels. Probab. Surv. 6 1–33.
• [18] Mossel, E. (2016). Personal communication, October 2016.
• [19] Mossel, E. (2013). MSRI open problem session. Available at https://www.msri.org/c/document_library/get_file?uuid=4a885484-bcdd-4238-a3da-21c05713034c&groupId=14404.
• [20] Nazarov, F. and Peres, Y. (2017). Trace reconstruction with $\exp(O(n^{1/3}))$ samples. In STOC’17—Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 1042–1046. ACM, New York.
• [21] Peres, Y. and Zhai, A. (2017). Average-case reconstruction for the deletion channel: Subpolynomially many traces suffice. In 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017 228–239. IEEE Computer Soc., Los Alamitos, CA.
• [22] Scott, A. D. (1997). Reconstructing sequences. Discrete Math. 175 231–238.
• [23] Smyth, C. (2008). The Mahler measure of algebraic numbers: A survey. In Number Theory and Polynomials. London Mathematical Society Lecture Note Series 352 322–349. Cambridge Univ. Press, Cambridge.
• [24] Viswanathan, K. and Swaminathan, R. (2008). Improved string reconstruction over insertion-deletion channels. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms 399–408. ACM, New York.