Vishnu's Pages

50% of Roger Federer is "er"

TL;DR for busy readers: A meme about Roger Federer led to a coding challenge: find the most repeated substring in a string. This post explores an elegant solution using Kotlin's windowed() function to create sliding windows of text and groupingBy().eachCount() to count their frequencies.


Recently I got this meme on WhatsApp:

Meme: 50% of Roger Federer's name is er

I laughed for a while. Then I thought — how can I find the most repeated pattern programmatically for a given name (or any text, for that matter)? I've been looking for ideas for my "One Solution Every Week" challenge and decided to take this up for that week.

The Algorithm

Initially my thought process was this:

  1. Given the text of length N, remove whitespace and lowercase it.
  2. Generate all possible tokens having length of 2, 3, 4... up to N/2.
  3. For every token in this list, find the number of occurrences in the text and put the count in a Map.
  4. Pick the token with the highest count from the Map.
  5. Done.

But something felt off. For such a simple problem, this is too many operations. It's not efficient. There should be a simpler way.

Then it clicked — I don't have to search for the number of occurrences of each token inside the original text; I can just count how many times each token is generated.

For example, for "rogerfederer" (cleaned and lowercase), the generated 2-character tokens would be:

[ro, og, ge, er, rf, fe, ed, de, er, re, er]

Here I don't have to search the occurrence of each token inside the original text. I just count how many times each token shows up in this list and put it in a Map like this:

[
  ro = 1, 
  og = 1, 
  ge = 1, 
  er = 3, 
  rf = 1, 
  fe = 1, 
  ed = 1, 
  de = 1, 
  re = 1
]

Here the most repeated token is "er" and it appears 3 times, that is 50% of rogerfederer.

I wrote Kotlin code for this. Manually generated token using loops, counted them using maps, and it worked like a charm.

Again, I had a gut feeling — there should be standard library function for these operations in Kotlin. And yes, there are.

The Windowing and Grouping Functions in Kotlin

The windowed() function uses a sliding window to generate substrings of a given length. Perfect for creating the tokens we need.

Here's a sample code:

fun main() { val text = "rogerfederer" val n = text.length for (i in 2..n / 2) { val tokens = text.windowed(i) println("Tokens of length $i: $tokens") } }

Once we have a list of tokens, we need to find the number of occurrences of each token. One approach is to find the distinct tokens and count them manually into a Map.

But Kotlin provides a more elegant way using groupingBy(). Please note, groupingBy() doesn't return a Map; it returns a Grouping object that lets you run operations like counting via the eachCount() function.

Important: Not to be confused with groupBy(), which groups items by a key and returns a Map.

Here's sample code that uses groupingBy() and eachCount():

fun main() { val text = "rogerfederer" val n = text.length for (i in 2..n / 2) { val tokens = text.windowed(i) val tokenCounts = tokens.groupingBy { it }.eachCount() println("Token counts for length $i: $tokenCounts") } }

Once we have the counts, calculating the percentage is simple. See the final code here.


So, remember next time: There's a stdlib function for that.™