I’m preparing to go back on the software job market, and to sharpen my skills I’m going back through books I initially worked through as a novice. I’ve got over a decade more experience now, and so I want to approach these problems as a seasoned developer – to try and find something more interesting than just solutions to problems that could be uncovered in half an hour.
To begin, I’m starting with Cracking the Coding Interview, by Gayle Laakmann McDowell.
“Problem 1.1 Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?”
The first part of this question is straightforward to do with a hash map. In fact, it’s so straightforward that generative AI can get you most of the way towards the answer.

// Solution1A is a straightforward attempt to solve the problem with a hashmap.
func Solution1A(s string) bool {
	seen := make(map[rune]bool)
	for _, r := range s {
		if seen[r] {
			return false
		}
		seen[r] = true
	}
	return true
}
Benchmarking
This wouldn’t be a fun exploration of the problem if we didn’t include some benchmarks. Suppose this uniqueness-detecting algorithm were in a location critical to performance. How well do we do, and can we do any better?
So we need some test data. Ideally test data for benchmarking reflects actual call patterns. Since the input data are strings, the main properties of the input that matter for performance are:
- the distribution of string lengths,
- the set of possible characters, and
- the distribution of characters.
Possible length distributions:
- All the same
- Uniform in a range
- Poisson distribution (English word lengths appear to have this distribution)
Possible character sets:
- All-caps letters (26 possibilities)
- Upper and lowercase letters (52 possibilities)
- All printable ASCII (95 possibilities)
- All Unicode characters as of Unicode 16.0 (155,063 possibilities)
Possible character distributions:
- Uniform distribution
- Pareto distribution (similar to actual text?)
Like with any good optimization problem, different choices here may result in different optimal solutions. Some may even completely eliminate certain solutions.
To start, we’ll go with:
- All strings are of length 12
- All characters are printable ASCII
- All characters are equally likely
Twelve is a nice number of characters to begin with, as approximately half of all strings will have a repeated character. Using the birthday problem formula, the probability that all twelve characters are distinct is:
       95!
------------------ ~= 0.484
(95-12)! x 95^12
For the benchmark we’ll generate 1,024 random strings, and cycle through evaluating them.
const (
	n     = 1 << 10
	nMask = n - 1
)

func BenchmarkSolution1(b *testing.B) {
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	randomStrings := make([]string, n)
	for i := 0; i < n; i++ {
		randomStrings[i] = RandomString(rng, 12, RandomASCII)
	}
	for solution, f := range solutions {
		b.Run(solution, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				f(randomStrings[i&nMask])
			}
		})
	}
}
Hash Map Solution
As it turns out, while our initial solution works, it’s pretty slow for what it does:
BenchmarkSolution1/Solution1A
BenchmarkSolution1/Solution1A-64 1958344 602.7 ns/op
So … why is it slow? For this, we turn to Go’s handy builtin CPU profiler, which go test can enable while running the benchmark:
go test ./chapter1/ -run=NONE -bench=BenchmarkSolution1/Solution1A -cpuprofile=cpu.out

The big things to notice here are:
- Only 89% of the time is Solution1A running (chapter1.Solution1A)
- 68% of the time is creating new entries in the map (runtime.mapassign_fast32)
- 16% of the time is accessing map entries (runtime.mapaccess1_fast32)
- 2% of the time is unavoidable runtime calls (runtime.rand)
Note that the call to runtime.rand is not related to the benchmark setup code – you can see this for yourself: remove the random string generation, and the call is still present in the profile.
So in total, 84%/89%, or about 94%, of our time is spent dealing with the hash map. Even if we made all the other parts of our logic free – taking zero time – we’d save at best about 36ns out of over 600ns. If there is an improvement to be had, it’s either in how we’re using the hash map or in not using a hash map at all.
For better ways of using the hash map, there are two easy things to try: shrink the keys, or preallocate the map so it never needs to grow. Shrinking the keys from runes to bytes may save time in memory allocation and in the hash function itself. Preallocating the map at size 12 (which it should never grow beyond, as all strings have exactly 12 characters) saves the cost of growing it. Implemented:
// Solution1A1 is a slight modification to Solution1A where runes are reduced to
// bytes as all are assumed to be ASCII.
func Solution1A1(s string) bool {
	seen := make(map[byte]bool)
	for _, r := range s {
		if seen[byte(r)] {
			return false
		}
		seen[byte(r)] = true
	}
	return true
}

// Solution1A2 is a slight modification to Solution1A where maps are initialized
// to size 12 to avoid needing to reallocate the underlying array.
func Solution1A2(s string) bool {
	seen := make(map[rune]bool, 12)
	for _, r := range s {
		if seen[r] {
			return false
		}
		seen[r] = true
	}
	return true
}
Trying these with our benchmark:
BenchmarkSolution1/Solution1A-64 2068266 617.7 ns/op
BenchmarkSolution1/Solution1A1-64 1332189 939.7 ns/op
BenchmarkSolution1/Solution1A2-64 2088864 572.0 ns/op
We can see that there is a notable regression from converting runes to bytes, and a slight improvement from avoiding reallocating the map. (I tried other preallocation sizes, such as 11, and saw no further improvement.)
There could be other improvements using Go’s native hash map, but I’m not aware of them.
Array Solution
Notice that we have a fairly small number of possible characters – 95 – and that their byte values range from 32 to 126, all under 128. Thus, if we initialize a bit array of length 128, we can record in it whether we’ve seen the character corresponding to each value.
// Solution1B is like the above but uses a bit array.
func Solution1B(s string) bool {
	seen := make([]bool, 1<<7)
	for _, r := range s {
		if seen[r] {
			return false
		}
		seen[r] = true
	}
	return true
}
We can immediately see a drastic improvement:
BenchmarkSolution1/Solution1A-64 1892926 626.3 ns/op
BenchmarkSolution1/Solution1A1-64 1231104 948.8 ns/op
BenchmarkSolution1/Solution1A2-64 2023808 573.0 ns/op
BenchmarkSolution1/Solution1B-64 82965651 13.96 ns/op
Our bit array is about 45x faster than our hash map solution. Looking at the CPU profile, we can see that nearly all of the time is spent in the solution logic (0.94s of 0.96s) – any further gains will have to come from inside that function.

But what gains are even possible?
One possibility is avoiding the heap allocation incurred by creating a slice on each call, by using a fixed-size array instead.
// Solution1B1 is like the above but uses an array directly instead of a slice.
func Solution1B1(s string) bool {
	var seen [1 << 7]bool
	for _, r := range s {
		if seen[r] {
			return false
		}
		seen[r] = true
	}
	return true
}
This results in a slight (but reliable) slowdown:
BenchmarkSolution1/Solution1B-64 89289081 13.75 ns/op
BenchmarkSolution1/Solution1B1-64 84424354 13.90 ns/op
Another possibility is to avoid allocating the slice with each call, instead copying empty data into it.
var (
	seenC  = make([]bool, 1<<7)
	seenC2 = make([]bool, 1<<7)
)

// Solution1B2 is like the above but reuses package-level slices, resetting
// seenC on each call by copying from the always-zero seenC2.
func Solution1B2(s string) bool {
	copy(seenC, seenC2)
	for _, r := range s {
		if seenC[r] {
			return false
		}
		seenC[r] = true
	}
	return true
}
This option also results in a slowdown, larger than before:
BenchmarkSolution1/Solution1B-64 89082355 13.53 ns/op
BenchmarkSolution1/Solution1B1-64 88801987 13.77 ns/op
BenchmarkSolution1/Solution1B2-64 76817073 15.82 ns/op
Alternatively, we could even hard code the number of iterations (since we know every string is of length 12). The idea here is that it avoids a quirk of how Go iterates over strings.
// Solution1B3 is like 1B but iterates a static number of times.
func Solution1B3(s string) bool {
	seen := make([]bool, 1<<7)
	for i := 0; i < 12; i++ {
		r := s[i]
		if seen[r] {
			return false
		}
		seen[r] = true
	}
	return true
}
This does result in a noticeable speedup, at the cost of making the code more brittle. It isn’t obvious to me why. One suspect is that range over a string decodes UTF-8 to produce the (index, rune) pair – even for pure-ASCII input – which is more work than simple byte indexing; or it could have to do with how Go checks whether iteration is done.
BenchmarkSolution1/Solution1B-64 85514907 14.02 ns/op
BenchmarkSolution1/Solution1B1-64 83047393 14.11 ns/op
BenchmarkSolution1/Solution1B2-64 74333500 16.27 ns/op
BenchmarkSolution1/Solution1B3-64 97826607 12.03 ns/op
However, we don’t actually have to make the code brittle to realize the same gain.
// Solution1B4 is like 1B but uses the string length to set the number of iterations.
func Solution1B4(s string) bool {
	seen := make([]bool, 1<<7)
	for i := 0; i < len(s); i++ {
		r := s[i]
		if seen[r] {
			return false
		}
		seen[r] = true
	}
	return true
}
In fact, this is even faster than the hard coded length. This is less surprising than it seems: len(s) isn’t recalculated each iteration – a Go string carries its length in its header – and looping to len(s) lets the compiler prove that s[i] is always in bounds, eliminating the bounds check. With the constant 12, the compiler can’t prove the string is that long, so the check stays.
BenchmarkSolution1/Solution1B-64 86110658 14.15 ns/op
BenchmarkSolution1/Solution1B1-64 86076312 14.01 ns/op
BenchmarkSolution1/Solution1B2-64 72450708 15.67 ns/op
BenchmarkSolution1/Solution1B3-64 99203356 11.65 ns/op
BenchmarkSolution1/Solution1B4-64 104699756 11.10 ns/op
That’s about as far as I want to take the bit array solution. What about the other part of the question – the part I’ve ignored until now?
No Additional Data Structures
“What if you cannot use additional data structures?”
For Go in particular, this is tricky as strings are immutable. This means that a natural solution – to sort the characters of the string and then check for duplicates – isn’t possible.
If you iterate through a string with range, you get the 4-byte rune type rather than the 1-byte byte type you get by indexing. Here it doesn’t matter, as for now we’re assuming all characters are ASCII – which means we can just index the string, even though that would be problematic if the system we’re designing for might ever encounter non-ASCII text.
// Solution1C avoids using other data structures by directly comparing pairs of characters.
func Solution1C(s string) bool {
	for i := 0; i < len(s); i++ {
		for j := i + 1; j < len(s); j++ {
			if s[i] == s[j] {
				return false
			}
		}
	}
	return true
}
Surprisingly this solution isn’t terrible compared to the other solutions:
BenchmarkSolution1/Solution1B4-64 103588170 11.66 ns/op
BenchmarkSolution1/Solution1C-64 39652282 28.48 ns/op
In fact, this solution *always* beats the hash map solution, despite technically having O(n^2) complexity. At 100-character strings:
BenchmarkSolution1/Solution1A2-64 1279503 940.2 ns/op
BenchmarkSolution1/Solution1B4-64 84820020 13.66 ns/op
BenchmarkSolution1/Solution1C-64 31667222 37.01 ns/op
At 1,000-character strings:
BenchmarkSolution1/Solution1A2-64 1247774 958.0 ns/op
BenchmarkSolution1/Solution1B4-64 79115478 15.28 ns/op
BenchmarkSolution1/Solution1C-64 31556229 38.59 ns/op
This is because of a fundamental limit of our problem – there are only 95 possible characters, so any string longer than 95 characters is guaranteed to contain a duplicate among its first 96. Despite being O(n^2) in theory, the pairwise comparison exits early and its work is effectively bounded.
Pareto Distribution
Suppose characters occur with frequencies following a Pareto distribution. There’s a lot of debate on how to model letter frequencies properly, so here I’ll pick parameters that seem to fit a letter distribution chart. What matters more is seeing what happens when a small number of characters are chosen much more often than the rest, while a long tail of characters remains in play. If you’re interested in replicating this, I’ve gone with X_m=3.0 and alpha=0.45, then offset the results by -3. The fit isn’t too awful, though it has a longer tail than actual letter frequencies – which means any measurements here are likely to be overestimates.

Perhaps unsurprisingly, all solutions perform better in this case. Given that some characters are much more common, there’s a significant likelihood of a common character appearing as the first or second character, and then again later in the string – so generally the logic allowing the algorithms to exit early triggers sooner than if characters are distributed uniformly.
BenchmarkSolution1/Solution1A2-64 2115301 563.4 ns/op
BenchmarkSolution1/Solution1B4-64 106364142 11.02 ns/op
BenchmarkSolution1/Solution1C-64 38778128 29.50 ns/op
BenchmarkSolution1P/Solution1A2-64 3728235 327.9 ns/op
BenchmarkSolution1P/Solution1B4-64 193359403 5.491 ns/op
BenchmarkSolution1P/Solution1C-64 226532450 5.123 ns/op
Surprisingly, the apparently O(n^2) solution is the fastest of the three when given data that better approximates actual text! This should reinforce two points:
- the time-complexity of an algorithm should not be the sole determiner of which algorithm is best, and
- measuring, especially with simulations of real world data, can change which algorithm is best for an application.
In Context
So we’re evaluating these strings for repeated characters, but where do these strings come from? At 5 or 11 ns/op, how likely is it that this check is the bottleneck?
Let’s hand wave away any sort of parsing (e.g. JSON parsing is very slow) – we’re either getting this as a raw byte stream from a connection, or we’re reading raw text from a file on disk.
Consider the slower option – uniformly distributed ASCII characters in strings of length 12 – since that gives a nice 1 byte/ns processed. A quick conversion shows this is equivalent to about 1 GB/s. For comparison, a mid-range SSD reads at about 500 MB/s, and a gigabit connection delivers only 125 MB/s. At those input rates, the uniqueness check is not a significant bottleneck.
Considering the hash map solution – even in the more lenient case of the non-uniform distribution, we get 12 bytes / 328 ns, which is about 36.6 MB/s. This is slow enough that it could reasonably be a bottleneck if almost nothing else is happening in the application.
Conclusion
There was a lot here, just looking at this problem from the perspective of Go. The O(n^2) solution being the best for normal-looking data surprised me – it strikes me as the kind of finding that’s interesting to bring up in an interview setting.
I doubt there’s more to be gained from this particular perspective on the problem. Next time I may consider other languages (like Python or Rust), or consider what happens if the characters could be any valid Unicode 16.0 character. Python and Rust may allow trying the sorting approach, and Unicode would likely invalidate the byte array solution. But validating that is what benchmarking is for!