I’ve recently started following a blog which sets weekly programming challenges. I’m getting sick of Nintendo DS brain training and thought I’d try something more relevant to my line of work. Also, this gives me a good opportunity to learn more about my favoured scripting language at present, which is Ruby. So this week’s challenge states:
Given a list of n integers and another integer called m, determine (true / false) if there exist 2 numbers in that list which sum up to m.
Example: 2,6,4,9,1,12,7 and m=14 -> 2 and 12 sum up to 14, so the answer is true.
Provide the best algorithm in both manners: performance and memory to solve this puzzle. Don’t forget to mention the complexity of your solution!
Read on for my thoughts in solving this challenge.
Looking at the problem, you obviously start with an array of numbers called n. If the sum of two numbers from the array is m, then you know that the array *may* contain two elements which are equal to m, such as n[a] + n[b] = m.
However you don’t know which two elements of the array might achieve this, nor do you know how many pairs in the array would satisfy the same conditions.
So my first thoughts were to iterate through each element of the array, then for each element, subtract that value from m, and iterate through remaining elements in the array looking for that value.
def test_nosort(m,n) found=0 n.each do |a| b = m - a n.each do |x| found += 1 if (x.to_i == b) end end puts found if @debug end
The problem with this approach, is that for a starting array size of n, you have to check all possible combinations or n*n-1. Say for an array of size 100 elements, you have to check each value in the array 100-1 times (excluding current value), which means 100*99 = 9,900 possible combinations!
In the example provided above, I used the #each iterator for the array believing this to be the fastest operation when iterating through an array in Ruby.
This approach is suitable for small sized arrays (< 100 elements), but quickly becomes unusable when scaled up. In hindsight, this approach seems very similar to a bubble sort algorithm, where the list to be sorted has two items compared at a time and swapping them if they’re in the wrong order; iterating as many times until no more swaps are needed. It has a worst-case complexity described as n^2 where n is the number of items being sorted.
In the next approach, I decided to give regex a try ( I love regex
), where in this case I still use an #each iterator to loop through the array, but use a regex grep comparison against the original array. The following code demonstrates:
def test_regex(m,n) found=0 n.each do |a| b = m - a found += n.grep(b).length end puts found if @debug end
This approach performed better than the first approach. I also experimented with some other array methods such as #select, #detect and #partition all of which gave varying but slow results.
After a much needed coffee and re-think, I decided to build a hash, where each key was based on the original values in my array, with the key-value equaling a count of all occurrences for that key… Then I could just use the array #each iterator and jump straight to the number of values that match that key. The following code demonstrates:
def test_hash(m,n) found=0 hash = {} n.each do |a| hash.has_key? a ? hash[a] = hash[a].to_i + 1 : hash[a] = 1 end n.each do |a| b = m - a found += hash[b] if hash.has_key? b end puts found if @debug end
This approach was a marked success over previous approaches and it scales well for large arrays of numbers. You can see the differences in time (using the benchmark gem) for each approach when using an array size of 100 elements as follows:
user system total real test_nosort results : 0.530000 0.000000 0.530000 ( 0.534364) test_sort results : 0.530000 0.000000 0.530000 ( 0.534992) test_regex results : 0.360000 0.000000 0.360000 ( 0.357073) test_detect results : 6.390000 0.040000 6.430000 ( 6.480277) test_select results : 0.400000 0.000000 0.400000 ( 0.403032) test_partn results : 0.600000 0.010000 0.610000 ( 0.623047) test_hash results : 0.020000 0.000000 0.020000 ( 0.021390) test_binary results : 0.270000 0.000000 0.270000 ( 0.273587)
Even with array sizes of 10,000+ elements the benchmark was in the order of seconds.
However with an even larger array size of say 100,000+ elements, the memory usage of this hash search would be significant. I also tried a binary search but found this had worse performance than a hash. From reading wikipedia though it would appear that a binary tree search might be the best bet in terms of memory usage for large arrays.
The Answer?
For now I am satisfied that a hash key search is the least complex way to deal with this particular challenge, with the only caveat being a ceiling limit on the amount of elements that are in your initial array to begin with e.g. 1 – 10,000 elements. For larger arrays, binary tree algorithms might offer better memory performance but would be more complex to implement in a scripting language like Ruby.
The code I used to test this is attached below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 | require 'benchmark' # m = 14 m = rand(100) # n = [2,6,4,9,1,12,7] n = Array.new(100) { |e| e = rand(100) } @debug = false def test_nosort(m,n) found=0 n.each do |a| b = m - a n.each do |x| found += 1 if (x.to_i == b) end end puts found if @debug end def test_sort(m,n) found=0 n.sort! n.each do |a| b = m - a n.reverse_each do |x| found += 1 if (x.to_i == b) end end puts found if @debug end def test_regex(m,n) found=0 n.each do |a| b = m - a found += n.grep(b).length end puts found if @debug end def test_detect(m,n) found=0 n.each do |a| b = m - a found += 1 if n.detect { |x| /\b#{b}\b/ =~ x.to_s } end puts found if @debug end def test_select(m,n) found=0 n.each do |a| b = m - a found_array = n.select { |x| x == b } found += found_array.length end puts found if @debug end def test_partn(m,n) found=0 n.each do |a| b = m - a found_array = n.partition { |x| x == b } found += found_array.length end puts found if @debug end def test_hash(m,n) found=0 hash = {} n.each do |a| hash.has_key? a ? hash[a] = hash[a].to_i + 1 : hash[a] = 1 end n.each do |a| b = m - a found += hash[b] if hash.has_key? b end puts found if @debug end def test_binary(m,n) found=0 n.each do |a| b = m - a found += 1 if binary_search(n, b) end puts found if @debug end def binary_search(array, val, left = 0, right = nil) array.sort! right = array.size unless right; mid = (left + right) / 2; return nil if left > right if val == array[mid] return mid elsif val > array[mid] binary_search(array, val, mid + 1, right) else binary_search(array, val, left, mid - 1) end end Benchmark.bm() do |timer| @debug = false timer.report("test_nosort results :\n") { 100.times { test_nosort(m,n) } } timer.report("test_sort results :\n") { 100.times { test_sort(m,n) } } timer.report("test_regex results :\n") { 100.times { test_regex(m,n) } } timer.report("test_detect results :\n") { 100.times { test_detect(m,n) } } timer.report("test_select results :\n") { 100.times { test_select(m,n) } } timer.report("test_partn results :\n") { 100.times { test_partn(m,n) } } timer.report("test_hash results :\n") { 100.times { test_hash(m,n) } } timer.report("test_binary results :\n") { 100.times { test_binary(m,n) } } end # >> user system total real # >> test_nosort results : # >> 0.530000 0.000000 0.530000 ( 0.534364) # >> test_sort results : # >> 0.530000 0.000000 0.530000 ( 0.534992) # >> test_regex results : # >> 0.360000 0.000000 0.360000 ( 0.357073) # >> test_detect results : # >> 6.390000 0.040000 6.430000 ( 6.480277) # >> test_select results : # >> 0.400000 0.000000 0.400000 ( 0.403032) # >> test_partn results : # >> 0.600000 0.010000 0.610000 ( 0.623047) # >> test_hash results : # >> 0.020000 0.000000 0.020000 ( 0.021390) # >> test_binary results : # >> 0.270000 0.000000 0.270000 ( 0.273587) |
