<?xml version="1.0" encoding="UTF-8"?><rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
> <channel><title>Comments on: Faster List Intersection Using Skip Pointers</title> <atom:link href="http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/feed/" rel="self" type="application/rss+xml" /><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/</link> <description>For the betterment of the software craft...</description> <lastBuildDate>Mon, 21 Nov 2011 13:57:06 +0000</lastBuildDate> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.1.2</generator> <item><title>By: Burhan</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-6657</link> <dc:creator>Burhan</dc:creator> <pubDate>Wed, 10 Nov 2010 00:33:12 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-6657</guid> <description>&quot;We continue advancing through both lists until we have matched 12 and advanced to the next item in each list.&quot;
I didn&#039;t get why 12?</description> <content:encoded><![CDATA[<p>&#8220;We continue advancing through both lists until we have matched 12 and advanced to the next item in each list.&#8221;</p><p>I didn&#8217;t get why 12?</p> ]]></content:encoded> </item> <item><title>By: Alan Skorkin</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4341</link> <dc:creator>Alan Skorkin</dc:creator> <pubDate>Fri, 26 Mar 2010 11:30:35 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4341</guid> <description>Hi Mitch,
That&#039;s really interesting. Yeah I&#039;d love for you to send me the code you used, I&#039;ll play around with it myself. If you don&#039;t mind I might even do a separate post about it (if I can fit it in that is, so much to do so little time :)).</description> <content:encoded><![CDATA[<p>Hi Mitch,</p><p>That&#8217;s really interesting. Yeah I&#8217;d love for you to send me the code you used, I&#8217;ll play around with it myself. If you don&#8217;t mind I might even do a separate post about it (if I can fit it in that is, so much to do so little time :)).</p> ]]></content:encoded> </item> <item><title>By: Mitch Kuppinger</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4340</link> <dc:creator>Mitch Kuppinger</dc:creator> <pubDate>Fri, 26 Mar 2010 07:05:36 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4340</guid> <description>Alan,
I decided to try comparing the performance of my approach to yours. I tried your timing method but was not satisfied because on my slowish hardware the timer resolution appears to be about 15.625 milliseconds, so on a single pass I could not distinguish the relative speeds of the two approaches. I then switched to Benchmark, only to discover that you had already blogged about it! I think Benchmark gives a better picture of the relative performance. I ran three types of lists through the benchmarking process just as you had done: raw integer lists, wrapped lists and wrapped lists with skips. I stored the integer values of the list members that exist in the intersection in the output list. This made it easy to verify that each intersection method returned the same output. I created the lists once and stored them as YAML to allow repeated testing.
The results surprised me:
&lt;pre&gt;
list1 size: 1064      list2 size: 4014      intersect list size: 416
1000 passes (times in seconds):
        Enumerator Based Intersect          Skorks Rubyish Intersect
        user    system  total   real        user    system  total   real
raws:   21.59   0.02    21.61   21.64       0.89    0.00    0.89    0.89
wraps:  38.55   0.00    38.55   38.56       9.05    0.00    9.05    9.05
skips:  51.53   0.02    51.55   51.69       0.34    0.00    0.34    0.34
&lt;/pre&gt;
The file containing the input lists in YAML occupied 678 KB on disk and I had 2+ meg of RAM available as I started this run. I don&#039;t think these results can be explained by memory problems, e.g. swapping to virtual memory on disk. It appears that the Enumerator-based approach is much slower than the approach you used! I thought my suggestion was much easier to read and understand, but the apparent performance difference has to make your code the clear winner. ;-)
btw. I also ran this with far larger lists that clearly exceeded available memory and found that the Enumerator-based approach appeared faster. I think that may be because that approach is not supposed to require as much memory, so perhaps there was less HD thrashing going on.
The take home message is of course:
When performance matters, test your code for performance.
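A minimal sketch of such a test with Ruby's Benchmark module, using made-up lists rather than the ones from the run above. The names isect_merge and isect_set are hypothetical stand-ins for whichever two approaches are being compared, and the pass count is kept small:

```ruby
require 'benchmark'
require 'set'

# Hypothetical data: two sorted lists of unique integers.
list1 = (1..4000).step(4).to_a   # 1000 elements
list2 = (1..4000).to_a           # 4000 elements

# Plain linear merge over two sorted arrays.
def isect_merge(list1, list2)
  result = []
  i = j = 0
  while list1.size > i and list2.size > j
    a, b = list1[i], list2[j]
    if a == b
      result.push(a); i += 1; j += 1
    elsif b > a
      i += 1
    else
      j += 1
    end
  end
  result
end

# Probe a hash-based Set built from the second list.
def isect_set(list1, list2)
  lookup = list2.to_set
  list1.select { |x| lookup.include?(x) }
end

# Check that both methods agree before timing them.
raise "results differ" unless isect_merge(list1, list2) == isect_set(list1, list2)

Benchmark.bm(7) do |bm|
  bm.report("merge:") { 100.times { isect_merge(list1, list2) } }
  bm.report("set:")   { 100.times { isect_set(list1, list2) } }
end
```

Running many passes per report keeps each measurement well above the timer resolution.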
I can send you the code for my tests, if you would like.</description> <content:encoded><![CDATA[<p>Alan,</p><p>I decided to try comparing the performance of my approach to yours. I tried your timing method but was not satisfied because on my slowish hardware the timer resolution appears to be about 15.625 milliseconds, so on a single pass I could not distinguish the relative speeds of the two approaches. I then switched to Benchmark, only to discover that you had already blogged about it! I think Benchmark gives a better picture of the relative performance. I ran three types of lists through the benchmarking process just as you had done: raw integer lists, wrapped lists and wrapped lists with skips. I stored the integer values of the list members that exist in the intersection in the output list. This made it easy to verify that each intersection method returned the same output. I created the lists once and stored them as YAML to allow repeated testing.</p><p>The results surprised me:</p><pre>
list1 size: 1064      list2 size: 4014      intersect list size: 416
1000 passes (times in seconds):
        Enumerator Based Intersect          Skorks Rubyish Intersect
        user    system  total   real        user    system  total   real
raws:   21.59   0.02    21.61   21.64       0.89    0.00    0.89    0.89
wraps:  38.55   0.00    38.55   38.56       9.05    0.00    9.05    9.05
skips:  51.53   0.02    51.55   51.69       0.34    0.00    0.34    0.34
</pre><p>The file containing the input lists in YAML occupied 678 KB on disk and I had 2+ meg of RAM available as I started this run. I don&#8217;t think these results can be explained by memory problems, e.g. swapping to virtual memory on disk. It appears that the Enumerator-based approach is much slower than the approach you used! I thought my suggestion was much easier to read and understand, but the apparent performance difference has to make your code the clear winner. ;-)</p><p>btw. I also ran this with far larger lists that clearly exceeded available memory and found that the Enumerator-based approach appeared faster. I think that may be because that approach is not supposed to require as much memory, so perhaps there was less HD thrashing going on.</p><p>The take home message is of course:</p><p>When performance matters, test your code for performance.</p><p>I can send you the code for my tests, if you would like.</p> ]]></content:encoded> </item> <item><title>By: Alan Skorkin</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4183</link> <dc:creator>Alan Skorkin</dc:creator> <pubDate>Tue, 23 Mar 2010 09:44:29 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4183</guid> <description>Any time :)</description> <content:encoded><![CDATA[<p>Any time :)</p> ]]></content:encoded> </item> <item><title>By: Atul Kash</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4182</link> <dc:creator>Atul Kash</dc:creator> <pubDate>Tue, 23 Mar 2010 08:54:26 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4182</guid> <description>I didn&#039;t know about skip pointers to begin with; thanks for enlightening me.</description> <content:encoded><![CDATA[<p>I didn&#8217;t know about skip pointers to begin with; thanks for enlightening me.</p> ]]></content:encoded> </item> <item><title>By: Radu Grigore</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4160</link> <dc:creator>Radu Grigore</dc:creator> <pubDate>Mon, 22 Mar 2010 11:22:34 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4160</guid> <description>Actually, I don&#039;t expect binary search to do well here because it doesn&#039;t use the cache nicely. But, if you try, let us know. :)
If you have two lists of the same size &lt;i&gt;n&lt;/i&gt; that are equal, then you need &#937;(&lt;i&gt;n&lt;/i&gt;) time just to output the result, so the naive merging algorithm is optimal. In other words, you may only expect to do better if (1) you look at the average-case instead of the worst-case or (2) lists have significantly different sizes. The second point leads to the idea of iterating through the short list and checking for each element if it is in the other list. Checking if an integer is in a set is done in expected constant time with hash-tables. You might want to try it.
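A rough sketch of that idea in Ruby, assuming plain arrays of integers. The name isect_hashed is hypothetical, not code from the post. Building the set takes time proportional to the longer list, so the approach only pays off when the set can be built once and reused:

```ruby
require 'set'

# Hypothetical helper: intersect two lists by walking the shorter one and
# probing a hash-based Set built from the longer one.
def isect_hashed(list1, list2)
  short, long = [list1, list2].sort_by { |list| list.size }
  lookup = long.to_set   # membership tests take expected constant time
  short.select { |x| lookup.include?(x) }
end

p isect_hashed([1, 3, 5, 9], [2, 3, 4, 5, 6, 7, 8, 9, 10])  # prints [3, 5, 9]
```

The result keeps the order of the shorter list, so sorted input gives sorted output.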
Because hash-tables are not terribly cache-friendly (which reminds me, use linear probing not chaining if you want it fast!) you could install a Bloom filter in front of it that quickly discards many of the elements that aren&#039;t in the set. This is probably worth it only for the big lists/hashtables.
This brings the cost down from O(m+n) to an expected O(min(m,n)), not counting the time to build the hash table. Analyzing the average case is trickier.</description> <content:encoded><![CDATA[<p>Actually, I don&#8217;t expect binary search to do well here because it doesn&#8217;t use the cache nicely. But, if you try, let us know. :)</p><p>If you have two lists of the same size <i>n</i> that are equal, then you need &Omega;(<i>n</i>) time just to output the result, so the naive merging algorithm is optimal. In other words, you may only expect to do better if (1) you look at the average-case instead of the worst-case or (2) lists have significantly different sizes. The second point leads to the idea of iterating through the short list and checking for each element if it is in the other list. Checking if an integer is in a set is done in expected constant time with hash-tables. You might want to try it.</p><p>Because hash-tables are not terribly cache-friendly (which reminds me, use linear probing not chaining if you want it fast!) you could install a Bloom filter in front of it that quickly discards many of the elements that aren&#8217;t in the set. This is probably worth it only for the big lists/hashtables.</p><p>This brings the cost down from O(m+n) to an expected O(min(m,n)), not counting the time to build the hash table. Analyzing the average case is trickier.</p> ]]></content:encoded> </item> <item><title>By: Alan Skorkin</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4156</link> <dc:creator>Alan Skorkin</dc:creator> <pubDate>Mon, 22 Mar 2010 05:54:40 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4156</guid> <description>This is WordPress trying to be smart and interpreting anything between less-than and greater-than signs as an HTML tag, and since the spaceship HTML tag doesn&#039;t exist it just removes it. If you put spaces around the = sign, it should prevent WordPress from removing it,
i.e. &lt; = &gt;
The other thing you can do is to wrap the code in a &quot;pre&quot; tag e.g.:
&lt;pre&gt;
def hash
  title.hash
end
&lt;/pre&gt;
but I believe it would still be better to put spaces in the spaceship operator just to make sure.
I&#039;ve fixed up your comment for the moment. If it does happen again despite all this, I&#039;ll have a look at fixing it in a more fundamental way.</description> <content:encoded><![CDATA[<p>This is WordPress trying to be smart and interpreting anything between less-than and greater-than signs as an HTML tag, and since the spaceship HTML tag doesn&#8217;t exist it just removes it. If you put spaces around the = sign, it should prevent WordPress from removing it,<br
/> i.e. < = ></p><p>The other thing you can do is to wrap the code in a &#8220;pre&#8221; tag e.g.:</p><pre>
def hash
   title.hash
end
</pre><p>but I believe it would still be better to put spaces in the spaceship operator just to make sure.</p><p>I&#8217;ve fixed up your comment for the moment. If it does happen again despite all this, I&#8217;ll have a look at fixing it in a more fundamental way.</p> ]]></content:encoded> </item> <item><title>By: Alan Skorkin</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4154</link> <dc:creator>Alan Skorkin</dc:creator> <pubDate>Mon, 22 Mar 2010 05:43:47 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4154</guid> <description>Hi Mitch,
This is awesome, I&#039;ll need to spend a little bit of time playing around with your code before I can say anything intelligent so bear with me :). It might even end up as a separate post :).
I hope you continue to find my blog interesting.</description> <content:encoded><![CDATA[<p>Hi Mitch,</p><p>This is awesome, I&#8217;ll need to spend a little bit of time playing around with your code before I can say anything intelligent so bear with me :). It might even end up as a separate post :).</p><p>I hope you continue to find my blog interesting.</p> ]]></content:encoded> </item> <item><title>By: Mitch Kuppinger</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4153</link> <dc:creator>Mitch Kuppinger</dc:creator> <pubDate>Mon, 22 Mar 2010 05:40:52 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4153</guid> <description>Once again I am defeated by the spaceship operator! Your rendering of a reply removes the spaceship operator characters. That is unfortunately the test I use in each of the case statements in the code I submitted above.
Is there anything I can do to avoid similar problems in the future?</description> <content:encoded><![CDATA[<p>Once again I am defeated by the spaceship operator! Your rendering of a reply removes the spaceship operator characters. That is unfortunately the test I use in each of the case statements in the code I submitted above.</p><p>Is there anything I can do to avoid similar problems in the future?</p> ]]></content:encoded> </item> <item><title>By: Mitch Kuppinger</title><link>http://www.skorks.com/2010/03/faster-list-intersection-using-skip-pointers/comment-page-1/#comment-4151</link> <dc:creator>Mitch Kuppinger</dc:creator> <pubDate>Mon, 22 Mar 2010 05:33:23 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1523#comment-4151</guid> <description>You might consider the use of Enumerators here as well. The examples are written (and tested) for Ruby 1.9.1. I&#039;m assuming the raw, wrapped and wrapped_with_skips lists are structured as you described them in your post. I have no real-world experience with the issues you are writing about. However, the problem seemed to be an excellent opportunity to learn more about Enumerators. I hope you and your readers find the following similarly useful.
&lt;pre&gt;
module SortedListOps
  # This is the method for a &#039;raw&#039; list (a sorted list of unique integers)
  def isect_raw(list1, list2)
    result = []
    e1, e2 = list1.each, list2.each
    a, b = e1.next, e2.next
    loop do # loop ends quietly when either enumerator raises StopIteration
      case a &lt;=&gt; b
      when -1 then a = e1.next
      when +1 then b = e2.next
      else result &lt;&lt; a; a = e1.next; b = e2.next
      end
    end
    result
  rescue StopIteration # raised by the first call to next if a list is empty
    result
  end

  # This does the same for a list of integers now wrapped in ostructs.
  def isect_wrapped(list1, list2)
    result = []
    e1, e2 = list1.each, list2.each
    a, b = e1.next, e2.next
    loop do # loop ends quietly when either enumerator raises StopIteration
      case a.value &lt;=&gt; b.value
      when -1 then a = e1.next
      when +1 then b = e2.next
      else result &lt;&lt; a.value; a = e1.next; b = e2.next
      end
    end
    result
  rescue StopIteration # raised by the first call to next if a list is empty
    result
  end
  # Finally the intersection method for sorted ostructs containing an integer value and a
  # skip_value: the value of the ostruct one skip_size further along in the list (nil if none).
  def isect_wrapped_with_skips(list1, list2)
    result = []
    skip_size1 = Math.sqrt(list1.size).to_i
    skip_size2 = Math.sqrt(list2.size).to_i
    s1 = s2 = 1
    # enum_for evaluates its arguments only once, so each enumerator gets a
    # lambda that reads the current step from s1 or s2.
    e1 = list1.enum_for(:skip, lambda { s1 })
    e2 = list2.enum_for(:skip, lambda { s2 })
    a, b = e1.next, e2.next
    loop do # loop ends quietly when either enumerator raises StopIteration
      case a.value &lt;=&gt; b.value
      when -1 then s1 = (a.skip_value &amp;&amp; (a.skip_value &lt; b.value) ? skip_size1 : 1); a = e1.next
      when +1 then s2 = (b.skip_value &amp;&amp; (b.skip_value &lt; a.value) ? skip_size2 : 1); b = e2.next
      else result &lt;&lt; a.value; s1 = s2 = 1; a = e1.next; b = e2.next
      end
    end
    result
  rescue StopIteration # raised by the first call to next if a list is empty
    result
  end
end
class Array
  # Yields elements starting at index 0; step is a callable that returns how far to advance.
  def skip(step = lambda { 1 })
    current_index = -1
    loop do
      current_index += step.call
      raise StopIteration unless current_index &lt; self.length
      yield self[current_index]
    end
  end
end
&lt;/pre&gt;
The isect_wrapped_with_skips method depends on defining a skip iterator for Array. The arguments passed to enumerable.enum_for(:method, *args) are evaluated once, when the enumerator is created, so reassigning a local variable afterwards does not change what the enumerator sees. To let the step vary between calls to enumerator.next, each enumerator is therefore given a lambda that reads the current step from a local variable. Note also that you cannot use enumerator.first to set the initial values: it does not advance the position used by enumerator.next, so the first call to enumerator.next would return the first element again. This seems to me to be a bug but I&#039;ll have to investigate further.
When I put these routines through your timing routines they appear to be quite fast but I haven&#039;t had time to compare them directly with your versions.
I&#039;m very glad to have found your blog. It has been very stimulating. Thanks!
btw. My reply to your post (http://www.skorks.com/2010/03/writing-a-more-ruby-ish-array-intersection-function-and-sorting-structs/) contained several errors. I wrote that code off the top of my head and failed to test it at all before submitting it. :-( The current code does work in general but there are edge cases that need to be tested before I would use it in any production code.</description> <content:encoded><![CDATA[<p>You might consider the use of Enumerators here as well. The examples are written (and tested) for Ruby 1.9.1. I&#8217;m assuming the raw, wrapped and wrapped_with_skips lists are structured as you described them in your post. I have no real-world experience with the issues you are writing about. However, the problem seemed to be an excellent opportunity to learn more about Enumerators. I hope you and your readers find the following similarly useful.</p><pre>
module SortedListOps
  # This is the method for a 'raw' list (a sorted list of unique integers)
  def isect_raw(list1, list2)
    result = []
    e1, e2 = list1.each, list2.each
    a, b = e1.next, e2.next
    loop do # loop ends quietly when either enumerator raises StopIteration
      case a &lt;=&gt; b
      when -1 then a = e1.next
      when +1 then b = e2.next
      else result &lt;&lt; a; a = e1.next; b = e2.next
      end
    end
    result
  rescue StopIteration # raised by the first call to next if a list is empty
    result
  end

  # This does the same for a list of integers now wrapped in ostructs.
  def isect_wrapped(list1, list2)
    result = []
    e1, e2 = list1.each, list2.each
    a, b = e1.next, e2.next
    loop do # loop ends quietly when either enumerator raises StopIteration
      case a.value &lt;=&gt; b.value
      when -1 then a = e1.next
      when +1 then b = e2.next
      else result &lt;&lt; a.value; a = e1.next; b = e2.next
      end
    end
    result
  rescue StopIteration # raised by the first call to next if a list is empty
    result
  end
  # Finally the intersection method for sorted ostructs containing an integer value and a
  # skip_value: the value of the ostruct one skip_size further along in the list (nil if none).
  def isect_wrapped_with_skips(list1, list2)
    result = []
    skip_size1 = Math.sqrt(list1.size).to_i
    skip_size2 = Math.sqrt(list2.size).to_i
    s1 = s2 = 1
    # enum_for evaluates its arguments only once, so each enumerator gets a
    # lambda that reads the current step from s1 or s2.
    e1 = list1.enum_for(:skip, lambda { s1 })
    e2 = list2.enum_for(:skip, lambda { s2 })
    a, b = e1.next, e2.next
    loop do # loop ends quietly when either enumerator raises StopIteration
      case a.value &lt;=&gt; b.value
      when -1 then s1 = (a.skip_value &amp;&amp; (a.skip_value &lt; b.value) ? skip_size1 : 1); a = e1.next
      when +1 then s2 = (b.skip_value &amp;&amp; (b.skip_value &lt; a.value) ? skip_size2 : 1); b = e2.next
      else result &lt;&lt; a.value; s1 = s2 = 1; a = e1.next; b = e2.next
      end
    end
    result
  rescue StopIteration # raised by the first call to next if a list is empty
    result
  end
end
class Array
  # Yields elements starting at index 0; step is a callable that returns how far to advance.
  def skip(step = lambda { 1 })
    current_index = -1
    loop do
      current_index += step.call
      raise StopIteration unless current_index &lt; self.length
      yield self[current_index]
    end
  end
end
</pre><p>The isect_wrapped_with_skips method depends on defining a skip iterator for Array. The arguments passed to enumerable.enum_for(:method, *args) are evaluated once, when the enumerator is created, so reassigning a local variable afterwards does not change what the enumerator sees. To let the step vary between calls to enumerator.next, each enumerator is therefore given a lambda that reads the current step from a local variable. Note also that you cannot use enumerator.first to set the initial values: it does not advance the position used by enumerator.next, so the first call to enumerator.next would return the first element again. This seems to me to be a bug but I&#039;ll have to investigate further.</p><p>When I put these routines through your timing routines they appear to be quite fast but I haven&#039;t had time to compare them directly with your versions.</p><p>I&#039;m very glad to have found your blog. It has been very stimulating. Thanks!</p><p>btw. My reply to your post (<a
href="http://www.skorks.com/2010/03/writing-a-more-ruby-ish-array-intersection-function-and-sorting-structs/" rel="nofollow">http://www.skorks.com/2010/03/writing-a-more-ruby-ish-array-intersection-function-and-sorting-structs/</a>) contained several errors. I wrote that code off the top of my head and failed to test it at all before submitting it. :-( The current code does work in general but there are edge cases that need to be tested before I would use it in any production code.</p> ]]></content:encoded> </item> </channel> </rss>
