<?xml version="1.0" encoding="UTF-8"?><rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
> <channel><title>Comments on: The Most Common Things You Do To A Large Data File With Bash</title> <atom:link href="http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/feed/" rel="self" type="application/rss+xml" /><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/</link> <description>For the betterment of the software craft...</description> <lastBuildDate>Mon, 21 Nov 2011 13:57:06 +0000</lastBuildDate> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.1.2</generator> <item><title>By: lre</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-7456</link> <dc:creator>lre</dc:creator> <pubDate>Mon, 12 Sep 2011 17:57:10 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-7456</guid> <description>Just stumbled across this post, and have to add
awk &#039;NR==1 { next} # Do not hold first line
hold!~/^$/ { print hold } # If held line is non-empty, print it
{ hold= $0 } # Hold line (thus, last line is not printed)
&#039;  inputfile &gt; outputfile</description> <content:encoded><![CDATA[<p>Just stumbled across this post, and have to add</p><p>awk &#8216;NR==1 { next} # Do not hold first line<br
/> hold!~/^$/ { print hold } # If held line is non-empty, print it<br
/> { hold= $0 } # Hold line (thus, last line is not printed)<br
/> &#8216;  inputfile &gt; outputfile</p> ]]></content:encoded> </item> <item><title>By: Alan Skorkin</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3821</link> <dc:creator>Alan Skorkin</dc:creator> <pubDate>Mon, 08 Mar 2010 13:43:22 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3821</guid> <description>I just love the names that unix and linux people come up with for their utilities :). Apparently &quot;buthead&quot; is a Program to copy all but the first N lines of standard input to standard output.</description> <content:encoded><![CDATA[<p>I just love the names that unix and linux people come up with for their utilities :). Apparently &#8220;buthead&#8221; is a Program to copy all but the first N lines of standard input to standard output.</p> ]]></content:encoded> </item> <item><title>By: Barak A. Pearlmutter</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3820</link> <dc:creator>Barak A. Pearlmutter</dc:creator> <pubDate>Mon, 08 Mar 2010 13:39:25 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3820</guid> <description>apt-get install buthead</description> <content:encoded><![CDATA[<p>apt-get install buthead</p> ]]></content:encoded> </item> <item><title>By: Dew Drop &#8211; March 7, 2010 &#124; Alvin Ashcraft&#39;s Morning Dew</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3804</link> <dc:creator>Dew Drop &#8211; March 7, 2010 &#124; Alvin Ashcraft&#39;s Morning Dew</dc:creator> <pubDate>Sun, 07 Mar 2010 14:24:13 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3804</guid> <description>[...] The Most Common Things You Do To A Large Data File With Bash (Alan Skorkin) [...]</description> <content:encoded><![CDATA[<p>[...] The Most Common Things You Do To A Large Data File With Bash (Alan Skorkin) [...]</p> ]]></content:encoded> </item> <item><title>By: Alan Skorkin</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3802</link> <dc:creator>Alan Skorkin</dc:creator> <pubDate>Sun, 07 Mar 2010 11:09:12 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3802</guid> <description>Hi Peter,
You&#039;re right, they are programming languages in their own right, but you can easily pipe inputs to them and pipe outputs out of them, and in that way they are very much like shell tools (e.g. head, tail, etc.).
So while we&#039;re using sed and awk, we&#039;re not writing full-fledged scripts in them but instead are piping data through them and allowing them to perform little bits of functionality, all on the command line, as per the Unix philosophy.</description> <content:encoded><![CDATA[<p>Hi Peter,</p><p>You&#8217;re right, they are programming languages in their own right, but you can easily pipe inputs to them and pipe outputs out of them, and in that way they are very much like shell tools (e.g. head, tail, etc.).</p><p>So while we&#8217;re using sed and awk, we&#8217;re not writing full-fledged scripts in them but instead are piping data through them and allowing them to perform little bits of functionality, all on the command line, as per the Unix philosophy.</p> ]]></content:encoded> </item> <item><title>By: Peter Cable</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3801</link> <dc:creator>Peter Cable</dc:creator> <pubDate>Sun, 07 Mar 2010 08:33:23 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3801</guid> <description>I&#039;m unclear why you even mention bash here. Sed, awk and ruby are interpreters for their respective programming languages and they are doing the actual work here, not bash.
You could accomplish all these tasks in bash, but I think it would be messy.</description> <content:encoded><![CDATA[<p>I&#8217;m unclear why you even mention bash here. Sed, awk and ruby are interpreters for their respective programming languages and they are doing the actual work here, not bash.</p><p>You could accomplish all these tasks in bash, but I think it would be messy.</p> ]]></content:encoded> </item> <item><title>By: Alan Skorkin</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3800</link> <dc:creator>Alan Skorkin</dc:creator> <pubDate>Sun, 07 Mar 2010 07:23:05 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3800</guid> <description>I didn&#039;t even consider that you can pipe stuff to ruby, although I am not sure why I didn&#039;t consider it, since it makes perfect sense :). And having said that, the example you give looks very similar to how you can pipe stuff to perl on the command line, but I never tend to use it since my perl skills are abysmal. But as you say, for simpler things I prefer to use the simple tools that are built into the shell and only take it up a notch (with perl or ruby) for more complex stuff.</description> <content:encoded><![CDATA[<p>I didn&#8217;t even consider that you can pipe stuff to ruby, although I am not sure why I didn&#8217;t consider it, since it makes perfect sense :). And having said that, the example you give looks very similar to how you can pipe stuff to perl on the command line, but I never tend to use it since my perl skills are abysmal. But as you say, for simpler things I prefer to use the simple tools that are built into the shell and only take it up a notch (with perl or ruby) for more complex stuff.</p> ]]></content:encoded> </item> <item><title>By: Korny</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3799</link> <dc:creator>Korny</dc:creator> <pubDate>Sun, 07 Mar 2010 05:38:57 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3799</guid> <description>I always use &#039;head&#039; and &#039;tail&#039; to trim lines, and &#039;grep&#039; to do things like remove internal blank lines.
For more complex things I have used &#039;sed&#039;, but I always have to look it up again when I need it - which is a bit of a showstopper.
Instead, I try to re-use the tool I&#039;m most familiar with - ruby.  Consider:
cat my_data &#124; ruby -pe &#039;next if $_ == &quot;\n&quot; &#039;
the &#039;-p&#039; means &#039;loop and print every line&#039;, and the call to &quot;next&quot; skips out of the loop before the printing.  Alternatively, you can use &#039;-n&#039; which loops without printing.
Or you can just roll your own loop:
cat my_data &#124; ruby -e &#039;$stdin.each {&#124;l&#124; puts l unless l == &quot;\n&quot;}&#039;
Or to truncate your first and last lines: (unfortunately loading everything into memory)
cat my_data &#124; ruby -e &#039;$stdin.to_a[1...-1].each {&#124;l&#124; puts l unless l == &quot;\n&quot;}&#039;
... though realistically, for such a simple example I&#039;d probably use head, tail, and grep!</description> <content:encoded><![CDATA[<p>I always use &#8216;head&#8217; and &#8216;tail&#8217; to trim lines, and &#8216;grep&#8217; to do things like remove internal blank lines.<br
/> For more complex things I have used &#8216;sed&#8217;, but I always have to look it up again when I need it &#8211; which is a bit of a showstopper.<br
/> Instead, I try to re-use the tool I&#8217;m most familiar with &#8211; ruby.  Consider:<br
/> cat my_data | ruby -pe &#8216;next if $_ == &#8220;\n&#8221; &#8216;<br
/> the &#8216;-p&#8217; means &#8216;loop and print every line&#8217;, and the call to &#8220;next&#8221; skips out of the loop before the printing.  Alternatively, you can use &#8216;-n&#8217; which loops without printing.<br
/> Or you can just roll your own loop:<br
/> cat my_data | ruby -e &#8216;$stdin.each {|l| puts l unless l == &#8220;\n&#8221;}&#8217;<br
/> Or to truncate your first and last lines: (unfortunately loading everything into memory)<br
/> cat my_data | ruby -e &#8216;$stdin.to_a[1...-1].each {|l| puts l unless l == &#8220;\n&#8221;}&#8217;<br
/> &#8230; though realistically, for such a simple example I&#8217;d probably use head, tail, and grep!</p> ]]></content:encoded> </item> <item><title>By: Alan Skorkin</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3798</link> <dc:creator>Alan Skorkin</dc:creator> <pubDate>Sun, 07 Mar 2010 04:45:50 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3798</guid> <description>Hi Evan,
It certainly looks like it would be, can you tell that my sed skills are sub-par :). Thanks.</description> <content:encoded><![CDATA[<p>Hi Evan,</p><p>It certainly looks like it would be, can you tell that my sed skills are sub-par :). Thanks.</p> ]]></content:encoded> </item> <item><title>By: Evan</title><link>http://www.skorks.com/2010/03/the-most-common-things-you-do-to-a-large-data-file-with-bash/comment-page-1/#comment-3797</link> <dc:creator>Evan</dc:creator> <pubDate>Sun, 07 Mar 2010 04:43:28 +0000</pubDate> <guid
isPermaLink="false">http://www.skorks.com/?p=1388#comment-3797</guid> <description>Wouldn&#039;t &quot;sed -e 1d -e &#039;$d&#039; -e &#039;/^$/d&#039; input_file &gt; output_file&quot; be a bit more efficient?  Save that large file having to go through two pipes and two extra processes!</description> <content:encoded><![CDATA[<p>Wouldn&#8217;t &#8220;sed -e 1d -e &#8216;$d&#8217; -e &#8216;/^$/d&#8217; input_file &gt; output_file&#8221; be a bit more efficient?  Save that large file having to go through two pipes and two extra processes!</p> ]]></content:encoded> </item> </channel> </rss>
