Sort Files Like A Master With The Linux Sort Command (Bash)

If you do your development work in Linux, there are certain commands that you owe it to yourself to master fully. There are a number of these, the main ones being grep, find and sort. Just about everyone has at least a passing familiarity with these commands, but for most people the knowledge is superficial; they don't even realise how powerful those commands can be. So, if you really put in the effort to master them, not only will you make your own life much easier, but you will also be able to impress all your friends with your elite Linux skills when you pair with them :). I will cover grep and find (as well as other valuable commands) in subsequent posts; here we will concentrate on sort.

Note: I am using bash, so your mileage might vary if you're using a different shell.

Sorting is a fundamental task when it comes to programming. If you have a decent knowledge of various sorting algorithms, their advantages and disadvantages, you will be a better software developer for it. However, often enough you just don't need to draw on this deeper knowledge. Whether you're answering an interview question about sorting or simply need to quickly sort some data in your day-to-day work, the Linux sort command is your friend.

The extent of most people's knowledge ends with:

sort some_file.txt

Which is fair enough. You rarely need to dig deeper, the default behaviour will usually do what you need, and when it doesn't we have Ruby or Perl and can hack something together. Well, I hope that we're somewhat more curious than your average developer :). As much as we like hacking things together, if a tool can already do all the work for us, we want to know about it. We can look at the man page for sort and discover all sorts of interesting bits, but even the man page is not really clear on the more advanced aspects of sort usage. It helps to have examples to truly grok the kinds of stuff you can do with sort, so let's have a look.

Sorting Basics

To start with we have the following file:

alan@alan-ubuntu-vm:~/tmp/sort$ cat letters.txt
b
D
c
A
C
B
d
a

We'll do the most basic sort first:

alan@alan-ubuntu-vm:~/tmp/sort$ sort letters.txt
a
A
b
B
c
C
d
D

Looks good, how about doing it in reverse:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -r letters.txt
D
d
C
c
B
b
A
a

Also easy, but what if we want to be case insensitive? Hang on a sec, according to the output it's already case insensitive. But the man page has an option for this:

-f, --ignore-case
              fold lower case to upper case characters

If sort is case insensitive by default, what is this option for? Well, that one is a bit of a gotcha. It looks like GNU sort is case insensitive by default, but what we are really seeing is the collation order of the current locale. The man page also contains the following:

*** WARNING *** The locale specified by the  environment  affects  sort
       order.  Set LC_ALL=C to get the traditional sort order that uses native
       byte values.

What this means is that we need to set the LC_ALL environment variable to C to get the behaviour that we would expect from sort (i.e. capital letters before non-capitals). Let's try that:

alan@alan-ubuntu-vm:~/tmp/sort$ export LC_ALL=C
alan@alan-ubuntu-vm:~/tmp/sort$ sort letters.txt
A
B
C
D
a
b
c
d

That's better, and now our -f option is actually useful:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -f letters.txt
A
a
B
b
C
c
D
d

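That looks ok, but something still seems a little funny: within each letter, the capital always appears before the non-capital. That's because when two lines compare as equal, sort falls back to a last-resort byte-by-byte comparison of the whole line, and capitals come first in byte order. In other words the sort is not stable, but we can make it stable with -s, which disables that fallback: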

alan@alan-ubuntu-vm:~/tmp/sort$ sort -f -s letters.txt
A
a
b
B
c
C
D
d

Now that's exactly what we wanted: it's case insensitive and stable, i.e. if the small letter appeared before the capital when unsorted (and the letters are the same), this order will be the same in the sorted list.
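As a side note, the locale does not have to be exported for the whole session. Here is a minimal sketch that sets LC_ALL for a single sort invocation only; the scratch directory and file are made up for the example and simply mirror letters.txt from above.

```shell
# Scratch copy of letters.txt (hypothetical path, created just for this sketch).
tmpdir=$(mktemp -d)
printf '%s\n' b D c A C B d a > "$tmpdir/letters.txt"

# The variable assignment prefixes the command, so it applies to this one
# sort only; the shell's own locale settings are left alone.
byte_order=$(LC_ALL=C sort "$tmpdir/letters.txt")

# Same idea for the case-insensitive, stable sort.
folded=$(LC_ALL=C sort -f -s "$tmpdir/letters.txt")

printf '%s\n' "$byte_order"
echo ---
printf '%s\n' "$folded"

rm -r "$tmpdir"
```

This keeps other programs in the same shell (ls, your editor) on their usual locale, which is often what you want.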

Ok, but what if we have numbers:

alan@alan-ubuntu-vm:~/tmp/sort$ cat numbers.txt
5
4
12
1
3
56

A normal sort is not what we want:

alan@alan-ubuntu-vm:~/tmp/sort$ sort numbers.txt
1
12
3
4
5
56

But we can fix that:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -n numbers.txt
1
3
4
5
12
56

And, if our lines happen to have some leading blanks:

alan@alan-ubuntu-vm:~/tmp/sort$ cat blank_letters.txt
b
D
   c
A
C
    B
d
a

We can easily ignore those and still sort correctly (using the -b flag):

alan@alan-ubuntu-vm:~/tmp/sort$ sort -f -s -b blank_letters.txt
A
a
b
    B
   c
C
D
d

Of course none of this actually writes the sorted output back to the file, we only get it on standard output. If we want to write it back to the file, we have to redirect the output to a new file and then replace the old file with the new file:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -f -s -b blank_letters2.txt > blank_letters2.sorted
alan@alan-ubuntu-vm:~/tmp/sort$ cat blank_letters2.sorted
A
a
b
    B
   c
C
D
d
alan@alan-ubuntu-vm:~/tmp/sort$ mv blank_letters2.sorted blank_letters2.txt
alan@alan-ubuntu-vm:~/tmp/sort$ cat blank_letters2.txt
A
a
b
    B
   c
C
D
d

As an alternative to redirection, sort also has the -o option:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -f -s -b blank_letters2.txt -o blank_letters2.sorted
alan@alan-ubuntu-vm:~/tmp/sort$ cat blank_letters2.sorted
A
a
b
    B
   c
C
D
d
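The -o option can also name the input file itself, which gives you an in-place sort without the temporary file and mv. A small sketch, assuming GNU sort (which reads all of its input before it opens the output file); the scratch file name is made up for the example.

```shell
# Scratch copy of numbers (hypothetical path, created just for this sketch).
tmpdir=$(mktemp -d)
printf '%s\n' 5 4 12 1 3 56 > "$tmpdir/numbers.txt"

# Input and output are the same file. This is safe because sort finishes
# reading before it opens the -o target for writing.
sort -n -o "$tmpdir/numbers.txt" "$tmpdir/numbers.txt"

in_place=$(cat "$tmpdir/numbers.txt")
printf '%s\n' "$in_place"

# By contrast, 'sort -n file > file' lets the shell truncate the file
# before sort ever reads it, so you end up with an empty file.

rm -r "$tmpdir"
```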

Alright, this is all pretty standard stuff, let's see how we can do something a little bit more fancy.

Advanced Sort Usage

One of the most common use cases when it comes to sort is to pipe its output to uniq, in order to remove any duplicate lines, but this is often not necessary as sort has a uniq-type option built right in (-u):

alan@alan-ubuntu-vm:~/tmp/sort$ cat blank_letters.txt
b
D
c
A
C
B
d
a
alan@alan-ubuntu-vm:~/tmp/sort$ sort -f -s -u blank_letters.txt
A
b
c
D

It even took into account the fact that we wanted to be case insensitive. Compare that to a case-sensitive sort with a uniq option set:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -s -u blank_letters.txt
A
B
C
D
a
b
c
d

Nothing is removed as there are no duplicate lines.
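To make the comparison with uniq concrete, here is a short sketch on a made-up file that does contain a duplicate line. It only illustrates the equivalence; the file name and contents are assumptions for the example.

```shell
# Scratch file with one repeated line ("b" appears twice).
tmpdir=$(mktemp -d)
printf '%s\n' b D c A C B d a b > "$tmpdir/dupes.txt"

# sort -u and sort | uniq should produce the same list of distinct lines.
with_u=$(LC_ALL=C sort -u "$tmpdir/dupes.txt")
with_uniq=$(LC_ALL=C sort "$tmpdir/dupes.txt" | uniq)

# With -f added, lines that differ only in case count as duplicates too,
# so only one line per letter survives.
folded_u=$(LC_ALL=C sort -f -u "$tmpdir/dupes.txt")

printf '%s\n' "$with_u"
echo ---
printf '%s\n' "$folded_u"

rm -r "$tmpdir"
```

Piping to uniq is still worth it when you need something -u cannot give you, such as the per-line counts from uniq -c.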

But what if my lines are a little bit more complicated than just a single letter? What if there are multiple columns and I want to sort by one of those columns (not the first one)? This is also possible:

alan@alan-ubuntu-vm:~/tmp/sort$ ls -al | sort -k5
total 36
-rw-r--r-- 1 alan alan   14 May  9 00:12 numbers.txt
-rw-r--r-- 1 alan alan   16 May  9 00:00 letters.txt
-rw-r--r-- 1 alan alan   16 May  9 00:40 blank_letters.txt
-rw-r--r-- 1 alan alan   20 May  9 00:33 blank_letters2.sorted
-rw-r--r-- 1 alan alan   20 May  9 00:33 blank_letters2.txt
-rw-r--r-- 1 alan alan   20 May  9 00:37 blank_letters3.txt
-rw-r--r-- 1 alan alan   84 May  8 23:15 file1.txt
drwxr-xr-x 3 alan alan 4096 May  8 23:13 ..
drwxr-xr-x 2 alan alan 4096 May  9 00:40 .

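As you can see we sorted the output of ls by the size (the 5th column). This is what the -k option is for. Basically, -k5 tells sort to use a sort key that starts at the 5th column and runs to the end of the line; to restrict the key to the 5th column alone you would write -k5,5 (and -k5,5n to compare it numerically). By default, columns are separated by the transition from a non-blank character to a blank character, so a run of several blanks counts as a single separator and each column carries its leading blanks with it. So in the above example, we told sort to sort from the 5th column onwards, with blanks as the column separator. It works here (with LC_ALL=C still set) because ls right-aligns the sizes, so the text comparison and the numeric order happen to agree.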
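The difference between an open-ended key and a bounded numeric key is easier to see on data that is not neatly aligned. The sketch below uses a made-up three-column file; the names and numbers are purely illustrative.

```shell
# Three columns: name, count, tag. The counts are NOT right-aligned.
tmpdir=$(mktemp -d)
printf '%s\n' 'pear 10 z' 'apple 9 y' 'fig 100 x' > "$tmpdir/counts.txt"

# -k2: key runs from column 2 to the end of the line, compared as text.
# In the C locale " 10 z" < " 100 x" < " 9 y", so 9 ends up last.
text_order=$(LC_ALL=C sort -k2 "$tmpdir/counts.txt" | cut -d' ' -f1)

# -k2,2n: key is column 2 only, compared as a number.
numeric_order=$(LC_ALL=C sort -k2,2n "$tmpdir/counts.txt" | cut -d' ' -f1)

echo "text:    $(printf '%s' "$text_order" | tr '\n' ' ')"
echo "numeric: $(printf '%s' "$numeric_order" | tr '\n' ' ')"

rm -r "$tmpdir"
```

Spelling out both the start and the end of the key is a good habit, since an open-ended key quietly pulls every later column into the comparison.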

But we don't have to be restricted by the default separator, we can specify our own using the -t option. Let's sort the first 10 lines of my /etc/passwd file (promise you won't hack my machine since I am giving it away like that :)) by the 4th column – the group id. As you know the /etc/passwd file uses the : (colon) character as the separator (for more info on the format see here). Here is the output unsorted:

alan@alan-ubuntu-vm:~/tmp/sort$ cat /etc/passwd | head
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh

There is a group id of 65534 in there which should appear last. Let's sort it:

alan@alan-ubuntu-vm:~/tmp/sort$ cat /etc/passwd | head | sort -t: -k4 -n
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
games:x:5:60:games:/usr/games:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync

We had to do a numeric sort since we're dealing with numbers, and we specified : (colon) as the column separator. Strictly speaking, -k4 starts the key at the 4th column and runs to the end of the line; -k4,4 would restrict it to the group id alone. The output is sorted correctly with 65534 being on the last line. Pretty cool! But the fun doesn't stop there, we can sort by multiple columns, one after the other. Consider this list of IP addresses:

alan@alan-ubuntu-vm:~/tmp/sort$ cat ips.txt
192.168.0.25
127.0.0.12
192.168.0.1
127.0.0.3
127.0.0.6
192.168.0.5

Let's sort it by the first column, so that all the addresses starting with 127 go together, and then sort it by the 4th column, to make sure that the IPs are sorted by the last column within each range.

alan@alan-ubuntu-vm:~/tmp/sort$ cat ips.txt | sort -t. -k 1,1n -k 4,4n
127.0.0.3
127.0.0.6
127.0.0.12
192.168.0.1
192.168.0.5
192.168.0.25

We specified the dot as the separator. The -k 1,1n syntax has the following meaning: do a sort by column (-k), start at the beginning of column 1 and go to the end of column 1 (1,1). The n on the end indicates that we want a numeric sort since we are dealing with numbers. The second key, -k 4,4n, is only consulted when two lines have the same first column. That is some powerful stuff, wouldn't you agree?
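The same idea extends to all four parts of the address. The sketch below sorts a made-up list of IPs by each octet in turn; the addresses are invented for the example and are fed in on standard input rather than from ips.txt.

```shell
# One numeric key per octet, consulted left to right; each later key only
# matters when all earlier keys are equal.
sorted_ips=$(printf '%s\n' \
    192.168.0.25 10.0.1.7 127.0.0.12 10.0.0.200 192.168.0.1 127.0.0.3 |
  LC_ALL=C sort -t. -k1,1n -k2,2n -k3,3n -k4,4n)

printf '%s\n' "$sorted_ips"
```

Note how 10.0.0.200 lands before 10.0.1.7: the third octet decides before the fourth is ever looked at.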

Cool/Useful Stuff

There is still more we can do with the sort command. Have you ever wanted to randomize the lines in a file? It is not a common use case, but does come in handy once in a while (if only for testing purposes, sometimes). Well, the sort command has you covered here also with the -R option (that's capital R):

alan@alan-ubuntu-vm:~/tmp/sort$ cat numbers.txt
5
4
12
1
3
56
alan@alan-ubuntu-vm:~/tmp/sort$ cat numbers.txt | sort -R
5
4
1
3
12
56
alan@alan-ubuntu-vm:~/tmp/sort$ cat numbers.txt | sort -R
3
4
1
56
5
12

We get a different order every time, which is what we would expect from randomizing the lines.
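One caveat worth knowing: GNU sort -R orders lines by a random hash of each key, so identical lines always end up next to each other. If the file has duplicates and you want a true shuffle, shuf (also part of GNU coreutils) is the better tool. A small sketch on made-up input, assuming both commands are available:

```shell
# Six lines, but only two distinct values.
# After sort -R the equal lines are adjacent, so uniq collapses them
# to exactly two runs no matter which random order was picked.
runs=$(printf '%s\n' a b a b a b | sort -R | uniq | wc -l)
echo "distinct runs after sort -R: $runs"

# shuf produces a random permutation of the lines instead, so the
# a's and b's can interleave freely.
shuffled=$(printf '%s\n' a b a b a b | shuf)
printf '%s\n' "$shuffled"
```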

If you give sort multiple files on the command line, it will combine the contents of all the files and sort it as a whole:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -n numbers.txt numbers2.txt
1
1
3
4
4
5
7
8
10
12
22
23
26
56
56
68

This is really handy, but sometimes the files you have are already sorted and you just want to merge them. Sort provides the -m option just for this purpose. As long as the input files really are sorted, the output will be exactly the same whether you use -m or not, but merging should be faster:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -n -m numbers1.sorted numbers2.sorted
1
1
3
4
4
5
7
8
10
12
22
23
26
56
56
68
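Here is a self-contained sketch of the same comparison, using two small made-up files that are already in numeric order. The file names are assumptions for the example.

```shell
# Two inputs that are each already sorted numerically.
tmpdir=$(mktemp -d)
printf '%s\n' 1 3 4 5 12 56 > "$tmpdir/a.sorted"
printf '%s\n' 2 4 7 10 > "$tmpdir/b.sorted"

# -m only merges; it trusts that every input is already in order.
merged=$(sort -n -m "$tmpdir/a.sorted" "$tmpdir/b.sorted")

# A full sort of the same inputs, for comparison.
resorted=$(sort -n "$tmpdir/a.sorted" "$tmpdir/b.sorted")

printf '%s\n' "$merged"

rm -r "$tmpdir"
```

The flags given with -m should match the ones the inputs were sorted with (here -n), otherwise the merge compares the lines under a different ordering than the files follow.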

Lastly, if you just want to check if a file is sorted or not, without actually performing the sort, you have the -c option:

alan@alan-ubuntu-vm:~/tmp/sort$ sort -n -c numbers.txt
sort: numbers.txt:2: disorder: 4
alan@alan-ubuntu-vm:~/tmp/sort$ sort -n -c numbers1.sorted
alan@alan-ubuntu-vm:~/tmp/sort$
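The check is most useful in scripts, where you care about the exit status rather than the message. The sketch below wraps it in a small helper; the function name check_sorted and the scratch files are made up for the example, and it uses -C, the quiet variant of -c that prints no diagnostic.

```shell
# Hypothetical helper: report whether a file is in numeric order.
check_sorted() {
  # sort -C exits 0 if the input is sorted and non-zero otherwise,
  # without printing the "disorder" line that -c would.
  if sort -n -C "$1"; then
    echo sorted
  else
    echo unsorted
  fi
}

tmpdir=$(mktemp -d)
printf '%s\n' 5 4 12 > "$tmpdir/unsorted.txt"
printf '%s\n' 4 5 12 > "$tmpdir/sorted.txt"

r1=$(check_sorted "$tmpdir/unsorted.txt")
r2=$(check_sorted "$tmpdir/sorted.txt")
echo "$r1 / $r2"

rm -r "$tmpdir"
```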

There you go, the total awesomeness of the sort command laid bare. If you know any other handy things you can do with sort, do leave a comment. And remember – only use your new-found sort powers for good instead of evil :).

Image by SewPixie (so far behind!)

  • http://www.devilsduke.com Kishore Mylavarapu

    Great work. Linux is ultimate.

  • http://www.pixelbeat.org/ Pádraig Brady

    One can set locale just for the sort (or any) command like: LC_ALL=C sort
    sort can sort files inplace: sort file.txt -o file.txt
    I would not give multi column sorts in examples (sort -k5). better say -k5,5

    • http://www.skorks.com Alan Skorkin

      Ah, I didn’t realise that using -o you could supply the input file as the output, that’s pretty cool. Thanks for sharing that.

  • http://matthew.mceachen.us/blog Matthew McEachen

    This is a very gentle sort tutorial — nice!

    Using sort -n and `find` option, you can get recursive sort-by-mtime:

    http://matthew.mceachen.us/blog/recursive-sort-by-modification-time-5.html

  • http://reprog.wordpress.com/ Mike Taylor

    The column separator is, by default, any blank character.

    Minor bug here: the default column separator is any sequence of whitespace characters.

    Also on the impact of locales: sort is not the only program that respects the LC_ALL environment variable (and related environment variables such as LC_COLLATE); there are plenty of others, of which ls is probably the most irritating. Many modern Linux distributions default the locale to something other than C, so that ls lists files in case-insensitive order: this is a real pain if you’re used to seeing your Makefile and README at the top of the listing, only to find them mingled in with all your hash.c and stack.c. The fix is to export LC_ALL=C in your .bash_profile.

    • cgordi

      so does this mean that I don’t have to indicate tab-separated files explicitly?

  • http://www.skorks.com Alan Skorkin

    Hey Mike,

    Cheers for picking that up and thanks for the extra info.

  • Boel

    What if one has a column (lets say column no 4) that looks like this:
    test65
    test7
    test1
    test13
    test3
    test54

    How does one sort this column based on the numbers after “test”?

    • http://www.pixelbeat.org/ Pádraig Brady

      sort -k1.5n,1
      sort -V

  • Craig

    Is there any way to use negative column values? I have filenames with variable numbers of delimited sections:
    ncom_glbl_gulf_2010071400_t000.tar
    ncom_glbl_north_america_2010071400_t000.tar

    I’d like to sort on the date-time-group, which is a fixed position, second from last. Thanks in advance.

    • http://www.skorks.com Alan Skorkin

      Hmm, I am not sure if you can do that, you may need to get a bit more creative if you want to use sort (rather than resorting to perl or ruby), for example, maybe all your date groups start with 2 and there are no other 2′s anywhere else so you could use 2 as a delimiter instead of the _. Also, maybe you can reverse all your filenames and then sort the reversed versions and then reverse again:

      ls | rev | sort -r | rev

      If nothing like that works, then you may have to go to something with a bit more grunt like ruby or perl, or whatever scripting lang you fancy.

  • Hirdesh Kumar

    Hi,
    The tutorial is nice. But I am still wondering, how can I sort the words of a line, say I have three lines,
    Porter Harry
    Mark Alan
    Fly Butter

    And I want it to sort the words within each line, not the lines themselves. I want something like this:
    Harry Porter
    Alan Mark
    Butter Fly

    • http://www.skorks.com Alan Skorkin

      I probably wouldn’t use the sort command, I’d probably do it with a ruby one liner. Read the file line by line, tokenize each line into array, sort the array and then output. Look up ruby one liners, if you’re still confused, let me know and I’ll give you the exact line :).

      • sakshi

        Please answer this question: what does the command C:\>SORT /R < UNSORT do?

  • Mike

    Alan,

    For what it’s worth, I thought I’d mention that the -R option is not available until after GNU coreutils 5.97 — GNU coreutils 6.10 has sort that supports -R option.

    Thanks for the great write up!
    Best Regards,
    Mike // Silvertip257

    # Proof of what I found…
    [st257@centos54vm sort]$ cat numbers2.dat | sort -R
    sort: invalid option -- R
    Try `sort --help' for more information.
    [st257@centos54vm sort]$ sort --version
    sort (GNU coreutils) 5.97
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software. You may redistribute copies of it under the terms of
    the GNU General Public License .
    There is NO WARRANTY, to the extent permitted by law.

    Written by Mike Haertel and Paul Eggert.

    * Now on an Ubuntu host, running a more recent version of GNU coreutils the -R option is available to randomly sort.
    st257@iron:~$ cat numbers2.dat | sort -R
    12
    56
    1
    3
    4
    5
    st257@iron:~$ sort --version
    sort (GNU coreutils) 6.10
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    Written by Mike Haertel and Paul Eggert.

  • Apud

    Is there a way to get list of duplicate entries in column 2 using sort or uniq?
    something like sort -u -k2 but insted of -u i need duplicate in column 2

    or can I use something like
    sort | uniq -d but get duplicates in column 2 instead

    example data file
    a111 55555
    b222 66666
    c333 55555
    d444 77777

    so the output should be

    a111 55555
    c333 55555

    • Johnnie

      You could do it with the cut command:

      grep -e "`cut -d ' ' -f 2 data_file.txt | sort | uniq -d`" data_file.txt

      • Johnnie

        …actually, this will work if there ARE duplicates in the file. However if there aren’t any duplicates then it will end up running this:
        grep -e "" data_file.txt
        …and grep will recognise "" as a null pattern which will match everything in the file, which is not what you want!

        You could pipe the `cut -d….uniq -d` output to a file instead, and then do grep -f file_with_dups.txt data_file.txt. Or avoid the null pattern by adding something to the -e parameter list which is guaranteed not to match anything in your data file:
        e.g.
        grep -e "$^ `cut -d …

  • Beth

    what if I want to sort ps aux by the time column… if I do the sort as a number it sorts by the minutes but not by the seconds, and since the start column sometimes has dates and sometimes has times I can’t change the breaking point to ‘:’ and specify a column…

    help

  • vasa1

    In the section on IP addresses, you wrote:
    Let’s sort it by the ***first*** column, so that all the addresses starting with 127 go together, and then sort it by the 4th column, to make sure that the IPs are sorted by the last column within each range.

    but later you have …
    We specified the dot as the separator. The *** -k 2,2n *** syntax has the following meaning. Do a sort by column (-k), start at the beginning of column 2 and go to the end of column 2 (2,2).
    Should that be *** -k 1,1n *** instead?

  • Martin

    Hi there,

    any ideas how I sort the columns of a .csv-file by the first line, instead of the lines by any column?

    Cheers and thanks in advance for your advice
    Martin

  • Vivek Gadodia

    Fantastic!

  • popszegecs

    Does anyone know why sort gives this result for these values?
    $ echo -e "psd\n'sd\n\"sd\nssd" | sort -k3
    psd
    'sd
    "sd
    ssd

    • Aaron Newton

      It is returning the unsorted list, since there is no third field. You could try to sort using a subset of the first field (like the first character): echo -e "psd\n'sd\n\"sd\nssd" | sort -k1.1,1.1

      • Nezo

        So Nice ^_^!~

  • taggerbear.devio.us

    Do you know how to sort including "&" (ampersands)? I’m sorting HTML lines but some characters need to be written in a special code, for example "<" is "&lt;"

  • http://www.facebook.com/sean.zicari Sean Zicari

    Very nice, thanks!

  • CiroDuranSantilli

    great tut.

    i think -h deserves to be included here at:

    ls -lh | sort -k5hr

    • Nezar

      ls -lh | sort -k5,5 -hr

      :D

    • imvkb

      needed just this, you are an angel.

  • Sammy

    Nice Explanation. Helped a lot. Thanks for the post.

  • Roly

    Great tute but do you think you can cover the combination of numeric and alpha?.., We have an issue here that we’re not too sure how to go about it. We have a number of files as:

    1 Table of contents
    1A Summary
    2 Introduction
    2A Fact 1
    ..
    ..

    11 Appendix A
    12 Appendix B

    When we look at the files on the server they are sorted/ordered as follows (by default) as:

    1 Table of contents
    11 Appendix A
    12 Appendix B
    2 Introduction
    1A Summary
    2A Fact 1

    Which is obviously incorrect.

    When we look on a windows machine via windows explorer, it is sorted as expected. ie
    1 Table of contents
    1A Summary
    2 Introduction
    2A Fact 1
    ..
    ..

    11 Appendix A
    12 Appendix B

    Any ideas on how we can present files sorted in the same behaviour as Windows?

    Cheers,
    Roly.

    • Nick

      I’ll give it a try when I’m running Ubuntu and using the terminal again, but wouldn’t a normal, alphanumerical sort fix that? A simple sort -n?
      If not, it has to do with the fact that your subcategories are defined by a number AND a letter. Maybe resort to using 1.1 instead of 1A to make it easier for yourself. If you wish to keep it 1A, 2A, etc., I’ll have a look into that. I couldn’t tell off the top of my head.

  • Ranjan

    How can i sort a file based on the last column. I have 4 columns of data in the file, but the third column can have spaces in it, but is bounded by double quotes.

    a b c x

    d e “f f” y
    g h i z

    Now if I sort by -k4 which is the 4th column, it would sort as

    d e “f f” y
    a b c x
    g h i z

    Since in the first line it considered f” as the 4th column. Is there any simpler way to sort by the last column. Or if it is possible to consider strings within double quotes as one string.

    My expected output is the same as input in this case.

  • Secmas

    Want to thank you for your tutorial, it helped me to finish my IP sorts.

  • Nick

    So what about if i wanted to sort a file with numerals and words e.g.

    45 raisins
    15 peaches
    3 pears
    3 lemons
    2 strawberries
    2 apples

    and the outcome should be

    45 raisins
    15 peaches
    3 lemons
    3 pears
    2 apples
    2 strawberries

    So the numerical order (in this case the result of the uniq -c command) stays the same, but those of the same number are ordered alphabetically.
    Also, is there a way to sort a file of text by adjacent pairs of words, e.g.

    My name is John

    then “my name” “name is” and “is John” are pairs. If I wanted to sort an entire text in such pairs, how would I go about that? They can be sorted alphabetically or not, in the end my intention is to use the uniq command to check how much a certain pair occurs.

    • Nick

      For the first one i fixed it by adding | sort -s +1 -2 | sort -s -n -r
      at the end. It worked, but seems like a bit too much. I’m wondering if there’s a shorter/easier way?

      • bcc4foss

        For the first one simply “sort +0 -1nr” should do what you want. Since the ordering of the second field alphabetically is the default behavior, there is no need to include it.

  • Rafael

    sort 1 -3 texto.txt – Sorts the file texto.txt using the second through fourth words (second to fourth fields) on each line as the key.
    sort -t : 2 -3 passwd – Sorts the passwd file using the third through fourth words (third to fourth fields) as the key. Note that the -t option specifies the ":" character as the field delimiter instead of a space. In this case, whatever comes after ":" is considered the next field.

  • Joseph

    Helped a lot, thank you. Is there a way to skip lines starting with a certain regex and then sort the remaining lines? The number of lines containing the regex can vary, but is always present at the beginning of the file.

  • Mario

    You must read the whole paragraph under all the option descriptions, it’s useful. For example, what do you do if you want to sort by the second character of the 4th column? You use sort -k4.2 file. I learned this by reading that paragraph.

  • Superschwul

    note on locales: LC_ALL=C affects more than what we need for sorting. later, i had problems inserting accented characters in vim with it. if setting LC_COLLATE=C instead, all is well.