Sorting mixed lists of numbers and strings
June 26th, 2009
programming, tech
Imagine you have this list:
fname_0006.v0_word 2
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0005.v0_word 24

Imagine further that you want to sort it. Unfortunately, I can't get GNU sort to let me specify which fields are numeric and which are not. That is, I can do:
$ cat file.txt | sort
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 24
fname_0005.v0_word 8
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12

Or I can do:
$ cat file.txt | sort -n
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 24
fname_0005.v0_word 8
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12

Or I can do:
$ cat file.txt | sort -n -k1,1 -k2,2
fname_0006.v0_word 2
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0005.v0_word 24

You might think this would work:
$ cat file.txt | sort -k1,1 -kn2,2
fname_0006.v0_word 2
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0005.v0_word 24

But nothing seems to make it do the right thing. So I abandoned sort for Python:
$ cat simple_sorter.py
import fileinput

def tidy(x):
    # Turn numeric fields into ints so they compare as numbers;
    # leave everything else as a string.
    try:
        return int(x)
    except ValueError:
        return x

line_bits = []
for line in fileinput.input():
    line_bits.append([tidy(field) for field in line.split()])

# Lists compare element by element, so this sorts on the first
# field as a string and breaks ties on the second as an integer.
for bits in sorted(line_bits):
    print(" ".join(str(bit) for bit in bits))

$ cat tmp.txt | python simple_sorter.py
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0005.v0_word 24
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12
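If you already know which columns are numeric, a key function can say so explicitly instead of guessing field by field. This is only a sketch of that variation, not part of the script above: the name `sort_key` and the sample `lines` are made up for illustration, and it assumes every line has exactly two whitespace-separated fields, a string followed by an integer.

```python
def sort_key(line):
    # Assumption: exactly two fields per line, a name and an integer.
    # The name is compared as a string, the number as an int.
    name, number = line.split()
    return (name, int(number))

# Hypothetical sample input, a subset of the list in this post.
lines = [
    "fname_0006.v0_word 2",
    "fname_0005.v0_word 24",
    "fname_0005.v0_word 8",
    "fname_0001.v0_word 15",
]

sorted_lines = sorted(lines, key=sort_key)
for line in sorted_lines:
    print(line)
```

The trade-off is strictness: where `tidy` quietly leaves a non-numeric field as a string, this version raises ValueError on a line whose second field is not an integer. That may be what you want on Python 3, where comparing an int with a str during a sort raises TypeError anyway.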
Update 2013-08-22: Thinking now, if I had to do it on the terminal I would do:
$ cat file | awk '{print $1, $2+1000}' | sort | awk '{print $1, $2-1000}'
fname_0001.v0_word 15
fname_0002.v0_word 23
fname_0003.v0_word 5
fname_0003.v0_word 7
fname_0005.v0_word 8
fname_0005.v0_word 24
fname_0006.v0_word 2
fname_0006.v0_word 9
fname_0007.v0_word 11
fname_0007.v0_word 12

Adding 1000 (or any power of ten bigger than your biggest number) gives every number the same count of digits, so a plain string sort orders them correctly. It's basically decorate-sort-undecorate.
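The same decorate-sort-undecorate idea can be sketched in Python, which also shows why the plain string sort goes wrong in the first place. This is an illustration under assumptions, not a translation of the awk pipeline: it zero-pads instead of adding 1000, the name `dsu_sort` and its `width` parameter are invented here, and it assumes two fields per line with a non-negative integer of at most `width` digits in the second.

```python
# Strings compare character by character, so "24" sorts before "8".
print(sorted(["8", "24"]))  # ['24', '8']

def dsu_sort(lines, width=4):
    # Assumption: two fields per line, the second a non-negative
    # integer with at most `width` digits.

    # Decorate: zero-pad the number so string order matches numeric order.
    decorated = []
    for line in lines:
        name, number = line.split()
        decorated.append("%s %s" % (name, number.zfill(width)))

    # Sort: plain string comparison of whole lines.
    decorated.sort()

    # Undecorate: drop the padding by round-tripping through int.
    result = []
    for line in decorated:
        name, padded = line.split()
        result.append("%s %d" % (name, int(padded)))
    return result

for line in dsu_sort(["fname_0005.v0_word 24", "fname_0005.v0_word 8"]):
    print(line)
```

Zero-padding and adding a power of ten do the same job: both make every number the same width before the string comparison happens.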