I know that similar questions have been asked about sorting a file by a specific column, but none of them seem to answer my question.
My input file shows coverage in 500 bp windows along my genome, and it looks like this:
OHJ07_1_contig_10 0 500 130 500 500 1.0000000
OHJ07_1_contig_10 500 1000 180 500 500 1.0000000
OHJ07_1_contig_10 1000 1500 171 500 500 1.0000000
OHJ07_1_contig_10 1500 2000 79 380 500 0.7600000
OHJ07_1_contig_10 2000 2500 62 500 500 1.0000000
OHJ07_1_contig_10 2500 3000 96 500 500 1.0000000
OHJ07_1_contig_10 3000 3500 76 500 500 1.0000000
OHJ07_1_contig_10 3500 4000 87 500 500 1.0000000
OHJ07_1_contig_10 4000 4500 60 500 500 1.0000000
OHJ07_1_contig_10 4500 5000 64 500 500 1.0000000
OHJ07_1_contig_10 5000 5468 213 468 468 1.0000000
OHJ07_1_contig_100 0 500 459 500 500 1.0000000
OHJ07_1_contig_100 500 1000 156 500 500 1.0000000
OHJ07_1_contig_100 1000 1314 77 305 314 0.9713376
OHJ07_1_contig_1000 0 500 239 500 500 1.0000000
OHJ07_1_contig_1000 500 1000 226 500 500 1.0000000
OHJ07_1_contig_1000 1000 1500 238 500 500 1.0000000
OHJ07_1_contig_1000 1500 2000 263 500 500 1.0000000
I would like to sort it by contig length. I have another file that has the contig names, already ordered by length, in the first column. This file also contains other information, such as the contig length in column 2 (it was produced with samtools faidx), e.g.:
OHJ07_1_contig_25270 888266 96530655 60 61
OHJ07_1_contig_36751 583964 120924448 60 61
OHJ07_1_contig_44057 504884 134192571 60 61
OHJ07_1_contig_21721 415942 87354744 60 61
OHJ07_1_contig_46339 411691 143341916 60 61
OHJ07_1_contig_44022 330441 133783765 60 61
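The first step is to read the lengths out of the .fai file. Here is a minimal Python sketch. It assumes the standard samtools faidx layout, where column 1 is the contig name and column 2 is its length. The function name `read_fai_lengths` is only illustrative.

```python
def read_fai_lengths(lines):
    """Map contig name -> contig length from samtools faidx (.fai) lines.

    A .fai line holds: name, length, offset, linebases, linewidth.
    Splitting on any whitespace handles both tab- and space-separated input.
    """
    lengths = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        lengths[fields[0]] = int(fields[1])
    return lengths
```

Passing an open file works directly, since iterating over a file yields its lines. For example, `read_fai_lengths(open("genome.fa.fai"))` uses a hypothetical file name.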
Since each contig has a different length and appears in a different number of rows in the coverage file, what is the best way to go about this?
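Every row of the coverage file carries its contig name, so the varying number of rows per contig does not actually get in the way. Each row can be sorted on a key built from its contig's length and its window start. The Python sketch below assumes a dict that maps each contig name to its length (column 2 of the .fai file). The helper name `sort_coverage_by_length` is hypothetical.

```python
def sort_coverage_by_length(coverage_lines, lengths):
    """Return coverage rows ordered by contig length (longest first).

    Rows of the same contig stay together and are ordered by window start
    (column 2). Contigs missing from `lengths` are placed at the end.
    """
    def key(row):
        fields = row.split()
        name, start = fields[0], int(fields[1])
        length = lengths.get(name)
        missing = length is None  # False sorts before True
        # Negate the length so longer contigs come first; the name breaks
        # ties between equal-length contigs so their rows do not interleave.
        return (missing, -(length or 0), name, start)

    return sorted((row for row in coverage_lines if row.strip()), key=key)
```

Take, for instance, `lengths = {"OHJ07_1_contig_10": 5468, "OHJ07_1_contig_100": 1314}`. These lengths are taken from the last window end of each contig in the example above. With them, all rows of contig_10 come out before those of contig_100, each in window order. Because the key uses the length itself rather than the position in the .fai file, the .fai file does not have to be pre-sorted.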
Which column contains the contig lengths?
I see you suggest bash, but to me this is the kind of problem that calls for a custom (Python) script, since it is a bit more complicated than simple text manipulation.
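Following that suggestion, here is a small Python sketch that reproduces the order of the contig names in the .fai file exactly, instead of recomputing it from the lengths. It assumes the .fai file is already sorted by length, as in the example. A standard .fai lists contigs in FASTA order, so `sort -k2,2nr genome.fa.fai` would be needed to sort one by length (`genome.fa.fai` is a hypothetical file name). The function name `reorder_blocks` is also illustrative.

```python
from itertools import groupby


def reorder_blocks(coverage_lines, fai_lines):
    """Return coverage rows (without trailing newlines) grouped by contig,
    with contigs in the order their names appear in the .fai file."""
    rows = (line.rstrip("\n") for line in coverage_lines if line.strip())

    # groupby yields runs of consecutive rows sharing a contig name;
    # setdefault/extend also merges runs if a contig's rows are not adjacent.
    # Within each contig, rows keep their original (window) order.
    blocks = {}
    for name, run in groupby(rows, key=lambda row: row.split()[0]):
        blocks.setdefault(name, []).extend(run)

    ordered = []
    for line in fai_lines:
        fields = line.split()
        if fields and fields[0] in blocks:
            ordered.extend(blocks.pop(fields[0]))

    # Contigs absent from the .fai file are appended in their original order.
    for run in blocks.values():
        ordered.extend(run)
    return ordered
```

As a usage example, open both files with `with open("coverage.txt") as cov, open("genome.fa.fai") as fai:` and then run `for row in reorder_blocks(cov, fai): print(row)` (both file names are hypothetical). This approach moves whole contig blocks instead of sorting individual rows. The window order within each contig is therefore left exactly as it was in the input.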