Unable to index BAM file after sorting
5.9 years ago
sunnykevin97 ▴ 990

Hi, after sorting I am not able to index the BAM file; samtools shows the error below. How can I fix it?

samtools index -b -@ 4 A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam

[E::hts_idx_push] unsorted positions on sequence #1: 1 followed by 0
samtools index: failed to create index for "A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam"

thanks!

alignment next-gen • 7.5k views

As finswimmer said in this post: How to specify the sort based on name in samtools sort?

You can only index the BAM file when it is coordinate-sorted.
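
Whether the records really are in coordinate order can be tested on the records themselves rather than taken from the file name. The following is a minimal sketch, not part of samtools: check_coordinate_sorted is a hypothetical helper name, and it assumes SAM text on standard input (for example from samtools view). It only checks that positions never decrease within a reference and that no reference name reappears after another one has started; it does not compare against the @SQ order in the header.

```shell
# Hypothetical helper: reads SAM records on stdin, prints "sorted" and
# returns 0 if they look coordinate-sorted, otherwise reports the first
# offending record and returns 1.
check_coordinate_sorted() {
    awk -F'\t' '
        /^@/ { next }                      # ignore header lines, if any
        {
            n++
            rname = $3                     # reference name (column 3)
            pos = $4 + 0                   # 1-based position (column 4)
            if (n > 1 && rname == prev_rname) {
                if (pos < prev_pos) {
                    printf "unsorted at record %d: %s:%d follows %s:%d\n", n, rname, pos, prev_rname, prev_pos
                    bad = 1
                    exit
                }
            } else {
                if (rname in seen) {
                    printf "unsorted at record %d: %s reappears\n", n, rname
                    bad = 1
                    exit
                }
                seen[rname] = 1
            }
            prev_rname = rname
            prev_pos = pos
        }
        END {
            if (bad) exit 1
            print "sorted"
        }
    '
}

# Possible usage (requires samtools):
#   samtools view A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam | check_coordinate_sorted
```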

I tried to index the sorted bam file

Hello sunnykevin97,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

Can you show the sort command?

samtools sort -@ 4 A-DS1546_1__R1_dedup_2_ReadGroups.bam -o A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam
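
For reference, the sort and index steps from this thread can be chained so that indexing only runs when sorting exits successfully. This is a minimal sketch, assuming a samtools build that accepts the options already used above (sort -@ and -o, index -b and -@); the function name sort_and_index and the default of 4 threads are illustrative choices.

```shell
# Hypothetical wrapper around the two commands shown in this thread.
# Usage: sort_and_index input.bam output.bam [threads]
sort_and_index() {
    in_bam=$1
    out_bam=$2
    threads=${3:-4}                        # default thread count is an assumption
    # "&&" ensures samtools index is skipped if samtools sort fails
    samtools sort -@ "$threads" -o "$out_bam" "$in_bam" &&
        samtools index -b -@ "$threads" "$out_bam"
}

# Possible usage:
#   sort_and_index A-DS1546_1__R1_dedup_2_ReadGroups.bam A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam 4
```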

Does your A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam contain any data (i.e. is it not empty)?

It generated a sorted BAM file of ~70 GB.

I shortened your comment, since you posted exactly the same thing in reply to ATpoint, and he had already taken the time to reformat it.

This is a SAM file, not a BAM file.

Come on, show some effort. Did you try anything, did you look at the data? Please show some lines of the file.

samtools view A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam | head

C0V63ACXX120821:6:2101:1050:8813    73  chr1    0   0   101M    =   0   0   NTTCCCGTGGGGGTGTGGCTAGGCGAGACGTCATGAGCTACACTTGGAGTGTGTGCTTGATGCCAGCTCTCTTTGATCAGCGATGATTTAGAAGGCGTATT   #1=BDFFDFHHHH:@FDHGIIIIAGBA/;ECHHEHAEDEFDBDEEEC;>(;;@?A@AC<<3>A::A?ACCCCCCCACCACCB5A@>:BD>ACCCBBB9>@>   X0:i:3  X1:i:5  MD:Z:0  XG:i:0  AM:i:0  NM:i:0  SM:i:0  XM:i:3  XN:i:100    XO:i:0  XT:A:N  RG:Z:A-DS1546_1__R1
C0V4CACXX120821:1:1107:15183:40205  89  chr1    0   0   101M    =   0   0   ATTTCCCGTGGGGGTTGGGCTAGGCGAGACGTCATGAGCTACACTTGGAGTGTGTGCTTGATGCCAGCTCTCTTTGATCAGCGATGATTTAGAAGGCGTAT   ###########################################AEGHCFB@FAD?B@BB9B3:)?:31C?919@@GF?8<<GHED>IIDFBHHDAABA;;?   X0:i:3  X1:i:0  XA:Z:chrX,-51621000,101M,5;gi|117937807|gb|DQ188829.2|,+142,101M,5; MD:Z:0  XG:i:0  AM:i:0  NM:i:0  SM:i:0  XM:i:5  XN:i:100    XO:i:0  XT:A:N  RG:Z:A-DS1546_1__R1
C0V4CACXX120821:1:2202:17464:95285  163 chr1    3000001 29  15S20M1I46M19S  =   3000233 333 ATAATCTGTGTCACCTTTCCGAGGCTACCACTTCCAAGGACTACAGGAGTCGTCTCAGGTGAAAGACCATACGAAAGAGGCCCGGTTCGTACGGTAAAACA   ;@@ADD?DF4ADDBE<GGCHG<C1E=<FHFDC?DBBB3;BD;?/9B9DGHCAFFDDDDA7@.??E>DE@C@3;(6'9A?######################   MD:Z:21C33A10   XG:i:1  AM:i:29 NM:i:3  SM:i:29 XM:i:2  XO:i:1  XT:A:M  RG:Z:A-DS1546_1__R1
C0V4CACXX120821:1:2316:15160:19446  99  chr1    3000001 29  3S20M1I76M1S    =   3000256 356 ACCTTTCCGAGGCTACCACTTCCAACGACTACAGGAGTCGTCTCAGGTGAAAGACCATAAGAAAGAGGCCTGGTGTTTATGTATAAAAAGGTAATTATTAA   @@@DFFFFH>FADGIJJGGGIHHEHGIEHGGJIJIDIBFGFHEHIGG8@GHCIIHHHHHC?>CCDC>?ABDDD;?9?CC<>>@CCACCCBD>C::>@@###   MD:Z:96 XG:i:1  AM:i:29 NM:i:1  SM:i:29 XM:i:0  XO:i:1  XT:A:M  RG:Z:A-DS1546_1__R1
C0V63ACXX120821:6:2206:5185:69456   163 chr1    3000001 29  62S20M1I18M =   3000173 273 CTTCAACTATTTATACGATGTACCAATGAACACCTTCATCAAGCTCGATAATCTGTGTCACCTTTCCGAGGCTACCACTTCCAACGACTACAGGAGTCGTC   CCCFFFFFHHHHHJJJJIJJHHIJJJIIJJJJJJJJJJJJIJJJIJJJJFHIIJJIJJJJJJIJJJJIJJHFFFEECEEEDDCDDDDDDDDDDDDD?CBB?   MD:Z:38 XG:i:1  AM:i:29 NM:i:1  SM:i:29 XM:i:0  XO:i:1  XT:A:M  RG:Z:A-DS1546_1__R1
C0V63ACXX120821:7:1106:4563:28548   163 chr1    3000001 29  31S20M1I14M35S  =   3000035 135 ACCTTCATCAAGCTCGATAATCTGTGTCACCTTTCCGAGGCTACCACTTCCAACGACTACAGGAGTCGTCTCAGGTGAAAGACCATAAGAAAGAGGCCTGG   CCCFFFFFGHHHHJJJJJJJJJJGGGIJJJJJJJJJJJIJJGIIJGIIIJGHIGIIJJHFHHFFFEFDDDDDEDD@CDACDDDCDDCDDCDDCDDDDDCDB   MD:Z:34 XG:i:1  AM:i:29 NM:i:1  SM:i:29 XM:i:0  XO:i:1  XT:A:M  RG:Z:A-DS1546_1__R1
C0V63ACXX120821:7:2206:5222:58210   163 chr1    3000001 29  62S20M1I18M =   3000225 325 CTTCAACTATTTATACGATGTACCAATGAACACCTTCATCAAGCTCGATAATCTGTGTCACATTTCCGAGGCTACCACTTCCAAGGACTACAGGAGTCGTC   @@<+B?DDHHGAHIIIBGC@CCFHJHGFGHGGIJJJIGIAHFHBGIIG@FD<FH@<FBFFF).8CDF=BEED??CECEECCCAC>=?A:?CCA<A######   MD:Z:21C16  XG:i:1  AM:i:29 NM:i:2  SM:i:29 XM:i:1  XO:i:1  XT:A:M  RG:Z:A-DS1546_1__R1
C0V63ACXX120821:7:2208:17232:6944   99  chr1    3000001 29  26S20M1I54M =   3000209 309 CATCAAGCTCGATAATCTGTGTCACCTTTCCGAGGCTACCACTTCCAAGGACTACAGGAGTCGTCTCAGGTGAAAGACCATAAGAAAGAGTCCTGGTGTTT   CCCFFFFFHHHHHJJJJJFHGHJJJJJJJJIHFHGIJJJJJJJJJJIGHIIIIEHHJIGICEHBFFFFEE>ACEC>CDDDDDDDDDDDDC@CDDDD3<<@<   MD:Z:21C41G10   XG:i:1  AM:i:29 NM:i:3  SM:i:29 XM:i:2  XO:i:1  XT:A:M  RG:Z:A-DS1546_1__R1
C0V63ACXX120821:7:2210:18879:53607  99  chr1    3000001 29  59S20M1I21M =   3000217 317 CAACTATTTATACGATGTACCAATGAACACCTTCATCAAGCTCGATAATCTGTGTCACCTTTCCGAGGCTACCACTTCCAACGACTACAGGAGTCGTCTCA   @@@DDDDDHA?DFD7CB3AFAHFB3A<+A;;@GG@<EDDEHICE)0B@F<BBBDHCDBAFEGB=@AA=E?D);;36;>C:>CBBB/':A>3(23?<22??:   MD:Z:41 XG:i:1  AM:i:29 NM:i:1  SM:i:29 XM:i:0  XO:i:1  XT:A:M  RG:Z:A-DS1546_1__R1
C0V63ACXX120821:7:2309:11600:84812  163 chr1    3000004 60  17M1I83M    =   3000299 396 CCGAGGCTACCACTTCCAACGACTACAGGAGTCGTCTCAGGTGAAAGACCATAAGAAAGAGGCCTGGTGTTTATGTATAAAAAGGTAATTATTAATAAGTT   @CCFFFFDHHGFHJJJJIIIJJJIJJJJJ@FGIHHJJJJJJFHEGIJIJIEGHIJJJHFEHBFFFDD;A=?CCDDDFDDDDDDDD3>CDDEDDCCDDCAD:   X0:i:1  X1:i:0  MD:Z:93G6   XG:i:1  AM:i:37 NM:i:2  SM:i:37 XM:i:1  XO:i:1  XT:A:U  RG:Z:A-DS1546_1__R1

Folks, is it normal for those first two entries to not have the unmapped flag set (the one that -f 4 selects), but to have a mapping coordinate of 0?

It seems the mate is unmapped for the first two entries. But if the read itself is mapped, it should have a position.

Can we see the alignment command line, please?
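
The flags of the first two records (73 and 89) can be decoded bit by bit to check this. samtools flags does the same job; the function below is only an illustrative sketch in plain shell. The name decode_flag and the lowercase bit labels are made up for this example, while the bit values follow the SAM specification.

```shell
# Hypothetical helper: print the names of the bits set in a SAM FLAG value.
decode_flag() {
    flag=$1
    out=""
    for pair in \
        "1:paired" "2:proper_pair" "4:unmapped" "8:mate_unmapped" \
        "16:reverse" "32:mate_reverse" "64:read1" "128:read2" \
        "256:secondary" "512:qc_fail" "1024:duplicate" "2048:supplementary"
    do
        bit=${pair%%:*}                    # numeric bit value before the colon
        name=${pair#*:}                    # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"       # append with a comma separator
        fi
    done
    printf '%s\n' "$out"
}

# Possible usage:
#   decode_flag 73
#   decode_flag 89
```

Here decode_flag 73 gives paired,mate_unmapped,read1 and decode_flag 89 gives paired,mate_unmapped,reverse,read1. Neither value has the unmapped bit (4) set, even though both records show position 0.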

Hello, this is the command line they used:

@SQ SN:gi|117937807|gb|DQ188829.2|  LN:17010
@RG ID:A-DS1546_1__R1   PL:Illumina PU:01   LB:Library  SM:DS1546
@PG ID:bwa  PN:bwa  VN:0.7.8-r455   CL:/groups/reich/sw/bwa-0.7.8/bwa sampe /groups/reich/reference-genomes/loxAfr3_mod_mt/bwa-0.7.8/loxAfr3com.v2.fa aln.sai1 aln.sai2 R1.trim.fastq.gz R2.trim.fastq.gz

Could you check whether your BAM file is truncated or corrupt?

How to systematically check if a bam file is truncated
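
samtools quickcheck -v file.bam is the usual first test for truncation. One thing a complete BAM file must have is the 28-byte empty BGZF block at its very end; the sketch below checks only for that marker, using standard tools. has_bgzf_eof is a hypothetical helper name, and passing this check does not prove that the rest of the file is intact.

```shell
# Hypothetical helper: return 0 if the file ends with the BGZF EOF marker
# (the fixed 28-byte empty block defined in the SAM/BAM specification).
has_bgzf_eof() {
    # Marker bytes in hex, grouped in fours for readability
    expected="1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
    expected=$(printf '%s' "$expected" | tr -d ' ')
    # Last 28 bytes of the file as a continuous lowercase hex string
    actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \t\n')
    [ "$actual" = "$expected" ]
}

# Possible usage:
#   has_bgzf_eof A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam && echo "EOF marker present"
```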

How was the unsorted BAM file generated? Is it the direct output of an aligner, or was it converted from a SAM file? Check that the file has a header:

samtools view -H sortedBamFile.bam

They used BWA sampe for the alignment. I downloaded 7 sorted BAM files from an NCBI BioProject. I have no problems with the other BAM files, only with this one. When I try to call variants with this BAM file, the problem persists. How can I fix it?

Bam file before sorting

samtools view -H A-DS1546_1__R1_dedup_2_ReadGroups.bam | head

@HD VN:1.0  SO:coordinate
@SQ SN:chr1 LN:214701375
@SQ SN:chr10    LN:103745917
@SQ SN:chr11    LN:68864707
@SQ SN:chr12    LN:83603808
@SQ SN:chr13    LN:103893473
@SQ SN:chr14    LN:70853943
@SQ SN:chr15    LN:88640724
@SQ SN:chr16    LN:45784276
@SQ SN:chr17    LN:69133140

Bam file after sorting

samtools view -H A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam | head

@HD VN:1.0  SO:coordinate
@SQ SN:chr1 LN:214701375
@SQ SN:chr10    LN:103745917
@SQ SN:chr11    LN:68864707
@SQ SN:chr12    LN:83603808
@SQ SN:chr13    LN:103893473
@SQ SN:chr14    LN:70853943
@SQ SN:chr15    LN:88640724
@SQ SN:chr16    LN:45784276
@SQ SN:chr17    LN:69133140
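
Both headers declare SO:coordinate, but that tag is only a declaration in the @HD line and does not by itself guarantee that the records are in order. A small sketch for pulling the declared sort order out of a header follows; the function name declared_sort_order is hypothetical, and it reads header text on standard input, for example from samtools view -H.

```shell
# Hypothetical helper: print the SO: value from the @HD header line,
# or "unknown" if there is no @HD line or no SO tag.
declared_sort_order() {
    awk '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unknown" }
    '
}

# Possible usage:
#   samtools view -H A-DS1546_1__R1_dedup_2_ReadGroupssorted.bam | declared_sort_order
```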