While intersecting a bam file and bed to obtain the total number of reads that map to a class of genes, bedtools outputs the following error:
bedtools intersect -a some.bam -b LINE.bed -sorted -s -wa -bed -f 1.0 -nonamecheck | wc -l
ERROR: File some.bam has inconsistent naming convention for record:
-1 -1 HWI-ST558:319:HJHNYADXX:1:1101:10003:66276 1:N:0:GATCAG 0 +
137933
I poked around the internet and this appears to occour when chromosome naming conventions do not match. However, the stated solution, option -nonamecheck
did not work. The culprit is (at least) this alignment:
samtools view some.bam | grep 'HWI-ST558:319:HJHNYADXX:2:2104:11288:74456'
HWI-ST558:319:HJHNYADXX:2:2104:11288:74456 0 chr4 61813677 0 27M * 0 0 GACAAATCAAGAAAACCATTAAACAGC FFFFHHDHFHJJJGIGFFFHGIG@GDH XA:i:0 MD:Z:27 RG:Z:1 NM:i:0 XM:i:2
Which to me does not appear to be any different from other alignments in the file:
samtools view some.bam | head
HWI-ST558:319:HJHNYADXX:1:1110:11857:4127 0 chr1 5571 255 29M * 0 0 GCCACATGTGTGGACCAGGTCAACACATA FFFFHHHHHJJJJJJJJJJIIJJJJJIJJ XA:i:2 MD:Z:5G12A10 RG:Z:1 NM:i:2
HWI-ST558:319:HJHNYADXX:1:1210:19976:51309 16 chr1 8405 255 27M * 0 0 GACTCAGTTAGTTTGATCAGTGTGTTT HCHHGIJJJIJIIIGIIJHGHHHFFDF XA:i:0 MD:Z:27 RG:Z:1 NM:i:0
HWI-ST558:319:HJHNYADXX:1:2101:16679:94808 0 chr1 13822 255 31M * 0 0 TTAATTTGCCCTCATAAACTCTGATCAAAGT FFFFHHHHHJJJJJJJJIHJIJJJJJJJJJJ XA:i:0 MD:Z:31 RG:Z:1 NM:i:0
HWI-ST558:319:HJHNYADXX:1:2209:11460:28507 0 chr1 19815 0 24M * 0 0 TGACGTTGTGTGGACATTACCACT DDFFHHHHHJJJJJJJJJJJJJJJXA:i:0 MD:Z:24 RG:Z:1 NM:i:0 XM:i:2
HWI-ST558:319:HJHNYADXX:2:2206:8659:12479 0 chr1 22434 0 27M * 0 0 TAACCGGCACGGGCTCATTCTGAAAGC FEFFHDAFHJJDEGBHIIJ>GGCHIJJ XA:i:3 MD:Z:5A0A3A16 RG:Z:1 NM:i:3 XM:i:2
HWI-ST558:319:HJHNYADXX:2:1116:12702:23989 0 chr1 24671 0 30M * 0 0 ATAGATATTATCAGACACACTGTGAACATT FFFFHFHHHJIJIJJJJJJIJJJJIJJFHI XA:i:1 MD:Z:3A26 RG:Z:1 NM:i:1 XM:i:2
HWI-ST558:319:HJHNYADXX:2:2112:15091:10459 0 chr1 37645 0 29M * 0 0 CTGTGCAGAGCTGCAGCCCTCAAGGAGTT FFADDFFDHEGBDD@GIEIGHEGHC8:FG XA:i:1 MD:Z:21C7 RG:Z:1 NM:i:1 XM:i:2
HWI-ST558:319:HJHNYADXX:1:2101:17286:97920 0 chr1 43096 0 26M * 0 0 CGGTTTGACTGTACAAGTGAAATACC FFFDHHHHHJJIJJJJIHIHEIIIJJ XA:i:0 MD:Z:26 RG:Z:1 NM:i:0 XM:i:2
HWI-ST558:319:HJHNYADXX:2:1108:5430:68222 0 chr1 43273 0 24M * 0 0 TCGTTGAGAAGAGTAGGGACGGTA FFFFHHHHHJJJJGHIJJJJIIGIXA:i:0 MD:Z:24 RG:Z:1 NM:i:0 XM:i:2
HWI-ST558:319:HJHNYADXX:2:2207:2908:11814 16 chr1 43750 255 29M * 0 0 TACGCTTCAGTACGATCACTGATGTTGAT JIJIJJIJJJJJJJJJJIJJHHHHHFFFF XA:i:1 MD:Z:9T19 RG:Z:1 NM:i:1
This happens for all 12 of my bam files (mapped to Danrer7, bowtie-0.12.8*). I am a bit at a lost of what is happening here. Bedtools Version: v2.23.0
bowtie-0.12.8. It is a very old version, but I am using it to be consistent with another set of results that someone else produced.
But this seems to be produced by bedtools intersect, rather than by bowtie, because the space is not present in the read name from the bam. Also, bedToBam returns a proper read name:
Actually, I just tested converting bam to bed before intersecting and it does not throw out any error:
If I am reading this test correctly it might be a bug in intersectBed. I will wait to see if Aaron Quinlan shows up here, otherwise I report it soonish.
But this seems to be produced by bedtools intersect: no because '
1:N:0:GATCAG
is typically produced by a sequencer see: http://en.wikipedia.org/wiki/FASTQ_format