Issue with naming convention in bedtools
2
4
Entering edit mode
9.6 years ago
A. Domingues ★ 2.7k

While intersecting a bam file and bed to obtain the total number of reads that map to a class of genes, bedtools outputs the following error:

bedtools intersect -a some.bam -b LINE.bed -sorted -s -wa -bed -f 1.0 -nonamecheck | wc -l

ERROR: File some.bam has inconsistent naming convention for record:
      -1      -1      HWI-ST558:319:HJHNYADXX:1:1101:10003:66276 1:N:0:GATCAG 0       +
137933

I poked around the internet and this appears to occour when chromosome naming conventions do not match. However, the stated solution, option -nonamecheck did not work. The culprit is (at least) this alignment:

samtools view some.bam | grep 'HWI-ST558:319:HJHNYADXX:2:2104:11288:74456'
HWI-ST558:319:HJHNYADXX:2:2104:11288:74456      0       chr4    61813677        0       27M     *       0       0       GACAAATCAAGAAAACCATTAAACAGC     FFFFHHDHFHJJJGIGFFFHGIG@GDH      XA:i:0  MD:Z:27 RG:Z:1  NM:i:0  XM:i:2

Which to me does not appear to be any different from other alignments in the file:

samtools view some.bam | head
HWI-ST558:319:HJHNYADXX:1:1110:11857:4127       0       chr1    5571    255     29M     *       0       0       GCCACATGTGTGGACCAGGTCAACACATA   FFFFHHHHHJJJJJJJJJJIIJJJJJIJJ    XA:i:2  MD:Z:5G12A10    RG:Z:1  NM:i:2
HWI-ST558:319:HJHNYADXX:1:1210:19976:51309      16      chr1    8405    255     27M     *       0       0       GACTCAGTTAGTTTGATCAGTGTGTTT     HCHHGIJJJIJIIIGIIJHGHHHFFDF      XA:i:0  MD:Z:27 RG:Z:1  NM:i:0
HWI-ST558:319:HJHNYADXX:1:2101:16679:94808      0       chr1    13822   255     31M     *       0       0       TTAATTTGCCCTCATAAACTCTGATCAAAGT FFFFHHHHHJJJJJJJJIHJIJJJJJJJJJJ  XA:i:0  MD:Z:31 RG:Z:1  NM:i:0
HWI-ST558:319:HJHNYADXX:1:2209:11460:28507      0       chr1    19815   0       24M     *       0       0       TGACGTTGTGTGGACATTACCACT        DDFFHHHHHJJJJJJJJJJJJJJJXA:i:0   MD:Z:24 RG:Z:1  NM:i:0  XM:i:2
HWI-ST558:319:HJHNYADXX:2:2206:8659:12479       0       chr1    22434   0       27M     *       0       0       TAACCGGCACGGGCTCATTCTGAAAGC     FEFFHDAFHJJDEGBHIIJ>GGCHIJJ      XA:i:3  MD:Z:5A0A3A16   RG:Z:1  NM:i:3  XM:i:2
HWI-ST558:319:HJHNYADXX:2:1116:12702:23989      0       chr1    24671   0       30M     *       0       0       ATAGATATTATCAGACACACTGTGAACATT  FFFFHFHHHJIJIJJJJJJIJJJJIJJFHI   XA:i:1  MD:Z:3A26       RG:Z:1  NM:i:1  XM:i:2
HWI-ST558:319:HJHNYADXX:2:2112:15091:10459      0       chr1    37645   0       29M     *       0       0       CTGTGCAGAGCTGCAGCCCTCAAGGAGTT   FFADDFFDHEGBDD@GIEIGHEGHC8:FG    XA:i:1  MD:Z:21C7       RG:Z:1  NM:i:1  XM:i:2
HWI-ST558:319:HJHNYADXX:1:2101:17286:97920      0       chr1    43096   0       26M     *       0       0       CGGTTTGACTGTACAAGTGAAATACC      FFFDHHHHHJJIJJJJIHIHEIIIJJ       XA:i:0  MD:Z:26 RG:Z:1  NM:i:0  XM:i:2
HWI-ST558:319:HJHNYADXX:2:1108:5430:68222       0       chr1    43273   0       24M     *       0       0       TCGTTGAGAAGAGTAGGGACGGTA        FFFFHHHHHJJJJGHIJJJJIIGIXA:i:0   MD:Z:24 RG:Z:1  NM:i:0  XM:i:2
HWI-ST558:319:HJHNYADXX:2:2207:2908:11814       16      chr1    43750   255     29M     *       0       0       TACGCTTCAGTACGATCACTGATGTTGAT   JIJIJJIJJJJJJJJJJIJJHHHHHFFFF    XA:i:1  MD:Z:9T19       RG:Z:1  NM:i:1

This happens for all 12 of my bam files (mapped to Danrer7, bowtie-0.12.8*). I am a bit at a lost of what is happening here. Bedtools Version: v2.23.0

SAM bedtools • 13k views
ADD COMMENT
2
Entering edit mode
9.5 years ago
A. Domingues ★ 2.7k

I have spent some more time finding the source of the issue, and it turns to be something simple:

Mapping was done against Zv9 and the bed file contained coordinates from Zv10 - UCSC now has this version as default and I obviously did not pay enough attention (I hang my head in shame). When replacing with the Zv9 bed file the issue is gone.

I would expect that in such a case, the error message would something like '-a and -b coordinates not matching' but the fact that it gives an error is already re-assuring.

ADD COMMENT
5
Entering edit mode
9.6 years ago

I think it's more because the read name contains a space:

 HWI-ST558:319:HJHNYADXX:1:1101:10003:66276 1:N:0:GATCAG

This name should have been trimmed to:

HWI-ST558:319:HJHNYADXX:1:1101:10003:66276

which aligner did you use?

ADD COMMENT
0
Entering edit mode

which aligner did you use?

bowtie-0.12.8. It is a very old version, but I am using it to be consistent with another set of results that someone else produced.

I think it's more because the read name contains a space:

But this seems to be produced by bedtools intersect, rather than by bowtie, because the space is not present in the read name from the bam. Also, bedToBam returns a proper read name:

bamToBed -i some.bam | grep 'HWI-ST558:319:HJHNYADXX:2:2104:11288:74456'
chr4    61813676        61813703        HWI-ST558:319:HJHNYADXX:2:2104:11288:74456      0       +

Actually, I just tested converting bam to bed before intersecting and it does not throw out any error:

bamToBed -i some.bam | bedtools intersect -a - -b LINE.bed -sorted -s -wa -bed -f 1.0 | wc -l

If I am reading this test correctly it might be a bug in intersectBed. I will wait to see if Aaron Quinlan shows up here, otherwise I report it soonish.

ADD REPLY
0
Entering edit mode

But this seems to be produced by bedtools intersect: no because '1:N:0:GATCAG is typically produced by a sequencer see: http://en.wikipedia.org/wiki/FASTQ_format

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
EAS139      the unique instrument name
136         the run id
FC706VJ     the flowcell id
2           flowcell lane
2104        tile number within the flowcell lane
15343       'x'-coordinate of the cluster within the tile
197393      'y'-coordinate of the cluster within the tile
1           the member of a pair, 1 or 2 (paired-end or mate-pair reads only)
Y           Y if the read is filtered, N otherwise
18          0 when none of the control bits are on, otherwise it is an even number
ATCACG      index sequence
ADD REPLY

Login before adding your answer.

Traffic: 1517 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6