Bismark alignment - Chromosomal sequence could not be extracted for ... - Error or not?
7.8 years ago

Hi, I am aligning RRBS data (single-end sequencing) with Bismark and Bowtie 2, and I have a question about the messages it prints. Is this an error I need to address, or can it safely be ignored? I have not found a clear answer anywhere.

Here is my code:

bismark -q --bowtie2 --sam /home/undergrad3/Desktop/RRBS/Ssc10.2 *R1_trimmed.fq

Here is the output message (I am getting a lot of these!):

Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1206:2909:33932_1:N:0:TGGTGA  GJ058815.1  2
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1206:19918:37378_1:N:0:TGGTGA GJ058815.1  2
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1206:15595:46100_1:N:0:TGGTGA GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1206:10876:47243_1:N:0:TGGTGA GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1207:9293:8594_1:N:0:TGGTGA   GJ058815.1  2
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1207:26250:9913_1:N:0:TGGTGA  GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1207:12388:11812_1:N:0:TGGTGA GJ058815.1  2
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1207:4432:19619_1:N:0:TGGTGA  GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1207:30492:23557_1:N:0:TGGTGA GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1207:10155:24507_1:N:0:TGGTGA GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1207:12236:43867_1:N:0:TGGTGA GJ058815.1  2
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1208:14894:8348_1:N:0:TGGTGA  GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1208:30472:23663_1:N:0:TGGTGA GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1208:10551:27373_1:N:0:TGGTGA GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1208:6218:28059_1:N:0:TGGTGA  GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1208:10115:35972_1:N:0:TGGTGA GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1208:10054:36007_1:N:0:TGGTGA GJ058815.1  1
Chromosomal sequence could not be extracted for K00363:17:HGCMHBBXX:4:1208:20232:36235_1:N:0:TGGTGA GJ058815.1  1

and it keeps going...

Any help is appreciated! Thank you!

7.1 years ago
Gon ▴ 540

The author of the software has already answered this in several places; here is one example: https://github.com/FelixKrueger/Bismark/issues/68

The warning is reported when a read aligns to the very end (or the very beginning) of a reference sequence. Bismark then tries to read 2 bases downstream (or upstream) of the read to determine the cytosine context and fails because the reference has no more bases there. I have often seen this warning with the mitochondrial chromosome (MT) and some scaffolds (e.g. KI270336.1), which seems to be your case. If so, you can simply disregard the warning.
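To make the mechanism concrete, here is a minimal Python sketch of a context call near the end of a reference. It is not Bismark's actual code: the function name `cytosine_context` and the toy sequence are made up for illustration, and it only looks at the forward strand.

```python
def cytosine_context(reference, pos):
    """Classify the cytosine at 0-based `pos` on the forward strand.

    Returns 'CpG', 'CHG', 'CHH' or 'unknown', or None when the two
    downstream bases needed for the call lie beyond the reference end.
    """
    if reference[pos] != "C":
        raise ValueError("position is not a cytosine")
    downstream = reference[pos + 1 : pos + 3]
    if len(downstream) < 2:
        return None  # nothing left to read: the analogue of the warning
    if downstream[0] == "G":
        return "CpG"
    if downstream[0] == "N":
        return "unknown"
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] == "N":
        return "unknown"
    return "CHH"


toy_reference = "ACGTTCAGC"  # hypothetical 9 bp scaffold
inner_cpg = cytosine_context(toy_reference, 1)    # C followed by G, T
inner_chg = cytosine_context(toy_reference, 5)    # C followed by A, G
last_base = cytosine_context(toy_reference, 8)    # C is the final base
padded_end = cytosine_context(toy_reference + "NN", 8)
```

In this sketch the two inner cytosines get a context, while the cytosine on the last base cannot be classified because there is nothing after it. With two Ns added, the lookup succeeds, although the context itself stays unknown.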

Apparently, if you are working with amplicons or other short reference sequences, this warning can be more prevalent and can have an important effect on your results. In that case the author recommends adding two bases (NN) to the beginning and end of your reference sequence(s).
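If you go the padding route, something along these lines should do it. This is a rough Python sketch under my own assumptions: `pad_fasta` is a made-up helper (not part of Bismark), it reads plain FASTA lines, and it writes each sequence back on a single line.

```python
def pad_fasta(lines, pad="NN"):
    """Yield FASTA lines with `pad` added to both ends of every record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: flush the previous one first.
            if header is not None:
                yield header
                yield pad + "".join(chunks) + pad
            header, chunks = line, []
        elif line:
            chunks.append(line)
    # Flush the last record in the file.
    if header is not None:
        yield header
        yield pad + "".join(chunks) + pad


# Hypothetical two-amplicon reference, given as a list of lines.
example = [">amplicon1", "ACGT", "TTCA", ">amplicon2", "GGC"]
padded = list(pad_fasta(example))
```

For a real file you would pass an open file handle instead of the list and write the yielded lines to a new FASTA. Keep in mind that the genome has to be indexed again with bismark_genome_preparation after the change, and that every coordinate reported against the padded reference is shifted by 2 relative to the original sequence.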
