I'm trying to convert a bam file with paired end read alignments to bed format, which is required by the F-Seq peak calling program. The bedtools bamToBed program is complaining that reads are marked as paired, but their mate doesn't appear next to it in the bam file. This is after running samtools sort -n
on the bam file, which is what bedtools says to do to fix this. The first 10 lines from the bam are below.
HISEQ-2500-1:79:C62NNANXX:4:1101:1216:7741 163 gi|358485510|ref|NC_006089.3| 88359766 60 36M = 88359773 43 AGTTGAAAAAATGTGCTCAGGCTTCCTATTGGTTAA BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF X0:i:1 X1:i:0 MD:Z:36 PG:Z:MarkDuplicates XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1216:7741 83 gi|358485510|ref|NC_006089.3| 88359773 60 36M = 88359766 -43 AAAATGTGCTCAGGCTTCCTATTGGTTAAGTGAAAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB X0:i:1 X1:i:0 MD:Z:36 PG:Z:MarkDuplicates XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1216:12449 163 gi|358485505|ref|NC_006094.3| 33794145 60 36M = 33794149 40 CAAATGGCTTTGGACTGCTTCTGCCAAATCCCAAGC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF X0:i:1 X1:i:0 MD:Z:36 PG:Z:MarkDuplicates XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1216:12449 83 gi|358485505|ref|NC_006094.3| 33794149 60 36M = 33794145 -40 TGGCTTTGGACTGCTTCTGCCAAATCCCAAGCCCAG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB X0:i:1 X1:i:0 MD:Z:36 PG:Z:MarkDuplicates XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1216:21049 99 gi|358485510|ref|NC_006089.3| 145399305 60 36M = 145399308 39 CCACTGATATTTACTCTTCCTAAGCTCTAACTACAG BBBBBFBFFFFFFFFFFFFFFF<FFFFFBFFFFFFF X0:i:1 X1:i:0 MD:Z:36 PG:Z:MarkDuplicates XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1216:21049 147 gi|358485510|ref|NC_006089.3| 145399308 60 36M = 145399305 -39 CTGATATTTACTCTTCCTAAGCTCTAACTACAGCTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB X0:i:1 X1:i:0 MD:Z:36 PG:Z:MarkDuplicates XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1217:14270 163 gi|358485497|ref|NC_006102.3| 9271537 60 36M = 9271612 110 GGTTTGGACCACTGCTTTGGTTGTTTATGAGCAGAG <<BBBFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFF X0:i:1 X1:i:0 MD:Z:36 PG:Z:MarkDuplicates XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1217:14270 83 gi|358485497|ref|NC_006102.3| 9271612 60 1S35M = 9271537 -110 CCAATGCCACAGAAAGGAACGTTCCTGTTTTGCCTA #FFFFF/FBFB<FFFFFFFFFFB<FFFFFFF<//BB X0:i:1 X1:i:0 XC:i:35 MD:Z:35 PG:Z:MarkDuplicates XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1217:18107 99 gi|358485509|ref|NC_006090.3| 33014530 60 36M = 33014585 91 TAATTGAAAGTCTCAAATAGTATGTGACATGCCGTT BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF X0:i:1 X1:i:0 MD:Z:36 PG:Z:MarkDuplicates XG:i:0 AM:i:25 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
HISEQ-2500-1:79:C62NNANXX:4:1101:1217:18107 147 gi|358485509|ref|NC_006090.3| 33014585 60 36M = 33014530 -91 CTGATGTATGGTGGCTGCTGAAACTGCTTGCAAAAG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB X0:i:1 X1:i:0 MD:Z:28C0T6 PG:Z:MarkDuplicates XG:i:0 AM:i:25 NM:i:2 SM:i:25 XM:i:2 XO:i:0 XT:A:U
What is the output of
Well, according to samtools, most of your reads are properly paired. Maybe bamToBed needs all reads to be properly paired (I never used it)? You may try to filter only properly paired reads and pipe them to bamToBed.
Hi I did the same step too, but the problem persist. I got the bam file from samtools after bowtie pe alignment, getting this results
then did the step that you propose getting this warning
I know that is a warning but the bedpe file is empty. If anyone knows or any advice I really appreciate.
Hello, you have to sort your BAM file by read name first, have a look at my previous answer above to see how to do that.
Hi, I did that you say, step by step:
and tried this too
and get the same warning, but this time is generated a bedpe file, but when I opened in IGV didn't show nothing. I don't know exactly if my bam file is corrupt or what else. The comand to generate the sam file was:
You have to leave the BAM file sorted by read name in order to generate a BEDPE file. When you sort by read name it places the alignments from each paired read on consecutive lines, this allows bedtools to scan through the file every two lines and produce a BEDPE record. It is much more efficient then having to keep track of paired reads which could appear anywhere in the BAM file.
Thanks, now is generating well the bedpe without warnings. Now my question is about if should I transform the bedpe to another format as a wig maybe to visualize in IGV, because IGV opened wrongly the bedpe file and I am not acquainted with IGV.
To the best of my knowledge I don't think IGV can display BEDPE files. If you want to look at the coverage either load in the BAM file directly or create a bedGraph file from the BAM file (BEDtools genomecov command can do this)
Hi, thanks for reply. Your answer was very useful. I know that maybe correspond to another post but I need some help. Now I'm trying to get the genomecov as you recommend but have problem with the
-g
option, I don't know what kind of genome file is required, I read that an UCSC genome is required but my genome is not there. If you know how to make it I would appreciate.Hi there, a genome file is a tab-delimited file which lists the chromosome name in the first column and the length of the chromosome in the second column. See this Biostars post for details.
Hi I am in a similar situation, but my problem is that the bedpe file is empty while there no warning or error during the bedtools bamtobed.
My PE.sorted.fixed.bam file is already been sorted: @HD VN:1.0 SO:queryname
Do you know what might be the problem?