Mismatch number of reads after bedtools bam2fastq on RNA-seq data
1
0
Entering edit mode
6.4 years ago
aditi.qamra ▴ 270

Hi,

I am trying to generate fastqs ( to realign them with different parameters) from RNA-seq bams aligned using STAR (run with --outSAMunmapped Within flag). The original fastq was paired end and stranded and I don't have access to that.

I used bedtools bam2fastq ( bedtools bamtofastq -i $bam -fq R1.fq -fq R2.fq ) to get the fastqs.

I got 113,214,136 reads for each fastq file. No. of reads in the bam file also matches this as the output from samtools view -c $bam is 226,428,272. (2*113,214,136).

samtools flagstat $bam

226428272 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
223692746 + 0 mapped (98.79%:-nan%)
226428272 + 0 paired in sequencing
113214136 + 0 read1
113214136 + 0 read2
223692746 + 0 properly paired (98.79%:-nan%)
223692746 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

However, this is strange because I have the fastqc files and the STAR log of these runs and both showed no. of input reads as 107,979,993. The STAR *.log.final.out is as --

Number of input reads        107,979,993
Average input read length       150
Uniquely mapped reads number        103,168,845
Uniquely mapped reads %         95.54%
Number of reads mapped to multiple loci          3,443,385
% of reads mapped to multiple loci      3.19%
Number of reads mapped to too many loci          36,808 
% of reads mapped to too many loci      0.03%
% of reads unmapped: too many mismatches        0.00%
% of reads unmapped: too short      1.16%
% of reads unmapped: other      0.07%

I can't figure what I am I missing here that no of input reads is less than the no. of reads in the BAM file and regenerated fastq ?!

Thanks!

RNA-Seq alignment • 2.6k views
ADD COMMENT
0
Entering edit mode

Hello aditi.qamra,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

Thank you!

ADD REPLY
0
Entering edit mode

Thanks Vijay! Sorry I dint know about this option before. Will take care

ADD REPLY
1
Entering edit mode
6.4 years ago
h.mon 35k

Probably the extra reads are due to multi-mappers or other secondary alignments being pulled from the bam more than once. If this is undesired, you have to filter the bam to contain only primary alignments, or you may use picard SamToFastq, which by default includes only primary alignments into the output.

Also, did you sort the bam file by name before bedtools bamtofastq? The online documentation states:

When using this option, it is required that the BAM file is sorted/grouped by the read name. This keeps the resulting records in the two output FASTQ files in the same order. One can sort the BAM file by query name with samtools sort -n aln.bam aln.qsort.

ADD COMMENT
0
Entering edit mode

Thanks h.mon. I did sort them before using bedtools. But you are right it is because of multimappers in the bam.

ADD REPLY

Login before adding your answer.

Traffic: 2590 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6