Counting sense reads in bacterial paired-end RNA-seq data
11.0 years ago
biotech ▴ 570

Hi,

I'm trying to count reads mapping to the sense strand. I'm not sure which counts file from this pipeline I should choose. I think it is "plate_R.counts", because it has more reads counted in total. Am I right?

Library creation kit -> E7420S NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina®

I would also appreciate a nice tutorial to understand Illumina paired-end library preparation, alignment, counting...

Thanks!

P.S. I read a previous post asking similar questions, but I still have doubts!

#################################################
#BWA + HTSeq bacterial paired-end RNA-seq pipeline
#################################################

# Get the genome file from the command line
genome_file=$1
# Get the read 1 fastq file from the command line
fastq_file_R1=$2
# Get the read 2 fastq file from the command line
fastq_file_R2=$3
# Get the GFF annotation from the command line (4th argument)
GFF=$4

#BWA index (default settings)
bwa index "$genome_file"
#BWA align both mates, keep a gzipped SAM
bwa mem -t 8 "$genome_file" "$fastq_file_R1" "$fastq_file_R2" | gzip -3 > P_S1_L001_aln-pe.sam.gz

#Flagstat
#Convert .sam to .bam to input to Flagstat
samtools view -b -S -o P_S1_L001_aln-pe.bam P_S1_L001_aln-pe.sam.gz
samtools flagstat P_S1_L001_aln-pe.bam

#Count reads mapped with htseq-count
#-r name expects a BAM sorted by read name (samtools >= 1.3 syntax)
samtools sort -n -o plate.sorted.bam P_S1_L001_aln-pe.bam
#-s yes: read 1 must map to the feature's strand; -s reverse: read 2 must (read 1 on the opposite strand)
#awk drops the last 5 lines: htseq-count's __no_feature, __ambiguous, __too_low_aQual, __not_aligned and __alignment_not_unique counters
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s yes plate.sorted.bam "$GFF" | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_F.counts
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s reverse plate.sorted.bam "$GFF" | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_R.counts
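To check the "more reads counted in total" comparison, the per-feature counts in each file can be summed while skipping any special `__` counters that may still be present. This is a minimal sketch assuming htseq-count's two-column, tab-separated output; `total_counted` is a hypothetical helper name. With a dUTP-based kit such as NEBNext Ultra Directional, read 1 is expected to align antisense to the transcript, which is the orientation `-s reverse` assumes, so the reverse file should collect most of the assignable reads.

```shell
# total_counted (hypothetical helper): sum column 2 of htseq-count output
# read from the given files (or stdin), ignoring counters named "__...".
total_counted() {
  awk -F '\t' '$1 !~ /^__/ { total += $2 } END { print total + 0 }' "$@"
}

# Usage, comparing the two strandedness settings:
# total_counted plate_F.counts
# total_counted plate_R.counts
```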
RNA-seq HTSeq bacteria paired-end

how about this:

samtools view -f 16 your.bam | wc -l

What I was trying to say is that I need the number of reads mapping sense and the number of reads mapping antisense to the annotated genes, not relative to the reference sequence.


Try bedtools intersect (full documentation of options at https://bedtools.readthedocs.org/en/latest/content/tools/intersect.html)

Your command might look something like this (not tested):

bedtools intersect -S -c -a genes.bed -b mapped_reads.bam

That will output, for each gene, the number of reads overlapping it on the opposite (antisense) strand; use -s instead of -S to count same-strand (sense) overlaps. To get the total across all genes, sum the counts.
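With -c, bedtools appends the overlap count as the last column of each line of genes.bed, so the total is a column sum. A minimal sketch with a hypothetical helper name; keep in mind that each mate of a pair is a separate BAM record and is counted on its own, and that the two mates of a properly paired fragment lie on opposite strands.

```shell
# sum_last_column (hypothetical helper): total of the last tab-separated
# column, e.g. the count that `bedtools intersect -c` appends to each gene.
sum_last_column() {
  awk -F '\t' '{ total += $NF } END { print total + 0 }' "$@"
}

# Usage:
# bedtools intersect -S -c -a genes.bed -b mapped_reads.bam | sum_last_column
```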

11.0 years ago
David Fredman ★ 1.1k

Phil's suggestion is on the right track. You can use samtools view filters to select reads mapping to the sense or antisense strand of your reference sequence. However, -f keeps only alignments that have the given flag bits set, so -f 16 extracts reads mapping antisense (to the reverse strand). To get the mapping locations of reads mapping to the sense (forward) strand, use the -F (filter) option, which drops alignments that have those bits set:

samtools view -F 16 mapped_reads.bam
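Per the samtools documentation, -f INT keeps an alignment only if all bits in INT are set in its FLAG, while -F INT drops it if any of those bits are set. The sketch below mirrors that logic for a single FLAG value in plain bash arithmetic (`keeps_flag` is a hypothetical helper, not part of samtools). It also shows a pitfall: an unmapped record with bit 16 unset passes -F 16, so -F 20 (16 + 4) is needed to keep only mapped forward-strand reads.

```shell
# keeps_flag FLAG REQUIRED EXCLUDED (hypothetical helper)
# Succeeds if samtools-style "-f REQUIRED -F EXCLUDED" would keep a record:
#   -f: all bits in REQUIRED must be set
#   -F: no bit in EXCLUDED may be set
keeps_flag() {
  local flag=$1 required=$2 excluded=$3
  (( (flag & required) == required && (flag & excluded) == 0 ))
}

keeps_flag 16 16 0 && echo "reverse-strand read passes -f 16"
keeps_flag 0 0 16 && echo "forward-strand read passes -F 16"
keeps_flag 4 0 16 && echo "unmapped read (flag 4) also passes -F 16"
keeps_flag 4 0 20 || echo "but it is dropped by -F 20"
```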

To count unique reads (not mapping locations) if you have allowed multiple mapping locations, you may have to make the list of read identifiers non-redundant first:

samtools view -F 16 mapped_reads.bam | cut -f1 | sort | uniq | wc -l
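The -f/-F 16 filters refer to the strand of the reference, not of the annotated gene. In a paired-end dUTP library such as the NEBNext Ultra Directional kit in the question, read 1 aligns antisense to the transcript and read 2 aligns sense (the "fr-firststrand" convention, matching htseq-count's `-s reverse`), so the mate flags matter too. A minimal awk sketch under that assumption; `transcript_strand` is a hypothetical helper name, and records without the first-mate bit are treated like read 2.

```shell
# transcript_strand (hypothetical helper): read SAM records on stdin and print
# "<read name><TAB><transcript strand>" for each mapped record, ASSUMING a
# dUTP (reverse-stranded, fr-firststrand) paired-end library:
#   read 1 (0x40) is antisense to the transcript -> flip its strand
#   any other record (read 2, 0x80)              -> keep its strand
transcript_strand() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { next }                        # skip header lines
       {
         flag = $2 + 0
         if (int(flag / 4) % 2) next        # 0x4: unmapped
         rev = int(flag / 16) % 2           # 0x10: aligned to reverse strand
         r1  = int(flag / 64) % 2           # 0x40: first mate
         plus = r1 ? rev : !rev
         print $1, (plus ? "+" : "-")
       }'
}

# Usage:
# samtools view mapped_reads.bam | transcript_strand | head
```

For a properly paired fragment both mates should report the same transcript strand, which is a quick sanity check on the assumed orientation.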

See this gist for those (and other) examples

SAM and BAM filtering one-liners

@author: David Fredman, david.fredmanAAAAAA@gmail.com (sans poly-A tail)
@dependencies: http://sourceforge.net/projects/bamtools/ and http://samtools.sourceforge.net/

Please extend with additional/faster/better solutions via a pull request!

BWA mapping (using piping for minimal disk I/O)

bwa aln -t 8 targetGenome.fa reads.fastq | bwa samse targetGenome.fa - reads.fastq \
| samtools sort -o reads.bwa.targetGenome.bam -

samtools index reads.bwa.targetGenome.bam

Count number of records (unmapped reads + each aligned location per mapped read) in a bam file:

samtools view -c filename.bam

Count with flagstat for additional information:

samtools flagstat filename.bam

Count the number of alignments (reads mapping to multiple locations counted multiple times)

samtools view -F 0x04 -c filename.bam

Count number of mapped reads (not mapped locations) for left and right mate in read pairs:

samtools view -F 0x40 filename.bam | cut -f1 | sort | uniq | wc -l #records that are not a left mate: right mates and unpaired reads, mapped or not
samtools view -f 0x40 -F 0x4 filename.bam | cut -f1 | sort | uniq | wc -l #left mate
samtools view -f 0x80 -F 0x4 filename.bam | cut -f1 | sort | uniq  | wc -l #right mate
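The `sort | uniq | wc -l` pattern can also be done in one pass by keeping read names in awk associative arrays (memory grows with the number of distinct names). A minimal sketch with a hypothetical helper name; like the commands above, a name is counted once even if it has several alignments.

```shell
# count_mapped_mates (hypothetical helper): read SAM records on stdin and
# count distinct names of mapped left (0x40) and right (0x80) mates.
count_mapped_mates() {
  awk 'BEGIN { FS = "\t"; n1 = 0; n2 = 0 }
       /^@/ { next }                                 # skip header lines
       {
         flag = $2 + 0
         if (int(flag / 4) % 2) next                 # 0x4: unmapped
         if (int(flag / 64) % 2) { r1[$1] = 1 }      # 0x40: left mate
         else if (int(flag / 128) % 2) { r2[$1] = 1 } # 0x80: right mate
       }
       END {
         for (k in r1) n1++
         for (k in r2) n2++
         printf "read1\t%d\nread2\t%d\n", n1, n2
       }'
}

# Usage:
# samtools view filename.bam | count_mapped_mates
```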

Remove unmapped reads, keep the mapped reads:

samtools view -F 0x04 -b in.bam > out.aligned.bam

Count UNmapped reads:

samtools view -f4 -c in.bam

Require minimum mapping quality (to retain reliably mapped reads):

samtools view -q 30 -b in.bam > aligned_reads.q30.bam
samtools view -q 30 -c in.bam #to count alignments with mapping quality >= 30

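Require match to be on the sense (forward, plus) strand of the reference (samtools flag 16 unset; use -F 20 to also drop unmapped reads):

samtools view -F 16 in.bam

Require match to be on the antisense (reverse, minus) strand of the reference (samtools flag 16 set):

samtools view -f 16 in.bam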
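Both strands can be counted in a single pass. A minimal sketch (hypothetical helper) that counts only primary, mapped alignments, skipping unmapped (0x4), secondary (0x100) and supplementary (0x800) records; the samtools equivalents would be `samtools view -c -F 2324 in.bam` for the forward strand and `samtools view -c -f 16 -F 2308 in.bam` for the reverse strand.

```shell
# strand_counts (hypothetical helper): read SAM records on stdin and count
# primary, mapped alignments on the forward and reverse reference strand.
strand_counts() {
  awk 'BEGIN { FS = "\t"; fwd = 0; rev = 0 }
       /^@/ { next }                                   # skip header lines
       {
         flag = $2 + 0
         # skip unmapped (0x4), secondary (0x100), supplementary (0x800)
         if (int(flag / 4) % 2 || int(flag / 256) % 2 || int(flag / 2048) % 2) next
         if (int(flag / 16) % 2) { rev++ } else { fwd++ }   # 0x10: reverse
       }
       END { printf "forward\t%d\nreverse\t%d\n", fwd, rev }'
}

# Usage:
# samtools view in.bam | strand_counts
```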

Require at least N aligned bases (CIGAR M, which counts matches and mismatches alike) at the start of the read:

export N=6
samtools view in.bam \
| perl -lane 'next unless $F[5] =~ /^(\d+)M/; print if $1 >= $ENV{N};'

Filter by number of mismatches in BWA-generated output using BWA-specific tags (X0, X1, XM, XO, XG and XT are written by bwa aln with samse/sampe; bwa mem does not write them):

Tag    Meaning
NM     Edit distance
MD     Mismatching positions/bases
AS     Alignment score
BC     Barcode sequence
X0     Number of best hits
X1     Number of suboptimal hits found by BWA
XN     Number of ambiguous bases in the reference
XM     Number of mismatches in the alignment
XO     Number of gap opens
XG     Number of gap extensions
XT     Type: Unique/Repeat/N/Mate-sw
XA     Alternative hits; format: (chr,pos,CIGAR,NM;)*
XS     Suboptimal alignment score
XF     Support from forward/reverse alignment
XE     Number of supporting seeds
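Optional tags sit in SAM columns 12 onward, each formatted as TAG:TYPE:VALUE (for example `NM:i:0`). A minimal sketch of a hypothetical helper that extracts one tag's value per record, printing "NA" when the tag is absent, which makes it easy to inspect a tag's distribution before filtering on it.

```shell
# sam_tag TAG (hypothetical helper): for each SAM record on stdin, print the
# value of optional field TAG (columns 12+, TAG:TYPE:VALUE), or NA if absent.
sam_tag() {
  awk -v tag="$1" 'BEGIN { FS = "\t" }
       /^@/ { next }                                 # skip header lines
       {
         value = "NA"
         for (i = 12; i <= NF; i++) {
           if (substr($i, 1, length(tag) + 1) == tag ":") {
             value = substr($i, length(tag) + 4)     # skip "TAG:T:"
             break
           }
         }
         print value
       }'
}

# Usage, distribution of edit distances:
# samtools view reads.bam | sam_tag NM | sort | uniq -c
```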

To keep only reads that map without any mismatches:

bamtools filter -tag XM:0 -in reads.bam -out reads.noMismatch.bam

Retain only uniquely mapping reads (reads with a single unambiguous mapping location):

If BWA (bwa aln with samse/sampe) was used, it is possible to use the value U of the BWA XT tag for unique (analogously, R is for repeat); bwa mem does not write XT. I did not find a simple way to do this with samtools or bamtools, so grep to the rescue (keeping the header lines):

samtools view -h reads.bam | grep -E '^@|XT:A:U' | samtools view -b -o reads.uniqueMap.bam -

However, the concept of "uniquely mapping" is not the cleanest idea - in most scenarios any given read could be placed elsewhere although it may be a lower scoring alignment. Thus, you could instead filter based on mapping quality, to retain the "reliably mapped" reads. Different mappers have different scoring models. As a rule of thumb, min values of 5 or 10 will work well. If you used bowtie/bowtie2, try:

samtools view -b -q 10 foo.bam > foo.filtered.bam
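Because MAPQ scales differ between mappers, it can help to look at the distribution before choosing a -q threshold. A minimal sketch with a hypothetical helper name that tabulates MAPQ (column 5) for mapped records, sorted by MAPQ.

```shell
# mapq_histogram (hypothetical helper): read SAM records on stdin and print
# "MAPQ<TAB>count" for mapped records (0x4 unset), sorted by MAPQ.
mapq_histogram() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { next }                         # skip header lines
       !(int($2 / 4) % 2) { n[$5]++ }        # count mapped records per MAPQ
       END { for (q in n) print q, n[q] }' | sort -n -k1,1
}

# Usage:
# samtools view foo.bam | mapq_histogram
```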


How can we get the read count on the plus strand (-F 16)?

What if we set "--library-type fr-firststrand" or "--library-type fr-secondstrand"? What is the difference?
