Hi,
I'm trying to count reads mapping to sense strand. I have doubts which counts file I should chose from this pipeline. I think is "plate_R.counts" because has more reads counted in total. Am I right?
Library creation kit -> E7420S NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina®
I would also appreciate a nice tutorial to understand Illumina paired-end library preparation, alignment, counting...
Thanks!
P.S I read a previous post asking similar questions but still I have doubts!
#################################################
#BWA HT.seq bacterial paired-end RNA-seq pipeline
#################################################
# Get the genome file from the command line
genome_file=$1
# Get the fastq file from the command line
fastq_file_R1=$2
# Get the fastq file from the command line
fastq_file_R2=$3
#get gff
GFF=$6
#BWA index (default settings)
bwa index $genome_file
#BWA align
bwa mem -t 8 $genome_file $fastq_file_R1 $fastq_file_R2 | gzip -3 > P_S1_L001_aln-pe.sam.gz
#Flagstat
#Convert .sam to .bam to input to Flagstat
samtools view -b -S -o P_S1_L001_aln-pe.bam P_S1_L001_aln-pe.sam.gz
samtools flagstat P_S1_L001_aln-pe.bam
#Count reads mapped with htseq-count
samtools sort -n P_S1_L001_aln-pe.bam plate.sorted
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s yes plate.sorted.bam $GFF | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_F.counts
python -m HTSeq.scripts.count -m intersection-nonempty -f bam -a 10 -t mRNA -i Parent -r name -s reverse plate.sorted.bam $GFF | awk 'n>=5 { print a[n%5] } { a[n++%5]=$0 }' > plate_R.counts
how about this:
What I was trying to say is that I need the number of reads mapping to sense and the number of reads mapping antisense to the annotated genes, not the the original sequence.
Try bedtools intersect (full documentation of options at https://bedtools.readthedocs.org/en/latest/content/tools/intersect.html)
Your command might look something like this (not tested):
That will output the count of reads overlapping on the anti-sense strand with each gene. To get the total across all genes, sum the counts.