I was following the DESeq2 manual to process my simple paired-end RNA-Seq data, which involves wild-type and stress-treated plants.
I counted reads per feature using HTSeq (version 0.9.1) with the command:
htseq-count -a 10 -s 'no' WT-CON.sam /home/exp/DESEQ2/genes.gtf > WT-DESeq.txt
I noticed a warning, "53525476 reads with missing mate encountered", in the output:
100000 GFF lines processed.
.
604523 GFF lines processed.
Warning: Read K00171:29:H2NYHBBXX:8:1128:23045:10019 claims to have an aligned mate which could not be found in an adjacent line.
100000 SAM alignment record pairs processed.
.
.
56400000 SAM alignment record pairs processed.
56500000 SAM alignment record pairs processed.
Warning: 53525476 reads with missing mate encountered.
56509150 SAM alignment pairs processed.
A previous post and a comment by Ian highlight the sort-by-name (-n) option. Currently, I sort by position and convert to SAM:
samtools sort -o WT-CON.bam /home/exp/DESEQ2/WT/accepted_hits.bam
samtools view WT-CON.bam > WT-CON.sam
I am not sure how to get rid of the warning. Do I need to sort the BAM with -n and run HTSeq again, or am I missing some other parameter?
Yes, you need to sort the BAM by read name (samtools sort -n) and rerun htseq-count, so that the aligned mates are adjacent to each other. Alternatively, featureCounts can sort the reads automatically for you before counting. Either way it will be slow; sorting a BAM is always a pain.
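As a rough sketch of both routes, something like the following should work. The input paths and the -a 10 / -s no settings come from the question; the output file names (WT-CON.namesorted.bam, WT-featureCounts.txt) and the function names are made up for illustration. The commands are wrapped in functions so that nothing runs until you call one. I have not run this on your data, so treat it as a starting point.

```shell
# Paths taken from the question; output names below are hypothetical.
IN_BAM=/home/exp/DESEQ2/WT/accepted_hits.bam
GTF=/home/exp/DESEQ2/genes.gtf
NAME_SORTED_BAM=WT-CON.namesorted.bam

# Route 1: sort by read name so mates sit on adjacent lines,
# then stream SAM (with header) into htseq-count via stdin ("-").
count_with_htseq() {
    samtools sort -n -o "$NAME_SORTED_BAM" "$IN_BAM" || return 1
    samtools view -h "$NAME_SORTED_BAM" \
        | htseq-count -r name -a 10 -s no - "$GTF" > WT-DESeq.txt
}

# Route 2: let featureCounts handle the pairing itself.
# -p: paired-end input, -s 0: unstranded, -Q 10: minimum mapping quality.
# Assumption: recent Subread releases also need --countReadPairs
# to count fragments rather than individual reads.
count_with_featurecounts() {
    featureCounts -p -s 0 -Q 10 -a "$GTF" -o WT-featureCounts.txt "$IN_BAM"
}
```

Call count_with_htseq or count_with_featurecounts once the paths are right. If re-sorting is too expensive, htseq-count also accepts -r pos for position-sorted input, at the cost of buffering reads in memory until their mates show up.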