I am using Bowtie 2 version 2.1.0 to map paired-end RNA-seq reads to CDS (protein-coding gene sequences). I don't understand Bowtie 2's default mismatch settings.
I can see two options related to mismatches:
--mp : max penalty for mismatch;lower qual = lower penalty (6)
-N : mismatches in seed alignment; can be 0 or 1 (0)
What is the difference between these two, and how can I adjust the number of allowed mismatches during mapping?
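The two options act at different stages. -N only sets how many mismatches are tolerated inside each short seed substring during the multiseed search, trading speed for sensitivity; it does not cap mismatches in the final alignment. --mp sets how much each mismatch costs in the alignment score, scaled by base quality. The sketch below reproduces the penalty formula from the Bowtie 2 manual for --mp MX,MN (defaults MX = 6, MN = 2). That two-value form comes from more recent manuals, while the 2.1.0 help text quoted above shows only the maximum, so check bowtie2 --help for your version. The function name is illustrative.

```python
import math


def mismatch_penalty(qual, mx=6, mn=2, ignore_quals=False):
    """Penalty subtracted for one mismatch at a base with Phred quality `qual`.

    Formula as documented for Bowtie 2's --mp MX,MN:
        MN + floor((MX - MN) * min(Q, 40) / 40)
    With --ignore-quals, every mismatch costs MX.
    """
    if ignore_quals:
        return mx
    return mn + math.floor((mx - mn) * min(qual, 40.0) / 40.0)


# High-quality mismatches cost more than low-quality ones
for q in (0, 10, 20, 30, 40):
    print(q, mismatch_penalty(q))
```

So a mismatch at a Q40 base costs the full 6 points, while one at a Q10 base costs only 3, on the reasoning that low-quality mismatches are more likely to be sequencing errors.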
My aim is to map paired-end reads to the reference CDS and obtain raw read counts.
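For the raw-count step, one common approach when each CDS is its own reference sequence is to count read pairs by the reference name (RNAME) of their primary alignment. The sketch below does this directly from SAM text with only the standard library: it counts each pair once through mate 1 and skips unmapped, secondary, and supplementary records, using FLAG bits from the SAM specification. The function name, the MAPQ cutoff, and the proper-pair requirement are illustrative choices, not Bowtie 2 behaviour; dedicated counting tools are the usual route for real data.

```python
from collections import Counter

# SAM FLAG bits (SAM format specification)
PAIRED, PROPER_PAIR, UNMAPPED, FIRST_IN_PAIR = 0x1, 0x2, 0x4, 0x40
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def count_pairs_per_reference(sam_lines, min_mapq=0, require_proper_pair=True):
    """Count read pairs per reference sequence (e.g. per CDS) from SAM lines.

    Each pair is counted once, via the primary alignment of mate 1.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a valid alignment record
            continue
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if not flag & PAIRED or not flag & FIRST_IN_PAIR:
            continue  # count each pair once, through mate 1
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if require_proper_pair and not flag & PROPER_PAIR:
            continue
        if mapq < min_mapq:
            continue
        counts[rname] += 1
    return counts


# Example (file name is a placeholder):
# with open("sample.sam") as fh:
#     counts = count_pairs_per_reference(fh, min_mapq=10)
```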
Thanks a lot. What is the overall default mismatch allowance for a read in Bowtie 2, beyond the seed alignment? How can I change it?
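Bowtie 2 has no fixed per-read mismatch limit. In end-to-end mode (the default), a perfect alignment scores 0; each mismatch, gap, or ambiguous base subtracts a penalty. An alignment is reported only if its score is at least the --score-min threshold, which defaults to L,-0.6,-0.6, a linear function of read length: -0.6 + -0.6 × length. The sketch converts that threshold into a rough mismatch budget. It assumes no gaps and that every mismatch costs the full MX = 6 penalty. Lower-quality mismatches cost less, so the real budget can be larger. Function names are illustrative.

```python
import math


def min_score(read_len, const=-0.6, coef=-0.6):
    """Threshold for a linear ('L') --score-min function: const + coef * read_len."""
    return const + coef * read_len


def max_full_penalty_mismatches(read_len, const=-0.6, coef=-0.6, mx=6):
    """Mismatches a read can carry before falling below the threshold,
    assuming no gaps and that each mismatch costs the full MX penalty."""
    return math.floor(-min_score(read_len, const, coef) / mx)


for length in (50, 100, 150):
    print(length, min_score(length), max_full_penalty_mismatches(length))
```

To change the allowance, pass a different function. For example, --score-min L,0,-0.2 allows about 3 full-penalty mismatches for a 100 bp read by the same arithmetic (max_full_penalty_mismatches(100, 0, -0.2) returns 3). These values are an example, not a recommendation.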
Is there anything in your experiment that suggests you should change the default parameters? They tend to perform well in most situations. Also, regarding your alignment to CDS: are you deliberately ignoring spliced reads? Bowtie is not a splice-aware aligner.
Radek, in previous publications I have seen that a splice-aware aligner is generally used when mapping against a genome, but it is not needed when mapping against a transcriptome or CDS. If you recommend one, please share a link to a recent publication where a splice-aware aligner is used for mapping against CDS.
You're right not to use a splice-aware aligner on a transcriptome. However, you should consider moving to HISAT2, since the developers are curating it in place of Bowtie2 and TopHat2.
https://ccb.jhu.edu/software/hisat2/manual.shtml
Thanks. The manual says that HISAT2 is developed based on the HISAT and Bowtie2 implementations. You are correct; it would be good to use HISAT2 instead of Bowtie2.
However, there is one thing about HISAT2 I have not understood for a long time. I have a strand-specific RNA-seq library: should I map the reads with the default settings, or should I give a strand-specific option?
There are two options related to strand specificity in HISAT2:
1) --rna-strandness: For paired-end reads, use either FR or RF. With this option, every read alignment will have an XS attribute tag: '+' means a read belongs to a transcript on the '+' strand of the genome; '-' means a read belongs to a transcript on the '-' strand of the genome.
2) --fr/--rf/--ff: The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand.
Do you think --rna-strandness should be used only when reads are mapped to the genome, and that for transcriptome mapping I should use only --fr?
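The two options answer different questions. --fr/--rf/--ff describe how the two mates face each other on the reference, which is a property of paired-end sequencing in general. --rna-strandness says which mate matches the original transcript, which is a property of a stranded library protocol. The HISAT2 manual defines single-end strandness as F (the read corresponds to the transcript) or R (it corresponds to the reverse complement). The sketch below extends that definition to pairs, with the first letter referring to mate 1, and infers a transcript's strand from where mate 1 aligned. For dUTP-based stranded protocols the setting is commonly RF, but confirm this for your kit. The second helper assumes each CDS in the reference FASTA is written in sense orientation, which is typical but worth checking. Both function names are hypothetical.

```python
def transcript_strand(strandness, read1_on_forward):
    """Infer the transcript's strand from where mate 1 aligned.

    strandness: 'FR' -> mate 1 matches the transcript sequence;
                'RF' -> mate 1 matches the transcript's reverse complement.
    read1_on_forward: True if mate 1 aligned to the '+' strand of the reference.
    """
    if strandness == "FR":
        return "+" if read1_on_forward else "-"
    if strandness == "RF":
        return "-" if read1_on_forward else "+"
    raise ValueError("paired-end strandness must be 'FR' or 'RF'")


def is_sense_hit_on_cds(strandness, read1_on_forward):
    """Assumes every CDS in the reference is in sense (coding 5'->3') orientation,
    so a genuine transcript-derived pair should resolve to the '+' strand."""
    return transcript_strand(strandness, read1_on_forward) == "+"


# RF library: mate 1 on the reverse strand of a CDS means a sense transcript
print(is_sense_hit_on_cds("RF", read1_on_forward=False))
```

In a stranded library mapped to sense-oriented CDS, pairs for which is_sense_hit_on_cds returns False are candidates for antisense background.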
Put it this way:
If you don't use --rna-strandness, you will most likely get the best result for each read anyway. However, if there is a gene duplication plus inversion and you know your read comes from the forward strand, it could still map with the same score on the reverse strand (where the duplicated, inverted gene is). That adds background noise.
The --fr/--rf/--ff setting depends on the architecture of the sequencing construct, which for Illumina paired-end reads is (correct me if I'm wrong) always --fr.
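Because the two settings are independent, they can be combined in one run. The sketch below only assembles a hisat2 command line. The index and file names are placeholders. --no-spliced-alignment, a HISAT2 flag that disables spliced alignment, is offered as an optional switch for a CDS or transcriptome reference, in line with the point above that splice-aware alignment isn't needed there. The helper name is hypothetical.

```python
import shlex


def hisat2_command(index, mate1, mate2, out_sam, strandness=None,
                   mate_orientation="--fr", transcriptome=False):
    """Assemble (but do not run) a hisat2 command line for paired-end reads."""
    if mate_orientation not in ("--fr", "--rf", "--ff"):
        raise ValueError("mate_orientation must be --fr, --rf or --ff")
    cmd = ["hisat2", "-x", index, "-1", mate1, "-2", mate2,
           "-S", out_sam, mate_orientation]
    if strandness is not None:
        # Paired-end strandness values per the HISAT2 manual
        if strandness not in ("FR", "RF"):
            raise ValueError("paired-end strandness must be FR or RF")
        cmd += ["--rna-strandness", strandness]
    if transcriptome:
        # CDS/transcriptome reference: no need to search for spliced alignments
        cmd.append("--no-spliced-alignment")
    return cmd


# Placeholder index and file names
cmd = hisat2_command("cds_index", "sample_R1.fq.gz", "sample_R2.fq.gz",
                     "sample.sam", strandness="RF", transcriptome=True)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it, assuming hisat2 is on your PATH and the index was built with hisat2-build.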