I'm writing a workflow tool for testing PCR assay primer/probe sequences. I am not a bioinformatician or a scientist; I'm writing this tool to help a friend.
I am using a search term to download sequences from the NCBI Nucleotide database. Currently, I remove any sequence record that contains at least one N.
My question is, how do N's affect a multiple sequence alignment? Is there an acceptable threshold of N's for sequences that will be aligned with a primer/probe sequence to test PCR assays?
My friend wasn't quite sure how we should handle sequences containing N's so I thought I'd ask here.
Thanks!
Depending on the program you are using, solitary N's may not have a deleterious effect on your testing. You may want to use curated sequences such as RefSeq, which should not have this problem.
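If dropping every record with a single N turns out to be too strict, one option is to filter on the fraction of N's instead. Below is a minimal sketch in plain Python. The function names and the 0.05 cutoff are assumptions made up for illustration, not a recommended threshold; the right cutoff depends on the assay and on whether the N's fall inside the primer/probe binding region.

```python
def n_fraction(seq: str) -> float:
    """Return the fraction of characters in seq that are N (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def filter_by_n_fraction(records, max_fraction=0.05):
    """Keep (id, sequence) pairs whose N fraction is at most max_fraction.

    max_fraction is a hypothetical cutoff chosen for this example only.
    """
    return [(rid, seq) for rid, seq in records if n_fraction(seq) <= max_fraction]


# Toy records standing in for downloaded GenBank sequences.
records = [
    ("seq1", "ACGTACGTACGT"),          # no N's
    ("seq2", "ACGTNACGTACGTACGTACG"),  # 1 N in 20 bases
    ("seq3", "NNNNACGTNNNN"),          # mostly N's
]

kept = filter_by_n_fraction(records, max_fraction=0.05)
print([rid for rid, _ in kept])
```

With these toy records, seq1 and seq2 pass the filter and seq3 is dropped. Setting max_fraction=0.0 reproduces the current behaviour of removing anything that contains an N.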
I am using Clustal Omega to perform the alignments. My friend is required to use GenBank, so I believe I'm stuck with it for this tool.
RefSeq is actually a curated subset of GenBank. The sequences will be non-redundant and will save you time/effort: https://ncbi.nlm.nih.gov/refseq/about/
Thanks! I will check with him and see if this works for him.