Dear Biostars,
My question is about filtering and curating several multi-FASTA files. For instance, I have downloaded about 150 complete genomes of human-infecting influenza viruses from NCBI. Some of these sequences contain ambiguous nucleotides (W, S, K, M, Y, R, V, H, D, B, N) introduced by the sequencing process, as in the following example:
>H1N1_12
ATGCTTACTGGGTGATC
>H5N1
TTGCCRTCACCGNACTGC
>H1N1_9
CTGYNATTGCCATCGWAA
>H5N1_1
ATCTTACTCGGCGACTCC
>H5N1_5
ACTGYRATTCGCCTAKAA
Using Biopython, I would like a script that identifies these ambiguous nucleotides together with the corresponding FASTA header (in this case, >H5N1 has R and N, >H1N1_9 has Y, N and W, and >H5N1_5 has Y, R and K). If a sequence contains any ambiguous nucleotide, it should be removed together with its ID, and the remaining sequences written to a new FASTA file; otherwise the script should simply move on to the next sequence.
Cheers,
Welcome to the forum, diegocastano182.
While we are indeed here to help out, it is a good idea (think of it as a code of conduct) to at least show some effort yourself, for example by indicating what you have already tried or considered before posting here.
Unfortunately (and please do not take this the wrong way) we are not here to do your work for you ;-)
What about this? Is it right?
Maybe you could help me edit it. Thanks.
I don't know whether there is a method to detect ambiguous bases in Biopython. However, you can write a script yourself if using Biopython is not an absolute necessity.
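To give an idea of what such a hand-written script could look like, here is a minimal sketch in plain Python (no Biopython). The function names and the file names in the usage note are my own placeholders, and the set of ambiguity codes is simply the list from the question, so treat it as a starting point rather than a finished tool.

```python
# Sketch only: all names below are illustrative, not part of any library.
AMBIGUOUS = set("WSKMYRVHDBN")  # ambiguity codes listed in the question


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_by_ambiguity(records):
    """Separate clean records from those containing ambiguity codes.

    Returns (clean, flagged): clean is a list of (header, sequence),
    flagged maps each rejected header to the sorted codes found in it.
    """
    clean, flagged = [], {}
    for header, seq in records:
        found = sorted(set(seq.upper()) & AMBIGUOUS)
        if found:
            flagged[header] = found
        else:
            clean.append((header, seq))
    return clean, flagged


def filter_fasta(in_path, out_path):
    """Write only the clean records to out_path; return the flagged ones."""
    with open(in_path) as handle:
        clean, flagged = split_by_ambiguity(read_fasta(handle))
    with open(out_path, "w") as out:
        for header, seq in clean:
            out.write(f">{header}\n{seq}\n")
    return flagged
```

Calling something like `filter_fasta("genomes.fasta", "genomes_clean.fasta")` (both file names are hypothetical) would write the sequences without ambiguous bases to the second file and return a dictionary of the rejected headers with the codes found in each. If you do want Biopython for the parsing, `SeqIO.parse(path, "fasta")` yields records whose `.id` and `str(record.seq)` could be fed to the same check in place of `read_fasta`.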
Your code looks OK as long as it is properly indented. If it is not working, could you post the error message or the logically wrong result?
It is always better to post the code as it is, together with the result you got, so that others do not have to guess what you are running into.
Try not to delete posts that have answers so that others may see them in the future if they have the same question.