4.6 years ago
k.kathirvel93
Hi everyone,
I have a multi-FASTA file containing 11,000 genomes (about 30 kb each). I want to remove every genome (the whole record) that contains at least one N. How can I do this with sed or awk? Thanks in advance.
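A minimal sketch of the filtering logic for a standard multi-FASTA file, written in Python rather than sed/awk. It assumes each record starts with a `>` header line and that the sequence may be wrapped over several lines. The function names are hypothetical. The sequence lines of each record are joined before the check, so an N on any wrapped line drops the whole genome. The check also ignores case, so a lowercase `n` counts as well. Kept records are written with the sequence on a single line.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def drop_records_with_n(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA text only for records whose sequence has no N or n."""
    for header, seq in read_fasta(lines):
        if "N" not in seq.upper():
            yield f"{header}\n{seq}\n"


def filter_file(in_path: str, out_path: str) -> None:
    """Stream in_path to out_path one record at a time (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_records_with_n(src))
```

Usage, with hypothetical file names: `filter_file("genomes.fasta", "genomes_noN.fasta")`. Because records are processed one at a time, memory use stays around one 30 kb genome rather than the whole file.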
My input looks like this:
Genome1 ATCGTCGTACAGATACAGATACANNNcGATAGACATAGACA
Genome2 AGTCGATCAGTACAGATACAGATACAGATACAGATAC
and I want output like this:
Genome2 AGTCGATCAGTACAGATACAGATACAGATACAGATAC
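The example above shows each genome as one line, with a name followed by its sequence. For that layout, a possible sketch, assuming the name and sequence are separated by whitespace, tests only the sequence column. The function name is hypothetical. Restricting the test to the sequence column matters here: the name "Genome1" itself contains a lowercase `n`, so a case-insensitive check on the whole line would discard every genome.

```python
from typing import Iterable, List


def keep_lines_without_n(lines: Iterable[str]) -> List[str]:
    """Keep 'name sequence' lines whose sequence column has no N or n."""
    kept: List[str] = []
    for line in lines:
        fields = line.split(maxsplit=1)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        _, seq = fields
        if "n" not in seq.lower():
            kept.append(line)
    return kept


example = [
    "Genome1 ATCGTCGTACAGATACAGATACANNNcGATAGACATAGACA",
    "Genome2 AGTCGATCAGTACAGATACAGATACAGATACAGATAC",
]
for line in keep_lines_without_n(example):
    print(line)  # prints only the Genome2 line
```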
Hello k.kathirvel93!
Questions similar to yours can already be found at:
We have closed your question to allow us to keep similar content in the same thread.
If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.
Cheers!
Thanks @Pierre Lindenbaum. I went through the thread you mentioned, but the code there did not work properly on my large dataset: after running it, the genomes still contained Ns. Since that thread was four years old, I created my own. Can you help with this? Thanks.
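If Ns still show up after filtering, it can help to check the output file directly. The sketch below is a hypothetical helper, not code from the linked thread. It lists the headers of FASTA records whose sequence still contains an N in either case, looking at every wrapped line of each record. One plausible cause, which is an assumption rather than something confirmed here, is a filter that matches only uppercase `N` and therefore keeps records with lowercase `n`.

```python
from typing import Iterable, List, Optional


def headers_with_n(lines: Iterable[str]) -> List[str]:
    """Return headers of FASTA records whose sequence contains N or n."""
    flagged: List[str] = []
    header: Optional[str] = None
    has_n = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record begins: report the previous one if it had an N.
            if header is not None and has_n:
                flagged.append(header)
            header, has_n = line, False
        elif header is not None and ("N" in line or "n" in line):
            has_n = True
    if header is not None and has_n:
        flagged.append(header)
    return flagged
```

For a correctly filtered file, e.g. `headers_with_n(open("genomes_noN.fasta"))` with a hypothetical file name, the result should be an empty list.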
Have you found a solution?