Hi all,
I have an alignment FASTA file that looks like this:
$ cat alignment_file.fasta
>Ref1
ATCG
>S1
AT-C
>S2
CTCG
>S3
ATCG
>S4
-TCG
Specifically, what I want is:
1) If a column contains a SNP (e.g. the "C" in the S2 sequence at position 1), I want the gaps in that column (the "-" in the S4 sequence at position 1) to be replaced with "N".
2) If a column contains a gap but no SNPs (e.g. the S1 sequence at position 3), I want the gap to be replaced with the reference base at that position (here "C").
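Both rules work on one alignment column at a time, so the decision can be sketched as a small function over a single column. This is a minimal Python sketch, not an existing tool. It assumes the reference base is the first entry of each column and that a "SNP" means any sample base that is neither a gap nor N and differs from the reference base; the name fix_column is hypothetical.

```python
from typing import List

GAP = "-"


def fix_column(column: List[str]) -> List[str]:
    """Apply rules 1 and 2 to a single alignment column.

    column[0] is assumed to be the reference base; the rest are samples.
    """
    ref = column[0].upper()
    # Assumed SNP definition: a sample base that is neither a gap nor N
    # and differs (case-insensitively) from the reference base.
    has_snp = any(b.upper() not in (GAP, "N", ref) for b in column[1:])
    # Rule 1: SNP in the column -> gaps become N.
    # Rule 2: no SNP in the column -> gaps become the reference base.
    fill = "N" if has_snp else column[0]
    return [column[0]] + [fill if b == GAP else b for b in column[1:]]


# Columns 1 and 3 of the example alignment (Ref1, S1, S2, S3, S4):
print(fix_column(["A", "A", "C", "A", "-"]))  # ['A', 'A', 'C', 'A', 'N']
print(fix_column(["C", "-", "C", "C", "C"]))  # ['C', 'C', 'C', 'C', 'C']
```

The reference entry itself is never modified, so this sketch also assumes the reference row has no gaps.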
I want to change the above alignment to this:
>Ref1
ATCG
>S1
ATCC
>S2
CTCG
>S3
ATCG
>S4
NTCG
Is there an easy way, or a tool, that can do this? Thanks so much!
I think you can use bedtools maskfasta; it will do the job.
Hi @abedkurdi10. Thank you. I understand that bedtools maskfasta can mask bases to Ns, but I am looking for a tool or script that replaces a gap with N only when there is a SNP in the same column, as described in my question.
I think 'pattern matching' in any programming language can also solve this issue.
Hi @Pramod. Thank you. But as mentioned in the question, I want to replace gaps with Ns only if a SNP is present in the same column, which goes slightly beyond matching a pattern and replacing it. I might end up writing my own script for this; I just did not want to reinvent the wheel. Anyway, thanks for the response.
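If you do end up writing your own script, the column-wise logic is short: read the alignment, transpose it into columns, fix each column, and transpose back. Below is a minimal, self-contained Python sketch under stated assumptions, not a published tool. It assumes the first record is the reference, all sequences have the same aligned length, the reference contains no gaps, and a "SNP" is any sample base that is neither a gap nor N and differs from the reference base. The script name fill_gaps.py and the function names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, List, Tuple

GAP = "-"


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fill_gaps(seqs: List[str]) -> List[str]:
    """seqs[0] is assumed to be the reference; all must be equal length."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences are not all the same length")
    columns = []
    for col in zip(*seqs):
        ref = col[0].upper()
        # Assumed SNP definition: a non-gap, non-N sample base != reference.
        has_snp = any(b.upper() not in (GAP, "N", ref) for b in col[1:])
        # SNP in column -> gaps become N; otherwise -> reference base.
        fill = "N" if has_snp else col[0]
        columns.append([col[0]] + [fill if b == GAP else b for b in col[1:]])
    # Transpose the fixed columns back into rows (one string per sequence).
    return ["".join(row) for row in zip(*columns)]


def main(path: str) -> None:
    with open(path) as handle:
        records = list(read_fasta(handle))
    headers = [h for h, _ in records]
    fixed = fill_gaps([s for _, s in records])
    for header, seq in zip(headers, fixed):
        print(f">{header}\n{seq}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} alignment_file.fasta", file=sys.stderr)
    else:
        main(sys.argv[1])
```

Run it as `python fill_gaps.py alignment_file.fasta > fixed.fasta`. On the example alignment it should produce ATCG, ATCC, CTCG, ATCG and NTCG for Ref1, S1, S2, S3 and S4. Working column by column is what makes this different from a per-sequence find-and-replace: the fill character for each gap depends on the other sequences at the same position.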