Replace the bases with N for the sites with evidence of a SNP
1
0
Entering edit mode
5.4 years ago
Prakki Rama ★ 2.7k

Hi all,

I have a alignment fasta file which looks like this:

$ cat alignment_file.fasta
>Ref1
ATCG
>S1
AT-C
>S2
CTCG
>S3
ATCG
>S4
-TCG

Specifically, what I want is:

1) If there exists a SNP (for eg: "C" in the S2 sequence in the position 1), then I want the gap ("-", S4 sequence in the position 1) to be replaced with "N"

2) If there is a gap in the column but no SNPs (like in the S1 sequence, position 3), then I want to replace the gap with reference base in the position 3 (which is "C" in the reference).

I want to change the above output to this:

>Ref1
ATCG
>S1
ATCC
>S2
CTCG
>S3
ATCG
>S4
NTCG

Is there any easy way or any tool with which I can do this? Thanks so much!

snp alignment • 2.4k views
ADD COMMENT
0
Entering edit mode

I think you can use bedtools maskfasta, it will do the job.

ADD REPLY
0
Entering edit mode

Hi @abedkurdi10. Thank you. I understand bedtools maskfasta can mask the base to N's but I am looking for some tool or script which can replace N only when there is a SNP in the whole column as mentioned in my question.

ADD REPLY
0
Entering edit mode

I think 'pattern matching' in any programming language can also solve this issue.

ADD REPLY
0
Entering edit mode

Hi @Pramod. Thank you. But as mentioned in the question, I am looking to replace Ns if only a SNP is present in the whole column and it is slightly beyond just matching a pattern and replacing it. I might end up writing my own script for this. Just did not want to reinvent the wheel. Anyways, thanks for response.

ADD REPLY
2
Entering edit mode
5.4 years ago
Prakki Rama ★ 2.7k

Answering the question myself.

I have written a perl script which can actually do this. It can be found in github for reference.

ADD COMMENT
0
Entering edit mode

bedtools maskfasta is already available, why writing a script from scratch?

ADD REPLY
0
Entering edit mode

Hi. As I have stated in my question, there are if conditions which I need to fulfill before I replace the gaps with either N or Reference base. Simply using bedtools maskfasta might not accomplish what I need. I would be glad to accept and vote your answer if you can actually post your steps to solve the above problem using bedtools. Thanks much!

ADD REPLY
0
Entering edit mode

bedtools maskfasta requires FASTA as input file. To mask regions in that FASTA file, bedtools maskfasta accepts BED, GFF and VCF files as regions to be masked. So SNPs in a VCF file will be masked in the FASTA file.

Also the tool is easy to use: bedtools maskfasta -fi input.fasta -fo output.fasta -bed regions.vcf

ADD REPLY
0
Entering edit mode

How about replacing replaceAmbigousBases to Reference? replaceAmbigousBases2Reference.pl

ADD REPLY

Login before adding your answer.

Traffic: 2712 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6