How to unmask a soft-masked genome?
23 months ago
Diego ▴ 110

Hi all,

Does anyone know a good way to unmask a soft-masked genome, i.e., convert the lowercase bases to uppercase (atatata --> ATATATA)?

Thanks in advance

soft-mask • 1.3k views

You can try the tr command:

tr '[:lower:]' '[:upper:]'

In all fairness, this would alter the sequence names as well, which is usually not the desired behavior.
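
To make the concern concrete, here is a small sketch that runs the plain tr call on a made-up FASTA record (the header and sequence are invented for illustration, not taken from any real genome). The header line is uppercased along with the bases:

```shell
# Made-up FASTA record, for illustration only.
# tr has no notion of header lines, so every lowercase letter is converted.
demo_out=$(printf '>chr1 sample_a\nacgtACGTnn\n' | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$demo_out"
```

If downstream files (annotations, indexes) refer to the original sequence names, the renamed headers would no longer match them.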


Yes, you need to skip the header lines somehow, for example:

while IFS= read -r line; do case "$line" in ">"*) printf '%s\n' "$line";; *) printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]';; esac; done < myref.fa
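
The same header-skipping idea can probably be expressed more compactly with awk, which also avoids starting a tr process for every line of a large genome. This is only a sketch: the function name unmask_fasta and the throwaway example file are hypothetical, and it assumes a plain FASTA file in which header lines start with ">".

```shell
# Hypothetical helper: print header lines unchanged, uppercase everything else.
unmask_fasta() {
  awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Example on a small throwaway file (contents are made up for illustration).
tmp=$(mktemp)
printf '>chr1 sample_a\nacgtACGTnn\n' > "$tmp"
unmask_fasta "$tmp"
rm -f "$tmp"
```

On a real file this would be used as unmask_fasta myref.fa > myref.unmasked.fa, where the output name is again just a placeholder.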
23 months ago

You can use a tool like seqkit; its seq subcommand has an option for this:

https://bioinf.shenwei.me/seqkit/usage/

seqkit seq -u masked.fa > unmasked.fa

 -u, --upper-case                print sequences in upper case
23 months ago
GenoMax 147k

Using reformat.sh from the BBMap suite:

reformat.sh -Xmx2g in=masked.fa out=unmasked.fa tuc=t