Hello,
I have an unmasked genome and its repeat annotation in BED/GFF format. I tried to soft-mask the genome using maskFastaFromBed from bedtools (v2.26), but it does not use the strand column of the .bed file, so minus-strand records are not masked on the minus strand. There is no -s option to force strandedness either (a feature that getfasta does have). So the masking is always applied on the + strand, which surprised me. Does anyone have ideas or alternative tools to suggest? An example run and its output are below.
Thanks.
genome.fasta
>Chr1
ATTGACAGAAATATCATCACATCTATTCTTTCTCTCCCCTAGTTTAGCAAAT
>Chr2
GACATATAAATAATAGTGGGAAAGAGACCGGATGAAACCTCAACTGTGGCTTTCATTAACAGATCA
genome.bed
Chr1 0 17 for 1 +
Chr2 0 17 rev 1 -
maskFastaFromBed -soft -fi genome.fasta -bed genome.bed -fo genome_softmasked.fasta
Output:
>Chr1
attgacagaaatatcatCACATCTATTCTTTCTCTCCCCTAGTTTAGCAAAT
>Chr2
gacatataaataatagtGGGAAAGAGACCGGATGAAACCTCAACTGTGGCTTTCATTAACAGATCA
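For anyone who wants to reproduce this output without bedtools, here is a minimal Python sketch of strand-agnostic soft-masking. It is an assumption based on the output above, not the bedtools implementation; the function name `soft_mask` and the in-memory `genome` and `bed` structures are made up for illustration. BED intervals are treated as 0-based and half-open.

```python
def soft_mask(seq, start, end):
    """Lowercase the 0-based, half-open interval [start, end) of seq."""
    return seq[:start] + seq[start:end].lower() + seq[end:]


# Example data copied from the post above.
genome = {
    "Chr1": "ATTGACAGAAATATCATCACATCTATTCTTTCTCTCCCCTAGTTTAGCAAAT",
    "Chr2": "GACATATAAATAATAGTGGGAAAGAGACCGGATGAAACCTCAACTGTGGCTTTCATTAACAGATCA",
}
bed = [
    ("Chr1", 0, 17, "for", "1", "+"),
    ("Chr2", 0, 17, "rev", "1", "-"),
]

masked = dict(genome)
for chrom, start, end, _name, _score, _strand in bed:
    # The strand column is deliberately ignored, mirroring the observed behaviour.
    masked[chrom] = soft_mask(masked[chrom], start, end)

for chrom, seq in masked.items():
    print(f">{chrom}")
    print(seq)
```

Both records end up masked at the left end of the sequence, regardless of the strand column.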
Can you provide example output of how you would like the output to be?
Hello,
I would have expected the output on the minus strand to look as below (also shown by user Fatima below). Am I getting this wrong?
Thanks
Would it work if you modify the genome.bed based on the strand? I assume for 0 17 on the negative strand you want the mask to be applied to 17 nucleotides starting from the last one toward the beginning of the sequence.
>Chr2
GACATATAAATAATAGTGGGAAAGAGACCGGATGAAACCTCAACTGTGGctttcattaacagatca
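If that's the case, I think you can modify your genome.bed: when a record is on the - strand, replace start and stop with length-stop and length-start, where length is the length of that sequence. BED coordinates are 0-based and half-open, so no -1 is needed. For Chr2 (length 66), 0 17 becomes 49 66, which gives the output above.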
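The coordinate flip suggested here can be sketched in Python. This is only an illustration: `flip_minus_interval` and `flip_bed_line` are hypothetical helper names, and the sequence lengths are taken from the example genome in the question. Because BED intervals are 0-based and half-open, the mirrored interval is (length - stop, length - start). Note that this treats minus-strand coordinates as counted from the end of the sequence, which is the interpretation assumed in this thread and not the usual BED convention.

```python
def flip_minus_interval(start, end, length):
    """Mirror a 0-based, half-open interval onto the other end of the sequence."""
    return length - end, length - start


def flip_bed_line(line, chrom_sizes):
    """Return a BED line with coordinates mirrored if the strand column is '-'."""
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Column 6 holds the strand; records without it are left untouched.
    strand = fields[5] if len(fields) > 5 else "+"
    if strand == "-":
        start, end = flip_minus_interval(start, end, chrom_sizes[chrom])
    return "\t".join([chrom, str(start), str(end)] + fields[3:])


# Sequence lengths computed from the example genome in the question.
genome = {
    "Chr1": "ATTGACAGAAATATCATCACATCTATTCTTTCTCTCCCCTAGTTTAGCAAAT",
    "Chr2": "GACATATAAATAATAGTGGGAAAGAGACCGGATGAAACCTCAACTGTGGCTTTCATTAACAGATCA",
}
chrom_sizes = {name: len(seq) for name, seq in genome.items()}

bed_lines = ["Chr1 0 17 for 1 +", "Chr2 0 17 rev 1 -"]
flipped = [flip_bed_line(line, chrom_sizes) for line in bed_lines]
for line in flipped:
    print(line)
```

The flipped lines could then be written to a new file and passed to maskFastaFromBed with the same command as in the question.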
Hi Fatima,
Thanks for the reply. Yes, there are ways to modify the .bed file, but we shouldn't have to, since other bedtools tools (like getfasta) can use the strand information. Apparently maskFastaFromBed cannot.