How To Create Masked Genome?
13.1 years ago
Bioscientist ★ 1.7k

I have the build 18 (hg18) genome, plus BED files with the coordinates of segmental duplications and satellites. I want to mask those satellites/duplications by replacing their sequences with 'NNNNN'.

How can I do this? (I downloaded the repeat coordinates from the UCSC Genome Browser. Can such a masked genome also be downloaded from somewhere?)

Thanks

13.1 years ago

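BEDTools contains a tool named maskFastaFromBed (called as bedtools maskfasta in newer releases), which masks a FASTA file based on BED coordinates, e.g.:

maskFastaFromBed -fi hg18.fa -bed repeats.bed -fo hg18.masked.fa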
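To show what BED-based hard masking amounts to, here is a minimal pure-Python sketch in the spirit of maskFastaFromBed. It is not BEDTools' code. The function names (read_bed, read_fasta, mask_sequence, mask_fasta) are hypothetical. It assumes inputs small enough to fit in memory and BED files whose first three columns are chrom, start, end in 0-based, half-open coordinates.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    intervals = defaultdict(list)
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name is the first word after '>'."""
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def mask_sequence(seq, intervals, mask_char="N"):
    """Replace every position inside the half-open intervals with mask_char."""
    bases = list(seq)
    for start, end in intervals:
        # Clip to the sequence bounds; [start, end) covers end - start positions.
        start, end = max(start, 0), min(end, len(bases))
        if start < end:
            bases[start:end] = mask_char * (end - start)
    return "".join(bases)


def mask_fasta(records, bed, mask_char="N"):
    """Mask every record; sequences without BED intervals are returned unchanged."""
    return {name: mask_sequence(seq, bed.get(name, []), mask_char)
            for name, seq in records.items()}


# Toy example: one 16-bp sequence and two intervals (the second runs past the end).
fasta = [">chr1", "ACGTACGTAC", "GTACGT"]
bed = ["chr1\t2\t5", "chr1\t14\t20"]
print(mask_fasta(read_fasta(fasta), read_bed(bed))["chr1"])  # ACNNNCGTACGTACNN
```

Because positions are plain string indices, the sequence length never changes. Gap characters such as '-' in an aligned FASTA count as positions and are left untouched outside the masked intervals. For a whole genome, the BEDTools tool above is the practical choice.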

BTW, the following directory contains a masked version of hg18: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

chromFaMasked.zip - The assembly sequence in one file per chromosome. Repeats are masked by capital Ns; non-repeating sequence is shown in upper case.


Works great!

maskFastaFromBed does not seem to work for masking a single base. I tried, but couldn't make it work. For example, I would like to mask position 20 on Chr, using BED entries like these:

Chr    20    20
or
Chr    20    -

Both cases actually mask position 20 and 21.

Is there an option to mask only a single base?


By the way, I have also tried maskseq from EMBOSS, and it masks a single base correctly. However, I want to mask aligned sequences, and maskseq removes the gaps, which I need to keep. maskFastaFromBed leaves the gaps in place while masking.


Just figured it out: BED coordinates are 0-based and half-open, so to mask position 20 (counting from 1) the BED line must be: Chr 19 20.
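The fix works because a BED line "chrom start end" covers the 1-based positions start+1 through end, so a single 1-based position p is written as p-1, p. A small hypothetical helper (one_based_to_bed is not part of BEDTools) that performs this conversion:

```python
def one_based_to_bed(chrom, first, last=None):
    """Convert a 1-based, inclusive position or range to a BED line (0-based, half-open)."""
    if last is None:
        last = first  # a single base
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return f"{chrom}\t{first - 1}\t{last}"


print(one_based_to_bed("Chr", 20))      # Chr<TAB>19<TAB>20 (only position 20)
print(one_based_to_bed("Chr", 20, 25))  # Chr<TAB>19<TAB>25 (positions 20-25)

# Sanity check: the half-open slice [19:20] of a string touches exactly one base,
# which is position 20 when counting from 1.
seq = "A" * 30
masked = seq[:19] + "N" + seq[20:]
assert masked.count("N") == 1 and masked.index("N") + 1 == 20
```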
