Question

Extract non-repetitive DNA sequence

0

4.2 years ago

huiyus97 • 0

Hi,

I scanned for all repetitive DNA elements with RepeatMasker and got a file that looks like this:

u1  u2    u3   u4   scaffolds  begin  end   (left)    strand  repeat        class           begin  end  left  id
15  3.7   3.1  6.5  contig1    2955   2986  -9613389  +       (ATTA)n       Simple_repeat   1      31   0     1
29  4.8   2.2  2.2  contig1    3772   3816  -9612559  +       (AAGGCTAAA)n  Simple_repeat   1      45   0     2
15  19.6  0    0    contig1    6019   6047  -9610328  +       (T)n          Simple_repeat   1      29   0     3
14  25.1  3.3  3.3  contig1    9869   9928  -9606447  +       GA-rich       Low_complexity  1      60   0     4

I then reformatted this file and extracted all repetitive elements with bedtools. However, I also want to extract the non-repetitive sequences, which I assume are all DNA sequences other than the repetitive ones.

Is there any way to extract the non-repetitive sequences directly, given a file that lists the positions of all repetitive elements?

Thank you!

bedtools • 711 views

ADD COMMENT • link updated 4.2 years ago by JC 13k • written 4.2 years ago by huiyus97 • 0

0

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

ADD REPLY • link 4.2 years ago by Ram 44k

score 0 · Answer 1 · 2020-09-24

0

4.2 years ago

JC 13k

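1. Create a BED file with your contig lengths (one line per contig: name, 0, length).
2. Subtract the repetitive regions from the contigs BED with the `bedtools subtract` operator.
3. Extract the regions from step 2 (for example with `bedtools getfasta`, as you did for the repeats).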
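
To make the logic of these three steps concrete, here is a minimal Python sketch of what they compute. It is not how bedtools is implemented. It assumes the whitespace-separated layout shown in the question, with the contig name, begin, end and (left) values in columns 5 to 8, and it derives each contig length as `end + |left|`, which is consistent with the four example rows. The function names (`parse_repeats`, `subtract`, `non_repetitive_bed`, `extract`) are made up for illustration. With bedtools itself the equivalent would be roughly `bedtools subtract -a contigs.bed -b repeats.bed` followed by `bedtools getfasta`, where the file names are placeholders.

```python
from collections import defaultdict


def parse_repeats(lines):
    """Read RepeatMasker-style rows into BED-like intervals per contig.

    Assumes columns 5-8 are: contig, begin (1-based), end, (left).
    Returns (repeats, lengths), where repeats maps each contig to a list of
    0-based, half-open (start, end) tuples and lengths maps contig to length.
    """
    repeats = defaultdict(list)
    lengths = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and header rows (their first field is not a number).
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        contig, begin, end = fields[4], int(fields[5]), int(fields[6])
        # (left) may appear as "-123" or "(123)"; keep only its magnitude.
        left = abs(int(fields[7].strip("()")))
        repeats[contig].append((begin - 1, end))  # BED starts are 0-based
        lengths[contig] = end + left              # step 1: contig length
    return repeats, lengths


def subtract(length, intervals):
    """Step 2: return the parts of [0, length) not covered by any interval."""
    kept, cursor = [], 0
    for start, end in sorted(intervals):
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)  # also handles overlapping repeats
    if cursor < length:
        kept.append((cursor, length))
    return kept


def non_repetitive_bed(lines):
    """Combine steps 1 and 2 into a list of (contig, start, end) BED records."""
    repeats, lengths = parse_repeats(lines)
    return [
        (contig, start, end)
        for contig in sorted(lengths)
        for start, end in subtract(lengths[contig], repeats[contig])
    ]


def extract(sequences, bed):
    """Step 3: slice each BED record out of a {contig: sequence} dict."""
    return {f"{c}:{s}-{e}": sequences[c][s:e] for c, s, e in bed}
```

Calling `non_repetitive_bed` on the lines of the RepeatMasker table gives the BED records of the non-repetitive stretches, and `extract` slices them out of sequences you have already loaded into a dictionary. For a real genome, bedtools will be much faster and is the better choice.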

ADD COMMENT • link 4.2 years ago by JC 13k