Question

How do I obtain the mobile elements?

6.6 years ago

kiomix106 ▴ 10

Hello. My question is about counting mobile elements. The main classes of mobile elements, in the context of this question, are DNA, LTR, LINE and SINE.

EXAMPLE

scaffold_1  RepeatMasker    .   33  148 663 -   .   snRNA:U5
scaffold_1  RepeatMasker    .   605 720 663 -   .   snRNA:U5
scaffold_1  RepeatMasker    .   969 1132    831 -   .   DNA/TcMar-Tc2:Mariner-2N2_XT
scaffold_1  RepeatMasker    .   1496    1730    1279    -   .   DNA/TcMar-Tc2:Mariner-2N2_XT
scaffold_1  RepeatMasker    .   1645    1810    342 -   .   DNA/hAT-Charlie:hAT-N5_XT
scaffold_1  RepeatMasker    .   1946    2097    876 +   .   DNA/Kolobok-T2:XBR_Xt
scaffold_1  RepeatMasker    .   2474    2917    3699    -   .   DNA/Kolobok-T2:Kolobok-1N2_XT
scaffold_1  RepeatMasker    .   3306    3322    16  +   .   Simple_repeat:(A)n
scaffold_1  RepeatMasker    .   3689    3989    1732    -   .   DNA/hAT-Charlie:POR-1_Xt
scaffold_1  RepeatMasker    .   4876    5605    5823    +   .   LINE/L1:L1-42A_XT
scaffold_1  RepeatMasker    .   5600    6436    23824   +   .   LINE/L1:L1-42A_XT

So I guess I should take each class and count its entries. The problem is that every annotation that starts with DNA, LTR, etc. has a different full name, so a direct count does not group them. How can I solve this with a command in the terminal? I also use bedtools.

score 0 · Answer 1 · 2019-04-22

Maybe this is not the best solution, but it appears that the names follow a few patterns, such as DNA/TcMar, DNA/Kolobok, LINE/L1... You can create a list of those names:

DNA/TcMar   
DNA/Kolobok 
LINE/L1

And, for example, if you name that list TE.lst, you can use a for loop with grep (annotation.gff below is a placeholder for your RepeatMasker file):

for i in $(cat TE.lst); do grep -c "$i" annotation.gff; done

Where -c counts the lines that match the pattern.

I ran a test with this TE.lst example on the lines you posted.

The output:

2
2
2

A little laborious, but it worked. I hope this helps.
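The loop above prints bare numbers, so it can be hard to tell which count belongs to which name. A minimal sketch of a labelled variant, assuming a tab-separated file; the file name sample_te.gff and its three rows are made up here from the example in the question, purely for illustration:

```shell
# Hypothetical sample: three tab-separated rows copied from the question.
printf 'scaffold_1\tRepeatMasker\t.\t969\t1132\t831\t-\t.\tDNA/TcMar-Tc2:Mariner-2N2_XT\n' > sample_te.gff
printf 'scaffold_1\tRepeatMasker\t.\t1946\t2097\t876\t+\t.\tDNA/Kolobok-T2:XBR_Xt\n' >> sample_te.gff
printf 'scaffold_1\tRepeatMasker\t.\t4876\t5605\t5823\t+\t.\tLINE/L1:L1-42A_XT\n' >> sample_te.gff

# The pattern list, one name per line.
printf 'DNA/TcMar\nDNA/Kolobok\nLINE/L1\n' > TE.lst

# Read one pattern per line; -F treats it as a fixed string,
# -c counts the matching lines. Print the pattern next to its count.
labelled=$(while IFS= read -r pat; do
    printf '%s\t%s\n' "$pat" "$(grep -c -F -- "$pat" sample_te.gff)"
done < TE.lst)
printf '%s\n' "$labelled"
```

Reading the list with `while read` rather than a for loop over `cat` also keeps patterns that contain spaces intact.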

score 0 · Answer 2 · 2019-04-22

In the end, what I did was rewrite column 9 of each record as just DNA, LINE, LTR or SINE, so that I could group and count them easily with the following commands. A bit repetitive, but it works.

awk 'BEGIN {FS = OFS = "\t"} ($9 ~ "DNA") {$9 = "DNA"; print $0}' xentr4_repeatmasker_annotation_first200000.gff3 > countingDNA.txt

awk 'BEGIN {FS = OFS = "\t"} ($9 ~ "LINE") {$9 = "LINE"; print $0}' xentr4_repeatmasker_annotation_first200000.gff3 > countingLINE.txt

awk 'BEGIN {FS = OFS = "\t"} ($9 ~ "SINE") {$9 = "SINE"; print $0}' xentr4_repeatmasker_annotation_first200000.gff3 > countingSINE.txt

awk 'BEGIN {FS = OFS = "\t"} ($9 ~ "LTR") {$9 = "LTR"; print $0}' xentr4_repeatmasker_annotation_first200000.gff3 > countingLTR.txt
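The per-class commands can probably be folded into a single pass. A minimal sketch, assuming the file is tab-separated and that column 9 always has the form class/family:name or class:name, as in the example in the question; the file name sample_classes.gff and its rows are made up here for illustration:

```shell
# Hypothetical sample: seven tab-separated rows copied from the question.
printf 'scaffold_1\tRepeatMasker\t.\t33\t148\t663\t-\t.\tsnRNA:U5\n' > sample_classes.gff
printf 'scaffold_1\tRepeatMasker\t.\t969\t1132\t831\t-\t.\tDNA/TcMar-Tc2:Mariner-2N2_XT\n' >> sample_classes.gff
printf 'scaffold_1\tRepeatMasker\t.\t1645\t1810\t342\t-\t.\tDNA/hAT-Charlie:hAT-N5_XT\n' >> sample_classes.gff
printf 'scaffold_1\tRepeatMasker\t.\t1946\t2097\t876\t+\t.\tDNA/Kolobok-T2:XBR_Xt\n' >> sample_classes.gff
printf 'scaffold_1\tRepeatMasker\t.\t3306\t3322\t16\t+\t.\tSimple_repeat:(A)n\n' >> sample_classes.gff
printf 'scaffold_1\tRepeatMasker\t.\t4876\t5605\t5823\t+\t.\tLINE/L1:L1-42A_XT\n' >> sample_classes.gff
printf 'scaffold_1\tRepeatMasker\t.\t5600\t6436\t23824\t+\t.\tLINE/L1:L1-42A_XT\n' >> sample_classes.gff

# Take the text before the first "/" or ":" in column 9 as the class,
# tally it in an array, and print one "class<TAB>count" line per class.
counts=$(awk -F'\t' '
    { split($9, part, "[/:]"); n[part[1]]++ }
    END { for (cls in n) print cls "\t" n[cls] }
' sample_classes.gff | LC_ALL=C sort)
printf '%s\n' "$counts"
```

For these seven rows the tally is DNA 3, LINE 2, Simple_repeat 1 and snRNA 1. Classes that are not mobile elements (snRNA, Simple_repeat) are counted too, so they would need to be filtered out afterwards if only DNA, LINE, SINE and LTR are wanted.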