How do I obtain the mobile elements?
5.6 years ago
kiomix106 ▴ 10

Hello. My question is about the main classes of mobile elements, which in this context are DNA, LTR, LINEs and SINEs.

EXAMPLE

scaffold_1  RepeatMasker    .   33  148 663 -   .   snRNA:U5
scaffold_1  RepeatMasker    .   605 720 663 -   .   snRNA:U5
scaffold_1  RepeatMasker    .   969 1132    831 -   .   DNA/TcMar-Tc2:Mariner-2N2_XT
scaffold_1  RepeatMasker    .   1496    1730    1279    -   .   DNA/TcMar-Tc2:Mariner-2N2_XT
scaffold_1  RepeatMasker    .   1645    1810    342 -   .   DNA/hAT-Charlie:hAT-N5_XT
scaffold_1  RepeatMasker    .   1946    2097    876 +   .   DNA/Kolobok-T2:XBR_Xt
scaffold_1  RepeatMasker    .   2474    2917    3699    -   .   DNA/Kolobok-T2:Kolobok-1N2_XT
scaffold_1  RepeatMasker    .   3306    3322    16  +   .   Simple_repeat:(A)n
scaffold_1  RepeatMasker    .   3689    3989    1732    -   .   DNA/hAT-Charlie:POR-1_Xt
scaffold_1  RepeatMasker    .   4876    5605    5823    +   .   LINE/L1:L1-42A_XT
scaffold_1  RepeatMasker    .   5600    6436    23824   +   .   LINE/L1:L1-42A_XT

So I guess I should count the entries for each class. The problem is that every name that begins with DNA, LTR, etc. is different, so a plain count does not group them. How can I solve this with a command in the terminal? I also use bedtools.

5.6 years ago
flogin ▴ 280

Maybe this is not the best solution, but it appears that you have several name patterns, such as DNA/TcMar, DNA/Kolobok, LINE/L1... You can create a list of those names:

DNA/TcMar   
DNA/Kolobok 
LINE/L1

And, for example, if you name that list TEs.lst, you can use a for loop with grep (annotation.gff3 here stands for your own annotation file):

for i in $(cat TEs.lst); do grep -c "$i" annotation.gff3; done

Here -c counts the lines that match the pattern.

I ran a test with this TEs.lst example using your data.

The output:

2
2
2

It is a little laborious, but it worked for me. I hope it helps.
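If writing the list by hand becomes tedious, the prefixes can probably be pulled from the file itself. The following is only a sketch: it assumes the file is tab-separated, that the repeat name sits in column 9 as in the example above, and that the family prefix is everything before the first ":". The function name count_by_family and the sample file are made up for illustration.

```shell
# Hypothetical helper: count annotations per class/family prefix
# (the text before the first ':' in column 9).
# Assumes a tab-separated file; lines starting with '#' are skipped.
count_by_family() {
    grep -v '^#' "$1" | cut -f9 | cut -d: -f1 | sort | uniq -c
}

# Small sample rebuilt from lines in the question (tabs restored).
sample=$(mktemp)
printf 'scaffold_1\tRepeatMasker\t.\t%s\t%s\t%s\t%s\t.\t%s\n' \
    33   148  663  - 'snRNA:U5' \
    969  1132 831  - 'DNA/TcMar-Tc2:Mariner-2N2_XT' \
    1496 1730 1279 - 'DNA/TcMar-Tc2:Mariner-2N2_XT' \
    4876 5605 5823 + 'LINE/L1:L1-42A_XT' > "$sample"

count_by_family "$sample"
rm -f "$sample"
```

Each output line is a count followed by a prefix such as DNA/TcMar-Tc2, so no pattern list needs to be maintained.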

5.6 years ago
kiomix106 ▴ 10

In the end, I replaced the name in column 9 of each entry with its class (DNA, LINE, LTR, SINE) so that I could group and count them easily. I used the following commands, which are a bit repetitive but do the job:

awk 'BEGIN {FS = OFS = "\t"} ($9 ~ "DNA") {$9 = "DNA"; print $0}' xentr4_repeatmasker_annotation_first200000.gff3 > countingDNA.txt

awk 'BEGIN {FS = OFS = "\t"} ($9 ~ "LINE") {$9 = "LINE"; print $0}' xentr4_repeatmasker_annotation_first200000.gff3 > countingLINE.txt

awk 'BEGIN {FS = OFS = "\t"} ($9 ~ "SINE") {$9 = "SINE"; print $0}' xentr4_repeatmasker_annotation_first200000.gff3 > countingSINE.txt

awk 'BEGIN {FS = OFS = "\t"} ($9 ~ "LTR") {$9 = "LTR"; print $0}' xentr4_repeatmasker_annotation_first200000.gff3 > countingLTR.txt
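The same grouping can likely be done in a single pass instead of one command per class. This is a sketch, not a tested replacement for the commands above: it assumes the top-level class is the part of column 9 before the first "/" or ":", and the function name count_by_class and the sample file are invented for the example.

```shell
# Hypothetical helper: count annotations per top-level class
# (the text before the first '/' or ':' in column 9).
# Assumes a tab-separated file; lines starting with '#' are skipped.
count_by_class() {
    awk -F'\t' '
        /^#/ { next }                       # skip GFF header/comment lines
        { split($9, part, "[/:]"); n[part[1]]++ }
        END { for (c in n) print c "\t" n[c] }
    ' "$1" | sort
}

# Small sample rebuilt from lines in the question (tabs restored).
sample=$(mktemp)
printf 'scaffold_1\tRepeatMasker\t.\t%s\t%s\t%s\t%s\t.\t%s\n' \
    33   148  663  - 'snRNA:U5' \
    969  1132 831  - 'DNA/TcMar-Tc2:Mariner-2N2_XT' \
    1645 1810 342  - 'DNA/hAT-Charlie:hAT-N5_XT' \
    4876 5605 5823 + 'LINE/L1:L1-42A_XT' > "$sample"

count_by_class "$sample"
rm -f "$sample"
```

Classes other than DNA, LINE, LTR and SINE (for example snRNA or Simple_repeat) are counted as well, so the rows of interest still need to be picked out of the result.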
