Hey! I'm trying to determine the absence of transposon elements on a non-reference genome by looking at split reads. For that, I mapped the reference genome against a pair of reads using STAR. Consequently, I sorted and indexed the resulting bam file and converted split junction reads with Bam2hints. The output of Bam2hints was a gff file that was converted to a bed file by using gff2bed
gff2bed < myfile.gff > my_sorted_file.bed
Finally, to count if there are hints/junctions spanning a TEs I tried to use Bedtools intersect:
bedtools intersect -a TE_file.bed -b my_sorted_file.bed
(TE_file.bed is the bed file containing the transposon element coordinates)
These 2 Bed files look like this:
- TE_file.bed
chrI 22231 22552 chrI_s1+_TY1 . + reannotate transposable_element . ID=chrI_s1+_TY1
chrI 138753 138990 chrI_s2-_TY1 . - reannotate transposable_element . ID=chrI_s2-_TY1
chrI 160105 160237 chrI_st1-_TY1 . - reannotate transposable_element . ID=chrI_st1-_TY1
chrI 160238 166163 chrI_u1-_TY1 . - reannotate transposable_element . ID=chrI_u1-_TY1
chrI 182613 182953 chrI_s4+_TY3 . + reannotate transposable_element . ID=chrI_s4+_TY3
-my_sorted_file.bed
2micron 1674 4928 . 0 . b2h intron . pri=4;src=E
chrI 5028 15053 . 0 . b2h intron . pri=4;src=E
chrI 6306 7155 . 0 . b2h intron . mult=2;pri=4;src=E
chrI 6826 16996 . 0 . b2h intron . pri=4;src=E
chrI 6827 9072 . 0 . b2h intron . mult=2;pri=4;src=E
chrI 10926 13026 . 0 . b2h intron . pri=4;src=E
chrI 12053 24485 . 0 . b2h intron . mult=2;pri=4;src=E
chrI 12061 21076 . 0 . b2h intron . pri=4;src=E
A piece of the output looks like this:
***** WARNING: File Gallone_hints.bed has inconsistent naming convention for record:
chrI 5028 15053 . 0 . b2h intron . pri=4;src=E
chrI 22231 22552 chrI_s1+_TY1 . + reannotate transposable_element ID=chrI_s1+_TY1
chrI 138753 138990 chrI_s2-_TY1 . - reannotate transposable_element ID=chrI_s2-_TY1
chrI 160105 160237 chrI_st1-_TY1 . - reannotate transposable_element ID=chrI_st1-_TY1
chrI 160238 166163 chrI_u1-_TY1 . - reannotate transposable_element ID=chrI_u1-_TY1
How can I solve this problem? Which columns should I remove, reorder? Thank you in advance.
Thank you for your reply. Since there was one chromosome starting with a number I just added a character to it. Your code did not work for me so I did the adding manually. This is the stdout:
The making it
chr2micron
, so have the same prefix everywhere, so chr.Thank you for your help and time! It worked.