Hello,
I have two GTF files containing transcript annotations, and I want to find the transcripts that overlap between the two files. Can anyone give me some advice?
Thanks!
Maybe somebody knows something I don't, but I feel like bedtools should be able to handle your GTF files directly.
If you haven't done so already, you can filter each file down to transcript records only, then use bedtools:
awk '$3 == "transcript"' file1.gtf > file1.txOnly.gtf
awk '$3 == "transcript"' file2.gtf > file2.txOnly.gtf
bedtools intersect -u -a file1.txOnly.gtf -b file2.txOnly.gtf > file1_tx_overlapping_file2_tx.gtf
Depending on which overlaps you want reported, you can swap the -a and -b files or adjust the options.
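To make concrete what -u reports, here is a minimal pure-awk sketch on two tiny, made-up GTF files (the file names and records are hypothetical). It is a naive all-against-all comparison for illustration only, not how bedtools works internally, and it would be slow on real annotations. It prints each transcript from the first file once if it overlaps at least one transcript on the same chromosome in the second file.

```shell
# Work in a scratch directory with two tiny, made-up GTF files.
tmpdir=$(mktemp -d)
printf 'chr1\tdemo\ttranscript\t100\t200\t.\t+\t.\ttranscript_id "a1";\n' >  "$tmpdir/a.gtf"
printf 'chr1\tdemo\ttranscript\t500\t600\t.\t+\t.\ttranscript_id "a2";\n' >> "$tmpdir/a.gtf"
printf 'chr1\tdemo\ttranscript\t150\t250\t.\t-\t.\ttranscript_id "b1";\n' >  "$tmpdir/b.gtf"

# Pass 1 (NR == FNR): remember every transcript interval in b.gtf.
# Pass 2: print an a.gtf transcript once if it overlaps any remembered
# interval on the same chromosome. GTF coordinates are 1-based and inclusive,
# so two intervals overlap when start_a <= end_b and start_b <= end_a.
overlapping=$(awk -F '\t' '
    $3 != "transcript" { next }
    NR == FNR { n++; b_chrom[n] = $1; b_start[n] = $4 + 0; b_end[n] = $5 + 0; next }
    {
        for (i = 1; i <= n; i++) {
            if (b_chrom[i] == $1 && ($4 + 0) <= b_end[i] && b_start[i] <= ($5 + 0)) {
                print
                break
            }
        }
    }
' "$tmpdir/b.gtf" "$tmpdir/a.gtf")

printf '%s\n' "$overlapping"
rm -rf "$tmpdir"
```

Only the a1 record is printed here, even though b1 is on the opposite strand. bedtools intersect also ignores strand by default; its -s option restricts hits to the same strand.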
bedtools intersect \
-a <(awk '/^[^#]/ {printf("%s\t%d\t%s\t%s\n",$1,int($4)-1,$5,$0);}' file1.gtf | sort -t $'\t' -k1,1 -k2,2n ) \
-b <(awk '/^[^#]/ {printf("%s\t%d\t%s\t%s\n",$1,int($4)-1,$5,$0);}' file2.gtf | sort -t $'\t' -k1,1 -k2,2n )
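In case the awk part looks opaque: it puts BED-style coordinates in front of each GTF record, shifting the 1-based GTF start to a 0-based start and keeping the full original line after it. A small sketch on one made-up record (the record itself is hypothetical) shows the effect:

```shell
# Hypothetical single GTF record (tab-separated), made up for illustration.
gtf_line=$(printf 'chr1\ttest\ttranscript\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";')

# Same awk program as in the command above: skip comment lines, then print
# chrom, 0-based start, end, followed by the full original GTF record.
bed_line=$(printf '%s\n' "$gtf_line" | awk '/^[^#]/ {printf("%s\t%d\t%s\t%s\n",$1,int($4)-1,$5,$0);}')

# Show only the three BED coordinate columns that were prepended.
printf '%s\n' "$bed_line" | cut -f1-3
```

Note that this conversion keeps every feature type (exons, CDS and so on), so filter for transcript records first if those are the only ones you care about.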
Something in AGAT should work: https://agat.readthedocs.io/en/latest/?badge=latest
Using bedops --intersect and gtf2bed will get their common genomic space:
bedops --intersect <(gtf2bed < file1.gtf) <(gtf2bed < file2.gtf) > answer.bed
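"Common genomic space" means the output is the set of shared regions, not the original transcript records. As a rough sketch of the idea for a single pair of intervals (the coordinates are made up, and this is plain awk rather than bedops itself), the shared region runs from the larger start to the smaller end:

```shell
# Two hypothetical BED intervals (0-based, half-open), one from each file.
a=$(printf 'chr1\t99\t200')
b=$(printf 'chr1\t149\t250')

# Shared region of one pair: the larger start and the smaller end, provided
# both intervals are on the same chromosome and the result is non-empty.
shared=$(printf '%s\t%s\n' "$a" "$b" | awk -F '\t' -v OFS='\t' '
    $1 == $4 {
        s = ($2 + 0 > $5 + 0) ? $2 + 0 : $5 + 0
        e = ($3 + 0 < $6 + 0) ? $3 + 0 : $6 + 0
        if (s < e) print $1, s, e
    }')

printf '%s\n' "$shared"
```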
If you want to know which transcripts overlap other transcripts, specifically, you could use bedmap --echo --echo-map:
bedmap --echo --echo-map <(gtf2bed < file1.gtf) <(gtf2bed < file2.gtf) > answer.bed
More information at: https://bedops.readthedocs.io/
OK, thank you all very much for your help! I will try it.