Hisat2 splice sites extract blank files
5.3 years ago

Hi everybody,

I was trying to extract the splice sites from my GTF file, but I had no success with either "hisat2_extract_splice_sites.py" or an "awk" command. With both options I just got a blank file as output. My final goal is to use the splice sites for hisat2-build --ss --exon. For some reason, I could do that with human GTF files but not with HIV. I tried downloading several files from many data sites, such as NCBI, the UCSC Genome Browser, Ensembl and ENA. Could someone point me to another data site for retrieving GFF/GTF files, or even explain how to make a new one myself? Note: I downloaded GFF files and then converted them with the gffread command.

Thank you guys in advance!

assembly alignment • 2.7k views

Could you provide one example of a gtf that doesn't work? And the commands you used?


Hi! Sure! I used:

gffread -E hiv.gff -T -o hiv.gtf    # conversion

hisat2_extract_splice_sites.py hiv.gtf > splice_sites.txt    # extraction

These same commands worked perfectly with human GFF/GTF files, but for HIV I got "splice_sites.txt" as a blank file (zero bytes).

Awk Commands:

awk '{if ($3=="exon") {print $1"\t"$4-1"\t"$5-1}}' hiv.gtf > exonsFile.txt
awk '{if ($3=="intron") {print $1"\t"$4-2"\t"$5}}' hiv.gtf > ssFile.txt
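
A quick check that may help with the awk commands above: they only print something if column 3 of the GTF actually contains "exon" or "intron". As far as I know, GTF files rarely carry explicit "intron" rows, so the second command can come back empty even for a spliced genome. Below is a minimal Python sketch that tallies the feature types present. The function name `count_features` and the two example lines are made up for illustration; they are not taken from any real HIV annotation.

```python
from collections import Counter


def count_features(gtf_lines):
    """Tally the feature type (column 3) of every non-comment GTF line."""
    counts = Counter()
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) >= 9:  # a GTF record has nine tab-separated columns
            counts[fields[2]] += 1
    return counts


# Hypothetical records: one transcript with a single exon.
example = [
    'chrV\tsrc\ttranscript\t1\t900\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";',
    'chrV\tsrc\texon\t1\t900\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";',
]
print(count_features(example))
```

On a real file this would be called as `count_features(open("hiv.gtf"))`. If "intron" never shows up in the tally, the intron awk command has nothing to match.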

As for the GTF files, I tested tons of them. Here are some:

https://www.ncbi.nlm.nih.gov/nuccore/KY112585.1

https://www.ncbi.nlm.nih.gov/genome/genomes/10319? (this one contains a list of possible assemblies)

Any insight about this issue is welcome!

5.3 years ago
h.mon 35k

The annotation you are using is from a virus, which has an extremely packed genome, and contains no introns. Thus, the ssFile.txt will be empty.
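
To illustrate why an annotation without introns gives an empty file, here is a rough Python sketch of the general idea behind extracting splice sites from a GTF: group exons by transcript, then report the gap between each pair of consecutive exons. This is not the actual code of hisat2_extract_splice_sites.py, only an approximation of the approach as I understand it. The function name `junctions_from_gtf` and the example records are hypothetical, and coordinates are kept 1-based as in the GTF (the HISAT2 script, as far as I know, writes zero-based positions).

```python
import re
from collections import defaultdict


def junctions_from_gtf(gtf_lines):
    """Return sorted (chrom, last_exon_base, next_exon_base, strand) tuples."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows contribute
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))  # chrom, strand, transcript
        exons[key].append((int(fields[3]), int(fields[4])))

    junctions = set()
    for (chrom, strand, _tid), spans in exons.items():
        spans.sort()
        # A transcript with a single exon yields no pairs, hence no junctions.
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            junctions.add((chrom, left_end, right_start, strand))
    return sorted(junctions)


# Hypothetical records: t1 has one exon, t2 has two.
single = ['chrV\tsrc\texon\t1\t900\t.\t+\t.\ttranscript_id "t1";']
spliced = [
    'chrV\tsrc\texon\t100\t200\t.\t+\t.\ttranscript_id "t2";',
    'chrV\tsrc\texon\t300\t400\t.\t+\t.\ttranscript_id "t2";',
]
print(junctions_from_gtf(single))
print(junctions_from_gtf(spliced))
```

If every transcript in the file has exactly one exon, the result is an empty list, which corresponds to a zero-byte splice site file.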


That's fair. So what is the solution? Should I use a text file I make myself with --known-splicesite-infile <path>? I ask because I think I am not supposed to get novel splice junctions or new junction annotations if I do not provide a file like that.


What are you trying to do, though? If you believe there are novel splice sites that the annotation file has missed, then you won't find those novel splice sites by passing in the annotation file.

If you don't care about novel splice sites and just want to boost accuracy using the annotation file, then an annotation with no introns will give you an empty file. That is not an error and not something you solve: it means the virus as we know it has no introns and therefore nothing to splice out.
