Custom genome and tophat
0
0
Entering edit mode
7.9 years ago
Michi ▴ 990

Hi

I am creating a custom genome since I want to detect two genes only in my rna-seq reads. I have created a .fa as well as a gtf file - both extracts of the original genome.fa and the original gtf.

My genome-fasta and gtf contain the two regions of interest. The header of the .fa file clearly states this:

>3 dna:chromosome chromosome:GRCh37:3:191004362:191007983:1
>12 dna:chromosome chromosome:GRCh37:12:106303127:106339550:1

So the sequence length of chromosome 12 is only 36424 which makes the program think that the gene coordinates (106320389-106322312) as written in the .gff is way beyond the end of the chromosome meanwhile the Fasta header clearly states where the sequence is to be placed.

Does anybody know how to solve this issue?

As for the errors that occur:

  • Tophat fails with the Error: gtf_to_fasta returned an error (the error being std::out_of_range -> basic_string::substr)

  • The same occurs with a when trying to extract manually the transcripts with gffread: Error (GFaSeqGet): end coordinate (106322323) cannot be larger than sequence length 36424

sequencing RNA-Seq tophat • 2.0k views
ADD COMMENT
1
Entering edit mode

The gene coordinates in the gtf file needs to match the location of the gene in your actual sequence. I don't think tophat uses the information in the fasta headers. So if you modified the genomic sequence lengths you need to update the gtf-file accordingly. Why don't you just use the full sequences of chromosome 3 and 12 and a gtf-file containing only the two genes you want? The mapping should still be really fast.

ADD REPLY
0
Entering edit mode

That makes sense! Altough a pity since I don't like to 'alter' the genomic position. I'll give it a try - thank you

ADD REPLY
0
Entering edit mode

It seems that this did the trick.

ADD REPLY
0
Entering edit mode

You should not do like this, some reads may align on other genomic locations in a better way. You should align on the whole genome and then extact the read count using featureCounts and your .gff of interest.

ADD REPLY

Login before adding your answer.

Traffic: 2066 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6