Find a gene from RNA-seq data without GTF
4.7 years ago
avino ▴ 20

I have SRA files from an RNA-seq experiment on an organism for which neither a reference genome nor a reference transcriptome is available, hence no GTF either. I would like to know whether I can extract the sequence of a gene (CDS) from these SRA files, based on a homologous gene (CDS) from a related species, which I do have.

RNA-Seq • 1.3k views

Have you tried blastn?


Use the web interface for BLAST at NCBI, as @fatima said: choose SRA as the database and then limit the search to the accession number you need.


Perform a transcriptome assembly first?


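Agreed with this. I suggest you assemble your transcriptome and then align that CDS against the assembly. Since you have no reference genome, this needs a de novo assembler such as Trinity rather than StringTie, which works from reads already aligned to a genome. This will also tell you whether you have gene duplications or something like that.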
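To illustrate the "align that CDS against the assembly" step, here is a minimal sketch that reads BLAST tabular output (the standard 12 columns of `-outfmt 6`) and lists which assembled transcripts cover a good part of the query CDS. The function name, the thresholds and the example rows are all made up for illustration; more than one transcript passing the filter would only be a hint of duplicates or isoforms, not proof.

```python
import csv
import io
from collections import defaultdict

# Standard column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_transcripts(tabular_text, query_len,
                          min_identity=70.0, min_query_cov=0.5):
    """Return {transcript_id: fraction of the query CDS covered by its hits}.

    Thresholds are arbitrary illustration values, not recommendations.
    """
    covered = defaultdict(set)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) < min_identity:
            continue
        # Query coordinates are 1-based and inclusive.
        start, end = sorted((int(hit["qstart"]), int(hit["qend"])))
        covered[hit["sseqid"]].update(range(start, end + 1))
    coverage = {sid: len(pos) / query_len for sid, pos in covered.items()}
    return {sid: cov for sid, cov in coverage.items() if cov >= min_query_cov}


# Made-up hits of one 600 bp CDS against three hypothetical transcripts.
example = "\n".join([
    "cds1\ttx_A\t85.0\t300\t45\t0\t1\t300\t10\t309\t1e-80\t300",
    "cds1\ttx_A\t82.0\t200\t36\t0\t301\t500\t400\t599\t1e-50\t200",
    "cds1\ttx_B\t78.0\t400\t88\t0\t51\t450\t1\t400\t1e-60\t250",
    "cds1\ttx_C\t65.0\t100\t35\t0\t1\t100\t1\t100\t1e-5\t50",
])
result = candidate_transcripts(example, query_len=600)
for transcript, cov in sorted(result.items()):
    print(transcript, round(cov, 2))
```

In this toy input two transcripts pass the filter, which is the kind of situation where you would want to look closer before calling one of them "the" homolog.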


Yes, you could map the SRA data to the reference, if that is all you are interested in. That is why I had suggested doing this the other way around: use the reference as the query and then identify matching reads from the SRA using BLAST at NCBI.

In either case, you will identify reads that map to the gene you are interested in. You will then need to either assemble them or build a multiple sequence alignment (depending on how similar your data is to the reference) to get a putative sequence of the homolog you are interested in.
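As a toy illustration of getting a putative sequence out of the matching reads, the sketch below takes reads that have already been placed on the reference CDS and calls a majority-vote consensus. It assumes ungapped placements given as (0-based start, sequence) pairs, ignores indels and base qualities, and uses a made-up `min_depth` cutoff, so it only shows the idea; a real analysis would work from the alignment file or from an assembly.

```python
from collections import Counter


def consensus(ref_len, placed_reads, min_depth=2):
    """Majority-vote consensus over ungapped read placements.

    placed_reads: iterable of (start, sequence), start is 0-based on the
    reference CDS. Positions covered by fewer than min_depth reads are
    reported as 'N'.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < ref_len:  # ignore bases hanging off the ends
                columns[pos][base] += 1
    out = []
    for col in columns:
        depth = sum(col.values())
        if depth > 0 and depth >= min_depth:
            out.append(col.most_common(1)[0][0])
        else:
            out.append("N")
    return "".join(out)


# Four made-up reads placed on a 10 bp reference; one read disagrees at position 4.
reads = [(0, "ACGTAC"), (2, "GTACGG"), (3, "TTCGGA"), (0, "ACG")]
putative = consensus(10, reads)
print(putative)  # ACGTACGGNN
```

The trailing `N`s are positions with too little coverage, which is what you should expect near the ends of a gene or wherever your species diverges too much from the reference for reads to be recovered.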


Thank you for all your useful suggestions. Since I also have the FASTQ files, I was thinking I could run quality control on them and then simply map them against a reference, which in this case would be my homologue gene. Since both my reads (RNA-seq) and the homologue are coding sequence, I could use Bowtie2 as the mapper. What do you think?


You just said that there is no reference genome, nor is there a reference transcriptome (?). If you mean to map the FASTQ reads to just the CDS sequence of your gene, then this is possible (with Bowtie), but some person will find some way to criticise it. How were your FASTQ reads produced, i.e., what was your experimental design?
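For mapping reads straight to the single CDS, the commands might look roughly like the ones built below. This is a sketch under assumptions: single-end reads (paired-end data would use `-1`/`-2` instead of `-U`), local alignment so that reads overlapping the ends of the CDS can still align, and hypothetical file names. The code only assembles and prints the command lines; it does not run Bowtie2.

```python
import shlex


def bowtie2_commands(cds_fasta, fastq, index_prefix="cds_index",
                     sam_out="reads_vs_cds.sam", threads=4):
    """Build (index, align) command lines for mapping reads to one CDS.

    All file names and the thread count are placeholders.
    """
    build = ["bowtie2-build", cds_fasta, index_prefix]
    align = [
        "bowtie2",
        "--local", "--very-sensitive-local",  # tolerate divergence and overhangs
        "--no-unal",                          # keep only reads that aligned
        "-p", str(threads),
        "-x", index_prefix,
        "-U", fastq,                          # single-end reads (assumption)
        "-S", sam_out,
    ]
    return build, align


build_cmd, align_cmd = bowtie2_commands("homologue_cds.fasta", "reads.fastq")
print(shlex.join(build_cmd))
print(shlex.join(align_cmd))
```

Keep in mind that the reference here is from a different species, so the fraction of reads that align will depend heavily on how diverged the two genes are.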


Yes, I meant mapping them onto the CDS of my homologue gene. I don't know the experimental design, because I found the SRA data on NCBI; it comes from another group that has not published the work yet.


Hope that you got it solved?
