convert genome file to transcriptome file
5.4 years ago
harry ▴ 40

I want to know which package I should download to convert my genome file into a transcript file for use with the kallisto pseudomapper. Please tell me which package to use and how to download it. Thanks.

RNA-Seq • 6.9k views

I changed your title, as "package download" doesn't tell us anything about your question. In addition, it usually helps to be as precise as possible: you write "genome file", but it would probably be more accurate to say that you have a genome fasta and want a transcriptome fasta. We also have no idea which organism you are working on, so specifying that would be good as well.


If you assembled and annotated the genome yourself, you need to tell us a bit more about how that was done and, in particular, what kind of annotation you have.

If the genome was assembled by a third party and is available at NCBI or Ensembl, a suitable transcripts fasta is probably already available; you just have to find it.


I have the HIV genome in fasta format, but I don't have the full set of transcripts, and kallisto works on a transcript file. So please tell me how to convert the HIV genome fasta into a transcript file.

5.4 years ago
husensofteng ▴ 410

So basically you could do the following (assuming you work on a Linux/Mac machine):

  1. Download and extract the genome fasta file for HIV1*:

    wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/864/765/GCF_000864765.1_ViralProj15476/GCF_000864765.1_ViralProj15476_genomic.fna.gz

    gunzip GCF_000864765.1_ViralProj15476_genomic.fna.gz

  2. Download and extract gene annotations for HIV1:

    wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/864/765/GCF_000864765.1_ViralProj15476/GCF_000864765.1_ViralProj15476_genomic.gff.gz

    gunzip GCF_000864765.1_ViralProj15476_genomic.gff.gz

  3. Generate the transcriptome fasta using gffread:

    gffread -F -w transcriptome.fa -g GCF_000864765.1_ViralProj15476_genomic.fna GCF_000864765.1_ViralProj15476_genomic.gff

*Of course, step 1 is not needed if you already have the genome fasta file. In that case, make sure the annotation file (GFF or GTF) you download corresponds to your genome fasta. Otherwise, make sure the files listed in steps 1 and 2 belong to your HIV subtype.
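
Before handing the result of step 3 to kallisto, it can help to check that the transcriptome fasta actually contains records. The following is only a minimal sketch: `count_fasta_records` is a hypothetical helper, the toy fasta and its two sequences are made up for illustration, and `transcriptome.idx` is an assumed output name. The index is built only if kallisto is installed and `transcriptome.fa` exists in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records (header lines start with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Toy stand-in for the transcriptome.fa written by gffread in step 3.
# The two records below are invented for illustration only.
toy_fa=$(mktemp)
printf '>tx1\nATGGGTGCGAGAGCG\n>tx2\nATGGCAGGAAGAAGC\n' > "$toy_fa"

n_records=$(count_fasta_records "$toy_fa")
echo "records in toy fasta: $n_records"

# With the real file: build a kallisto index only if kallisto is on the PATH
# and transcriptome.fa is present and non-empty.
if command -v kallisto >/dev/null 2>&1 && [ -s transcriptome.fa ]; then
    echo "records in transcriptome.fa: $(count_fasta_records transcriptome.fa)"
    kallisto index -i transcriptome.idx transcriptome.fa
fi
```

A count of zero on the real file would suggest that the annotation and the genome fasta do not use the same sequence names.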


Your answer is correct, but you are slightly over-complicating things: the FTP repository you linked already contains the transcripts in fasta format, in the file GCF_000864765.1_ViralProj15476_cds_from_genomic.fna.gz.

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/864/765/GCF_000864765.1_ViralProj15476/GCF_000864765.1_ViralProj15476_cds_from_genomic.fna.gz


You are right. I thought the question was about generating a transcripts fasta from the user's own genome fasta, and I gave the links as an example to make the answer complete.

5.4 years ago

You will also need to obtain a transcript annotation file, typically a GTF or GFF file. From that annotation file and your genome fasta file you can extract the transcript nucleotide sequences as a fasta file using tools such as gffread. Incidentally, gffread is also included in both the Cufflinks and StringTie binary releases, so it is typically easiest to download those binaries and access the program through them.
