I have 6 wheat samples sequenced using RNA-seq. I received forward and reverse fastq files and I generated bam files by using hisat2 tool which are aligned with the reference wheat genome. I have been asked to build multiple sequence alignment for 3 genes from this sequenced rna seq data. I believe I need to select a gene sequence from all the samples and do a multiple sequence alignment. But I am struck in fetching the gene sequence from bam files. How do I select one gene sequence for all the 6 samples? Any suggestions? kidly send me commands for fetching the sequence of genes from this RNA seq data in fastq file or aligned.bam file.
are you looking for isoforms? read about transcript assembly with stringtie, or isoform detection, once detected or assembled you can extract the sequences and align them.
Hi buffo,
I have run the strigtie and got the assembled transcripts.next i want to fetch the transcripts sequences from all six sample .how can I do?
you can use gff_read to do that
Hi Buffo,
1 GTF file containing the assembled transcripts 2 Gene abundances in tab-delimited format 3 Fully covered transcripts that match the reference annotation, in GTF format 4 Files required as input to Ballgown 5 a merged GTF file from a set of GTF files. I did not use the gff_read tool. I have read the manual of it but did not understand. Can you please set the command for me. The command given in manual is gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf
In this command i did not understand which genomic file and transcripts.fa file I have to use.
Hi Buffo,
I have run the stringtie tool and five files have been generated.
I did not use the gff_read tool. I have read the manual of it but did not understand. Can you please set the command for me. The command given in manual is
gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf
In this command i did not understand which genomic file and transcripts.fa file I have to use.
The
-w
option refers to the output file name,-g
needs the genome reference (you use it to map your reads) and the file transcripts.gtf is the option 1 of the above list (1 GTF file containing the assembled transcripts).Hi buffo, I have genrated the sequence file (transcript.fa). this file contain sequence ids like
Here i have write the id of first sequence in file. My gene of interest id is TraesCS3D02G273600. I want to do extract this id sequence from the file i have generated (transcript.fa) .Kindly tell me how can i extract the sequence of transcript of gene of interest. I
Hi buffo , please reply me
If you are looking for a single gene, type in the shell prompt:
It looks for the specific gene id in the transcript.fa file and shows the first line after the gene id, which is the sequence of the transcript.
Hi buffo , this command grep -A 1 'gene_id' transcript.fa display only one line of sequence , i want to extract the full length of gene sequence
Hi Buffo, Thank you.I have the gene sequences now.but these are the same sequence as present in database. After sequence alignment i did not any variation in six samples of target gene. why it is same?
load your bam files in IGV and check if that gene has SNPs among samples. That sequence is the same because the genome reference is exactly the same.