How To Extract Full Sequence Of A Gene From Bam Or Sam File
3
0
Entering edit mode
12.0 years ago

Hi

I am interested in extracting sequence of a gene from a wild type and two allelic mutants of that gene. I have RNAseq data available from all three genotypes and i am wondering how do people normally go about doing this with RNAseq data ? One way i can think of is to denovo assemble the RNAseq reads from all three genotypes and pull out the gene sequence from all the three genotypes. Rather than doing this, Is there a way where you can extract the sequence from say BAM/SAM files?

denovo • 10k views
ADD COMMENT
0
Entering edit mode

Dear Upendra Kumar,

I would like to extract a gene sequence for 30 viral samples and planning to build a phylogenetic tree. For example, this is reference gene of interest for me NC_026434 which is NA Neuraminidase. For this gene, the CDS starts at1 and ends at 1410. How do I extract the same gene for all the 30 viral samples. I have the fastq files and reference aligned bam files.

ADD REPLY
1
Entering edit mode
12.0 years ago
Gjain 5.8k

Please have a look at samtools and this tutorial

SAMTools provides various tools for manipulating alignments in the SAM/BAM format. The SAM (Sequence Alignment/Map) format (BAM is just the binary form of SAM) is currently the de facto standard for storing large nucleotide sequence alignments.

Option you are looking for: faidx index/extract FASTA

ADD COMMENT
1
Entering edit mode
12.0 years ago
JC 13k

You can use samtools to call your consensus sequence based on the reads, I have never tried this for RNAseq but it should work:

 samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
ADD COMMENT
0
Entering edit mode

Note that this method will NOT fix indels. It will simple make the sequence lowercase around them.

What might be a little simpler is to make a vcf from the .bam and the original reference sequence, figure out what SNPs you believe, and then correct that reference based on the SNPs from the vcf.

ADD REPLY
0
Entering edit mode

yes, your right, I'm not sure if the OP requires indels, but thanks for the clarification and alternative.

ADD REPLY
0
Entering edit mode
12.0 years ago

Hi all,

Thanks earlier for your help regarding extracting the sequence from bam file. I did that successfully but however i have some questions. In my final file i found some "n's" and some "lower case letters". I guess "n's" mean low/no coverage and "small letters" mean indels with respect to reference right?

ADD COMMENT
0
Entering edit mode

yes, you're right

ADD REPLY

Login before adding your answer.

Traffic: 1696 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6