Counting mutated reads from RNA Seq data
2
1
Entering edit mode
9.4 years ago
EVR ▴ 610

Hi,

I am new to variant analysis. I have mutation induced RNAseq samples data which has been aligned to reference transcriptome and obtained a bam file which has been sorted. For every transcript I would like to count the number of mutated reads aligned to reference transcript. For an example, number of reads on which base C has been mutated to A at every position along the transcript.

Is it possible to achieve this using samtools and vcf tools.

Kindly guide me

Mutation Variant-analysis RNA-Seq Samtools • 4.5k views
ADD COMMENT
0
Entering edit mode

Basically u need to call SNPs on RNA-seq data using tools such as samtools/GATK which results in VCF file. From VCF file, you could count the snps in regions of interest. VCF file contains the coordinates for snps and your transcripts will have the coordinates. just overlapping them will give you number of snps. If there are overlapping transcripts, you should be careful in redundancy.

ADD REPLY
2
Entering edit mode
9.4 years ago
poisonAlien ★ 3.2k

bam-readcount?

ADD COMMENT
0
Entering edit mode

Hi,

bam read count is what I was looking for. When I ran the program, it is giving me warning for all reads that "Couldn't find Generated tag". Is this a serious problem? Also is it possible to run the bam-readcount by using utilizing all CPUs? it is taking quite lot of time.

The command I used

bam-readcount -f /path/to/ref_fasta /path/to/bam_file > sample_output.txt
ADD REPLY
0
Entering edit mode

That's just warning (use -w1 to stop the warnings), I guess you can ignore it. And I don't think its multithreaded.

ADD REPLY
0
Entering edit mode

Thanks for your reply. Every sample I run has around 20 million reads. I think this program will take really a long time. Is there any similar tool which gives you similar output and we can make use of multithreading to make things work faster.?

ADD REPLY
0
Entering edit mode

Don't you have variant location (as you mentioned, locus where C>A occurred)? Just run the programme for those variants. make a tab limited file with three columns - 'chromosme start end' and use this to run the programme. Its pretty fast.

bam-readcount -q20 -w1 -b20 -l tab_file.txt -f $ref $bam

Also since you mentioned RNA-seq, probably you would expect everything to be homozygous for alternate allele.

ADD REPLY
0
Entering edit mode

Yes I do have the output from GATK analysis where it has Transcript_ID, POS, Ref_base and Alt_base( Mapping is done with respect to de novo assembled reference transcriptome)So Can I make us e tab delim file with ID and the position and introduce the file into bam-readcount. WIll it work with this two information? Also for a bam aroung 1GB how long approximately it will take finish? Kindly guide me.

ADD REPLY
0
Entering edit mode

is it vcf file ?

ADD REPLY
0
Entering edit mode

yes Is a VCF file.

ADD REPLY
0
Entering edit mode

If you have vcf file, it should already contain depth info. Anyways,

cat your.vcf | sed '/^#/d' | awk '{OFS="\t" ; print $1,$2,$2}' > your.var

bam-readcount -q20 -w1 -b20 -l your.var -f $ref $bam > your.rc

This would give readcount for all the variants in vcf file. Maybe you want to grep those variants of your interest.

ADD REPLY
0
Entering edit mode

Thanks...Works like charm !!!

ADD REPLY
0
Entering edit mode
6.1 years ago
hewm2008 ▴ 50

https://github.com/BGI-shenzhen/BamDeal ./BamDeal_Linux statistics BasesCount -List bam.list -OutPut outFix

can give out each site four based depth

ADD COMMENT

Login before adding your answer.

Traffic: 2281 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6