Finding transcriptome coverage of genome
1
0
Entering edit mode
3.6 years ago
DNAlias ▴ 40

Hello,

I would like to determine what percentage of the genome is covered by a transcriptome. I tried following the instructions here: http://www.metagenomics.wiki/tools/samtools/breadth-of-coverage but it didn't like that I had a fasta instead of fastq input.

Does anyone know how I can achieve this? Thanks

mapping transcriptome trinity RNAseq • 1.5k views
ADD COMMENT
0
Entering edit mode

Those instructions are precisely what you would have to do starting from a fastq file: map it to a genome and compare the size of regions above 5x with the size of the genome.

By the way, if you end up mapping your fastq and obtaining a bam file, I would suggest a faster way to get the information you need through bedtools:

bedtools genomecov -bga -ibam file.bam | awk '{
 if ($4<5) {below+=$3-$2} else {above+=$3-$2}
} END {
 print "Total:", below+above;
 print "Above:", above, "(", 100*above/(below+above), "%)";
 print "Below:", below, "(", 100*below/(below+above), "%)";
}'
ADD REPLY
0
Entering edit mode

Qualimap bamqc is an option that you can explore: http://qualimap.conesalab.org/doc_html/command_line.html#cmdline-bamqc. It gives information about both depth and breadth of coverage. You can also explore qualimap RNA-seq QC: http://qualimap.conesalab.org/doc_html/analysis.html#rna-seq-qc

Just to add, you need to align your raw reads to a reference genome before you use bamqc.

If you have a fasta file, instead of fastq, there are tools availble that can help you add dummy quality scores to your fasta file. I am not sure which is the best tool for that, but a simple google can help you with that.

I haven't tried this, but since you are working with RNA-seq data, STAR allows you to align fasta sequences to a reference genome according to its manual and this is mentioned specifically in page 5 (https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf)

ADD REPLY
1
Entering edit mode
3.6 years ago

two options I see:

  • you either re-format the fastq file to fasta (search around, this is quite straightforward)

  • you map the fastq file to the genome (using a mapper of your liking) and then use something like bedtoolscov to get the coverage per base (or other regions).

ADD COMMENT
0
Entering edit mode

I'll comment on myself :

do you have a fastA file as input or a fastQ file ?

ADD REPLY

Login before adding your answer.

Traffic: 1807 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6