Question

Finding transcriptome coverage of genome

0

Entering edit mode

4.6 years ago

DNAlias ▴ 40

Hello,

I would like to determine what percentage of the genome is covered by a transcriptome. I tried following the instructions here: http://www.metagenomics.wiki/tools/samtools/breadth-of-coverage but it didn't like that I had a fasta instead of fastq input.

Does anyone know how I can achieve this? Thanks

mapping transcriptome trinity RNAseq • 2.0k views

ADD COMMENT • link updated 4.6 years ago by prasundutta87 ▴ 730 • written 4.6 years ago by DNAlias ▴ 40

0

Entering edit mode

Those instructions are precisely what you would have to do starting from a fastq file: map it to a genome and compare the size of regions above 5x with the size of the genome.

By the way, if you end up mapping your fastq and obtaining a bam file, I would suggest a faster way to get the information you need through bedtools:

bedtools genomecov -bga -ibam file.bam | awk '{
 if ($4<5) {below+=$3-$2} else {above+=$3-$2}
} END {
 print "Total:", below+above;
 print "Above:", above, "(", 100*above/(below+above), "%)";
 print "Below:", below, "(", 100*below/(below+above), "%)";
}'

ADD REPLY • link 4.6 years ago by Jorge Amigo 14k

0

Entering edit mode

Qualimap bamqc is an option that you can explore: http://qualimap.conesalab.org/doc_html/command_line.html#cmdline-bamqc. It gives information about both depth and breadth of coverage. You can also explore qualimap RNA-seq QC: http://qualimap.conesalab.org/doc_html/analysis.html#rna-seq-qc

Just to add, you need to align your raw reads to a reference genome before you use bamqc.

If you have a fasta file, instead of fastq, there are tools availble that can help you add dummy quality scores to your fasta file. I am not sure which is the best tool for that, but a simple google can help you with that.

I haven't tried this, but since you are working with RNA-seq data, STAR allows you to align fasta sequences to a reference genome according to its manual and this is mentioned specifically in page 5 (https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf)

ADD REPLY • link 4.6 years ago by prasundutta87 ▴ 730

score 1 · Answer 1 · 2021-04-07

1

Entering edit mode

4.6 years ago

lieven.sterck 16k

two options I see:

you either re-format the fastq file to fasta (search around, this is quite straightforward)
you map the fastq file to the genome (using a mapper of your liking) and then use something like bedtoolscov to get the coverage per base (or other regions).