Hello,
I would like to determine what percentage of the genome is covered by a transcriptome. I tried following the instructions here: http://www.metagenomics.wiki/tools/samtools/breadth-of-coverage but the workflow failed because my input is a fasta file rather than a fastq.
Does anyone know how I can achieve this? Thanks
Those instructions are precisely what you would have to do starting from a fastq file: map the reads to the genome, then compare the total length of the regions covered at 5x or more with the size of the genome.
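As a rough illustration of that 5x comparison, here is a minimal Python sketch. It assumes you already have a sorted bam file and have dumped per-base depth with `samtools depth -aa` (the doubled `-a` makes samtools report every position, including zero-depth ones and reference sequences with no reads at all), which for a single bam gives three tab-separated columns: sequence name, position, depth. The function name `breadth_of_coverage` is hypothetical.

```python
from typing import Iterable


def breadth_of_coverage(depth_lines: Iterable[str], min_depth: int = 5) -> float:
    """Return the fraction of positions with depth >= min_depth.

    Assumes the three-column output of `samtools depth -aa` for ONE bam
    (name, position, depth), so every genome position appears exactly once.
    """
    total = 0    # positions seen (should equal the genome size with -aa)
    covered = 0  # positions at or above the depth threshold
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        _name, _pos, depth = line.rstrip("\n").split("\t")
        total += 1
        if int(depth) >= min_depth:
            covered += 1
    return covered / total if total else 0.0
```

For example, run `samtools depth -aa aligned.sorted.bam > depth.txt`, then `with open("depth.txt") as fh: print(breadth_of_coverage(fh, min_depth=5))`. Counting positions one by one is slow on large genomes, but it keeps the logic transparent.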
By the way, if you end up mapping your reads and obtaining a bam file, bedtools offers a faster way to get the information you need.
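One bedtools option (my assumption about which subcommand fits best) is `bedtools genomecov -ibam`, which expects a position-sorted bam. Its default histogram output has five tab-separated columns: sequence name, depth, number of bases at that depth, sequence length, and fraction of bases at that depth, and the rows labelled `genome` summarise the whole genome. The hypothetical `breadth_from_genomecov` function below is a sketch that sums those `genome` rows at or above a depth threshold.

```python
from typing import Iterable


def breadth_from_genomecov(hist_lines: Iterable[str], min_depth: int = 1) -> float:
    """Fraction of the genome with depth >= min_depth.

    Assumes the default histogram output of `bedtools genomecov -ibam`:
    name, depth, bases at that depth, sequence size, fraction.
    Only the genome-wide summary rows (first column "genome") are used.
    """
    bases_at_or_above = 0
    genome_size = 0
    for line in hist_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5 or fields[0] != "genome":
            continue  # skip per-chromosome rows and malformed lines
        depth, n_bases, size = int(fields[1]), int(fields[2]), int(fields[3])
        genome_size = size
        if depth >= min_depth:
            # Use integer base counts rather than the rounded fraction column.
            bases_at_or_above += n_bases
    return bases_at_or_above / genome_size if genome_size else 0.0
```

For example, `bedtools genomecov -ibam aligned.sorted.bam > genomecov.txt`, then pass the opened file to the function with `min_depth=5` to match the 5x threshold.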
Qualimap bamqc is an option that you can explore: http://qualimap.conesalab.org/doc_html/command_line.html#cmdline-bamqc. It gives information about both depth and breadth of coverage. You can also explore qualimap RNA-seq QC: http://qualimap.conesalab.org/doc_html/analysis.html#rna-seq-qc
Just to add, you need to align your raw reads to a reference genome before you use bamqc.
If you have a fasta file instead of a fastq, there are tools available that can add dummy quality scores to your fasta file. I am not sure which is the best tool for that, but a quick web search should turn up several options.
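The conversion itself is simple enough to sketch. The hypothetical `fasta_to_fastq` generator below turns each fasta record (including multi-line sequences) into a fastq record and gives every base the same made-up quality character, `I`, which is Phred 40 in the Phred+33 encoding. These scores carry no real information, so any quality-based trimming or filtering downstream would be meaningless.

```python
from typing import Iterable, Iterator, List, Optional


def _fastq_record(name: str, seq: str, qual_char: str) -> str:
    """Format one 4-line fastq record with a constant quality string."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


def fasta_to_fastq(fasta_lines: Iterable[str], qual_char: str = "I") -> Iterator[str]:
    """Yield fastq records built from fasta input with dummy qualities."""
    name: Optional[str] = None
    seq_parts: List[str] = []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                # A new header closes the previous record.
                yield _fastq_record(name, "".join(seq_parts), qual_char)
            name = line[1:]
            seq_parts = []
        else:
            seq_parts.append(line)  # sequence may span several lines
    if name is not None:
        yield _fastq_record(name, "".join(seq_parts), qual_char)
```

For example, `open("reads.fq", "w").writelines(fasta_to_fastq(open("reads.fa")))` writes the converted file.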
I haven't tried this, but since you are working with RNA-seq data, STAR may help: according to its manual, it can align fasta sequences to a reference genome. This is mentioned specifically on page 5 (https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf).