5.3 years ago
sarah.goldstein
I am a bit of a bioinformatics newbie and need some help.
I was wondering if it is possible to determine the coverage of a single gene from whole-genome sequencing data? I have the target gene sequence, the R1/R2 reads, the whole-genome assembly, and a reference genome assembly.
I expect this target gene to be present on a plasmid and therefore potentially present in multiple copies. As a rough estimate, increased coverage of this gene should equate to multiple copies of the plasmid.
Any simple ways to map the coverage of just one gene?
Thanks!
Ideally, since your data is whole-genome, you should map the reads to the entire genome and then use a BED interval to count the reads aligning to the specific region (I assume that is what you are interested in).
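As a rough sketch of the BED-interval idea, the Python below averages per-base depth inside one interval. It assumes you already have a three-column, tab-separated per-base depth table (contig, 1-based position, depth), which is the layout `samtools depth` writes. The function names, contig names, and depth values are made up for illustration.

```python
from statistics import mean


def parse_depth(lines):
    """Parse tab-separated 'contig<TAB>pos<TAB>depth' lines into tuples."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")[:3]
        records.append((contig, int(pos), int(depth)))
    return records


def mean_depth_in_bed(records, contig, start, end):
    """Mean depth over a BED interval.

    BED coordinates are 0-based and half-open, while the depth table is
    1-based, so a position p falls inside the interval when start < p <= end.
    """
    depths = [d for c, p, d in records if c == contig and start < p <= end]
    return mean(depths) if depths else 0.0


# Hypothetical depth table: one chromosomal contig and one plasmid contig.
example_lines = [
    "chr1\t1\t10", "chr1\t2\t10", "chr1\t3\t10", "chr1\t4\t10",
    "plasmid1\t1\t30", "plasmid1\t2\t40", "plasmid1\t3\t50", "plasmid1\t4\t40",
]
records = parse_depth(example_lines)

# Assumed gene location, BED-style: plasmid1, start=1, end=3 (positions 2 and 3).
gene_depth = mean_depth_in_bed(records, "plasmid1", 1, 3)
chrom_depth = mean_depth_in_bed(records, "chr1", 0, 4)
print(gene_depth, chrom_depth, gene_depth / chrom_depth)
```

Dividing the gene's mean depth by the chromosomal mean gives a crude relative copy number. Treat it as an estimate only, since GC content and repeats also shift depth.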
If you mainly want to see whether any reads in your data map to the gene/region of interest, you could map to the gene by itself. Doing so may overestimate the coverage a bit, because reads that belong elsewhere in the genome can get aligned there when the reference is reduced.
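If you go the gene-only route, a back-of-the-envelope conversion from read count to coverage is reads × read length / target length. The sketch below uses entirely hypothetical numbers and a made-up function name, and it assumes every read is full length and lies wholly within the target. Real values will be lower at the gene edges, and the gene-only count may be inflated for the reason given above.

```python
def approx_coverage(n_reads, read_length, target_length):
    """Approximate fold coverage: total aligned bases divided by target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return n_reads * read_length / target_length


# Hypothetical numbers, for illustration only.
gene_cov = approx_coverage(n_reads=600, read_length=150, target_length=1200)
genome_cov = approx_coverage(n_reads=1_000_000, read_length=150,
                             target_length=5_000_000)

# Ratio of gene coverage to genome-wide coverage as a crude copy-number hint.
print(gene_cov, genome_cov, gene_cov / genome_cov)
```

A ratio well above 1 would be consistent with a multi-copy plasmid, but confirm it with the whole-genome mapping before reading much into it.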