Entering edit mode
3.4 years ago
smrutimayipanda
▴
20
Hello,
I want to see the coverage of every bases and want the results in percentage, for e.g. 100% coverage covered, 99% coverage, etc. But I got the following results:
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
NC_000001.10 1 249250621 10793257 50016111 20.0666 6.34772 36.4 55.7
NT_113878.1 1 106433 2390 21426 20.131 3.27799 36.3 11.6
NT_167207.1 1 547496 21064 123882 22.627 5.60021 36.5 3.43
Here coverage is 20, 20, 22. What does it mean? Is it 20% or 20X. I couldnt understand. How can I get the results as 100% coverage?
In case of
samtools coverage
it looks like it is a percentage, so 20% (http://www.htslib.org/doc/samtools-coverage.html). If you want the coverage (depth) per base I thinksamtools depth
is the best option. An alternative can besamtools mpileup
. Keep an eye on the settings ofdepth
you may need to change something because otherwise it uses some default settings.I used normal samtools coverage input.bam and samtools depth input.bam. Can you please tell me what parameters should I use? if 20% is the coverage, then it means poor coverage, right?
And I checked this link. If 9.7 is the coverage, that doesnt mean it is 9%. There should be some concept behind it.
If you want to see the finer details of your results you may need to use IGV (https://software.broadinstitute.org/software/igv/). Based on this part:
I would say that you need samtools depth or mpileup. I may do not understand your question but based on what other number you want to calculate a percentage? In other words, if position 1 has a coverage/depth of 20 reads how do you want to calculate the percentage?
Thats what I am asking. How do we calculate the coverage in percentage? the coverage covered by variant should be 90-100%, so I am asking that only. How to calculate that?
you can't .
there is no point in putting the per-base coverage in percentage. As others have already asked: what would be for instance 100% for you?
what you can put in % is the fraction of a region that is covered by at least one read. Is it that you're after?
Yeah I just want to see my variants should cover 100% coverage or atleast 99% coverage
Please explain what you mean with "100% coverage" ?
I meant gene coverage. For example EGFR is the gene found in patient's data and coverage is 100%. So I want to know how to calculate gene coverage
Ah! You want the percentage of variants? In this case you need a vcf file. If you dont have that yet you need to do a step back. And do variant calling first. You could use freebayes for example.EDIT: OP gave new info again, still leave it here just in case.
No not new info, I was being just clear. I need gene coverage. Can you please tell me how to do that?
IF you want gene coverage you need to dig further. I usually use is deepTools and bedtools coverage :
First you can use
bamCoverage
and then you can get the counts four our genes in the bam file usingbedtools coverage
with- counts
option.what is -a file and -b file here? which file I have to use?
these things are very clearly explained in the manual. So please do make use of it!
please read the manual, all the information is in there.
why did you remove your command? It would be helpful for me.
For two reasons: 1) I posted them by mistake 2) I want to write the tools and not my pipeline. I don't have any problem in share it but I give you a lot of information and my pipeline may not suits your data
This is why I asked first the options you used for it... Anyway, the best option to get the coverage for every single base in a bed file, is to use the option depth eg.
samtools depth -aa -b your_regions.bed your_input.bam
I already mentioned the command. I didnt use any option.
Have you tried
depth
command? have a look into this post, there are so many optionsyeah I have used
Which options did you use? When you ask for help with a specific results, is always better to write the code or pipeline you use. If not, is very complicated to understand the output