Finding out which genes are covered the most
1
0
Entering edit mode
6.0 years ago

Hey! I have performed an alignment of my reads (paired end) using Bowtie2. I would like to know which region of the genome is most highly covered.

As per my knowledge this can be done using mpileup or Bamcov or visualize the bam file in IGV. But, I want to specifically see which genes are the most highly covered and not just regions of high coverage.

Is there a way bam files can be mapped on gff files or is there any other approach to this.

Looking forward to replies!!

Thanks

coverage genes • 1.7k views
ADD COMMENT
0
Entering edit mode

You can bin the genome (*ome) and then check of the coverage or use a gtf file to get the coverage. try bedtools coverage. In addition, one can generate histogram of coverage.

ADD REPLY
5
Entering edit mode
6.0 years ago

I wrote BamStats05 http://lindenb.github.io/jvarkit/BamStats05.html

$ head genes.bed
1   179655424   179655582   ZORG
1   179656788   179656934   ZORG

$ java -jar  dist/bamstats05.jar -B genes.bed --mincoverage 10  in.bam > out.txt

$ head out.txt
#chrom  start   end gene    sample  length  mincov  maxcov  avg nocoverage.bp   percentcovered
1   179655424   179656934   ZORG    SAMPLE1 304 27  405 216.80921052631578  0   100
ADD COMMENT
0
Entering edit mode

I get the error as below:

input: bed file CP019610.1 44 914 B0W97_00010 CP019610.1 991 1309 B0W97_00015 CP019610.1 1365 1695 B0W97_00020 CP019610.1 1691 2060 B0W97_00025 CP019610.1 2165 2465 B0W97_00030 CP019610.1 2571 3099 B0W97_00035

[SEVERE][BamStats05]bad start/end in BED line : "CP019610.1 (tab) 44 (tab) 914 (tab) B0W97_00010" java.lang.IllegalArgumentException: bad start/end in BED line : "CP019610.1 (tab) 44 (tab) 914 (tab) B0W97_00010" at com.github.lindenb.jvarkit.util.bio.bed.BedLine.<init>(BedLine.java:58) at com.github.lindenb.jvarkit.util.bio.bed.BedLineCodec.decode(BedLineCodec.java:61) at com.github.lindenb.jvarkit.tools.bamstats04.BamStats05.readBedFile(BamStats05.java:134) at com.github.lindenb.jvarkit.tools.bamstats04.BamStats05.doWork(BamStats05.java:329) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1145) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1303) at com.github.lindenb.jvarkit.tools.bamstats04.BamStats05.main(BamStats05.java:406) Caused by: java.lang.NumberFormatException: For input string: " 44 " at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:569) at java.lang.Integer.parseInt(Integer.java:615) at com.github.lindenb.jvarkit.util.bio.bed.BedLine.<init>(BedLine.java:54) ... 6 more [INFO][Launcher]bamstats05 Exited with failure (-1)

ADD REPLY
0
Entering edit mode

CP019610.1 (tab) 44 (tab) 914 (tab) B0W97_00010

va.lang.NumberFormatException: For input string: " 44 " at

I see some whitespaces "(space)(tab)(space)" between each column. There should be only one "(tab)".

ADD REPLY
0
Entering edit mode

BTW , while I was here I've just added a new option '--merge' for my tool : https://github.com/lindenb/jvarkit/commit/1be40fd3e7aac9dadcfbdb4fb079027ad0b3581a ( with an error, fixed later...)

ADD REPLY
0
Entering edit mode

works now! Thanks and I will get back to you in case of doubts.

ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Upvote|Bookmark|Accept

ADD REPLY
0
Entering edit mode

Hi! Can you help me with understanding no_coverage column means? It would be really great if you tell what the column header mean for each field. I know it's intuitive but it would be helpful.

ADD REPLY
0
Entering edit mode

no_coverage

number of bases in the interval that are below the limit (default <=0)

ADD REPLY

Login before adding your answer.

Traffic: 2696 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6