Get A Graphical Representation Of Number Of Reads In Bam File For A Certain Genomic Region
4
2
Entering edit mode
11.7 years ago
Whetting ★ 1.6k

Hi guys and girls,

One of my coworkers is scrambling to redo a figure for a manuscript under review. The reviewers want to see a graphical representation of the number of reads from an RNA seq experiment. My coworkers is a biologists and computers are not his strong suit. We received a BAM file from the company who did the sequeuncing. However, they are not very responsive about getting us the data needed for the figure. I am probably the most computer literate in the lab, but I have never dealt with next-gen sequencing data, so I ma completely at a loss.

From some of the other questions on the forum, I gathered that Samtools could to this. We are interested in plotting the number of reads in "Chr2: 212,000,000 – 213,800,000". I assume, you would do some kind of sliding window in order to bin the reads?

Any pointers would be greatly appreciated!

example of desired output

EDIT: thanks for all the help. Each of the proposed options seems to come with its own little problems. I am sure I'll figure it out though!

bam samtools • 7.7k views
ADD COMMENT
1
Entering edit mode

The link to your example doesn't appear to be working.

But see the question Converting Bam To Bedgraph For Viewing On Ucsc? for some possibly helpful ideas.

ADD REPLY
4
Entering edit mode
11.7 years ago

upload your BAM (or a sub part of the BAM) in the UCSC as a custom track and download the figure(s) as PDF/PS.

ADD COMMENT
1
Entering edit mode

I would go with this method, but I would first convert to a coverage plot to reduce the file sizes. I would recommend bedtools genomecov, using the -ibam flag to import the BAM file. With the resulting bedGraph output you can convert it to bigWig (search 'UCSC bigwig'). It is often problematic trying to directly view a BAM file in UCSC, plus a coverage plot is more aesthetically pleasing in a figure.

ADD REPLY
3
Entering edit mode
11.7 years ago

Try IGV, from the Broad institute. You can import indexed .bams to it, and zoom in on the gene, and see the reads pileup up on exons.

ADD COMMENT
0
Entering edit mode

Could be a bit tricky given a 1.8Mb region. My IGV stops displaying alignments at regions > 0.1Mb roughly.

ADD REPLY
4
Entering edit mode

That setting can be changed under View -> Preferences -> Alignments -> Visibility range threshold. It is set at 30kb by default. With enough memory available you might be able to increase to 2000kb to encompass this region? I tried it on my laptop (for a bam with ~50X coverage) with 2MB of ram allocated for IGV and it was insufficient though. I'll try again later on a bigger memory machine and see if this is possible. Increasing downsampling might help also? But, given the challenges some of the other solutions suggested in this thread might be better.

ADD REPLY
0
Entering edit mode

Good point, thanks! That works on my workstation (one 5GB RNA-Seq BAM file using ~500MB main memory).

ADD REPLY
0
Entering edit mode

How would I get the image out of IGV?

ADD REPLY
0
Entering edit mode

File -> save image in IGV will save a snapshot of what every you are looking at in IGV. I often just take snapshots using mac's built in screenshot options.

ADD REPLY
3
Entering edit mode
11.7 years ago
Christof Winter ★ 1.0k

If you are interested in the number of reads per position to then plot them with a tool of your choice, samtools is indeed your friend:

samtools depth -r chr2:212,000,000-213,800,000 accepted_hits.bam > reads-per-position.txt

will produce

chr2    212107123    1
chr2    212107124    1
chr2    212107125    1
chr2    212107126    2
chr2    212107127    2
chr2    212107128    2
...

The last column contains the number of reads covering the given position. Note that positions with zero coverage are omitted. You could easily go ahead and turn the output into a BED file and then use bedtools intersect to restrict your positions exons only (using e.g. a GFF or GTF file containing the exon coordinates).

ADD COMMENT
0
Entering edit mode

is this faster than samtools mpileup? That command gives depth at each position along with some additional information

ADD REPLY
0
Entering edit mode

For RNA-Seq BAM files that have N's in the CIGAR string to represent introns, samtools mpileup unfortunately also counts the intronic gaps in reads spanning two exons as read depth (which is normally not what you want). Also, by default it ignores reads with an unmapped mate (to get these, add the -A option).

ADD REPLY
3
Entering edit mode
11.7 years ago

what we normally do in our NGS pipeline is to generate a bigwig coverage file for each BAM file, which can be visualized directly on IGV. here is all the code needed:

genomeCoverageBed -bg -ibam input.bam > coverage.bedgraph
bedGraphToBigWig coverage.bedgraph human_hg19.fa.fai coverage.bigwig

you will have to have installed bedtools and the UCSC's bedGraphToBigWig, plus the index of the reference used (human_hg19.fa.fai in our case)

ADD COMMENT
0
Entering edit mode

cool, how do you get the image out of IGV?

ADD REPLY
0
Entering edit mode

File -> save image. Or, take a screenshot.

ADD REPLY
1
Entering edit mode

true. IGV natively provides a PNG image as output, although it won't be as good as the PDF/PS image that Pierre mentioned.

ADD REPLY
1
Entering edit mode

You can save as SVG as well, so you'll get a vector graphic.

ADD REPLY
1
Entering edit mode

Yes. You can choose between .png, .jpeg, and .svg (under 'file format') when you save image.

ADD REPLY

Login before adding your answer.

Traffic: 2252 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6