Extracting coverage per base pair
2
0
Entering edit mode
7.3 years ago
Burgenix • 0

Hi,

I want to find the coverage per bp when using any seed/bait script available (MITObim, NOVOplasty). I find it wierd that even if they are aligning and baiting to a reference, this information is not readily provided.

Any ideas on how to do this?

Thanks !

alignment coverage • 4.2k views
ADD COMMENT
3
Entering edit mode
7.3 years ago

Assuming you have an aligned BAM file from either of these programs, BEDTools coverage with the -d command line parameter will output per-base read depth ('coverage').

In addition, I would have expected that these programs gave detail on coverage during the de novo assembly stage of analysis.

ADD COMMENT
0
Entering edit mode

Novoplasty doesnt provide this data, and MITObim just outputs the minimum coverage in the logfile at the end of the assembly/alignment, but nothing more. I guess I'll have to find a way to use BEDtools (have no bam files, just fasta -> fasta, sadly

ADD REPLY
0
Entering edit mode

So, you just have the assembled genome in FASTA?

I can only assume that the average depth of coverage achieved was output in one of the log files (or maybe even just output to your screen) - all other programs that I've used for assembly output various metrics for coverage.

If you get desperate, you could just re-align your data to your new genome and then use BEDTools to get coverage that way. This is also common practice to see how well-represented your new genome is in relation to the samples that you used to create it.

ADD REPLY
0
Entering edit mode

Thank you so much Kevin, appreciate it ! I wont rip my hair out and just use good old BEDTools to do it. The logfiles arent that great, as they provide only the minimum coverage. MITObim has .wig files, I can maybe use those.

ADD REPLY
0
Entering edit mode

Yes, BEDTools is very powerful but, wait, if you have wig files, then you may want to take a look at bwtools. They seem to provide information on coverage. However, you first need to convert the wig files to bigWig (look here).

ADD REPLY
2
Entering edit mode
7.3 years ago

If you have BAM files then you can try:

  • samtools depth yourfile.bam

You can also provide your region of interest for per base coverage in BED format. This shall give you per base coverage.

ADD COMMENT
2
Entering edit mode

Or even faster, SamBamba depth. It is multithreaded and offers some nice custom filtering and region/binning options.

ADD REPLY
0
Entering edit mode

Thanks ATPoint - never even knew that existed.

ADD REPLY
3
Entering edit mode

You're welcome. SamBamba is really a nice alternative for SAMtools when it comes to standards tasks such as sorting, indexing, flagstatting and filtering. The tools do mostly the same as their SAMtools counterparts but are more efficient in using multicore setups. Also, it indexes every created file on the fly, reducing processing time for larger files a lot.

EDIT: Be sure to specify the number of threads when using SamBamba. When not specified explicitely (-t) it will run on all available threads instead of only one, as samtools does when -@ is not set.

ADD REPLY
0
Entering edit mode

I have not tried SamBamba (I plan to in the near future!) but I would like to note that as of 1.4 samtools is at least partially multithreaded and substantially faster than prior versions.

Edit: Results of testing -

bushnell@gpint209:/dev/shm$ time samtools view -@ 20 -b mapped.sam > st.bam

real    0m4.779s
user    1m7.448s
sys     0m0.805s
bushnell@gpint209:/dev/shm$ time /global/projectb/sandbox/gaag/bbtools/sambamba/sambamba view -S -f bam mapped.sam > sbb.bam

real    0m6.493s
user    2m45.629s
sys     0m1.160s
bushnell@gpint209:/dev/shm$ time samtools view -@ 20 st.bam > st.sam

real    0m4.521s
user    0m8.411s
sys     0m1.152s
bushnell@gpint209:/dev/shm$ time /global/projectb/sandbox/gaag/bbtools/sambamba/sambamba view -f sam st.bam > sbb.sam

real    0m1.842s
user    0m16.169s
sys     0m1.448s

Very impressive bam -> sam conversion speed!

-rw-rw-r-- 1 bushnell bushnell 1248860045 Oct 10 11:29 mapped.sam
-rw-rw-r-- 1 bushnell bushnell  360334358 Oct 10 11:33 sbb.bam
-rw-rw-r-- 1 bushnell bushnell 1248859507 Oct 10 11:34 sbb.sam
-rw-rw-r-- 1 bushnell bushnell  376423550 Oct 10 11:32 st.bam
-rw-rw-r-- 1 bushnell bushnell 1248859507 Oct 10 11:34 st.sam

Incidentally, SamBamba produced a smaller bam file so probably it was running at higher compression which is why it took longer for sam -> bam. Samtools does not let you adjust the compression level so there's nothing I can do about that. But SamBamba does and it looks like Samtools uses roughly compression level 4. Setting SamBamba at level 4 reduced the user and CPU time to roughly the same as samtools (both slightly higher, but the resulting file was slightly smaller).

ADD REPLY
0
Entering edit mode

And if I do not have .bam files? Can I calculate it using python or whatever?

ADD REPLY
0
Entering edit mode

Which file format do you have, BED?

ADD REPLY

Login before adding your answer.

Traffic: 2148 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6