Hi everyone, my sample's depth of coverage is around 300x, and I want to downsample it to 250x, 200x, 150x, and 100x. Can anyone suggest tools or packages for this?
Thank you in advance.
If your reads are in SAM/BAM format, you can also use the Picard DownsampleSam tool. You provide the probability that each read is retained during sampling.
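For example, something along these lines (file names here are placeholders; P is the fraction of reads to keep, so going from 300x to 250x would be 250/300 ≈ 0.83):

# P = fraction of reads kept; 250/300 ≈ 0.83 (file names are placeholders)
java -jar picard.jar DownsampleSam I=input.bam O=downsampled.250x.bam P=0.83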
You can use seqtk:
Something like this selects 1,000,000 reads (you will need to calculate how many reads are needed for 250x, etc.):
seqtk sample -s100 my.fastq.gz 1000000 | gzip > my.1.fastq.gz
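To pick the read count, a rough calculation under your stated assumptions (250 bp reads, 5 Mb genome): reads ≈ coverage × genome size / read length, so 250x needs 250 × 5,000,000 / 250 = 5,000,000 reads, i.e. 2,500,000 per mate file. For paired-end data, run seqtk on each file with the same seed so the mates stay in sync (file names here are placeholders):

# 2,500,000 reads per mate ≈ 250x for a 5 Mb genome with 2 x 250 bp reads (placeholder file names)
seqtk sample -s100 my.R1.fastq.gz 2500000 | gzip > my.250x.R1.fastq.gz
seqtk sample -s100 my.R2.fastq.gz 2500000 | gzip > my.250x.R2.fastq.gz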
GATK also allows downsampling; in fact, it does so by default. You can use the PrintReads walker if downsampling is your only goal.
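For example, something like this with GATK 3.x (file names are placeholders; -dfrac keeps roughly that fraction of reads, e.g. 0.83 for 300x down to 250x):

# -dfrac = downsample to this fraction of reads; 250/300 ≈ 0.83 (placeholder file names)
java -jar GenomeAnalysisTK.jar -T PrintReads -R reference.fasta -I input.bam -o downsampled.bam -dfrac 0.83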
As Mick has always stated: "you can't always get what you want, but if you try the '-ds' option, well you just might find, you get what you need"
Hi HG. What format are your data in? At what step of your analysis do you want to reduce your coverage? Give us more details so that we can better help you.
Hi Eric, thanks for the reply. My dataset: Illumina 250 bp paired-end reads, whole-genome sequencing of E. coli, expected genome size 5.0 Mb. I have already assembled the raw data, which is at around 300x coverage. Now I want to see what the assembly quality, mainly the N50 value, will be if the coverage is reduced. I want to follow the approach of GAGE-B (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702249/), where they assembled their data at different coverages. I just want to see the same effect on my own dataset.
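In case it helps, here is a minimal sketch of the subsetting step with seqtk for all four target coverages, assuming 2 x 250 bp paired-end reads, a 5 Mb genome, and placeholder file names; each subset would then be run through your assembler:

for cov in 250 200 150 100; do
    # read pairs per mate file = coverage * genome size / (2 * read length); placeholder file names
    n=$(( cov * 5000000 / 500 ))
    seqtk sample -s100 reads.R1.fastq.gz $n | gzip > ${cov}x.R1.fastq.gz
    seqtk sample -s100 reads.R2.fastq.gz $n | gzip > ${cov}x.R2.fastq.gz
done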