Question

In-silico downsizing to estimate the DNA input

0

4.5 years ago

APJ ▴ 40

Hi,

Given a FASTQ file generated from 50 ng of input DNA, I could find all the reference variants in the variant calling results. Is it possible to downsample the FASTQ data in silico, to see what the minimal DNA amount would be at which we still recover all of our reference variants?

Any thoughts on this?

Thank you!

sequencing snp next-gen • 855 views

updated 4.5 years ago by 5heikki 11k • written 4.5 years ago by APJ ▴ 40

1

I guess all you can do is check how differences in coverage change the variant calls. I doubt that you can meaningfully simulate different DNA amounts, though, as this depends on the kit and the number of PCR cycles. You would need data from different starting amounts and then build a model based on those data.

4.5 years ago by ATpoint 89k

score 0 · Answer 1 · 2021-03-17

Why not?

You can do e.g. this:

paste -d $'\t' - - - - <file.fq | shuf -n "$NUMBER" | awk 'BEGIN{FS="\t";OFS="\n"}{print $1,$2,$3,$4}' > out.fq

Where "$NUMBER" is the number of reads you want in your output. If you want shuf to be deterministic, or want to allow the same read to be drawn more than once, see man shuf (the --random-source and -r/--repeat options in GNU shuf).
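
If you want the same subsample every time without setting up a random source for shuf, here is a rough sketch of another way. It swaps shuf for awk: each read gets a seeded random key, the reads are sorted by that key, and the first N are kept. The function name subsample_fastq, the seed value and the file names are made up for illustration. It assumes bash, plain 4-line FASTQ records and no tab characters inside any line. Note that different awk implementations may produce different random sequences for the same seed, so results are only reproducible with the same awk.

```shell
#!/usr/bin/env bash
# Sketch only: reproducible FASTQ subsampling with awk instead of shuf.
# Assumptions: 4-line FASTQ records, no tabs inside any line.
# subsample_fastq is a hypothetical helper name, not an existing tool.

subsample_fastq() {
    local in_fq="$1" n_reads="$2" seed="$3"
    # 1. paste joins each 4-line record into one tab-separated line
    # 2. awk prefixes every record with a seeded pseudo-random key
    # 3. sort orders records by that key, head keeps the first n_reads
    # 4. cut drops the key, tr restores the 4-line layout
    paste - - - - < "$in_fq" |
        LC_ALL=C awk -v seed="$seed" \
            'BEGIN { srand(seed) } { printf "%.10f\t%s\n", rand(), $0 }' |
        LC_ALL=C sort -t $'\t' -k1,1n |
        head -n "$n_reads" |
        cut -f2- |
        tr '\t' '\n'
}

# Example with hypothetical file names and read counts:
# for n in 1000000 500000 250000; do
#     subsample_fastq file.fq "$n" 42 > "sub_${n}.fq"
# done
```

You could then run your variant calling on each output and see at which read count the reference variants start to drop out. As noted in the comment above, this tells you about read depth, not directly about the DNA input amount.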