I used randomreads.sh
from bbmap to generate reads from a fasta file generated with FastaAlternateReferenceMaker from GATK. It seems no matter which options I choose the script stops generating reads and chromosome 7 despite the fasta file contains all contigs from human_g1k_v37.fasta. See:
sed -n 's/^>//p' alternate.fasta
1 1:1-249250621
2 2:1-243199373
3 3:1-198022430
4 4:1-191154276
5 5:1-180915260
6 6:1-171115067
7 7:1-159138663
8 8:1-146364022
9 9:1-141213431
10 10:1-135534747
11 11:1-135006516
12 12:1-133851895
13 13:1-115169878
14 14:1-107349540
15 15:1-102531392
16 16:1-90354753
17 17:1-81195210
18 18:1-78077248
19 19:1-59128983
20 20:1-63025520
21 21:1-48129895
22 22:1-51304566
23 X:1-155270560
24 Y:1-59373566
25 MT:1-16569
26 GL000207.1:1-4262
27 GL000226.1:1-15008
28 GL000229.1:1-19913
29 GL000231.1:1-27386
30 GL000210.1:1-27682
31 GL000239.1:1-33824
32 GL000235.1:1-34474
33 GL000201.1:1-36148
34 GL000247.1:1-36422
35 GL000245.1:1-36651
36 GL000197.1:1-37175
37 GL000203.1:1-37498
38 GL000246.1:1-38154
39 GL000249.1:1-38502
40 GL000196.1:1-38914
41 GL000248.1:1-39786
42 GL000244.1:1-39929
43 GL000238.1:1-39939
44 GL000202.1:1-40103
45 GL000234.1:1-40531
46 GL000232.1:1-40652
47 GL000206.1:1-41001
48 GL000240.1:1-41933
49 GL000236.1:1-41934
50 GL000241.1:1-42152
51 GL000243.1:1-43341
52 GL000242.1:1-43523
53 GL000230.1:1-43691
54 GL000237.1:1-45867
55 GL000233.1:1-45941
56 GL000204.1:1-81310
57 GL000198.1:1-90085
58 GL000208.1:1-92689
59 GL000191.1:1-106433
60 GL000227.1:1-128374
61 GL000228.1:1-129120
62 GL000214.1:1-137718
63 GL000221.1:1-155397
64 GL000209.1:1-159169
65 GL000218.1:1-161147
66 GL000220.1:1-161802
67 GL000213.1:1-164239
68 GL000211.1:1-166566
69 GL000199.1:1-169874
70 GL000217.1:1-172149
71 GL000216.1:1-172294
72 GL000215.1:1-172545
73 GL000205.1:1-174588
74 GL000219.1:1-179198
75 GL000224.1:1-179693
76 GL000223.1:1-180455
77 GL000195.1:1-182896
78 GL000212.1:1-186858
79 GL000222.1:1-186861
80 GL000200.1:1-187035
81 GL000193.1:1-189789
82 GL000194.1:1-191469
83 GL000225.1:1-211173
84 GL000192.1:1-547496
Is this an expected behavior or can one change this to produce all/different chromosomes?
Best,
Berndmann
Provide the command line used. My guess is your job ran out of memory.
Not quite sure what you are going for here. You have set a pretty low coverage value so my guess is that may be one reason you do not have all chromosomes covered. You also are creating a BAM file. I thought you wanted to get fastq reads? You are also getting singe-end reads by this method (unless you set
paired=t
).My suggestion would be to generate plenty of reads using
randomreads.sh
to get good coverage (pay attention to parameters for SNP etc they have default values) and then sample the number of reads you need from this large pool usingreformat.sh
.I will give this a try! What would you suggest for the SNP values. I never had to sample reads. The main goal is to simulate reads that mimic low-coverage data. That is also why I chose this extreme low coverage. But might be a better idea to subsample the coverage in a later step.
Even after I added your suggestions the problem persists. I can increase the memory usage further but did not observe any large spike when the script ran.
Furthermore I highly doubt that this is a problem with my physical memory limit. See :
Is this maybe a problem how randomreads.sh calculates mem-usage?
If you are also doing paired reads then explicitly indicate an
out1=R1.fq.gz
andout2=R2.fq.gz
otherwise the reads would be writteninterleaved
in a single file (out=
).Ok. How do I create
R1.fq.gz
. andR2.fq.gz
from a.fasta
file generated in the way described in the TO question?You should do
Well, ok. I missed the out parameter. Sorry for that kinda stupid question.