Simulate reads for positions in a vcf file
2
0
Entering edit mode
16 months ago
berndmann ▴ 10

Is there any way to simulate reads for any amount of positions in a vcf file? I found a GATK3 tool to simulate those mentioned in this thread vcf to bam by Pierre Lindenbaum but I was unable to find the correct GATK legacy version which contains this tool. The GATK tool seems to be a little bit more sophisticated since it takes a user-defined read-error into account. But correct me if I'm wrong here and every other tool posted in this thread does this as well.

Could anyone also outline how accurate this bam file will be if one uses it to test a variant-calling pipeline?

Best regards,

Berndmann

variant-calling simulation bam vcf • 2.1k views
ADD COMMENT
0
Entering edit mode

Do you need to use a VCF as starting point?

If not you could use mutate.sh to introduce mutations in a reference (while creating a VCF file in process) and then use randomreads.sh from BBMap suite to create fastq reads from that mutated genome.

ADD REPLY
0
Entering edit mode

Thanks for your answer GenoMax . What is the more reasonable approach if one needs to start at a vcf and wants to create reads that can be used in a calling algorithm to generate a vcf file comparable to the origin.

ADD REPLY
0
Entering edit mode

There's another tool, applyvariants.sh, that accepts a reference and vcf and will output the altered reference. Then you can simulate reads from it using randomreads.sh.

ADD REPLY
2
Entering edit mode
16 months ago
GenoMax 147k

You can get gatk v.3.8 from here. It has the SimulateReadsForVariants tool. Be aware that GATK3 is now unsupported.

Note: This version of GATK only works with Java 1.8.

ADD COMMENT
0
Entering edit mode
base) Gigapepe2:/media/Genomics_prac_guide/tools/gatk3$ java -jar GenomeAnalysisTK.jar --version
    3.8-0-ge9d806836
    (base) Gigapepe2:/media/Genomics_prac_guide/tools/gatk3$ java -jar GenomeAnalysisTK.jar -T SimulateReadsForVariants
    ##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: SimulateReadsForVariants
##### ERROR ------------------------------------------------------------------------------------------

I tried it yesterday and recieved the same output when I tried to use SimulateReadsForVariants. Is there any obvious mistake that I made?

ADD REPLY
0
Entering edit mode

Did you

export JARDIR=/path_to/gatk/3.8-0

$ java -jar $JARDIR/GenomeAnalysisTK.jar  -T SimulateReadsForVariants --version
3.8-1-0-gf15c1c3ef
ADD REPLY
0
Entering edit mode
java -jar $JARDIR/GenomeAnalysisTK.jar -T SimulateReadsForVariants --version 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: SimulateReadsForVariants
##### ERROR ------------------------------------------------------------------------------------------

Yes, I did. Also tried it with the same version you mentioned but this produced the same results.

ADD REPLY
0
Entering edit mode

Create a script file called gatk with :

-----------------------------------------------
#!/bin/sh

JARDIR=/path_to/gatk/3.8-0

java -jar $JARDIR/GenomeAnalysisTK.jar $*
-----------------------------------------------

Make it executable

chmod +x gatk

Do not move contents of GATK dir after you uncompress the distro.

Then try

$ gatk -T SimulateReadsForVariants --version
3.8-1-0-gf15c1c3ef

Probably a long shot but are you using a non-US keyboard layout?

ADD REPLY
0
Entering edit mode

Yes, I do. I wonder why this is important :D

ADD REPLY
0
Entering edit mode

Even though characters may look similar on screen they may be recognized as something else (perhaps the -). Like I said a long shot. It is working for me.

ADD REPLY
0
Entering edit mode

Btw. This did not work the same error as before:

sh gatk -T SimulateReadsForVariants --version
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: SimulateReadsForVariants
##### ERROR ------------------------------------------------------------------------------------------
ADD REPLY
0
Entering edit mode

When I use a more recent version of GATK (4.2) everything works fine. My system only has a problem with all the legacy versions see :

java -jar gatk4_4.jar IndexFeatureFile
USAGE: IndexFeatureFile [arguments]

Creates an index for a feature file, e.g. VCF or BED file.
Version:4.2.6.1-SNAPSHOT


Required Arguments:

--input,-I <GATKPath>         Feature file (eg., VCF or BED file) to index. Must be in a tribble-supported format 
                              Required. 


Optional Arguments:

--arguments_file <File>       read one or more arguments files and add them to the command line  This argument may be
                              specified 0 or more times. Default value: null. 

--gatk-config-file <String>   A configuration file to use with the GATK.  Default value: null. 

--gcs-max-retries,-gcs-retries <Integer>
                              If the GCS bucket channel errors out, how many times it will attempt to re-initiate the
                              connection  Default value: 20. 

--gcs-project-for-requester-pays <String>
                              Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be
                              accessed.  User must have storage.buckets.get permission on the bucket being accessed. 
                              Default value: . 

--help,-h <Boolean>           display the help message  Default value: false. Possible values: {true, false} 

--output,-O <GATKPath>        The output index file. If missing, the tool will create an index file in the same
                              directory as the input file.  Default value: null. 

--QUIET <Boolean>             Whether to suppress job-summary info on System.err.  Default value: false. Possible
                              values: {true, false} 

--tmp-dir <GATKPath>          Temp directory to use.  Default value: null. 

--use-jdk-deflater,-jdk-deflater <Boolean>
                              Whether to use the JdkDeflater (as opposed to IntelDeflater)  Default value: false.
                              Possible values: {true, false} 

--use-jdk-inflater,-jdk-inflater <Boolean>
                              Whether to use the JdkInflater (as opposed to IntelInflater)  Default value: false.
                              Possible values: {true, false} 

--verbosity <LogLevel>        Control verbosity of logging.  Default value: INFO. Possible values: {ERROR, WARNING,
                              INFO, DEBUG} 

--version <Boolean>           display the version number for this tool  Default value: false. Possible values: {true,
                              false} 


Advanced Arguments:

--showHidden <Boolean>        display hidden arguments  Default value: false. Possible values: {true, false} 


***********************************************************************

A USER ERROR has occurred: Argument input was missing: Argument 'input' is required

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
ADD REPLY
0
Entering edit mode

GenoMax Could you please upload your version of GATK 3.8.0 so I can check that your version works on my system?

ADD REPLY
1
Entering edit mode

Make sure you are using java v.1.8. GATK version 3.8.1 available from the cloud link in my answer only seems to work with that version.

ADD REPLY
0
Entering edit mode

Fantastic, it works! I guess it seems to be a good thing to clarify this in your answer. This might be helpful for other people as well.

ADD REPLY
0
Entering edit mode
16 months ago
berndmann ▴ 10

It seems like this answer in another thread might solve the problem of read sampling from an input.vcf?

ADD COMMENT

Login before adding your answer.

Traffic: 2229 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6