6 months ago
Maverick
I am trying to convert a CRAM file into FASTQ without a reference. The command I am using is:
SamToFastq -I <path_to_cram_file> -F "<path_to_first_fastq>" -F2 "<path_to_second_fastq>"
This is the error that shows up:
java.lang.IllegalArgumentException: Failure getting reference bases for sequence chr1
at htsjdk.samtools.cram.build.CRAMReferenceRegion.fetchReferenceBasesByRegion(CRAMReferenceRegion.java:172)
at htsjdk.samtools.cram.build.CRAMReferenceRegion.fetchReferenceBasesByRegion(CRAMReferenceRegion.java:192)
at htsjdk.samtools.cram.structure.Slice.normalizeCRAMRecords(Slice.java:450)
at htsjdk.samtools.cram.structure.Container.getSAMRecords(Container.java:322)
at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:112)
at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:204)
at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:608)
at picard.sam.SamToFastq.doWork(SamToFastq.java:204)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:280)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
This is a snapshot of the CRAM file when I tried to view it using ViewSam:
Is there no way to convert it without the reference file?
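Why the reference matters here: CRAM usually stores each aligned read as differences from the reference rather than as raw bases, so restoring the read sequence (which appears to be what `Slice.normalizeCRAMRecords` is doing in the stack trace) needs the reference bases for that contig. The toy Python sketch below illustrates the idea under heavy simplification: it only models substitutions, whereas real CRAM also encodes indels, soft clips and quality scores, and can optionally embed the reference or store bases verbatim. `EncodedRead`, `encode` and `decode` are hypothetical names, not part of htsjdk or any CRAM library.

```python
from dataclasses import dataclass, field


@dataclass
class EncodedRead:
    """Toy CRAM-like record: only position, length and mismatches are stored."""
    ref_name: str
    start: int  # 0-based start on the reference
    length: int
    substitutions: dict[int, str] = field(default_factory=dict)  # offset -> read base


def encode(read: str, ref_name: str, reference: dict[str, str], start: int) -> EncodedRead:
    """Keep only the bases where the read differs from the reference."""
    ref_slice = reference[ref_name][start:start + len(read)]
    subs = {i: b for i, (b, r) in enumerate(zip(read, ref_slice)) if b != r}
    return EncodedRead(ref_name, start, len(read), subs)


def decode(rec: EncodedRead, reference: dict[str, str]) -> str:
    """Rebuild the read: copy the reference slice, then apply the mismatches."""
    if rec.ref_name not in reference:
        # Analogous to htsjdk's "Failure getting reference bases for sequence chr1"
        raise ValueError(f"Failure getting reference bases for sequence {rec.ref_name}")
    bases = list(reference[rec.ref_name][rec.start:rec.start + rec.length])
    for offset, base in rec.substitutions.items():
        bases[offset] = base
    return "".join(bases)
```

For example, with `reference = {"chr1": "ACGTACGTACGTACGT"}`, the read `ACGAACGT` at position 4 is stored as a single substitution, and decoding it with an empty reference dictionary fails the same way the CRAM reader did.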
Thank you so much! I will look it up right away.
I was able to get my FASTQ files. I can see that the reference sequences were cached in an hts-ref directory, but split across multiple files, as seen in the screenshot below. I now want to run these FASTQ files through my GATK pipeline, which follows the Broad's GATK Best Practices workflow. It requires a FASTA file for running bwa mem. Is there a way to combine these cached reference files into a single FASTA file? What would be the ideal way to handle this?
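One possible way to turn the cache back into a single FASTA, assuming the cache uses the htslib/samtools default `%2s/%2s/%s` layout (each file named by the MD5 of its sequence and containing only the bases, with no header or newlines) and that you have the CRAM header text, for example from `samtools view -H`: match each `@SQ` line's `SN` (name) and `M5` (MD5) tags to a cached file, verify the checksum, and write a FASTA record. The functions below are hypothetical helpers, not a samtools feature. Only contigs that were actually fetched while decoding will be in the cache, so a missing file raises `FileNotFoundError`.

```python
import hashlib
from pathlib import Path


def parse_sq_md5(header_text: str) -> list[tuple[str, str]]:
    """Return (SN, M5) pairs from the @SQ lines of a SAM/CRAM header, in header order."""
    pairs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(item.split(":", 1) for item in line.split("\t")[1:] if ":" in item)
        if "SN" in tags and "M5" in tags:
            pairs.append((tags["SN"], tags["M5"].lower()))
    return pairs


def cache_path(cache_dir: Path, md5: str) -> Path:
    """Assumed htslib default cache layout: first 2 chars / next 2 chars / remainder."""
    return cache_dir / md5[:2] / md5[2:4] / md5[4:]


def cache_to_fasta(header_text: str, cache_dir: Path, out_fasta: Path, width: int = 60) -> None:
    """Write one FASTA record per @SQ entry, checking each cached file's MD5 first."""
    with out_fasta.open("w") as out:
        for name, md5 in parse_sq_md5(header_text):
            seq = cache_path(cache_dir, md5).read_bytes()  # FileNotFoundError if not cached
            if hashlib.md5(seq).hexdigest() != md5:
                raise ValueError(f"MD5 mismatch for {name}; cached file may be incomplete")
            out.write(f">{name}\n")
            text = seq.decode("ascii")
            for i in range(0, len(text), width):
                out.write(text[i:i + width] + "\n")
```

The resulting FASTA keeps the contig names and order from the CRAM header, but it still needs `bwa index` before `bwa mem` can use it.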
Thank you again!
You can use any reference you like at this point. Since you are planning to use GATK, you can get their version from the resource bundle in their public cloud bucket, which also has pre-made indexes: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
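Whichever FASTA you choose, `bwa mem` expects the files produced by `bwa index` (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa` appended to the index prefix), and GATK tools expect a `.fai` index (from `samtools faidx`) plus a `.dict` sequence dictionary (from Picard `CreateSequenceDictionary`) next to the FASTA. A small Python sketch that lists which of those are missing; the function names are hypothetical, it assumes an uncompressed `.fasta`/`.fa` file, and pre-built bundle indexes may use a different bwa prefix than the FASTA name, which is why that prefix can be passed separately.

```python
from pathlib import Path
from typing import Optional

# Files written by `bwa index <prefix>`; `bwa mem` needs all of them.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_index_files(fasta: Path, bwa_prefix: Optional[Path] = None) -> list[Path]:
    """List the index files that bwa mem and GATK typically look for."""
    prefix = bwa_prefix if bwa_prefix is not None else fasta
    files = [prefix.with_name(prefix.name + s) for s in BWA_SUFFIXES]
    files.append(fasta.with_name(fasta.name + ".fai"))  # samtools faidx output
    files.append(fasta.with_suffix(".dict"))            # CreateSequenceDictionary default name
    return files


def missing_index_files(fasta: Path, bwa_prefix: Optional[Path] = None) -> list[Path]:
    """Return the expected index files that do not exist on disk."""
    return [p for p in expected_index_files(fasta, bwa_prefix) if not p.exists()]
```

An empty list from `missing_index_files` suggests the reference is ready for both `bwa mem` and GATK; anything listed can be generated with the corresponding tool.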
Understood, thank you!