2.9 years ago
Sara
I am planning to perform expression analysis on RNA-seq data, but instead of FASTQ files I received CRAM files. I used samtools to convert the CRAM to FASTQ files. If you have experience with this, could you please let me know whether the resulting FASTQ files would differ in any way from the original FASTQ files? (I do not have access to the originals.) Here is the command I used:
samtools view -b -T test.cram > test.bam
samtools fastq -1 test_R1.fastq.gz -2 test_R2.fastq.gz test.bam
You should name sort (samtools sort -n or samtools collate) the BAM file before converting to FASTQ. If your CRAM file did not include unmapped reads, you will not be able to recover them. The FASTQ data you end up with should be identical to the information in your CRAM file.
Since CRAM is a reference-based format, my understanding is that the user must know the reference file used to generate the CRAM, and that same reference should be supplied when generating the BAM.
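To make the two points above concrete, here is a minimal sketch that groups reads by name and writes paired FASTQ in one pass. The function name cram_to_fastq and its three arguments are hypothetical, chosen only for illustration. It assumes a reasonably recent samtools where collate accepts --reference (note that -T in collate is a temp-file prefix, not the reference), and that the FASTA you pass is the same one used to create the CRAM.

```shell
# Hypothetical helper: cram_to_fastq <reference.fa> <input.cram> <output_prefix>
# Assumes samtools is on PATH and the reference matches the one used to build the CRAM.
cram_to_fastq() {
    if [ "$#" -ne 3 ]; then
        echo "usage: cram_to_fastq <reference.fa> <input.cram> <output_prefix>" >&2
        return 2
    fi
    ref="$1"
    cram="$2"
    prefix="$3"

    # collate places mates next to each other without a full name sort;
    # -u emits uncompressed BAM and -O writes it to stdout for the pipe.
    # fastq -n leaves read names unchanged (no /1 or /2 suffix);
    # -0 and -s send unflagged reads and singletons to /dev/null.
    samtools collate -u -O --reference "$ref" "$cram" \
      | samtools fastq -n \
          -1 "${prefix}_R1.fastq.gz" \
          -2 "${prefix}_R2.fastq.gz" \
          -0 /dev/null -s /dev/null -
}
```

Collate is usually enough here because the FASTQ step only needs mates to be adjacent, not globally ordered. Read order in the output may still differ from the original FASTQ files, and any reads that were filtered out before the CRAM was written cannot be recovered.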
Since OP has -T in their command, they must have provided the reference when doing the actual conversion.
Two additional caveats:
1. The original CRAM files could have contained unaligned FASTQ data, in which case no reference is needed (see: Is it possible to directly convert fastq to CRAM ?).
2. samtools is able to find a reference based on the UR field, if that information exists (see: Converting CRAM to BAM without reference fasta).
I would not create CRAM files without a reference (first caveat), nor do I recommend it. If a service or CRAM provider does that, I have no comment on it.
The second caveat works only if the reference file is actually located where the UR field says it is, whether on the OP's machine, a network path, or a public URL. Since the file comes from a third party, I think samtools may not be able to find the reference; in that case the user has to provide it with -T.
If OP could post the UR field from the CRAM file, it would help in understanding the location of the reference file used for it.
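For anyone wanting to check this on their own file, a small sketch for pulling the relevant fields out of the header follows. The helper name sq_reference_fields is hypothetical. It only parses @SQ header lines from standard input and prints the sequence name, the M5 checksum, and the UR value (or "-" when a field is absent). Reading the header itself should not require the reference.

```shell
# Hypothetical helper: read SAM/CRAM header text on stdin and print,
# for every @SQ line, the SN, M5 and UR values separated by tabs.
sq_reference_fields() {
    awk -F'\t' '
        $1 == "@SQ" {
            sn = "-"; m5 = "-"; ur = "-"   # "-" marks a missing field
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) sn = substr($i, 4)
                if ($i ~ /^M5:/) m5 = substr($i, 4)
                if ($i ~ /^UR:/) ur = substr($i, 4)
            }
            print sn "\t" m5 "\t" ur
        }'
}

# Usage, with the file name from the question:
#   samtools view -H test.cram | sq_reference_fields
```

If UR points to a path that does not exist on your system, the M5 values can still be compared against the checksums of a candidate FASTA to confirm that it is the right reference before passing it with -T.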