24 months ago
langziv
Hello.
It looks like the CRAM files I have contain data from multiple genomes. If that's even possible, is there a way to split each file into separate files, so that each one contains data from a single genome?
Why would you want to do that?
Anyway, see: "How To Split A Bam File By Chromosome"; "How Can I Split Bam Into Chromosome (In A Loop) Using Samtools?"; "split sorted bam file chromosome wise"; etc.
I need to do variant calling, and I need to associate variants with their respective genome.
Most SV callers will accept a BED file or a range to restrict calling to a specific interval.
I noticed the problem after I did the variant calling. I got VCF files with no associations between variants and genomes.
If this is related to "Getting information on CRAM files from headers inside the files", then we don't know that there is actually more than one genome in the files you have.
My suspicion is that you don't have multiple genomes. Examine the read headers and see if you have multiple flowcells, lanes, or flowcell serial numbers present.
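Besides the read names, the `@RG` (read group) lines in the file header can hint at how many samples a file holds, since the SAM specification uses the `SM` tag for the sample name. Below is a minimal sketch, assuming the header text was obtained separately (for example with `samtools view -H`) and that the file actually carries `@RG` lines, which is not guaranteed. The function name `read_group_samples` is hypothetical.

```python
def read_group_samples(header_text):
    """Map each @RG ID to its SM (sample) tag, or None if SM is absent.

    `header_text` is assumed to be the SAM-style header of a CRAM/BAM file,
    e.g. the output of `samtools view -H file.cram`.
    """
    samples = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(
            field.split(":", 1)
            for field in line.split("\t")[1:]
            if ":" in field
        )
        if "ID" in fields:
            samples[fields["ID"]] = fields.get("SM")
    return samples


if __name__ == "__main__":
    import sys

    # Usage: samtools view -H file.cram | python this_script.py
    groups = read_group_samples(sys.stdin.read())
    for rg_id, sample in sorted(groups.items()):
        print(rg_id, sample)
    print("distinct samples:", len({s for s in groups.values() if s}))
```

Several read groups sharing one `SM` value would be consistent with one sample sequenced across several lanes; more than one distinct `SM` value would suggest the file really does mix samples.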
Thanks @genomax.
I'm not sure how to identify flowcells, lanes, or flowcell serial numbers in CRAM files. Can you give an example?
You will need to examine the read IDs in column 1 of the alignments.
Sequence identifiers are explained in this Wikipedia section.
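To make the suggestion concrete, here is a small sketch that tallies flowcell and lane from the read names in column 1. It assumes the reads use the common Illumina naming layout `instrument:run:flowcell:lane:tile:x:y`; other platforms and older Illumina pipelines name reads differently, in which case the reads end up under the `(None, None)` key. The function name `flowcell_lane_counts` is hypothetical.

```python
from collections import Counter


def flowcell_lane_counts(sam_lines):
    """Count reads per (flowcell, lane), parsed from column 1 (QNAME).

    Assumes Illumina-style names: instrument:run:flowcell:lane:tile:x:y.
    Names that do not have at least 7 colon-separated fields are counted
    under the key (None, None).
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        qname = line.split("\t", 1)[0]
        parts = qname.split(":")
        if len(parts) < 7:
            counts[(None, None)] += 1
            continue
        counts[(parts[2], parts[3])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: samtools view file.cram | python this_script.py
    for (flowcell, lane), n in sorted(
        flowcell_lane_counts(sys.stdin).items(), key=str
    ):
        print(flowcell, lane, n)
```

One flowcell ID with a handful of lanes would fit a single library run across lanes; several unrelated flowcell IDs would be worth a closer look, although that alone does not prove multiple genomes.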
Thanks, but this link explains the structure of FASTA files. I don't have FASTA files. My initial data are in CRAM files.
It's not. It's about FASTQ.
Read carefully what genomax said above.
So I need to convert the CRAM files to FASTQ files in order to get that information?
Yes. You could do this on the fly.
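As a rough illustration of doing this on the fly, the sketch below streams alignments out of a CRAM and keeps only the read names, without writing an intermediate file. It assumes `samtools` is installed and on the `PATH`, and that any reference needed to decode the CRAM is available. `samtools view` already prints the read name in column 1, so a full FASTQ conversion may not be needed just to inspect names. The function names and the `limit` parameter are hypothetical.

```python
import subprocess
from itertools import islice


def read_names(sam_lines, limit=1000):
    """Yield the read name (column 1) from up to `limit` alignment lines."""
    records = (
        line for line in sam_lines
        if line.strip() and not line.startswith("@")  # skip header lines
    )
    for line in islice(records, limit):
        yield line.split("\t", 1)[0]


def stream_read_names(cram_path, limit=1000):
    """Run `samtools view` on `cram_path` and yield the first `limit` names.

    Assumes samtools is on PATH; nothing is written to disk.
    """
    proc = subprocess.Popen(
        ["samtools", "view", cram_path],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        yield from read_names(proc.stdout, limit)
    finally:
        # Stop decoding early once enough reads have been seen.
        proc.stdout.close()
        proc.terminate()
        proc.wait()


if __name__ == "__main__":
    import sys

    for name in stream_read_names(sys.argv[1], limit=10):
        print(name)
```

Looking at only the first reads is quick but not exhaustive; in a coordinate-sorted file, reads from different lanes are interleaved, so a modest sample usually shows them, but a full pass is the safer check.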
This is the same flowcell (FC) with 4 lanes.
Thanks.
So does that mean it's a single genome?