Question

Any ways of extracting and filtering read sequences from cram files with samtools?

0

Entering edit mode

9 months ago

sacryt • 0

I am trying to extract read sequences from a cram file. It would be easiest with pysam but this is not working for me on my work environment in DNAnexus RAP for some reason.

Is there an alternative way of printing and saving just the read sequences from a cram file with samtools e.g. to an array or text file? I want just the sequence, not any other information associated like cigar strings

Additionally, is there any way to filter the reads, e.g. according to mapping quality before displaying them?

Thank you

samtools cram reads sequence • 1.2k views

ADD COMMENT • link 8 months ago by sacryt • 0

score 4 · Accepted Answer · 2024-02-26

4

Entering edit mode

9 months ago

Huiyang ▴ 190

You can use "samtools fastq" or "samtools fasta" to convert CRAM to FASTQ or FASTA. If your CRAM is sorted by position, you need to use "samtools sort -n" to convert your CRAM to a read name sorted format before output.

ADD COMMENT • link 9 months ago by Huiyang ▴ 190

0

Entering edit mode

Thank you, I've converted them to FASTQ files, though they are now much larger.

I was planning on trying to use the read mapping quality field in the cram files to specifically extract the reads which were of low quality - is there any way for me to do that with samtools, and then I could convert them to FASTQ?

ADD REPLY • link 9 months ago by sacryt • 0

0

Entering edit mode

You can use the '-F' parameter in sambamba view for filtering, for example, -F "mapping_quality < 10" to obtain reads with mapping quality lower than 10. It is worth noting that versions of sambamba >= 0.8 no longer support the CRAM format, while versions < 0.8 do support the CRAM format.

ADD REPLY • link 9 months ago by Huiyang ▴ 190

0

Entering edit mode

Try: samtools view -u -e 'mapq<10' in.cram | samtools fasta -

The main samtools.1 man page has a section "FILTER EXPRESSIONS" describing the syntax. http://www.htslib.org/doc/samtools.html#FILTER_EXPRESSIONS.

It's worth noting that if your read data is paired, then any filtering is problematic as it'll break pairs up.

ADD REPLY • link 8 months ago by jkbonfield ★ 1.3k

0

Entering edit mode

Thank you! Sorry I just saw this, is there an alternative for paired read data? I ran this

samtools view -u -e 'mapq<10' "path/in.cram" > "path/out.cram"

but it is running indefinitely for several hours, much longer than just viewing the cram file itself. I'm guessing it's because my data might be paired? I wasn't sure

ADD REPLY • link 8 months ago by sacryt • 0