How to trim the length of reads in a CRAM file?
Ishan • 2.6 years ago

I have a CRAM file with paired reads which looks like this:

im13@node-13-21:~/scratch_im13_projects/im13_basespace_runs$ samtools view ./walkup_194_repeat/CRAM/A01_FR_KAPA_25x_1ug_SR_1ngx4rxns_S1.cram | head
D00586:937:HVCWGBCX3:1:1101:1485:1803   77      *       0       0       *       *       0       0       NCAGAGGAAGCGGAACGCATGTTTC       #<GGGIIGIGGGIIGIGIIGGG.<<
D00586:937:HVCWGBCX3:1:1101:1485:1803   141     *       0       0       *       *       0       0       AGGGTGTTCGGGCCGCTGCTCTGCA       GAGGGGGGGGGGGIIIIIIGIGGGG
D00586:937:HVCWGBCX3:1:1101:1440:1901   77      *       0       0       *       *       0       0       NGTACCGTGCGACATCGCGAGTATC       #<<<GGAGGIAAGIGGGIG<GA<<<
D00586:937:HVCWGBCX3:1:1101:1440:1901   141     *       0       0       *       *       0       0       CTGTCTGTCTCAATGCCACACTGCA       G<G<AGGGGGGGIGGIIIIIIGGGG
D00586:937:HVCWGBCX3:1:1101:1549:1836   77      *       0       0       *       *       0       0       NTGAAGATGATCGCTTATACGTATC       #<<GGGIIIGGIGGGIGIGIG<.<<
D00586:937:HVCWGBCX3:1:1101:1549:1836   141     *       0       0       *       *       0       0       CTGTGTCGCCCTCGTCCCCGCTGCA       AGGGGGIGGGIGIAGGGIIGGAGGG
D00586:937:HVCWGBCX3:1:1101:1705:1849   77      *       0       0       *       *       0       0       NGGGAGAATGCCATGCATTGGTTTC       #<<GGIGIIIIIIIIGIGGGIG<<<
D00586:937:HVCWGBCX3:1:1101:1705:1849   141     *       0       0       *       *       0       0       GCCAGGAATTCCAGGCTCACCTGCA       GGGGGIIIGAGIIIIIIGIIIGGGI

I would like to trim these 25 bp reads to 20 bp by removing the last 5 bases, for example:

For the first read: NCAGAGGAAGCGGAACGCATGTTTC -> NCAGAGGAAGCGGAACGCAT

For the second read: AGGGTGTTCGGGCCGCTGCTCTGCA -> AGGGTGTTCGGGCCGCTGCT

How can I do this and save the output?

Many thanks!

These look like unaligned reads. You may be able to pipe them through samtools collate (the reads already appear to be collated, but just in case), then samtools fastq, then a trimming program such as bbduk.sh from BBTools, and finally back through samtools to write CRAM, if you want to restore the original format.
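
The "unaligned" reading comes from the FLAG column (column 2). As a rough check, the sketch below decodes only the FLAG bits that matter here, using the bit values from the SAM specification; describe_flag is a hypothetical helper name, and samtools flags reports the same information. 77 = 1+4+8+64 and 141 = 1+4+8+128, i.e. read 1 and read 2 of a pair where both the read and its mate are unmapped.

```shell
#!/usr/bin/env bash
# Decode the SAM FLAG bits relevant to paired, unmapped reads.
# Only a subset of bits is checked (hypothetical helper, not a samtools command).
describe_flag() {
  local flag=$1
  local out=()
  (( flag & 0x1 ))  && out+=("paired")         # template has multiple segments
  (( flag & 0x4 ))  && out+=("unmapped")       # this read is unmapped
  (( flag & 0x8 ))  && out+=("mate_unmapped")  # the mate is unmapped
  (( flag & 0x40 )) && out+=("read1")          # first segment in template
  (( flag & 0x80 )) && out+=("read2")          # last segment in template
  echo "${out[*]}"
}

describe_flag 77   # paired unmapped mate_unmapped read1
describe_flag 141  # paired unmapped mate_unmapped read2
```

The records printed from the FASTQ files further down carry flag 4, which this function reports as just "unmapped", since single FASTQ files carry no pairing information.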

I also have the R1 and R2 FASTQ files that were combined to make the CRAM:

im13@node-13-21:~/scratch_im13_projects/im13_basespace_runs$ samtools view ./walkup_194_repeat/FASTQ/A01_FR_KAPA_25x_1ug_SR_1ngx4rxns_S1_R1_001.fastq.gz | head
D00586:937:HVCWGBCX3:1:1101:1485:1803   4       *       0       0       *       *       0       0       NCAGAGGAAGCGGAACGCATGTTTC       #<GGGIIGIGGGIIGIGIIGGG.<<
D00586:937:HVCWGBCX3:1:1101:1440:1901   4       *       0       0       *       *       0       0       NGTACCGTGCGACATCGCGAGTATC       #<<<GGAGGIAAGIGGGIG<GA<<<
D00586:937:HVCWGBCX3:1:1101:1549:1836   4       *       0       0       *       *       0       0       NTGAAGATGATCGCTTATACGTATC       #<<GGGIIIGGIGGGIGIGIG<.<<
D00586:937:HVCWGBCX3:1:1101:1705:1849   4       *       0       0       *       *       0       0       NGGGAGAATGCCATGCATTGGTTTC       #<<GGIGIIIIIIIIGIGGGIG<<<

Can I run the trimming on the FASTQ files and then convert them to CRAM? If so, how? Apologies, I'm not familiar with bbduk.sh.

Yes, you can. Assuming you want to trim both reads, you can do something like:

bbduk.sh -Xmx2g in1=R1.fastq.gz in2=R2.fastq.gz out1=trim_R1.fastq.gz out2=trim_R2.fastq.gz forcetrimright=19 
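
Here forcetrimright=19 trims everything to the right of 0-based position 19, so positions 0 to 19 (20 bases) are kept. If BBTools is not available, the same fixed-length trim can be sketched in plain awk. This assumes unwrapped FASTQ with exactly four lines per record; trim_fastq is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Keep only the first KEEP bases of every read in a FASTQ stream
# (intended to mirror bbduk.sh forcetrimright=KEEP-1 for reads >= KEEP bp).
# Assumes unwrapped FASTQ: exactly 4 lines per record.
trim_fastq() {
  local keep=$1
  awk -v keep="$keep" '
    # line 2 of each record is the sequence, line 4 the qualities
    NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, 1, keep) }
    { print }'
}

# Example: the first read from the question, trimmed from 25 to 20 bp.
printf '%s\n' '@D00586:937:HVCWGBCX3:1:1101:1485:1803' \
  'NCAGAGGAAGCGGAACGCATGTTTC' '+' '#<GGGIIGIGGGIIGIGIIGGG.<<' | trim_fastq 20
```

For the gzipped inputs this would be used as zcat R1.fastq.gz | trim_fastq 20 | gzip > trim_R1.fastq.gz, and likewise for R2. Because no reads are dropped, R1 and R2 stay in sync.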

Then convert the trimmed FASTQ files back to CRAM, e.g. with samtools import (samtools 1.15 or later) or Picard FastqToSam.
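
Alternatively, because every read in this CRAM is unmapped (CIGAR is *), the FASTQ round trip could in principle be skipped by truncating the SEQ and QUAL columns of the SAM text stream directly. This is a sketch under that assumption only: for aligned reads it would leave CIGAR and POS inconsistent, and any aux tags that depend on read length are not adjusted. trim_sam is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Truncate SEQ (column 10) and QUAL (column 11) of SAM records to the first
# KEEP bases. Header lines (starting with @) are passed through unchanged.
# Only appropriate for unmapped reads (CIGAR "*"); CIGAR, POS and aux tags
# are NOT updated.
trim_sam() {
  local keep=$1
  awk -v keep="$keep" '
    BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }
    {
      $10 = substr($10, 1, keep)   # SEQ ("*" stays "*")
      $11 = substr($11, 1, keep)   # QUAL ("*" stays "*")
      print
    }'
}

# Example: the first record from the question, as tab-separated SAM.
printf 'D00586:937:HVCWGBCX3:1:1101:1485:1803\t77\t*\t0\t0\t*\t*\t0\t0\t%s\t%s\n' \
  'NCAGAGGAAGCGGAACGCATGTTTC' '#<GGGIIGIGGGIIGIGIIGGG.<<' | trim_sam 20
```

On the real file this might be run as samtools view -h in.cram | trim_sam 20 | samtools view -C -o trimmed.cram - which keeps the header and writes CRAM again; no reference is needed since the reads are unaligned.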
