Dear all, I have a question: how can I filter out the reads in a BAM file that are shorter than 30 bp? Any read shorter than 30 bp should be removed from the BAM file. I think I could use:
samtools view -h /home/filip/Desktop/rozdeleny\ bed_009_QCfailed/Ionfiltrovany1.bam | perl -lane '$l = 0; $F[5] =~ s/(\d+)[MX=DN]/$l+=$1/eg; print if $l > 30 or /^@/' | samtools view -bS - > bar.bam
Thank you
Why look at the CIGAR string ($F[5]) when you can just take the length of SEQ ($F[9])? Is it really the length of the read you're looking for?
Oh, I think that was my mistake. I believe the 10th column holds the whole read (ACTCG...). Could I use this syntax on the 9th column to filter out the reads shorter than 30?
Yes, but again, why do you want to use the CIGAR column ($F[5], the 6th SAM field) instead of the sequence itself ($F[9], the 10th field, ATGC...)? I would understand if you only wanted the number of reference bases that the read covers, excluding padding.
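To make the difference concrete, here is a minimal sketch that prints both numbers for each alignment: the length of SEQ and the reference span implied by the CIGAR (the sum of the M, D, N, = and X operations, which is what your Perl one-liner adds up). It assumes a POSIX shell with awk and SAM text on stdin; `seq_len_vs_ref_span` is a name I made up for illustration, not a samtools command.

```shell
# Print, for each alignment in SAM text on stdin: read name, SEQ length,
# and the reference span implied by the CIGAR (sum of M, D, N, = and X).
# seq_len_vs_ref_span is a hypothetical helper name, not part of samtools.
seq_len_vs_ref_span() {
    awk -F'\t' '
        /^@/ { next }                          # skip header lines
        {
            cigar = $6; span = 0
            # Consume one <number><op> pair at a time from the CIGAR.
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                n  = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                if (op ~ /[MDN=X]/) span += n  # only ops that consume reference
                cigar = substr(cigar, RLENGTH + 1)
            }
            print $1 "\t" length($10) "\t" span
        }'
}
```

For a 28-base read aligned as 14M10N14M this reports a SEQ length of 28 but a reference span of 38, so a CIGAR-based "> 30" test would keep a read that is actually shorter than 30 bp. A 35-base read aligned as 10S25M goes the other way: length 35, span 25.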
No, I would like to filter out the reads whose length is lower than 30 bp. The final BAM file should contain only the reads that are longer than 30 bp. Or did I misunderstand your question? I don't know if my script is OK. I am a beginner bioinformatician...
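If the read length itself is what matters, a filter on the SEQ field could look like the sketch below. It is only a sketch under a few assumptions: a POSIX shell with awk, SAM text on stdin, and sequences that are actually stored in the file (a SEQ of `*` has length 1 and would be dropped). `filter_by_seq_len`, `in.bam` and `out.bam` are placeholder names, not anything from samtools or from your data.

```shell
# Keep SAM header lines and alignments whose SEQ (10th field) is longer
# than min_len bases. Reads SAM text on stdin, writes SAM text on stdout.
# filter_by_seq_len is a hypothetical helper name, not part of samtools.
filter_by_seq_len() {
    min_len="${1:-30}"                         # default threshold: 30 bases
    awk -F'\t' -v min="$min_len" '/^@/ || length($10) > min'
}

# Intended use with samtools (in.bam and out.bam are placeholder names):
#   samtools view -h in.bam | filter_by_seq_len 30 | samtools view -bS - > out.bam
```

Note that `> min` drops reads of exactly 30 bp as well, matching the `> 30` in the original one-liner; use `>=` in the awk condition if those should be kept.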