tool for a pattern of repeats
7.5 years ago
2nelly

Dear all,

I am wondering whether there is a tool, or whether someone has a custom script, to extract reads that contain a specific repeat pattern from a SAM/BAM or FASTQ file. Let's say we have 1,000,000 reads and we want to see how many of them contain the pattern (CCAATT)n.

Thank you in advance.

7.5 years ago

using samjs: http://lindenb.github.io/jvarkit/SamJavascript.html

java -jar dist/samjs.jar -e 'record.getReadString().contains("CCAATT") || record.getReadString().contains("AATTGG")' input.bam

using awk:

samtools view -h input.bam | awk -F '\t'  '($0 ~ /^@/ || $10 ~ /CCAATT/ || $10 ~ /AATTGG/ )'

from a fastq:

  gunzip -c input.fastq.gz | paste - - - - | awk  -F '\t'  '($2 ~ /CCAATT/ || $2 ~ /AATTGG/ )' | tr "\t" "\n"
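The awk commands above print every read that contains a single CCAATT or AATTGG (the reverse complement of CCAATT), and a single 6-mer can also turn up outside any repeat. For the original question, counting reads that carry a real tandem run, here is a minimal sketch in awk. It assumes you pick a minimum number of consecutive copies (3 in the usage lines is an arbitrary choice), and count_tandem_reads and its arguments are hypothetical names, not a samtools or jvarkit feature.

```shell
# Hypothetical helper, not part of samtools or jvarkit: count the reads on stdin
# whose sequence contains at least MIN tandem copies of UNIT, or of its reverse
# complement. Input is one read per line, tab-separated:
#   SAM records without header (samtools view, no -h): sequence is column 10
#   FASTQ flattened with paste - - - -:                sequence is column 2
# UNIT is assumed to be uppercase A/C/G/T only.
count_tandem_reads() {
    col=$1 unit=$2 min=$3
    awk -F '\t' -v col="$col" -v unit="$unit" -v min="$min" '
    BEGIN {
        comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
        # reverse complement of the unit, e.g. CCAATT -> AATTGG
        for (i = length(unit); i > 0; i--) rc = rc comp[substr(unit, i, 1)]
        # literal tandem strings, e.g. CCAATTCCAATTCCAATT when min = 3
        for (i = 0; i < min; i++) { fwd = fwd unit; rev = rev rc }
    }
    # index() is a plain substring search, so no regex features are needed
    index($col, fwd) || index($col, rev) { n++ }
    END { print n + 0 }'
}

# Usage (assumes samtools is installed; -F 0x900 drops secondary and
# supplementary alignments so each sequenced read is counted once):
#   samtools view -F 0x900 input.bam | count_tandem_reads 10 CCAATT 3
#   gunzip -c input.fastq.gz | paste - - - - | count_tandem_reads 2 CCAATT 3
```

Building the repeated string in BEGIN and searching with index() avoids regex interval expressions such as (CCAATT){3,}, which some older awk implementations do not support.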

Thank you so much, Pierre!

You are always there when we need help.

I will give all of these commands a try.


You've asked many questions on this forum without validating an answer (the green mark on the left) or commenting, e.g. "samtools flagstat only for exome coordinates". Please validate/close the answers.


Dear Pierre,

These commands work nicely.

However, what if we want to find reads in which, let's say, more than 50% of the sequence consists of this repeat (with the copies distributed randomly within the read)?

Many thanks
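For this follow-up (more than 50% of the read made of the repeat, with copies anywhere in the read), one possible approach, sketched here under assumptions, is to count the non-overlapping copies of the unit with awk's gsub(), multiply by the unit length, and compare that with the read length. filter_repeat_fraction is a hypothetical helper name. Partial copies at the read ends are not counted, so the fraction is slightly underestimated for reads that start or end mid-unit.

```shell
# Hypothetical helper: print the reads on stdin in which copies of UNIT (or of
# its reverse complement) cover more than MINFRAC of the read length, wherever
# the copies sit in the read. Input is one read per line, tab-separated:
# column 10 for SAM records without header, column 2 for FASTQ flattened
# with paste - - - -. UNIT is assumed to be uppercase A/C/G/T only.
filter_repeat_fraction() {
    col=$1 unit=$2 minfrac=$3
    awk -F '\t' -v col="$col" -v unit="$unit" -v minfrac="$minfrac" '
    BEGIN {
        comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
        # reverse complement of the unit, e.g. CCAATT -> AATTGG
        for (i = length(unit); i > 0; i--) rc = rc comp[substr(unit, i, 1)]
    }
    {
        seq = $col
        if (length(seq) == 0) next
        # gsub() returns the number of non-overlapping matches it replaced
        s = seq; nfwd = gsub(unit, "", s)
        s = seq; nrev = gsub(rc, "", s)
        # use one orientation only: CCAATTGG holds both CCAATT and AATTGG,
        # so summing the two counts could cover the same bases twice
        n = (nfwd > nrev) ? nfwd : nrev
        if (n * length(unit) / length(seq) > minfrac) print
    }'
}

# Usage: keep FASTQ reads that are more than 50% CCAATT/AATTGG copies
#   gunzip -c input.fastq.gz | paste - - - - | filter_repeat_fraction 2 CCAATT 0.5 | tr "\t" "\n"
```

The 0.5 threshold and taking the larger of the two orientation counts are assumptions; a stricter definition (for example, requiring the copies to be adjacent) would need a different check.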
