search multiple sequences in multiple fastqs
0
0
Entering edit mode
5.7 years ago
max_19 ▴ 170

Hi all!

I have a script which goes through a bunch of fastq files and searches for specific sequences found in a text file called test_text.txt, then outputs a fastq file with only those reads. The reads in my fastq files are all between 15-30bp long, however when I look at the output fastq files, I am getting reads of 52 bp long.

Does anyone know why this would happen? code is below:

for fastq in `ls my_files`;do
    for i in `cat test_text.txt`; do zgrep -B 1 -A 2 $i $fastq >>filtered_fastq/${fastq}.filtered_reads.fastq; done;
done

Thanks for any input.

sequencing fastq • 1.3k views
ADD COMMENT
0
Entering edit mode

What have you done so far to convince yourself/check the reads are all 15-30bp?

I can't see an obvious mistake at first glance, so are you sure there are no 52 bp reads in the input file?

ADD REPLY
0
Entering edit mode

I used cutadapt to set min/max length for all sequences (cutadapt -a $adapter3 -f fastq -m 15 -M 30). also when I do fastqc on the file, my sequence length in the basic stats section shows '15-30'

ADD REPLY
0
Entering edit mode

Make sure; do awk '{print length}' <filename> | sort -n | tail . The last number should be the longest sequence length in your file. Remove the headers before if you think they're longer than 30bp (i.e. grep -v ">" <filename> | awk '{print length}' | sort -n | tail

ADD REPLY
0
Entering edit mode

Thank you! Tried this and got 54. This is odd! does this mean something is wrong with my cutadapt step? and why does FASTQC report 15-30bp for sequence length

ADD REPLY
0
Entering edit mode

cutadapt should get rid of reads > 30 in length in that case. Are you sure that you have the trimmed file in my_files, and not the original file you had before running cutadapt?

ADD REPLY
0
Entering edit mode

By the way, the -f switch in grep allows you to provide a file of patterns to search; e.g. grep -f <file_with_patterns> <file_to_search_in>, so possibly your could get rid of your inner for loop.

ADD REPLY
0
Entering edit mode

Good to know, thank you!

ADD REPLY

Login before adding your answer.

Traffic: 2460 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6