9.2 years ago
EVR
Hi,
I would like to know which index was used for a sample. Specifically, I would like to find the Illumina index read recorded in the raw FASTQ file. Is there any way to get that information from the raw FASTQ file?
Kindly guide me. Thanks in advance.
Hi Daniel,
For example, if I know the index read of a sample, say "XXXPSDE", can I use
grep -c "XXXPSDE" Sample_raw.fastq
to obtain the total number of reads containing this index?
Sorry, I don't follow. What is `XXXPSDE`?
Sorry for the confusion. As an example, say I have Illumina TruSeq adapter index 7, CAGATC. To find how many raw reads carry this index, can I use `grep -c "CAGATC" Sample_raw.fastq` to estimate the count?
Yes, the last field of the identifier is the index.
If you don't know the index sequence and/or the FASTQ contains multiple indices, you can use the following to get counts:
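A minimal sketch of one way to do this (not necessarily the command originally posted). It assumes standard 4-line FASTQ records and Casava 1.8+ style headers where the index is the last `:`-separated field; `count_indices` is a hypothetical helper name, and the file names in the usage comments are placeholders. Selecting every fourth line rather than matching an instrument prefix such as `@HWI` avoids miscounting quality lines that happen to start with `@`.

```shell
# Hypothetical helper: tally the index (last ':'-separated field) of each header.
# Assumes 4-line FASTQ records, so headers are lines 1, 5, 9, ...
# Reads FASTQ text on stdin; prints "count<TAB>index", most frequent first.
count_indices() {
    awk 'NR % 4 == 1 { n = split($0, f, ":"); count[f[n]]++ }
         END { for (idx in count) print count[idx] "\t" idx }' |
        sort -k1,1nr -k2,2
}

# Usage (file names are placeholders):
#   count_indices < Sample_raw.fastq               # decompressed file
#   gzip -cd Sample_raw.fastq.gz | count_indices   # gzip-compressed file
```

Because the helper reads from standard input, the same function works for compressed and decompressed files; only the command feeding it changes.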
If my fastq file is already decompressed then can I use
to obtain the indices?
The example that you posted begins with @H534, not @HWI. Otherwise, the command should work.
Your grep will also count every line where CAGATC occurs inside the read sequence itself (a given 6-mer turns up by chance roughly once per 4^6 = 4096 nucleotides of sequence). You want to parse only the read identifiers, which typically begin with '@HWI'.
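To illustrate the difference, here is a small sketch that counts only reads whose identifier ends with a given index. It assumes 4-line FASTQ records and that the index is the last `:`-separated field of the header; `count_reads_with_index` is a hypothetical helper name and `Sample_raw.fastq` is the file name from the question.

```shell
# Hypothetical helper: count reads whose header ends with the given index.
# Assumes 4-line FASTQ records, so headers are lines 1, 5, 9, ...
# Usage: count_reads_with_index INDEX < file.fastq
count_reads_with_index() {
    awk -v idx="$1" '
        NR % 4 == 1 { n = split($0, f, ":"); if (f[n] == idx) c++ }
        END { print c + 0 }   # c + 0 prints 0 when nothing matched
    '
}

# Compare (both read the same placeholder file):
#   grep -c "CAGATC" Sample_raw.fastq                # any line containing CAGATC
#   count_reads_with_index CAGATC < Sample_raw.fastq # headers with that index only
```

The plain `grep -c` figure is at least as large as the header-only count, since it also includes sequence lines that contain the 6-mer.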
Hi Harold,
Thanks for your tip. In that case, if a read identifier looks like "@H534:291:C6YYCACXX:7:1101:1748:1945 1:N:0:CGATGT", then I have to extract CGATGT, which represents the adapter index used for this sample. Am I right?