Finding the Illumina index read from a raw FASTQ file
2
0
9.1 years ago
EVR ▴ 610

Hi,

I would like to know which index was used for a sample. Specifically, I would like to find the Illumina index read that is present in the raw FASTQ file. Is there any way to get that information from the raw FASTQ file?

Kindly guide me. Thanks in advance.

RNA-Seq illumina index-read • 10k views
ADD COMMENT
4
9.1 years ago
Danielk ▴ 640

It depends on how the raw FASTQ file was generated from the underlying BCL files. The index is sometimes available in the name of the read, and sometimes it's supplied as a separate FASTQ file containing the index read.

ADD COMMENT
0

Hi Daniel,

For example, if I know the index read of a sample, say "XXXPSDE", can I use `grep -c "XXXPSDE" Sample_raw.fastq` to obtain the total number of reads containing this index?

ADD REPLY
0

Sorry, I don't follow. What is `XXXPSDE`?

ADD REPLY
0

Sorry for the confusion. Say, for example, I have Illumina TruSeq adapter index 7, CAGATC. To find how many raw reads have this index, can I use `grep -c "CAGATC" Sample_raw.fastq` to estimate the counts?

ADD REPLY
2

Yes, the last field of the identifier is the index.

If you don't know the index sequence and/or the FASTQ contains multiple indices, you can use the following to get counts:

zcat NAME_OF_FASTQ | grep '^@H534' | cut -d : -f 10 | sort | uniq -c | sort -nr > indices.txt
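A slightly more defensive variant of the same idea, offered only as a sketch: it selects header lines by their position in the 4-line FASTQ record instead of by an instrument-specific prefix, so a quality line that happens to start with `@` is not counted. The function name `count_indices` is hypothetical, and the sketch assumes 4-line records with the index as the last colon-separated field of the header.

```shell
# Sketch: count index sequences from FASTQ header lines only.
# Assumptions: strict 4-line records; index is the last ':'-separated header field.
# count_indices is a hypothetical helper name, not part of any tool.
count_indices() {
    # Reads the files given as arguments, or stdin when none are given.
    # NR % 4 == 1 keeps the first line of every 4-line record (the header).
    awk -F: 'NR % 4 == 1 { print $NF }' "$@" | sort | uniq -c | sort -nr
}
```

For an uncompressed file this could be called as `count_indices Sample_raw.fastq > indices.txt`, and for a gzipped one as `zcat NAME_OF_FASTQ | count_indices > indices.txt`.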
ADD REPLY
1

If my fastq file is already decompressed then can I use

grep '^@HWI' Sample_raw.fastq | cut -d : -f 10 | sort | uniq -c | sort -nr > indices.txt

to obtain the indices?

ADD REPLY
2

The example that you posted begins with @H534, not @HWI. Otherwise, the command should work.

ADD REPLY
1

Your grep will also include every instance of CAGATC that's present in the read sequences themselves (~1 per 4000 nucleotides of sequence). You want to parse only the read identifiers, which typically begin with '@HWI'.
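To illustrate the difference, here is a minimal sketch that counts only reads whose identifier ends in a given index and ignores sequence and quality lines. The function name `count_reads_with_index` is hypothetical; it assumes an uncompressed FASTQ with 4-line records and the index as the last colon-separated field of the header.

```shell
# Sketch: count reads whose header carries a given index.
# Assumptions: strict 4-line records; index is the last ':'-separated header field.
# count_reads_with_index is a hypothetical helper name.
count_reads_with_index() {
    # $1: index sequence (e.g. CAGATC)
    # $2: path to an uncompressed FASTQ file
    awk -v idx="$1" -F: '
        NR % 4 == 1 && $NF == idx { n++ }   # header lines only, exact match on last field
        END { print n + 0 }                 # print 0 when nothing matched
    ' "$2"
}
```

Comparing `count_reads_with_index CAGATC Sample_raw.fastq` with `grep -c "CAGATC" Sample_raw.fastq` shows how many of the plain grep matches come from lines other than the identifier.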

ADD REPLY
0

Hi Harold,

Thanks for your tip. In that case, if a read identifier looks like "@H534:291:C6YYCACXX:7:1101:1748:1945 1:N:0:CGATGT", then I have to extract CGATGT, which represents the adapter index used for this sample. Am I right?
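As a quick check of that reading, the last colon-separated field of the identifier quoted above can be pulled out with plain shell parameter expansion. This is only a sketch; the variable names `header` and `index` are illustrative.

```shell
# Sketch: take everything after the last ':' in the identifier quoted above.
header='@H534:291:C6YYCACXX:7:1101:1748:1945 1:N:0:CGATGT'
index="${header##*:}"   # strip the longest prefix ending in ':'
echo "$index"           # prints CGATGT
```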

ADD REPLY
3
9.1 years ago

Did you try the ExtractIlluminaBarcodes tool in Picard?

ADD COMMENT
