Interpreting the FASTQ file format (paired end)
5.6 years ago
9521ljh ▴ 50

I downloaded SRR5376997.sra from NCBI

and converted it to FASTQ using the SRA Toolkit:

fastq-dump SRR5376997.sra

output: SRR5376997.fastq

head SRR5376997.fastq

@SRR5376997.1 HWI-ST1215:350:C42MLACXX:8:1101:1631:2122 length=202
TGTTAAATGATCTAATGAAAAAGATAAACAATTACATAAACAGATGGGGAATTTCAGAGAACTAAAAACTATTTTAAAAGAGTCAAAGAGAAATGCTAAAACCTTTTTTCTTCTTGTGTTTCAGTTTAGATCGTCTCTATTGGAACATCTTTAAGTTCACTGATTCTTTCCTTAGCTGTGTCCAATCTGCTGATGAGCCTGT
+SRR5376997.1 HWI-ST1215:350:C42MLACXX:8:1101:1631:2122 length=202
@@BFFFFFGGHHH@GGHGIJJJJHGHIJJJJAFGEFCIEGGHIJEHIIJIBGHIHGGGIIIIJJJJJJJGGGIAGAHHHGFF@CDDAED@6>;@ACDDDDDCCCFFFFFDHFGHIG<ECGIGGIFIIEHIGIJHIJIJDFIEHGHG@DGHEF@FHF>FEH@FFHHII@FGGG@EGIGEEHHGHFFFFCEECEEEEAAACCB>
@SRR5376997.2 HWI-ST1215:350:C42MLACXX:8:1101:1971:2133 length=202
GCTTTCTGGCATTTGCCGATGCCCCCTTCTCTCCCTCTGGGGGGTTTGTGTGTATAAGTGCGTGGGCAGGGAGGCCGCGGAGGCGGCGCCGCGGGCGCTCTGCTGAATTAAACGGTGTCCTGGTCTTCAAGGAGCTTACAAGTGAGAGCGCCCGCGGCGCCGCCTCCGCGGCCTCCCTGCCCACGCACTTATACACACAAAC
+SRR5376997.2 HWI-ST1215:350:C42MLACXX:8:1101:1971:2133 length=202
CCCFFFFFHHDHHJJJJJIJJIJJJJJJJIJHIJIGHHIGIJID5:BDACABADDEEDCCDDBBDDDDDDDBDDB?@DDDDDDD>B9<BDBD@@DDBB>B<BCCFDFFFHGHHHJAFFHHHIJFHDHIJIJJHJJJJJJJJJGHIIGGGHJDHGIIGFDBDDDDDDDDDDDBDBDDDCDDDD4<@B<@BCCDDDEDCB?8?B
@SRR5376997.3 HWI-ST1215:350:C42MLACXX:8:1101:1963:2152 length=202
CCTTAAGAGGTATGAATGATTGTGATTTGGTGCTTTGGACAATGCCATGTAGAGTGCTTCTTTGGGGGTGAGGGATAGACAGACCCTAGGGGCTCTGAGCTGGAGGCACCTGAAGCTGCCTCTTCCTCCAAGTCAAGAGAATCCTTTCTCCCCATCTTCACAGTCTCACAGCTCACTTCATCTTCCCCGTCCTCACTTTCTT


Q1. How can I tell whether this FASTQ file is paired-end or not?

Q2. What do @SRR5376997.1, @SRR5376997.2 and @SRR5376997.3 mean? What do the .1, .2 and .3 suffixes stand for?


I then split the paired-end FASTQ using the SRA Toolkit (fastq-dump):

fastq-dump -I --split-files SRR5376997.sra

output: SRR5376997_1.fastq SRR5376997_2.fastq

head SRR5376997_1.fastq

@SRR5376997.1.1 HWI-ST1215:350:C42MLACXX:8:1101:1631:2122 length=101
TGTTAAATGATCTAATGAAAAAGATAAACAATTACATAAACAGATGGGGAATTTCAGAGAACTAAAAACTATTTTAAAAGAGTCAAAGAGAAATGCTAAAA
+SRR5376997.1.1 HWI-ST1215:350:C42MLACXX:8:1101:1631:2122 length=101
@@BFFFFFGGHHH@GGHGIJJJJHGHIJJJJAFGEFCIEGGHIJEHIIJIBGHIHGGGIIIIJJJJJJJGGGIAGAHHHGFF@CDDAED@6>;@ACDDDDD
@SRR5376997.2.1 HWI-ST1215:350:C42MLACXX:8:1101:1971:2133 length=101
GCTTTCTGGCATTTGCCGATGCCCCCTTCTCTCCCTCTGGGGGGTTTGTGTGTATAAGTGCGTGGGCAGGGAGGCCGCGGAGGCGGCGCCGCGGGCGCTCT
+SRR5376997.2.1 HWI-ST1215:350:C42MLACXX:8:1101:1971:2133 length=101
CCCFFFFFHHDHHJJJJJIJJIJJJJJJJIJHIJIGHHIGIJID5:BDACABADDEEDCCDDBBDDDDDDDBDDB?@DDDDDDD>B9<BDBD@@DDBB>B<
@SRR5376997.3.1 HWI-ST1215:350:C42MLACXX:8:1101:1963:2152 length=101
CCTTAAGAGGTATGAATGATTGTGATTTGGTGCTTTGGACAATGCCATGTAGAGTGCTTCTTTGGGGGTGAGGGATAGACAGACCCTAGGGGCTCTGAGCT

head SRR5376997_2.fastq

@SRR5376997.1.2 HWI-ST1215:350:C42MLACXX:8:1101:1631:2122 length=101
CCTTTTTTCTTCTTGTGTTTCAGTTTAGATCGTCTCTATTGGAACATCTTTAAGTTCACTGATTCTTTCCTTAGCTGTGTCCAATCTGCTGATGAGCCTGT
+SRR5376997.1.2 HWI-ST1215:350:C42MLACXX:8:1101:1631:2122 length=101
CCCFFFFFDHFGHIG<ECGIGGIFIIEHIGIJHIJIJDFIEHGHG@DGHEF@FHF>FEH@FFHHII@FGGG@EGIGEEHHGHFFFFCEECEEEEAAACCB>
@SRR5376997.2.2 HWI-ST1215:350:C42MLACXX:8:1101:1971:2133 length=101
GCTGAATTAAACGGTGTCCTGGTCTTCAAGGAGCTTACAAGTGAGAGCGCCCGCGGCGCCGCCTCCGCGGCCTCCCTGCCCACGCACTTATACACACAAAC
+SRR5376997.2.2 HWI-ST1215:350:C42MLACXX:8:1101:1971:2133 length=101
BCCFDFFFHGHHHJAFFHHHIJFHDHIJIJJHJJJJJJJJJGHIIGGGHJDHGIIGFDBDDDDDDDDDDDBDBDDDCDDDD4<@B<@BCCDDDEDCB?8?B
@SRR5376997.3.2 HWI-ST1215:350:C42MLACXX:8:1101:1963:2152 length=101
GGAGGCACCTGAAGCTGCCTCTTCCTCCAAGTCAAGAGAATCCTTTCTCCCCATCTTCACAGTCTCACAGCTCACTTCATCTTCCCCGTCCTCACTTTCTT

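As you can see above, each 202-base read in SRR5376997.fastq is split into two 101-base reads: the first half goes to SRR5376997_1.fastq and the second half to SRR5376997_2.fastq (shown as non-bold and bold in the original post). For the first read:

TGTTAAATGATCTAATGAAAAAGATAAACAATTACATAAACAGATGGGGAATTTCAGAGAACTAAAAACTATTTTAAAAGAGTCAAAGAGAAATGCTAAAA
CCTTTTTTCTTCTTGTGTTTCAGTTTAGATCGTCTCTATTGGAACATCTTTAAGTTCACTGATTCTTTCCTTAGCTGTGTCCAATCTGCTGATGAGCCTGT

Q3. Does this mean that each read in SRR5376997_1.fastq is paired with the matching read in SRR5376997_2.fastq?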


sequence sequencing alignment • 1.5k views
5.6 years ago
GenoMax 147k

When in doubt, check EBI-ENA, where you can also download the FASTQ files directly. SRR5376997 is indeed a paired-end dataset.

What is the meaning @SRR5376997.1 @SRR5376997.2 @SRR5376997.3???? what is meaning .1 .2 .3 ???

*.1, *.2, *.3 are just sequential read numbers. If you run fastq-dump with the -F option, you should recover the original Illumina-format headers without the SRR* accession numbers.
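To make the header layout concrete, here is a minimal Python sketch that pulls apart the deflines shown in the question. The helper name `parse_defline` and the regular expression are illustrative assumptions based only on the headers in this thread, not part of the SRA Toolkit. The optional third number is the mate index that appears in the `-I --split-files` output above.

```python
import re

# Pattern for deflines like the ones in the question, e.g.
# "@SRR5376997.1 HWI-ST1215:350:C42MLACXX:8:1101:1631:2122 length=202"
# (an assumption drawn from this thread; other SRA runs may look different).
DEFLINE = re.compile(
    r"^@(?P<run>[A-Z]RR\d+)\.(?P<spot>\d+)(?:\.(?P<mate>\d+))?"
    r"\s+(?P<name>\S+)\s+length=(?P<length>\d+)$"
)

def parse_defline(line):
    """Split an SRA-style FASTQ defline into its parts (hypothetical helper)."""
    m = DEFLINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised defline: {line!r}")
    fields = m.groupdict()
    fields["spot"] = int(fields["spot"])      # sequential read number: .1, .2, .3 ...
    fields["length"] = int(fields["length"])  # number of bases in the record
    fields["mate"] = int(fields["mate"]) if fields["mate"] else None
    return fields

info = parse_defline(
    "@SRR5376997.1 HWI-ST1215:350:C42MLACXX:8:1101:1631:2122 length=202"
)
print(info["run"], info["spot"], info["name"], info["length"])
```

The `name` field is the original Illumina read name, which is what `-F` should leave you with on its own.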

5.6 years ago

You can find the paired-end layout in the experiment record: https://www.ncbi.nlm.nih.gov/sra/SRX2672295

Layout: PAIRED

In the Illumina documentation, R1 is the forward read and R2 is the reverse read (this depends on your sequencing kit): https://emea.illumina.com/content/dam/illumina-marketing/documents/products/illumina_sequencing_introduction.pdf

If you do not pass --split-files to fastq-dump, the forward and reverse reads of each pair are concatenated into a single record in SRR5376997.fastq, which is why those records show length=202.
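The effect of --split-files on a single record can be sketched in Python as below. This is only an illustration of the idea, not how fastq-dump is implemented. `split_spot` is a hypothetical helper, and the first-mate length of 101 is an assumption taken from the length=101 headers in the question. The quality string is a placeholder, not the real one.

```python
def split_spot(name, seq, qual, read_len):
    """Cut one concatenated record into (mate1, mate2) tuples of (name, seq, qual).

    read_len is the length of the first mate; the caller has to know it
    (101 here, because length=202 and both mates are the same length).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    mate1 = (f"{name}.1", seq[:read_len], qual[:read_len])
    mate2 = (f"{name}.2", seq[read_len:], qual[read_len:])
    return mate1, mate2

# First read from the question, written as its two halves for readability.
seq = (
    "TGTTAAATGATCTAATGAAAAAGATAAACAATTACATAAACAGATGGGGAATTTCAGAGAACTAAAAACTATTTTAAAAGAGTCAAAGAGAAATGCTAAAA"
    "CCTTTTTTCTTCTTGTGTTTCAGTTTAGATCGTCTCTATTGGAACATCTTTAAGTTCACTGATTCTTTCCTTAGCTGTGTCCAATCTGCTGATGAGCCTGT"
)
qual = "I" * len(seq)  # placeholder qualities, NOT the real values

m1, m2 = split_spot("SRR5376997.1", seq, qual, 101)
print(m1[0], len(m1[1]))
print(m2[0], len(m2[1]))
```

The two names produced here match the `.1.1` and `.1.2` suffixes seen in SRR5376997_1.fastq and SRR5376997_2.fastq.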

Reads in the R2 file are written 5' to 3' as they were sequenced, so relative to the R1 strand they are reverse-complemented. In your FASTQ files you therefore have the following:

#R1 :
ATGC
#R2:
GTCA
#Result
ATGC----->
           <-----TGAC
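The same toy example can be checked with a few lines of Python. This is a minimal sketch: `reverse_complement` is an illustrative helper, and it only handles A, C, G, T and N (other IUPAC ambiguity codes are left unchanged).

```python
# Translation table for the four bases plus N, in upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

r2 = "GTCA"
print(reverse_complement(r2))  # TGAC, as drawn under R1 in the diagram above
```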

In any case, I usually go to ENA to download these SRR files.

You can follow this tutorial to download your files: Fast download of FASTQ files from the European Nucleotide Archive (ENA)
