Hello, I have two FASTQ files, 3D_1.fastq and 3d_2.fastq. To the best of my knowledge, the first file contains the forward reads and the second contains the reverse reads. I can confirm that the reads are paired-end, 101 base pairs long, and use Illumina/Sanger 1.9+ quality encoding. The data are the nucleotide sequences from a single sample, sequenced on a highseq machine. I mapped the reads against a reference genome with Bowtie2 and then used the sorted BAM file as the input to Picard in order to validate it. Picard reports an error indicating that read group information is missing from the header of my files. These are the first few lines of my first FASTQ file:
@SN996:194:H5V7HBCXY:1:1108:1872:2028 1:N:0:TCTCGCGC
NTATTTCATAGCATACTTTTCCGGGCTCGCCGGGCCTAAGAAAGTTGCAAAAATTTTTCAATCGAAATACAAATGAAATTAAAACCTACGCGCGTGTGTGG
+
DHHIIIIIIIIIHHFHIHHIHGIIICHHGHIIHIHHHEHIDGHHFEHIHGHHIIHIIHGIIIIIHIHIIECHIIGFFHHIHIHCFHIIG<<E0CFHH
@SN996:194:H5V7HBCXY:1:1108:1995:2062 1:N:0:TCTCGCGC
CATCGATATGTATTTCTATTAACAAATTGCAAACATTACGATTAAATGAAAGAGTTGTGGCGTCCCTCGTTCTTGACCCGCGGACTGACTCACAGTCCCGA
These are the first few lines of my second FASTQ file:
@SN996:194:H5V7HBCXY:1:1108:1872:2028 2:N:0:TCTCGCGC
GCCGGCGGCAGTTTGTGCATTGCTTTTGAAGTGGCAACAATTTCGCCACGATTCTCTTGGTCTTTCTTCGGTTGCTGTTGCTGGAGGAGCCTCCATTATTC
+
DDCDCIICC<ECDHHHEHIHGHEFGGHIHEHHIIIIH?GH1CHH?EGHHHCE<1D@1<<@<FEEFCF1GHHIFHC1<F<<@<E111<EEEHHIIIG1CCD1
@SN996:194:H5V7HBCXY:1:1108:1995:2062 2:N:0:TCTCGCGC
CTGACCGCAGTGAATCGGAAGGTGGCCTACGAGTACCAGTCGAATACGAAGAACGAGGCCCTCAACCAGATGAAGGAAATGCCCAACTTTATGTCGACACT
I know that the FASTQ files were generated from a single sample, so it would make sense for them not to contain read group identification, because all reads belong to that one sample. I would assume that sequencing a single sample is fairly common, and that if this information were strictly required in the header, the sequencing company would have formatted the data so that it did not block downstream analyses. Why would I be getting this error in Picard? Does anyone have a suggestion for how to move past this issue?
Is the space before the " @SN996" a copy-and-paste artifact from writing this post? If not, that is your problem.
Yes, this was just an error that I made in my post.
Illumina highseq for all your stoner sequencing!
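On the actual question: read groups are not something a FASTQ file carries. They live in the `@RG` line of the SAM/BAM header and in the `RG` tag on each alignment, so they have to be supplied at or after the mapping step. Bowtie2 only writes them when it is given `--rg-id` (plus optional `--rg` fields), and for a BAM that already exists, Picard's AddOrReplaceReadGroups can add them afterwards. Below is a minimal Python sketch, not a definitive recipe, that derives plausible read group values from the first header line posted above and prints them as Bowtie2 arguments. The sample name `3D` and library name `3D_lib1` are placeholders I made up, and the `flowcell.lane` naming for ID and PU is a common convention that I am assuming here, not a requirement.

```python
import shlex


def parse_illumina_header(header):
    """Split a CASAVA 1.8+ style FASTQ header line into named fields."""
    name, _, comment = header.strip().lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = comment.split(":")
    return {
        "instrument": instrument,
        "run": run,
        "flowcell": flowcell,
        "lane": lane,
        "read": read,
        "index": index,
    }


def read_group_fields(header, sample, library):
    """Build read group values from one header line.

    ID/PU naming (flowcell.lane[.index]) is an assumed convention;
    sample and library must come from the user.
    """
    f = parse_illumina_header(header)
    unit = f"{f['flowcell']}.{f['lane']}"
    return {
        "ID": unit,
        "SM": sample,
        "LB": library,
        "PL": "ILLUMINA",
        "PU": f"{unit}.{f['index']}",
    }


def bowtie2_rg_args(rg):
    """Turn read group values into Bowtie2 --rg-id / --rg arguments."""
    args = ["--rg-id", rg["ID"]]
    for key in ("SM", "LB", "PL", "PU"):
        args += ["--rg", f"{key}:{rg[key]}"]
    return args


# First header line from the question; sample/library are hypothetical.
header = "@SN996:194:H5V7HBCXY:1:1108:1872:2028 1:N:0:TCTCGCGC"
rg = read_group_fields(header, sample="3D", library="3D_lib1")
print(" ".join(shlex.quote(a) for a in bowtie2_rg_args(rg)))
```

The printed arguments would be added to the existing Bowtie2 command before re-running the mapping. The same five values correspond to the RGID, RGSM, RGLB, RGPL and RGPU options of AddOrReplaceReadGroups if you would rather fix the sorted BAM you already have.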