Hi,
I found that some RNA-seq data in ENCODE were sequenced as paired-end but combined into a single file, like this:
@ERR030882.73513047 HWI-BRUNOP16X_0001:3:67:4997:91733#0 length=100
GTTAGGGAGGTTATGGAGGTTAGGGAGGTTATGGAGGTTATGGAGGTTAGCCTCGGTCTCCACCATAGCCTCCACCTCGGTCTCCTCCATAGCCTCCTCG
+ERR030882.73513047 HWI-BRUNOP16X_0001:3:67:4997:91733#0 length=100
HHHHHHHHHHHHHHHHHHHDFBFFFEHHHHDGGG?HHDHAGFG7GC9C8:HHHHHHHHHHHHHHHIHHHHHHHHHHHHHGHHHHHHHHFHHHHHHHHHHD
They were sequenced as 2x50 bp, but both ends were combined into one file. How should I handle this type of data? Should I treat the reads as single-end for mapping?
Thanks a lot!
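If the 100-bp records really are two 50-bp mates joined end to end, one option is to cut each record back into two mate files and map them as paired-end. Below is a minimal Python sketch that assumes read 1 is the first 50 bases and read 2 the last 50. This matches how SRA's `fastq-dump` writes paired spots when run without `--split-files`, but it is an assumption to verify against your own data. The function names, the `/1` and `/2` read-name suffixes, and the default `mate_len=50` are illustrative choices, not part of any existing tool.

```python
import io
from typing import Iterator, TextIO, Tuple

# (header without the leading '@', sequence, quality string)
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line; any repeated header on it is dropped
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def split_spot(record: Record, mate_len: int = 50) -> Tuple[Record, Record]:
    """Cut one concatenated spot into (mate 1, mate 2).

    ASSUMPTION: the first mate_len bases are read 1 and the rest are read 2.
    """
    header, seq, qual = record
    if len(seq) != 2 * mate_len:
        raise ValueError(f"expected {2 * mate_len} bases, got {len(seq)} in {header!r}")
    name = header.split()[0]  # e.g. 'ERR030882.73513047'
    return ((f"{name}/1", seq[:mate_len], qual[:mate_len]),
            (f"{name}/2", seq[mate_len:], qual[mate_len:]))


def write_fastq(handle: TextIO, record: Record) -> None:
    header, seq, qual = record
    handle.write(f"@{header}\n{seq}\n+\n{qual}\n")


def split_stream(src: TextIO, out1: TextIO, out2: TextIO, mate_len: int = 50) -> int:
    """Write mate 1 to out1 and mate 2 to out2; return the number of spots split."""
    count = 0
    for record in read_fastq(src):
        mate1, mate2 = split_spot(record, mate_len)
        write_fastq(out1, mate1)
        write_fastq(out2, mate2)
        count += 1
    return count


def split_text(fastq: str, mate_len: int = 50) -> Tuple[str, str]:
    """Convenience wrapper for in-memory FASTQ text; returns (R1 text, R2 text)."""
    out1, out2 = io.StringIO(), io.StringIO()
    split_stream(io.StringIO(fastq), out1, out2, mate_len)
    return out1.getvalue(), out2.getvalue()
```

For a gzipped file, something like `with gzip.open("input.fastq.gz", "rt") as src, open("reads_1.fastq", "w") as r1, open("reads_2.fastq", "w") as r2: split_stream(src, r1, r2)` would produce the two mate files (the file names are hypothetical). The length check is deliberate: if any record is not exactly 100 bp, the "two concatenated 50-bp mates" assumption does not hold and the script stops rather than silently mis-pairing reads.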
You are only showing a single record above.
That is just one example record. The description is: "50bpPEmRNASeqFCAs51sequence, 2x50 paired end mRNA-seq READ1, ~0.5% phiX DNA spiked in, Performer: ILLUMINA-CA".
I'd double-check your metadata. It would be very hard to create a single fastq file from two paired-end files. If these are publicly available data, can you share the links to the metadata and the files that you believe are incorrect?
It's GSM759515 in GEO. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM759515
Thanks
One can obtain fastq files directly from EBI ENA. The accession number for the sequencing data is ERX011212. Searching for it on the EBI ENA website leads to http://www.ebi.ac.uk/ena/data/view/ERX011212, where you can download the fastq files directly. Note that SRA and ENA mirror one another, so fastq files for SRA data can almost always be obtained from EBI.
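The same file list can also be fetched programmatically. A minimal sketch, assuming ENA's Portal API `filereport` endpoint with `result=read_run` and the `run_accession,fastq_ftp` fields behaves as documented at the time of writing (check the current ENA documentation before relying on it). For paired runs, ENA typically lists separate `_1` and `_2` fastq files, which would sidestep the concatenated-record problem entirely. The helper names are hypothetical.

```python
import csv
import io
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# ENA Portal API file-report endpoint (assumed; verify against the ENA docs).
FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a query listing the fastq files of every run under an accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
    })
    return f"{FILEREPORT}?{query}"


def fastq_urls(report_tsv: str) -> List[str]:
    """Extract download URLs from the tab-separated report text.

    fastq_ftp holds ';'-separated paths without a URL scheme, so ftp:// is prepended.
    """
    urls = []
    for row in csv.DictReader(io.StringIO(report_tsv), delimiter="\t"):
        for path in filter(None, (row.get("fastq_ftp") or "").split(";")):
            urls.append(path if "://" in path else f"ftp://{path}")
    return urls


def fetch_fastq_urls(accession: str) -> List[str]:
    """Network call: download the report and return the fastq URLs."""
    with urlopen(filereport_url(accession)) as resp:
        return fastq_urls(resp.read().decode("utf-8"))
```

Calling `fetch_fastq_urls("ERX011212")` should print-ready the per-run fastq links; splitting query building from parsing keeps the parsing testable without network access.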
Thanks a lot!