Paired-end data loaded into GEO as single-end runs: how to extract the data?
9.9 years ago
ChIP ▴ 600

Hi,

The SRA files for this GEO sample (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM835232) were submitted in a slightly unusual way.

The mate pairs were loaded into SRA as single-end runs, resulting in two files per sample.

My problem is: how can I get proper FASTQ files from these two SRA files?

I tried

fastq-dump -A SRR364680.sra
fastq-dump -A SRR384964.sra

and then ran bowtie on the output, but it doesn't work. Has anybody dealt with data like this? If so, how can I get unaligned FASTQ files that can be used for alignment?

Here is the head of each of the two FASTQ files:

File 1

@SRR364680.sra.1 SFGF-GA2-1_63:2:112:1559:999 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR364680.sra.1 SFGF-GA2-1_63:2:112:1559:999 length=80
################################################################################
@SRR364680.sra.2 SFGF-GA2-1_63:2:112:9048:999 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR364680.sra.2 SFGF-GA2-1_63:2:112:9048:999 length=80
################################################################################
@SRR364680.sra.3 SFGF-GA2-1_63:2:112:10809:999 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

File 2

@SRR384964.sra.1 SFGF-GA2-1_63:2:14:1899:1000 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR384964.sra.1 SFGF-GA2-1_63:2:14:1899:1000 length=80
################################################################################
@SRR384964.sra.2 SFGF-GA2-1_63:2:14:11711:999 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR384964.sra.2 SFGF-GA2-1_63:2:14:11711:999 length=80
################################################################################
@SRR384964.sra.3 SFGF-GA2-1_63:2:14:13989:1000 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Thank you


Can you post the first 5 read names from the two FASTQ files? What's the error from bowtie?
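A minimal Python sketch for pulling those names out of a FASTQ file, assuming plain, uncompressed four-line records (no line-wrapped sequences). The helper names `fastq_records` and `first_read_names` are hypothetical, not part of any toolkit.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records.

    Accepts any iterable of lines, e.g. an open file handle.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines such as a trailing newline
        # Read the sequence, '+' separator and quality lines of this record.
        rest = [next(it, None) for _ in range(3)]
        if None in rest or not header.startswith("@") or not rest[1].startswith("+"):
            raise ValueError(f"truncated or malformed FASTQ record at {header!r}")
        seq, _plus, qual = (line.rstrip("\n") for line in rest)
        yield header[1:], seq, qual


def first_read_names(lines: Iterable[str], n: int = 5) -> List[str]:
    """Return the names (without the leading '@') of the first n reads."""
    return [name for name, _seq, _qual in islice(fastq_records(lines), n)]
```

To use it, open the FASTQ file written by fastq-dump and pass the handle to `first_read_names`; the output file name depends on how fastq-dump was invoked.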


Please check the updated question.


I am wondering about this: if it's paired-end data, the read names should carry read 1/read 2 information (like #1, #2 or /1, /2) to distinguish the mates, but I don't see that here. Also, the mates (R1 and R2) must be in the same order in both files for alignment.
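One way to test this on the files themselves, sketched in Python. With the older Illumina naming, the second field of each fastq-dump defline (e.g. `SFGF-GA2-1_63:2:112:1559:999`) ends in lane:tile:x:y, which identifies the cluster, so both mates of one cluster should share that key. The sketch assumes this layout holds for both runs; `cluster_key` and `pairing_report` are hypothetical helpers, and all names are kept in memory.

```python
from typing import Dict, Iterable, Tuple


def cluster_key(name: str) -> Tuple[str, ...]:
    """Recover the cluster identity from a fastq-dump defline such as
    'SRR364680.sra.1 SFGF-GA2-1_63:2:112:1559:999 length=80'.

    Assumes the second whitespace-separated field is the original Illumina
    name in the 'prefix:lane:tile:x:y' layout, optionally followed by
    '#index' and/or a '/1' or '/2' mate suffix (both are stripped).
    """
    fields = name.split()
    if len(fields) < 2:
        raise ValueError(f"no original read name in {name!r}")
    original = fields[1].split("#")[0].split("/")[0]
    parts = original.rsplit(":", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected read-name layout: {fields[1]!r}")
    return tuple(parts)


def pairing_report(names_1: Iterable[str], names_2: Iterable[str]) -> Dict[str, object]:
    """Compare two runs: how many clusters occur in both, and whether the
    reads are already in the same cluster order (as paired alignment needs)."""
    keys_1 = [cluster_key(n) for n in names_1]
    keys_2 = [cluster_key(n) for n in names_2]
    return {
        "reads_1": len(keys_1),
        "reads_2": len(keys_2),
        "shared_clusters": len(set(keys_1) & set(keys_2)),
        "same_order": keys_1 == keys_2,
    }
```

Fed the three names shown for each file in the question, it reports 0 shared clusters (file 1's reads come from tile 112, file 2's from tile 14). Three reads settle nothing either way, so it would need to run over the complete name lists.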

9.9 years ago

I wonder if this is what you're dealing with: "How To Convert Sra-Lite Paired-End Submission To Fastq?"


I don't think it is: the GEO submitter loaded the mate pairs into SRA as separate single-end runs, resulting in two files per sample, so the split options of fastq-dump don't help in this case.
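If the two runs really do hold the two mates of the same clusters, they could be re-paired by cluster key and written out as synchronized R1/R2 files, which is what bowtie's paired mode (`-1`/`-2`) expects. A minimal Python sketch under that assumption: `re_pair` and `to_fastq` are hypothetical helpers, records are `(defline, sequence, quality)` tuples parsed from the fastq-dump output, run 2 is held in memory, and reads without a mate are dropped.

```python
from typing import Dict, Iterable, List, Tuple

# (defline without the leading '@', sequence, quality)
Record = Tuple[str, str, str]


def cluster_key(name: str) -> Tuple[str, ...]:
    """'SRR364680.sra.1 SFGF-GA2-1_63:2:112:1559:999 length=80' ->
    ('SFGF-GA2-1_63', '2', '112', '1559', '999').

    Assumes the second field is the original Illumina name
    (prefix:lane:tile:x:y, optionally with '#index' and '/mate').
    """
    original = name.split()[1].split("#")[0].split("/")[0]
    return tuple(original.rsplit(":", 4))


def re_pair(reads_1: Iterable[Record], reads_2: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return both runs in identical cluster order, dropping unpaired reads."""
    by_cluster: Dict[Tuple[str, ...], Record] = {cluster_key(r[0]): r for r in reads_2}
    paired_1: List[Record] = []
    paired_2: List[Record] = []
    for rec in reads_1:
        mate = by_cluster.get(cluster_key(rec[0]))
        if mate is not None:
            paired_1.append(rec)
            paired_2.append(mate)
    return paired_1, paired_2


def to_fastq(records: Iterable[Record], mate: int) -> str:
    """Render records as FASTQ, naming each read after its cluster plus a
    '/1' or '/2' suffix so both mates carry matching names."""
    chunks = []
    for name, seq, qual in records:
        chunks.append(f"@{':'.join(cluster_key(name))}/{mate}\n{seq}\n+\n{qual}\n")
    return "".join(chunks)
```

The two strings returned by `to_fastq(p1, 1)` and `to_fastq(p2, 2)` would then be written to two files and passed to the aligner. Naming both mates after the shared cluster, rather than after the two different SRR read IDs, keeps their names identical apart from the mate suffix.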

9.5 years ago
skittely • 0

Based on the SRA record linked from the original GEO page (http://www.ncbi.nlm.nih.gov/sra?term=SRX105932), the reads seem to be single-end rather than paired: look at the Library section and click "more..." to see the library layout. My guess is that the two separate files are some sort of replicate.
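The same layout check can be scripted against the experiment's XML record, assuming the XML has already been retrieved (for example by saving it from the SRA site or via NCBI E-utilities `efetch` with `db=sra`; that step is not shown). In SRA experiment XML, `DESIGN/LIBRARY_DESCRIPTOR/LIBRARY_LAYOUT` contains either a `SINGLE` or a `PAIRED` child. `library_layout` is a hypothetical helper, sketched in Python with the standard library only.

```python
import xml.etree.ElementTree as ET


def library_layout(experiment_xml: str) -> str:
    """Return 'SINGLE' or 'PAIRED' from the first LIBRARY_LAYOUT element of
    an SRA experiment XML document (the root itself is also searched)."""
    root = ET.fromstring(experiment_xml)
    layout = next(root.iter("LIBRARY_LAYOUT"), None)
    if layout is None or len(layout) == 0:
        raise ValueError("no LIBRARY_LAYOUT/<SINGLE|PAIRED> element found")
    return layout[0].tag
```

A result of `SINGLE` for both runs would support reading them as two single-end datasets rather than mates.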
