GEO (NCBI) confusing data
8.8 years ago
silas008 ▴ 170

Hi,

I'm having some problems with a dataset shared on GEO:

At this link, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19414, there is a list of 26 samples. If I click on one of them, for example GSM503820 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM503820), and then on the FTP download link at the bottom of the page, I can download a FASTA file in which each header line contains a number and the following line contains a read. I think the number is the count of the read on the line below it. But this is confusing, because data on GEO is normally provided in SRA or FASTQ format.

If I'm right about this FASTA file, can I convert it to a FASTQ file?

Thank you very much

fastq fasta sra • 2.9k views

A part of the fasta file:

>3
ATTGCAATGAAGTCGTCGCTCT
>4
GAGGAAGGATAAAGATAAGC
>5
GAATCATAAGACTACTAATTA
>14
CATATCAATGTCATGGAAGAA
>31
ATCATCATTCTCCTTTTTCA
>36
GAGAGCAAATTGGAGTAATCAA
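As a way to inspect a file like this, here is a minimal Python sketch that parses FASTA records whose headers are plain integers into `(number, sequence)` pairs. It assumes every header is `>` followed by an integer (it raises `ValueError` otherwise); the function name `read_numbered_fasta` is hypothetical. Note that the headers in the excerpt are strictly increasing, which would fit read identifiers as well as counts, so the sketch only parses them and does not decide what they mean.

```python
from typing import Iterable, Iterator, Tuple


def read_numbered_fasta(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (header_number, sequence) pairs from FASTA text with integer headers.

    Assumption: each header is '>' followed by an integer. Multi-line sequences
    are joined; blank lines are skipped.
    """
    header = None
    seq_parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header = int(line[1:])  # raises ValueError if the header is not an integer
            seq_parts = []
        else:
            if header is None:
                raise ValueError("sequence line found before the first header")
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


# The excerpt posted above.
excerpt = """>3
ATTGCAATGAAGTCGTCGCTCT
>4
GAGGAAGGATAAAGATAAGC
>5
GAATCATAAGACTACTAATTA
>14
CATATCAATGTCATGGAAGAA
>31
ATCATCATTCTCCTTTTTCA
>36
GAGAGCAAATTGGAGTAATCAA
"""

records = list(read_numbered_fasta(excerpt.splitlines()))
headers = [number for number, _ in records]
print(len(records))                 # 6 records
print(headers == sorted(headers))   # True: headers are increasing
print(sum(headers))                 # 93, the total read count IF headers are counts
```

If the headers really are counts, `sum(headers)` gives the number of reads before collapsing; if they are identifiers, that sum has no meaning, so it is worth checking the GEO sample description before relying on it.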

What is it that you want to do with the data? FASTQ contains quality information in addition to the sequence, so you will not be able to convert directly.

8.8 years ago
h.mon 35k

Why do you need to convert to FASTQ? Can't you use FASTA?

I guess qualities were not provided because the quality of the first 30 or so bases is typically high, above 30. You can convert using mock qualities; see suggestions here and here.
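A minimal Python sketch of the mock-quality conversion, assuming the input has already been parsed into `(header_number, sequence)` pairs. Qualities are written in Phred+33 (Sanger) encoding, so a constant quality `Q` becomes the character `chr(Q + 33)`; the default of 40 (`'I'`) is an arbitrary choice, not something taken from GEO. The `expand_counts` option is hypothetical and only makes sense if the header really is a read count: it repeats each record that many times with IDs like `3_1`, `3_2`, and so on.

```python
from typing import Iterable, Iterator, Tuple


def to_fastq(records: Iterable[Tuple[int, str]], mock_q: int = 40,
             expand_counts: bool = False) -> Iterator[str]:
    """Yield FASTQ lines for (header_number, sequence) pairs with a constant mock quality.

    Qualities use Phred+33 encoding, so mock_q=40 is written as 'I'.
    expand_counts=True treats the header as a read count (an assumption) and
    emits that many copies of the record.
    """
    if not 0 <= mock_q <= 93:
        raise ValueError("Phred+33 quality must be between 0 and 93")
    qual_char = chr(mock_q + 33)
    for number, seq in records:
        copies = number if expand_counts else 1
        for i in range(copies):
            read_id = f"{number}_{i + 1}" if expand_counts else str(number)
            yield f"@{read_id}"           # FASTQ header line
            yield seq                     # sequence line
            yield "+"                     # separator line
            yield qual_char * len(seq)    # one mock quality character per base


# Two records from the excerpt in the question.
records = [(3, "ATTGCAATGAAGTCGTCGCTCT"), (4, "GAGGAAGGATAAAGATAAGC")]
fastq_lines = list(to_fastq(records))
expanded_lines = list(to_fastq(records, expand_counts=True))
print("\n".join(fastq_lines))
```

Because every base gets the same score, any tool that trims or filters on quality will see uniform values, so mock qualities are only useful for tools that insist on FASTQ input.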


