Is there a way to split a FastQ file by chromosome?
5.0 years ago

Hi folks,

I'm just getting started in this field (so I'm a newbie) and I have a big problem to solve. I need the FastQ file of each bovine (Bos taurus taurus) chromosome. From NCBI I can only download FastA files split by chromosome, and I don't know of another database with this specific file. Two options come to mind:

1 - Convert the FastA to FastQ, but I read that some of the data in a FastQ file doesn't exist in the FastA file, and the sites and software I tried can't convert my files because they are too large (about 100 MB).

2 - Download the SRA file from NCBI, convert it to FastQ and then split (?) it by chromosome, but how could I do that?

Regards from someone lost. =]

FASTQ SRA FASTA chromosome • 2.4k views

1 - You can't convert FASTA to FASTQ. They don't contain the same information.

2 - If you download from SRA, that should be in FASTQ format already, but is likely not segregated by chromosome (but eukaryotes are not an area of expertise for me).

One option would be to download the reference genome, and the reads from SRA, then map the reads to the reference yourself.
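To make the "map, then split" idea concrete, here is a minimal sketch in plain Python. It assumes the reads have already been aligned to the reference with some aligner (for example BWA) and that the alignments are available as SAM text; the function name `sam_to_fastq_by_chromosome` is made up for this illustration. It groups the primary alignments by reference name and rebuilds a FASTQ record for each read. For real data, an established tool such as samtools is the safer choice, and this sketch does not keep paired-end mates apart.

```python
from collections import defaultdict

# Translation table used to reverse-complement reads aligned to the reverse strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq_by_chromosome(sam_lines):
    """Group aligned reads from SAM text by reference name.

    Returns a dict mapping each reference (chromosome) name to a string
    of FASTQ records for the reads whose primary alignment is on it.
    """
    per_chrom = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, chrom = fields[0], int(fields[1]), fields[2]
        seq, qual = fields[9], fields[10]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800) or chrom == "*":
            continue
        if seq == "*" or qual == "*":
            continue  # sequence or qualities not stored in this record
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented;
            # undo that to recover the read as it was sequenced.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        per_chrom[chrom].append(f"@{name}\n{seq}\n+\n{qual}\n")
    return {chrom: "".join(records) for chrom, records in per_chrom.items()}
```

Each value in the returned dict could then be written to its own file, one per chromosome. Note that the result is still a collection of short reads per chromosome, not one chromosome-length FASTQ record.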

Why do you need FASTQ specifically?


At first I was sure that the files from SRA were FastQ, but then I read in a tutorial that I'd have to convert an SRA file to FastQ. I got confused about it, so let's ignore this part.


"I need the FastQ file of each bovine (Bos taurus taurus) chromosome."

Can you tell us why? As it stands this request does not make logical sense.


I'm using a piece of software (a website, to be more specific) called RepeatExplorer, and the input file it requires is a FastQ file.


Are you looking for specific repeats or just a file with masked repeats? You can download masked genome files for the cow genome from UCSC.


I'm designing a probe to paint a whole chromosome. The software I'm using (RepeatExplorer) will give me some sequences to use as my "primer", and the input it requires is a FastQ file. Since I want to make probes for each chromosome, I need the FastQ file for each chromosome.

Feel free to suggest another way to get it. =)

Before I forget, thank y'all for the help.


While one could easily create a fake fastq file, I don't think the tool you are looking at will accept it. It is meant to be used with short next-generation sequencing reads, not entire chromosomes.


But could I create a fake long FastQ file? If so, I could try.


You can, but you shouldn't.

It wouldn't surprise me if the tool rejects that anyway, as read lengths are short.
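Purely to show what "fake" means here, a rough sketch in plain Python: it pairs every FASTA sequence with a made-up, constant quality string. The function name `fasta_to_fake_fastq` and the choice of `I` as the placeholder (Phred 40 in the usual Phred+33 encoding) are assumptions for this illustration. The qualities carry no real information, and nothing here makes a chromosome-length record acceptable to a tool that expects short reads.

```python
def fasta_to_fake_fastq(fasta_text, placeholder="I"):
    """Turn FASTA text into FASTQ text with invented, constant qualities.

    The quality line is only a placeholder of the right length; it does
    not reflect any real sequencing confidence.
    """
    records = []
    name, chunks = None, []
    # A trailing sentinel header makes the loop flush the last record.
    for line in fasta_text.splitlines() + [">"]:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seq = "".join(chunks)  # join wrapped sequence lines
                records.append(f"@{name}\n{seq}\n+\n{placeholder * len(seq)}\n")
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    return "".join(records)
```

For a 100 MB chromosome this would produce a single record with a 100-million-character quality line, which is the opposite of the short-read input discussed above.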


Can you provide a link to the resource and point out where it specifically asks for FASTQ? That's a pretty unlikely filetype requirement for a tool which is in theory just looking for repeats.


The tool OP is referring to is for NGS data (not genomes). It can be found here.


Hmm.. Strange. I would personally have thought that short reads are not ideal for finding these types of features without first doing some assembly.

5.0 years ago

"Convert the FastA to FastQ,"

A fastq contains more information than a fasta. You can't "convert" your way into more information than you started with!
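The asymmetry is easy to see in code. Going from FASTQ to FASTA only means throwing the quality line away, as in this small sketch (the function name `fastq_to_fasta` is invented here, and it assumes simple four-line FASTQ records with no line wrapping). There is no honest way to write the reverse function, because the quality values it would need are not in the FASTA.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA by dropping the qualities."""
    lines = fastq_text.splitlines()
    out = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, _qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record starting at line {i + 1} is not valid FASTQ")
        # Only the name and the bases survive; the quality line is discarded.
        out.append(f">{header[1:]}\n{seq}\n")
    return "".join(out)
```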

In general, no one stores chromosome-length nucleotide sequences with individual base quality scores. What you are asking for almost certainly does not exist.

You need to stop and think about what you are asking, and what you really need to do, because I bet you don't really need a giant fastq.
