fastq to fasta conversion
6.7 years ago
monicasteffi ▴ 10

This is a very basic question. I converted a fastq file to fasta using the following command:

 seqtk seq -a combined_RP.fastq.gz > RP.fasta

When I open the fasta file, I get the following lines:

>E00502:101:ZA170417199:8:1101:27904:1731 1:N:0:ATGGGC
GTCNGTGAACTAGAAAATTTCTTGAAGTTGGAACCGCAAGTATTTGTTACCAATCCTCCTCAAAGTAGTATATGGCAAGAACTT....
>E00502:101:ZA170417199:8:1101:28168:1731 1:N:0:ATGAGC
TTGNAGTTTCAGTCAAAATCTAACTATTAAAATAAGGAATTTAAAACCTTACTCGCGCAGCATCCCGATCGCGGTGAGGTCAC...
>E00502:101:ZA170417199:8:1101:28716:1731 1:N:0:ATGAGC
AATNGGTTTTACTTTAATTTCTCTACTTCTATACTCTGTACATAATGTAATTAAGGGTGAATGAAGGGGTCACTAACAC....

My next step will be ab initio gene identification. Can I proceed with this fasta file? If not, how do I get a fasta file with continuous stretches of sequence? Thank you in advance.

fasta fastq GO • 5.0k views

You now have reads, which are not the full genome fasta. You first need to perform a de novo assembly to combine the reads into a genome fasta; then you can do gene identification.
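To see why the converted file still looks like a pile of short sequences, here is a minimal Python sketch of what a fastq-to-fasta conversion does. It is not how seqtk is implemented; it assumes plain 4-line FASTQ records (no wrapped sequence lines), and the two reads are made-up example data. Each read becomes exactly one fasta entry, so nothing gets joined together.

```python
import io

def fastq_to_fasta(fastq_handle):
    """Yield (fasta_header, sequence) pairs, one per 4-line FASTQ record."""
    while True:
        header = fastq_handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line)
        seq = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality string, dropped in FASTA
        yield ">" + header[1:], seq  # swap the leading '@' for '>'

# Hypothetical toy input, not taken from the original file
example_fastq = io.StringIO(
    "@read1 1:N:0:ATGGGC\n"
    "GTCNGTGAAC\n"
    "+\n"
    "FFF#FFFFFF\n"
    "@read2 1:N:0:ATGAGC\n"
    "TTGNAGTTTC\n"
    "+\n"
    "FFF#FFFFFF\n"
)

records = list(fastq_to_fasta(example_fastq))
for name, seq in records:
    print(name)
    print(seq)
```

Two reads in, two fasta entries out: the conversion only changes the file format, not the nature of the data.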


I have a similar issue. I used BWA to generate a mapped genome assembly and called a consensus sequence (.fq). I wanted to run QUAST on my mapped assembly but couldn't get QUAST to work with my bam files, so I tried to convert my fq file into a fasta file and got a similar result. Although it looks like multiple reads or sequences, I believe it may be my data mapped to scaffolds? The names (after the '>') match the reference scaffold names. I too had hoped to get a continuous stretch of sequence to input into QUAST, but am not sure how...


and called a consensus sequence (.fq)

How?

6.7 years ago

Reads are short sequenced chunks of the longer DNA stretch whose sequence you actually want. Consider this example for paired-end reads:

The middle line is the stretch of DNA you want to sequence. R1 and R2 are the reads.

R1
-------------->
--------------------------------------------------
                                    <------------- R2

Hence, the first step is to assemble the reads into contigs/scaffolds, which represent your assembly. Only after that can you predict genes. Reads are too short to be used for gene prediction.
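As a toy illustration of what "assembling reads into contigs" means, the Python sketch below merges two reads when the end of one matches the start of the other. This is only an illustration of the idea: real assemblers work on millions of reads and also have to deal with sequencing errors, repeats and reverse complements. The function name `merge_by_overlap`, the `min_overlap` parameter and the reads are all made up for the example.

```python
def merge_by_overlap(left, right, min_overlap=3):
    """Merge two reads if a suffix of `left` equals a prefix of `right`.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases exists. Exact matches only (no error tolerance).
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None

# Hypothetical reads that share the 5 bases "GTACG"
read_a = "ATGGCGTACG"
read_b = "GTACGTTAGC"

contig = merge_by_overlap(read_a, read_b)
print(contig)  # ATGGCGTACGTTAGC
```

The merged sequence is longer than either read, which is the whole point: gene prediction needs stretches like this, not the individual reads.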


Thank you for the reply. I've run bowtie against a reference and got a SAM file. Can I proceed further with the SAM file?


It appears to me that you are new to NGS. Please confirm; that will help us direct you.

I recommend that you go through this paper

5.8 years ago

You have successfully converted your fastq of short reads into a fasta of short reads. Conversion is not your problem.

But that's not what you want. You want to map your fastqs, which I guess you've done, and then use other software that takes all the reads and their mapping coordinates and builds a consensus sequence. Look for software that will make a consensus sequence from your .bam.
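To make "reads plus mapping coordinates give a consensus" concrete, here is a toy Python sketch that takes a majority vote per reference position. It is a simplified illustration and not a replacement for a real consensus caller: it assumes ungapped alignments given as (0-based start, sequence) pairs, ignores base and mapping qualities, and writes N where no read covers a position. The function name and the alignments are invented for the example.

```python
from collections import Counter

def consensus_from_alignments(alignments, ref_length):
    """Majority-vote consensus from ungapped (start, sequence) alignments."""
    # One base counter per reference position
    columns = [Counter() for _ in range(ref_length)]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_length:
                columns[pos][base] += 1
    # Most frequent base per column; 'N' where there is no coverage
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )

# Hypothetical reads mapped to a 12-base reference
toy_alignments = [
    (0, "ACGTAC"),
    (2, "GTACGG"),
    (4, "ACGGTT"),
    (4, "ATGGTT"),  # disagrees with the other reads at position 5
]

consensus = consensus_from_alignments(toy_alignments, ref_length=12)
print(consensus)  # ACGTACGGTTNN
```

The output is one continuous sequence per reference scaffold, with the single disagreeing base outvoted and the uncovered tail left as N.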
