I want to convert a paired-end SRA file to FASTA for downstream analysis. For example, the following run is a paired-end metagenomic WGS dataset: http://www.ncbi.nlm.nih.gov/sra/?term=ERR1018199 I know I can use fastq-dump, and I ran it as below:
./fastq-dump myfile.sra --fasta
and it gives me:
>ERR1018199.1 FCD0JMKACXX:8:1101:1051:3172#ACTTGAAT length=200
GGAAGCGGTGTTCTGTTTCTCCTTCCATATATTTCCACAGACTGGCATATTCTTCTTTCTTCTCGCCCAT
TTCCTGCAATTGCAGCATATATTGCCAGATGATGAAAAAAATATACATCAAGGAAAACAAGTCTATCTGG
CAATATATGCTGCAATTGCAGGAAATGGGCGAGAAGAAAGAAGAATATGCCAGTCTGTGG
>ERR1018199.2 FCD0JMKACXX:8:1101:1348:3168#ACTTGAAT length=200
CTTGCAGATTCTACAAAAAGAGTGTTTCATAAACTGGTCTATCAAAAGAAAGGTTAAACTCAGTGAGTTG
AACCCACACATCACAAAGTAGCTTCTGAGATATGTGGGTAATATCTGTATGGATGTTTGTATGATTGATG
TTACTGACATTGATTGCAAAGAAGGCGACAGCGTTGAGATTTTCGGAGATCATCTTCCTA
>ERR1018199.3 FCD0JMKACXX:8:1101:1451:3174#ACTTGAAT length=199
GTATCATGACCGGTCGTTCGGGCAACAACATTTGGTGTATCAGTCCGATGTTCGACCTCAACAAACCGAC
If I add the --split-files option, it gives me:
ERR1018199_1.fasta:
>ERR1018199.1 FCD0JMKACXX:8:1101:1051:3172#ACTTGAAT length=100
GGAAGCGGTGTTCTGTTTCTCCTTCCATATATTTCCACAGACTGGCATATTCTTCTTTCTTCTCGCCCAT
TTCCTGCAATTGCAGCATATATTGCCAGAT
>ERR1018199.2 FCD0JMKACXX:8:1101:1348:3168#ACTTGAAT length=100
CTTGCAGATTCTACAAAAAGAGTGTTTCATAAACTGGTCTATCAAAAGAAAGGTTAAACTCAGTGAGTTG
AACCCACACATCACAAAGTAGCTTCTGAGA
>ERR1018199.3 FCD0JMKACXX:8:1101:1451:3174#ACTTGAAT length=100
GTATCATGACCGGTCGTTCGGGCAACAACATTTGGTGTATCAGTCCGATGTTCGACCTCAACAAACCGAC
TATTTTACTAAATTCCCACATCGATACCGT
>ERR1018199.4 FCD0JMKACXX:8:1101:1254:3223#ACTTGAAT length=100
ERR1018199_2.fasta:
>ERR1018199.1 FCD0JMKACXX:8:1101:1051:3172#ACTTGAAT length=100
GATGAAAAAAATATACATCAAGGAAAACAAGTCTATCTGGCAATATATGCTGCAATTGCAGGAAATGGGC
GAGAAGAAAGAAGAATATGCCAGTCTGTGG
>ERR1018199.2 FCD0JMKACXX:8:1101:1348:3168#ACTTGAAT length=100
TATGTGGGTAATATCTGTATGGATGTTTGTATGATTGATGTTACTGACATTGATTGCAAAGAAGGCGACA
GCGTTGAGATTTTCGGAGATCATCTTCCTA
>ERR1018199.3 FCD0JMKACXX:8:1101:1451:3174#ACTTGAAT length=99
GGGAGGTAGTTGGGGCAATACGCTTTCTATTCCGTTTTTTCCGGAAACTTCTTCTTCACATGAGGCAAGA
AAGATCAGATTGTATGCCTGCTCGGTTAT
>ERR1018199.4 FCD0JMKACXX:8:1101:1254:3223#ACTTGAAT length=100
My question is: does the first command simply concatenate the two mates that share a read number? Can I use that output for further analysis (for example, assembly), or do I have to use --split-files to produce two separate files and run my pipeline on those?
Or, more simply: can we say that the following sequence is present in our (meta)genome, or not?
>ERR1018199.1 FCD0JMKACXX:8:1101:1051:3172#ACTTGAAT length=200
GGAAGCGGTGTTCTGTTTCTCCTTCCATATATTTCCACAGACTGGCATATTCTTCTTTCTTCTCGCCCAT
TTCCTGCAATTGCAGCATATATTGCCAGATGATGAAAAAAATATACATCAAGGAAAACAAGTCTATCTGG
CAATATATGCTGCAATTGCAGGAAATGGGCGAGAAGAAAGAAGAATATGCCAGTCTGTGG
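To check my concatenation guess, here is a rough Python sketch that compares the first record from each output. The sequences are copied from the output above. The function name is only something I made up for illustration, and the check says nothing about whether the joined string is a real contiguous genomic sequence.

```python
def is_simple_concatenation(read1: str, read2: str, combined: str) -> bool:
    """Return True if `combined` is exactly read1 followed by read2."""
    return combined == read1 + read2


# ERR1018199.1 from the run without --split-files (length=200)
combined = (
    "GGAAGCGGTGTTCTGTTTCTCCTTCCATATATTTCCACAGACTGGCATATTCTTCTTTCTTCTCGCCCAT"
    "TTCCTGCAATTGCAGCATATATTGCCAGATGATGAAAAAAATATACATCAAGGAAAACAAGTCTATCTGG"
    "CAATATATGCTGCAATTGCAGGAAATGGGCGAGAAGAAAGAAGAATATGCCAGTCTGTGG"
)

# ERR1018199.1 from ERR1018199_1.fasta (length=100)
read1 = (
    "GGAAGCGGTGTTCTGTTTCTCCTTCCATATATTTCCACAGACTGGCATATTCTTCTTTCTTCTCGCCCAT"
    "TTCCTGCAATTGCAGCATATATTGCCAGAT"
)

# ERR1018199.1 from ERR1018199_2.fasta (length=100)
read2 = (
    "GATGAAAAAAATATACATCAAGGAAAACAAGTCTATCTGGCAATATATGCTGCAATTGCAGGAAATGGGC"
    "GAGAAGAAAGAAGAATATGCCAGTCTGTGG"
)

if __name__ == "__main__":
    # Compare the 200-base record against mate 1 followed by mate 2.
    print(is_simple_concatenation(read1, read2, combined))
```

This only tests string equality for one record, so it can confirm or rule out plain end-to-end joining, but not whether the joined read is usable for assembly.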