I downloaded the data as an SRA file and ran fastq-dump following the Trinity recommendations:
fastq-dump --skip-technical --readids --read-filter pass --dumpbase --defline-seq '@$sn[_$rn]/$ri' --split-files ./SRR.sra
Then I ran quality control with FastQC and trimmed the adapters with Trimmomatic.
My headers look like this:
head -n 8 SRR5874687.1_pass_1_trim.fastq
@/1
GACCGTAGCCGTGGTATTTACTTCACTCAAGACTGGTGTTCAATGCCAGGTGTTATGCCAGTTGCTTCAGGTGGTATTCACGTATGGCACATGCCAGCTT
+SRR5874687.1.171.1 length=100
?1BDDD8B:AC@DEA:ACHAAFH?+2?1??FD9CFH9BDGDBBDGECBDFCFHG>F=@GFFHGIGIH@AHEEHF4;@C3.>BA3AAD=5;,:>C@>><CC
@/1
CTTTTACTGAATCCATGGGGTGTTTCTTATTCTTAGCTCAAAGTCTGTACATGTTGTGCACGTGCTGAAACCGCGTGTGCCGGTTGCGCGAGTCCTCTCA
+SRR5874687.1.172.1 length=100
?@@FFADDHFFFFGECGIGI@GHIJ?HHDGH?FH?D@GGHGGGGGIIGCGHDEHIHIICFFHHICGGCDHECBBBCBBDDDDD=B>?B>B5953:>:@>:
head -n 8 SRR5874687.1_pass_2_trim.fastq
@/2
CACCGAACTGAAGACATGCGTCATCACCGAAGATTTCAACTAAAGCTGGCATGTGCCATACGTGAATACCACCTGAAGCAACTGGCATAACACCTGGCAT
+SRR5874687.1.171.2 length=100
@@@DFFDDHBFHDHGBFG@@C<@F>??CFHIH0??FFIGII<BBC@FCFCHGH.7777=D;AHEFB@?7;;>BEC;@CCCC??ACBCCCCCCC?CC@?CC
@/2
CTGGACAACGCGCCGCAATATTGCAGCTTATTAGTTTGGTGATGAGAGGACTCGCGCAACCGGCACACGCGGTTTCAGCACGTGCACAACATGTACAGAC
+SRR5874687.1.172.2 length=100
?@@FBDDDFHDHHJJJIGHIIJJGGHIGI?FH<DFHJJJCF@GHFHGHIGHHEEEDDDDDDDDDDDDDD@BBBBDDEDDDDDBDDDDDDDDDDDEEEECB
At this point Trinity had a problem with an empty kmer25.
Initially I thought the problem was the header position (third line instead of first), so I asked here for help. First I moved the headers from the third line to the first with the proposed awk method, and then used bbmap to add the slashes (/1 and /2).
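For readers who want to reproduce the header move without awk, here is a minimal Python sketch. It is not the awk command that was actually used; the function name `restore_header` is hypothetical, and it assumes strict four-line FASTQ records (no wrapped sequences). On the records shown earlier it yields the same header layout as the output that follows.

```python
from typing import Iterable, Iterator


def restore_header(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with the read name moved from the '+' line to the '@' line.

    Assumes every record is exactly four lines long.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        record = []
        name = plus[1:]      # text after '+', e.g. "SRR5874687.1.171.1 length=100"
        suffix = header[1:]  # text after '@', e.g. "/1"
        if name:
            # Only rewrite when the '+' line actually carries a name.
            if suffix:
                header = "@" + name + " " + suffix
            else:
                header = "@" + name
            plus = "+"
        yield header
        yield seq
        yield plus
        yield qual
```

Usage would be along the lines of opening the trimmed file, passing the file object to `restore_header`, and writing each yielded line plus a newline to a new file.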
Now, headers look like this:
head -n 8 slashed_biostar_1.fastq
@SRR5874687.1.171.1 length=100 /1
GACCGTAGCCGTGGTATTTACTTCACTCAAGACTGGTGTTCAATGCCAGGTGTTATGCCAGTTGCTTCAGGTGGTATTCACGTATGGCACATGCCAGCTT
+
?1BDDD8B:AC@DEA:ACHAAFH?+2?1??FD9CFH9BDGDBBDGECBDFCFHG>F=@GFFHGIGIH@AHEEHF4;@C3.>BA3AAD=5;,:>C@>><CC
@SRR5874687.1.172.1 length=100 /1
CTTTTACTGAATCCATGGGGTGTTTCTTATTCTTAGCTCAAAGTCTGTACATGTTGTGCACGTGCTGAAACCGCGTGTGCCGGTTGCGCGAGTCCTCTCA
+
?@@FFADDHFFFFGECGIGI@GHIJ?HHDGH?FH?D@GGHGGGGGIIGCGHDEHIHIICFFHHICGGCDHECBBBCBBDDDDD=B>?B>B5953:>:@>:
head -n 8 slashed_biostar_2.fastq
@SRR5874687.1.171.2 length=100 /2
CACCGAACTGAAGACATGCGTCATCACCGAAGATTTCAACTAAAGCTGGCATGTGCCATACGTGAATACCACCTGAAGCAACTGGCATAACACCTGGCAT
+
@@@DFFDDHBFHDHGBFG@@C<@F>??CFHIH0??FFIGII<BBC@FCFCHGH.7777=D;AHEFB@?7;;>BEC;@CCCC??ACBCCCCCCC?CC@?CC
@SRR5874687.1.172.2 length=100 /2
CTGGACAACGCGCCGCAATATTGCAGCTTATTAGTTTGGTGATGAGAGGACTCGCGCAACCGGCACACGCGGTTTCAGCACGTGCACAACATGTACAGAC
+
?@@FBDDDFHDHHJJJIGHIIJJGGHIGI?FH<DFHJJJCF@GHFHGHIGHHEEEDDDDDDDDDDDDDD@BBBBDDEDDDDDBDDDDDDDDDDDEEEECB
But this time Trinity does not recognize the read name formatting: [SRR5874687.1.171.2]
My one guess is that putting the read number after a space at the end might help, like: [SRR5874687.1.171.2 1]. Does anybody know how to do this automatically?
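One way to automate such a rename is sketched below in Python. The names `rename_header` and `rename_fastq` are hypothetical, and the sketch assumes headers of the form `@<accession>.<spot>.<read index>` followed by optional extra text, as in the files above. The "slash" style mirrors the `/$ri` suffix from the `--defline-seq` format in the fastq-dump command; the "space" style follows the guess about a trailing read number. Which of the two Trinity accepts is not confirmed here and should be checked against its documentation.

```python
import re
from typing import Iterable, Iterator

# Matches "@<spot name>.<read index>" at the start of the header,
# e.g. "@SRR5874687.1.171.2"; any text after the first whitespace is dropped.
_NAME = re.compile(r"^@(?P<spot>\S+)\.(?P<read>[12])(?:\s.*)?$")


def rename_header(header: str, style: str = "slash") -> str:
    """Turn '@SRR5874687.1.171.2 length=100 /2' into '@SRR5874687.1.171/2'."""
    match = _NAME.match(header.rstrip("\n"))
    if match is None:
        raise ValueError(f"unexpected header format: {header!r}")
    if style == "slash":
        return f"@{match['spot']}/{match['read']}"
    if style == "space":
        return f"@{match['spot']} {match['read']}"
    raise ValueError(f"unknown style: {style!r}")


def rename_fastq(lines: Iterable[str], style: str = "slash") -> Iterator[str]:
    """Rewrite only the header line of each four-line FASTQ record."""
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        yield rename_header(line, style) if index % 4 == 0 else line
```

With this sketch, both mates of spot 171 end up with the same base name (`SRR5874687.1.171`) and differ only in the `/1` or `/2` suffix.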
Do you know what messed up the headers? They did not change after Trimmomatic: they were exactly the same as right after running fastq-dump on the raw data (a procedure that had worked perfectly until now).
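A possible explanation, offered as an assumption rather than a confirmed diagnosis: in the `--defline-seq` format, `$sn` stands for the spot name, and a header of `@/1` is what would be left if that variable expanded to nothing for this run. The fastq-dump help also lists `$ac` (accession) and `$si` (spot id), which might serve as a substitute. To check which files are affected, a small Python sketch like the following counts headers that carry no name; the function name `count_nameless_headers` is hypothetical and four-line records are assumed.

```python
import re
from typing import Iterable, Tuple

# A header with nothing between '@' and an optional '/1' or '/2' suffix.
_EMPTY = re.compile(r"^@(/[12])?\s*$")


def count_nameless_headers(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (nameless, total) header counts for four-line FASTQ records."""
    nameless = 0
    total = 0
    for index, line in enumerate(lines):
        if index % 4 != 0:
            continue  # skip sequence, '+' and quality lines
        total += 1
        if _EMPTY.match(line.rstrip("\n")):
            nameless += 1
    return nameless, total
```

Running it on the raw fastq-dump output and on the trimmed files would show whether the names are already missing before Trimmomatic.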