4.6 years ago
antgomo
Hi all,
I converted paired-end FASTQ to FASTA using fastq_to_fasta from the FASTX-Toolkit.
After the conversion, some reads are missing their mates because the mates did not meet the quality standards. Here is a snippet of my FASTA file:
>A00323:108:H5W2TDSXX:4:1101:1090:32252_CAATAATAT/1
CCTGGCTAACACAGTGAAACCCTGTCTCTACTAAAAATATAAAAAATTAGCTGGGTGTGGTGGCGGGTGCCTGTAGTCCCAGCAGATCGGAAGAGCACACG
>A00323:108:H5W2TDSXX:4:1101:1090:32252_CAATAATAT/2
GCTGGGACTACAGGCACCCGCCACCACACCCAGCTAATTTTTTATATTTTTAGTAGAGACAGGGTTTCACTGTGTTAGCCAGGAGATCGGAAGAGCGTCGT
>A00323:108:H5W2TDSXX:4:1101:1145:26929_CCCCCCACA/1
TCTCTTGCTTCAGCCTGCTGAGTAGCTGGGACTACTGGCATGCACCACTACACTGGCTAATTTTTTTTTATTTTTAGTAGAAAAGATCGGAAGAGCACACG
What I want is to keep the reads that have both mates and discard the unpaired ones. In the above example, the desired output would be:
>A00323:108:H5W2TDSXX:4:1101:1090:32252_CAATAATAT/1
CCTGGCTAACACAGTGAAACCCTGTCTCTACTAAAAATATAAAAAATTAGCTGGGTGTGGTGGCGGGTGCCTGTAGTCCCAGCAGATCGGAAGAGCACACG
>A00323:108:H5W2TDSXX:4:1101:1090:32252_CAATAATAT/2
GCTGGGACTACAGGCACCCGCCACCACACCCAGCTAATTTTTTATATTTTTAGTAGAGACAGGGTTTCACTGTGTTAGCCAGGAGATCGGAAGAGCGTCGT
I am struggling with awk, as I am a newbie with it. Does anyone have suggestions?
Thanks in advance
If you have fastq reads, please fix the issues with missing mates first with repair.sh from the BBMap suite (Guide here). Once that is done, convert the properly paired reads to fasta format using reformat.sh from the same suite:
reformat.sh in1=R1.fq.gz in2=R2.fq.gz out1=R1.fa out2=R2.fa
Hi genomx, yes, the problem is that FASTX gives me these kinds of files because it is not pair-aware.
Do you think repair.sh can deal with FASTA instead of FASTQ?
Thanks
If you have fastq files, please follow my advice and fix those. If a plain conversion with fastx gave you these problematic files, then the problem exists in the original dataset and should be fixed there.
I don't know if repair.sh can fix fasta files, since I have never had to use it for that application. BBTools programs are smart, and repair.sh may work with fasta. If it does not, then go back to my original advice.
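If only the FASTA files are at hand, the filtering described in the question could also be sketched in a few lines of Python. This is a minimal sketch, not a replacement for repair.sh: it assumes that every header ends in /1 or /2 as in the snippet above, that both mates sit in the same file, and that the file fits in memory. The function names parse_fasta, split_mate and keep_paired are made up for illustration.

```python
from collections import defaultdict


def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_mate(header):
    """Split 'read_id/1' into ('read_id', '1'); return None if no /1 or /2 suffix."""
    read_id, sep, mate = header.rpartition("/")
    if sep and mate in ("1", "2"):
        return read_id, mate
    return None


def keep_paired(lines):
    """Return the records whose read ID occurs with both a /1 and a /2 mate.

    Assumption: all records are held in memory, so this suits modest file sizes.
    """
    records = list(parse_fasta(lines))
    mates_seen = defaultdict(set)
    for header, _ in records:
        parts = split_mate(header)
        if parts is not None:
            mates_seen[parts[0]].add(parts[1])

    paired = []
    for header, seq in records:
        parts = split_mate(header)
        # Records without a /1 or /2 suffix are dropped along with singletons.
        if parts is not None and mates_seen[parts[0]] == {"1", "2"}:
            paired.append((header, seq))
    return paired
```

To use it, open the FASTA file, pass the file handle to keep_paired, and write each returned pair back out as a ">" header line followed by the sequence line. Input order is preserved, so the mates stay next to each other if they were adjacent in the input.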