Hi all,
I have a single FASTA file of paired-end reads intended for mitochondrial de novo assembly with SAGE [String-overlap Assembly of Genomes, not Serial Analysis of Gene Expression]. I've already run it through the error-correction software RACER, but there are some lingering format issues I need to clear up before I can run SAGE.
Unix/perl solutions preferred.
(1) Remove all reads that aren't 90 bases long (discard or write into new file)
(2) Remove unpaired reads - i.e., remove those reads for which the ID does not exactly match any other ID in the file (discard or write into new file)
(3) Sort the remaining reads alphabetically by ID so that the forward and reverse reads are interleaved
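To make the intended result concrete, the three steps could be sketched roughly as below. This is Python rather than perl, purely for illustration, and it rests on a few assumptions that may not hold for the real file: every record starts with a ">" header line, the whole header is the read ID, both mates carry an identical ID, and a "pair" means exactly two reads sharing an ID. The function names are made up for this sketch.

```python
from collections import defaultdict


def parse_fasta(lines):
    """Yield (read_id, sequence) tuples; wrapped sequence lines are joined."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            read_id, chunks = line[1:], []
        elif read_id is not None:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def filter_and_interleave(records, read_len=90):
    """Return (kept, rejected) lists of (read_id, sequence) tuples.

    kept is sorted by ID, so the two mates of each pair sit next to each other.
    """
    by_id = defaultdict(list)
    rejected = []

    # Step 1: set aside reads that are not exactly read_len bases long.
    for read_id, seq in records:
        if len(seq) == read_len:
            by_id[read_id].append(seq)
        else:
            rejected.append((read_id, seq))

    # Steps 2 and 3: walk the IDs in sorted order and keep only IDs that
    # occur exactly twice (assumption: a pair is two reads with one ID).
    kept = []
    for read_id in sorted(by_id):
        seqs = by_id[read_id]
        if len(seqs) == 2:
            kept.extend((read_id, s) for s in seqs)
        else:
            rejected.extend((read_id, s) for s in seqs)
    return kept, rejected
```

Reading from a file would look like `kept, rejected = filter_and_interleave(parse_fasta(open("reads.fasta")))`, where `reads.fasta` is a placeholder name. Because the length filter runs first, a read whose mate was dropped for length also ends up in `rejected`. Everything is held in memory, which should be fine for a mitochondrial data set but may not scale to very large files.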
Sorry to post a multi-part problem, but I think it's a set of simple tasks for which I can't find leads in other posts. Help with one or more of these tasks would be greatly appreciated.