Hi,
I want to map paired-end RNA-Seq data with RUM. I have already clipped adapters with cutadapt, so I now have two files, one for read 1 and one for read 2. Both files contain the same number of reads, so every read still has its mate. However, because of the adapter clipping, some reads are shorter than the original read length, and so the two files differ in size. Unfortunately, RUM requires the two input files to have the same file size.
Any idea for a workaround? Do I have to parse my files and pad the too-short reads and their corresponding quality values?
I have talked to the author of RUM. He will have a look at this restriction. In the meantime, he recommended padding the shorter reads with Ns.
best, steffi
Can you point to where it says the files must be the same size?
I have started the mapping with RUM. RUM produces a log file during mapping. There it says: "The forward and reverse files are different size. They should be the exact same size".
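To see whether a pair of files would trip this message before starting a run, something like the following could be used. It is only a sketch: it assumes the check is a plain byte-size comparison, which is what the log message suggests but which I have not confirmed in the RUM source. The function name is made up for illustration.

```python
import os


def sizes_match(forward_path, reverse_path):
    """Return True if both files have exactly the same size in bytes.

    Assumption: this mirrors the log message only; it is not RUM's own code.
    """
    return os.path.getsize(forward_path) == os.path.getsize(reverse_path)
```

If this returns False for files with the same number of reads, differing read lengths (or differing header lengths) are the likely cause.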
See this:
Synchronization Of Pair-End Reads
I do not have a problem with the pairing of my reads; I have already worked that out. I do not want to discard every read in which an adapter was found. I just want to use the shortened reads. So I guess I will have to write a script to pad them to the original length.
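A padding script along these lines might do it. This is a minimal sketch, not something tested against RUM: it assumes plain four-line FASTQ records, Phred+33 qualities (so "#" is a low-quality placeholder for the added Ns), and that you pass in the original read length yourself. The function names are made up for illustration.

```python
def pad_record(header, seq, plus, qual, target_len, pad_qual="#"):
    """Pad one FASTQ record to target_len with N bases.

    pad_qual is an assumed placeholder quality ("#" is Phred 2 in Phred+33).
    Reads already at or above target_len are returned unchanged.
    """
    missing = target_len - len(seq)
    if missing > 0:
        seq += "N" * missing
        qual += pad_qual * missing
    return header, seq, plus, qual


def pad_fastq(in_path, out_path, target_len, pad_qual="#"):
    """Copy a four-line-per-record FASTQ file, padding short reads."""
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            # Read one record: header, sequence, separator, qualities.
            lines = [src.readline().rstrip("\r\n") for _ in range(4)]
            if not lines[0]:
                break  # end of file
            record = pad_record(*lines, target_len, pad_qual)
            dst.write("\n".join(record) + "\n")
```

It would be run once per file, e.g. pad_fastq("read1.clipped.fastq", "read1.padded.fastq", 100), where the file names and the length 100 are placeholders. Note that the two output files only end up the same size if the header lines in both files are also the same length.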
I haven't looked at the source code, but I bet that it would be fairly easy to remove the code implementing that check.