Mapping Paired-End RNA-Seq Data With RUM
12.8 years ago
Steffi ▴ 580

Hi,

I want to map paired-end RNA-Seq data with RUM. I have already clipped adapters with cutadapt, so I now have two files, one for read 1 and one for read 2. Both files contain the same number of reads, so all mate pairs are intact. However, because of the adapter clipping, some reads are shorter than the original read length, and the two files therefore differ in size. Unfortunately, RUM requires the two input files to have the same file size.

Any idea for a workaround? Do I have to parse my files and pad the too-short reads and their corresponding quality values?

I have talked to the author of RUM. He will have a look at this restriction. In the meantime, he recommended padding the shorter reads with Ns.

best, steffi

rna adaptor • 4.0k views

Can you point to where it says the files must be the same size?


I have started the mapping with RUM. RUM writes a log file during mapping, and there it says: "The forward and reverse files are different size. They should be the exact same size".
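To see in advance whether a pair of files would trip this check, the sizes can be compared before starting RUM. This is only a sketch: it assumes the check in the log message refers to the size in bytes, which the thread does not confirm, and the function name and file names are made up for illustration.

```python
import os


def same_file_size(forward_path, reverse_path):
    """Return True if both files have the same size in bytes.

    Assumption: RUM's "exact same size" refers to the byte size of the
    two FASTQ files; this is a guess based on the log message only.
    """
    return os.path.getsize(forward_path) == os.path.getsize(reverse_path)
```

For example, `same_file_size("read1.trimmed.fq", "read2.trimmed.fq")` (hypothetical file names) would return False as soon as a single read in one file is shorter than its mate.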


I do not have a problem with the pairing of my reads; I have worked that out. I do not want to discard all reads in which an adapter was found. I just want to use the shortened reads. So I guess I will have to write a script to pad them to the original length.


I haven't looked at the source code, but I bet that it would be fairly easy to remove the code implementing that check.

12.8 years ago

That is a poorly written error message on RUM's part, or RUM is being absurdly strict. Why would it need exactly the same number of nucleotides in each pair of sequences?

cutadapt is pair-safe, i.e. it will not create widows and orphans unless you want it to:

[leipzig@localhost testpairsafety]$ cat pair2.fq
@HWI-ST431_52:1:1:1259:1981/1
ATCTCGTATGCCGTCTTCTGCTTG
+
b`ZUYZKYUSV[[_[cad\\W\[X
[leipzig@localhost testpairsafety]$ cutadapt -a ATCTCGTATGCCGTCTTCTGCTTG pair2.fq > pair2.trimmed.fq
cutadapt version 0.9.5
Command line parameters: -a ATCTCGTATGCCGTCTTCTGCTTG pair2.fq
Maximum error rate: 10.00%
   Processed reads: 1
     Trimmed reads: 1 (100.0%)
   Too short reads: 0 (  0.0% of processed reads)
    Too long reads: 0 (  0.0% of processed reads)
        Total time:      0.00 s
     Time per read:      0.00 ms

=== Adapter 1 ===

Adapter 'ATCTCGTATGCCGTCTTCTGCTTG', length 24, was trimmed 1 times.

Histogram of adapter lengths
length  count
24  1

[leipzig@localhost testpairsafety]$ cat pair2.trimmed.fq 
@HWI-ST431_52:1:1:1259:1981/1

+

[leipzig@localhost testpairsafety]$

The question is whether RUM will accept empty sequences. If not, you might have to substitute a single low-quality "N" for those.


RUM needs exactly the same file size, so this would mean substituting Ns for every character deleted from a read. Does anybody know of a toolkit for doing that?
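In the absence of a ready-made tool, the padding is short to script. Below is a minimal sketch, not a tested RUM workflow: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the names `pad_record` and `pad_fastq` are invented here. The quality character used for the padded Ns defaults to "B", on the assumption that the qualities are Phred+64 encoded as the example read above suggests; for Phred+33 data "!" would be the low-quality choice. Equal read lengths only give equal file sizes if the header lines of both files are also the same length.

```python
def pad_record(header, seq, plus, qual, length, pad_qual="B"):
    """Pad one FASTQ record with N bases (and pad_qual qualities) up to `length`.

    Reads that are already `length` or longer are returned unchanged.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence/quality length mismatch in " + header)
    missing = max(0, length - len(seq))
    return header, seq + "N" * missing, plus, qual + pad_qual * missing


def pad_fastq(in_handle, out_handle, length, pad_qual="B"):
    """Copy four-line FASTQ records from in_handle to out_handle, padding each read."""
    while True:
        lines = [in_handle.readline() for _ in range(4)]
        if not lines[0]:  # end of file
            break
        header, seq, plus, qual = (line.rstrip("\n") for line in lines)
        for field in pad_record(header, seq, plus, qual, length, pad_qual):
            out_handle.write(field + "\n")
```

It would be called once per file with the original read length, for example `pad_fastq(open("read1.trimmed.fq"), open("read1.padded.fq", "w"), 100)`, where the file names and the length of 100 are placeholders. Reads that were trimmed to zero length, like the one in the cutadapt example above, come out as all-N reads.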
