I want to identify differential alternative splicing between two conditions with three replicates each, and I am using rMATS for this. I have already generated the genome index through two-pass STAR mapping.
I have three BAM files for the three control replicates and another three for the treated replicates. Should I merge the control BAM files, and likewise the treated BAM files, and then run the command as:
python rmats.py --b1 merged.bam --b2 merged.bam --gtf gtfFile mygtf --bi STARindexFolder index -od outDir result -t paired -readLength ?
Or should I use:
python rmats.py --b1 1.bam 2.bam 3.bam --b2 4.bam 5.bam 6.bam --gtf gtfFile mygtf --bi STARindexFolder index -od outDir result -t paired -readLength ?
Here 1.bam, 2.bam and 3.bam are the BAM files of the control replicates, and 4.bam, 5.bam and 6.bam are the BAM files of the treated replicates.
I am also confused about what readLength means here. Is it the length of the FASTQ reads or something else? If it is the former, how should I choose the read length when it may differ between samples?
Thanks
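For the second form of the command (replicates kept separate), here is a minimal sketch of preparing the inputs. It rests on an assumption that is not stated in this thread: as far as I know, recent rMATS releases (4.x, "turbo") expect --b1 and --b2 to each point to a text file holding one comma-separated list of BAM paths, rather than BAM names typed on the command line. Please check `python rmats.py -h` for your version. The helper name `write_group_file` and the file names b1.txt and b2.txt are hypothetical.

```python
from pathlib import Path


def write_group_file(bam_paths, out_path):
    """Write the BAM paths of one condition as a single comma-separated line.

    Assumption: the rMATS version in use reads each group from a text file
    in this format. Returns the line that was written (without the newline).
    """
    if not bam_paths:
        raise ValueError("need at least one BAM file per group")
    line = ",".join(str(p) for p in bam_paths)
    Path(out_path).write_text(line + "\n")
    return line


# Example use (hypothetical file names):
#   write_group_file(["1.bam", "2.bam", "3.bam"], "b1.txt")  # control
#   write_group_file(["4.bam", "5.bam", "6.bam"], "b2.txt")  # treated
```

The two text files would then be passed in place of the BAM names, with the remaining options left as in the commands above.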
Is it mandatory that the read length be the same even when working with BAM files (in the case of rMATS)?
That still seems to be the requirement - yes.
I've tried running rMATS on my BAM files with two different read lengths (-readLength 100 and -readLength 130; my raw reads are 150 bp, but after trimming they drop to around 130), and I get similar but not identical results from the two runs.
So I don't really know which read length to use. I don't want to restart the analysis and hard-trim every read to the same length, since I find that this method loses too much information, but maybe I'm wrong.
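Before settling on a value, it may help to look at how the read lengths are actually distributed after trimming. The sketch below tallies lengths from a FASTQ file (plain or gzipped), assuming standard 4-line records with no wrapped sequence lines. The function names and the `max_reads` cap are hypothetical, and whether the most common length is the right value to give rMATS is something to confirm with the rMATS documentation or its developers.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path, max_reads=100_000):
    """Count read lengths in a FASTQ file, looking at up to max_reads reads.

    Assumes 4-line FASTQ records; the sequence is the 2nd line of each record.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    seen = 0
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 1:  # skip header, '+' and quality lines
                continue
            counts[len(line.rstrip("\r\n"))] += 1
            seen += 1
            if seen >= max_reads:
                break
    return counts


def most_common_length(counts):
    """Return the read length that occurs most often."""
    return counts.most_common(1)[0][0]
```

Running this on each trimmed sample shows whether the lengths cluster tightly (for example around 130) or are spread widely, which is useful information to include when asking the developers.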
It would be better to contact the developer. I believe there is a Google Group page where she (the developer) is more active.
Is it possible to compare an uneven number of replicates for test and control, e.g. 2 replicates for test vs. 3 replicates for control?