Question

input for rMATS

1

7.2 years ago

bisht20diksha ▴ 30

I want to identify differential alternative splicing between two conditions with three replicates each, and I am using rMATS for this. I have already generated the genome index through two-pass STAR mapping.

I have three BAM files for the three control replicates and another three for the treated replicates. Should I merge the control BAM files, and likewise the treated BAM files, and then run the command as:

 python rmats.py --b1 merged.bam --b2 merged.bam  --gtf gtfFile mygtf --bi STARindexFolder index -od outDir result -t paired -readLength ?

Or should I use:

python rmats.py --b1 1.bam 2.bam 3.bam --b2 4.bam 5.bam 6.bam --gtf gtfFile mygtf --bi STARindexFolder index -od outDir result -t paired -readLength ?

Here 1.bam, 2.bam and 3.bam are the BAM files of the control replicates, and 4.bam, 5.bam and 6.bam are the BAM files of the treated replicates.

I am also confused about what readLength means here. Is it the length of the FASTQ reads or something else? If the former, how should I choose the read length when it may differ between samples?

Thanks

rMATS STAR RNA-Seq • 11k views

ADD COMMENT • link updated 5.3 years ago by Ömer An ▴ 270 • written 7.2 years ago by bisht20diksha ▴ 30

0

Is it mandatory that the read length be the same even when working with BAM files (in the case of rMATS)?

ADD REPLY • link 6.2 years ago by iti.gupta ▴ 10

0

That still seems to be the requirement - yes.

ADD REPLY • link 6.2 years ago by Kevin Blighe 89k

0

I've tried running rMATS on my BAM files with two different read lengths (-readLength 100 and -readLength 130; my raw reads are 150 bp, but after trimming they drop to around 130), and I get similar but not identical results from the two runs.

So I don't really know which read length to use. I don't want to restart the analysis and hard-trim every read to the same length, because I find that this method loses too much information, but maybe I'm wrong.

ADD REPLY • link 6.2 years ago by vin.darb ▴ 300

0

Better to contact the developer. I believe there is a Google Group page where she (developer) is more active.

ADD REPLY • link 6.2 years ago by Kevin Blighe 89k

0

Is it possible to compare an uneven number of replicates for test and control? For example, 2 replicates for test vs. 3 replicates for control.

ADD REPLY • link 5.9 years ago by Ömer An ▴ 270

score 3 · Answer 1 · 2018-02-01

3

7.2 years ago

Kevin Blighe 89k

Regarding the input BAM files, I would follow the program documentation. So, you should have 2 text files with the following contents:

b1.txt
1.bam,2.bam,3.bam

b2.txt
4.bam,5.bam,6.bam

Then, run the program with:

python rmats.py --b1 b1.txt --b2 b2.txt ...
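If you would rather generate the two list files than type them by hand, here is a minimal Python sketch. The helper name `write_sample_list` is my own and is not part of rMATS; the trailing newline is also an assumption, so drop it if your rMATS version complains.

```python
from pathlib import Path


def write_sample_list(path, bam_files):
    """Write BAM paths as a single comma-separated line (hypothetical helper)."""
    # One line, no spaces around the commas, matching the b1.txt / b2.txt layout above.
    Path(path).write_text(",".join(str(bam) for bam in bam_files) + "\n")


control = ["1.bam", "2.bam", "3.bam"]  # control replicates
treated = ["4.bam", "5.bam", "6.bam"]  # treated replicates

write_sample_list("b1.txt", control)
write_sample_list("b2.txt", treated)
```

Each replicate stays a separate BAM file in the list, so there is no need to merge the replicates first.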

---------------------------

Regarding the readLength command line parameter, rMATS requires that all of your reads are the same length. So, you will have to perform some read trimming to a specific length on your FASTQ / FASTA input files prior to alignment with STAR. For this, you can use Trimmomatic, Trim Galore!, or something else, such as the trimFastq.py script that comes with the program (see HERE for further information).

Note that there is also specific advice from the rMATS team for using STAR output:

Q: Can I run rMATS v4.0.1 (turbo) with STAR aligner output?

A: STAR aligner performs soft clipping by default which will generate variable read lengths. You can run STAR with "--alignEndsType EndToEnd" option to suppress soft clipping.

[source: http://rnaseq-mats.sourceforge.net/faq.html]
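As a rough illustration of the FAQ advice, the sketch below builds a STAR command line with soft clipping suppressed. The function name, the FASTQ file names, the output prefix and the thread count are placeholders I made up; only the STAR options themselves come from the STAR manual, and you should adapt the rest to your own run.

```python
import shlex


def star_command(genome_dir, fastq_1, fastq_2, out_prefix, threads=4):
    """Build a paired-end STAR command with soft clipping disabled (sketch)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_1, fastq_2,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # EndToEnd forces full-length alignment, so reads are not soft-clipped.
        "--alignEndsType", "EndToEnd",
        "--outFileNamePrefix", out_prefix,
    ]


# Placeholder inputs; replace with your own index folder and FASTQ files.
cmd = star_command("STARindexFolder", "sample1_R1.fastq", "sample1_R2.fastq", "sample1_")
print(shlex.join(cmd))
```

Printing the command first lets you check it before running it, for example with `subprocess.run(cmd, check=True)`.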

ADD COMMENT • link 5.9 years ago by Kevin Blighe 89k

0

Thanks. I have Illumina paired-end FASTQ data, which I have trimmed using Trim Galore. After running FastQC on the trimmed data, I got a sequence length of 20-51. I am confused about the readLength parameter here. What value should I put?

ADD REPLY • link 7.2 years ago by bisht20diksha ▴ 30

0

Hello. All of your sequences must be the exact same length. You cannot have a range of values, like 20-51.
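Before settling on a fixed length, it can help to see how the trimmed reads are actually distributed across lengths. The following is only a sketch: `read_length_counts` is a name I made up, it assumes plain 4-line FASTQ records, and the toy reads stand in for a real file.

```python
from collections import Counter


def read_length_counts(fastq_lines):
    """Count reads per sequence length; assumes 4-line FASTQ records."""
    counts = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Toy input standing in for a real trimmed FASTQ file.
toy = [
    "@r1", "A" * 51, "+", "I" * 51,
    "@r2", "A" * 51, "+", "I" * 51,
    "@r3", "A" * 20, "+", "I" * 20,
]
counts = read_length_counts(toy)
total = sum(counts.values())
for length in sorted(counts):
    print(length, counts[length], f"{counts[length] / total:.1%}")
```

For a real file you could pass an open handle instead of the toy list, e.g. `with open("trimmed_R1.fastq") as handle:` (hypothetical file name).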

I would use the trimFastq.py Python script that comes with rMATS (prior to alignment) so that you have reads that are all 50bp. It is highly likely that many of your reads that are as low as 20bp are very low in frequency.

Then, when running rMATS, you would choose -readLength 50
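In case the trimFastq.py script is not available, the idea can be sketched in a few lines of Python. This is not the rMATS script, just an illustration under my own assumptions: 4-line FASTQ records, trimming from the 3' end, and dropping a pair when either mate is shorter than the target so that the two files stay in sync.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line FASTQ records."""
    iterator = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(iterator, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def trim_pairs(lines_1, lines_2, length):
    """Yield mate pairs cut to `length` bases; skip pairs with a shorter mate."""
    for rec_1, rec_2 in zip(fastq_records(lines_1), fastq_records(lines_2)):
        if len(rec_1[1]) < length or len(rec_2[1]) < length:
            continue  # drop both mates so R1 and R2 remain paired
        yield tuple(
            (head, seq[:length], plus, qual[:length])
            for head, seq, plus, qual in (rec_1, rec_2)
        )


# Toy paired-end input: the second pair has a 20 bp mate and is dropped.
r1 = ["@read1/1", "A" * 51, "+", "I" * 51, "@read2/1", "A" * 20, "+", "I" * 20]
r2 = ["@read1/2", "C" * 50, "+", "I" * 50, "@read2/2", "C" * 51, "+", "I" * 51]
kept = list(trim_pairs(r1, r2, 50))
print(len(kept), "pair(s) kept")
```

Every surviving read is exactly 50 bases long, which is the value you would then pass as the read length.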

Does that help?

ADD REPLY • link 7.2 years ago by Kevin Blighe 89k

0

It is obvious that after sequencing, the reads will not all be the same length, and the strict requirement of equal read length for all reads means there must be some trimming that deletes every read below a set limit. That will definitely have a huge impact on the outcome, since a large share of the reads would go unused.

Also, you mentioned the trimFastq.py script, but there is no such script in the package.

Thanks

ADD REPLY • link 7.2 years ago by bisht20diksha ▴ 30

1

Yes, that is indeed very obvious. So, please take the complaint to the authors of the program. Regarding the missing trimFastq.py, again, that's a further complaint for the authors.

Good luck.

ADD REPLY • link 7.2 years ago by Kevin Blighe 89k

0

Why aren't the files in b2.txt comma-separated?

ADD REPLY • link 5.9 years ago by Ömer An ▴ 270

1

They are now, Sire.

ADD REPLY • link 5.9 years ago by Kevin Blighe 89k

score 0 · Answer 2 · 2019-12-24

0

5.3 years ago

Ömer An ▴ 270

I suggest trying the rMATS pipeline on the CSI NGS Portal to analyse your RNA-Seq data.

You don't have to worry about read length this way, as it is calculated automatically.

ADD COMMENT • link 5.3 years ago by Ömer An ▴ 270