Question

input file in rmats


6 months ago

Lambodarswain316 ▴ 10

I am currently engaged in transcriptomics analysis within flowering plants, specifically focusing on detecting alternative splicing patterns. For this purpose, I am utilizing the rMATS software after ensuring all necessary dependencies are properly addressed. However, I am encountering challenges with organizing the input files, particularly with respect to arranging the BAM files in the accompanying text file.

As part of my analysis, I possess BAM files corresponding to different species along with their respective GTF files. My primary concern lies in structuring the input file, particularly regarding the appropriate arrangement of BAM files within a text file. Moreover, I seek clarification regarding the utilization of the "b1" and "b2" parameters during runtime, especially when dealing with a single species and its associated GTF file.

I would greatly appreciate guidance on formulating the correct input file format to seamlessly integrate multiple species' data during the execution of the rMATS software.

rmats • 493 views

updated 6 months ago by Mathew ▴ 180 • written 6 months ago by Lambodarswain316 ▴ 10

score 0 · Answer 1 · 2024-05-12

Hi,

In the example they give in their GitHub (https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/README.md):

They have two sample groups with two BAM files per group. They create txt files that pass this grouping of inputs to rMATS. Within each txt file, the replicates are separated by a comma (,).

Name of txt file: /path/to/b1.txt

Contents of txt file: /path/to/1_1.bam,/path/to/1_2.bam

and

Name of txt file: /path/to/b2.txt

Contents of txt file: /path/to/2_1.bam,/path/to/2_2.bam

I am not sure whether they actually require you to name the txt file with the actual path. You can try just naming them b1 and b2, and giving the full path when you run rMATS. Either way, you can use this next step to get the path:

So, here you will make two text files. First, create a txt file called "b1". Then, make a second txt file called "b2". As an example, I'll create two txt files with these names in a new folder called "flowering_plant_transcriptomics" in the Documents folder of my computer. Now, if you right-click on one of the txt files and go to "Properties", you can find the location of this file on your computer.


Now, copy this path; it is what you will give to rMATS as the location of the txt file. Mine would be C/Users/mpeko/Documents/flowering_plant_transcriptomics/b1.txt. Repeat this process for your b2.

Within each txt file, you will similarly give the paths of your associated BAM files, with a "," to separate them.
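If you would rather not build these files by hand, here is a minimal Python sketch of the same step. It only assumes the format described above (one line, BAM paths separated by commas); the function name write_group_file and the BAM file names are made up for illustration, and I have not run this against rMATS itself.

```python
from pathlib import Path


def write_group_file(bam_paths, txt_path):
    """Write one group file: a single line of comma-separated BAM paths.

    write_group_file is a hypothetical helper, not part of rMATS.
    """
    txt_path = Path(txt_path)
    # Turn every BAM path into an absolute path so the txt file
    # still points at the right files from any working directory.
    line = ",".join(str(Path(p).resolve()) for p in bam_paths)
    txt_path.write_text(line + "\n")
    # Return the absolute location of the txt file, i.e. the value
    # you would later give to --b1 or --b2.
    return txt_path.resolve()
```

For example, write_group_file(["1_1.bam", "1_2.bam"], "b1.txt") would create b1.txt in the current folder and return its full path, and a second call with your other group's BAM files would create b2.txt.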

You will use these txt files as your input for rMATS:

python rmats.py --b1 /path/to/b1.txt --b2 /path/to/b2.txt --gtf /path/to/the.gtf -t paired --readLength 50 --nthread 4 --od /path/to/output --tmp /path/to/tmp_output
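Since you have several species, each with its own GTF, one way to keep things organized is to build one such command per species. The sketch below is only that, a sketch: it assumes each species is analysed in its own separate run, the function name build_rmats_command, the species names and all paths are placeholders I made up, and the flags are simply the ones from the command above. It prints the commands rather than running them.

```python
import shlex


def build_rmats_command(b1_txt, b2_txt, gtf, out_dir, tmp_dir,
                        read_type="paired", read_length=50, threads=4):
    """Return the command above as a list of arguments (hypothetical helper)."""
    return [
        "python", "rmats.py",
        "--b1", str(b1_txt),
        "--b2", str(b2_txt),
        "--gtf", str(gtf),
        "-t", read_type,
        "--readLength", str(read_length),
        "--nthread", str(threads),
        "--od", str(out_dir),
        "--tmp", str(tmp_dir),
    ]


# Hypothetical layout: one folder per species holding its own
# b1.txt, b2.txt and GTF. Replace these with your real paths.
species_dirs = {
    "species_A": "/path/to/species_A",
    "species_B": "/path/to/species_B",
}

for name, folder in species_dirs.items():
    cmd = build_rmats_command(
        b1_txt=f"{folder}/b1.txt",
        b2_txt=f"{folder}/b2.txt",
        gtf=f"{folder}/the.gtf",
        out_dir=f"{folder}/output",
        tmp_dir=f"{folder}/tmp_output",
    )
    # Print a copy-pasteable command line for this species.
    print(name, "->", shlex.join(cmd))
```

Adjust -t and --readLength to match your own libraries; the values here are just the ones from the README example.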

From https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/README.md#all-arguments to answer your next question about b1 and b2 parameters:

b1 (which is the first txt file we created):

A text file containing a comma separated list of the BAM files for sample_1. (Only if using BAM)

b2 (which is the second txt file we created):

A text file containing a comma separated list of the BAM files for sample_2. (Only if using BAM)
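Before launching a long run, it can help to check that every path listed in b1.txt and b2.txt actually points at a file. Here is a small sketch of such a check; it only assumes the comma-separated format quoted above, and the helper names read_group_file and missing_bams are my own, not something rMATS provides.

```python
from pathlib import Path


def read_group_file(txt_path):
    """Split the contents of a group file into a list of BAM paths."""
    text = Path(txt_path).read_text().strip()
    # Ignore empty pieces, e.g. from a trailing comma or blank file.
    return [p.strip() for p in text.split(",") if p.strip()]


def missing_bams(txt_path):
    """Return the listed BAM paths that do not exist on disk."""
    return [p for p in read_group_file(txt_path) if not Path(p).is_file()]
```

If missing_bams("/path/to/b1.txt") returns an empty list, every BAM listed in that file was found; otherwise it returns the paths that need fixing.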

The README linked above has several tips from the package developers, and they also provide a sample data set that might be helpful to practice on before you start your actual analysis: https://sourceforge.net/projects/rnaseq-mats/files/MATS/testData.tgz/download.