rMATS Run Produces Only the Header Row in Each Output File
14 months ago
Y ▴ 10

Using Singularity, I pulled the mcfonsecalab/rmats Docker image and ran it with the script below on zebrafish data (paths are removed due to the rules of the HPC I use).

The variables:

  • control_bams_text_file="/path/to/reference/file/containing/3/control/bams/separated/by/comma/no/spaces"
  • experimental_bams_text_file="/path/to/reference/file/containing/3/experimental/bams/separated/by/comma/no/spaces"
  • gtf_file_path="/path/to/RefSeq/gtf/file.gtf"
  • output_directory=/path/to/output/directory
  • singularity_image=/path/to/singularity/image
  • RMATS_SCRIPT="/rmats-turbo/rmats.py" (the path defined in the mcfonsecalab Docker image)
  • readLength=50 - though this number may vary depending on needs
  • nthread=4 - though this is ideally set to at least 10 when running on more samples
  • tmp_directory is made as a subdirectory of the output directory

Script:

singularity exec "${singularity_image}" python "${RMATS_SCRIPT}" \
    --b1 "${control_bams_text_file}" \
    --b2 "${experimental_bams_text_file}" \
    --gtf "${gtf_file_path}" \
    -t paired \
    --readLength "${readLength}" \
    --nthread "${nthread}" \
    --od "${output_directory}" \
    --tmp "${tmp_directory}"

Yet the run reports no events: the output files are empty except for the column names.

Singularity rMats • 572 views
14 months ago
Y ▴ 10

In case the problem was in how I made the reference files for the experimental and control samples, here is what I did (with paths removed due to HPC rules):

# Specify the directory path
directory_path="/the/bam/folder/path/containing/3/controls/3/experimental/bams"

# Output file name
output_file="Experiment_bam_file_paths.txt"

# Use find to locate the files and concatenate their paths with commas
find "$directory_path" -type f -name "Experimental*" | tr '\n' ',' > "$output_file"

I then ran a modified version of the above for the controls, but this did not resolve the issue. The control BAMs are in the same directory as the experimental BAMs, and this is the modified code (with paths removed due to HPC rules):

# Specify the directory path
directory_path="/the/bam/folder/path/containing/3/controls/3/experimental/bams"

# Output file name
output_file="Control_bam_file_paths.txt"

# Use find to locate the files and concatenate their paths with commas
find "$directory_path" -type f -name "Control*" | tr '\n' ',' > "$output_file"
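
A side note on the two `find | tr` commands above: they leave a trailing comma at the end of the list, and the `Control*` / `Experimental*` patterns would also match non-BAM files with the same prefix (for example `.bai` indexes). Whether rMATS tolerates either is not established in this thread. The following is a minimal sketch of a variant that avoids both and then checks that every listed path exists. The function names `make_bam_list` and `check_bam_list` are hypothetical, and the `.bam` extension is an assumption about the file names.

```shell
# Hypothetical helper: write a comma-separated list of BAM paths
# with no trailing comma.
# Arguments: directory, file-name prefix, output file.
make_bam_list() {
    # Match only names ending in .bam (assumed extension), sort for a
    # stable order, then join the lines with commas using paste.
    find "$1" -type f -name "${2}*.bam" | sort | paste -sd, - > "$3"
}

# Hypothetical helper: report every listed path that is not a regular file.
# Returns 0 if all paths exist, 1 otherwise.
check_bam_list() {
    tr ',' '\n' < "$1" | (
        missing=0
        while IFS= read -r path; do
            # Skip empty entries such as the one after a trailing comma
            [ -z "$path" ] && continue
            if [ ! -f "$path" ]; then
                echo "missing: $path" >&2
                missing=1
            fi
        done
        exit "$missing"
    )
}
```

For example, `make_bam_list "$directory_path" "Control" "Control_bam_file_paths.txt"` followed by `check_bam_list "Control_bam_file_paths.txt"` would rebuild and verify the control list.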

Then I used these files in the script from the initial post. The job's output log shows no error, but when I cat A5SS.MATS.JCEC.txt I only get the following (which seems to be the column names):

ID  GeneID  geneSymbol  chr strand  longExonStart_0base longExonEnd shortES shortEE flankingES  flankingEE  ID  IJC_SAMPLE_1    SJC_SAMPLE_1    IJC_SAMPLE_2    SJC_SAMPLE_2    IncFormLen  SkipFormLen PValue  FDR IncLevel1   IncLevel2   IncLevelDifference
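
To see at a glance whether every event type is affected, the data rows in each result file can be counted. This is a minimal sketch that assumes the result files follow the `*.MATS.*.txt` naming seen above; `count_event_rows` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<file name><TAB><data rows>" for each
# *.MATS.*.txt file in the directory given as the first argument.
count_event_rows() {
    for f in "$1"/*.MATS.*.txt; do
        # Skip the unexpanded glob when no file matches
        [ -f "$f" ] || continue
        # Data rows = total lines minus the header line
        rows=$(awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$f")
        printf '%s\t%s\n' "$(basename "$f")" "$rows"
    done
}
```

Calling `count_event_rows "${output_directory}"` prints one line per file; a count of 0 everywhere means only headers were written. The same count applied to the `fromGTF.*.txt` files in the output directory might show whether any events were detected from the annotation at all.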

This suggests to me that the issue is not with the way the reference files listing the control and experimental BAM paths were made. There are 3 control and 3 experimental BAMs. But it does not tell me why rMATS runs without errors yet produces output files containing nothing beyond the first row of column names.
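
One possible explanation that this thread does not confirm is a mismatch between the sequence names in the GTF and those in the BAM headers (for example RefSeq-style accessions in one and `chr1`-style names in the other). The following is a minimal sketch of a check for that. The helper names are hypothetical, and it assumes the BAM header has first been saved as text, for instance with `samtools view -H sample.bam > header.txt` if samtools is available.

```shell
# Hypothetical helper: unique sequence names from column 1 of a GTF,
# skipping comment lines that start with '#'.
gtf_seqnames() {
    awk -F'\t' '!/^#/ { print $1 }' "$1" | sort -u
}

# Hypothetical helper: sequence names from the @SQ lines of a
# SAM/BAM header that was saved as a text file.
header_seqnames() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" | sort -u
}

# Print the names present in both files.
# Arguments: GTF file, header text file.
shared_seqnames() {
    tmp=$(mktemp -d)
    gtf_seqnames "$1" > "$tmp/gtf.txt"
    header_seqnames "$2" > "$tmp/bam.txt"
    # comm -12 keeps only the lines common to both sorted inputs
    comm -12 "$tmp/gtf.txt" "$tmp/bam.txt"
    rm -rf "$tmp"
}
```

If `shared_seqnames "${gtf_file_path}" header.txt` prints nothing, the annotation and the alignments share no sequence names, which would be worth ruling out before looking elsewhere.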
