Entering edit mode
4.2 years ago
bertb
▴
20
Hello,
I am trying to check the strandedness of my RNAseq files using the how_are_we_stranded_here tool. But when I enter the command:
check_strandedness --gtf $RNA_HOME/refs/sacCer3.ncbiRefSeq.gtf --transcripts $RNA_HOME/refs/sacCer3_transcripts.fa --reads_1 $RNA_DATA_DIR/GS_UM_H_S19_L008_R1_001.fastq.gz --reads_2 $RNA_DATA_DIR/GS_UM_H_S19_L008_R2_001.fastq.gz
everything seems to process properly until it begins checking strandedness (see output below)
Results stored in: stranded_test_GS_UM_H_S19_L008_R1_001
converting gtf to bed
generating kallisto index
[build] loading fasta file /home/bioinformatics/workspace/rnaseq/refs/sacCer3_transcripts.fa
[build] k-mer length: 31
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 11222 contigs and contains 8216987 k-mers
creating fastq files with first 200000 reads
quantifying with kallisto
[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 6,125
[index] number of k-mers: 8,216,987
[index] number of equivalence classes: 7,443
Warning: 6125 transcripts were defined in GTF file, but not in the index
[quant] running in paired-end mode
[quant] will process pair 1: stranded_test_GS_UM_H_S19_L008_R1_001/GS_UM_H_S19_L008_R1_001_sample.fq
stranded_test_GS_UM_H_S19_L008_R1_001/GS_UM_H_S19_L008_R2_001_sample.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 200,000 reads, 175,255 reads pseudoaligned
[quant] estimated average fragment length: 172.246
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 565 rounds
[ bam] writing pseudoalignments to BAM format .. Segmentation fault (core dumped)
checking strandedness
stranded_test_GS_UM_H_S19_L008_R1_001/kallisto_strand_test/pseudoalignments.bam does NOT exists.
Traceback (most recent call last):
File "/home/bioinformatics/miniconda3/bin/check_strandedness", line 8, in <module>
sys.exit(main())
File "/home/bioinformatics/miniconda3/lib/python3.8/site-packages/how_are_we_stranded_here/check_strandedness.py", line 151, in main
result = pd.read_csv(test_folder + '/' + 'strandedness_check.txt', sep="\n", header=None)
File "/home/bioinformatics/miniconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/bioinformatics/miniconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 452, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/bioinformatics/miniconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 936, in __init__
self._make_engine(self.engine)
File "/home/bioinformatics/miniconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 1168, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/bioinformatics/miniconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 1998, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Thank you
You should mention that you're running an Xubuntu VM. I guess the error happens in the BAM writing stage before checking strandedness - there's a segfault happening, which means you're running out of resources (probably RAM). What are the memory footprint benchmarks for this tool that you're using?
Thank you for your reply RamRS.
The Xubuntu VM was set up with 12 GB of RAM, so perhaps this is not enough. If this is the case, is there any way around this?
I will make sure to mention I am using a Xubuntu VM in future posts.
12G is not much RAM. Can you spare 32G?
Unfortunately our current system only runs on 20G, with 12G devoted to the VM. Maybe on the next upgrade. Thanks again