Question

featureCounts segmentation fault

0

Entering edit mode

7 months ago

Jaber ▴ 30

Hi, community,

I am trying to run a bash file with the following, and everything works fine except I keep getting segmentation faults, Can anyone help me solve this?

Below is the bash script:

 #!/bin/bash

SECONDS=0

# Change working directory

cd "/mnt/e/Genomic learning/pipeline/RNA_Seq_pipeline" 

# STEP 1: Run FastQC

 fastqc Data/demo.fastq -o Data/

# Run Trimmomatic to trim reads with poor quality

 java -jar /mnt/f/Trimmomatic-0.39/trimmomatic-0.39.jar SE -threads 4 
 /mnt/e/Genomic/learning/pipeline/RNA_Seq_pipeline/data/demo.fastq 
 /mnt/e/Genomic/learning/pipeline/RNA_Seq_pipeline/data/demo_trimmed.fastq TRAILING:10 -phred33
echo "Trimmomatic finished running!"

fastqc Data/demo_trimmed.fastq -o Data/

# STEP 2: Run HISAT2

# mkdir HISAT2
# Get the genome indices

wget https://genome-idx.s3.amazonaws.com/hisat/grch38_genome.tar.gz

# Run alignment

 echo "HISAT2 started running!"
 hisat2 -q --rna-strandness R -x HISAT2/grch38/genome -U Data/demo_trimmed.fastq | samtools sort -o HISAT2/demo_trimmed.bam
 echo "HISAT2 finished running!"

# STEP 3: Run featureCounts - Quantification

# Get GTF

wget http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz

 echo "featureCounts started running!"
 featureCounts -s 2 -a ../hg38/Homo_sapiens.GRCh38.106.gtf -o quants/demo_featurecounts.txt 
 HISAT2/demo_trimmed.bam
 echo "featureCounts finished running!"

 duration=$SECONDS
 echo "$((duration / 60)) minutes and $((duration % 60)) seconds elapsed."

Keeping in mind I have correct directories and I evaluated the HISAT bam file with samtools and no errors appeared.

The error: "Segmentation fault" after running the feature counts, and everything else runs smoothly

thanks

featureCounts pipeline RNA-Seq • 1.6k views

ADD COMMENT • link 7 months ago by Jaber ▴ 30

0

Entering edit mode

Thank you, I already extracted manually, as I am using WSL, and directories are correct, and I still get the error saying "Segmentation fault"

ADD REPLY • link 7 months ago by Jaber ▴ 30

0

Entering edit mode

Can you try running each part/command manually on the command line (instead of the bash script) to see if there might be an error/warning that gets missed in one of the previous steps which could effect the featureCounts run?

ADD REPLY • link 7 months ago by DGTool ▴ 290

0

Entering edit mode

Please show relevant code. This is almost certainly some issue with the GTF.

ADD REPLY • link 7 months ago by ATpoint 88k

0

Entering edit mode

I tried running htseq-counts and it worked, confirming the integrity of the bam file and the GTF file, featureCounts still yield the same thing "Segmentation fault"

And I've run each command separately and everything was seamless, except the featureCounts command.

 /mnt/e/Genomic learning/pipeline/RNA_Seq_pipeline$ htseq-count -f bam -r pos -s reverse -i gene_id HISAT2/demo_trimmed_sorted.bam ./hg38/Homo_sapiens.GRCh38.106.gtf > demo_counts.txt
# output
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2300000 GFF lines processed.
2400000 GFF lines processed.
2500000 GFF lines processed.
2600000 GFF lines processed.
2700000 GFF lines processed.
2800000 GFF lines processed.
2900000 GFF lines processed.
3000000 GFF lines processed.
3100000 GFF lines processed.
3200000 GFF lines processed.
3279405 GFF lines processed.
100000 alignment records processed.
200000 alignment records processed.
300000 alignment records processed.
400000 alignment records processed.
500000 alignment records processed.
600000 alignment records processed.
700000 alignment records processed.
800000 alignment records processed.
900000 alignment records processed.
1000000 alignment records processed.
1100000 alignment records processed.
1200000 alignment records processed.
1300000 alignment records processed.
1400000 alignment records processed.
1430243 alignment records processed.

ADD REPLY • link 7 months ago by Jaber ▴ 30

0

Entering edit mode

/mnt/e/Genomic learning/pipeline/RNA_Seq_pipeline$ featureCounts -s 2 -a ./hg38/Homo_sapiens.GRCh38.106.gtf -o quants/demo_featurecounts.txt HISAT2/demo_trimmed_sorted.bam

#Output

Segmentation fault

ADD REPLY • link updated 7 months ago by GenoMax 151k • written 7 months ago by Jaber ▴ 30

0

Entering edit mode

How did you obtain the featureCounts executable? Does it work ok with:

featureCounts  --help 
featureCounts  -v

Also: which version are you using? On which system?

ADD REPLY • link 7 months ago by Darked89 4.7k

0

Entering edit mode

it works pretty well for the commands you've mentioned,

I am using WSL (ubuntu)

here is the command

featureCounts

output

featureCounts

Version 2.0.6

Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...

## Mandatory arguments:

  -a <string>         Name of an annotation file. GTF/GFF format by default. See
                      -F option for more format information. Inbuilt annotations
                      (SAF format) is available in 'annotation' directory of the
                      package. Gzipped file is also accepted.

  -o <string>         Name of output file including read counts. A separate file
                      including summary statistics of counting results is also
                      included in the output ('<string>.summary'). Both files
                      are in tab delimited format.

  input_file1 [input_file2] ...   A list of SAM or BAM format files. They can be
                      either name or location sorted. If no files provided,
                      <stdin> input is expected. Location-sorted paired-end reads
                      are automatically sorted by read names.

## Optional arguments:
# Annotation

  -F <string>         Specify format of the provided annotation file. Acceptable
                      formats include 'GTF' (or compatible GFF format) and
                      'SAF'. 'GTF' by default.  For SAF format, please refer to
                      Users Guide.

  -t <string>         Specify feature type(s) in a GTF annotation. If multiple
                      types are provided, they should be separated by ',' with
                      no space in between. 'exon' by default. Rows in the
                      annotation with a matched feature will be extracted and
                      used for read mapping.

  -g <string>         Specify attribute type in GTF annotation. 'gene_id' by
                      default. Meta-features used for read counting will be
                      extracted from annotation using the provided value.

  --extraAttributes   Extract extra attribute types from the provided GTF
                      annotation and include them in the counting output. These
                      attribute types will not be used to group features. If
                      more than one attribute type is provided they should be
                      separated by comma.

  -A <string>         Provide a chromosome name alias file to match chr names in
                      annotation with those in the reads. This should be a two-
                      column comma-delimited text file. Its first column should
                      include chr names in the annotation and its second column
                      should include chr names in the reads. Chr names are case
                      sensitive. No column header should be included in the
                      file.

# Level of summarization

  -f                  Perform read counting at feature level (eg. counting
                      reads for exons rather than genes).

# Overlap between reads and features

  -O                  Assign reads to all their overlapping meta-features (or
                      features if -f is specified).

  --minOverlap <int>  Minimum number of overlapping bases in a read that is
                      required for read assignment. 1 by default. Number of
                      overlapping bases is counted from both reads if paired
                      end. If a negative value is provided, then a gap of up
                      to specified size will be allowed between read and the
                      feature that the read is assigned to.

  --fracOverlap <float> Minimum fraction of overlapping bases in a read that is
                      required for read assignment. Value should be within range
                      [0,1]. 0 by default. Number of overlapping bases is
                      counted from both reads if paired end. Both this option
                      and '--minOverlap' option need to be satisfied for read
                      assignment.

  --fracOverlapFeature <float> Minimum fraction of overlapping bases in a
                      feature that is required for read assignment. Value
                      should be within range [0,1]. 0 by default.

  --largestOverlap    Assign reads to a meta-feature/feature that has the
                      largest number of overlapping bases.

  --nonOverlap <int>  Maximum number of non-overlapping bases in a read (or a
                      read pair) that is allowed when being assigned to a
                      feature. No limit is set by default.

  --nonOverlapFeature <int> Maximum number of non-overlapping bases in a feature
                      that is allowed in read assignment. No limit is set by
                      default.

  --readExtension5 <int> Reads are extended upstream by <int> bases from their
                      5' end.

  --readExtension3 <int> Reads are extended upstream by <int> bases from their
                      3' end.

  --read2pos <5:3>    Reduce reads to their 5' most base or 3' most base. Read
                      counting is then performed based on the single base the
                      read is reduced to.

# Multi-mapping reads

  -M                  Multi-mapping reads will also be counted. For a multi-
                      mapping read, all its reported alignments will be
                      counted. The 'NH' tag in BAM/SAM input is used to detect
                      multi-mapping reads.

# Fractional counting

  --fraction          Assign fractional counts to features. This option must
                      be used together with '-M' or '-O' or both. When '-M' is
                      specified, each reported alignment from a multi-mapping
                      read (identified via 'NH' tag) will carry a fractional
                      count of 1/x, instead of 1 (one), where x is the total
                      number of alignments reported for the same read. When '-O'
                      is specified, each overlapping feature will receive a
                      fractional count of 1/y, where y is the total number of
                      features overlapping with the read. When both '-M' and
                      '-O' are specified, each alignment will carry a fractional
                      count of 1/(x*y).

# Read filtering

  -Q <int>            The minimum mapping quality score a read must satisfy in
                      order to be counted. For paired-end reads, at least one
                      end should satisfy this criteria. 0 by default.

  --splitOnly         Count split alignments only (ie. alignments with CIGAR
                      string containing 'N'). An example of split alignments is
                      exon-spanning reads in RNA-seq data.

  --nonSplitOnly      If specified, only non-split alignments (CIGAR strings do
                      not contain letter 'N') will be counted. All the other
                      alignments will be ignored.

  --primary           Count primary alignments only. Primary alignments are
                      identified using bit 0x100 in SAM/BAM FLAG field.

  --ignoreDup         Ignore duplicate reads in read counting. Duplicate reads
                      are identified using bit Ox400 in BAM/SAM FLAG field. The
                      whole read pair is ignored if one of the reads is a
                      duplicate read for paired end data.

# Strandness

  -s <int or string>  Perform strand-specific read counting. A single integer
                      value (applied to all input files) or a string of comma-
                      separated values (applied to each corresponding input
                      file) should be provided. Possible values include:
                      0 (unstranded), 1 (stranded) and 2 (reversely stranded).
                      Default value is 0 (ie. unstranded read counting carried
                      out for all input files).

and the rest of the output

I've downloaded the subread from their original website, and I've put the binaries in the PATH

ADD REPLY • link 7 months ago by Jaber ▴ 30

0

Entering edit mode

You may not have enough RAM assigned to your WSL See: https://learn.microsoft.com/en-us/answers/questions/1296124/how-to-increase-memory-and-cpu-limits-for-wsl2-win and check if that helps.

ADD REPLY • link 7 months ago by GenoMax 151k

0

Entering edit mode

Thank you, it helped me, even it didn't work, so I deleted the subread and reinstalled it using this, and it finally worked

sudo apt install subread

ADD REPLY • link 7 months ago by Jaber ▴ 30

0

Entering edit mode

Jaber : Do not use > (quote) for showing code. Instead use 101010 button to format the code portion with a monospaced font. I have done it for you this time.

ADD REPLY • link 7 months ago by GenoMax 151k

score 3 · Answer 1 · 2024-09-30

3

Entering edit mode

7 months ago

Mensur Dlakic ★ 29k

It is very difficult to follow all the lines as your script content are not properly formatted.

It appears that in one command you are downloading the .GTF file into a current directory (wget http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz) and in the next command you are reading the unpacked GTF file located one directory up (featureCounts -s 2 -a ../hg38/Homo_sapiens.GRCh38.106.gtf ...). You may need to unpack the file after the wget command (gunzip Homo_sapiens.GRCh38.106.gtf.gz) and read from the current directory:

featureCounts -s 2 -a hg38/Homo_sapiens.GRCh38.106.gtf ...

ADD COMMENT • link 7 months ago by Mensur Dlakic ★ 29k

0

Entering edit mode

featureCounts will also accept bgzip-ed and tabix indexed GTF. It may be a bit faster than using plain GTF but frankly I have not done a proper benchmarking.