Question

RNASeq_pipeline.config.sh of HISAT

7.0 years ago

mlemusfuentes ▴ 20

Hello everyone, I would like to ask about the script provided with the HISAT tutorial (Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown). The script is named rnaseq_pipeline.config.sh, and its final part is where the samples have to be added to the pipeline. The following appears there:

    ## list of samples
    ## (only paired reads, must follow _1.*/_2.* file naming convention)
    reads1=(${FASTQLOC}/*_1.*)
    reads1=("${reads1[@]##*/}")
    reads2=("${reads1[@]/_1./_2.}")

My question:

How do I add the sample information at this point? I do not understand whether I have to enter each sample individually or whether it is enough to use only one of them.

Thank you

RNA-Seq

7.0 years ago by mlemusfuentes ▴ 20

    ## Configuration file for rnaseq_pipeline.sh
    ## Place this script in a working directory and edit it accordingly.
    ## The default configuration assumes that the user unpacked the
    ## chrX_data.tar.gz file in the current directory, so all the input
    ## files can be found in a ./chrX_data sub-directory.

    ## how many CPUs to use on the current machine?
    NUMCPUS=4

    ## Program paths
    ## optional BINDIR, using it here because these programs are installed in a common directory
    BINDIR=/bin
    HISAT2=$BINDIR/hisat2
    STRINGTIE=$BINDIR/stringtie
    SAMTOOLS=$BINDIR/samtools

    ## if these programs are not in any PATH directories, please edit accordingly
    ## (assign the path itself: HISAT2=$(/bin/hisat2) would run hisat2 and store its output)
    HISAT2=/bin/hisat2
    STRINGTIE=/bin/stringtie
    SAMTOOLS=/bin/samtools_0.1.18
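`$(...)` is command substitution: a line like `X=$(/bin/hisat2)` runs hisat2 and stores whatever it prints to standard output, whereas `X=$(command -v hisat2)` stores the location of the hisat2 found in `PATH`. A minimal sketch for checking the program paths before starting the pipeline, assuming bash; `check_exe` is a hypothetical helper, not part of the tutorial scripts:

```bash
#!/usr/bin/env bash
# Sketch: confirm that each configured tool path points to an executable file.
# check_exe is a hypothetical helper, not part of the tutorial scripts.
set -euo pipefail

check_exe() {
  local name=$1 path=$2
  if [[ -n "$path" && -x "$path" && ! -d "$path" ]]; then
    echo "$name: $path"
  else
    echo "$name: not found or not executable: '$path'" >&2
    return 1
  fi
}

# command -v prints where a program is found in PATH (nothing if absent);
# unlike $(/bin/hisat2), it does not run the program.
HISAT2=$(command -v hisat2 || true)
STRINGTIE=$(command -v stringtie || true)
SAMTOOLS=$(command -v samtools || true)

status=0
check_exe hisat2 "$HISAT2" || status=1
check_exe stringtie "$STRINGTIE" || status=1
check_exe samtools "$SAMTOOLS" || status=1
if (( status != 0 )); then
  echo "edit the program paths in the config before running the pipeline" >&2
fi
```

If the tools live outside `PATH`, replace the three `command -v` lines with the explicit paths from the config.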

    ## File paths for input data
    ## Full absolute paths are strongly recommended here.
    ## Warning: if using relative paths here, these will be interpreted
    ## relative to the chosen output directory (which is generally the
    ## working directory where this script is, unless the optional <output_dir>
    ## parameter is provided to the main pipeline script)

    ## Optional base directory, if most of the input files have a common path
    BASEDIR="/home/lgts/workspace/rnaseq"

    ## note the "/" after $BASEDIR: "$BASEDIRdata" would expand an unset variable named BASEDIRdata
    FASTQLOC="$BASEDIR/data/samples/KG-1"
    GENOMEIDX="$BASEDIR/data/hisat/indexes/genome"
    GTFFILE="$BASEDIR/data/hisat/genes/hg19_genes.gtf"
    PHENODATA="$BASEDIR/geuvadis_phenodata.csv"

    TEMPLOC="./tmp" # this will be relative to the output directory
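A wrong input path here usually only shows up once HISAT2 or StringTie fails, so it can help to check the paths first. A rough sketch in bash, assuming the config is saved as `rnaseq_pipeline.config.sh` in the current directory; `check_inputs` is hypothetical, and the index test relies on hisat2-build naming its files `<prefix>.1.ht2` (or `.1.ht2l` for large indexes):

```bash
#!/usr/bin/env bash
# Sketch: check the input paths from the config before a long run.
# check_inputs is a hypothetical helper; it reads FASTQLOC, GENOMEIDX,
# GTFFILE and PHENODATA, e.g. after sourcing the config file.

check_inputs() {
  local ok=0
  [[ -d "$FASTQLOC" ]] || { echo "FASTQLOC is not a directory: $FASTQLOC" >&2; ok=1; }
  # GENOMEIDX is an index prefix: hisat2-build writes <prefix>.1.ht2 ... <prefix>.8.ht2
  # (.ht2l for large indexes), so look for the first file of the set
  compgen -G "${GENOMEIDX}.1.ht2*" > /dev/null ||
    { echo "no HISAT2 index found for prefix: $GENOMEIDX" >&2; ok=1; }
  [[ -f "$GTFFILE" ]] || { echo "GTF file not found: $GTFFILE" >&2; ok=1; }
  [[ -f "$PHENODATA" ]] || { echo "phenotype file not found: $PHENODATA" >&2; ok=1; }
  return $ok
}

# hypothetical usage: run from the directory that holds the config
if [[ -f ./rnaseq_pipeline.config.sh ]]; then
  source ./rnaseq_pipeline.config.sh
  check_inputs && echo "all input paths found"
fi
```

Every missing item is reported before returning, so a typo in one path does not hide problems in the others.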

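    ## list of samples
    ## (only paired reads, must follow _1.*/_2.* file naming convention)
    # every file in $FASTQLOC whose name contains "_1." (e.g. sample_1.fastq.gz)
    reads1=(${FASTQLOC}/*_1.*)
    # strip the directory part, keeping only the file names
    reads1=("${reads1[@]##*/}")
    # derive the mate file names by replacing "_1." with "_2."
    reads2=("${reads1[@]/_1./_2.}")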

This is the configuration script.
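The sample-list lines at the end of the config do not need to be edited per sample: the glob `${FASTQLOC}/*_1.*` collects every first-mate file in `FASTQLOC`, and the other two lines derive the bare file names and the matching `_2.` mates. A small demo in bash, using hypothetical file names in a temporary directory:

```bash
#!/usr/bin/env bash
# Demo: the sample list is built by a glob, not typed in by hand.
# The directory and file names are hypothetical.
set -euo pipefail

FASTQLOC=$(mktemp -d)
touch "$FASTQLOC"/sampleA_{1,2}.fastq.gz "$FASTQLOC"/sampleB_{1,2}.fastq.gz

reads1=(${FASTQLOC}/*_1.*)        # full paths of all first-mate files
reads1=("${reads1[@]##*/}")       # strip the directory, keep the file names
reads2=("${reads1[@]/_1./_2.}")   # second-mate names: first "_1." -> "_2."

for i in "${!reads1[@]}"; do
  echo "${reads1[$i]} ${reads2[$i]}"
done

rm -r "$FASTQLOC"
```

This prints `sampleA_1.fastq.gz sampleA_2.fastq.gz` and `sampleB_1.fastq.gz sampleB_2.fastq.gz`, one pair per line, so adding samples only means placing correctly named files in `FASTQLOC`.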

updated 7.0 years ago by GenoMax 153k • written 7.0 years ago by mlemusfuentes ▴ 20

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

7.0 years ago by GenoMax 153k

samtools v.0.1.18 is an ancient version. There is no reason to use it. You should upgrade to the latest samtools.

7.0 years ago by GenoMax 153k

score 0 · Answer 1 · 2018-08-23

> How do I add the sample information at this point? I do not understand whether I have to enter each sample individually or whether it is enough to use only one of them.

How could we know? We haven't seen the script, and you didn't tell us where it comes from. My guess is that either you have to place the script in the same folder as your fastq files, or you have to pass the folder to the script as a command-line argument.
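Whichever way the folder is supplied, the script only sees files matching `*_1.*` and assumes each one has a `_2.` partner. A sketch of a pre-flight check in bash that rebuilds the list the same way and reports missing mates; `check_pairs` and the script name in the usage line are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: rebuild the sample list the same way the config does and confirm that
# every *_1.* file has its *_2.* mate. check_pairs is a hypothetical helper.
shopt -s nullglob   # a pattern with no match expands to nothing, not to itself

check_pairs() {
  local dir=$1 missing=0 i
  local -a r1 r2
  r1=("$dir"/*_1.*)            # same pattern as reads1=(${FASTQLOC}/*_1.*)
  r1=("${r1[@]##*/}")          # file names only
  r2=("${r1[@]/_1./_2.}")      # expected mate names
  if (( ${#r1[@]} == 0 )); then
    echo "no *_1.* files in $dir: check the file naming convention" >&2
    return 1
  fi
  for i in "${!r1[@]}"; do
    if [[ -f "$dir/${r2[$i]}" ]]; then
      echo "pair: ${r1[$i]} ${r2[$i]}"
    else
      echo "missing mate for ${r1[$i]}: expected ${r2[$i]}" >&2
      missing=$((missing + 1))
    fi
  done
  (( missing == 0 ))
}

# usage (saved as, e.g., check_pairs.sh): bash check_pairs.sh /path/to/fastq_dir
if [[ $# -ge 1 ]]; then
  check_pairs "$1"
fi
```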

How are your fastq files named? The script hard-codes a fastq naming convention (_1.* / _2.*), so files with regular Illumina names (e.g. something_R1_001.fastq.gz) would not be picked up.
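If the files do have standard Illumina names, one option that leaves the originals untouched is to create symlinks that follow the `_1.*`/`_2.*` convention and point `FASTQLOC` at that directory. A sketch, assuming bash and names ending in `_R1_001.fastq.gz`/`_R2_001.fastq.gz`; `link_illumina_pairs` and both directories are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: expose Illumina-style names (<name>_R1_001.fastq.gz / <name>_R2_001.fastq.gz)
# as <name>_1.fastq.gz / <name>_2.fastq.gz symlinks, the pattern the config's glob expects.
# link_illumina_pairs and both directories are hypothetical.
shopt -s nullglob   # skip the loop entirely if no R1 file matches

link_illumina_pairs() {
  local src=$1 dest=$2 abs f base name
  abs=$(cd "$src" && pwd -P) || return 1   # absolute path, so links work from anywhere
  mkdir -p "$dest" || return 1
  for f in "$abs"/*_R1_001.fastq.gz; do
    base=${f##*/}                   # e.g. sampleA_S1_L001_R1_001.fastq.gz
    name=${base%_R1_001.fastq.gz}   # e.g. sampleA_S1_L001
    if [[ -f "$abs/${name}_R2_001.fastq.gz" ]]; then
      ln -sf "$abs/$base" "$dest/${name}_1.fastq.gz"
      ln -sf "$abs/${name}_R2_001.fastq.gz" "$dest/${name}_2.fastq.gz"
    else
      echo "no R2 mate for $base, skipped" >&2
    fi
  done
}

# usage (saved as, e.g., link_pairs.sh): bash link_pairs.sh /path/to/raw_fastq /path/to/linked_fastq
if [[ $# -eq 2 ]]; then
  link_illumina_pairs "$1" "$2"
fi
```

With multi-lane runs, each lane's pair becomes a separate entry in reads1/reads2 under this scheme.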