Hi all.
I am trying to align paired-end (PE) reads with STAR. I have trimmed files (used Trimmomatic) that look like *_1.1P.fastq.gz, *_1.1U.fastq.gz, *_2.2P.fastq.gz, and *_2.2U.fastq.gz. I only want to align the *_1.1P.fastq.gz and *_2.2P.fastq.gz files.
Here is my code:
#!/bin/bash
mkdir -p alignments
path='/trimmed/alignments/'
for i in $(ls | egrep '.[1|2]P.fastq.gz' | rev | cut -c 10- | rev | uniq)
do
STAR --genomeDir /indices/human --readFilesIn ${i}.fastq.gz ${i}.fastq.gz --runThreadN 8 --outFileNamePrefix /trimmed/alignments/${i%.fastq.gz}.cd177negtreg_tumor.Out --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat
done
When I run this, STAR seems to be aligning each _1 and _2 read separately instead of treating them as paired end reads. For instance, my output files look like this:
filtered.ABC4454232_1.1P.cd177negtreg_tumor.OutAligned.sortedByCoord.out.bam
filtered.ABC4454232_2.2P.cd177negtreg_tumor.OutAligned.sortedByCoord.out.bam
It should just be outputting one .bam file for this sample, such as:
filtered.ABC4454232.cd177negtreg_tumor.OutAligned.sortedByCoord.out.bam
Any ideas what is wrong? Thanks!
You are instructing STAR to use the same fastq twice for a given value of ${i}. Have you looked at what values i is getting for each iteration of the loop?
On a different note: Are you just running these scripts (or did you write them yourself)?
In general it would be simpler to do something like
ls -1 *_1.1P.fastq.gz | sed 's/_1.1P.fastq.gz//'
to grab the file names.

I found the script online but modified it to try to fit my data.
echo $i spits out values such as these:
Which are the correct values. I guess I do not understand how to tell bash to differentiate between the _1.1P file and its corresponding _2.2P file.
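One way to make bash tell the two mates apart is to loop over only the _1.1P files and derive the matching _2.2P name from each one. The following is a minimal sketch, assuming the Trimmomatic-style names described in the question; the function name list_pairs is made up for illustration. It only prints the sample prefix and the two file names, so the pairing can be checked before any alignment is run.

```shell
#!/bin/bash
# Sketch: print each sample prefix together with its two paired mate files.
# Assumes names of the form <sample>_1.1P.fastq.gz / <sample>_2.2P.fastq.gz.
# list_pairs is a hypothetical helper name, not part of STAR or Trimmomatic.

list_pairs() {
    local dir=${1:-.} r1 r2 sample
    for r1 in "$dir"/*_1.1P.fastq.gz; do
        [ -e "$r1" ] || continue          # the glob matched nothing
        sample=${r1%_1.1P.fastq.gz}       # strip the mate-1 suffix
        r2=${sample}_2.2P.fastq.gz        # name of the corresponding mate-2 file
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            continue
        fi
        # tab-separated: sample name (without directory), mate 1, mate 2
        printf '%s\t%s\t%s\n' "${sample##*/}" "$r1" "$r2"
    done
}

list_pairs .
```

Because the loop never iterates over the _2.2P or unpaired (U) files, each sample shows up exactly once.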
Have you tried basename, which you were using in a different question you asked yesterday, to differentiate between the names?

Yes, I thought of that, but I am not sure how to incorporate basename into my existing code.
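For what it's worth, here is a rough sketch of how basename could slot into the loop. It assumes the same file naming as above and reuses the STAR options and paths from the script in the question unchanged; star_cmd is a made-up helper name. basename with a second argument strips both the directory and the given suffix, which leaves the sample name that both mates share. As written it only echoes the command so it can be inspected first.

```shell
#!/bin/bash
# Sketch: build one STAR command per sample, with both mates on --readFilesIn.
# STAR options and paths are copied from the question and may need adjusting.
# star_cmd is a hypothetical helper; it echoes the command instead of running it.

star_cmd() {
    local r1=$1 outdir=$2 dir sample
    dir=$(dirname "$r1")
    sample=$(basename "$r1" _1.1P.fastq.gz)   # drop directory and mate-1 suffix
    echo STAR --genomeDir /indices/human \
        --readFilesIn "$dir/${sample}_1.1P.fastq.gz" "$dir/${sample}_2.2P.fastq.gz" \
        --runThreadN 8 \
        --outFileNamePrefix "$outdir/${sample}.cd177negtreg_tumor.Out" \
        --outSAMtype BAM SortedByCoordinate \
        --readFilesCommand zcat
}

# Loop over mate-1 files only, so each sample is aligned once.
for r1 in *_1.1P.fastq.gz; do
    [ -e "$r1" ] || continue                  # the glob matched nothing
    star_cmd "$r1" /trimmed/alignments
done
```

Once the printed commands look right, removing the echo inside star_cmd should run STAR for real, giving one BAM per sample rather than one per mate.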