Hello,
I ran STAR with the command:
STAR --runThreadN ${SLURM_NPROCS:-1} \\
--genomeDir \$STAR_HG19_GENOME \\
--readFilesIn ${workingDir}/$Read1 ${workingDir}/$Read2 \\
--outSAMunmapped Within \\
--outReadsUnmapped Fastx \\
--outFileNamePrefix ${workingDir}/${outputFileLoc}/${sample}
Since the output SAM file should include both mapped and unmapped reads (--outSAMunmapped Within), I tried extracting the mapped and unmapped reads separately using these commands:
samtools view -F4 sample.bam > sample.mapped.sam
samtools view -f4 sample.bam > sample.unmapped.sam
Here is a sample of what the output files looked like.
head sample.mapped.sam
NS500127:25:HJM5YBGX2:1:11101:20416:1050 163 chr1 76779521 255 1S41M33S = 76779576 130 NCAGCGTTCCTTTTCCNGCTGGNTNTGCNTNNNNTNAANNAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN #AAAAEEEEEEEEEAE#EEEEE#E#EEE#E####E#EE##EE################################# NH:i:1 HI:i:1 AS:i:102 nM:i:0
NS500127:25:HJM5YBGX2:1:11101:20416:1050 83 chr1 76779576 255 75M = 76779521 -130 GCTACTAAACTGCTTTGGACAACCTGGTACAAAGTGGATACCATTCTCCTACACATACAGGCGGCCCCTNCGAAC E6EEEEAAEEE6EEEE6EEAEE/EEAEEEEEEEAE<EEEE6EEEEEEEEEEEEEEEEEEEEEE/EEA6E#AAAA/ NH:i:1 HI:i:1 AS:i:102 nM:i:0
tail sample.unmapped.sam
NS500127:25:HJM5YBGX2:4:23612:23666:18750 77 * 0 0 * * 0 0 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEA/EE NH:i:0 HI:i:0 AS:i:96 nM:i:0 uT:A:1
NS500127:25:HJM5YBGX2:4:23612:23666:18750 141 * 0 0 * * 0 0 CCACCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTTTTTTT A6/AA///E6/EE/E///E//EEEEE//AAA/AAEEEEEA//////////A/////////<E/////A///////A NH:i:0 HI:i:0 AS:i:96 nM:i:0 uT:A:1
My Log.progress.out file reports that I have 81.7% alignment of reads. However, I notice that sample.unmapped.sam has way more lines (=reads) than sample.mapped.sam.
wc -l *mapped*
5407730 sample.mapped.sam
8218835 sample.unmapped.sam
How could this be? Any ideas?
(Another point of confusion - the mate1/mate2 files associated with this file, which should also contain these unmapped reads thanks to --outReadsUnmapped Fastx, were completely blank. For another test file, however, they had 12,000,000-23,000,000 lines. Very confusing!)
Thank you, Kristin
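A side note on the line counts above: `wc -l` on `samtools view` output counts SAM records, not reads. Each mate of a pair gets its own record, and a read mapped to multiple loci can appear on several lines, so line counts are not directly comparable to the percentages in the STAR logs. Below is a rough sketch, using only awk on the SAM text, that tallies records and distinct read names by FLAG bit 0x4 (read unmapped). The function name `count_sam_reads` is made up for illustration, and holding every read name in memory may be too heavy for a full-size file.

```shell
# Count SAM records and distinct read names, split by FLAG bit 0x4 (unmapped).
# Header lines (starting with "@") are skipped; samtools is not required.
# count_sam_reads is a hypothetical helper name, not part of STAR or samtools.
count_sam_reads() {
    awk -F'\t' '
        /^@/ { next }
        {
            # FLAG is column 2; bit 0x4 set means this record is unmapped.
            if (int($2 / 4) % 2 == 1) { unmapped_rec++; unmapped_names[$1] = 1 }
            else                      { mapped_rec++;   mapped_names[$1] = 1 }
        }
        END {
            nm = 0; for (n in mapped_names) nm++
            nu = 0; for (n in unmapped_names) nu++
            printf "mapped_records\t%d\n", mapped_rec
            printf "mapped_names\t%d\n", nm
            printf "unmapped_records\t%d\n", unmapped_rec
            printf "unmapped_names\t%d\n", nu
        }
    ' "$1"
}
```

Call it as `count_sam_reads path/to/your.sam`. For paired-end data, the distinct-name counts are the ones to compare against the read-pair counts in the logs.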
Can you paste the Log.final.out here?
Sure:
You in fact have 93% alignment. I need more information: could you please provide the output of the commands below so I can check?
*
*
I see - so alignment is not just Uniquely mapped reads %, but Uniquely mapped reads % + % of reads mapped to multiple loci?
Also, I didn't produce any .bam files with STAR - just .sam - so I'll run the commands you suggested on that.
Hmmm, I sense a theme... What does it mean for SEQ and QUAL to be of different length? Does the "truncated file" refer to the .fastq or some other file? These same .fastqs aligned normally with TopHat...
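On the SEQ/QUAL question: in a SAM record, SEQ is column 10 and QUAL is column 11, and unless QUAL is `*` the two must have the same length. As far as I know, samtools raises these complaints about the alignment file it is reading rather than about the original FASTQ, so a SAM that was cut off mid-write is one plausible cause. The sketch below is a quick way to find offending lines with awk alone; `check_sam_seq_qual` is a hypothetical helper name, and it only checks field count and SEQ/QUAL lengths, not full SAM validity.

```shell
# Report SAM records with too few columns or mismatched SEQ/QUAL lengths.
# Returns non-zero if any problem line is found.
# check_sam_seq_qual is a hypothetical helper name used for illustration.
check_sam_seq_qual() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines
        NF < 11 {
            # A record needs at least 11 mandatory columns.
            printf "line %d: only %d fields (truncated record?)\n", NR, NF
            bad++
            next
        }
        $10 != "*" && $11 != "*" && length($10) != length($11) {
            printf "line %d: SEQ length %d != QUAL length %d (%s)\n", NR, length($10), length($11), $1
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}
```

Running `check_sam_seq_qual path/to/your.sam` prints one line per problem record. If the only hits are at the very end of the file, that would point to an incomplete SAM (for example, a job killed or a disk filling up mid-run) rather than bad input reads.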