9.0 years ago
nalandaatmi
Dear All,
I am using STAR to align my FASTQ reads from human DNA against the human reference genome.
Steps I followed to install and run STAR:
1) Using git clone https://github.com/alexdobin/STAR.git, I cloned the STAR repository onto my Linux machine.
[software@gw2 STAR]$ ls
bin CHANGES.md doc extras LICENSE Makefile README.md RELEASEnotes.md source STAR-Fusion
2) Under bin, I found the STAR executable. Is this the file I need to use for aligning?
[software@gw2 STAR]$ bin/Linux_x86_64/STAR
Usage: STAR [options]... --genomeDir REFERENCE --readFilesIn R1.fq R2.fq
3) Generating the index for the human genome:
[software@gw2 STAR]$ /bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir /references/STAR_References/ --genomeFastaFiles /references/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome.fa --runThreadN 20
Dec 01 17:10:31 ..... Started STAR run
Dec 01 17:10:31 ... Starting to generate Genome files
Dec 01 17:12:17 ... starting to sort Suffix Array. This may take a long time...
Dec 01 17:12:41 ... sorting Suffix Array chunks and saving them to disk...
Dec 01 17:55:14 ... loading chunks from disk, packing SA...
Dec 01 18:02:40 ... Finished generating suffix array
Dec 01 18:02:40 ... Generating Suffix Array index
Dec 01 18:07:01 ... Completed Suffix Array index
Dec 01 18:07:01 ... writing Genome to disk ...
Dec 01 18:08:20 ... writing Suffix Array to disk ...
Dec 01 18:16:44 ... writing SAindex to disk
Dec 01 18:17:30 ..... Finished successfully
[software@gw2 STAR]$
4) Command executed for my samples.
$ STAR/bin/Linux_x86_64/STAR --genomeDir /references/STAR_References/ --runThreadN 20 --readFilesIn r1.fastq r2.fastq --outFileNamePrefix Sample_2002.sam
5) Log file:
Dec 02 03:38:55 ..... Started STAR run
Dec 02 03:38:55 ..... Loading genome
Dec 02 03:43:38 ..... Started mapping
Dec 02 03:43:52 ..... Finished successfully
6) I received the following output files:
Sample_2002.samLog.final.out
Sample_2002.samLog.out
Sample_2002.samLog.progress.out
Sample_2002.samSJ.out.tab
Sample_2002.samAligned.out.sam #(The file contents are displayed below)
@HD VN:1.4
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@PG ID:STAR PN:STAR VN:STAR_2.5.0b CL:/STAR/bin/Linux_x86_64/STAR --runThreadN 20 --genomeDir /references/STAR_References/ --readFilesIn /Sample_2002/2002_AGCTAGTG_L002_R1.all_val_1.fq /Sample_2002/2002_AGCTAGTG_L002_R2.all_val_2.fq --outFileNamePrefix /Sample_2002/2002.sam
@CO user command line: /STAR/bin/Linux_x86_64/STAR --genomeDir /references/STAR_References/ --runThreadN 20 --readFilesIn /Sample_2002/2002_AGCTAGTG_L002_R1.all_val_1.fq /Sample_2002/2002_AGCTAGTG_L002_R2.all_val_2.fq --outFileNamePrefix /Sample_2002/2002.sam
NOTHING after this?
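A quick way to confirm that the SAM file really holds only a header is to count the lines that do not start with `@`. This is a minimal sketch; the function name `count_sam_records` is made up for illustration and is not part of STAR.

```shell
# Print the number of alignment records (non-header lines) in a SAM file.
# Header lines start with '@'; read names in SAM records never do.
count_sam_records() {
    # grep -c exits non-zero when the count is 0, so ignore its status
    grep -vc '^@' "$1" || true
}
```

For example, `count_sam_records Sample_2002.samAligned.out.sam` would print 0 for a header-only file like the one shown above.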
What's in Sample_2002.samLog.final.out? Why did you not use a genome annotation file during the genome generation step to make full use of spliced alignments?
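For reference, an annotation-aware index could be generated roughly as sketched below. The flags `--sjdbGTFfile` and `--sjdbOverhang` are STAR options. The wrapper function `build_star_index`, the `DRY_RUN` variable, and any paths passed to it are assumptions for illustration only. The overhang is usually set to read length minus 1, and 100 is used here as a fallback.

```shell
# Build a STAR genome index that includes splice junctions from a GTF file.
# Arguments: index directory, genome FASTA, annotation GTF, optional overhang.
# With DRY_RUN set, the command is printed instead of executed.
build_star_index() {
    genome_dir=$1
    fasta=$2
    gtf=$3
    overhang=${4:-100}
    set -- STAR --runMode genomeGenerate \
        --genomeDir "$genome_dir" \
        --genomeFastaFiles "$fasta" \
        --sjdbGTFfile "$gtf" \
        --sjdbOverhang "$overhang" \
        --runThreadN 20
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Running it with `DRY_RUN=1` first shows the exact command before committing to an index build that can take about an hour, as the log above suggests.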
Dear Michael,
"Why did you not use a genome annotation file during the genome generation step to make full use of spliced alignments?"
Do you mean the human GTF file? No, I didn't use it. Thanks for pointing it out. I will try to create a new index using the GTF file.
Please find the content of
Sample_2002.samLog.final.out
There you have it: your input file contained no reads, was not readable, or was truncated.
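A simple pre-flight check on the input files would catch this before STAR is run. The sketch below assumes uncompressed FASTQ with four lines per read. The function names `fastq_read_count` and `check_fastq_inputs` are hypothetical helpers, not part of any tool mentioned here.

```shell
# Print the number of reads in an uncompressed FASTQ file (4 lines per read).
fastq_read_count() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Report the read count of each given FASTQ file; fail on the first empty one.
check_fastq_inputs() {
    for fq in "$@"; do
        n=$(fastq_read_count "$fq")
        if [ "$n" -eq 0 ]; then
            echo "ERROR: $fq contains no reads" >&2
            return 1
        fi
        echo "$fq: $n reads"
    done
}
```

For example, `check_fastq_inputs r1.fastq r2.fastq && STAR ...` would skip the alignment when either file is empty.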
Dear Michael,
Yes, my trimmed FASTQ files are empty. Do I need to start a new question?
I used Trim Galore for adapter trimming. This is the summary from Trim Galore.
Last 3 lines of my Trim Galore log file:
Michael is right. The only other real caveat I would add with STAR is to make sure you have enough RAM; otherwise the alignment slows to a crawl.
Yeah, about 30 GB of RAM might be required for mapping against the human genome.
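On Linux, total memory can be checked against that rough 30 GB figure before starting a run. This is a minimal sketch that reads `MemTotal` from `/proc/meminfo`. The function names are hypothetical, and the optional file argument exists only so the check can be tried on a sample file.

```shell
# Print total system memory in whole GB, read from a meminfo-style file.
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed only if total memory is at least the given number of GB (default 30).
enough_ram_for_star() {
    need=${1:-30}
    have=$(mem_total_gb "${2:-/proc/meminfo}")
    [ "$have" -ge "$need" ]
}
```

For example, `enough_ram_for_star 30 || echo "Warning: less than 30 GB RAM"` prints a warning on a machine that is likely too small.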