Hi, I downloaded an SRA file from the database and generated FASTQ files using the SRA Toolkit. I tried to map the reads to a custom genome file (a FASTA sequence) using STAR, Bowtie, Bowtie2, and BWA, but none of them worked.
I also tried downloading the FASTQ files from the EBI website, but that didn't work either. When I opened the FASTQ file, the quality line looked like a strange encoding compared with the FASTQ file generated by my own experiment. Please see below.
SRA FASTQ file:
@SRR867425.1 TEST690_0001:2:1:1013:1620/1
NACCACTATTGGACAAATCCAGGGAACATNGTCACTTCAGAACCAGAGTGCTTTTATTAGGCTTTTAGTCTGCTGTCCAGCTCTC
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%@
@SRR867425.2 TEST690_0001:2:1:1015:20229/1
NTGCAATTCCGGCGAAGGAACCACTGTTTCAGGTCGAGTAAGCACGAATCCTTCTTCCAGTAACGAGACCCCTGAAAATCCTCCC
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%A
@SRR867425.3 TEST690_0001:2:1:1015:8397/1
NAAACACGTAATATGATGCTTCTACATAGCCATCTAACTGAGTCAGTTGAGACCATAAACCACTGCTTCTGCCATGAGGCAGGAA
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%=
FASTQ file from my own experiment:
@HWI-700666F:84:C7GUUANXX:4:1101:2067:1954 1:N:0:AGTCAA
ATTTTTATTTTAAAAAGAGGATGTGTTTCCAATCAGTCTTTGCGCTTCTTCGATCCTTGTTCCTTTTGTGCTAAAATAATTAGCAGCCTTCTGAAGATG
+
=30:FFBFFGGG1CD@GGE>FGG@==:FGGGGGBGCGGGGG1@CGGGGGGG0CDGGGCDEGE>FGGGGGGFEG11<DCF>G:11?:FB0BDF:000BCB
@HWI-700666F:84:C7GUUANXX:4:1101:2004:1973 1:N:0:AGTCAA
TGGCCTTTATGAGTGGTCAAAGCTGCCTTGAGGAGAATTGTTTAATTCATCAACCTCCTGTATAACAGATTCATCTTCCAAACGCTGTCGATCAATTAA
+
<<ABFGGGGGGGGGGGGGGGGGGGDCFGG>DGGGDGGGCGGGGG1FGEGGGGGGGGGGFGGGGEGGGG1=EFGGGG1FGGCFGBEF=FGBGGGGGGEED
Can someone please help me sort out this issue? Thanks in advance.
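One quick check on the "weird" quality line is to decode it into numbers. The sketch below assumes the standard Sanger/Phred+33 offset (score = ASCII code minus 33), which is an assumption about these files rather than something confirmed in this thread. phred_range is a made-up helper name, and it only needs a POSIX awk.

```shell
# Report the lowest and highest Phred score in one FASTQ quality string,
# assuming Phred+33 encoding. phred_range is a hypothetical helper name.
phred_range() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      # Lookup table: printable ASCII character -> character code.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    {
      min = 1000; max = -1
      n = length($0)
      for (i = 1; i <= n; i++) {
        q = ord[substr($0, i, 1)] - 33
        if (q < min) min = q
        if (q > max) max = q
      }
      printf "min=%d max=%d\n", min, max
    }'
}

# A shortened SRA-style quality string and the start of one from the other file.
phred_range '%%%%%@'
phred_range '=30:FFBFFGGG1'
```

Under that assumption `%` decodes to Phred 4, so the SRA records would be valid FASTQ that simply report very low quality for almost every base, while the reads from the other experiment span a more typical range. Whether that has anything to do with the mapping failure is a separate question.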
What was the error? Can you please post it?
When I ran STAR, this was the last part of the log.out file -
That log appears to be just for generation of genome indexes. Have you done the actual alignment? Please post the full STAR command lines you have used for genome index generation and for alignment runs.
First I ran this:
STAR --runMode genomeGenerate --genomeDir . --genomeFastaFiles MYB_and_MYC.fasta --runThreadN 5
Then:
STAR --runThreadN 14 --sjdbOverhang 50 --readFilesCommand zcat --sjdbGTFfile myb_myc.gtf --readFilesIn SRR867425_1.fastq.gz SRR867425_2.fastq.gz --runMode alignReads --outReadsUnmapped Fastx --limitBAMsortRAM 8729257684 --genomeDir . --outSAMtype BAM SortedByCoordinate --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonical --twopassMode Basic --outSAMattrRGline ID:"MYB_MYC" --outFileNamePrefix "MYB_MYC"_ --quantMode GeneCounts
I suggest that you create a separate directory to hold the STAR indexes and supply it to the --genomeDir option (instead of just ./).
It's still not working. It exits with the error message
Segmentation fault (core dumped)
at the 1st pass mapping.
At least we have made some progress. How much memory are you assigning to this job (or perhaps I should ask how much memory you have available)?
I am not assigning any particular amount, but I am running with 15 threads on a machine with 64 GB of RAM. No other program is running on the machine.
Try reducing the number of threads to 4 and see if that prevents the seg fault.
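Putting the two suggestions from this thread together (a dedicated index directory and 4 threads), the runs could look roughly like the sketch below. It uses only options that already appear in the commands posted above, trimmed down for readability; the remaining options from the original alignment command can be appended unchanged. star_index, THREADS, INDEX_DIR, DRY_RUN, and the run helper are made-up names, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch only. Assumes STAR is on PATH and the input files named in this
# thread are in the current directory. star_index is a hypothetical directory.
THREADS=4
INDEX_DIR=star_index

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the genome index into its own directory instead of "."
build_index() {
  run mkdir -p "$INDEX_DIR"
  run STAR --runMode genomeGenerate \
      --genomeDir "$INDEX_DIR" \
      --genomeFastaFiles MYB_and_MYC.fasta \
      --runThreadN "$THREADS"
}

# Align against that index with the reduced thread count.
align_reads() {
  run STAR --runMode alignReads \
      --runThreadN "$THREADS" \
      --genomeDir "$INDEX_DIR" \
      --readFilesCommand zcat \
      --readFilesIn SRR867425_1.fastq.gz SRR867425_2.fastq.gz \
      --sjdbGTFfile myb_myc.gtf \
      --sjdbOverhang 50 \
      --twopassMode Basic \
      --outSAMtype BAM SortedByCoordinate \
      --outFileNamePrefix MYB_MYC_
}

build_index
align_reads
```

Keeping the index in its own directory also means the alignment outputs (MYB_MYC_*) do not end up mixed in with the index files, which makes it easier to tell which step produced what if the seg fault comes back.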