Hi dear friends (I'm not a native English speaker, so please excuse any language mistakes),
I have 6 paired-end 100 bp samples (3 female and 3 male). Because the reads are paired-end, that makes 12 FASTQ files. They come from Illumina HiSeq RNA-seq of a non-model fish. I have already run a de novo transcriptome assembly with the Trinity package.
Now I want to run a genome-guided analysis on the same data so I can annotate the transcripts that had no BLAST hit in the de novo assembly.
So I have downloaded the reference genome of a closely related species from NCBI.
Now I want to map my reads to that genome.
- Which splice-aware aligner is preferred, STAR or BBMap?
- In the old "bowtie --> tophat --> cufflinks" pipeline, the reference genome had to be prepared with bowtie-build first. Is an indexing step like that still needed?
- Can I map all 12 files in a single command, or must I map sample1-left with sample1-right first and then repeat this for the other 5 samples?
Please share any alignment scripts you think would be useful.
Thank you in advance.
Dear ablanchetcohen, hi and thank you.
I take it STAR is your first suggestion.
So, do you have links to those automated Perl or Python mapping pipelines?
I have written my own, but they're not quite ready for release. The MUGQIC have released their pipelines on Bitbucket, but they're quite complicated to use.
If you really do a lot of analyses, you'll end up finding that the best option is just to write your own scripts. If you only do a few analyses, copy-pasting the same script also works fine.
Of course, there is always Galaxy.
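Most home-made mapping scripts start the same way: they group the FASTQ files into R1/R2 pairs so that each sample is mapped in its own aligner run. This can also answer the "one command or one pair at a time" question. Below is a minimal Python sketch of that step. The naming pattern (<sample>_R1.fastq.gz / <sample>_R2.fastq.gz) and the function name pair_fastqs are hypothetical assumptions, so adjust the regular expression to match your own file names.

```python
import re
from pathlib import Path

# Hypothetical naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz (.fq also accepted)
PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.f(ast)?q(\.gz)?$")


def pair_fastqs(filenames):
    """Group paired-end FASTQ names into {sample: (read1, read2)}, sorted by sample."""
    mates = {}
    for name in filenames:
        m = PAIR_RE.match(Path(name).name)
        if m is None:
            raise ValueError(f"unrecognised FASTQ name: {name}")
        mates.setdefault(m.group("sample"), {})[m.group("mate")] = name
    pairs = {}
    for sample, found in sorted(mates.items()):
        if set(found) != {"1", "2"}:
            raise ValueError(f"sample {sample} is missing a mate: {found}")
        pairs[sample] = (found["1"], found["2"])
    return pairs


if __name__ == "__main__":
    # 3 female + 3 male samples, two mates each = 12 files -> 6 pairs
    files = [f"{sex}{i}_R{mate}.fastq.gz"
             for sex in ("female", "male") for i in (1, 2, 3) for mate in (1, 2)]
    for sample, (r1, r2) in pair_fastqs(files).items():
        print(sample, r1, r2)
```

Each resulting pair then becomes one aligner run. This keeps one alignment file per sample, which is what the counting tools expect.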
Thank you, you are right. I will try writing my own pipeline.
I have seen in the STAR manual that there are several options for 1) indexing the genome and 2) the mapping step.
Which of these options do you usually use, and which would you suggest for my case?
(100 bp paired-end Illumina HiSeq 2000 reads, a non-model fish, and a reference genome from a closely related but different species)
I really appreciate your help.
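On the indexing question: STAR does not use a bowtie-build index. It builds its own index once with --runMode genomeGenerate, and every sample is then mapped against that index. The minimal Python sketch below only assembles the two command lines. It assumes hypothetical paths (star_index, reference.fa, reference.gtf). It also assumes the usual convention of --sjdbOverhang = read length - 1 (99 for 100 bp reads), which applies when a GTF annotation of the reference is available. It is not a tuned parameter set for a cross-species reference, so check the STAR manual before changing any filters.

```python
import shlex
from pathlib import Path


def star_index_cmd(genome_dir, fasta, gtf=None, read_length=100, threads=8):
    """Build the STAR genome-generation command (run once per reference)."""
    cmd = ["STAR", "--runMode", "genomeGenerate",
           "--runThreadN", str(threads),
           "--genomeDir", str(genome_dir),
           "--genomeFastaFiles", str(fasta)]
    if gtf is not None:
        # Known splice junctions from the annotation; overhang = read length - 1.
        cmd += ["--sjdbGTFfile", str(gtf), "--sjdbOverhang", str(read_length - 1)]
    return cmd


def star_align_cmd(genome_dir, r1, r2, out_prefix, threads=8):
    """Build the STAR mapping command for ONE paired-end sample."""
    cmd = ["STAR", "--runThreadN", str(threads),
           "--genomeDir", str(genome_dir),
           "--readFilesIn", str(r1), str(r2),  # mate 1 first, then mate 2
           "--outFileNamePrefix", str(out_prefix),
           "--outSAMtype", "BAM", "SortedByCoordinate"]
    if str(r1).endswith(".gz"):
        # Decompress gzipped FASTQ on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


if __name__ == "__main__":
    genome_dir = Path("star_index")  # hypothetical index directory
    print(shlex.join(star_index_cmd(genome_dir, "reference.fa", gtf="reference.gtf")))
    for sample in ("female1", "female2", "female3", "male1", "male2", "male3"):
        print(shlex.join(star_align_cmd(genome_dir, f"{sample}_R1.fastq.gz",
                                        f"{sample}_R2.fastq.gz", f"{sample}_")))
```

Each printed line can be run in a shell, or the list can be passed to subprocess.run(cmd, check=True). Create the index directory before running genomeGenerate. Running one pair per STAR call gives one sorted BAM per sample.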
BBMap works well too. If you use it, make sure you add sam=1.3 to your command line. Counting programs (featureCounts and HTSeq-count) don't understand the SAM v1.4 format tags that BBMap outputs by default.

Hi my friend, and thanks.
I have searched for a comprehensive step-by-step manual for BBMap and found this:
https://www.biostarhandbook.com/tools/bbmap/bbmap-help.html
Do you know a better one (a link to a manual)?
There are individual threads for the BBMap tools on SeqAnswers. @Brian also ships documentation and guides with the program, in the bbmap-nn.nn/bbmap/docs directory. If something is unclear, we can help. Tag your post with "bbmap" so @Brian sees it.

Put simply, a workflow would be: FastQC --> BBDuk/Trimmomatic/Cutadapt --> BBMap/STAR/HISAT2 --> Samtools --> featureCounts/HTSeq-count --> DESeq2/edgeR
I find it easy to stay with BBTools because their command-line format is consistent, but you are free to use any of the tools above.
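The BBTools route through that workflow (BBDuk trimming, BBMap mapping with sam=1.3, samtools sort, then featureCounts) can be sketched as a command builder. The minimal Python sketch below rests on several assumptions. All file names, adapters.fa and the bbmap_index directory are hypothetical. The BBDuk adapter-trimming settings are commonly used values from the BBDuk guide. maxindel/intronlen are RNA-seq-oriented BBMap settings, and their values here are guesses that you should check against the bbmap.sh help for a fish genome. pairs maps each sample name to its (R1, R2) file names.

```python
import shlex


def bbtools_workflow(pairs, reference, gtf, adapters="adapters.fa", threads=8):
    """Return commands: BBMap index -> per sample (BBDuk, BBMap, samtools sort) -> featureCounts."""
    t = str(threads)
    # Build the BBMap index once; later runs reuse it via path= instead of ref=.
    cmds = [["bbmap.sh", f"ref={reference}", "path=bbmap_index", f"threads={t}"]]
    bams = []
    for sample, (r1, r2) in pairs.items():
        t1, t2 = f"{sample}_trim_R1.fq.gz", f"{sample}_trim_R2.fq.gz"
        sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
        # Adapter trimming (right-end k-mer trimming, paired-end aware).
        cmds.append(["bbduk.sh", f"in={r1}", f"in2={r2}", f"out={t1}", f"out2={t2}",
                     f"ref={adapters}", "ktrim=r", "k=23", "mink=11", "hdist=1",
                     "tpe", "tbo", f"threads={t}"])
        cmds.append(["bbmap.sh", "path=bbmap_index", f"in={t1}", f"in2={t2}", f"out={sam}",
                     "maxindel=100000", "intronlen=20",  # allow long gaps (introns) for RNA-seq
                     "sam=1.3",  # older SAM tags that featureCounts/HTSeq-count understand
                     f"threads={t}"])
        cmds.append(["samtools", "sort", "-@", t, "-o", bam, sam])
        bams.append(bam)
    # One featureCounts call over all samples; -p marks the reads as paired-end.
    cmds.append(["featureCounts", "-T", t, "-p", "-a", gtf, "-o", "counts.txt", *bams])
    return cmds


if __name__ == "__main__":
    demo = {s: (f"{s}_R1.fastq.gz", f"{s}_R2.fastq.gz") for s in ("female1", "male1")}
    for cmd in bbtools_workflow(demo, "reference.fa", "reference.gtf"):
        print(shlex.join(cmd))
```

featureCounts writes one count column per BAM, which can go straight into DESeq2 or edgeR. Recent Subread releases also provide --countReadPairs for counting fragments instead of reads when -p is set, so check featureCounts -h for your version.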
Hi, does it cause any problem that my species has a different number of chromosomes than the reference genome species?
No, as long as you keep in mind that your results are relative to the surrogate reference genome.
You are not going to use positional information to get your counts, only the alignments. So if you discover that gene X is differentially expressed (DE), you can't use its positional information as is, since in your organism the gene could be on a different chromosome or at a different location.