Hi All!
I'm a wet-lab guy, quite new to data analysis, and would appreciate some help if possible!
I'm slowly getting into the command line and understanding some of the workflow behind the analysis, but I've hit a bit of a wall. After running hisat2-build on the human genome (hg38), I run the following command and get the following output:
hisat2 -x genome/homo/homo.GRCm38 -U raw_data/NGS_211_553_fastq/_1_Cengiz.fastq.gz -S rna/lmsmc_ca_1.sam -p 3 -t
Time loading forward index: 00:00:09
Time loading reference: 00:00:05
Multiseed full-index search: 00:05:09
31656871 reads; of these:
31656871 (100.00%) were unpaired; of these:
375631 (1.19%) aligned 0 times
29871475 (94.36%) aligned exactly 1 time
1409765 (4.45%) aligned >1 times
98.81% overall alignment rate
Time searching: 00:05:17
Overall time: 00:05:26
Sounds great, right? To me this reads like I'm doing OK! Then I do samtools view sam > bam
then
samtools sort bam > sorted.bam
When I run samtools flagstat on the sorted file, I get:
samtools flagstat rna/lmsmc_ca_1.sorted.bam
34190647 + 0 in total (QC-passed reads + QC-failed reads)
2533776 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
33815016 + 0 mapped (98.90% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
So I'm a little confused as to why I have no properly paired sequences. I'm fairly sure I have done something wrong along the way! Any advice would be appreciated.
Many thanks, Cengiz
Oooohhh, that's what it meant by 'properly paired'! I thought it meant properly paired against my reference genome. So, assuming there is nothing to worry about, I will continue!
Thanks!
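For anyone landing here later: the paired counters in flagstat are driven by bits in each record's SAM FLAG field, and reads aligned as single-end (hisat2 -U) never get the "paired" bit set, so those rows stay at zero. Below is a minimal sketch that decodes a few of those bits. The bit values come from the SAM specification; the function name describe_flag is made up for illustration, and only a handful of bits are covered.

```shell
# Decode a few SAM FLAG bits (values per the SAM specification).
# describe_flag is a hypothetical helper name, not part of samtools.
describe_flag() {
    df_flag=$1
    df_out=""
    if [ $((df_flag & 1)) -ne 0 ];   then df_out="$df_out paired"; fi       # read is part of a pair
    if [ $((df_flag & 2)) -ne 0 ];   then df_out="$df_out proper_pair"; fi  # both mates aligned as expected
    if [ $((df_flag & 4)) -ne 0 ];   then df_out="$df_out unmapped"; fi     # read did not align
    if [ $((df_flag & 16)) -ne 0 ];  then df_out="$df_out reverse"; fi      # aligned to the reverse strand
    if [ $((df_flag & 256)) -ne 0 ]; then df_out="$df_out secondary"; fi    # secondary alignment
    if [ -z "$df_out" ]; then
        echo "(no bits set)"
    else
        echo "${df_out# }"
    fi
}

describe_flag 0     # single-end read, forward strand, primary alignment
describe_flag 16    # single-end read, reverse strand
describe_flag 272   # a secondary alignment on the reverse strand
describe_flag 99    # a typical flag from a paired-end run, for contrast
```

Single-end records only ever carry flags such as 0, 4, 16, 256 or 272, none of which include bit 1, which is why "paired in sequencing" and "properly paired" are both 0 here.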
In your case, the following lines are the ones that are important. While you have some secondary alignments, your data is well aligned to the reference.
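To see why the two reports agree, here is a small sketch that reconciles the flagstat counts with the HISAT2 summary. All numbers are copied from the outputs posted above; the variable names are just for illustration. The assumption is that flagstat counts alignment records (including secondary ones) while HISAT2 counts reads.

```shell
# Numbers copied from the HISAT2 summary and the samtools flagstat output above.
reads=31656871           # HISAT2: total input reads
unaligned=375631         # HISAT2: aligned 0 times
total_records=34190647   # flagstat: in total
secondary=2533776        # flagstat: secondary
mapped_records=33815016  # flagstat: mapped

# Removing secondary records leaves one primary record per read.
primary_records=$((total_records - secondary))
primary_mapped=$((mapped_records - secondary))
expected_mapped=$((reads - unaligned))

echo "primary records: $primary_records (HISAT2 reads: $reads)"
echo "primary mapped:  $primary_mapped (HISAT2 aligned reads: $expected_mapped)"

# Read-level alignment rate, which should match HISAT2's overall alignment rate.
pct=$(awk -v m="$primary_mapped" -v r="$reads" 'BEGIN { printf "%.2f", 100 * m / r }')
echo "read-level alignment rate: ${pct}%"
```

The primary record count equals the 31656871 input reads, and the read-level rate comes out at 98.81%, the same as HISAT2 reported. The slightly higher 98.90% in flagstat is simply because the 2533776 secondary records are included in both the mapped and total counts.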
I see! Thanks for the explanation :)