Dear all,
I'm having some trouble identifying singletons in paired-end Hi-C sequencing data. My Hi-C library was sequenced as 150 bp (75x2) paired-end reads on an Illumina flowcell. I ran HiC-Pro (https://github.com/nservant/HiC-Pro) on the .fastq files and got the following results (number of pairs and percentage of total pairs processed):
Total_pairs_processed 3377696 100.0
Unmapped_pairs 227709 6.742
Low_qual_pairs 0 0.0
Unique_paired_alignments 716549 21.214
Multiple_pairs_alignments 686717 20.331
Pairs_with_singleton 1746721 51.713
Low_qual_singleton 0 0.0
Unique_singleton_alignments 0 0.0
Multiple_singleton_alignments 0 0.0
Reported_pairs 716549 21.214
I would like more information about the 51.713% of pairs reported as Pairs_with_singleton, so I'm trying to extract these singleton reads. However, I can't find the right SAM/BAM flag to retrieve them.
1- Does anyone know the right SAM/BAM flag to retrieve singletons?
Separately, I mapped my fastq files with bowtie2, independently of HiC-Pro, using the following command:
bowtie2 -N 1 -x ~/Desktop/Genomes_ref/bowtie2/hg19 -1 mysample_S1_L001_R1_001.fastq -2 mysample_S1_L001_R2_001.fastq -S mysample.sam
When I then tried to retrieve the singletons, I got different flag values for the same read pair:
~/Desktop/test$ samtools view -f 9 mysample.sam | head
M02015:342:000000000-BPD5F:1:1101:9901:1145 89 chr16 150502 42 76M = 150502 0 CACAGGCTGCAGAGAGTGGGCGCTGTTACCCGTTCACATAAACTTTCTAACCATGCACACAGATCAGAAAACACCC CGGGEEC<ECFAF@F:GEGEGCGGGGGGGEF9EGGGGE9FDFGGGGGGGGGFGGGGFGDGGFFCFGGGGGECCCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:9279:1148 89 chrUn_gl000225 71224 1 26M = 71224 0 CAAGAGATGTAACTATTCTCCAGGCT EECE<ACFGFGGFFE6C-G@ECC?CC AS:i:-5 XS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:2G23 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150 77 * 0 0 * * 0 0 AGTCCTGATCCCCAAATCTGATCCCCAAATCTGATCAGTCAGAGGAAAGTGGGCCACACGGGAAGAGAGGTTCTC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150 141 * 0 0 * * 0 0 NGACAGAGACAGATCCCATCCCNNNNNNNACTGGCCTTCAAACNNNNNANATTTTAAAGCCTGAAAANNAAGCTAC #8BCCGGGGGGGGFFFFGFGGG#######::DFGGFFGGGGG?#####:#:9CCFECFGFGGCCFFF##::CFFGG YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:13285:1152 73 chr9 140284100 42 76M = 140284100 0 GAGAGGGACAGAGAGGGACAGTGAGACCAGCAAGGAGCTGGGACGCTGGGAGCCAGGTGGATGCATGCAGAGAGGG CCCCCEGGGGGGGGGGECGGGFGGGGGGGGGFFEGGGGFFGECFCCGEGGGGGGGGG@<DEGG<EGG@CF9<6FFE AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152 77 * 0 0 * * 0 0 AAAAAATTGGGCCAGGCATGGTAGCTCATGCCTATAATCCCAGCACTTTGGGAGGCCAAGAGGGGAGGAACAGATT CC<CCFGG9<,6CF@@8F@C@FGGG<F<FGGGFGFCF<6,CE,EFC<<FGGG,@@@<E<E<AFCECEF:,C,C,CF YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152 141 * 0 0 * * 0 0 NTCATCGAATGGACTCGAAAGGNNNNNNNTAATGGACTTGAATNNNNNGNTCCCCAAATCTGATCCCNNAATCTG #-ACCGFAFFF8<C,CD<86@@#######,,:@FFG,CEFGG9#####:#,:CFFGEF<D@9@C@F@##9:CDFC YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9920:1152 89 chr9 40639288 1 76M = 40639288 0 CCTGCCAGCAGATGAGCTTCAAAGTGCCTTAAGGAAGCACTTTGACCAGAAGGTAGATAACTCTTATTATAGAAGA GEGGGGGGCGGGCGFGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGFGGGGGGGGGGGGGGFGGGGGGGGGCCCCC AS:i:0 XS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10574:1152 89 chr3 162948628 30 76M = 162948628 0 GACAAAAACAAGCAATGGGGAAATAATTCCCTATTTAATAAATGGTGTTGGGAAAACTGGCTAGCCATATGCAGAA <C7GGGGGGGGFE9GGE<C<EDCGCGGGGGGGGGGFFCGGGGGFFFGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC AS:i:0 XS:i:-5 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:15507:1153 77 * 0 0 * * 0 0 AATCCCAGCACTTTGGCAGGCCGAGGTGGGCGGATCCCCAAATCTGATCCCCAAATCTGATCCCCAAATCTGATCC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG YT:Z:UP
~/Desktop/test$ samtools view -f 5 mysample.sam | head
M02015:342:000000000-BPD5F:1:1101:9901:1145 133 chr16 150502 0 * = 150502 0 NTCCAGCTCTGTATTTAGAGTCNNNNNNNGTTGGGGAGATTGGNNNNNANTTGGGGATCAGATTTGGNNATCTTGT #8ACCFF<FGGEFGGGGCC9FC#######::CFFDGDGGGGGG#####:#696<<@7@FF,,,FEDF##::CDC9E YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9279:1148 133 chrUn_gl000225 71224 0 * = 71224 0 NATCAGTGCATAGATAACTCACNNNNNNNCCTGTAAGCAGAGCNNNNNCNAGAGTTACATAACCCCGNNAATCAGT #8-B-CFFG@,,;,;FEGGDG8#######,:CC6,,<CF@F@F#####:#:,,99,,CFE,C886BC##99:C<AC YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:10882:1150 77 * 0 0 * * 0 0 AGTCCTGATCCCCAAATCTGATCCCCAAATCTGATCAGTCAGAGGAAAGTGGGCCACACGGGAAGAGAGGTTCTC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150 141 * 0 0 * * 0 0 NGACAGAGACAGATCCCATCCCNNNNNNNACTGGCCTTCAAACNNNNNANATTTTAAAGCCTGAAAANNAAGCTAC #8BCCGGGGGGGGFFFFGFGGG#######::DFGGFFGGGGG?#####:#:9CCFECFGFGGCCFFF##::CFFGG YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:13285:1152 133 chr9 140284100 0 * = 140284100 0 * * YT:Z:UP YF:Z:LN
M02015:342:000000000-BPD5F:1:1101:11747:1152 77 * 0 0 * * 0 0 AAAAAATTGGGCCAGGCATGGTAGCTCATGCCTATAATCCCAGCACTTTGGGAGGCCAAGAGGGGAGGAACAGATT CC<CCFGG9<,6CF@@8F@C@FGGG<F<FGGGFGFCF<6,CE,EFC<<FGGG,@@@<E<E<AFCECEF:,C,C,CF YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152 141 * 0 0 * * 0 0 NTCATCGAATGGACTCGAAAGGNNNNNNNTAATGGACTTGAATNNNNNGNTCCCCAAATCTGATCCCNNAATCTG #-ACCGFAFFF8<C,CD<86@@#######,,:@FFG,CEFGG9#####:#,:CFFGEF<D@9@C@F@##9:CDFC YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9920:1152 133 chr9 40639288 0 * = 40639288 0 NACCTG #8BCCG YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:10574:1152 133 chr3 162948628 0 * = 162948628 0 NAAACCTCTAGGATCCCCAAATNNNNNNNCCAAATATGATCCTNNNNNANCCTGACAAAAACAAGCANNGGGGAA #86A@<FGGGF9@AEGGCGGCG#######,:C@FC,C<,CFFG#####:#:,@FFGGFCCFE<F@FG##::7@F: YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:15507:1153 77 * 0 0 * * 0 0 AATCCCAGCACTTTGGCAGGCCGAGGTGGGCGGATCCCCAAATCTGATCCCCAAATCTGATCCCCAAATCTGATCC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG YT:Z:UP
~/Desktop/test$ samtools view -f 5 -F 9 mysample.sam | head
~/Desktop/test$
For example, read M02015:342:000000000-BPD5F:1:1101:9901:1145 has flag 89 when I use -f 9, but the same read has flag 133 when I use -f 5.
2- Does anyone know why the flag changes?
Thank you in advance for your time, Raphael
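The flag does not change. The two records share a read name because they are the two mates of the same pair, and each mate has its own flag: 89 means (1=paired | 8=mate unmapped | 16=read reverse strand | 64=first in pair) and 133 means (1=paired | 4=read unmapped | 128=second in pair). With -f 9 you see the mapped first mate; with -f 5 you see its unmapped second mate.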
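To check any flag value by hand, here is a minimal bash sketch that decodes the FLAG field bit by bit. The helper name decode_flag is hypothetical; the bit values come from the SAM specification and the bit names are the ones samtools uses (e.g. in samtools flags).

```shell
#!/usr/bin/env bash
# Print the set bits of a SAM FLAG as "value=NAME" pairs.
# decode_flag is a hypothetical helper; bit names follow samtools' naming.
decode_flag() {
  local flag=$1 i
  local -a bits=(1 2 4 8 16 32 64 128 256 512 1024 2048)
  local -a names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                  READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  local -a set_bits=()
  for i in "${!bits[@]}"; do
    # Bitwise AND in bash arithmetic: non-zero if this bit is set
    if (( flag & bits[i] )); then
      set_bits+=("${bits[i]}=${names[i]}")
    fi
  done
  echo "${set_bits[*]}"
}

decode_flag 89    # 1=PAIRED 8=MUNMAP 16=REVERSE 64=READ1
decode_flag 133   # 1=PAIRED 4=UNMAP 128=READ2
```

The same bit arithmetic explains why samtools view -f 5 -F 9 printed nothing: -f 5 requires bit 1 (paired) to be set, while -F 9 discards any record with bit 1 or bit 8 set, so no record can pass both filters.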
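So, if you want the singletons that are aligned, filter on flag 8 (mate unmapped); if you want their non-aligned mates, filter on flag 4 (read unmapped), as @prasundutta87 suggests. Pairs in which neither mate aligned (flags 77 and 141 above) carry both bits, so exclude the other bit as well: samtools view -f 8 -F 4 returns the aligned singletons and samtools view -f 4 -F 8 returns their unaligned mates.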
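To pull both kinds of singleton record out of the bowtie2 SAM in one pass and count them, here is a minimal sketch using plain awk (POSIX awk has no bitwise AND, so bits are tested arithmetically). The helper name split_singletons and the output file names are hypothetical, and skipping secondary (256) and supplementary (2048) records is an assumption so that each read is kept once; apart from that skip, the two filters roughly mirror samtools view -f 8 -F 4 and samtools view -f 4 -F 8.

```shell
#!/usr/bin/env bash
# split_singletons IN.sam PREFIX  (hypothetical helper)
# Writes PREFIX.aligned.sam   : mapped reads whose mate is unmapped (bit 8 set, bit 4 clear)
#        PREFIX.unaligned.sam : unmapped reads whose mate is mapped (bit 4 set, bit 8 clear)
# Both outputs keep the SAM header; record counts are printed to stdout.
split_singletons() {
  local sam=$1 prefix=$2
  awk -v out_al="${prefix}.aligned.sam" -v out_un="${prefix}.unaligned.sam" '
    # POSIX awk has no bitwise AND: bit b of f is set if int(f / b) is odd
    function has(f, b) { return int(f / b) % 2 == 1 }
    /^@/ { print > out_al; print > out_un; next }   # copy header to both files
    {
      f = $2 + 0
      # keep only paired, primary records (skip secondary 256, supplementary 2048)
      if (!has(f, 1) || has(f, 256) || has(f, 2048)) next
      if (has(f, 8) && !has(f, 4)) { print > out_al; na++ }
      else if (has(f, 4) && !has(f, 8)) { print > out_un; nu++ }
    }
    END { printf "aligned_singletons\t%d\nunaligned_mates\t%d\n", na, nu }
  ' "$sam"
}
```

For example, split_singletons mysample.sam mysample_singletons would write mysample_singletons.aligned.sam and mysample_singletons.unaligned.sam. Since every singleton pair has exactly one aligned mate, aligned_singletons counts singleton pairs; HiC-Pro's own mapping and filtering differ from a plain bowtie2 paired-end run, so this number need not match its Pairs_with_singleton value.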