Entering edit mode
4.8 years ago
mudithekanayake
▴
10
Hello,
I'm trying to call peaks using MACS2. Now I have merged 4 SAM files from different lanes into one. But it contains extra lines because of the merge step. Is there any simple way to remove the unnecessary lines? First few lines of the data looks like this. I need to remove lines 24, 25, 26 (Last 3 @PG lines before the start of the sequence lines). Files are very large.
@HD VN:1.0 SO:unsorted
@SQ SN:10 LN:130694993
@SQ SN:11 LN:122082543
@SQ SN:12 LN:120129022
@SQ SN:13 LN:120421639
@SQ SN:14 LN:124902244
@SQ SN:15 LN:104043685
@SQ SN:16 LN:98207768
@SQ SN:17 LN:94987271
@SQ SN:18 LN:90702639
@SQ SN:19 LN:61431566
@SQ SN:1 LN:195471971
@SQ SN:2 LN:182113224
@SQ SN:3 LN:160039680
@SQ SN:4 LN:156508116
@SQ SN:5 LN:151834684
@SQ SN:6 LN:149736546
@SQ SN:7 LN:145441459
@SQ SN:8 LN:129401213
@SQ SN:9 LN:124595110
@SQ SN:X LN:171031299
@SQ SN:Y LN:91744698
@PG ID:bowtie2 PN:bowtie2 VN:2.3.4.1 CL:"/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/bowtie2-2.3.4.1-jl5zqymv3rkp64s5oovgymylg5eftfoi/bin/bowtie2-align-s --wrapper basic-0 -x /work/LAS/geetu-lab/hhvu/index_files/mm10/bowtie2-ensembl/mm10 -S /work/LAS/geetu-lab/hhvu/project1/chip-seq/1_bowtie2/I10_S64_L005.bam -U I10_S64_L005_R1_001.fastq"
@PG ID:bowtie2-2F3DB176 PN:bowtie2 VN:2.3.4.1 CL:"/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/bowtie2-2.3.4.1-jl5zqymv3rkp64s5oovgymylg5eftfoi/bin/bowtie2-align-s --wrapper basic-0 -x /work/LAS/geetu-lab/hhvu/index_files/mm10/bowtie2-ensembl/mm10 -S /work/LAS/geetu-lab/hhvu/project1/chip-seq/1_bowtie2/I10_S64_L006.bam -U I10_S64_L006_R1_001.fastq"
@PG ID:bowtie2-54BA705F PN:bowtie2 VN:2.3.4.1 CL:"/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/bowtie2-2.3.4.1-jl5zqymv3rkp64s5oovgymylg5eftfoi/bin/bowtie2-align-s --wrapper basic-0 -x /work/LAS/geetu-lab/hhvu/index_files/mm10/bowtie2-ensembl/mm10 -S /work/LAS/geetu-lab/hhvu/project1/chip-seq/1_bowtie2/I10_S64_L007.bam -U I10_S64_L007_R1_001.fastq"
@PG ID:bowtie2-31E64087 PN:bowtie2 VN:2.3.4.1 CL:"/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/bowtie2-2.3.4.1-jl5zqymv3rkp64s5oovgymylg5eftfoi/bin/bowtie2-align-s --wrapper basic-0 -x /work/LAS/geetu-lab/hhvu/index_files/mm10/bowtie2-ensembl/mm10 -S /work/LAS/geetu-lab/hhvu/project1/chip-seq/1_bowtie2/I10_S64_L008.bam -U I10_S64_L008_R1_001.fastq"
J00102:65:HF3WFBBXY:6:1101:2777:1806 16 12 77384783 1 50M * 0 0 TCCTTTCCCAATCTGTTGGTGGTCTTTTTGTCTTACTGACGGTGTCTTTT FFJFF7JJJFAJJJFJJJFA-JJJJJJJJFFJJJJJJJJJJJJJJFAFAA AS:i:0 XS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YT:Z:UU
J00102:65:HF3WFBBXY:8:1101:2341:1965 0 13 73416403 40 50M * 0 0 GAGAAGGGAAGGGAACAATTGGAAGGGGGATTTGTGAGATGACGATTGGG AAF<AFJ<7FFJJJJFFJJJJJJ<JJJJJA<FFJF-<A7<-F-<7-<A-A AS:i:-6 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:40C1A7 YT:Z:UU
Did you merge the SAM files with
samtools merge
? And do you really need to remove these @PG lines? Do they cause any problem downstream?Actually I don't have much idea about that. I was told to do so. Yes, I used samtools merge.
There is absolutely no need to do that,
macs
does not care about any of these PF header lines. In fact it is good to keep them. That allows tracking back what was done with the single files prior to the merge.I think you are correct. I called peaks using that file and I did not get any errors. But when I tried to visualize it using UCSC Genome browser I got a error that it does not recognize the first line. So I thought may be that was the issue. My narrowPeak file looks like this. Do you have any suggestion what to do?
These are first few lines.
I guarantee you that these lines have no impact on anything. @PR header lines are always ignored.
When I tried to visualize in UCSC Genome browser, I got above error. So I added chr to the chromosome column instead of 1. Then I'm getting following error.
Do you know what to do?
UCSC browser is interpreting the file as being in the BED format, but Macs2 produces a file in narrowPeak format. The first 6 columns of both formats are the same, but the additional 4 columns that Macs2 produces are interpreted differently in a standard BED file. You can either tell UCSC that it is in narrowPeak format (I think I answered one of your questions a few days ago with instructions on how to do this), or just turn it into a 6-column BED file using the cut command in UNIX (cut -f1,2,3,4,5,6 testOne_peaks_NEW2.narrowPeak > testOne_peaks_NEW2.bed).