3.6 years ago
salman_96
Hi, I have an hg19 SNP file (VCF format) with some extra header rows that I do not need. It looks like this:
##INFO=<ID=COMMON,Number=1,Type=Integer,Description="RS is a common SNP. A common SNP is one that has at least one 1000Genomes population with a minor allele of frequency >= 1% and for which 2 or more >
##INFO=<ID=TOPMED,Number=.,Type=String,Description="An ordered, comma delimited list of allele frequencies based on TOPMed, starting with the reference allele followed by alternate alleles as ordered in>
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10019 rs775809821 TA T . . RS=775809821;RSPOS=10020;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000020005000002000200;GENEINFO=DDX11L1:100287102;WGT=1;VC=DIV;R5;ASP
1 10039 rs978760828 A C . . RS=978760828;RSPOS=10039;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP
1 10043 rs1008829651 T A . . RS=1008829651;RSPOS=10043;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP
1 10051 rs1052373574 A G . . RS=1052373574;RSPOS=10051;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP
1 10055 rs892501864 T A . . RS=892501864;RSPOS=10055;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP
I only want to keep everything from this row onward, using either R or Linux:
#CHROM POS ID REF ALT QUAL FILTER INFO
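A minimal sketch of one way to do exactly this on Linux, assuming the column header line starts with `#CHROM` as in the excerpt above. The function name `keep_from_header` is hypothetical. It uses a `sed` address range that starts at the first line matching `^#CHROM` and runs to the last line (`$`), so nothing depends on counting header rows:

```shell
# keep_from_header (hypothetical helper): print the first line that starts with
# "#CHROM" and every line after it; everything above that line is dropped.
# If no "#CHROM" line exists, nothing is printed.
keep_from_header() {
  sed -n '/^#CHROM/,$p' "$1"
}
```

For example, `keep_from_header snps_hg19.vcf > snps_body.txt`, where both file names are placeholders.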
What have you tried? The logic you need is to exclude all lines that begin with `##`. `grep` should help you achieve this. Use Google to find out how to exclude lines that start with a pattern using `grep`.

I used `sed` to remove the first 55 rows.

That approach has many pitfalls: it relies on a hard-coded line count and deletes whatever happens to be in those rows. Matching the pattern with `grep` instead tells you exactly what content you deleted. The number of lines is not important as long as the nature of the removed content is known, so you should focus on documenting that.
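A sketch of the `grep` approach described above, which also keeps a record of what was removed. The function name `strip_meta_lines` and its three file arguments are hypothetical. `grep -v '^##'` keeps every line that does not start with `##`, so the single-`#` `#CHROM` header and the data rows survive. A second `grep` saves the removed `##` lines so the deletion is documented:

```shell
# strip_meta_lines (hypothetical helper)
#   $1 = input VCF
#   $2 = output: the #CHROM header line plus all data rows
#   $3 = output: the removed "##" meta-information lines, kept as documentation
strip_meta_lines() {
  local in="$1" out="$2" removed="$3"
  grep -v '^##' "$in" > "$out"   # keep lines that do NOT start with ##
  grep '^##' "$in" > "$removed"  # record exactly what was dropped
  # Report line counts on stderr so they can be noted alongside the analysis
  echo "removed $(grep -c '^' "$removed") meta lines; kept $(grep -c '^' "$out") lines" >&2
}
```

For example, `strip_meta_lines snps_hg19.vcf snps_body.txt removed_header.txt` (placeholder names). Note that the VCF specification requires a `##fileformat` line, so the stripped output is best treated as a plain tab-delimited table rather than a VCF.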