Add `chr` to first column of specific rows?
5.9 years ago
star ▴ 350

I have a .gtf file, shown below. I would like to add "chr" to the first column of every row except the first 5 rows (the header lines).

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07

output:

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
chr1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
chr1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07

I used the following code, but it adds "chr" to the first 5 lines as well.

cat Homo_sapiens.GRCh37.gtf | sed 's/^/chr/' > chr.gtf
RNA-Seq R alignment • 11k views

Check if a line starts with a number (and X/Y/MT) and only then add chr using sed.
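
A minimal sketch of that idea, assuming tab-separated records and Ensembl-style chromosome names (digits, X, Y, MT). The function name `add_chr_by_name` is made up for illustration. Anything that does not start with one of those names, such as header lines or unplaced scaffolds, passes through unchanged.

```shell
# Hypothetical helper: prefix "chr" only when the line begins with a
# chromosome name (digits, X, Y, or MT) followed by whitespace.
add_chr_by_name() {
  sed -E 's/^([0-9]+|X|Y|MT)([[:space:]])/chr\1\2/'
}

# Demo: one header line, two chromosome records, one scaffold record.
printf '#!genome-build GRCh37.p13\n1\tensembl\texon\nX\tensembl\texon\nGL000192.1\tensembl\texon\n' \
  | add_chr_by_name
```

On a real file this would be `add_chr_by_name < Homo_sapiens.GRCh37.gtf > chr.gtf`. Note that this turns MT into chrMT; UCSC-style naming uses chrM, so an extra substitution may be needed depending on the target reference.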


This could be done in R, but it is better suited to command-line tools such as sed or awk.

5.9 years ago
ATpoint 85k
awk 'BEGIN {FS=OFS="\t"} NR > 5 {$1="chr"$1} {print}' in.gtf

Could have been found on google easily...
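
If the number of header lines is not known in advance, a variant that skips lines starting with `#` instead of counting rows may be safer. This is only a sketch: the name `add_chr_awk` is hypothetical, and it assumes the file is tab-separated, as GTF should be. Setting both FS and OFS to a tab matters because awk rebuilds the record when `$1` is assigned, and with the default whitespace splitting any spaces inside the attributes column would be replaced by the output separator.

```shell
# Hypothetical helper: prefix column 1 on every non-comment line.
# FS and OFS are both a tab, so spaces inside the attributes column
# survive when awk rebuilds the record after $1 is modified.
add_chr_awk() {
  awk 'BEGIN { FS = OFS = "\t" } !/^#/ { $1 = "chr" $1 } { print }'
}

# Demo: one header line and one record whose last column contains spaces.
printf '#!genome-version GRCh37\n1\tensembl\texon\t3\t104\t.\t+\t.\tgene_id "G1"; gene_name "A"\n' \
  | add_chr_awk
```

Usage on a file would be `add_chr_awk < in.gtf > chr.gtf`.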

5.9 years ago
Kairos ▴ 90

I would have gone with:

cat <(grep '^#' file.txt) <(grep -v '^#' file.txt  | sed 's/^/chr/g')

This is more dynamic, since it does not depend on the number of header lines.
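
The same comment-based test can probably be done in a single pass, which also keeps every line in its original position (the two-grep version moves all `#` lines to the top, and process substitution needs bash or a similar shell). A sketch, with the hypothetical name `add_chr_skip_comments`:

```shell
# Hypothetical helper: apply the substitution only to lines that do
# NOT start with '#'; comment lines are printed untouched, in place.
add_chr_skip_comments() {
  sed '/^#/!s/^/chr/'
}

# Demo: comment lines interleaved with records keep their order.
printf '#!genome-date 2009-02\n1\tensembl\texon\n#note\n2\tensembl\texon\n' \
  | add_chr_skip_comments
```

On a file: `add_chr_skip_comments < file.txt > chr.gtf`.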

5.9 years ago
Chirag Parsania ★ 2.0k

The R way:

dd <- tibble::tribble(
  ~V1,       ~V2,          ~V3, ~V4,       ~V5, ~V6, ~V7, ~V8,                                               ~V9,
    1, "ensembl", "chromosome",   1, 300239041, ".", ".", ".",      "ID=1;Name=chromosome:AGPv1:1:1:300239041:1",
    1, "ensembl",       "exon",   3,       104, ".", "+", ".", "Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07"
  )

# paste0() with a lowercase "chr" matches the requested output;
# calling mutate() directly avoids needing %>% to be loaded.
dd2 <- dplyr::mutate(dd, V1 = paste0("chr", V1))

# A tibble: 2 x 9
  V1    V2      V3            V4        V5 V6    V7    V8    V9                                             
  <chr> <chr>   <chr>      <dbl>     <dbl> <chr> <chr> <chr> <chr>                                          
1 chr1  ensembl chromosome     1 300239041 .     .     .     ID=1;Name=chromosome:AGPv1:1:1:300239041:1     
2 chr1  ensembl exon           3       104 .     +     .     Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07