`chromosome_strand` values all change to `1`
7.4 years ago
wangshx ▴ 10

The `chromosome_strand` column should contain "-" or "+". It looks right when I inspect the first lines of the data (an ICGC TSV file) with `zcat simple_somatic_mutation.open.tsv.gz | head -5`. But when I use awk to extract the columns I am interested in, `chromosome_strand` is `1` in every row. Does anybody know why?

    $ zcat simple_somatic_mutation.open.tsv.gz | head -5 | awk 'BEGIN{FS="\t";OFS="\t"} {print $1,$2,$3,$12}'
    icgc_mutation_id    icgc_donor_id   project_code    chromosome_strand
    MU28469596  DO50633 BOCA-FR 1
    MU28469596  DO50633 BOCA-FR 1
    MU28469596  DO50633 BOCA-FR 1
    MU28469596  DO50633 BOCA-FR 1
ICGC awk • 1.4k views

Please paste the output of `zcat simple_somatic_mutation.open.tsv.gz | head -5`.

$ zcat simple_somatic_mutation.open.tsv.gz | head -5
icgc_mutation_id    icgc_donor_id   project_code    icgc_specimen_id    icgc_sample_id  matched_icgc_sample_id  submitted_sample_id submitted_matched_sample_id chromosome  chromosome_start    chromosome_end  chromosome_strand   assembly_version    mutation_type   reference_genome_allele mutated_from_allele  mutated_to_allele   quality_score   probability total_read_count    mutant_allele_read_count    verification_status verification_platform   biological_validation_status    biological_validation_platform  consequence_type    aa_mutation cds_mutation    gene_affected   transcript_affected gene_build_version  platform    experimental_protocol   sequencing_strategy base_calling_algorithm  alignment_algorithm variation_calling_algorithm other_analysis_algorithm    seq_coverage    raw_data_repository raw_data_accession  initial_data_release_date
MU28469596  DO50633 BOCA-FR SP111595    SA529113    SA529110    IC280T_WGS  IC280C_WGS  X   123185046   123185046   1   GRCh37  deletion of <=200bp G   G   -           60  20  not tested      not tested      frameshift_variant  E365        ENSG00000101972 ENST00000371157 75  Illumina HiSeq  Paired-End http://technology.illumina.com/technology/next-generation-sequencing/paired-end-sequencing_assay.html    WGS CASAVA http://support.illumina.com/sequencing/sequencing_software/casava.ilmn   BWA http://bio-bwa.sourceforge.net/bwa.shtml    Bambino https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html        35.0    EGA EGAS00001000855 
MU28469596  DO50633 BOCA-FR SP111595    SA529113    SA529110    IC280T_WGS  IC280C_WGS  X   123185046   123185046   1   GRCh37  deletion of <=200bp G   G   -           60  20  not tested      not tested      frameshift_variant  E365        ENSG00000101972 ENST00000371160 75  Illumina HiSeq  Paired-End http://technology.illumina.com/technology/next-generation-sequencing/paired-end-sequencing_assay.html    WGS CASAVA http://support.illumina.com/sequencing/sequencing_software/casava.ilmn   BWA http://bio-bwa.sourceforge.net/bwa.shtml    Bambino https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html        35.0    EGA EGAS00001000855 
MU28469596  DO50633 BOCA-FR SP111595    SA529113    SA529110    IC280T_WGS  IC280C_WGS  X   123185046   123185046   1   GRCh37  deletion of <=200bp G   G   -           60  20  not tested      not tested      frameshift_variant  E365        ENSG00000101972 ENST00000371144 75  Illumina HiSeq  Paired-End http://technology.illumina.com/technology/next-generation-sequencing/paired-end-sequencing_assay.html    WGS CASAVA http://support.illumina.com/sequencing/sequencing_software/casava.ilmn   BWA http://bio-bwa.sourceforge.net/bwa.shtml    Bambino https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html        35.0    EGA EGAS00001000855 
MU28469596  DO50633 BOCA-FR SP111595    SA529113    SA529110    IC280T_WGS  IC280C_WGS  X   123185046   123185046   1   GRCh37  deletion of <=200bp G   G   -           60  20  not tested      not tested      frameshift_variant  E365        ENSG00000101972 ENST00000455404 75  Illumina HiSeq  Paired-End http://technology.illumina.com/technology/next-generation-sequencing/paired-end-sequencing_assay.html    WGS CASAVA http://support.illumina.com/sequencing/sequencing_software/casava.ilmn   BWA http://bio-bwa.sourceforge.net/bwa.shtml    Bambino https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html        35.0    EGA EGAS00001000855

Too many columns. Thanks for the reminder. I was wrong.


So your 12th col is not the strand, as you might have noticed too.
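
With this many columns it is easy to misread which value belongs to which header. In the rows pasted above, the 12th field holds `1`, while the `-` appears further right, under what looks like `mutated_to_allele`. A minimal sketch of two ways to check this: number the header fields, then select columns by header name instead of by position. The sample file below is a hypothetical four-column stand-in, not the real ICGC file, and the variable names are only for illustration.

```shell
# Build a tiny tab-separated sample (hypothetical stand-in for the ICGC file).
sample=$(mktemp)
printf 'icgc_mutation_id\tchromosome\tchromosome_strand\tmutated_to_allele\n' > "$sample"
printf 'MU28469596\tX\t1\t-\n' >> "$sample"

# 1) Number every header field so column positions can be read off directly.
numbered=$(head -1 "$sample" | awk -F'\t' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$numbered"

# 2) Select columns by header name rather than by a hard-coded position.
picked=$(awk 'BEGIN { FS = OFS = "\t" }
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i }   # map header name -> index
{ print $(col["icgc_mutation_id"]), $(col["chromosome_strand"]), $(col["mutated_to_allele"]) }' "$sample")
printf '%s\n' "$picked"

rm -f "$sample"
```

On the real data, the same awk programs should work when fed from `zcat simple_somatic_mutation.open.tsv.gz | head -5` instead of the sample file.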
