Question

Converting a tab-delimited text file into an HTML/PDF/LaTeX/knitr report

9.6 years ago

anilkanthi

I have a bash script that takes an ABI file as input and uses ANNOVAR to annotate the variants, producing a tab-delimited text file of annotated variants. Every time the script is run on a different ABI file, the number of columns in the output is fixed, but the number of rows, and the individual annotations for each variant, may vary.

Sometimes the "AAChange.refGene" column lists one, two, or many transcripts affected by the variant, which makes the reporting difficult to automate. Likewise, in the "clinvar" column, OMIM may or may not be listed in the "CLNDSDB=" part (its ID sits at the matching position in "CLNDSDBID=").
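One way to cope with a variable number of transcripts is to split the AAChange.refGene value into one record per transcript instead of relying on a fixed position. A minimal Python sketch, assuming entries are separated by commas or whitespace and that each entry has five colon-separated parts (gene, transcript, exon, cDNA change, protein change), as in the sample rows; `parse_aachange` and `TranscriptChange` are hypothetical names:

```python
import re
from typing import List, NamedTuple


class TranscriptChange(NamedTuple):
    # Hypothetical record type: one affected transcript.
    gene: str
    transcript: str
    exon: str
    cdna: str
    protein: str


def parse_aachange(field: str) -> List[TranscriptChange]:
    """Split an AAChange.refGene value into one record per transcript."""
    changes = []
    # Assumption: entries are separated by commas and/or whitespace.
    for entry in re.split(r"[,\s]+", field.strip()):
        parts = entry.split(":")
        # Assumption: GENE:TRANSCRIPT:EXON:cDNA:PROTEIN; skip "." or anything else.
        if len(parts) == 5:
            changes.append(TranscriptChange(*parts))
    return changes


sample = ("ATP7B:NM_000053:exon12:c.G2855A:p.R952K,"
          "ATP7B:NM_001243182:exon13:c.G2522A:p.R841K")
for change in parse_aachange(sample):
    print(change.transcript, change.protein)
# NM_000053 p.R952K
# NM_001243182 p.R841K
```

The report can then use the first transcript, or loop over all of them, without caring how many there are.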

ANNOVAR Result

Chr     Start      End        Ref   Alt   Func.refGene   Gene.refGene   GeneDetail.refGene   ExonicFunc.refGene   AAChange.refGene                                                                    snp138          clinvar                                                                                                                                                                                                                                 SIFT_score
chr13   52523808   52523808   C     T     exonic         ATP7B                               nonsynonymous SNV    ATP7B:NM_000053:exon12:c.G2855A:p.R952K,ATP7B:NM_001243182:exon13:c.G2522A:p.R841K   rs732774        CLINSIG=nonpathogenic|CLNDBN=Wilson's_disease|not_specified CLNREVSTAT=single|single CLNACC=RCV000029357.1|RCV000078044.1 CLNDSDB=GeneReviews:MedGen:OMIM:Orphanet:SNOMED_CT|. CLNDSDBID=NBK1512:C0019202:277900:ORPHA905:88518009|.    0.99
chr13   52523867   52523867   T     G     exonic         ATP7B                               synonymous           ATP7B:NM_000053:exon12:c.2796A>C:p.S932S ATP7B:NM_001243182:exon13:c.2463A>C:p.S821S
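Because OMIM is not always listed, the OMIM ID can be found by pairing each database name in CLNDSDB= with the ID at the same position in CLNDSDBID=, rather than assuming a fixed position. A minimal Python sketch, assuming `|` separates diseases and `:` separates databases within one disease, as in the first sample row; `omim_ids` is a hypothetical helper:

```python
import re
from typing import List


def omim_ids(clinvar: str) -> List[str]:
    """Hypothetical helper: return the OMIM IDs in a clinvar value (may be empty)."""
    # Assumption: each value ends at whitespace or ";".
    dbs = re.search(r"CLNDSDB=([^\s;]+)", clinvar)
    ids = re.search(r"CLNDSDBID=([^\s;]+)", clinvar)
    if not dbs or not ids:
        return []
    found = []
    # "|" separates diseases; ":" separates databases within one disease.
    for db_group, id_group in zip(dbs.group(1).split("|"), ids.group(1).split("|")):
        for db, ident in zip(db_group.split(":"), id_group.split(":")):
            if db == "OMIM" and ident not in found:
                found.append(ident)
    return found


row = ("CLNDSDB=GeneReviews:MedGen:OMIM:Orphanet:SNOMED_CT|. "
       "CLNDSDBID=NBK1512:C0019202:277900:ORPHA905:88518009|.")
print(omim_ids(row))  # ['277900']
```

An empty list means there is no OMIM entry, which the report can show as a blank cell.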

Attempts so far-->

I have tried writing a bash script that, for the first variant, extracts the different fields from the tab-delimited file, saves each one as a text file, and then combines the individual files. An AWK script assigns a variable to each field of the combined text file, and I use AWK to build an HTML page that prints these variables inside the respective HTML tags. This works for files that follow the same pattern as the tab-delimited file above. But when a particular field is missing for a variant with a different pattern, the remaining fields shift, and the script prints a different field than the one the variable was meant to hold.
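A common fix for the shifting-field problem is to look each value up by its column header instead of by its position in a combined line. A sketch using Python's `csv.DictReader` with a tab delimiter; it assumes the first line of the ANNOVAR output is the header and that empty fields are written as `.` or left blank. `read_variants` and `clean` are hypothetical names, and the sample input is a trimmed-down version of the table above with only a few columns:

```python
import csv
import io

MISSING = {"", "."}  # assumption: ANNOVAR writes "." (or nothing) for empty fields


def clean(value):
    """Hypothetical helper: None (short row), "" and "." all become ""."""
    value = (value or "").strip()
    return "" if value in MISSING else value


def read_variants(handle):
    """Yield one dict per data row, keyed by the header line's column names."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip the None key that DictReader uses for surplus fields.
        yield {key: clean(value) for key, value in row.items() if key is not None}


# Trimmed-down sample with only some of the ANNOVAR columns.
sample = (
    "Chr\tStart\tEnd\tRef\tAlt\tsnp138\tclinvar\n"
    "chr13\t52523808\t52523808\tC\tT\trs732774\tCLINSIG=nonpathogenic\n"
    "chr13\t52523867\t52523867\tT\tG\t.\t.\n"
)
for variant in read_variants(io.StringIO(sample)):
    print(variant["Start"], variant["snp138"] or "no rsID", variant["clinvar"] or "no ClinVar")
# 52523808 rs732774 CLINSIG=nonpathogenic
# 52523867 no rsID no ClinVar
```

A missing value stays an empty string under its own column name, so nothing after it moves. For a real file, pass an open file handle (opened with `newline=""`) instead of the `StringIO` sample.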

In the example above, the first variant is a clinically significant mutation because it has an annotation in the "clinvar" column, so it needs to be reported in a separate section along with its other details.

Combined Text File

p.R952K  Arginine  Lysine  chr13:52523808  chr13  Non-pathogenic  Wilson's_disease  ATP7B  chr13:52523808C>T  277900  rs732774  NM_000053  exonic  Nonsynonymous-SNV

The order of the fields in the combined text file is not the same for every variant, so the report generated from it is incorrect.
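Keeping the derived values in a record with named keys, rather than a space-separated line, means their order no longer matters. A sketch that builds the amino-acid names and the variant position shown in the combined line; it assumes single-substitution protein changes such as `p.R952K` (frameshifts and other notations are left blank) and uses a hypothetical `summarize` function with hypothetical key names:

```python
import re

# One-letter amino-acid codes -> names; ANNOVAR writes stop codons as "X".
AMINO_ACIDS = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "E": "Glutamic acid", "Q": "Glutamine", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
    "X": "Stop",
}


def summarize(chrom, start, ref, alt, protein):
    """Hypothetical helper: return the combined fields as a dict with fixed keys."""
    # Only simple substitutions like p.R952K are decoded; others stay blank.
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", protein)
    ref_aa = AMINO_ACIDS.get(match.group(1), "") if match else ""
    alt_aa = AMINO_ACIDS.get(match.group(3), "") if match else ""
    return {
        "amino_acid_change": protein,
        "ref_amino_acid": ref_aa,
        "alt_amino_acid": alt_aa,
        "position": f"{chrom}:{start}",
        "variant": f"{chrom}:{start}{ref}>{alt}",
    }


record = summarize("chr13", "52523808", "C", "T", "p.R952K")
print(record["ref_amino_acid"], record["alt_amino_acid"], record["variant"])
# Arginine Lysine chr13:52523808C>T
```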

Expected Result-->

Since the contents of each row are not uniform, is there a way to apply multiple conditions to each row? For example: if a specific column (e.g. clinvar) has a value, print it between certain HTML tags; if it is empty, check another column (e.g. the rsID) and, if that has a value, print it between other HTML tags; and so on for the other columns.
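The if/else-if chain described here can be written as an ordered list of rules, where the first column with a value decides which HTML template is used. A minimal Python sketch; the `RULES` list, CSS class names, and templates are hypothetical placeholders, and each row is assumed to be a dict keyed by column name with missing values as empty strings:

```python
from html import escape

# Hypothetical priority list: (column name, HTML template). The first column
# with a non-empty value decides which template is used for the row.
RULES = [
    ("clinvar", '<section class="clinvar"><p>{value}</p></section>'),
    ("snp138", '<section class="known"><p>{value}</p></section>'),
]


def render_row(variant, rules=RULES):
    """Return the HTML for the first rule whose column has a value, else ''."""
    for column, template in rules:
        value = variant.get(column, "")
        if value:
            # escape() keeps "<", "&" and quotes in annotations from breaking the HTML.
            return template.format(value=escape(value))
    return ""


print(render_row({"clinvar": "", "snp138": "rs732774"}))
# <section class="known"><p>rs732774</p></section>
```

Adding another condition then means adding one entry to the list rather than another branch of nested checks.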

Gene Name   Disease           Result           Test Number
ATP7B       Wilson's Disease  Non-pathogenic

Variant position    Variant Type        rsID       Amino Acid Change   OMIM
chr13:52523808C>T   Nonsynonymous-SNV   rs732774   p.R952K             277900
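The expected layout above could be produced by filling two small HTML tables from one parsed variant. A sketch in Python; `html_table`, `clinvar_report`, and the dictionary keys are hypothetical and would be filled from the parsed row, and Test Number is left empty when not supplied:

```python
from html import escape


def html_table(headers, values):
    """Hypothetical helper: a two-row HTML table (header row + one data row)."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(f"<td>{escape(v)}</td>" for v in values)
    return f"<table><tr>{head}</tr><tr>{body}</tr></table>"


def clinvar_report(v):
    """Render the two expected-result tables for one ClinVar variant.

    `v` uses hypothetical keys; fill them from the parsed ANNOVAR row.
    """
    summary = html_table(
        ["Gene Name", "Disease", "Result", "Test Number"],
        [v["gene"], v["disease"], v["result"], v.get("test_number", "")],
    )
    detail = html_table(
        ["Variant position", "Variant Type", "rsID", "Amino Acid Change", "OMIM"],
        [v["variant"], v["type"], v["rsid"], v["aa_change"], v["omim"]],
    )
    return summary + "\n" + detail


print(clinvar_report({
    "gene": "ATP7B", "disease": "Wilson's disease", "result": "Non-pathogenic",
    "variant": "chr13:52523808C>T", "type": "Nonsynonymous-SNV",
    "rsid": "rs732774", "aa_change": "p.R952K", "omim": "277900",
}))
```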

Similarly, for a novel variant, where the ExonicFunc.refGene column contains "nonsynonymous" and the snp138 column is empty, the report should print the SIFT_score along with the other details between HTML tags. These are just some of the conditions needed; any idea of how to approach all of this would be really helpful!
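The novel-variant condition can be expressed as its own check that returns nothing when it does not apply, so it can sit alongside the other rules. A sketch assuming the row is a dict keyed by the ANNOVAR column names with missing values normalised to empty strings; `novel_missense_html` and the HTML markup are hypothetical:

```python
from html import escape


def novel_missense_html(variant):
    """Hypothetical rule: HTML for a novel nonsynonymous variant, else None."""
    # "nonsynonymous SNV" matches; "synonymous SNV" does not.
    is_nonsyn = variant.get("ExonicFunc.refGene", "").startswith("nonsynonymous")
    # Novel here means no rsID in the snp138 column.
    is_novel = not variant.get("snp138")
    if not (is_nonsyn and is_novel):
        return None
    return (
        '<div class="novel">'
        f'<p>Gene: {escape(variant.get("Gene.refGene", ""))}</p>'
        f'<p>SIFT score: {escape(variant.get("SIFT_score", ""))}</p>'
        "</div>"
    )


print(novel_missense_html({
    "ExonicFunc.refGene": "nonsynonymous SNV", "snp138": "",
    "Gene.refGene": "ATP7B", "SIFT_score": "0.01",
}))
```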

Thank you for reading such a long post; any help with this problem would be greatly appreciated.

html awk ANNOVAR

updated 2.0 years ago by Ram • written 9.6 years ago by anilkanthi

Hello anilkanthi!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=60608

This is typically not recommended as it runs the risk of annoying people in both communities.