Hello,
I have access to SNP and genotype information from NGS data, specifically Exon-Seq reads of 56 samples. For every sample, a variant file exists in two formats: .gff3 and a kind of tab-delimited file. The data include e.g. rsID, pos, CHR, refAllele, quality score, etc.
This means 56 files, each with more than 400,000 SNPs. I know several tools for SNP data processing (plink, imputation tools) but have no idea how to use them for this kind of data. Perhaps you can help me and suggest some tools to create e.g. ped/map files, or more generally one genotype file for all 56 samples covering selected or all SNPs.
Are there any standardized tools at all? Or does one have to use R, Unix and Perl commands to cut, combine and work with such data?
Thanks
Before getting into file conversion etc., what do you want to do with these data? Are these diseased individuals? Are you just trying to learn about NGS variant analyses? With a few more details, you are more likely to get the answer you are looking for.
Precisely. First comes the "what?", and then the "how?", not the other way round.
Can you post a snippet of the tab-delimited file so we can see how it's structured? It could be VCF (though you'd likely have noticed the header). BTW, you can convert GFF3 into VCF (see this thread: Converting a SNP GFF3 file to VCF format) and then convert that into .ped and .map with vcftools (its --plink option), if nothing else.
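If it turns out not to be VCF and you end up scripting it yourself, here is a rough Python sketch of combining per-sample tables into PED/MAP-style rows. Since we haven't seen the file yet, the layout is an assumption: a header line with columns named CHR, pos, rsID (taken from your description) plus a hypothetical "genotype" column holding two allele letters such as AG. The function names are made up for illustration. Also note the caveat in the comments: a SNP absent from a sample's variant file is coded as missing here, although it may really be homozygous reference or simply not covered.

```python
import csv
import io


def read_sample(handle):
    """Return {(chrom, pos, rsid): genotype} for one sample's variant table.

    Assumes a tab-delimited file with a header containing the columns
    CHR, pos, rsID and a hypothetical 'genotype' column such as 'AG'.
    """
    calls = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["CHR"], int(row["pos"]), row["rsID"])
        calls[key] = row["genotype"]
    return calls


def merge_samples(samples):
    """Combine {sample_id: calls} into (sorted SNP keys, PED rows)."""
    # Union of all SNPs seen in any sample. Chromosomes are sorted as
    # strings, so "10" comes before "2".
    snps = sorted(set().union(*(c.keys() for c in samples.values())))
    ped_rows = []
    for sample_id, calls in samples.items():
        # family ID, individual ID, father, mother, sex (0 = unknown),
        # phenotype (-9 = missing)
        row = [sample_id, sample_id, "0", "0", "0", "-9"]
        for key in snps:
            # Absent from this sample's file: coded as missing ("0 0").
            # It could also be homozygous reference or uncovered.
            gt = calls.get(key, "00")
            row.extend([gt[0], gt[1]])
        ped_rows.append(row)
    return snps, ped_rows


def map_rows(snps):
    """MAP rows: chromosome, SNP ID, genetic distance (0), bp position."""
    return [[chrom, rsid, "0", str(pos)] for chrom, pos, rsid in snps]
```

In practice you would open each of the 56 files, pass the handle to read_sample, collect the results in a dict keyed by sample name, and write the rows from merge_samples and map_rows out space- or tab-separated as .ped and .map.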