8.7 years ago
dr.akilalshawi
Hi everyone,
I am new here, so I have a question.
I have a data file of BovineHD SNPs from Illumina, named with the extension (FinalReport.txt). I want to start analyzing it on Linux with PLINK, or in R. Does anyone know how to get started? What is the best route for analyzing this type of file? Someone told me I need to produce PED and MAP files from my file, but I need more information on this topic. How do I create these files from (FinalReport.txt)?
Thanks
When you're working with Linux, the file name rarely matters. What matters is the file type and content, along with the structure of the data inside.
Peek into the `.txt` file and get an idea of the format. Start off with `file <filename>` to ensure it is in plain text, then use `head` to peek into the contents.
This existing Biostars question may be helpful: Converting Illumina Raw Genotype Data Into Plink Ped Format. It is a bit older, so things may have changed. The easiest way, if you have Illumina GenomeStudio, is to simply re-export the data in PLINK format. Otherwise, you could ask the data provider to do so. I'm not sure about the lgen format. As Ram suggests, you need to start by looking at the data file format, understanding what everything is, and then working out what the program you want to use needs. PLINK is well supported, so getting data into PLINK format as in the link is straightforward. Many R packages can probably read PLINK format files, but they may need an input format that is completely different.
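If you would rather do the "peek into the file" step from a script than from the shell, here is a minimal Python sketch that behaves roughly like `head`. The function name `peek` is made up for illustration, and the UTF-8 assumption may not hold for every export.

```python
from itertools import islice


def peek(path, n=15):
    """Return the first n lines of a file as text, roughly like `head -n`.

    Assumption: the file is (mostly) UTF-8 or ASCII text. Bytes that do not
    decode are replaced instead of raising, so the peek itself never crashes.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\r\n") for line in islice(handle, n)]
```

Calling `peek("MyProject_FinalReport.txt")` (a hypothetical file name) and printing the lines should be enough to see whether the file has header and data sections and which separator it uses.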
Thank you very much for your suggestions, but unfortunately this information is not enough. In addition to the final report file, I have several other files, including an SNP map and a sample map. Which of these files do I need to use together with the final report file?
It depends on what you want to do with PLINK. Like I said, you can output the data in PLINK format so that it is ready to go. Otherwise, you would typically need to construct the MAP file for the SNP array you are using. It is a file describing the genomic coordinates and genetic distances for the SNPs found in the PED file.
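As a rough sketch of constructing the MAP file: a PLINK MAP file has four columns per SNP (chromosome, SNP ID, genetic distance, base-pair position). The code below assumes your SNP map is tab-separated with a header row containing columns named `Name`, `Chromosome` and `Position`. Those column names are an assumption, so check them against your own file. The genetic distance is written as 0 because that information is normally not in the SNP map.

```python
import csv


def snp_map_to_plink_map(lines, name_col="Name",
                         chrom_col="Chromosome", pos_col="Position"):
    """Turn SNP-map lines into PLINK MAP lines.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The default column names are assumptions about the SNP map layout.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    map_lines = []
    for rec in reader:
        # MAP columns: chromosome, SNP ID, genetic distance, bp position.
        # "0" is used as a placeholder for the unknown genetic distance.
        map_lines.append(
            "\t".join([rec[chrom_col], rec[name_col], "0", rec[pos_col]])
        )
    return map_lines
```

The order of the SNPs in the MAP file has to match the order of the genotype columns in the PED file, so keep whatever order you write here and reuse it when building the PED lines.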
There is some more information from Illumina here
The MAP, FAM, and PED formats are described on the PLINK website. The Illumina formats should be fairly well described, but "Final Report" isn't a specific format. You can generate a Final Report file from GenomeStudio in a number of different ways, but its header should describe what the columns are.
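To see what your particular report contains, something like the following sketch can list the header entries and the data column names. It assumes one common layout: a `[Header]` section of tab-separated key/value lines, then a `[Data]` marker followed by a tab-separated column header row. Your export may differ, and the function name is made up for illustration.

```python
def describe_final_report(lines):
    """Return (header_dict, column_names) for a FinalReport-style file.

    Assumed layout (not guaranteed for every export): "[Header]" followed
    by key<TAB>value lines, then "[Data]" followed by the column header row.
    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    header, columns = {}, []
    section = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("["):
            # Section marker such as "[Header]" or "[Data]".
            section = line.strip().strip("[]").lower()
            continue
        if section == "header" and line:
            key, _, value = line.partition("\t")
            header[key.strip()] = value.strip()
        elif section == "data" and line:
            # First non-empty line after [Data] holds the column names.
            columns = line.split("\t")
            break
    return header, columns
```

The returned column names tell you which allele coding you have (for example Forward, Top or AB columns, if present) and therefore which columns to pull into the PED file.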
Thank you, sir. I need to create PED/MAP files from my Illumina files with the extension (FinalReport.txt). The link you mentioned is not valid! Thanks
The links and information I provided absolutely are valid. Here's the thing: there isn't one single FinalReport format. Illumina GenomeStudio will output SNP data in several different ways when you go through a Wizard function. All of the data that it outputs is then called, by default, FinalReport.txt, prepended with sample names, etc. FinalReport.txt is not a file format; it is just a default name.
So you need to understand 1) what data is in your final report and 2) what the format for PED and MAP files is. Then you can convert data from one to the other.
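To make the conversion idea concrete, here is a hedged sketch for one possible layout: a "long" report with one row per SNP per sample after a `[Data]` marker, and columns named `SNP Name`, `Sample ID`, `Allele1 - Forward` and `Allele2 - Forward`. All of those names are assumptions; substitute whatever your own header shows. It also assumes no-calls are written as `-`, uses the sample ID as both family and individual ID, and leaves parents, sex and phenotype as unknown.

```python
import csv


def final_report_to_ped(lines, snp_order,
                        sample_col="Sample ID", snp_col="SNP Name",
                        a1_col="Allele1 - Forward",
                        a2_col="Allele2 - Forward"):
    """Build PLINK PED lines from a long-format report (assumed layout).

    `lines` is any iterable of text lines; `snp_order` is the list of SNP
    IDs in the same order as the MAP file. Everything is held in memory,
    so this is an illustration rather than a tool for very large files.
    """
    it = iter(lines)
    # Skip everything up to and including the [Data] marker.
    # If the marker is missing, nothing is read and the result is empty.
    for line in it:
        if line.strip() == "[Data]":
            break
    reader = csv.DictReader(it, delimiter="\t")

    calls = {}  # sample ID -> {SNP ID: (allele1, allele2)}
    for rec in reader:
        a1, a2 = rec[a1_col], rec[a2_col]
        # Assumed no-call symbol "-" becomes PLINK's missing code "0".
        pair = ("0", "0") if "-" in (a1, a2) else (a1, a2)
        calls.setdefault(rec[sample_col], {})[rec[snp_col]] = pair

    ped_lines = []
    for sample, genotypes in calls.items():
        # FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
        fields = [sample, sample, "0", "0", "0", "-9"]
        for snp in snp_order:
            fields.extend(genotypes.get(snp, ("0", "0")))
        ped_lines.append(" ".join(fields))
    return ped_lines
```

If the resulting lines are written to `mydata.ped` next to a matching `mydata.map` (hypothetical names), PLINK 1.9 should be able to load them with something like `plink --cow --file mydata --make-bed`, where `--cow` selects the cattle chromosome set. Treat that command as a starting point and check it against the PLINK documentation for your version.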
Exactly where I'd start.