How to extract mutations from a MAF file in Python
4.4 years ago
anasjamshed ▴ 140

I just downloaded data from TCGA and GDC. I have two files: one is annotations.txt and the other is a .maf file. I want to extract mutations from these files with a Python script and analyze them in Python. Please help me.

genome python DEGs • 4.6k views

MAF is already a mutation file. What do you mean by extract mutations?


Yes, but my file contains scattered data. How can I load this scattered data and analyze it in Python?


What does "scattered data" mean?

Also, do not add answers unless you're answering the top level post. If you're doing that to bump the post, that is bad etiquette and the post will be closed.


"Scattered" means raw data that is not organized the way it is in Excel.


Do you have an MAF file or raw data? MAF files are processed files. What is the exact format of your data? Also, Excel is not a good tool in bioinformatics. Please give us the exact file name of this data file you're referring to.


I downloaded the file annotations.txt from TCGA, which contains data like this:

id  submitter_id    entity_type entity_id   category    classification  created_datetime    status  notes
b9e2adea-809d-5e34-bab7-5ed878bdae95    814 case    f39edd06-1016-4e8e-a42e-12f7e699ddc5    Prior malignancy    Notification    2010-10-28T00:00:00 Approved    No Note Specified
d68360a8-9a8a-5cbe-b0dc-e623ba48323e    18424   analyte 74f3a478-527e-4d49-9c3f-29f353e1fb6c    General Observation 2013-10-21T00:00:00 Approved    DNA analyte UUID: 74F3A478-527E-4D49-9C3F-29F353E1FB6C was involved in an extraction protocol deviation wherein isopropanol precipitation was used as a means of buffer exchange on the column-eluted analyte.
35f5f0d9-803b-5cd7-9279-81e9354ed552    18407   analyte e277fe01-b1da-4a50-a5f7-9d97706c29fe    General Observation 2013-10-21T00:00:00 Approved    DNA analyte UUID: E277FE01-B1DA-4A50-A5F7-9D97706C29FE was involved in an extraction protocol deviation wherein an additional column purification step was used as a means of buffer exchange on the column-eluted analyte.
50b03a27-96ee-50f3-85fd-3bcdb993951b    18406   analyte fe7f0b20-42ce-48c7-a407-61dc0ea0878e    General Observation 2013-10-21T00:00:00 Approved    DNA analyte UUID: FE7F0B20-42CE-48C7-A407-61DC0EA0878E was involved in an extraction protocol deviation wherein an additional column purification step was used as a means of buffer exchange on the column-eluted analyte.
ddd39064-59cd-54fb-8154-f923b0ee2ca2    18415   analyte 04727f81-231a-4f9d-80ab-7fd3d3564a98    General Observation 2013-10-21T00:00:00 Approved    DNA analyte UUID: 96DB3A0E-2D63-4E79-BF6D-BC7D2FE60157 was involved in an extraction protocol deviation wherein isopropanol precipitation was used as a means of buffer exchange on the column-eluted analyte.
4b2239b1-c3c8-5b41-87e4-e02104a4abd8    17882   analyte d0330be0-6b5b-4f07-a73f-2bf53bce86cd    General Observation 2013-10-08T00:00:00 Approved    DNA analyte UUID: D0330BE0-6B5B-4F07-A73F-2BF53BCE86CD was involved in an extraction protocol deviation wherein isopropanol precipitation was used as a means of buffer exchange on the column-eluted analyte.
9ed62acd-06fb-5862-915b-daae738d35df    18416   analyte 89907265-4e6c-4b3c-9f2f-e5ec2ad50d07    General Observation 2013-10-21T00:00:00 Approved    DNA analyte UUID: C88EB59E-18A1-4F37-9FC8-9BE86EA9CBDF was involved in an extraction protocol deviation wherein isopropanol precipitation was used as a means of buffer exchange on the column-eluted analyte.
a1fb1dc5-4e68-55d0-b1da-f85f3a8f7345    23697   case    16fc3677-0393-4ed1-ad3f-c8355f056369    History of unacceptable prior treatment related to a prior/other malignancy Notification    2014-11-25T00:00:00 Approved    Patient had prior breast malignancy (in opposite breast) with systemic chemotherapy (Tamoxifen).
e018b82c-e9df-551e-8e42-861ef244b56f    18402   analyte e27e9375-d153-4d0d-80ea-19c0f58c6c60    General Observation 2013-10-21T00:00:00 Approved    DNA analyte UUID: E27E9375-D153-4D0D-80EA-19C0F58C6C60 was involved in an extraction protocol deviation wherein an additional column purification step was used as a means of buffer exchange on the column-eluted analyte.
358b5574-802a-5d1e-b4cf-404f5a0374d6    18420   analyte 024b6e54-dc95-457d-a70d-9db56806159f    General Observation 2013-10-21T00:00:00 Approved    DNA analyte UUID: 5493F123-5740-48C9-A531-B351FDA6B081 was involved in an extraction protocol deviation wherein isopropanol precipitation was used as a means of buffer exchange on the column-eluted analyte.
eece96b7-df23-57b5-bef5-e00edc5a4ea3    1052    case    09765b0a-94f6-47d2-af56-93368084ac3a    Prior malignancy    Notification    2010-12-14T00:00:00 Approved    Case had prior malignancy
40f69d58-20fe-5bab-9f7f-0a29c52f52d1    20713   case    dfefb76a-ec6b-4cd2-9d45-2e1e4befc7ea    History of unacceptable prior treatment related to a prior/other malignancy Notification    2014-06-16T00:00:00 Approved    systemic treatment given to the prior/other malignancy
fd8b1342-cb70-5c35-be8d-38c595fb8670    811 case    dfefb76a-ec6b-4cd2-9d45-2e1e4befc7ea    Prior malignancy    Notification    2010-10-28T00:00:00 Approved    No Note Specified
08f67bcd-c2b4-5c66-9a3c-1c8dd289bcae    17891   analyte 11f3c3e2-2b1d-4409-a43e-42eae8358ce3    General Observation 2013-10-08T00:00:00 Approved    DNA analyte UUID: 11F3C3E2-2B1D-4409-A43E-42EAE8358CE3 was involved in an extraction protocol deviation wherein isopropanol precipitation was used as a means of buffer exchange on the column-eluted analyte.
6c41401e-adee-51f3-ac4f-0fb80203db3e    820 case    2c04a2f5-321e-4dea-8e00-268325da65cb    Prior malignancy    Notification    2010-10-28T00:00:00 Approved    No Note Specified
38269e94-1c37-5d51-bb61-a5cea639d8b5    813 case    02f5ae33-a563-4ecb-9e33-dfa500a44931    Prior malignancy    Notification    2010-10-28T00:00:00 Approved    No Note Specified

That's not the MAF file; it's the annotations file, which holds metadata.


So which one is the MAF file? I downloaded a tar archive that contains 3 files.


Look at the description of MAF files online (on the GDC/NCI website) and compare it with the three files you have; that should help you pick the right file.
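
One way to make that comparison without opening each file by hand is a small check in Python. This is a minimal sketch, assuming the files have already been extracted from the tar archive; `looks_like_maf`, `open_text`, and `EXPECTED_COLUMNS` are illustrative names, not part of any library. It relies on the MAF convention that the header row contains columns such as `Hugo_Symbol` and `Variant_Classification`, possibly preceded by `#` comment lines.

```python
import gzip
from pathlib import Path

# Columns that a MAF header row is expected to contain. Their presence is
# used here as a heuristic only, not as full validation of the format.
EXPECTED_COLUMNS = {
    "Hugo_Symbol",
    "Chromosome",
    "Start_Position",
    "Variant_Classification",
}


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def looks_like_maf(path):
    """Return True if the first non-comment line has the expected MAF columns."""
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment lines (e.g. "#version ...") and blank lines
            header = set(line.rstrip("\r\n").split("\t"))
            return EXPECTED_COLUMNS <= header
    return False  # empty file or comments only
```

Calling `looks_like_maf` on each of the three extracted files should return `True` only for the MAF. A file whose header starts with `id` and `submitter_id`, like the annotations file, would return `False`.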


There are 3 files in the tar archive: one is the manifest, the second is annotations, and the third is the MAF file, which is built in MS Access and does not open on my PC.


"built in MS Access"

No, it's not.

"does not open on my PC"

Do not double-click to open it. Use either the Windows Subsystem for Linux or a Linux computer to read these files. They are plain text files and can be read with Linux commands (such as head, tail, cat) or with Notepad++ (a GUI application that could crash if it attempts to open a HUGE file).

Please contact someone who knows Linux or bioinformatics to help you with this task; we cannot hand-hold you through it.
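
If a GUI editor struggles with a large file, the same kind of peek that `head` gives can also be done from Python. This is a minimal sketch, assuming the file is plain text or gzip-compressed text; `preview` is an illustrative helper name, not a library function.

```python
import gzip
from itertools import islice


def preview(path, n=5, width=120):
    """Return the first n lines of a text file, each cut to `width` characters."""
    # Pick the gzip opener for ".gz" files, the built-in open otherwise.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        # islice reads lazily, so only the first n lines are ever read.
        return [line.rstrip("\r\n")[:width] for line in islice(handle, n)]
```

Because only the first few lines are read, this should stay fast even on a very large file. Printing the returned lines is enough to see whether the file is tab-separated text and what its header looks like.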


Sir, I am also a bioinformatician, but when I try to open this MAF file it opens directly in MS Access and shows errors.


I apologize. I meant for you to check with someone who knows Linux better than you do, not to suggest that you're not one of us.

"when I try to open"

How are you opening it? Are you using Linux commands or a point-and-click interface? If it's the latter, it's time to switch to Linux commands.


Now I have successfully opened it in Notepad++. I also have a Linux OS alongside Windows on my PC.


That's good progress. Remember to use Linux as much as you can - it will only help.


Hi, I have the same problem as you. Can you help me with how to read the .maf file in a Python notebook?


A MAF file is a tab-delimited text file. You should be able to use pandas to read it into a data frame.
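
A minimal sketch of that approach, assuming pandas is installed and the file follows the GDC MAF layout (optional `#` comment lines, then a tab-separated header with columns such as `Hugo_Symbol` and `Variant_Classification`). The function names and the file name in the usage comment are illustrative placeholders.

```python
import pandas as pd


def read_maf(path):
    """Load a MAF file into a DataFrame, skipping '#' comment lines."""
    # sep="\t": MAF is tab-delimited.
    # comment="#": drops header comments such as "#version ...".
    # low_memory=False: infer each column's type from the whole file.
    # pandas infers gzip compression from a ".gz" file extension.
    return pd.read_csv(path, sep="\t", comment="#", low_memory=False)


def summarize(maf):
    """Count mutations per gene and per variant classification."""
    per_gene = maf["Hugo_Symbol"].value_counts()
    per_class = maf["Variant_Classification"].value_counts()
    return per_gene, per_class


# Example usage (the path is a placeholder for your own file):
# maf = read_maf("your_file.maf.gz")
# per_gene, per_class = summarize(maf)
# print(per_gene.head(10))                    # most frequently mutated genes
# tp53 = maf[maf["Hugo_Symbol"] == "TP53"]   # all mutations in one gene
```

One caveat: `comment="#"` also cuts off any data line at the first `#` it contains. If any field in your file contains that character, count the leading comment lines and pass `skiprows` instead.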
