3.1 years ago
anasjamshed
I have a dataset in a TSV file that contains gene information. I first loaded it into a pandas DataFrame, and now I want to remove all missense mutations from the data using the 'mutation somatic status' column.
My code:
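import pandas as pd

# Read the TSV in chunks of 1,000,000 rows
chunks = pd.read_csv("CosmicGenomeScreensMutantExport.tsv", chunksize=1000000, sep="\t")
dfList = []
for df in chunks:
    dfList.append(df)
# Combine all chunks into a single DataFrame
df = pd.concat(dfList, sort=False)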
After removing the missense mutations, I want to keep only the records for the gene TP53.
Can anyone help me with this?
I also want to remove N.A, null, and duplicated values from my dataset.
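For the N.A, null, and duplicate cleanup, a minimal sketch on a small made-up DataFrame might look like the following. The column names, the values, and the assumption that missing entries appear as the literal string "N.A" are all illustrative and should be checked against the real file.

```python
import pandas as pd

# Hypothetical stand-in for the real export; names and values are assumptions.
df = pd.DataFrame({
    "Gene name": ["TP53", "TP53", None, "KRAS", "N.A"],
    "mutation somatic status": ["nonsense", "nonsense", "missense", None, "missense"],
})

# Turn the literal string "N.A" into a real missing value,
# drop rows with any missing value, then drop exact duplicate rows.
cleaned = df.replace("N.A", pd.NA).dropna().drop_duplicates()
print(cleaned)
```

Note that dropna() with no arguments drops a row if any column is missing; passing subset=[...] would restrict the check to the columns that matter.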
This code: print(df[(df.'mutation somatic status' !="missense") & (df.'Gene name'=="TP53")]) gives me a syntax error.
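The syntax error comes from df.'column name': attribute access with a dot cannot take a quoted string, so columns whose names contain spaces have to be selected with square brackets. A minimal sketch of the same filter, using a small made-up DataFrame (the column names and the value "missense" are assumptions and should be checked against df.columns and the actual data):

```python
import pandas as pd

# Hypothetical stand-in for the real export; names and values are assumptions.
df = pd.DataFrame({
    "Gene name": ["TP53", "TP53", "KRAS", "TP53"],
    "mutation somatic status": ["missense", "nonsense", "missense", "frameshift"],
})

# Use df["..."] for column names that contain spaces.
keep = (df["mutation somatic status"] != "missense") & (df["Gene name"] == "TP53")
filtered = df[keep]
print(filtered)
```

If the real column holds longer descriptions rather than the bare word "missense", a substring test such as df[col].str.contains("missense", case=False, na=False) may be closer to what is needed.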
I have updated the code with appropriate column names. Please replace the SO term and gene name with appropriate values, and also check the column names.