creating a list of genes in python
2.1 years ago
arshad1292 ▴ 110

Hello,

I have a .csv file that contains columns called geneid, log2FoldChange, pvalue, padj, symbol, etc. The gene symbols are in the symbol column, as shown in the attached screenshot.

I want to generate a volcano plot, and for that I want to create a list, say picked1, that contains 5 selected gene symbols (CLK1, HMOX1, SAT1, TUFT1, RHOB).

How can I create that in python? I tried the following:

picked1 = ('CLK1', 'HMOX1', 'SAT1', 'TUFT1', 'RHOB')

The above example does create a list, but it's not read from the .csv file. I want it to be read from the .csv file, so that whenever I call picked1 it picks these 5 symbols from the .csv file and tells me the "log2FoldChange" values of these 5 genes.

Could anyone please guide me on how to create this list so that it's linked to the .csv file?

Many thanks in advance!

rnaseq python • 1.1k views
2.1 years ago
iraun 6.2k

Not sure if I am following the question but... First you have to read the file:

import pandas as pd
df = pd.read_csv("input.csv")

Then you create the list with the genes of interest:

picked = ['CLK1', 'HMOX1', 'SAT1', 'TUFT1', 'RHOB']

Then you select the rows of the input file whose symbol value is in the picked list, and you return their log2FoldChange:

print(df[df['symbol'].isin(picked)]['log2FoldChange'])
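The selection above prints the values without the gene names and in file order. A minimal sketch of one way to keep each value tied to its symbol, in the order of the picked list, is below. The in-memory table is made-up stand-in data, and the helper name `picked_lfc` is hypothetical; only the column names `symbol` and `log2FoldChange` come from the question.

```python
import io
import pandas as pd

# Made-up stand-in for the real file; in practice use pd.read_csv("input.csv").
csv_text = """geneid,log2FoldChange,pvalue,padj,symbol
ENSG_A,2.5,0.001,0.01,CLK1
ENSG_B,-1.2,0.2,0.4,ACTB
ENSG_C,3.1,0.0001,0.002,HMOX1
ENSG_D,0.4,0.5,0.7,GAPDH
ENSG_E,1.8,0.003,0.02,RHOB
"""
df = pd.read_csv(io.StringIO(csv_text))

picked = ['CLK1', 'HMOX1', 'SAT1', 'TUFT1', 'RHOB']

def picked_lfc(df, genes):
    """Return log2FoldChange indexed by symbol, in the order of `genes`.

    Genes that are absent from the table come back as NaN.
    Assumption: if a symbol occurs more than once, the first row is kept.
    """
    by_symbol = df.drop_duplicates('symbol').set_index('symbol')
    return by_symbol['log2FoldChange'].reindex(genes)

result = picked_lfc(df, picked)
print(result)
```

Because the result is indexed by symbol, `result['CLK1']` gives a single value, and `result.dropna()` leaves only the genes actually present in the file.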
2.1 years ago
M.O.L.S ▴ 100
  • Open the file.
  • Append its lines to a list.
  • Make your picked1 list.
  • Split each line on semicolons to get the gene symbol and the log fold change.
  • Keep only the values whose gene symbol is in your picked1 list.
  • Close the file.
  • Make your volcano plot.


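    infile = open("biostars.csv", "r")

    # Read every line of the file into a list.
    my_list = []
    for line in infile:
        my_list.append(line)

    lfc = []
    symb = []

    picked1 = ['CLK1', 'HMOX1', 'SAT1', 'TUFT1', 'RHOB']

    for line in my_list:
        # Strip the line ending, then split on the delimiter
        # (";" here; use "," if your file is comma-separated).
        individual_rows = line.rstrip("\r\n").split(";")
        # The positions below depend on the column layout of your file.
        log_fold_change = individual_rows[2]
        lfc.append(log_fold_change)
        symbol = individual_rows[-1]
        symb.append(symbol)

    # Print only the genes that are in picked1.
    for val, gene in zip(lfc, symb):
        if gene in picked1:
            print(gene, val)

    infile.close()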
    

Perhaps you could do something like that to make your volcano plot using the gene and lfc values in a list.
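The same plain-Python idea can also be sketched with the standard library `csv` module, which looks columns up by header name instead of by position. This is only an illustration: the sample rows are made up, the helper name `lfc_for` is hypothetical, and the semicolon delimiter and numeric log2FoldChange values are assumptions about the file.

```python
import csv
import io

# Made-up stand-in for the real file; in practice use
# open("biostars.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "geneid;log2FoldChange;pvalue;padj;symbol\n"
    "ENSG_A;2.5;0.001;0.01;CLK1\n"
    "ENSG_B;-1.2;0.2;0.4;ACTB\n"
    "ENSG_C;3.1;0.0001;0.002;HMOX1\n"
    "ENSG_E;1.8;0.003;0.02;RHOB\n"
)

picked1 = ['CLK1', 'HMOX1', 'SAT1', 'TUFT1', 'RHOB']

def lfc_for(handle, genes, delimiter=";"):
    """Map each picked gene found in the file to its log2FoldChange."""
    wanted = set(genes)
    found = {}
    # DictReader uses the header row to name the fields of every row.
    for row in csv.DictReader(handle, delimiter=delimiter):
        if row["symbol"] in wanted:
            # Assumes the value is numeric; entries such as "NA" would need handling.
            found[row["symbol"]] = float(row["log2FoldChange"])
    return found

values = lfc_for(sample, picked1)
for gene in picked1:
    print(gene, values.get(gene, "not in file"))
```

Looking fields up by name means the code keeps working if the column order changes, and the header row never ends up in the results.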
