I am trying to use a pipeline for RNA seq which has been used by one of my colleagues. I have got in total of 8 files. These files are 8 count files of RNA seq data. I am using less command
less wth1count.txt
less wth2count.txt
less wth3count.txt
less wth4count.txt
less 99h1count.txt
less 99h2count.txt
less 99h3count.txt
less 99h4count.txt
and to count the number of lines for each of the 8 files the command that I use is
wc -l wth1count.txt
wc -l wth2count.txt
wc -l wth3count.txt
wc -l wth4count.txt
wc -l 99h1count.txt
wc -l 99h2count.txt
wc -l 99h3count.txt
wc -l 99h4count.txt
My subsequent goal is to filter out reads that are lower than 5 and for that I am using python scripts
filtercounts.py
#!/usr/bin/python
#Load relevant modules
import sys
#Read in the HTseq gene count output files for each of the 8 samples
inputlist = [sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5], sys.argv[6], sys.argv[7], sys.argv[8]]
##Determine the genes with sufficient coverage
#Set up a variable to hold genes with sufficient coverage in any of the 8 samples
genelist = []
#For each HTseq file
for i in range(len(inputlist)):
#Open it
inputfile = open(inputlist[i], 'r')
#For each line
for line in inputfile:
#Determine the number of reads/transcripts for this gene
splitline = line.strip().split('\t')
#Does it pass the threshold?
if int(splitline[1]) >= 5:
#If so, add it to list
if not splitline[0] in genelist:
genelist.append(splitline[0])
#Close the file
inputfile.close()
#Write out the list of sufficiently expressed genes
outputlist = open(sys.argv[9], 'w')
for gene in genelist:
outputlist.write(gene + '\n')
outputlist.close()
##Filter each of the HTseq files for only sufficiently expressed genes across our samples
#For each HTseq file
for i in range(len(inputlist)):
#Open it
inputfile = open(inputlist[i], 'r')
#Open an output 'filtered' file
outputfile = open(sys.argv[i+10], 'w')
#For each line
for line in inputfile:
#Determine the gene
splitline = line.strip().split('\t')
#Is it in our list?
if splitline[0] in genelist:
#If so, write out the original line
outputfile.write(line)
#Close the files
inputfile.close()
outputfile.close()
Problem - When I am trying to execute the script I am getting the error "index list out of range"
I am also not sure where in the script I can feed the string of files as input?
Any help will be useful.
Please invest some time into formatting your post properly. Please use the formatting bar (especially the
code
option) to present your post better. You can use backticks for inline code (`text` becomestext
), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.Also, avoid blank lines between lines of code - that just makes code harder to read and adds to the character count unnecessaily. See:
vs your double-spaced:
Please post the full error.
The full traceback is
I would highly recommend you learn to separate your code into functions. For example:
You can then just call this function whenever you need to.
Does this answer OP's question? If not, it should be a comment, not an answer. Please read the posts under the how-to tag to contribute to and use the forum better.
Oh I see, understood - will use appropriately next time. Thanks for helping to moderate this forum!
No problem, I've moved you answer to a comment this time.