Trying to get UNIX to read gene names in one file so I can grep lines from another file
2.4 years ago
mxm189 • 0

Hi there, the question is in the title.

I found an article that lists a number of genes and made a text file out of that list, called MTOR.txt. I want to use UNIX or a Bash script to read the gene names in MTOR.txt and then pull out the lines of a CSV file (SkeletalMuscle.csv) that contain those names. These lines contain differential gene expression data. I tried using grep and awk separately, but honestly I'm kind of new to UNIX and still have to get used to it. I'm at the point where I think maybe I should just try Excel with VLOOKUP or something... Please help.

grep csv text • 1.6k views

Show the contents of the files in question. A simple head -5 filename should be adequate.


This is the simple MTOR.txt file:

LAMTOR1
LAMTOR2
LAMTOR3
LAMTOR4
LAMTOR5

This is the SkeletalMuscle.csv file:

genename,WT,A1,FDR,AveExpr,P value,htseq-countdata76B1Mus203_2,htseq-countdata76C1Mus218_2,htseq-countdata76E1Mus209_2,htseq-countdata76F1Mus210_2,htseq-countdata76G1Mus211_2,htseq-countdata76B1Mus203_2 CPM,htseq-countdata76C1Mus218_2 CPM,htseq-countdata76E1Mus209_2 CPM,htseq-countdata76F1Mus210_2 CPM,htseq-countdata76G1Mus211_2 CPM
2010010A06RIK,0,-2.12806812,0.371865991,-4.254517526,0.001484291,1,2,0,0,0,0.1,0.104,0,0,0
4930426L09RIK,0,-2.12806812,0.371865991,-4.254517526,0.001484291,1,2,0,0,0,0.1,0.104,0,0,0
4930455G09RIK,0,-2.731721932,0.371865991,-4.010039042,0.000404351,2,3,0,0,0,0.2,0.155,0,0,0
5830444B04RIK,0,-3.226294438,0.371865991,-1.266945299,0.001403814,26,23,3,1,4,2.597,1.191,0.168,0.058,0.201
2.4 years ago

Sort both files and use join: https://linux.die.net/man/1/join


I can try this tomorrow; it's 23:42 at the moment. Thanks for the tip, I will update on how it went.

join -1 1 -2 3 -t "," MTORsorted.txt SMsorted.csv > MTORGENE.txt

gives me an error message:

join: SMsorted.csv:3: is not sorted: 0610009E02RIK,0,-0.557568688,0.791414822,-1.504006552,0.329848873,3,12,3,6,6,0.3,0.621,0.168,0.347,0.301

But I checked SMsorted.csv and it's alphabetically sorted. Any tips?
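One likely cause, offered as a guess from the files shown earlier rather than a confirmed diagnosis: the gene name is the first column of the CSV, so `-2 3` points join at a numeric column, and join checks sort order on the join field. A second common cause is sorting under one locale and joining under another. The sketch below joins on field 1 of both files and forces the same byte-wise collation (`LC_ALL=C`) for sort and join. The file contents are made-up stand-ins, and the header handling is one possible approach.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Small stand-ins for MTOR.txt and SkeletalMuscle.csv (values are made up).
printf '%s\n' LAMTOR2 LAMTOR1 LAMTOR5 > MTOR.txt
printf '%s\n' \
  'genename,WT,A1' \
  'LAMTOR5,0,1.5' \
  '4930455G09RIK,0,-2.7' \
  'LAMTOR1,0,-0.4' > SkeletalMuscle.csv

# Sort both inputs with the same byte-wise collation that join will use.
LC_ALL=C sort MTOR.txt > MTORsorted.txt
# Drop the header before sorting; it is written to the output first instead.
tail -n +2 SkeletalMuscle.csv | LC_ALL=C sort -t, -k1,1 > SMsorted.csv

# Join on field 1 of both files (the gene name column).
head -n 1 SkeletalMuscle.csv > MTORGENE.txt
LC_ALL=C join -t, -1 1 -2 1 MTORsorted.txt SMsorted.csv >> MTORGENE.txt

cat MTORGENE.txt
```

Only genes present in both files appear in the output, so LAMTOR2 (listed but absent from the CSV) is silently dropped.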

2.4 years ago
GenoMax 147k

See this example. The data file contains your data from above.

$ more names
LAMTOR1
LAMTOR2
LAMTOR3
4930455G09RIK

$ grep -w -f names data
4930455G09RIK,0,-2.731721932,0.371865991,-4.010039042,0.000404351,2,3,0,0,0,0.2,0.155,0,0,0
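Note that `grep -w -f` matches a name anywhere on the line. Since awk was also mentioned in the question, here is a minimal awk sketch that compares only the first comma-separated column and keeps the header row. It reuses the `names` and `data` file names from the example above; the CSV rows are shortened to four columns for readability, and `filtered.csv` is a hypothetical output name.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Same names as in the example above; data rows are truncated copies.
printf '%s\n' LAMTOR1 LAMTOR2 LAMTOR3 4930455G09RIK > names
printf '%s\n' \
  'genename,WT,A1,FDR' \
  '4930426L09RIK,0,-2.12806812,0.371865991' \
  '4930455G09RIK,0,-2.731721932,0.371865991' \
  '5830444B04RIK,0,-3.226294438,0.371865991' > data

# First pass (NR==FNR) loads the wanted names; second pass prints the header
# and every row whose first comma-separated field is a wanted name.
awk -F, 'NR==FNR { wanted[$1]; next } FNR==1 || ($1 in wanted)' names data > filtered.csv

cat filtered.csv
```

Because the comparison is an exact match on column 1, a gene name that happens to appear inside another column would not produce a hit.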

I don't know if I follow, but I will try this tomorrow morning. My MTOR file did not contain 4930455G09RIK, but I see what the output will be. I'll try it and let you know if it worked out. Thanks for the help.


I added that name to demonstrate how this will work. I assume that your data csv file will contain LAM* names in it.


I tried

more MTOR.txt

followed by

grep -w -f MTOR.txt SkeletalMuscle.csv

This did not work... I also tried more MTOR.txt followed by grep -w -f SkeletalMuscle.csv, but then the terminal does nothing.


I looked up other people with similar questions and they were given a similar solution, but when I redirect the output to a file I get an empty file... Why is that?

grep -w -f MTOR.txt SkeletalMuscle.csv > filterMTORgenes.csv

This command results in a blank text file.


Make sure your files are in "unix" format by using dos2unix if they were created on a Windows machine.
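To illustrate why line endings matter here: if MTOR.txt was saved with Windows (CRLF) endings, every pattern grep reads carries a hidden carriage return, so nothing in a Unix-format CSV can match. The sketch below simulates that situation with made-up file contents, counts the affected lines, and strips the carriage returns with `tr` as a stand-in for dos2unix. `MTOR.unix.txt` is a hypothetical output name, and whether this is the actual cause of the empty output in this thread is an assumption.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Simulate a gene list saved on Windows: every line ends in \r\n.
printf 'LAMTOR1\r\nLAMTOR2\r\n' > MTOR.txt
printf '%s\n' 'genename,WT,A1' 'LAMTOR1,0,-0.4' 'LAMTOR9,0,1.2' > SkeletalMuscle.csv

# Count lines carrying a carriage return; anything above 0 suggests DOS endings.
crlf_lines=$(grep -c "$(printf '\r')" MTOR.txt)
echo "lines with CR: $crlf_lines"

# With the hidden \r inside each pattern, nothing matches.
before=$(grep -w -f MTOR.txt SkeletalMuscle.csv | wc -l | tr -d ' ')

# Strip the carriage returns (dos2unix would do the same job in place).
tr -d '\r' < MTOR.txt > MTOR.unix.txt
after=$(grep -w -f MTOR.unix.txt SkeletalMuscle.csv | wc -l | tr -d ' ')

echo "matches before: $before, after: $after"
```

Running `file MTOR.txt` is another quick check; on many systems it reports "with CRLF line terminators" for such files.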
