Script for batch protein RMS calculation

First of all, the script depends on a tool named ProFit, developed and maintained by Dr. Andrew C.R. Martin’s group at UCL. So, download it from there and install it first to use the script.

The script is very very simple and has a very very simple purpose. I have folders with hundreds of models, output from Modeller, and I wanted to compared the 100 generated models to see how different they were and to compare also the distribution of their Objective Function, an energy parameter that evaluates which model is the most desirable. So, first, I selected as reference model the best model (with the best OBJ.FCT.) and then I would loop over the whole directory and run profit on each model against that reference. As I only need the value of the RMS, there’s no need for the whole output of ProFit, thus the small bit of parsing of the results. Also, for the Objective Function, I needed to read the model pdb file and extract it from there. Nothing grep can’t handle :) Additionally, I wanted a list with all the results and I wanted it sorted by RMS value.

In the end, some user-error-proof measures such as not inserting reference model, or this model not existing were added.

If someone has suggestions, as this was my first bash script, shoot!



if [ $# = 1 ] && [ -f $1 ]; then

    echo 'Analyzing...'

    for i in $(ls Target.B9*.pdb); do

        echo ${i%.p*b} $(profit -f script.txt $1 $i | grep RMS | cut -c 9- ) $(cat $i | grep -i OBJECT | awk '{ print $6 }') >> 'RMS_'$CURRENTDIR'.temp'


    echo 'Saving in RMS_'$CURRENTDIR'.txt ...'
    echo 'Ordering File by RMS'</span>

    sort -nk 2 'RMS_'$CURRENTDIR'.temp' > 'RMS_'$CURRENTDIR'.txt'

    rm 'RMS_'$CURRENTDIR'.temp'

    echo 'Finished!'

    echo -e 'Error!!\nUsage: <reference_model .pdb>\nCheck if reference_model.pdb exists!'



2 thoughts on “Script for batch protein RMS calculation

  1. Not bad for a first bash script! But there’s a few things that should be changed and a number of other that might be. No need to set CURRENTDIR, as that is what the command ‘pwd’ is for. There’s also no need to use ls in the for statement. Simply using ‘for i in Target*pdb’ will do the trick. Same thing for cat. Using ‘grep file’ in stead of ‘cat file | grep ‘ will save you a program call. In fact, you might even do the grep with awk, reducing the second part to a single call: awk ‘//{print $6}’ file. Similarly, the first part could be simplified by using sed.
    One of the great advantages of bash is that you can actually pipe the results from a for loop to another program, so you don’t really need an intermediate file. But that’s almost showing off. As a final twitch, you could even use the shorthand notation for an if .. else statement, using &&/|| O yes, and note that -f $1 returns false if it is not set, so you don’t need to check the number of arguments really. So here’s what I’d do… although I haven’t actually checked it :p (hope the formatting doesn’t get screwed)


    [[ -f $1 ]] \
    && for i in Target.B9*.pdb
    do echo ${i%.*} \
    $(profit -f script.txt $1 $i | sed -n ‘/RMS/s/^.\{8\}//p’)
    $(awk ‘/(OBJECT|object)/{print $6}’ $i)
    done | sort -nk 2 > RMS_`pwd`.txt \
    || echo -e ‘Error!!\nUsage: \nCheck if reference_model.pdb exists!’

  2. Pingback: Script for batch RMS protein calculation: UPDATE « Doei Doei

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s