Automator Services for finding coordinates in DNA/AA strings

Do you write code to analyze or modify DNA or proteins?  Do you do your work on a Mac?  If so, I have a few Automator Services, written in AppleScript, that you may find very handy:

  1. Get the sequence (& alignment) length of a selected nucleotide string
  2. Get the sequence length of any selected string (e.g. protein or quality string)
  3. Show where a coordinate is in any selected string (including white-spaces)
  4. Show where a coordinate is in a selected sequence (e.g. protein or quality string)
  5. Show where a nucleotide coordinate is in a selected sequence
  6. Show where an alignment coordinate is in a selected nucleotide sequence
  7. Get the reverse complement of a selected nucleotide sequence
  8. Guess the barcodes present in a FastQ file *NEW

With these, you can highlight a sequence anywhere in any application and either get the selection length or show where a supplied coordinate is in the selected sequence.

Each service, once installed, will show up in the contextual menu that shows up anytime you right-click any selected text, system-wide on your mac, under the services sub-menu, e.g.:

countntsservicemenu.png

Here are the full details of how to use each service and what it does:

1. Get the sequence (& alignment) length of a selected nucleotide string

Name: Count Nucleotides

countntsoutput.png

This service does a bit more than count nucleotides.  As seen in the example on the right, it reports the number of nucleotides in the selection (sequence length), the length of the selected alignment, the number of discrete & ambiguous nucleotides, and a breakdown of all case-insensitive sequence characters (including gaps and ambiguous nucleotides).

Spaces, tabs, newlines, carriage returns, numbers, or any other non-sequence characters are completely ignored, so you can select any sequence, even if it is formatted & displayed with coordinates.  Only the sequence found inside the highlighted text is considered.  The first selected sequence character is coordinate 1.

2. Get the sequence length of any selected string (e.g. protein or quality string)

Name: Count Sequence Characterscountnonwschars.png

This service counts every character selected except for spaces, tabs, newlines, and carriage returns.  It’s good for getting the length of selected unaligned protein sequences (with no formatted coordinates in the selection) or of quality strings.  Note, there is currently no service for aligned amino acid sequences, but if you would like such a service, let me know in the comments and I’ll whip one up.  I work mostly with DNA, thus I haven’t had much need for aligned protein coordinate determinations.

3. Show where a coordinate is in any selected string (including white-spaces)

Name: Select N Characters

This service works only on “solid sequence” (i.e. having no whitespaces, hard returns, or for that matter: any non-sequence characters).  See services 4-6 for sequence-specific functions.  The way this service shows where a coordinate is, is by changing the length of the selection.  The resulting last character of the selection after the length modification is the length supplied by you, the user.  For example, if you tell it to select 4 characters in this string you selected: ATGCCGTAG, the selection will end up as: ATGCCGTAG.

lengthprompt.png

There are 2 ways to supply the coordinate.  The default way is to grab the coordinate from the clipboard, so all you have to do is copy the number you want to use to set the selection length.  However, if the content of the clipboard is not a number, a popup window will appear to ask you to enter the desired selection length.

 

To use this service:

  1. [Optional] Copy a number/length indicating the amount of the sequence you want to select.
  2. Select any length of sequence from the start position (position 1)
  3. Right-click the selection and select Services -> Select N Characters
  4. [If you didn’t do step 1] Enter the length of sequence you want to select

This script alters the selection length of the selected text you right-clicked on in the window in which you clicked on the sequence, regardless of application.  However, if you are in Terminal.app, it displays the selection result in a popup window instead of in the Terminal itself.  This is because the modification of the selection length is accomplished by shift-arrow keystroke emulation and this is not a means by which you can modify a selection in the Terminal app.  This has 1 side effect.  Normally, if the entered length is longer than the selection, in any other app, that’s not a problem and the selection just expands, but in Terminal’s popup result window, all the service has access to is the selected text, so placeholder ‘N’s are appended to show the desired sequence length.

lengthresult-terminal.pngThere are other drawbacks to this Terminal selection-length work-around.  The font is not fixed-width and the width of the popup window is fixed at a fairly narrow size, thus large sequences cannot be displayed very well.

Since the selection modification happens via emulated keystrokes, it takes a little time for the final selection to be made, but you’ll see how fast it goes as you watch the selection being made.  Since it’s not instantaneous, the script will either adjust the selection from the end or select anew from the beginning for efficiency.

4-8. New services

Names: Select N Sequence Characters, Select N Nucleotides, Select N Alignment Characters, Reverse Complement, & Guess barcodes

These services operate just like Select N Characters, but take the character type into account.

Select N Sequence Characters doesn’t include whitespace characters such as spaces, tabs, and newlines in the calculation of a coordinate in a string.

Select N Nucleotides doesn’t include non-nucleotide characters such as white spaces, gap characters, numbers, etc. in the calculation of a coordinate in a string.

Select N Alignment Characters behaves just like Select N Nucleotides but includes gap characters in the calculation of a coordinate in a string as the one exception.

Guess Barcodes allows you to right-click on a file and find out what the likely barcodes are.  It takes a little bit to run and makes some common assumptions, but if you think the results are wrong, you are given the opportunity to tweak the parameters and run again.

Installation

  1. Go to the github gist containing the Applescript code for each service
  2. Copy the code from one of the 3 files in the github gistcopygistcode.png
  3. Open Automator.appautomatordock.png
  4. Select service “Service”/gear icon from the dropdown sheet & click “Choose”selectservice.png
  5. Drag the “Run AppleScript” action into your workflowautomatorapplescript.gif
  6. Replace the purple code in the Run AppleScript action with the code you copied in step 1
  7. Save the workflow and name it however you would like it to appear in the services contextual menu (E.g. “Count Non-Whitespace Characters.workflow” – the extension will not appear in the menu)
  8. Repeat for the remaining 6 services.

The workflows/services will be saved automatically to your Library/Services directory in your home directory.  If you right-click the file name at the top of the window in Automator, you can select the Services folder to reveal it in the Finder.  You can then copy that file and send it to any other computer you would like to also have that service.

Just try your new services out by right-clicking on selected text anywhere.selectncharsexample.gif

And as you can see from the example above, Select N Characters works on any text.

Have fun!

Disclaimers: These services are only intended as a quick and dirty solution to work in any context, & any app.  If you have a repeated common use-case, consider other solutions.  Note also that any application which reserves the arrow keys for some function when the shift key is held down, other than modifying the most recent selection, this service will fail.  Some applications, such as java applications, modify selections using shift-arrow navigation differently, depending on the direction of the mouse drag during text selection.  This can produce unexpected results.  A work-around for both such issues can be to use the strategy used for Terminal.app, but this would required modification to the code.  A few of the features in the script rely on some tricks such as statically set delays and command-line calls, necessary to either wait for an application to respond or to control the focus of various windows.  If your computer is very busy or has any configuration issues, the proper functioning of these services may be disrupted.  These services were developed and tested on macOS Sierra, 10.12.6.  They may or may not work in other macOS versions.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s