I use a lot of desktops (formerly known in OS X as “spaces”). Each desktop has windows open relating to a different project, and when I split my time between two active projects, I switch desktops. This happens infrequently, maybe a couple of times a day. I find it rather disorienting and disruptive when I accidentally get thrown to an apparently random desktop for one reason or another, so I have set my system preferences accordingly, and that works fairly well for every application… except for the Finder – the app I arguably use most frequently. If you use the dock to switch apps instead of command-tab, and you frequently use multiple desktops, you may have noticed this inconsistency. It frustrates me every day, and it’s exacerbated by the fact that the animation doesn’t intuitively tell you how far you have to go to get back to the desktop you were just on. Thus, today’s Apple feedback:
Finder’s Space-Switching Behavior is Inconsistent with Other Apps
In the Mission Control system preferences pane, I have “When switching to an application, switch to a space with open windows for the application” *UN*checked. I like that if I click any app in the dock twice, it will still switch spaces despite that setting, yet clicking it once will not switch spaces (as long as it’s not the foremost app). However, this behavior is different for the Finder depending on whether or not there are any Finder windows open on the current space. If no windows are present and the Finder is not foremost, a single click on its dock icon will switch spaces despite the system preference setting. This is inconsistent because, under the same conditions, any other app (e.g. Safari) will not do this and will stay on the current space. This inconsistency disrupts my productivity.
People know that DNA sequencing technology has advanced, and I think the common layperson’s perception is that we can sequence a whole genome, each chromosome, from end to end. In many cases that’s possible, but it’s still a monumental effort. Notions of a “$1000 genome” belie the difficulties of full genome sequencing. When you hear in the news that we can sequence your genome through services like “23andMe”, you think that we’re getting the whole picture, but we aren’t. We can sequence multitudes of short sequences very quickly, and what we get is then mapped to a reference genome (which was itself the product of one of those painstaking efforts). But what I would consider a large portion of what is sequenced cannot be mapped, and the sequences that are mapped can have many inconsistencies, because one person’s genome may have a certain number of shuffled portions and subtle differences. And two different cells from your own body could even possess two distinct genomes.
Then there’s metagenomics – where we sequence multiple organisms all in one shot. You take a sample of water, dirt, or a swab from the flora of your mouth, extract the DNA from all the microbes there, and sequence it without a reference to map any of the resulting sequences to. In this torrent of information, we lack certain controls typically used to gauge the quality of the sequence. As with all machinery, there is a margin of error. Sometimes a sequence that comes out has a typo: an A instead of a T, an extra G, or a missing C. When we’re sequencing one organism, we can compare a piece of DNA with other copies of DNA containing the same “word”, and the error gets out-voted and ignored. It’s like having 100 secretaries type up the same document in a foreign language that you don’t know. If 99 secretaries type the first word as “Que” and 1 of them types “Uqe” or “Quee”, we can pretty safely say that the correct word is “Que”. But if each secretary is randomly given 1 of 25 different documents to type up, each of which is purposefully slightly different, it’s not so easy to dismiss “Uqe” or “Quee”.
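To make the out-voting idea concrete, here is a minimal Python sketch of the 100-secretaries example. The function name `consensus_word` is made up for this illustration; it is just a majority vote over copies of the same word, not code from any sequencing pipeline.

```python
from collections import Counter

def consensus_word(typed_words):
    """Return the most frequently typed version of a word and its share of the votes.

    Hypothetical helper for illustration: a plain majority vote.
    """
    counts = Counter(typed_words)
    word, votes = counts.most_common(1)[0]
    return word, votes / len(typed_words)

# 99 secretaries type "Que", 1 types "Uqe": the typo is simply out-voted.
copies = ["Que"] * 99 + ["Uqe"]
word, share = consensus_word(copies)
print(word, share)
```

This only works because every secretary was typing the same document. Once the copies come from 25 slightly different documents, a rare spelling can no longer be written off just because it lost the vote.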
But if we know that the “e” key is slightly sticky and prone to typing double letters every once in a while, it becomes easier to dismiss an instance of “Quee”, and that’s what this post is about. But what if there actually is a word such as “Quee”, and we’d be dismissing a real word because we assume every rare occurrence of a double ‘e’ is a mistake? We can figure this out by using a control to measure how frequently this type of mistake occurs. As long as the occurrences of “Quee” fall at or below that general frequency, we can reasonably assume that it’s a typo. If we see “Quee” two or three times as often as we would expect if it were a typo, we might conclude that it’s a real word. And that is the basis for my recent paper and related software.
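As a rough sketch of that decision, assuming we already have a measured rate for the sticky-key mistake: the function name `looks_like_typo`, the 1% error rate, and the `tolerance` of 2 (standing in for “twice as many as expected”) are all assumptions made up for this example, not values from the paper.

```python
def looks_like_typo(parent_count, variant_count, error_rate, tolerance=2.0):
    """Decide whether a rare variant is plausibly just a typo of a common word.

    parent_count  -- how often the common word ("Que") was seen
    variant_count -- how often the rare variant ("Quee") was seen
    error_rate    -- measured frequency of this kind of mistake (assumed known)
    tolerance     -- hypothetical cutoff: how far above the expected typo
                     count a variant may be before we treat it as a real word
    """
    expected_typos = parent_count * error_rate
    return variant_count <= tolerance * expected_typos

# With 1000 "Que" and a 1% double-letter rate we expect about 10 "Quee" typos.
print(looks_like_typo(1000, 8, 0.01))   # 8 is within what typos alone explain
print(looks_like_typo(1000, 30, 0.01))  # 30 is three times the expectation
```

The first call returns True (dismiss it as a typo) and the second returns False (possibly a real word).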
Typically, these sorts of errors are filtered out by first grouping all the most similar “words” together and then selecting the most frequent one as the representative of that group, assuming all the others are errors of it. However, our method forgoes the clustering step. It first measures the frequency of each type of error present in the data, using the ones we are fairly confident are errors. Then we look at the most similar words and ask whether one word could be an error of another: we compare how frequently each word is encountered and check whether the less frequent word falls within the typical error frequency we measured earlier. We call our method “Cluster Free Filtering”, or CFF for short.
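Here is a toy Python sketch of that pairwise check. It is not the actual CFF implementation: it assumes a single substitution error rate that has already been measured (the real method measures rates per error type from the data), it only considers equal-length sequences that differ at one position, and the names `hamming_one`, `filter_without_clustering`, and `tolerance` are made up for this illustration.

```python
def hamming_one(a, b):
    """True if a and b have equal length and differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def filter_without_clustering(counts, error_rate, tolerance=2.0):
    """Keep sequences that are too abundant to be explained as errors.

    counts     -- dict mapping each sequence to how often it was observed
    error_rate -- assumed, previously measured substitution error frequency
    tolerance  -- hypothetical slack factor on the expected error count
    """
    kept = {}
    for seq, n in counts.items():
        # seq is treated as an error if some more abundant one-letter neighbor
        # could account for all of its copies at the measured error rate.
        is_error = any(
            hamming_one(seq, other) and m > n and n <= tolerance * error_rate * m
            for other, m in counts.items()
        )
        if not is_error:
            kept[seq] = n
    return kept

counts = {"ACGT": 1000, "ACGA": 5, "ACCT": 400, "TCGT": 60}
print(filter_without_clustering(counts, error_rate=0.01))
```

In this made-up data, “ACGA” is dropped because 5 copies next to 1000 copies of “ACGT” is well within what errors alone would produce. “ACCT” and “TCGT” are also one letter away from “ACGT”, but they are far too frequent to be typos, so they are kept as real sequences. A clustering approach would have folded all three into “ACGT”.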
There’s a lot more to it, but that’s the basic concept. You can get the nitty-gritty details from the paper or even try out CFF for yourself if you have some DNA on your computer. It’s freely available. Note, though, that this software is specific to one narrow realm of metagenomic analysis: analysis of 16S rRNA variable regions, where all the short sequences are very similar at the starting point.