While genetic modification tools like CRISPR have opened new doors for making changes to genomes, the task is still not as simple as putting the CRISPR complex into a cell and letting it go to work. The complex must first be given a target DNA sequence to focus on, which in most cases means supplying a guide RNA that helps it quickly find the sequence it needs to manipulate. The main burden for researchers is that these guide RNAs often have to be designed from scratch for each particular alteration.
Taking Advantage of Computer Analysis
These sorts of issues also plague other kinds of genetic modification that don’t use CRISPR, with one exception: gene silencing with short hairpin RNA. That’s because researchers who work with that system have created computer algorithms that reference a database to quickly put together likely RNA candidates for whatever specific change is being made. This machine learning program uses past successes and failures to better optimize the RNA sequences it builds in the future.
Researchers at Cold Spring Harbor Laboratory decided to try a similar system, but for CRISPR single guide RNAs (sgRNAs). The huge number of possible sequence combinations makes some sort of computer system a necessity. Their research focuses on CRISPR knockout experiments, in which a gene’s function is impaired. By looking at previous studies, they found that certain characteristics of the sgRNA increased the likelihood of inducing frameshift mutations that impair gene function. Their plan was to create a database of high-efficiency RNA sequences so that unwanted changes would not cause an experiment to fail.
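To make the idea of a frameshift concrete, here is a minimal sketch in Python. It is not the researchers’ code; the function names and the toy sequence are made up for illustration. It relies only on the fact that DNA is read in three-letter codons, so an insertion or deletion whose length is not a multiple of three shifts every codon after it.

```python
def causes_frameshift(ref: str, alt: str) -> bool:
    """Return True if replacing `ref` with `alt` changes the sequence
    length by an amount that is not a multiple of three."""
    net_change = len(alt) - len(ref)
    return net_change % 3 != 0


def codons(seq: str) -> list:
    """Split a sequence into complete three-letter codons, dropping any
    leftover bases at the end."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Toy coding sequence (hypothetical, for illustration only).
original = "ATGGCCAAGTTT"

# Delete a single base at index 4: every downstream codon changes.
one_base_deleted = original[:4] + original[5:]

# Delete a whole codon (indices 3-5): the downstream codons are preserved.
one_codon_deleted = original[:3] + original[6:]

print(codons(original))           # ['ATG', 'GCC', 'AAG', 'TTT']
print(codons(one_base_deleted))   # ['ATG', 'GCA', 'AGT']
print(codons(one_codon_deleted))  # ['ATG', 'AAG', 'TTT']
```

A one-base deletion scrambles the rest of the message, while removing a full codon only drops one amino acid, which is why frameshifts are the more reliable way to knock a gene out.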
Modeling Around Introns
One primary issue in past efforts has been getting CRISPR to properly target the functional protein domains. This is because genes are not written as a single continuous stretch of code in the genomes of organisms. Instead, the protein-coding parts (exons) are split up, sometimes separated by quite long stretches called introns that do not code for proteins. If CRISPR were used to alter one of these, it would have largely no effect on the function of the gene. Instead, it needs to be directed specifically to the parts of the gene used for protein coding.
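The filtering step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers’ pipeline: the coordinates are invented, the helper names are hypothetical, and coding segments (exons) are represented as half-open intervals, so the end position itself is not included.

```python
from typing import List, Tuple

# A coding segment as a half-open interval: start <= position < end.
Interval = Tuple[int, int]


def in_any_exon(position: int, exons: List[Interval]) -> bool:
    """Return True if the position falls inside at least one coding segment."""
    return any(start <= position < end for start, end in exons)


def filter_to_exons(cut_sites: List[int], exons: List[Interval]) -> List[int]:
    """Keep only the candidate cut sites that land in protein-coding sequence."""
    return [site for site in cut_sites if in_any_exon(site, exons)]


# Hypothetical gene layout: three coding segments separated by long introns.
exons = [(100, 250), (900, 1040), (5000, 5200)]

# Hypothetical candidate cut positions along the same gene.
cut_sites = [120, 400, 950, 1040, 5100]

print(filter_to_exons(cut_sites, exons))  # [120, 950, 5100]
```

Positions 400 and 1040 fall in introns here, so a cut at either would probably leave the protein untouched and are dropped before any guide RNA is designed.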
Even this is complicated, as the functional domains and introns haven’t been completely mapped out for every known gene. To get around this, the researchers used a computer program to assign probability scores based on the modeled effects of changing a specific nucleotide in a gene. This was combined with several other algorithms to produce a highly efficient way of determining where on the genome to target, which then allows the actual work of constructing the appropriate RNA sequence to begin.
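The article does not say how the different algorithms were combined, so the following Python sketch is purely illustrative. It assumes two made-up scores per candidate site, each between 0 and 1, and blends them with a simple weighted average; the class, field names, weights, and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    position: int
    impact_score: float      # hypothetical: modeled effect of altering this site
    efficiency_score: float  # hypothetical: predicted cutting efficiency


def combined_score(c: Candidate, impact_weight: float = 0.5) -> float:
    """Blend the two scores with a weighted average (an assumed scheme)."""
    return impact_weight * c.impact_score + (1 - impact_weight) * c.efficiency_score


def rank(candidates: List[Candidate], impact_weight: float = 0.5) -> List[Candidate]:
    """Sort candidates from most to least promising by combined score."""
    return sorted(candidates,
                  key=lambda c: combined_score(c, impact_weight),
                  reverse=True)


candidates = [
    Candidate(position=120, impact_score=0.9, efficiency_score=0.4),
    Candidate(position=950, impact_score=0.6, efficiency_score=0.8),
    Candidate(position=5100, impact_score=0.2, efficiency_score=0.9),
]

for c in rank(candidates):
    print(c.position, round(combined_score(c), 2))
```

With equal weights the site at 950 comes out on top because it does reasonably well on both measures, which is the general point of combining several predictors rather than trusting any single one.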
Creating A Library
With this added computational knowledge, they went about constructing a library of guide RNA sequences for other scientists to use, with notes on what each is best suited for. These specifically designed RNAs are made to increase the efficacy of a CRISPR knockout without requiring multiple attempts, while significantly reducing the risk of off-target gene changes. The library also allows multiple RNAs to be used against a single target gene, either with a tool capable of making multiple CRISPR cuts at once, as Cas12a (Cpf1) can, or simply by using multiple Cas9 complexes at the same time.
In total, the library currently consists of 10,000 sgRNA pairs and 50,000 verified constructs, each made to recognize a particular sequence to cut. This may not sound like much, but since they are designed against specific genes in the human genome, of which there are only about 20,000, it doesn’t take too many to reach complete coverage.
The ultimate goal is a complete library of the 100,000 constructs needed to knock out any gene in the human genome. At the time of publication, however, they were only about halfway there, meaning that certain gene groups aren’t yet covered by the library. They hope to rectify this soon, but they’ve set a high bar: they want 5 pair options available for each human gene, hence the 100,000 target.
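The arithmetic behind that target is easy to check. The short Python snippet below uses only the rounded figures quoted in this article (about 20,000 genes, 5 options per gene, 50,000 verified constructs); the variable names are just for illustration.

```python
# Rounded figures quoted in the article.
human_genes = 20_000
options_per_gene = 5
verified_constructs = 50_000

# Full library size if every gene gets all five options.
target_constructs = human_genes * options_per_gene

# Fraction of the target reached so far.
fraction_complete = verified_constructs / target_constructs

print(target_constructs)           # 100000
print(f"{fraction_complete:.0%}")  # 50%
```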
A Source For Improvement
As of this article’s writing, only 781 genes have no constructs made yet, though only 2,368 have the full 5 options. So they’re getting there, but it will still take some time. You can follow their progress on this website by clicking over to the “Sequence-verified clones” tab.
These computer algorithms are also built in such a way that they can hopefully be used by researchers working with other genomes to improve the efficacy of their experiments. For those hoping to treat human diseases efficiently and accurately, especially those caused by single gene mutations, this library may prove invaluable in the years to come.
Photo CCs: SK8-18-2 human derived cells, fluorescence microscopy (29942101073) from Wikimedia Commons