Image analysis of yeast experiments

Benoît Clavier

September 1, 2020

Contents

1 Experiments and objectives
  1.1 The experiments
  1.2 Objective of the macro
2 Presentation of the macro
  2.1 Technical difficulties
    2.1.1 Trap placement
    2.1.2 Shape of the traps
    2.1.3 Small sacrifices
    2.1.4 Data management
  2.2 Step 1
  2.3 Step 2
  2.4 Data analysis - Jupyter Notebook
    2.4.1 Normal size determination
    2.4.2 Brightness threshold determination
3 Results
  3.1 Area distribution
  3.2 Brightness distribution
  3.3 Circularity distribution
  3.4 Summary of every case
4 Results analysis
  4.1 Data quality
  4.2 Number of cells
  4.3 GFP estimation
  4.4 Real value of the area
5 Conclusion
6 Annex
  6.1 Step 1 of the macro - Code
  6.2 Step 2 of the macro - Code

Abstract

The purpose of this short paper is to present the new results of experiments conducted by the Physical Microfluidics and Bioengineering team at Institut Pasteur.
The results are obtained with original code written for the software ImageJ®. It automatically analyzes photographs to detect as many cells as possible without being disturbed by the flaws present in the photographs. This allows the quick analysis of a large number of cells (up to 150 000 per experiment), which gives access to quantitative results in only a few minutes, something that isn’t possible by hand.

After presenting the experiments and the objectives of this project, we will go through the ImageJ code, which we will call the macro, to see how it works and how it deals with the difficulties it encounters. This allows other people to use it on similar experiments, or to adjust it depending on the results they are looking for.

1 Experiments and objectives

1.1 The experiments

The experiments conducted by the lab at Institut Pasteur study DNA repair. The principle is to introduce a repeated sequence of nucleotides (called a micro-satellite) into the DNA of yeast cells, and then see how well different endonucleases can repair it.

Every experiment is done with thousands of cells. To observe them all, they are placed in a microfluidic device where they are trapped in small wells (each well is a 100 µm by 100 µm square). Once the cells are in the microfluidic trap, we put the trap under a microscope where they are regularly photographed.

In order to better monitor the development of the cells, we introduce the micro-satellites in the middle of the GFP gene, whose expression makes the cell fluorescent. Therefore, the microscope takes two pictures every time: one picture in the bright field, and one picture that only detects the fluorescence of the cells. This makes it possible to check whether the GFP gene has been successfully repaired by the endonuclease.

1.2 Objective of the macro

After multiple experiments, we can observe some anomalies under some conditions.
One of the main concerns is the GAA-SpCas9 case (GAA is the micro-satellite introduced and SpCas9 is the endonuclease used). In this case, we can observe that some cells are malformed: they are either not round, significantly bigger than the others, or both. The difficulty of evaluating the proportion of this kind of malformation by hand is what motivated the creation of a macro that could efficiently analyze all the cells of an experiment to give an effective summary of the cell distribution.

The objective is to have a macro that analyzes all the data straight out of the microscope and doesn’t need any manual handling. The output of the macro itself is a simple dataframe (in .csv format). It can then be easily visualized using diverse tools (such as Matlab® or Excel®); here I wrote a Jupyter Notebook to automatically produce the visualizations we want.

2 Presentation of the macro

2.1 Technical difficulties

To get a better understanding of the macro, it is useful to first look at the difficulties it had to face, and how those shaped its construction.

2.1.1 Trap placement

Since there are about 1500 traps in the device and we want to photograph them all in a relatively short period of time (less than 20 minutes), we can’t take a photo of every single trap. Therefore, every photograph contains a certain number of traps, at different places.

Figure 1 – Two different photographs of the CGG-Cpf1 experiment

Since we don’t want to do any manual handling, the macro has to detect the traps on its own. This is the first task the macro has to fulfill.

2.1.2 Shape of the traps

The first task leads to some problems when cells are on the sides of the trap. To solve those, a good solution is to convexify the shape of the traps. This is the second task of the macro. It helps recover almost all cells (sometimes except those right in the corners). There will be an error estimation after the presentation of the macro.
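To make the idea of convexification concrete, here is a minimal Python sketch of a convex hull computation (Andrew's monotone chain algorithm). It is only an illustration: the macro itself relies on ImageJ's built-in "Convex Hull" command, whose internals are not described in this paper, and the outline coordinates below are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in order along the boundary (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Hypothetical trap outline: a square whose left side is dented inward,
# as happens when a cell touches the side of the trap.
outline = [(0, 0), (10, 0), (10, 10), (0, 10), (2, 5)]
hull = convex_hull(outline)
```

In this toy example the dent at (2, 5) is dropped from the hull, so the area that a cell on the side had carved out of the detected trap shape is given back to the trap.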
Figure 2 – Photograph, detected shape of the trap, convexified shape of the trap (GAA-SpCas9 experiment)

2.1.3 Small sacrifices

Another small issue we encounter is the bright lines that appear on the sides of the traps. They are as bright as cells and can therefore be mistaken for cells, because we use a brightness threshold to detect them. To avoid this, the macro erodes the sides of the square enough to crop those lines out. We may lose some cells that were in the eroded region, but this prevents a large error due to those lines.

Figure 3 – Trap with bright lines on the side, eroded shape of the trap. The red highlighted part on the left is detected as cells (GAA-SpCas9 experiment)

2.1.4 Data management

To simplify the data management, I divided the macro into two separate steps: one that "cleans" the data (isolating the traps and highlighting the cells), and one that processes the cleaned data. This makes it possible to run different analyses on the same data without having to clean it every time. This is practical and time-saving since the initial data is heavy (about 2 GB per experiment) and the first step of the macro takes time (up to 10 minutes for one experiment).

2.2 Step 1

The commented code of the macro can be found in the annex to this paper. Now that we know how it has been constructed, let’s break down how it operates.

Once we have isolated the inside of the traps, the cells are highlighted via a simple threshold that always uses the same values. After converting the photo to an 8-bit image, we observe that the background value is never above 80 and that the cells are always above 110; therefore we use the value 100 for the threshold. We obtain the masks of the cells as a binary image (a pixel has value 1 if it is inside a cell and 0 otherwise). If we multiply this image by the photo of the green emissions, we obtain a representation of the fluorescence of the cells.
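The threshold-then-multiply operation described above can be sketched in a few lines of Python. This is an illustration on tiny hypothetical pixel grids, not the macro itself (which works on full images inside ImageJ); only the threshold value of 100 comes from the text.

```python
CELL_THRESHOLD = 100  # from the text: background <= 80, cells >= 110


def cell_mask(bright_field, threshold=CELL_THRESHOLD):
    """Binary mask: 1 where the 8-bit pixel reaches the threshold, else 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in bright_field]


def apply_mask(mask, green):
    """Pixel-wise product: keeps the green value inside cells, 0 elsewhere."""
    return [[m * g for m, g in zip(mask_row, green_row)]
            for mask_row, green_row in zip(mask, green)]


# Hypothetical 3x3 bright-field and green-emission images (8-bit values).
bright_field = [[70, 120, 130],
                [60, 75, 115],
                [80, 79, 78]]
green = [[45, 90, 95],
         [50, 48, 30],
         [47, 46, 49]]

mask = cell_mask(bright_field)
fluorescence = apply_mask(mask, green)
```

Because the background becomes exactly 0 after the multiplication, any non-zero pixel in the result belongs to a detected cell, which is what lets the second step use a very low threshold.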
Figure 4 – Binary representation of cells, green light emissions. The two ingredients for the final image

Figure 5 – Original photograph, output of step 1

2.3 Step 2

The commented code of the macro can be found in the annex to this paper. The second step of the macro is much shorter than the first one. Since the data is easy to process after step 1, we run a basic function of ImageJ: Analyze Particles. To avoid being disturbed by other flaws (like air bubbles in the traps), we set it to keep only the particles within a certain range of size and circularity. After examining several photographs, we observe that the cells are always between 20 and 400 pixels in size, and that even the most malformed ones have a circularity of at least 0.6. Therefore we use those values for the Analyze Particles function.

For each experiment, we obtain a dataframe containing the information on all the cells detected in the microfluidic device. The second column, ’Mean’, is the mean gray value of the cell. It is an indicator of the fluorescence of the cell, hence it shows whether the DNA was successfully repaired.

Figure 6 – Output of step 2

2.4 Data analysis - Jupyter Notebook

Finally, we use a Jupyter Notebook for data visualization. For the study of malformations and DNA repair, we plot the distribution of the area and of the mean gray value of the cells. We also compare the circularity distributions of normal-sized cells and large cells. The Notebook can be found in the annex to this paper.

The macro makes its size measurements in pixels, but the pixel scale is known:

1 pix = 0.325 µm    (1)

2.4.1 Normal size determination

In order to determine what the "normal" size of a cell should be, we use a control: wild-type yeast cells. Using the same macro, we can determine the mean size of the cells and the standard deviation. Adding those two values gives us the limit between large cells and normal cells. We also calculate the mean circularity.
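The limit between normal and large cells, and the conversion given by equation (1), can be sketched as follows. The control areas are hypothetical, and the use of the sample standard deviation is an assumption: the text does not say whether the sample or the population formula was used.

```python
import statistics

PIXEL_SIZE_UM = 0.325  # pixel scale from equation (1)


def size_limit(control_areas):
    """Limit between normal and large cells: mean + standard deviation.

    Assumption: sample standard deviation (statistics.stdev).
    """
    return statistics.mean(control_areas) + statistics.stdev(control_areas)


def area_px_to_um2(area_px):
    """Convert an area in pixels to square micrometers.

    An area scales with the square of the pixel size.
    """
    return area_px * PIXEL_SIZE_UM ** 2


def is_large(area, limit):
    """A cell is counted as large when its area exceeds the limit."""
    return area > limit


# Hypothetical wild-type areas, in pixels.
control = [30, 40, 50, 60, 70]
limit = size_limit(control)
```

Note that one pixel covers 0.325 × 0.325 ≈ 0.106 µm², so an area of 100 pixels corresponds to roughly 10.6 µm².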
Figure 7 – Area distribution of the wild type. The limit value is 54.33

Figure 8 – Circularity distribution of the wild type. The mean value is 0.94

2.4.2 Brightness threshold determination

We need to determine the minimum mean gray value of a "bright" cell (a cell that expresses GFP). After the 8-bit conversion, the background of the green emissions image has a value that oscillates around 47. Therefore a cell is considered bright if its mean gray value is above 71 (150 % of 47, rounded up).

3 Results

The same experiment was conducted with 3 different micro-satellites and 2 different endonucleases. The experiment was also conducted without repeated micro-satellites (the NR case), so there are 8 different cases in total. The "Summary" section condenses the information of each case into one figure. The red lines represent the thresholds.

3.1 Area distribution

Figure 9 – Comparison of area distributions

3.2 Brightness distribution

Figure 10 – Comparison of brightness distributions

3.3 Circularity distribution

Figure 11 – Comparison of circularity distributions

3.4 Summary of every case

Figure 15 – Summary of every case

4 Results analysis

Diverse factors can cause errors in the results, and they appear at every step of the process.

4.1 Data quality

The cell detection part of the macro will never be perfect. If we want to avoid mistaking flaws of the photo for cells, then we have to be specific about the cells we are analyzing: a certain size, brightness, circularity, etc. But the more specific we are, the more we miss the cells that are actually important: too big, too dark, misshapen, etc. Therefore, even if the macro automates everything, it is important to take a look at the data first to understand what kind of error to expect.

4.2 Number of cells

There are multiple factors that reduce the number of cells that are detected by the macro.
Air bubbles. Sometimes an air bubble lands in a trap and makes the photograph very hard for the macro to process. In those cases, the macro is designed to ignore those traps completely. In the worst cases there can be about 30 bubbles in an experiment, which corresponds to 2% of the traps. This implies a 2% uncertainty on the number of cells.

Edge erosion. Since we eroded the edges to avoid mistakes, we lose all the cells that are right on the edges. The erosion corresponds to 2% of the total area of the trap. Again, this implies a 2% uncertainty on the number of cells.

Cell overlaying, etc. In some traps there are so many cells that it is hard to know whether or not they lie on top of one another. In those cases, the macro will surely count fewer cells than there really are.

There are many other cases where the macro might miscount. However, we know it will always count fewer cells than there are, and the total uncertainty should be around 5%.

4.3 GFP estimation

The way the brightness threshold value is calculated can lead to some errors. In fact, we don’t have the real mean value of the background, just a mean calculated from a rather small sample. We know its value is between 35 and 55, which means that the threshold value uncertainty is less than 10%.

For the cases where only a few cells are near the threshold value (like GAA-SpCas9 or CTG-SpCas9), this 10% uncertainty doesn’t really affect the results on the GFP-positive cells. In other cases, the threshold value we calculate falls in the middle of a population peak, and the 10% uncertainty significantly affects the results. The worst case is CTG-Cpf1: the percentage of GFP-positive cells (42.3%) has a relative uncertainty that can go up to 15%. In the other cases, the relative uncertainty is below 7%.

Nonetheless, this doesn’t invalidate anything, and the value is always calculated the same way. The uncertainty is a consequence of the value determination method.
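The effect of the threshold uncertainty on the GFP-positive fraction can be explored with a small Python sketch. The mean gray values below are hypothetical; only the background value (47), the factor (150 %) and the 10% uncertainty come from the text. The sketch keeps the unrounded threshold of 70.5 rather than the rounded value 71.

```python
def bright_threshold(background, factor=1.5):
    """Brightness threshold as a multiple of the background value."""
    return factor * background


def gfp_positive_fraction(mean_values, threshold):
    """Fraction of cells whose mean gray value is above the threshold."""
    if not mean_values:
        return 0.0
    return sum(1 for v in mean_values if v > threshold) / len(mean_values)


def fraction_range(mean_values, threshold, rel_uncertainty=0.10):
    """Lowest and highest fractions when the threshold moves by +/- 10%."""
    high_threshold = threshold * (1 + rel_uncertainty)
    low_threshold = threshold * (1 - rel_uncertainty)
    return (gfp_positive_fraction(mean_values, high_threshold),
            gfp_positive_fraction(mean_values, low_threshold))


# Hypothetical mean gray values with several cells close to the threshold.
means = [50, 60, 66, 70, 74, 80, 90, 120]
threshold = bright_threshold(47)  # 70.5
nominal = gfp_positive_fraction(means, threshold)
low, high = fraction_range(means, threshold)
```

With this toy population, where several cells sit close to the threshold, the fraction moves from 0.5 to anywhere between 0.375 and 0.75, which illustrates why a threshold falling inside a population peak makes the estimate sensitive.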
We could improve the macro by making it calculate the mean background value for each image. However, it is difficult to make it detect what the background is, and this would add another time-consuming step to the macro.

4.4 Real value of the area

The measurements made by the macro in pixels give a good estimation of the size of the cells. However, the detected part of a cell is only its brighter pixels, not the whole cell. Therefore, we must keep a critical eye on the area measurements in square micrometers.

Figure 16 – Detected size / real size comparison

Given the blur of the photograph, it is hard to tell what the real size of a cell is. Still, we can reasonably assume that the real size is between 25 % and 75 % bigger than the detected size.

5 Conclusion

Even if the macro sometimes has difficulty processing all the data with precision, it can quickly analyze a large amount of data, which was its main objective. The new microfluidic device allows this kind of detailed quantitative data analysis.

Another advantage of this macro is that it can be used for a wide range of experiments. The important parameters (such as the threshold values) can be modified to adapt to all kinds of conditions.

6 Annex

6.1 Step 1 of the macro - Code

// The macro is separated in two steps, this makes the managing of files and directories easier

//////////////////////
//                  //
//    ##########    //
//    # STEP 1 #    //
//    ##########    //
//                  //
//////////////////////

// The purpose of this macro is to isolate the information we want: the cells and their green light emissions.
// It uses the first slice to detect the position of the microfluidic traps, isolates them, then isolates the cells inside the traps and overlays the green channel (slice two) on the cells.
// To apply this macro to a dataset, simply use the Process>Batch>Macro function of ImageJ, then choose the dataset folder as input folder, and it will create the output folder (named "particles") inside it.

// Titles and directory management
title0 = getTitle();               // for example, this returns "gaa-spcas9-xxxx.tif"
title_split = split(title0, ".");  // separate the title and ".tif"
title = title_split[0];            // get the title
title1 = title+"-0001";            // titles of the two different slices (0001: photo, 0002: green channel)
title2 = title+"-0002";
dir = getDirectory("image") + "particles\\";
File.makeDirectory(dir);           // creation of the output folder

//////////////////////////////////////////////////
//                                              //
// PART 1 : isolation of the microfluidic traps //
//                                              //
//////////////////////////////////////////////////

// Stack to images
// This allows us to use title1 and title2 to select which slice to work on
selectWindow(title0);
run("Stack to Images");

// Detection of the big squares
selectWindow(title1);
run("Duplicate...", "title=squares");  // make a copy of the image to use as a "tool"
selectWindow("squares");
setAutoThreshold("Default dark no-reset");  // threshold of the dark parts (the big squares & the contour of the cells)
//run("Threshold...");
run("Convert to Mask");
run("Analyze Particles...", "size=10000-200000 show=Masks clear");
// keep only the biggest particles (the squares) as masks (the masks are automatically named "Mask of squares").
// (the squares are about 150000 pixels big, but some are cropped on the side of the picture, therefore we use 10000-200000)
selectWindow("squares");  // close this window because we don't need it anymore
close();

//////////////////////////////////
// Convexification of the masks //
//////////////////////////////////

// We select each "square" and make it convex using this code.
// This allows cells that touch the side not to be "forgotten" by the macro
selectWindow("Mask of squares");
run("Analyze Particles...", "minimum=50 show=Nothing clear record");
n = nResults;
for (i=0; i<n; i++) {
    doWand(getResult("XStart", i), getResult("YStart", i));
    run("Convex Hull");
    setForegroundColor(0, 0, 0);
    run("Fill", "slice");
}
run("Select None");
run("Clear Results");

// Erosion of the squares (lose some quantity to win quality)
// We erode the masks so we don't have to deal with the white lines that appear on the sides of the squares, which can be mistaken for cells
selectWindow("Mask of squares");
run("Options...", "iterations=15 count=1 do=Erode");

// Keep only the inside of the squares
imageCalculator("Multiply create 32-bit", title1, "Mask of squares");
// the result is automatically named "Result of xxx-xxxx-xxxx-0001" (title1)
selectWindow("Mask of squares");  // close this window because we don't need it anymore
close();
selectWindow(title1);  // close this window because we don't need it anymore
close();

//////////////////////////////////////////////////////////
//                                                      //
// PART 2 : isolation of the green channel of the cells //
//                                                      //
//////////////////////////////////////////////////////////

// Detection of the cells
selectWindow("Result of "+title1);
setOption("ScaleConversions", true);
run("8-bit");  // we convert to 8-bit to use our values for thresholding etc.
run("Manual Threshold...", "min=100 max=255");  // now we can detect all the cells inside the squares with a simple threshold and keep them as a mask
run("Convert to Mask");

// Keep only the green channel
// this creates our final image.
// this overlays all the single cells ("Result of "+title1) with their green channel (title2)
imageCalculator("Multiply create 32-bit", title2, "Result of "+title1);
selectWindow("Result of "+title1);  // close this window because we don't need it anymore
close();
selectWindow(title2);  // close this window because we don't need it anymore
close();

// save the final image as 8-bit
selectWindow("Result of "+title2);
setOption("ScaleConversions", true);
run("8-bit");
saveAs("Tiff", dir+title0);
close();

6.2 Step 2 of the macro - Code

// The macro is separated in two steps, this makes the managing of files and directories easier

//////////////////////
//                  //
//    ##########    //
//    # STEP 2 #    //
//    ##########    //
//                  //
//////////////////////

// The purpose of this short macro is just to convert the output of Step 1 into a csv dataset
// To apply this macro, simply use the Process>Batch>Macro function of ImageJ, then choose the "particles" folder you got with step 1 as input folder, and it will create the output folder (named "resultcsv") inside it.

// Creation of the output folder
dircsv = getDirectory("image") + "resultcsv\\";
File.makeDirectory(dircsv);

// Detection and analysis of the cells
run("Manual Threshold...", "min=10 max=255");  // thanks to step 1, the background is pure black so we can choose a low threshold value (anything that isn't pure black is a cell)
run("Set Measurements...", "area mean perimeter shape redirect=None decimal=3");  // here we define what parameters we want to measure (area for the size, mean for the fluorescence, shape descriptors for the morphology)
run("Analyze Particles...", "size=20-Infinity circularity=0.65-1.00 display");  // the function "Analyze Particles" stores all the measurements we want
saveAs("Results", dircsv+"Results.csv");  // this exports the measurements as a csv file, if you process a batch of images all the measurements still end up in a single csv file
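
As an illustration of how the exported csv file could be summarized afterwards, here is a minimal Python sketch. It is not the Notebook mentioned in section 2.4. The column names "Area" and "Mean" are assumptions about the csv header, the sample rows are hypothetical, and the size limit is assumed to be expressed in the same unit as the "Area" column.

```python
import csv
import io


def summarize(csv_text, size_limit, bright_threshold=71):
    """Count cells and compute the fractions of large and bright cells.

    Assumes the csv has "Area" and "Mean" columns (hypothetical header).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n = len(rows)
    if n == 0:
        return {"cells": 0, "large_fraction": 0.0, "bright_fraction": 0.0}
    large = sum(1 for r in rows if float(r["Area"]) > size_limit)
    bright = sum(1 for r in rows if float(r["Mean"]) > bright_threshold)
    return {"cells": n,
            "large_fraction": large / n,
            "bright_fraction": bright / n}


# Hypothetical extract of a results file: one row per detected cell.
sample = "Area,Mean,Circ.\n45,80,0.95\n60,40,0.70\n50,90,0.92\n120,75,0.66\n"
summary = summarize(sample, size_limit=54.33)
```

To run this on a real export, the text of the csv file would be read from disk and passed in place of the sample string, after checking the actual column names in its header.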