Image analysis of yeasts experiments
Benoît Clavier
1 September 2020
Table of contents
1 Experiments and objectives
1.1 The experiments
1.2 Objective of the macro
2 Presentation of the macro
2.1 Technical difficulties
2.1.1 Trap placement
2.1.2 Shape of the traps
2.1.3 Small sacrifices
2.1.4 Data management
2.2 Step 1
2.3 Step 2
2.4 Data analysis - Jupyter Notebook
2.4.1 Normal size determination
2.4.2 Brightness threshold determination
3 Results
3.1 Area distribution
3.2 Brightness distribution
3.3 Circularity distribution
3.4 Summary of every case
4 Results analysis
4.1 Data quality
4.2 Number of cells
4.3 GFP estimation
4.4 Real value of the area
5 Conclusion
6 Annex
6.1 Step 1 of the macro - Code
6.2 Step 2 of the macro - Code
Abstract
The purpose of this short paper is to present the new results of experiments conducted by the Physical Microfluidics and Bioengineering team at Institut Pasteur. The results are obtained with an original macro written for the ImageJ® software. It automatically analyzes photographs to detect as many cells as possible without being disturbed by the flaws present in the photographs. This allows a large number of cells (up to 150 000 per experiment) to be analyzed in only a few minutes, which gives access to truly quantitative results that cannot be obtained by hand. After presenting the experiments and the objectives of this project, we go through the ImageJ code, which we call the macro, to see how it works and how it deals with the difficulties it encounters. This should allow other people to use it on similar experiments, or to adjust it to the results they are looking for.
1 Experiments and objectives
1.1 The experiments
The experiments conducted by the lab at Institut Pasteur study DNA repair. The principle is to introduce a repeated sequence of nucleotides (called a microsatellite) in the DNA of yeast cells, and then to see how well different endonucleases can repair it. Every experiment is done with thousands of cells. To observe them all, they are placed in a microfluidic device where they are trapped in small wells (each well is a 100 µm by 100 µm square). Once the cells are in the microfluidic device, we place it under a microscope, where the cells are photographed at regular intervals. To better monitor the development of the cells, we introduce the microsatellites in the middle of the GFP gene, whose expression makes the cell fluorescent. Therefore, the microscope takes two pictures every time: one in bright field, and one that only detects the fluorescence of the cells. This makes it possible to check whether the GFP gene has been successfully repaired by the endonuclease.
1.2 Objective of the macro
After multiple experiments, we observe anomalies under some conditions. One of the main concerns is the GAA-SpCas9 case (GAA is the microsatellite introduced and SpCas9 is the endonuclease used). In this case, some cells are malformed: they are not round, significantly bigger than the others, or both. The difficulty of evaluating the proportion of such malformations by hand is what motivated the creation of a macro that can efficiently analyze all the cells of an experiment and give an effective summary of the cell distribution. The objective is a macro that analyzes the data straight out of the microscope, without any manual intervention. The output of the macro is a simple dataframe (in .csv format). It can then easily be visualized with various tools (such as Matlab® or Excel®); here I wrote a Jupyter Notebook that automatically produces the plots we want.
2 Presentation of the macro
2.1 Technical difficulties
To better understand the macro, it is useful to first look at the difficulties it had to face, and at how they shaped its construction.
2.1.1 Trap placement
Since there are about 1500 traps in the device and we want to photograph them all in a relatively short period of time (less than 20 minutes), we cannot take a separate photo of every single trap. Therefore, every photograph contains several traps, at positions that vary from one photograph to the next.
Figure 1 – Two different photographs of the CGG-Cpf1 experiment
Since we do not want any manual intervention, the macro has to detect the traps on its own. This is the first task the macro has to fulfill.
2.1.2 Shape of the traps
Trap detection runs into problems when cells are on the sides of the trap. A good solution is to convexify the shape of the traps. This is the second task of the macro. It helps recover almost all cells (except, sometimes, those right in the corners). An error estimation is given after the presentation of the macro.
Figure 2 – Photograph - detected shape of trap - convexified shape of trap (GAA-SpCas9 experiment)
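To illustrate what convexification does, here is a minimal Python sketch. It is not the macro itself, which relies on ImageJ's built-in Convex Hull command; the hull algorithm (Andrew's monotone chain), the function names and the dented outline are assumptions chosen for illustration only.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of 2D points, counter-clockwise (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon (shoelace formula)."""
    total = 0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Hypothetical trap outline: a 10x10 square whose right side is dented
# inwards, as could happen where a cell touches the wall.
dented_trap = [(0, 0), (10, 0), (10, 4), (8, 5), (10, 6), (10, 10), (0, 10)]
hull = convex_hull(dented_trap)

print(hull)                       # the dent vertex (8, 5) is gone
print(polygon_area(dented_trap))  # area of the detected shape
print(polygon_area(hull))         # area after convexification
```

The hull restores the region lost to the dent, which is where a cell touching the side would otherwise be cut off from the trap mask.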
2.1.3 Small sacrifices
Another small issue is the bright lines that appear on the sides of the traps. They are as bright as cells and can therefore be mistaken for cells, because we use a brightness threshold to detect them. To avoid this, the macro erodes the sides of the square enough to crop those lines out. We may lose some cells that were in that region, but this prevents a large error due to those lines.
Figure 3 – Trap with bright lines on the side - Eroded shape of the trap. The red highlighted part on the left is detected as cells (GAA-SpCas9 experiment)
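The effect of the erosion can be sketched in a few lines of Python. This is only an illustration of binary erosion on a tiny grid, not the ImageJ implementation: the 8-neighbour rule, the treatment of pixels outside the image as background, and the 9x9 example are all assumptions made for this sketch.

```python
def erode(mask, iterations=1):
    """Erode a binary mask given as a list of rows of 0/1 values.

    A foreground pixel survives one iteration only if all 8 of its neighbours
    are foreground. Pixels outside the image count as background (an
    assumption of this sketch).
    """
    height, width = len(mask), len(mask[0])
    for _ in range(iterations):
        eroded = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                if mask[y][x] and all(
                    0 <= y + dy < height
                    and 0 <= x + dx < width
                    and mask[y + dy][x + dx]
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                ):
                    eroded[y][x] = 1
        mask = eroded
    return mask


# Hypothetical 9x9 image containing a 7x7 trap mask.
trap = [[1 if 1 <= x <= 7 and 1 <= y <= 7 else 0 for x in range(9)]
        for y in range(9)]
shrunk = erode(trap, iterations=2)
remaining = sum(map(sum, shrunk))

for row in shrunk:
    print("".join("#" if value else "." for value in row))
```

Each iteration peels one pixel off every side of the square, so a bright line that hugs the wall falls outside the mask after enough iterations, along with any cell sitting in that band.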
2.1.4 Data management
To simplify data management, I divided the macro into two separate steps: one that "cleans" the data (isolating the traps and highlighting the cells), and one that processes the cleaned data. This makes it possible to run different analyses on the same data without having to clean it every time. This is practical and saves time, since the initial data is heavy (about 2 GB per experiment) and the first step of the macro is slow (up to 10 minutes for one experiment).
2.2 Step 1
The commented code of the macro can be found in the annex to this paper. Now that we know how it has been constructed, let us break down how it operates.
Once the inside of the traps is isolated, the cells are highlighted with a simple threshold that always uses the same value. After converting the photo to an 8-bit image, we observe that the background value is never above 80 and that the cells are always above 110; we therefore use 100 as the threshold.
We obtain the masks of the cells as a binary image (a pixel has value 1 if it is inside a cell and 0 otherwise). Multiplying this image by the photo of the green emissions gives a representation of the fluorescence of the cells.
Figure 4 – Binary representation of cells - Green light emissions. The two ingredients for the final image
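The thresholding and the multiplication by the green image can be sketched as follows. This is a minimal pure-Python illustration, assuming images stored as lists of rows; the function names and the pixel values are hypothetical and are not taken from the experiments.

```python
CELL_THRESHOLD = 100  # value quoted in the text (background <= 80, cells >= 110)


def cell_mask(bright_field, threshold=CELL_THRESHOLD):
    """Binary mask: 1 where the 8-bit pixel reaches the threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in bright_field]


def apply_mask(mask, green):
    """Multiply the mask by the green-emission image, pixel by pixel."""
    return [[m * g for m, g in zip(mask_row, green_row)]
            for mask_row, green_row in zip(mask, green)]


# Hypothetical 3x4 images with 8-bit values.
bright_field = [
    [60, 75, 120, 130],
    [70, 115, 140, 80],
    [50, 65, 78, 72],
]
green = [
    [45, 48, 90, 95],
    [47, 30, 88, 46],
    [44, 49, 47, 50],
]

mask = cell_mask(bright_field)
fluorescence = apply_mask(mask, green)
print(mask)
print(fluorescence)
```

Every pixel outside a cell becomes exactly 0 in the product, which is what later allows step 2 to separate cells from background with a very low threshold.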
Figure 5 – Original photograph - Output of step 1
2.3 Step 2
The commented code of the macro can be found in the annex to this paper.
The second step of the macro is much shorter than the first one. Since step 1 makes the data easy to process, we only run a basic ImageJ function: Analyze Particles. To avoid being disturbed by other flaws (like air bubbles in the traps), we restrict it to particles within a certain size and circularity range. After inspecting several photographs, we observe that the cells always have an area between 20 and 400 pixels, and that even the most malformed ones have a circularity of at least 0.6. We therefore use those values for the Analyze Particles function.
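The size and circularity filter can be sketched in Python as below. The circularity formula, 4π × area / perimeter², is the one ImageJ uses for its shape descriptors. The default bounds follow the values quoted in the text, whereas the annex code passes size=20-Infinity and circularity=0.65-1.00, so they are left as parameters. The function names and the example particles are hypothetical.

```python
import math


def circularity(area, perimeter):
    """Circularity 4*pi*area / perimeter**2 (1.0 for a perfect circle)."""
    return 4 * math.pi * area / perimeter ** 2


def keep_particle(area, perimeter, min_area=20, max_area=400,
                  min_circularity=0.6):
    """True if the particle passes the size and circularity filter."""
    return (min_area <= area <= max_area
            and circularity(area, perimeter) >= min_circularity)


# Hypothetical particles: name -> (area in pixels, perimeter in pixels).
particles = {
    "round cell": (50.0, 26.0),
    "elongated debris": (50.0, 60.0),
    "air bubble": (5000.0, 260.0),
    "speck": (5.0, 8.0),
}

kept = [name for name, (area, perimeter) in particles.items()
        if keep_particle(area, perimeter)]
print(kept)
```

Only the compact, cell-sized particle survives: the debris fails on circularity, and the bubble and the speck fail on size.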
For each experiment, we obtain a dataframe containing the measurements of all the cells detected in the microfluidic device. The second column, 'Mean', is the mean gray value of the cell. It is an indicator of the fluorescence of the cell, and hence shows whether the DNA has been successfully repaired.
Figure 6 – Output of step 2
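As an illustration of how such a file can be read outside ImageJ, here is a minimal sketch using Python's standard csv module. The column headers assume the usual ImageJ Results layout for the measurements selected in the macro (area, mean, perimeter, shape descriptors), and the three rows are made-up values, not data from the experiments.

```python
import csv
import io

# Hypothetical excerpt of a Results.csv file (made-up values).
SAMPLE_CSV = """ ,Area,Mean,Perim.,Circ.,AR,Round,Solidity
1,42,95.310,24.142,0.906,1.120,0.893,0.933
2,61,38.200,29.556,0.877,1.250,0.800,0.924
3,35,120.750,21.899,0.917,1.080,0.926,0.946
"""


def load_cells(text):
    """Parse the CSV text into one dictionary per detected cell."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "area": float(row["Area"]),
            "mean": float(row["Mean"]),
            "circularity": float(row["Circ."]),
        }
        for row in reader
    ]


cells = load_cells(SAMPLE_CSV)
for cell in cells:
    print(cell)
```

With a real file, the same function could be fed the content of Results.csv read from disk; only the three columns used in the rest of the analysis are kept here.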
2.4 Data analysis - Jupyter Notebook
Finally, we use a Jupyter Notebook for data visualization. To study malformations and DNA repair, we plot the distributions of the area and of the mean gray value of the cells. We also compare the circularity distributions of normal-sized cells and large cells. The Notebook can be found in the annex to this paper. The macro makes its size measurements in pixels, but the pixel scale is known:
1 pix = 0.325 µm    (1)
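Since areas scale with the square of the pixel size, converting the macro's measurements takes one multiplication. A minimal sketch, with hypothetical function names:

```python
PIXEL_SIZE_UM = 0.325  # pixel scale from equation (1), in micrometers per pixel


def length_px_to_um(length_px):
    """Convert a length from pixels to micrometers."""
    return length_px * PIXEL_SIZE_UM


def area_px_to_um2(area_px):
    """Convert an area from pixels to square micrometers."""
    return area_px * PIXEL_SIZE_UM ** 2


print(length_px_to_um(200))  # a 200-pixel length, in micrometers
print(area_px_to_um2(100))   # a 100-pixel area, in square micrometers
```

One pixel of area therefore corresponds to 0.325² ≈ 0.1056 µm².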
2.4.1 Normal size determination
To determine what the "normal" size of a cell should be, we use a control: wild-type yeast cells. Using the same macro, we determine the mean size of the cells and its standard deviation. Adding those two values gives the limit between normal cells and large cells. We also calculate the mean circularity.
Figure 7 – Area distribution of the wild type. The limit value is 54.33
Figure 8 – Circularity distribution of the wild type. The mean value is 0.94
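The limit between normal and large cells (mean plus one standard deviation of the wild-type areas) can be computed as in the sketch below. The text does not say whether the population or the sample standard deviation is used; the population version is assumed here, and the area values are invented for illustration.

```python
import statistics


def large_cell_limit(areas):
    """Mean area plus one (population) standard deviation."""
    return statistics.mean(areas) + statistics.pstdev(areas)


# Hypothetical wild-type areas, in pixels.
wild_type_areas = [40, 45, 50, 55, 60]
limit = large_cell_limit(wild_type_areas)

# Hypothetical cells from another experiment, classified with that limit.
large = [area for area in [38, 52, 58, 90] if area > limit]
print(limit)
print(large)
```

Swapping statistics.pstdev for statistics.stdev gives the sample version; with thousands of cells the two differ very little.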
2.4.2 Brightness threshold determination
We need to determine the minimum mean gray value of a "bright" cell (a cell that expresses GFP). After the 8-bit conversion, the background of the green emissions image has a value that oscillates around 47. A cell is therefore considered bright if its mean gray value is above 71 (150 % of 47, rounded up).
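The arithmetic behind the threshold is short enough to write out. In this sketch the rounding rule (rounding 70.5 up to 71) is inferred from the numbers in the text, and the function names are hypothetical.

```python
import math


def brightness_threshold(background, factor=1.5):
    """Threshold set at 150 % of the background value, rounded up."""
    return math.ceil(factor * background)


def is_bright(mean_gray_value, threshold):
    """A cell counts as bright if its mean gray value is above the threshold."""
    return mean_gray_value > threshold


threshold = brightness_threshold(47)
print(threshold)
print(is_bright(95.3, threshold), is_bright(60.0, threshold))
```

Note that Python's built-in round would give 70 for 70.5 (it rounds halves to the nearest even integer), which is why math.ceil is used here.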
3 Results
The same experiment was carried out with 3 different microsatellites and 2 different endonucleases. It was also carried out without repeated microsatellites (the NR case), so there are 8 different cases in total.
The "Summary" section condenses the information of each case into one figure. The red lines represent the thresholds.
3.1 Area distribution
Figure 9 – Comparison of area distributions
3.2 Brightness distribution
Figure 10 – Comparison of brightness distributions
3.3 Circularity distribution
Figure 11 – Comparison of circularity distributions
3.4 Summary of every case
Figure 15 – Summary of every case
4 Results analysis
Various factors can cause errors in the results, and they appear at every step of the process.
4.1 Data quality
The cell detection part of the macro will never be perfect. To avoid mistaking flaws in the photo for cells, we have to be specific about the cells we analyze: a certain size, brightness, circularity, etc. But the more specific we are, the more we miss the cells that actually matter: those that are too big, too dark, misshapen, etc. Therefore, even though the macro automates everything, it is important to look at the data first to understand what kind of error to expect.
4.2 Number of cells
Multiple factors reduce the number of cells detected by the macro.
Air bubbles. Sometimes an air bubble lands in a trap and makes the photograph very hard for the macro to process. In those cases, the macro is designed to ignore them completely. In the worst cases there can be about 30 bubbles in an experiment, which corresponds to 2% of the traps. This implies a 2% uncertainty on the number of cells.
Edge erosion. Since we erode the edges to avoid mistakes, we lose all the cells that are right on the edges. The erosion corresponds to 2% of the total area of the trap. Again, this implies a 2% uncertainty on the number of cells.
Cell overlap, etc. In some traps there are so many cells that it is hard to know whether or not they lie on top of one another. In those cases, the macro will surely count fewer cells than there really are.
There are many other cases where the macro might miscount. However, we know it will always count fewer cells than there are, and the total uncertainty should be around 5%.
4.3 GFP estimation
The way the brightness threshold is calculated can lead to some errors. We do not have the real mean value of the background, only a mean calculated from a rather small sample. We know its value is between 35 and 55, which means that the uncertainty on the threshold value is less than 10%.
For the cases where only a few cells are near the threshold value (like GAA-SpCas9 or CTG-SpCas9), this 10% uncertainty does not really affect the results on the GFP-positive cells.
The threshold value we calculate is sometimes in the middle of a population peak. In this case, the 10% uncertainty significantly affects the results. The worst case is CTG-Cpf1: the percentage of GFP-positive cells (42.3%) has a relative uncertainty that can go up to 15%. In the other cases, the relative uncertainty is below 7%. Nonetheless, this does not invalidate the results, since the value is always calculated the same way.
The uncertainty is a consequence of the method used to determine the value. We could improve the macro by making it calculate the mean background value for each image. However, it is difficult to make it detect what the background is, and this would add another time-consuming step to the macro.
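One way to see how a threshold uncertainty propagates to the GFP-positive percentage is to recompute the fraction with the threshold shifted down and up. The sketch below does this for a 10 % shift; the function names and the list of mean gray values are hypothetical and only illustrate the case where many cells sit close to the threshold.

```python
def gfp_positive_fraction(means, threshold):
    """Fraction of cells whose mean gray value is above the threshold."""
    return sum(1 for value in means if value > threshold) / len(means)


def threshold_sensitivity(means, threshold, relative_uncertainty=0.10):
    """GFP-positive fraction at the raised, nominal and lowered threshold."""
    low = threshold * (1 - relative_uncertainty)
    high = threshold * (1 + relative_uncertainty)
    return (
        gfp_positive_fraction(means, high),
        gfp_positive_fraction(means, threshold),
        gfp_positive_fraction(means, low),
    )


# Hypothetical mean gray values, with several cells close to the threshold.
means = [40, 50, 60, 66, 70, 72, 75, 80, 90, 120]
fewest, nominal, most = threshold_sensitivity(means, 71)
print(fewest, nominal, most)
```

When the population is concentrated around the threshold, as in this made-up sample, a small shift of the threshold moves a large share of the cells from one class to the other; when few cells are near it, the three fractions are almost identical.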
4.4 Real value of the area
The measurements made by the macro in pixels give a good estimate of the size of the cells. However, the detected part of a cell consists only of its brighter pixels, not the whole cell. We must therefore keep a critical eye on the area measurements in square micrometers.
Figure 16 – Detected size / real size comparison
Given the blur of the photograph, it is hard to tell what the real size of the cell is. Still, we can reasonably assume that the real size is between 25 % and 75 % bigger than the detected size.
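Under the assumption that "size" here means area, the 25 % to 75 % range translates into a simple interval for the real area. A minimal sketch, with a hypothetical function name and the pixel scale of equation (1):

```python
PIXEL_SIZE_UM = 0.325  # pixel scale from equation (1), in micrometers per pixel


def real_area_range_um2(detected_area_px, low_factor=1.25, high_factor=1.75):
    """Plausible range of the real area, in square micrometers.

    Assumes the 25 % to 75 % correction applies to the area itself.
    """
    detected_um2 = detected_area_px * PIXEL_SIZE_UM ** 2
    return detected_um2 * low_factor, detected_um2 * high_factor


low, high = real_area_range_um2(100)
print(low, high)
```

If the correction were meant for the diameter instead, the factors would have to be squared before being applied to the area.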
5 Conclusion
Even if the macro sometimes has difficulties processing all the data precisely, it can quickly analyze a large amount of data, which was its main objective. The new microfluidic device makes this kind of detailed quantitative data analysis possible.
Another advantage of this macro is that it can be used for a wide range of experiments. The important parameters (such as the threshold values) can be modified to adapt to all kinds of conditions.
6 Annex
6.1 Step 1 of the macro - Code
// The macro is separated in two steps; this makes the management of
// files and directories easier

//////////////////////
//                  //
//    ##########    //
//    # STEP 1 #    //
//    ##########    //
//                  //
//////////////////////

// The purpose of this macro is to isolate the information we want:
// the cells and their green light emissions.
// It uses the first slice to detect the position of the microfluidic
// traps, isolates them, then isolates the cells inside the traps
// and overlays the green channel (slice two) on the cells.
// To apply this macro to a dataset, simply use the Process>Batch>Macro
// function of ImageJ, then choose the dataset folder as input folder;
// the macro creates the output folder (named "particles") inside it.

// Titles and directory management
title0 = getTitle();                 // this returns, for example, "gaa-spcas9-xxxx.tif"
title_split = split(title0, ".");    // separate the title and ".tif"
title = title_split[0];              // get the title
title1 = title + "-0001";            // titles of the two different slices
title2 = title + "-0002";            // (0001: photo, 0002: green channel)
dir = getDirectory("image") + "particles" + File.separator;   // output folder
File.makeDirectory(dir);             // creation of the output folder

//////////////////////////////////////////////////
//                                              //
// PART 1 : isolation of the microfluidic traps //
//                                              //
//////////////////////////////////////////////////

// Stack to images
// This allows us to use title1 and title2 to select which slice to work on
selectWindow(title0);
run("Stack to Images");

// Detection of the big squares
selectWindow(title1);
run("Duplicate...", "title=squares");        // make a copy of the image to use as a "tool"
selectWindow("squares");
setAutoThreshold("Default dark no-reset");   // threshold of the dark parts
                                             // (the big squares & the contour of the cells)
//run("Threshold...");
run("Convert to Mask");
run("Analyze Particles...", "size=10000-200000 show=Masks clear");
// keep only the biggest particles (the squares) as masks (the
// masks are automatically named "Mask of squares").
// (the squares are about 150000 pixels big, but some are cropped
// on the side of the picture, therefore we use 10000-200000)
selectWindow("squares");   // close this window because we don't need it anymore
close();

//////////////////////////////////
// Convexification of the masks //
//////////////////////////////////
// We select each "square" and make it convex using this code. This
// prevents cells that touch the side from being "forgotten" by the macro
selectWindow("Mask of squares");
run("Analyze Particles...", "minimum=50 show=Nothing clear record");
n = nResults;
for (i = 0; i < n; i++) {
    doWand(getResult("XStart", i), getResult("YStart", i));
    run("Convex Hull");
    setForegroundColor(0, 0, 0);
    run("Fill", "slice");
}
run("Select None");
run("Clear Results");

// Erosion of the squares (lose some quantity to gain quality)
// We erode the masks so we don't have to deal with the white lines
// that appear on the sides of the squares and can be mistaken for cells
selectWindow("Mask of squares");
run("Options...", "iterations=15 count=1 do=Erode");

// Keep only the inside of the squares
imageCalculator("Multiply create 32-bit", title1, "Mask of squares");
// the result is automatically named "Result of xxx-xxxx-xxxx-0001" (title1)
selectWindow("Mask of squares");   // close this window because we don't need it anymore
close();
selectWindow(title1);              // close this window because we don't need it anymore
close();

//////////////////////////////////////////////////////////
//                                                      //
// PART 2 : isolation of the green channel of the cells //
//                                                      //
//////////////////////////////////////////////////////////

// Detection of the cells
selectWindow("Result of " + title1);
setOption("ScaleConversions", true);
run("8-bit");   // we convert to 8-bit to use our values for thresholding etc.
run("Manual Threshold...", "min=100 max=255");
// now we can detect all the cells inside the squares with a simple
// threshold and keep them as a mask
run("Convert to Mask");

// Keep only the green channel
// this creates our final image: it overlays all the single cells
// ("Result of " + title1) with their green channel (title2)
imageCalculator("Multiply create 32-bit", title2, "Result of " + title1);
selectWindow("Result of " + title1);   // close this window because we don't need it anymore
close();
selectWindow(title2);                  // close this window because we don't need it anymore
close();

// save the final image as 8-bit
selectWindow("Result of " + title2);
setOption("ScaleConversions", true);
run("8-bit");
saveAs("Tiff", dir + title0);
close();
6.2 Step 2 of the macro - Code
// The macro is separated in two steps; this makes the management of
// files and directories easier

//////////////////////
//                  //
//    ##########    //
//    # STEP 2 #    //
//    ##########    //
//                  //
//////////////////////

// The purpose of this short macro is just to convert the output
// of Step 1 into a csv dataset.
// To apply this macro, simply use the Process>Batch>Macro function
// of ImageJ, then choose the "particles" folder you got with step 1
// as input folder; the macro creates the output folder (named "resultcsv")
// inside it.

// Creation of the output folder
dircsv = getDirectory("image") + "resultcsv" + File.separator;
File.makeDirectory(dircsv);

// Detection and analysis of the cells
run("Manual Threshold...", "min=10 max=255");
// thanks to step 1, the background is pure black so we can choose a low
// threshold value (anything that isn't pure black is a cell)
run("Set Measurements...", "area mean perimeter shape redirect=None decimal=3");
// here we define which parameters we want to measure (area for the size,
// mean for the fluorescence, shape descriptors for the morphology)
run("Analyze Particles...", "size=20-Infinity circularity=0.65-1.00 display");
// the function "Analyze Particles" stores all the measurements we want
saveAs("Results", dircsv + "Results.csv");
// this exports the measurements as a csv file; if you process a batch of
// images, all the measurements still end up in a single csv file