Knowledge-Guided Semantic Indexing of Breast Cancer Histopathology Images

Adina Tutac, Daniel Racoceanu, Thomas Putti, Wei Xiong, Wee-Kheng Leow,
Vladimir Cretu
To cite this version:
Adina Tutac, Daniel Racoceanu, Thomas Putti, Wei Xiong, Wee-Kheng Leow, et al.
Knowledge-Guided Semantic Indexing of Breast Cancer Histopathology Images. BMEI2008, International Conference on BioMedical Engineering and Informatics, May 2008, Sanya, Hainan, China. <hal-00342275>
HAL Id: hal-00342275
https://hal.archives-ouvertes.fr/hal-00342275
Submitted on 27 Nov 2008
Adina Eunice Tutac (1,5), Daniel Racoceanu (1,6), Thomas Putti (2), Wei Xiong (1,4), Wee-Kheng Leow (1,3), Vladimir Cretu (5)
(1) IPAL - UMI CNRS 2955, Singapore
(2) NUH, Singapore
(3) NUS, Singapore
(4) I2R, A-STAR, Singapore
(5) Politehnica University of Timisoara, Romania
(6) University of Besançon, France
Abstract
Narrowing the semantic gap is one of the most outstanding challenges in medical image analysis and indexing. This paper introduces a medical knowledge-guided paradigm for semantic indexing of histopathology images, applied to breast cancer grading (BCG). Our method improves the consistency of pathologists' current manual procedures by employing a semantic indexing technique driven by a rule-based decision system derived from the Nottingham BCG system. The challenge is to move from the medical concepts and rules related to BCG to computer vision (CV) concepts and symbolic rules, in order to design a future generic framework, following Web Ontology Language standards, for the semi-automatic generation of CV rules. The effectiveness of this approach was experimentally validated on six breast cancer cases consisting of 7000 frames, with domain knowledge from experts of the Pathology Department of the Singapore National University Hospital. Our method provides pathologists with a robust and consistent tool for BCG and opens interesting perspectives for semantic retrieval and visual positioning.
1. INTRODUCTION
Within the last decade, histological grading [1] has become widely accepted as a powerful indicator of prognosis in breast cancer. Most grading systems currently employed for breast cancer combine criteria on nuclear pleomorphism, tubule formation and mitotic counts. In general, each grading criterion is given a score of 1 to 3 (3 being associated with the most serious case), and the scores of the three components are added together to give the "grade". Breast Cancer Grading (BCG) [1] [2] requires time and attention: pathologists usually classify 100 cases per day, each comprising around 2000 frames. Currently, BCG is performed by pathologists through visual examination. Such manual work is time-consuming and inconsistent. Given these issues, developing an automatic grading system is a strong medical requirement.
Such an automatic grading system should naturally be able to semantically index the images in line with medical domain knowledge and based on their actual content. Content-based image indexing [3], [5] has been the subject of significant research in the medical imaging domain [4] [6]. Bridging the semantic gap [7] [8] between low-level features [9], [10] and high-level semantic concepts [11] is at the cutting edge of current research [12], [13].
In this paper, we propose a solution to meet pathologists' needs for automatic BCG. Beyond this, we model the BCG-related medical knowledge as reasoning rules, which are embedded in semantic indexing approaches.
The proposed method provides pathologists with a robust and consistent tool, as a second opinion, for breast cancer grading using the Nottingham grading system [1] [2]. The effectiveness of the proposed approach has been validated in experiments on six breast cancer cases consisting of 7000 frames, with domain knowledge from expert pathologists.
The paper is organized as follows. Section 2 introduces the domain knowledge analysis, summarizing the standard breast cancer grading system and showing the importance of grading in breast cancer detection. Section 3 presents our grading model, with rules inspired by medical image indexing and the computation of local and global grading. The semantic indexing of image features used to obtain the local and global grading is presented in Section 4. Section 5 contains experiments and results on semantic breast cancer image analysis in support of the grading aim. Finally, the results and approaches are analyzed, and conclusions and perspectives are given.
2. DOMAIN KNOWLEDGE ANALYSIS
COMPUTATION OF LOCAL AND GLOBAL GRADING
Breast cancer refers to a malignant tumor that has developed from cells within the breast. Breast cancer is a leading cause of death among women, and its incidence is rising. Although curable, especially when detected at early stages, breast cancer is expected to account for 28% of incident cancers and 20% of cancer deaths in women. Grading is a powerful marker in breast cancer detection. Among the standard grading systems used worldwide, the Nottingham Grading System (NGS) is preferred because it provides more objective criteria for the three component elements of grading and specifically addresses mitosis counting in a more rigorous fashion. The three NGS criteria are summarized below (see Table 1, Figure 1):
- Tubule Formation (TF): tubules appear as white blobs (lumina) surrounded by a continuous string of cell nuclei. The assessment of tubular/acinar differentiation applies to the neoplasm overall (over the whole tumor) and requires examination of several sections at scanning magnification.
- Mitosis Count (MC): mitoses are dividing cells, and the MC score is assessed in the peripheral areas of the neoplasm, not in the sclerotic central zone. Mitoses are abundant in areas of poor tubule formation. Although the Oncologic Standards Committee considers the mitosis count per square millimeter to be the most accurate, the NGS uses a scoring system based on the number of mitoses per 10 High Power Fields (HPFs).
- Nuclear Pleomorphism Score (NPS): categorizes cell nuclei based on two features, size and shape.
Figure 1. NGS synthesis a) Tubule Formation with more than 75% of neoplasm having a tubular
pattern b) Mitosis differentiation - the black arrow indicates mitosis; the green arrow indicates
non-dividing cells c) Small size and regular shape nuclei.
The scores for the three separate parameters (tubules, nuclei and mitoses) are summed to determine the overall grade of the neoplasm. The summation is usually done on 10 frames chosen visually by the pathologist, according to their expertise (see Table 2). In the technical reports [1], only the way of choosing frames for the mitosis count is mentioned. In our approach, we propose a simple summation fusion of the three component criteria per frame to automatically choose the top ten hyperfields used to compute the global grading.
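The frame selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `FrameScores`, `fused_score` and `top_frames` are hypothetical, and the sketch assumes that "top" means the highest fused score, with ties keeping their input order.

```python
from typing import NamedTuple


class FrameScores(NamedTuple):
    """Per-frame criterion scores, each assumed to lie in 1..3."""
    frame_id: int
    tf: int   # tubule formation score
    nps: int  # nuclear pleomorphism score
    mc: int   # mitosis count score


def fused_score(frame: FrameScores) -> int:
    """Summation fusion: add the three criterion scores of one frame."""
    return frame.tf + frame.nps + frame.mc


def top_frames(frames: list[FrameScores], k: int = 10) -> list[FrameScores]:
    """Return the k frames with the highest fused score.

    Assumption: "top" means highest fused score. sorted() is stable,
    so frames with equal scores keep their input order.
    """
    return sorted(frames, key=fused_score, reverse=True)[:k]


if __name__ == "__main__":
    frames = [
        FrameScores(0, 1, 1, 1),
        FrameScores(1, 3, 2, 3),
        FrameScores(2, 2, 2, 1),
        FrameScores(3, 3, 3, 3),
    ]
    print([f.frame_id for f in top_frames(frames, k=2)])
```

With the default `k=10`, the function returns the ten frames that would feed the global grading.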
Table 1. CRITERIA Score – Breast Cancer Grading
Hyperfield (Frame) Score Criteria
Score | TF | NPS | MC
1 | >75% neoplasm tubules | Small size and regular shape | < 9
2 | 10 - 75% neoplasm tubules | Medium size and variation shape | 10-19
3 | <10% neoplasm tubules | Big size and irregular shape | > 19
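As an illustration of how one column of Table 1 translates into a decision rule, the sketch below maps a tubule percentage to a TF score. The function name `tubule_formation_score` is hypothetical, and assigning the boundary values 10% and 75% to score 2 is an assumption read from the "10 - 75%" row.

```python
def tubule_formation_score(tubule_percent: float) -> int:
    """Map the percentage of neoplasm showing tubules to a TF score.

    Thresholds follow the TF column of Table 1. Assumption: the
    boundary values 10 and 75 belong to score 2.
    """
    if not 0.0 <= tubule_percent <= 100.0:
        raise ValueError("tubule_percent must be within [0, 100]")
    if tubule_percent > 75.0:
        return 1
    if tubule_percent >= 10.0:
        return 2
    return 3


if __name__ == "__main__":
    for pct in (80.0, 40.0, 5.0):
        print(pct, tubule_formation_score(pct))
```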
Table 2. COMPOSITE score – Breast Cancer Grading
Composite score / 10 frames | Global score (TF+NPS+MC)
Grade I (well differentiated) | 3-5
Grade II (moderately differentiated) | 6-7
Grade III (poorly differentiated) | 8-9
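The lookup in Table 2 can be written as a small function. This is a sketch only: `grade_from_composite` is a hypothetical name, and the input is assumed to be an integer composite score already obtained from the three criteria.

```python
def grade_from_composite(score: int) -> str:
    """Map a composite score (TF + NPS + MC) to a grade, per Table 2."""
    if 3 <= score <= 5:
        return "Grade I (well differentiated)"
    if 6 <= score <= 7:
        return "Grade II (moderately differentiated)"
    if 8 <= score <= 9:
        return "Grade III (poorly differentiated)"
    # Three criteria scored 1..3 can only sum to a value in 3..9.
    raise ValueError("composite score must be between 3 and 9")


if __name__ == "__main__":
    for s in range(3, 10):
        print(s, grade_from_composite(s))
```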
3. BREAST CANCER GRADING RULES-BASED SYSTEM MODELLING
The purpose of this section is to introduce the approach used to move from the medical concepts and rules related to breast cancer grading to computer vision (CV) concepts and symbolic rules. The aim is to move towards a future generic framework for assisted, semi-automatic generation of CV rules and, in the future, computer programs, starting from specific medical queries.
The modeling approach follows the Web Ontology Language (OWL) and was developed in the Protégé Semantic Web framework [14].
This section is structured in three parts, following the main steps of the proposed approach: development of the correspondence between medical concepts and computer vision concepts with respect to the OWL standard; definition of intermediate CV rules; and generation of the final symbolic rules by combining the CV concepts and the intermediate CV symbolic rules.
Figure 2. PROTEGE model of the Breast Cancer Grading computer vision symbolic rules and
concepts
3.1 Correspondence between the Medical concepts and adequate Computer Vision concepts
Following the NGS synthesis, we extract the tubule formation and compute the nuclear pleomorphism and mitosis count parameters in order to create the rule-based method able to automatically generate the grading (see Table 3). To define the rules clearly, medical concepts are transformed into computer vision concepts. The computer vision concepts are then modeled as Protégé concepts, using the standard types of classes and properties. Instances of the different classes are created as individuals, in which specific values are assigned to classes and properties.
Table 3. CONCEPTS correspondence (Medical – CV – Protégé)
Medical concepts | CV concepts | Protégé concepts type
Slide | Image (digitized) | Super class
Grading | Grading | Super class
Cells | Cells | Class inherited from Image
CellsCluster | Union of Cells | Class inherited from Cells
DarkCellsCluster / VeryDarkCellsCluster | Union of Cells | Class inherited from Cells with hasIntensity (property) Dark/VeryDark (instances of Intensity class)
Lumina | White compact segments of the Image included in union of dark Cells | Class inherited from WhiteBlobs with hasIntensity (property) White (instance of Intensity class), hasSize (property) Small (instance of Size class), hasLocalization (property) Included_In (instance of Localization class) DarkCellsCluster (instance of Cells)
Tubule Formation | Union of Cells | Class inherited from Grading
Mitosis | Dividing Cells nuclei | Class inherited from Grading
Nuclear Pleomorphism | Dimension and shape characteristics of nuclei | Class inherited from Grading
Local Grading FBCG | Computation of grading for Tubule Formation, Mitosis number, Nuclear Pleomorphism for a single image (FTFS, FMC, FNPS, FBCG) | Class inherited from TubuleFormation / NPS / MC
Global Grading GBCG (10 HPFs) | Computation of grading for Tubule Formation, Mitosis number, Nuclear Pleomorphism for ten images (GTFS, GMC, GNPS, GBCG) | Class inherited from TubuleFormation / NPS / MC
Intensity | Color spectrum [White…VeryDark], e.g. White | Super class for White, Dark and VeryDark
Size/Shape | Dimension/Shape range [small…large] / [regular…irregular], e.g. small, e.g. regular | Super classes for Small, Medium, Large classes / Regular, Variation, Irregular classes
Localization | Position indicator | Super class for Included_In class and hasLocalization property
Math operators | count, round, sum, divide | property
CV concepts are defined in the Protégé language as super classes, classes and properties. An inheritance strategy is applied in building the class structure. Combining these elements enables modeling the symbolic rules related to each breast cancer grading criterion. Figure 2 presents part of the Protégé model designed for breast cancer grading, with computer vision concepts implemented as classes, properties and symbolic rules.
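To make the inheritance strategy concrete, a few rows of Table 3 can be mirrored with plain Python classes. This is only an analogy, not the Protégé/OWL model itself: Python inheritance stands in for OWL subclassing, and the class attribute `has_intensity` is a hypothetical stand-in for the hasIntensity property.

```python
from enum import Enum


class Intensity(Enum):
    """Stand-in for the Intensity super class and its White/Dark/VeryDark values."""
    WHITE = "White"
    DARK = "Dark"
    VERY_DARK = "VeryDark"


class Image:
    """Super class: a digitized slide image."""


class Cells(Image):
    """Class inherited from Image."""
    has_intensity = None  # no intensity restriction at this level


class CellsCluster(Cells):
    """Union of cells; class inherited from Cells."""


class DarkCellsCluster(Cells):
    """Class inherited from Cells with hasIntensity Dark."""
    has_intensity = Intensity.DARK


class VeryDarkCellsCluster(Cells):
    """Class inherited from Cells with hasIntensity VeryDark."""
    has_intensity = Intensity.VERY_DARK


if __name__ == "__main__":
    print(issubclass(DarkCellsCluster, Image))
    print(DarkCellsCluster.has_intensity.value)
```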
3.2 Intermediate rules
To obtain the symbolic tubule formation rule, we
create intermediate rules for each domain concept used
for this criterion.
- DarkCellsCluster is defined as a group of cells whose intensity property value i lies between the VeryDark and White limits.
$$\mathit{DarkCellsCluster} = \{\, \mathit{Cells} \mid \mathit{VeryDark} < i < \mathit{White} \,\}$$
Figure 3. PROTEGE DarkCellsCluster
symbolic rule
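The DarkCellsCluster rule can be sketched as a simple filter. The numeric limits below are hypothetical placeholders on an 8-bit grey scale (the source does not give values), and `Cell`, `VERY_DARK_LIMIT`, `WHITE_LIMIT` and `dark_cells_cluster` are illustrative names, not part of the original model.

```python
from dataclasses import dataclass

# Hypothetical limits on an 8-bit grey scale (0 = black, 255 = white).
# The source gives no numeric values for VeryDark and White.
VERY_DARK_LIMIT = 40
WHITE_LIMIT = 200


@dataclass(frozen=True)
class Cell:
    cell_id: int
    intensity: int  # assumed measure: mean grey level of the cell region


def dark_cells_cluster(cells: list[Cell]) -> list[Cell]:
    """Keep the cells whose intensity i satisfies VeryDark < i < White."""
    return [c for c in cells if VERY_DARK_LIMIT < c.intensity < WHITE_LIMIT]


if __name__ == "__main__":
    cells = [Cell(0, 10), Cell(1, 120), Cell(2, 230), Cell(3, 40)]
    print([c.cell_id for c in dark_cells_cluster(cells)])
```

Both bounds are strict, matching the inequalities in the rule, so a cell sitting exactly on a limit is excluded.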
In terms of Protégé (see Figure 3), this rule is defined as: Cells with hasIntensity (property) some Dark, which is an instance of Intensity class. Some Dark may