
Onco-proteomics approach to identify novel protein variants in databases derived from individual melanoma sequenced genomes
Patrice Waridel1, Christian Iseli2,3, Donata Rimoldi3, Armand Valsesia2, Alexandra Potts1, Brian J. Stevenson2,3, Ioannis Xenarios2 and Manfredo Quadroni1
1 Protein Analysis Facility, University of Lausanne, Lausanne, Switzerland, 2 SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland, 3 Ludwig Center for Cancer Research, University of Lausanne, Lausanne, Switzerland
Introduction
• RNAseq and exome sequencing allowed the identification of genomic variations (SNPs & splice variants) in metastatic melanoma cell lines derived from patients [Valsesia et al., PLoS ONE 2011; Nikolaev et al., Nature Genetics 2011].
• Observing these genomic variations at the protein level by proteomics requires customized protein databases based on RNAseq and exome sequencing data.
• Seven metastatic melanoma cell lines were analysed by proteomics to detect expression of specific variant proteins.
Overview
Customized protein databases based on cell line specific genomic data
were used for the identification of protein variants in metastatic melanoma
cells by proteomics
>P12268/IMDH2_HUMAN: Inosine-5'-monophosphate dehydrogenase 2 (EC 1.1.1.205)
Score = 744 bits (1922), Expect = 0.0
Identities = 391/431 (90%), Positives = 394/431 (91%), Gaps = 34/431 (7%)
Query: 1 MAIAMALTGGIGFIHHNCTPEFQANEVRK--KYEQGFITDPVVLSPKDRVRDVFEAKARH 58
MAIAMALTGGIGFIHHNCTPEFQANEVRK KYEQGFITDPVVLSPKDRVRDVFEAKARH
Sbjct: 78 MAIAMALTGGIGFIHHNCTPEFQANEVRKVKKYEQGFITDPVVLSPKDRVRDVFEAKARH 137
Query: 59 GFCGIPITDTGRMGSRLVGIISSRDIDFLKEEEHDCFLEEIMTKREDLVVAPAGITLKEA 118
GFCGIPITDTGRMGSRLVGIISSRDIDFLKEEEHDCFLEEIMTKREDLVVAPAGITLKEA
Sbjct: 138 GFCGIPITDTGRMGSRLVGIISSRDIDFLKEEEHDCFLEEIMTKREDLVVAPAGITLKEA 197
Query: 119 NEILQRSKK--------DDELVAIIARTDLKKNRDYPLASKDAKKQLLCGAAIGTHEDDK 170
NEILQRSKK DDELVAIIARTDLKKNRDYPLASKDAKKQLLCGAAIGTHEDDK
Sbjct: 198 NEILQRSKKGKLPIVNEDDELVAIIARTDLKKNRDYPLASKDAKKQLLCGAAIGTHEDDK 257
Query: 171 YRLDLLAQAGVDVVVLDSSQGNSIFQINMIKYIKDKYPNLQVIGGNVVTAAQAKNLIDAG 230
YRLDLLAQAGVDVVVLDSSQGNSIFQINMIKYIKDKYPNLQVIGGNVVTAAQAKNLIDAG
Sbjct: 258 YRLDLLAQAGVDVVVLDSSQGNSIFQINMIKYIKDKYPNLQVIGGNVVTAAQAKNLIDAG 317
Query: 231 VDALRVGMGSGSICITQEVAPKIPPDIKSHSPKCPSTVTGCYMLACGRPQATAVYKVSEY 290
VDALRVGMGSGSICITQEV LACGRPQATAVYKVSEY
Sbjct: 318 VDALRVGMGSGSICITQEV------------------------LACGRPQATAVYKVSEY 353
Query: 291 ARRFGVPVIADGGIQNVGHIAKALALGASTVMMGSLLAATTEAPGEYFFSDGIRLKKYRG 350
ARRFGVPVIADGGIQNVGHIAKALALGASTVMMGSLLAATTEAPGEYFFSDGIRLKKYRG
Sbjct: 354 ARRFGVPVIADGGIQNVGHIAKALALGASTVMMGSLLAATTEAPGEYFFSDGIRLKKYRG 413
Query: 351 MGSLDAMDKHLSSQNRYFSEADKIKVAQGVSGAVQDKGSIHKFVPYLIAGIQHSCQDIGA 410
MGSLDAMDKHLSSQNRYFSEADKIKVAQGVSGAVQDKGSIHKFVPYLIAGIQHSCQDIGA
Sbjct: 414 MGSLDAMDKHLSSQNRYFSEADKIKVAQGVSGAVQDKGSIHKFVPYLIAGIQHSCQDIGA 473
Query: 411 KSLTQSHDVLW 421
KSLTQ +++
Sbjct: 474 KSLTQVRAMMY 484
Legend: peptides covering common sequence; peptides covering specific sequence; active site
Fig 3: BLAST output highlighting the sequence differences between the variant
tr|NC_000003_736_13 and the matched protein Inosine-5'-monophosphate
dehydrogenase 2 (IMDH2). The MS/MS spectra corresponding to specific peptides
identified by MASCOT and annotated by SCAFFOLD are also shown.
IMDH2_HUMAN:
VGMGSGSICITQEVLACGR
Mascot score: 54
tr|NC_000003_736_13:
VGMGSGSICITQEVAPK
Mascot score: 54
tr|NC_000001_3252_0:
AVPSAEPQAGGPMTLSCQTK
Mascot score: 75
FCRLA_HUMAN:
AVPSAEPQAGSPMTLSCQTK
Mascot score: 64
Fig 2: BLAST output highlighting the sequence differences between the variant
tr|NC_000001_3252_0 and the matched protein Fc receptor-like A (FCRLA). The
MS/MS spectra corresponding to specific peptides identified by MASCOT and
annotated by SCAFFOLD are also shown.
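The variant-specific peptide in Fig 2 can be reproduced with a small in-silico digestion. The sketch below is illustrative only: it assumes a simple trypsin rule (cleavage after K or R, not before P), uses short fragments copied from the alignment above, and the function name `tryptic_digest` is hypothetical rather than part of MASCOT or SCAFFOLD.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, unless the next residue is P.

    This is a simplified trypsin rule assumed for illustration; the search
    engine settings actually used are not stated on the poster.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Fragments around the substitution, copied from the Fig 2 alignment.
variant_fragment = "ELFPAPILRAVPSAEPQAGGPMTLSCQTKLPLQR"    # tr|NC_000001_3252_0
reference_fragment = "ELFPAPILRAVPSAEPQAGSPMTLSCQTKLPLQR"  # FCRLA_HUMAN

variant_peptides = set(tryptic_digest(variant_fragment))
reference_peptides = set(tryptic_digest(reference_fragment))

# Peptides that distinguish the variant from the reference, and vice versa.
variant_specific = variant_peptides - reference_peptides
reference_specific = reference_peptides - variant_peptides

print("variant-specific:", variant_specific)
print("reference-specific:", reference_specific)
```

Under these assumptions the two sets contain the peptides listed with their Mascot scores for Fig 2, while the flanking peptides are shared by both sequences.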
>Q7L513/FCRLA_HUMAN: Fc receptor-like A precursor
Score = 683 bits (1762), Expect = 0.0
Identities = 333/348 (95%), Positives = 337/348 (96%)
Query: 15 LPLSLLLVSRTSSWMSCRFETLQCEGPVCTEESSCHTEDDLTDAREAGFQVKAYTFSEPF 74
L LSL ++ ++ FETLQCEGPVCTEESSCHTEDDLTDAREAGFQVKAYTFSEPF
Sbjct: 12 LYLSLGVLWVAQMLLAASFETLQCEGPVCTEESSCHTEDDLTDAREAGFQVKAYTFSEPF 71
Query: 75 HLIVSYDWLILQGPAKPVFEGDLLVLRCQAWQDWPLTQVTFYRDGSALGPPGPNREFSIT 134
HLIVSYDWLILQGPAKPVFEGDLLVLRCQAWQDWPLTQVTFYRDGSALGPPGPNREFSIT
Sbjct: 72 HLIVSYDWLILQGPAKPVFEGDLLVLRCQAWQDWPLTQVTFYRDGSALGPPGPNREFSIT 131
Query: 135 VVQKADSGHYHCSGIFQSPGPGIPETASVVAITVQELFPAPILRAVPSAEPQAGGPMTLS 194
VVQKADSGHYHCSGIFQSPGPGIPETASVVAITVQELFPAPILRAVPSAEPQAG PMTLS
Sbjct: 132 VVQKADSGHYHCSGIFQSPGPGIPETASVVAITVQELFPAPILRAVPSAEPQAGSPMTLS 191
Query: 195 CQTKLPLQRSAARLLFSFYKDGRIVQSRGLSSEFQIPTASEDHSGSYWCEAATEDNQVWK 254
CQTKLPLQRSAARLLFSFYKDGRIVQSRGLSSEFQIPTASEDHSGSYWCEAATEDNQVWK
Sbjct: 192 CQTKLPLQRSAARLLFSFYKDGRIVQSRGLSSEFQIPTASEDHSGSYWCEAATEDNQVWK 251
Query: 255 QSPQLEIRVQGASSSAAPPTLNPAPQKSAAPGTAPEEAPGPLPPPPTPSSEDPGFSSPLG 314
QSPQLEIRVQGASSSAAPPTLNPAPQKSAAPGTAPEEAPGPLPPPPTPSSEDPGFSSPLG
Sbjct: 252 QSPQLEIRVQGASSSAAPPTLNPAPQKSAAPGTAPEEAPGPLPPPPTPSSEDPGFSSPLG 311
Query: 315 MPDPHLYHQMGLLLKHMQDVRVLLGHLLMELRELSGHRKPGTTKATAE 362
MPDPHLYHQMGLLLKHMQDVRVLLGHLLMELRELSGHRKPGTTKATAE
Sbjct: 312 MPDPHLYHQMGLLLKHMQDVRVLLGHLLMELRELSGHRKPGTTKATAE 359
Methods
Experiments
• Reference cell line Me275: heavy SILAC label
• Cell lines Me235, Me246, Me280, T149, T618 or T50: light SILAC label
• Extracted proteins mixed 1:1 (6 samples)
• Tryptic digestion & peptide fractionation by OFFGEL
• Peptide fractions analysed by nanoLC-MS/MS on LTQ-ORBITRAP XL
Data processing
• Peak extraction & MS/MS spectra export: MAXQUANT
• Database search of MS/MS spectra: MASCOT
• DBs: UniProt + SwissProt splice variants + cell line specific DB + contaminants
• Identification validation by Peptide & Protein Prophet: SCAFFOLD
• Validation of variant proteins: BLAST
Generation of cell line specific DB
• Protein sequences generated from transcripts using graph method (nodes = splice donor & acceptor sites, edges = introns & exons)
• Sequences generated for each splice variant (1 to n), without and with all SNPs
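As an illustration of the graph idea described above, the sketch below enumerates transcripts as paths through a small splicing graph. It is not the pipeline used for this work: the toy gene, the node labels, the edge tuple layout and the function `enumerate_transcripts` are all assumptions, the graph is assumed to be acyclic, and translation to protein is left out.

```python
from collections import defaultdict


def enumerate_transcripts(edges, start, end):
    """Return the exon-concatenated sequence of every path from start to end.

    edges: iterable of (source_node, target_node, kind, sequence), where
    kind is "exon" or "intron" and intron edges contribute no sequence.
    """
    graph = defaultdict(list)
    for source, target, kind, sequence in edges:
        graph[source].append((target, kind, sequence))

    transcripts = []

    def walk(node, exons):
        if node == end:
            transcripts.append("".join(exons))
            return
        for target, kind, sequence in graph[node]:
            # Only exon edges add sequence; intron edges are spliced out.
            walk(target, exons + [sequence] if kind == "exon" else exons)

    walk(start, [])
    return transcripts


# Hypothetical three-exon gene in which the middle exon can be skipped.
toy_edges = [
    ("start", "donor1", "exon", "ATGGCC"),
    ("donor1", "acceptor2", "intron", ""),
    ("donor1", "acceptor3", "intron", ""),   # exon-skipping junction
    ("acceptor2", "donor2", "exon", "AAAGGG"),
    ("donor2", "acceptor3", "intron", ""),
    ("acceptor3", "end", "exon", "TGCTAA"),
]

toy_transcripts = enumerate_transcripts(toy_edges, "start", "end")
print(toy_transcripts)
```

Each path corresponds to one splice variant; SNPs could then be applied to each transcript before translation.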
Redundancy filtering
Removed:
- Subset sequences
- Sequences generating identical tryptic peptides
- Sequences corresponding to (or subset of) proteins in UniProt and SwissProt splice variant DBs
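The three removal criteria above can be sketched as a single filtering pass. This is one possible reading, not the actual implementation: "identical tryptic peptides" is interpreted here as two sequences yielding the same set of tryptic peptides, the trypsin rule (after K or R, not before P) is assumed, and the names `tryptic_digest`, `filter_redundant` and all toy sequences are hypothetical.

```python
def tryptic_digest(sequence):
    """Assumed trypsin rule: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def filter_redundant(candidates, reference):
    """Keep candidate sequences that add something new to the search space.

    candidates, reference: dicts mapping an identifier to a protein sequence.
    """
    kept = {}
    kept_peptide_sets = set()
    # Longest first, so that subsets are compared with already kept supersets.
    ordered = sorted(candidates.items(), key=lambda item: (-len(item[1]), item[0]))
    for identifier, sequence in ordered:
        # Identical to, or contained in, a reference protein.
        if any(sequence in ref for ref in reference.values()):
            continue
        # Contained in a candidate that was already kept.
        if any(sequence in other for other in kept.values()):
            continue
        # Same tryptic peptides as a candidate that was already kept.
        peptide_set = frozenset(tryptic_digest(sequence))
        if peptide_set in kept_peptide_sets:
            continue
        kept[identifier] = sequence
        kept_peptide_sets.add(peptide_set)
    return kept


# Toy data (not real proteins).
toy_reference = {"REF1": "MAAKCCKDDR"}
toy_candidates = {
    "v1": "MAAKCCGDDR",  # novel: carries a substitution
    "v2": "CCGDDR",      # subset of v1
    "v3": "AAKCC",       # subset of the reference protein
    "v4": "CCGDDRMAAK",  # same tryptic peptides as v1
}
toy_kept = filter_redundant(toy_candidates, toy_reference)
print(toy_kept)
```

Sorting the candidates by decreasing length is a design choice of this sketch: it guarantees that a subset sequence is always tested against its superset.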
Genomic data source:
- Reference human genome hg18
- RNAseq 454-titanium
- Exome capture/DNA Illumina
Variant protein ID format: tr|NC_<chromosome #>_<gene #>_<transcript #>, e.g. tr|NC_000016_2002_91
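The identifier annotation above (chromosome, gene and transcript numbers) can be read programmatically. The sketch below assumes that every identifier follows exactly the pattern of the example; the regular expression and the function `parse_variant_id` are illustrative and not taken from the original pipeline.

```python
import re

# Assumed layout: "tr|" + chromosome accession + "_" + gene # + "_" + transcript #
VARIANT_ID_PATTERN = re.compile(r"^tr\|(NC_\d+)_(\d+)_(\d+)$")


def parse_variant_id(identifier):
    """Split a cell line specific DB identifier into its three fields."""
    match = VARIANT_ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    return {
        "chromosome": match.group(1),
        "gene": int(match.group(2)),
        "transcript": int(match.group(3)),
    }


parsed = parse_variant_id("tr|NC_000016_2002_91")
print(parsed)
```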
Results
• 72 variants identified in 7 melanoma cell lines: ~70% SNPs and ~30% splice variants (figure 1).
• 32 SNPs already documented in dbSNP (NCBI)
(but not present in SwissProt splice variants DB).
• 18 new SNPs, 7 in proteins with known cancer association (table 1),
Ex: Fc receptor-like A protein (FCRLA), figure 2.
• 22 new splice variants, 2 in proteins with known cancer association (table 1),
Ex: Inosine-5'-monophosphate dehydrogenase 2 (IMDH2), figure 3.
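The rounded proportions quoted above follow directly from the variant counts. A short check, using only the numbers given in the Results (the variable names are arbitrary):

```python
# Variant counts as listed in the Results.
counts = {
    "known SNPs (dbSNP)": 32,
    "new SNPs": 18,
    "new splice variants": 22,
}

total = sum(counts.values())
snp_total = counts["known SNPs (dbSNP)"] + counts["new SNPs"]

# Percentages rounded to the nearest integer.
snp_share = round(100 * snp_total / total)
splice_share = round(100 * counts["new splice variants"] / total)

print(f"total variants: {total}")
print(f"SNPs: {snp_share}%, splice variants: {splice_share}%")
```

The three categories add up to the 72 variants reported, with SNPs at about 69% and splice variants at about 31%.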
Fig 1: Proportion of SNPs and splice variants in variant proteins found in melanoma cell lines.
Report of cancer association of proteins was based on UniProt annotations (www.uniprot.org).
• Known SNPs (dbSNP): 38%
• Known SNPs of proteins with known cancer association: 6%
• New SNPs: 15%
• New SNPs of proteins with known cancer association: 10%
• New splice variants: 28%
• New splice variants of proteins with known cancer association: 3%
Conclusions
• Use of cell line specific databases derived from individual genomic data for proteome analyses of cancer cells
→ recent progress in sequencing technologies allows tailored proteomics analyses of individual variations
• SILAC labeling will allow quantitation of variant proteins between cell lines and eventually correlation with cell phenotype
Table 1: New variants of proteins with known cancer association (based on UniProt annotations)
Variant protein ID | Cell lines (Me275, Me280, Me246, T149D, T50B) | UniProt ID | Variation | Biological annotation (UniProt)
tr|NC_000001_2993_0 | X | ARHG2_HUMAN | SNP | Proliferating cell nucleolar antigen p40, may be involved in cell cycle regulation, and cancer
tr|NC_000001_3252_0 | X | FCRLA_HUMAN | SNP | Fc receptor-like A protein, expressed in melanoma and melanocytes
tr|NC_000003_2776_0 | X | TRFM_HUMAN | SNP | Melanotransferrin, found predominantly in human melanomas
tr|NC_000003_736_13 | X | IMDH2_HUMAN | Splice var. | IMP dehydrogenase 2, may have a role in the development of malignancy and the growth progression of some tumors
tr|NC_000009_275_0 | X | CD2A1_HUMAN | SNP | Cyclin-dependent kinase inhibitor, defects in CDKN2A are the cause of cutaneous malignant melanoma
tr|NC_000010_327_30 | X X | VIME_HUMAN | SNP | Vimentin, expressed in many hormone-independent mammary carcinoma cell lines
tr|NC_000017_17_2 | X | GLOD4_HUMAN | SNP | Glyoxalase domain-containing protein 4, expression is decreased in hepatocellular carcinoma samples as compared to adjacent non-cancerous liver tissues
tr|NC_000019_1846_0 | X | BAX_HUMAN | SNP | Apoptosis regulator BAX, expressed in leukemia, lymphoma and carcinoma cell lines
tr|NC_000020_601_16 | X | PHLA1_HUMAN | Splice var. | Apoptosis-associated nuclear protein, expression progressively reduced in primary and metastatic melanomas