14.6 Topic Model: Latent Dirichlet Allocation ...........................285
15 Social Network Analysis 303
15.1 Introduction ..........................................303
15.2 Co-occurrence Network ....................................303
15.3 Appendix: matrix multiplication in PySpark .........................307
15.4 Correlation Network ......................................310
16 ALS: Stock Portfolio Recommendations 311
16.1 Recommender systems .....................................312
16.2 Alternating Least Squares ...................................313
16.3 Demo ..............................................313
17 Monte Carlo Simulation 321
17.1 Simulating Casino Win .....................................321
17.2 Simulating a Random Walk ..................................323
18 Markov Chain Monte Carlo 333
18.1 Metropolis algorithm ......................................333
18.2 A Toy Example of Metropolis .................................334
18.3 Demos .............................................335
19 Neural Network 343
19.1 Feedforward Neural Network .................................343
20 Automation for Cloudera Distribution Hadoop 347
20.1 Automation Pipeline ......................................347
20.2 Data Clean and Manipulation Automation ...........................347
20.3 ML Pipeline Automation ....................................350
20.4 Save and Load PipelineModel .................................350
20.5 Ingest Results Back into Hadoop ...............................351
21 Wrap PySpark Package 353
21.1 Package Wrapper ........................................353
21.2 Pacakge Publishing on PyPI ..................................355
22 PySpark Data Audit Library 357
22.1 Install with pip ........................................357
22.2 Install from Repo ........................................357
22.3 Uninstall ............................................357
22.4 Test ...............................................358
22.5 Auditing on Big Dataset ....................................359
23 Zeppelin to jupyter notebook 369
23.1 How to Install .........................................369
23.2 Converting Demos .......................................370
24 My Cheat Sheet 375
iii