In this talk, we will discuss the following questions: why is neural network training nonsmooth from an optimization perspective, and how should we analyze convergence in nonsmooth settings? We will start by showing that nonsmoothness is essential to standard neural network training procedures, and that network training converges in an unstable manner. We then present theoretical models for understanding why neural network optimization is unstable. We conclude by showing that new definitions of convergence in nonsmooth settings can reconcile theory with practice.