We discuss two issues: convergence and mixing. The two are strongly related, but they are not the same thing. A chain that mixes poorly will typically also converge slowly. But you can still tell the difference by studying the points below.
Remember that an MCMC algorithm is an algorithm that randomly explores the space on which the posterior distribution is defined, and that it tends to spend more time in regions with more posterior probability.
If you think about the algorithm in terms of this sort of random exploration of the space, focused on the regions of higher probability, it becomes easier to see how these two things are different.
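To make the "random exploration" picture concrete, here is a minimal sketch of a random-walk Metropolis sampler. It is only an illustration under stated assumptions: the target is a standard normal standing in for a posterior, and the names `log_target` and `metropolis`, the step size, the chain length, and the seed are all hypothetical choices, not something taken from these notes.

```python
import math
import random

def log_target(theta):
    # Assumption: unnormalised log density of a standard normal,
    # used here as a stand-in for a posterior distribution.
    return -0.5 * theta * theta

def metropolis(log_density, theta0, n_iter, step, rng):
    """Random-walk Metropolis; returns the list of visited states."""
    chain = [theta0]
    theta = theta0
    log_p = log_density(theta)
    for _ in range(n_iter):
        # Propose a random move around the current state.
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
        chain.append(theta)
    return chain

rng = random.Random(0)
chain = metropolis(log_target, theta0=0.0, n_iter=20000, step=1.0, rng=rng)

# Fraction of iterations spent in a high-probability and a low-probability region.
frac_central = sum(abs(t) < 1.0 for t in chain) / len(chain)
frac_tail = sum(abs(t) > 2.0 for t in chain) / len(chain)
print(f"fraction with |theta| < 1: {frac_central:.2f}")
print(f"fraction with |theta| > 2: {frac_tail:.2f}")
```

The chain should spend far more iterations near the mode than in the tails, which is the sense in which the algorithm "explores more" where there is more posterior probability.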
1.1 Difference between Convergence and Mixing
1. Convergence
How fast the chain moves from its starting point $\theta^{(0)}$ towards a region of high probability.
It is the transient phase of the chain: from the beginning (a random starting place) to the place where we want to be.
2. Mixing
How fast the chain explores the high-probability regions once it has "forgotten" its initial state.
Figure 1 shows an example of the difference between convergence and mixing.
The reason people discuss them separately: convergence can be very quick if you have a good starting point, but poor mixing is something you cannot deal with so easily.
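The distinction can be sketched numerically. The example below is a rough illustration, not a diagnostic recommended by these notes. It assumes a standard normal target and uses two crude, hypothetical proxies: `first_hit` (the first iteration at which the chain enters the region |theta| < 2) for convergence, and `lag1_autocorr` (the lag-1 autocorrelation of the chain) for mixing. The starting points, step sizes, and seed are arbitrary.

```python
import math
import random

def log_target(theta):
    # Assumption: standard normal target standing in for a posterior.
    return -0.5 * theta * theta

def metropolis(theta0, n_iter, step, rng):
    """Random-walk Metropolis on log_target; returns the visited states."""
    chain = [theta0]
    theta, log_p = theta0, log_target(theta0)
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
        chain.append(theta)
    return chain

def first_hit(chain, radius=2.0):
    """Index of the first state with |theta| < radius (crude convergence proxy)."""
    for i, t in enumerate(chain):
        if abs(t) < radius:
            return i
    return None

def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation (crude mixing proxy; near 1 means slow mixing)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

rng = random.Random(1)

# Convergence: same step size, different starting points.
hit_good = first_hit(metropolis(0.0, 5000, 1.0, rng))
hit_bad = first_hit(metropolis(50.0, 5000, 1.0, rng))

# Mixing: same good starting point, different step sizes.
rho_fast = lag1_autocorr(metropolis(0.0, 20000, 2.5, rng))
rho_slow = lag1_autocorr(metropolis(0.0, 20000, 0.05, rng))

print("iterations to reach |theta| < 2:", hit_good, "vs", hit_bad)
print(f"lag-1 autocorrelation: {rho_fast:.3f} vs {rho_slow:.3f}")
```

A good starting point removes the transient phase entirely, yet the chain with the tiny step size still moves through the high-probability region very slowly. Changing the starting point does nothing to fix that.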
1.2 Convergence
1.2.1 The Gold Standard
The gold standard would be to examine the "distance" between $\theta^{(0)}$ (the initial state) and $\theta^{(n)}$ (the state after n iterations). Hopefully this distance decreases with n, and we can pick the number of observations to burn by bounding that