Chapter 3 Locally dependent exponential random network models
To begin, we define the random network model developed by Fellows & Handcock (2012). By way of motivation, note that in the ERGM the nodal variates are fixed and enter the model as explanatory variables for making inferences about network structure. Conversely, there is a class of models, not discussed here, that treats the network as a fixed explanatory variable in modeling (random) nodal attributes. It is not difficult to imagine situations in which a researcher would like to model the network and the node attributes jointly. Thus we define a class of models in which both the network structure and the attributes of the individual nodes are treated as random quantities.
Roughly, this is the proportion of the total sample space \(\mathcal{Z}\) that is possible with \(x\) fixed. This is not, in general, equal to one, so the ERNM does not reduce to the ERGM (Fellows & Handcock, 2012).
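For reference, the joint exponential-family form of the ERNM, written in the notation used below, is \[\begin{equation*} P_{\theta}(Y = y, X = x) = \frac{\exp\left( \theta \cdot g(y, x) \right)}{C(\theta, \mathcal{Z})}, \qquad C(\theta, \mathcal{Z}) = \sum_{(y', x') \in \mathcal{Z}} \exp\left( \theta \cdot g(y', x') \right), \end{equation*}\] where \(g\) is a vector of joint network-covariate statistics and \(\mathcal{Z}\) is the joint sample space. Conditioning on \(X = x\) and renormalizing recovers an ERGM for \(Y\) alone, while the ERNM itself places probability on network-covariate pairs jointly.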
3.1 Definitions and notation
We will consistently refer to a set of nodes, \(A_{k}\), as the \(k\)-th neighborhood, with an uppercase \(K\) denoting the total number of neighborhoods and a lowercase \(k\) indexing a specific neighborhood. The variable \(\mathcal{N}\) will refer to the node set of a random network, usually the union of a collection of neighborhoods. Nodes within the network will be indexed by the variables \(i\) and \(j\), with \(Z_{ij} = (Y_{ij}, X_i, X_j)\), where \(Y_{ij}\) refers to the edge between nodes \(i\) and \(j\), and \(X_{i}\) and \(X_j\) refer to the random vectors of node attributes. Abstracting this further, \(i\) and \(j\) will also refer to tuples of nodes, so we will write \(\vec{i} = (i_1, i_2, \dots, i_q) \in \mathcal{N} \times \mathcal{N} \times \dots \times \mathcal{N}\). The variables \(Z\) and \(Y\) will often carry a subscript of \(W\) or \(B\) (for example, \(Y_{Bij}\)), which emphasizes that the edge from \(i\) to \(j\) lies within or between neighborhoods, respectively. Finally, the indicator function \(\mathbb{I}_{B}(i,j)\) (where \(B\) stands for between) is one if \(i \in A_{l}\) and \(j \in A_{p}\) with \(l \neq p\), and zero otherwise.
3.2 Preliminary theorems
In proving our theorems, we will make use of several classical results, all of which can be found in Billingsley (1995). The first is the Lindeberg-Feller central limit theorem for triangular arrays. The second is Lyapounov’s condition, which gives a convenient way to verify that the hypothesis of the Lindeberg-Feller theorem holds. We also make use of a central limit theorem for dependent random variables. For the sake of brevity, in this section we state each of these without proof.
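In particular, Lyapounov’s condition, referred to below as (3.2), takes the following form: if \(\{X_{K,k} : 1 \leq k \leq K\}\) is a triangular array of row-wise independent random variables with mean zero and \(s_K^2 = \sum_{k=1}^{K} \operatorname{Var}(X_{K,k})\), and if for some \(\delta > 0\) \[\begin{equation*} \lim_{K \to \infty} \frac{1}{s_K^{2 + \delta}} \sum_{k = 1}^{K} \mathbb{E}\left( |X_{K,k}|^{2 + \delta} \right) = 0, \end{equation*}\] then \(\sum_{k=1}^{K} X_{K,k} / s_K\) converges in distribution to \(N(0,1)\).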
The last is Slutsky’s theorem, a classic result of asymptotic theory in statistics.
3.3 Consistency under sampling
With these in place, we attempt to extend a result about locally dependent ERGMs proven by Schweinberger & Handcock (2015) to locally dependent ERNMs. In short, this theorem states that the parameters estimated by modeling a small sample of a larger network can be generalized to the overall network. It was shown by Shalizi & Rinaldo (2013) that most useful formulations of ERGMs do not form projective exponential families: the distribution of a subgraph cannot, in general, be recovered by marginalizing the distribution of a larger graph with respect to the edge variables not included in the smaller graph. Hence, for general ERGMs, parameter estimates from a subnetwork cannot be generalized to the total network.
To show that locally dependent ERNMs do form a projective family, let \(\mathbb{A}\) be a collection of sets \(\mathcal{A}\), where each \(\mathcal{A}\) is a finite collection of neighborhoods. Also, let the set \(\mathbb{A}\) be an ideal, so that if \(\mathcal{A} \in \mathbb{A}\), every subset of \(\mathcal{A}\) is also in \(\mathbb{A}\), and if \(\mathcal{A}, \mathcal{B} \in \mathbb{A}\), then \(\mathcal{A} \cup \mathcal{B} \in \mathbb{A}\). If \(\mathcal{A} \subset \mathcal{B}\), think of passing from the set \(\mathcal{A}\) to the set \(\mathcal{B}\) as taking a larger sample of the (possibly infinite) set of neighborhoods in the larger network. For each \(\mathcal{A} \in \mathbb{A}\), let \(\mathcal{P}_{\mathcal{A}, \Theta} = \{P_{\mathcal{A}, \theta}\}_{\theta \in \Theta}\) be the family of ERNMs on the neighborhoods in \(\mathcal{A}\) with parameter \(\theta \in \Theta\), where \(\Theta \subset \mathbb{R}^{p}\) is open. Assume that each distribution in \(\mathcal{P}_{\mathcal{A}, \Theta}\) has the same support \(\mathcal{Z}_{\mathcal{A}}\), and that whenever \(\mathcal{A} \subset \mathcal{B}\), \(\mathcal{Z}_{\mathcal{B}} = \mathcal{Z}_{\mathcal{A}} \times \mathcal{Z}_{\mathcal{B} \setminus \mathcal{A}}\). Then the exponential family \(\{\mathcal{P}_{\mathcal{A}, \Theta}\}_{\mathcal{A} \in \mathbb{A}}\) is projective in the sense of Shalizi & Rinaldo (2013, Definition 1) precisely when Theorem 3.6 holds.
This follows from a specific case of the general definition given by Shalizi & Rinaldo (2013). There, for every pair \(\mathcal{A}\) and \(\mathcal{B}\) with \(\mathcal{A} \subset \mathcal{B}\), they define the natural projection mapping \(\pi_{\mathcal{B} \to \mathcal{A}}: \mathcal{Z}_{\mathcal{B}} \to \mathcal{Z}_{\mathcal{A}}\). Informally, this mapping projects the set \(\mathcal{Z}_{\mathcal{B}}\) down to \(\mathcal{Z}_{\mathcal{A}}\) by simply removing the extra data. For example, if \(\mathcal{B} = \{A_1, A_2\}\) and \(\mathcal{A} = \{A_1\}\) as in Figure 3.1, then the mapping \(\pi_{\mathcal{B} \to \mathcal{A}}\) is shown in Figure 3.2.
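In symbols, the family is projective when the marginal distribution induced by this projection agrees with the model on the smaller set, that is, when for every \(\mathcal{A} \subset \mathcal{B}\) and every \(\theta \in \Theta\), \[\begin{equation*} P_{\mathcal{A}, \theta} = P_{\mathcal{B}, \theta} \circ \pi_{\mathcal{B} \to \mathcal{A}}^{-1}. \end{equation*}\]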
This is desirable because Shalizi & Rinaldo (2013) have demonstrated the following theorem.
This is trivially achieved by setting \(r = 1\) for all values of \(\left| \mathcal{Z} \right|\) and setting \(a(\theta) = \log(C(\theta, \mathcal{Z}))\). Differentiability of \(a\) with respect to \(\theta\) follows from the standard result on differentiation under the integral (here, sum) sign. From a practical perspective, this means that a researcher using this model may treat parameters estimated from samples of a large network as increasingly good approximations to the true parameter values as the sample size grows.
3.4 Asymptotic normality of statistics
In this section we will prove that certain classes of statistics of locally dependent random networks are asymptotically normally distributed as the number of neighborhoods tends to infinity. The statistics we consider can be classified into three types: first, statistics which depend only on the graph structure; second, statistics that depend on both the graph and the nodal variates; and third, statistics that depend only on the nodal variates. The first class of statistics has already been considered by Schweinberger & Handcock (2015), but we will reproduce the proof here, as the second proof is very similar. The third class of statistics is asymptotically normal by a central limit theorem for \(M\)-dependent random variables found in Billingsley (1995).
Before we begin to explicitly define each of these classes, we clarify the notation that will be used. A general statistic will be a function \[\begin{equation*} S:\mathcal{N}^d \to \mathbb{R}, \end{equation*}\] where \(\mathcal{N}^d\) is the \(d\)-fold Cartesian product of the set of nodes, \(\mathcal{N}\), with itself: \[\begin{equation*} \mathcal{N}^d = \underbrace{\mathcal{N} \times \dots \times \mathcal{N}}_{d \text{ times}}. \end{equation*}\] Additionally, the statistic will often carry a subscript \(K\), indicating that it is a statistic of the random network with \(K\) neighborhoods.
Formally, as explained in Schweinberger & Handcock (2015), the first class of statistics contains those that have the form \[\begin{equation*} S_{K} = \sum_{i \in \mathcal{N}^d} S_{Ki}, \end{equation*}\] where \[\begin{equation*} S_{Ki} = \prod_{l, p \in i} Y_{lp}, \end{equation*}\] a product of \(q\) edge variables that captures the desired interaction. We will also make use of the set \(A_k^d,\) which is the analogous Cartesian product of the neighborhood \(A_k\) with itself. When we write \(i \in A_k^d\), we mean that every component of the \(d\)-tuple \(i\) is an element of \(A_k\). Furthermore, by a mild abuse of notation, we will write \(l, p \in i\) to mean that \(l\) and \(p\) are vertices contained in the \(d\)-tuple \(i\).
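To make this class concrete, the following Python sketch (purely illustrative, and no part of the formal development) computes the triangle count, a statistic in which each summand \(S_{Ki}\) is a product of \(q = 3\) edge variables indexed by a \(d = 3\)-tuple of nodes:

```python
import numpy as np

def triangle_statistic(Y: np.ndarray) -> float:
    """Class-1 statistic: S_K = sum over node triples of Y_ij * Y_jk * Y_ik.

    Each summand S_Ki is a product of q = 3 edge variables, so the
    statistic depends only on the graph structure.
    """
    n = Y.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                total += Y[i, j] * Y[j, k] * Y[i, k]
    return total

# Example: a 4-node graph with one triangle (nodes 0, 1, 2).
Y = np.zeros((4, 4))
for a, b in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    Y[a, b] = Y[b, a] = 1
print(triangle_statistic(Y))  # 1.0
```

With this example in mind, we are ready to prove the first case of the theorem.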
Proof. As the networks \(Z_K\) are unweighted, all edge variables \(Y_{ij}\in \{0, 1\}\). Let \(\mu_{ij} = \mathbb{E}(Y_{ij})\) and define \(V_{ij} = Y_{ij} - \mu_{ij}\). Without loss of generality, we may then work with \(V_{ij}\), which has the convenient property that \(\mathbb{E}(V_{ij}) = 0\). We can similarly shift our statistic of interest, \(S_K\): define \(S_K^{\ast} = S_{K} - \mathbb{E}(S_K)\), so that \(\mathbb{E}(S_K^{\ast}) = 0\).
Note that we can write \[\begin{equation*} S^{\ast}_{K} = W^{\ast}_{K} + B^{\ast}_{K}, \end{equation*}\] with \[\begin{equation} W^{\ast}_{K} = \sum_{k = 1}^{K} W^{\ast}_{K,k} = \sum_{k = 1}^{K} \sum_{i \in \mathcal{N}_{K}^d} \mathbb{I}(i\in A_{k}^d) S^{\ast}_{Ki} \tag{3.3} \end{equation}\] and \[\begin{equation} B^{\ast}_{K} = \sum_{i \in \mathcal{N}_{K}^d} \mathbb{I}_{B}(i) S^{\ast}_{Ki}, \end{equation}\] where the indicator functions restrict the sums to within the \(k\)-th neighborhood and between neighborhoods of the graph, respectively. Specifically, \(\mathbb{I}_{B}(i) = 1\) when the \(d\)-tuple of nodes \(i\) contains nodes from different neighborhoods, or exactly when \(\mathbb{I}(i \in A_{k}^d) = 0\) for all neighborhoods \(k\). By splitting the statistic into the within and between neighborhood portions, we are able to make use of the independence relation between edges that connect neighborhoods. We also have \(\mathbb{E}(W^{\ast}_{K}) = 0\) and \(\mathbb{E}(B^{\ast}_{K}) = 0\), as each quantity is a sum of random variables with mean zero.
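To make the decomposition concrete, the following sketch (again outside the formal argument) splits the uncentered edge-count statistic into its within- and between-neighborhood parts, given a vector of neighborhood memberships:

```python
import numpy as np

def within_between_split(Y: np.ndarray, membership: np.ndarray):
    """Split the uncentered edge count S_K = sum_{i<j} Y_ij into a
    within-neighborhood part W_K and a between-neighborhood part B_K,
    mirroring the decomposition S*_K = W*_K + B*_K in equation (3.3)."""
    n = Y.shape[0]
    W, B = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if membership[i] == membership[j]:
                W += Y[i, j]  # both endpoints lie in one neighborhood A_k
            else:
                B += Y[i, j]  # the indicator I_B equals one
    return W, B

# Example: two neighborhoods {0, 1} and {2, 3} with one bridging edge.
Y = np.zeros((4, 4))
for a, b in [(0, 1), (2, 3), (1, 2)]:
    Y[a, b] = Y[b, a] = 1
print(within_between_split(Y, np.array([0, 0, 1, 1])))  # (2.0, 1.0)
```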
The idea of this proof is to gain control over the variances of \(B^{\ast}_{K}\) and of the elements of the sequence \(W^{\ast}_{Kk}\). We can then show that \(B^{\ast}_{K}\) converges in probability to zero and that the triangular array \(W^{\ast}_{K}\) satisfies Lyapounov’s condition, and is thus asymptotically normal. Finally, Slutsky’s theorem allows us to extend the result to \(S^{\ast}_{K}\).
To bound the variance of \(B^{\ast}_{K}\), note that \[\begin{equation*} \operatorname{Var}(B^{\ast}_{K}) = \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}_{B}(i) \mathbb{I}_{B}(j) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}). \end{equation*}\] Despite independence, some of these covariances may be nonzero if the two terms of the statistic both involve the same edge. For example, in Figure , a statistic that counted the number of edges between gray nodes plus the number of edges between diamond-shaped nodes would have a nonzero covariance term because of the edge between the two nodes that are both gray and diamond-shaped. To show that, in the limit, these covariances vanish, we need only concern ourselves with the nonzero terms in the sum, that is, only those terms where \(\mathbb{I}_{B}(i) \mathbb{I}_{B}(j) = 1\). This happens exactly when both \(S^{\ast}_{Ki}\) and \(S^{\ast}_{Kj}\) involve a between-neighborhood edge variable. So, note that we have \[\begin{align} \begin{split} \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) &= \mathbb{E}(S^{\ast}_{Ki} S^{\ast}_{Kj}) - \mathbb{E}(S^{\ast}_{Ki}) \mathbb{E}(S^{\ast}_{Kj}) \\ &= \mathbb{E}(S^{\ast}_{Ki}S^{\ast}_{Kj}), \end{split} \end{align}\] as the expectation of each term is zero. Next we take \(Y_{l_1 l_2}\) to be one of the (possibly many) between-neighborhood edge variables in this product (such that \(\mathbb{I}_B(i) = 1\) where \(i\) is any tuple containing \(l_1\) and \(l_2\)) and \(V_{l_1 l_2}\) to be the recentered random variable corresponding to \(Y_{l_1 l_2}\). Then, \[\begin{align*} \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) &= \mathbb{E}\left( \prod_{m,n \in i} V_{mn} \prod_{m,n \in j} V_{mn} \right) \\ &= \mathbb{E}\left( V_{l_1 l_2}^p \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right), & (p = 1,2) \end{align*}\] where we must consider the case \(p = 1\) to account for the covariance of \(S^{\ast}_{Ki}\) and \(S^{\ast}_{Kj}\) when \(i \neq j\), and the case \(p = 2\) to account for the variance of \(S^{\ast}_{Ki}\), which arises when \(i = j\). So, if \(p = 1\), then \[\begin{align*} \mathbb{E}\left( V_{l_1 l_2} \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) &= \mathbb{E}\left( V_{l_1 l_2}\right) \mathbb{E}\left( \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) \\ &= 0, \end{align*}\] by the local dependence property and the fact that \(\mathbb{E}(V_{l_1 l_2}) = 0\). The local dependence property allows us to factor out the expectation of \(V_{l_1 l_2}\), as this edge is between neighborhoods, and therefore independent of every other edge in the graph. Now, if \(p = 2\), then, by sparsity and the fact that the product below is at most \(1\), \[\begin{equation*} \mathbb{E}\left( V_{l_1 l_2}^2\right) \mathbb{E}\left( \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} \right) \leq DCn^{-\delta}, \end{equation*}\] where \(D\) is a constant bounding the second expectation; such a constant exists because each \(V_{mn}\) is bounded by definition, so a product of them is bounded. So as \(K\) grows large, the between-neighborhood covariances all become asymptotically negligible. Therefore, since the number of pairs of \(d\)-tuples is at most \(n^{2d}\), we can conclude that \[\begin{equation} \operatorname{Var}(B^{\ast}_{K}) = \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}_{B}(i) \mathbb{I}_{B}(j) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \leq DCn^{2d-\delta}. \end{equation}\]
So we have \[\begin{equation} \operatorname{Var}(B^{\ast}_{K}) \to 0. \end{equation}\] Then, for all \(\epsilon > 0\), Chebyshev’s inequality gives us \[\begin{equation} \lim_{K \to \infty} P(|B^{\ast}_{K}| > \epsilon) \leq \lim_{K \to \infty} \frac{1}{\epsilon^2} \operatorname{Var}(B^{\ast}_{K}) = 0, \end{equation}\] so \[\begin{equation} B^{\ast}_{K} \xrightarrow[K \to \infty]{\text{p}} 0. \end{equation}\] Next, we bound the within-neighborhood covariances, as we also have \[\begin{equation*} \operatorname{Var}(W^{\ast}_{Kk}) = \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i, j \in A_{k}^d) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}). \end{equation*}\] As the covariance forms an inner product on the space of square-integrable random variables, the Cauchy-Schwarz inequality gives us \[\begin{equation} \mathbb{I}(i, j \in A_{k}^d) \left| \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \right| \leq \mathbb{I}(i, j \in A_{k}^d) \sqrt{\operatorname{Var}(S^{\ast}_{Ki})} \sqrt{\operatorname{Var}(S^{\ast}_{Kj})}. \end{equation}\] Then, as each \(S^{\ast}_{Ki}\) has expectation zero, we know that \[\begin{equation} \operatorname{Var}(S^{\ast}_{Ki}) = \mathbb{E}(S^{\ast 2}_{Ki}) - \mathbb{E}(S^{\ast}_{Ki})^2 = \mathbb{E}(S^{\ast 2}_{Ki}). \end{equation}\] As \(S^{\ast 2}_{Ki} \leq 1\), we know \(\operatorname{Var}(S^{\ast}_{Ki}) \leq 1\) for all tuples \(i\), so we have the bound \[\begin{equation} \mathbb{I}(i, j \in A_{k}^d) \left| \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \right| \leq \mathbb{I}(i, j \in A_{k}^d). \end{equation}\] Now all that remains is to apply the Lindeberg-Feller central limit theorem to the triangular array \(W^{\ast}_{K} = \sum_{k = 1}^{K} W^{\ast}_{K, k}\). To that end, first note that, as each neighborhood contains at most a finite number of nodes, \(M\), we have \[\begin{align} \begin{split} \operatorname{Var}(W^{\ast}_{K,k}) &= \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_k^d) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \\ &= \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_k^d) \mathbb{E}(S^{\ast}_{Ki} S^{\ast}_{Kj}) \\ &\leq \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_k^d) \\ &\leq M^{2d}. \end{split} \end{align}\] The same counting argument shows that \(|W^{\ast}_{K,k}| \leq M^{d}\), as \(W^{\ast}_{K,k}\) is a sum of at most \(M^{d}\) terms, each bounded by one. Now we prove that Lyapounov’s condition (3.2) holds for the constant in the exponent \(\delta = 2\): \[\begin{align} \begin{split} \lim_{K \to \infty} \sum_{k = 1}^{K} \frac{1}{\operatorname{Var}(W^{\ast}_{K})^2} \mathbb{E}(|W^{\ast}_{K,k}|^{4}) &\leq \lim_{K \to \infty} \frac{M^{2d}}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \mathbb{E}(W^{\ast 2}_{K,k}) \\ &= \lim_{K \to \infty} \frac{M^{2d}}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \operatorname{Var}(W^{\ast}_{K,k}) \\ &= \lim_{K \to \infty} \frac{M^{2d}}{\operatorname{Var}(W^{\ast}_{K})} \\ &= 0, \end{split} \end{align}\] where the inequality uses \(W^{\ast 4}_{K,k} \leq M^{2d} W^{\ast 2}_{K,k}\), the second equality uses the independence of the neighborhoods to write \(\sum_{k} \operatorname{Var}(W^{\ast}_{K,k}) = \operatorname{Var}(W^{\ast}_{K})\), and \(\operatorname{Var}(W^{\ast}_{K})\) tends to infinity by assumption. Therefore, Lyapounov’s condition holds, and so by the Lindeberg-Feller central limit theorem, we have \[\begin{equation} \frac{W^{\ast}_K}{\sqrt{\operatorname{Var}(W^{\ast}_K)}} \xrightarrow[K \to \infty]{\text{d}} N(0, 1). \end{equation}\] Slutsky’s theorem (3.4) gives the final result for \(S^{\ast}_{K} = W^{\ast}_{K} + B^{\ast}_{K}\).
Then we have \[\begin{equation} \frac{S^{\ast}_K}{\sqrt{\operatorname{Var}(S^{\ast}_K)}} \xrightarrow[K \to \infty]{\text{d}} N(0, 1), \end{equation}\] as desired.
The second class of statistics augments the first with a uniformly bounded function \(h\) of the nodal covariates, so that each summand becomes \(S_{Ki} = \prod_{l, p \in i} Y_{lp} h(X_l, X_p)\), a product with at most \(q\) terms. The proof proceeds exactly as before, incorporating the function \(h\) into each \(S_{Ki}\). The binary nature of the graph and the uniform boundedness of \(h\) allow us to once again recenter the \(Y_{ij}\), meaning that we will work with \(V_{ij}h(X_i, X_j) = Y_{ij}h(X_i, X_j) - \mu_{ij}\), where now \(\mu_{ij} = \mathbb{E}(Y_{ij}h(X_i, X_j))\). We then have \(\mathbb{E}(V_{ij} h(X_i, X_j)) = 0\), so \(\mathbb{E}(S^{\ast}_{Ki}) = 0\) as well. An illustrative sketch of a statistic in this class appears after the proof.
For the between-neighborhood covariances, we once again choose \(V_{l_1 l_2}\), a between-neighborhood network variable. Then we once again write \[\begin{align*} \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) &= \mathbb{E}\left( \prod_{m,n \in i} V_{mn} h(X_m, X_n) \prod_{m,n \in j} V_{mn} h(X_m, X_n) \right) \\ &= \mathbb{E}\left( (V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^p \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right), & (p = 1,2) \\ &= \mathbb{E}\left( (V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^p \right) \mathbb{E} \left( \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right), \end{align*}\] by the local dependence property. Then, when \(p = 1\), we have \(\mathbb{E} ( V_{l_1 l_2} h(X_{l_1}, X_{l_2}) ) = 0\) by assumption, so the covariance is identically zero. When \(p = 2\) we have \[\begin{equation*} \mathbb{E}\left( (V_{l_1 l_2} h(X_{l_1}, X_{l_2}))^2 \right) \leq Cn^{-\delta} \end{equation*}\] by sparsity, and \[\begin{equation*} \mathbb{E} \left( \prod_{m,n \in (i \cup j) \setminus \{l_1, l_2\}} V_{mn} h(X_m, X_n) \right) \leq (DB)^{2q - 2} \end{equation*}\] almost surely, by uniform boundedness and the fact that this product has at most \(2q - 2\) terms. This follows from the fact that \(h\) is bounded by \(B\) and that each \(V_{mn}\) is bounded by some constant \(D\), by definition. So \[\begin{equation*} \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \leq (DB)^{2q - 2}Cn^{-\delta}, \end{equation*}\] which tends to zero as \(K\) grows large. So, again by Chebyshev’s inequality, we have \[\begin{equation*} B^{\ast}_{K} \xrightarrow[K \to \infty]{\mathrm{p}} 0. \end{equation*}\] Next we bound the within-neighborhood covariances. Now, with each \(|S^{\ast}_{Ki}| \leq (DB)^{q}\), we have \[\begin{equation} \mathbb{I}(i, j \in A_{k}^d) \left| \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \right| \leq \mathbb{I}(i, j \in A_{k}^d) (DB)^{2q}. \end{equation}\] Now we show that Lyapounov’s condition (3.2) holds for the same \(\delta = 2\). Once again note that each neighborhood has at most \(M\) nodes, so \[\begin{align*} \operatorname{Var}(W^{\ast}_{K,k}) &= \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_{k}^d) \operatorname{Cov}(S^{\ast}_{Ki}, S^{\ast}_{Kj}) \\ &\leq \sum_{i \in \mathcal{N}_{K}^d} \sum_{j \in \mathcal{N}_{K}^d} \mathbb{I}(i,j \in A_{k}^d) (DB)^{2q} \\ &\leq M^{2d}(DB)^{2q}, \end{align*}\] and, as before, \(|W^{\ast}_{K,k}| \leq M^{d}(DB)^{q}\). Then Lyapounov’s condition follows: \[\begin{align} \begin{split} \lim_{K \to \infty} \sum_{k = 1}^{K} \frac{1}{\operatorname{Var}(W^{\ast}_{K})^2} \mathbb{E}(|W^{\ast}_{K,k}|^{4}) &\leq \lim_{K \to \infty} \frac{M^{2d}(DB)^{2q}}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \mathbb{E}(W^{\ast 2}_{K,k}) \\ &= \lim_{K \to \infty} \frac{M^{2d}(DB)^{2q}}{\operatorname{Var}(W^{\ast}_{K})^2} \sum_{k = 1}^{K} \operatorname{Var}(W^{\ast}_{K,k}) \\ &= \lim_{K \to \infty} \frac{M^{2d}(DB)^{2q}}{\operatorname{Var}(W^{\ast}_{K})} \\ &= 0. \end{split} \end{align}\] Therefore, by the Lindeberg-Feller central limit theorem and Slutsky’s theorem, we have \[\begin{equation*} \frac{S_{K}^{\ast}}{\sqrt{\operatorname{Var}(S_{K}^{\ast})}} \xrightarrow[K \to \infty]{\mathrm{d}} N(0,1).\qedhere \end{equation*}\]
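As promised, here is an illustrative sketch of a statistic in the second class; the bounded function \(h\) chosen here is a hypothetical example, not one mandated by the theorem:

```python
import numpy as np

def covariate_weighted_edge_count(Y: np.ndarray, X: np.ndarray, B: float = 1.0) -> float:
    """Class-2 statistic: S_K = sum_{i<j} Y_ij * h(X_i, X_j).

    h(a, b) = min(|a - b|, B) is one arbitrary choice of a function of
    the nodal covariates that is uniformly bounded by B, as the theorem
    requires; each summand is a product of an edge variable and h.
    """
    n = Y.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            h = min(abs(X[i] - X[j]), B)  # |h| <= B uniformly
            total += Y[i, j] * h
    return total
```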
Finally, the last class of statistics is that which depends only on the nodal variates. This result follows directly from a central limit theorem for \(M\)-dependent random variables, which can be found in Billingsley (1995, p. 364). Establishing this result requires the assumption that the statistic depends on only a single nodal covariate.
In practice, the hypothesis that the twelfth moment exists is satisfied by most reasonable distributional assumptions about nodal covariates. Furthermore, the assumption that all nodal variates have expectation zero can easily be satisfied by recentering the observed data. Finally, the delta method yields an asymptotically normal distribution for any differentiable statistic of the nodal variate. The univariate nature of the statistic is a fundamental limitation of this approach; however, we have been unable to find an analogous multidimensional central limit theorem that would allow us to establish the asymptotic normality of a statistic of multiple nodal variates.
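To see the within-neighborhood dependence structure at work, the following simulation sketch (with arbitrary, illustrative distributional choices) generates nodal covariates that are dependent within neighborhoods and independent across them, and checks that the standardized mean over \(K\) neighborhoods is approximately standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)

def neighborhood_mean(K: int, M: int = 5) -> float:
    """Mean of a single nodal covariate over K neighborhoods of M nodes.

    A shared neighborhood effect induces dependence within each block,
    while blocks are independent of one another -- the local dependence
    setting. All variates have mean zero by construction.
    """
    shared = rng.normal(size=(K, 1))      # per-neighborhood effect
    X = shared + rng.normal(size=(K, M))  # within-block dependence
    return X.mean()

# Replicate the statistic and standardize; for large K the replicates
# should be approximately N(0, 1), so about 95% fall within +/- 1.96.
reps = np.array([neighborhood_mean(K=500) for _ in range(2000)])
z = (reps - reps.mean()) / reps.std()
print(np.mean(np.abs(z) < 1.96))
```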
References
Billingsley, P. (1995). Probability and measure (3rd ed., authorized reprint). New Delhi: Wiley India.
Fellows, I., & Handcock, M. S. (2012). Exponential-family random network models. arXiv preprint arXiv:1208.0121.
Schweinberger, M., & Handcock, M. S. (2015). Local dependence in random graph models: Characterization, properties and statistical inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(3), 647–676.
Shalizi, C. R., & Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. The Annals of Statistics, 41(2), 508–535. http://doi.org/10.1214/12-AOS1044