{"title": "Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG", "book": "Advances in Neural Information Processing Systems", "page_first": 1777, "page_last": 1784, "abstract": "The synchronous brain activity measured via MEG (or EEG) can be interpreted as arising from a collection (possibly large) of current dipoles or sources located throughout the cortex. Estimating the number, location, and orientation of these sources remains a challenging task, one that is significantly compounded by the effects of source correlations and the presence of interference from spontaneous brain activity, sensor noise, and other artifacts. This paper derives an empirical Bayesian method for addressing each of these issues in a principled fashion. The resulting algorithm guarantees descent of a cost function uniquely designed to handle unknown orientations and arbitrary correlations. Robust interference suppression is also easily incorporated. In a restricted setting, the proposed method is shown to have theoretically zero bias estimating both the location and orientation of multi-component dipoles even in the presence of correlations, unlike a variety of existing Bayesian localization methods or common signal processing techniques such as beamforming and sLORETA. Empirical results on both simulated and real data sets verify the efficacy of this approach.", "full_text": "Estimating the Location and Orientation of Complex,\n\nCorrelated Neural Activity using MEG\n\nD.P. Wipf, J.P. Owen, H.T. Attias, K. Sekihara, and S.S. Nagarajan\n\nBiomagnetic Imaging Laboratory\n\nUniversity of California, San Francisco\n\nAbstract\n\nThe synchronous brain activity measured via MEG (or EEG) can be interpreted\nas arising from a collection (possibly large) of current dipoles or sources located\nthroughout the cortex. Estimating the number, location, and orientation of these\nsources remains a challenging task, one that is signi(cid:2)cantly compounded by the\neffects of source correlations and the presence of interference from spontaneous\nbrain activity, sensor noise, and other artifacts. This paper derives an empirical\nBayesian method for addressing each of these issues in a principled fashion. The\nresulting algorithm guarantees descent of a cost function uniquely designed to\nhandle unknown orientations and arbitrary correlations. Robust interference sup-\npression is also easily incorporated. In a restricted setting, the proposed method\nis shown to have theoretically zero bias estimating both the location and orien-\ntation of multi-component dipoles even in the presence of correlations, unlike a\nvariety of existing Bayesian localization methods or common signal processing\ntechniques such as beamforming and sLORETA. Empirical results on both simu-\nlated and real data sets verify the ef(cid:2)cacy of this approach.\n\n1 Introduction\nMagnetoencephalography (MEG) and related electroencephalography (EEG) use an array of sen-\nsors to take electromagnetic (cid:2)eld (or voltage potential) measurements from on or near the scalp\nsurface with excellent temporal resolution. In both cases, the observed (cid:2)eld is generated by the\nsame synchronous, compact current sources located within the brain. Although useful for research\nand clinical purposes, accurately determining the spatial distribution of these unknown sources is\nan open problem. The relevant estimation problem can be posed as follows: The measured electro-\nmagnetic signal is B 2 Rdb(cid:2)dt, where db equals the number of sensors and dt is the number of time\npoints at which measurements are made. Each unknown source Si 2 Rdc(cid:2)dt is a dc-dimensional\nneural current dipole , at dt timepoints, projecting from the i-th (discretized) voxel or candidate lo-\ncation distributed throughout the cortex. These candidate locations can be obtained by segmenting a\nstructural MR scan of a human subject and tesselating the gray matter surface with a set of vertices.\nB and each Si are related by the likelihood model\n\nB =\n\nLiSi + E;\n\n(1)\n\ndsXi=1\n\nwhere ds is the number of voxels under consideration, Li 2 Rdb(cid:2)dc is the so-called lead-(cid:2)eld\nmatrix for the i-th voxel. The k-th column of Li represents the signal vector that would be observed\nat the scalp given a unit current source/dipole at the i-th vertex with a (cid:2)xed orientation in the k-th\ndirection. It is common to assume dc = 2 (for MEG) or dc = 3 (for EEG), which allows (cid:3)exible\nsource orientations to be estimated in 2D or 3D space. Multiple methods based on the physical\nproperties of the brain and Maxwell\u2019s equations are available for the computation of each Li [7].\nFinally, E is a noise-plus-interference term where we assume, for simplicity, that columns are drawn\nindependently from N (0; (cid:6)(cid:15)). However, temporal correlations can easily be incorporated if desired\nusing a simple transformation outlined in [3].\n\n\fTo obtain reasonable spatial resolution, the number of candidate source locations will necessarily\nbe much larger than the number of sensors (ds (cid:29) db). The salient inverse problem then becomes\nthe ill-posed estimation of regions with signi(cid:2)cant brain activity, which are re(cid:3)ected by voxels i\nsuch that kSik > 0; we refer to these as active dipoles or sources. Because the inverse model is\nseverely underdetermined (the mapping from source activity con(cid:2)guration S , [S1; : : : ; Sds ]T to\nsensor measurement B is many to one), all efforts at source reconstruction are heavily dependent\non prior assumptions, which in a Bayesian framework are embedded in the distribution p(S). Such\na prior is often considered to be (cid:2)xed and known, as in the case of minimum current estimation\n(MCE) [10], minimum variance adaptive beamforming (MVAB) [9], and sLORETA [5]. Alterna-\ntively, a number of empirical Bayesian approaches have been proposed that attempt a form of model\nselection by using the data, whether implicitly or explicitly, to guide the search for an appropriate\nprior. Examples include variational Bayesian methods and hierarchical covariance component mod-\nels [3, 6, 8, 12, 13]. While advantageous in many respects, all of these methods retain substantial\nweaknesses estimating complex, correlated source con(cid:2)gurations with unknown orientation in the\npresence of background interference (e.g., spontaneous brain activity, sensor noise, etc.).\nThere are two types of correlations that can potentially disrupt the source localization process. First,\nthere are correlations within dipole components (meaning the individual rows of Si are correlated),\nwhich always exists to a high degree in real data with unknown orientation (i.e., dc > 1). Secondly,\nthere are correlations between different dipoles that are simultaneously active (meaning rows of Si\nare correlated with rows of Sj for some voxels i 6= j). These correlations are more application spe-\nci(cid:2)c and may or may not exist. The larger the number of active sources, the greater the chance that\nboth types or correlation can disrupt the estimation process. This issue can be problematic for two\nreasons. First, failure to accurately account for unknown orientations or correlations can severely\ndisrupt the localization process, leading to a very misleading impression of which brain areas are\nactive. Secondly, the orientations and correlations themselves may have clinical signi(cid:2)cance.\nIn this paper, we present an alternative empirical Bayesian scheme that attempts to improve upon\nexisting methods in terms of source reconstruction accuracy and/or computational robustness and\nef(cid:2)ciency. Section 2 presents the basic generative model which underlies the proposed method and\ndescribes the associated inference problem. Section 3 derives a robust algorithm for estimating the\nsources using this model and proves that each iteration is guaranteed to reduce the associated cost\nfunction. It also describes how interference suppression can be naturally incorporated. Section 4\nthen provides a theoretical analysis of the bias involved in estimating both the location and orien-\ntation of active sources, demonstrating that the proposed method has substantial advantages over\nexisting approaches. Finally, Section 5 contains experimental results using our algorithm on both\nsimulated and real data, followed by a brief discussion in Section 6.\n\n2 Modeling Assumptions\n\nTo begin we invoke the noise model from (1), which fully de(cid:2)nes the assumed likelihood\n\np(BjS) / exp0@(cid:0)\n\n1\n\n2(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)\n\nB (cid:0)\n\ndsXi=1\n\n2\n\n(cid:6)(cid:0)1\n\n(cid:15)\n\n1A ;\n\nLiSi(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)\n\n(2)\n\nwhere kXkW denotes the weighted matrix normptrace[X T W X]. The unknown noise covariance\n\n(cid:6)(cid:15) will be estimated from the data using a variational Bayesian factor analysis (VBFA) model as\ndiscussed in Section 3.2 below; for now we will consider that it is (cid:2)xed and known. Next we adopt\nthe following source prior for S:\n\np (Sj(cid:0)) / exp (cid:0)\n\n1\n2\n\ntrace\" dsXi=1\n\ni Si#! :\n\ni (cid:0)(cid:0)1\nST\n\n(3)\n\nThis is equivalent to applying independently, at each time point, a zero-mean Gaussian distribution\nwith covariance (cid:0)i to each source Si. We de(cid:2)ne (cid:0) to be the dsdc (cid:2) dsdc block-diagonal matrix\nformed by ordering each (cid:0)i along the diagonal of an otherwise zero-valued matrix. This implies,\n\nequivalently, that p (Sj(cid:0)) / exp(cid:0)(cid:0) 1\n\n2trace(cid:2)ST (cid:0)(cid:0)1S(cid:3)(cid:1).\n\n\fIf (cid:0) were somehow known, then the conditional distribution p(SjB; (cid:0)) / p(BjS)p(Sj(cid:0)) is a fully\nspeci(cid:2)ed Gaussian distribution with mean and covariance given by\n\nEp(SjB;(cid:0)) [S] = (cid:0)LT(cid:0)(cid:6)(cid:15) + L(cid:0)LT(cid:1)(cid:0)1\nCovp(sj jB;(cid:0)) [sj] = (cid:0) (cid:0) (cid:0)LT(cid:0)(cid:6)(cid:15) + L(cid:0)LT(cid:1)(cid:0)1\n\n(4)\n(5)\nwhere sj denotes the j-th column of S and individual columns are uncorrelated. However, since (cid:0)\nis actually not known, a suitable approximation ^(cid:0) (cid:25) (cid:0) must (cid:2)rst be found. One principled way to\naccomplish this is to integrate out the sources S and then maximize\n\nL(cid:0); 8j;\n\nB\n\np(Bj(cid:0)) =Z p(BjS)p(Sj(cid:0))dS / exp(cid:18)(cid:0)\n\n1\n2\n\nThis is equivalent to minimizing the cost function\n\nBT (cid:6)(cid:0)1\n\n(6)\n\nb B(cid:19) ; (cid:6)b , (cid:6)(cid:15) + L(cid:0)LT :\nb (cid:3) + log j(cid:6)b; j ;\n\nL((cid:0)) , (cid:0)2 log p(Bj(cid:0))p((cid:0)) (cid:17) trace(cid:2)Cb(cid:6)(cid:0)1\n\n(7)\nwhere Cb , n(cid:0)1BBT is the empirical covariance, and is sometimes referred to as type-II maximum\nlikelihood, evidence maximization, or empirical Bayes [1].\nThe (cid:2)rst term of (7) is a measure of the dissimilarity between the empirical data covariance Cb and\nthe model data covariance (cid:6)b; in general, this factor encourages (cid:0) to be large. The second term pro-\nvides a regularizing or sparsifying effect, penalizing a measure of the volume formed by the model\ncovariance (cid:6)b.1 Since the volume of any high dimensional space is more effectively reduced by\ncollapsing individual dimensions as close to zero as possible (as opposed to incrementally reducing\nall dimensions isometrically), this penalty term promotes a model covariance that is maximally de-\ngenerate (or non-spherical), which pushes elements of (cid:0) to exactly zero. This intuition is supported\ntheoretically by the results in Section 4.\nGiven some type-II ML estimate ^(cid:0), we obtain the attendant empirical prior p(Sj^(cid:0)). To the extent\nthat this \u2018learned\u2019 prior is realistic, the resulting posterior p(SjB; ^(cid:0)) quanti(cid:2)es regions of signi(cid:2)cant\ncurrent density and point estimates for the unknown source dipoles Si can be obtained by evaluating\nthe posterior mean computed using (4). If a given ^(cid:0)i ! 0 as described above, then the associated\n^Si computed using (4) also becomes zero. It is this pruning mechanism that naturally chooses the\nnumber of active dipoles.\n\n3 Algorithm Derivation\nGiven (cid:6)(cid:15) and (cid:0), computing the posterior on S is trivial. Consequently, determining these unknown\nquantities is the primary estimation task. We will (cid:2)rst derive an algorithm for computing (cid:0) assuming\n(cid:6)(cid:15) is known. Later in Section 3.2, we will describe a powerful procedure for learning (cid:6)(cid:15).\n\n3.1 Learning the Hyperparameters (cid:0)\nThe primary objective of this section is to minimize (7) with respect to (cid:0). Of course one option is\nto treat the problem as a general nonlinear optimization task and perform gradient descent or some\nother generic procedure. Related methods in the MEG literature rely, either directly or indirectly, on\na form of the EM algorithm [3, 8]. However, these algorithms are exceedingly slow when ds is large\nand they have not been extended to handle (cid:3)exible orientations. Consequently, here we derive an\nalternative optimization procedures that expands upon ideas from [8, 12], handles arbitrary/unknown\ndipole orientations, and converges quickly.\nTo begin, we note that L((cid:0)) only depends on the data B through the db(cid:2)db sample correlation matrix\n\nlarge, without altering that actual cost function. It also implies that, for purposes of computing (cid:0),\nthe number of columns of S is reduced to match rank(B). We now re-express the cost function\nL((cid:0)) in an alternative form leading to convenient update rules and, by construction, a proof that\n\nCb. Therefore, to reduce the computational burden, we replace B with a matrix eB 2 Rdb(cid:2)rank(B)\nsuch that eBeBT = Cb. This removes any per-iteration dependency on dt, which can potentially be\nL(cid:0)(cid:0)(k+1)(cid:1) (cid:20) L(cid:0)(cid:0)(k)(cid:1) at each iteration.\n\n1The determinant of a matrix is equal to the product of its eigenvalues, a well-known volumetric measure.\n\n\fFirst, the data (cid:2)t term can be expressed as\n\n(9)\n\ntrace(cid:2)Cb(cid:6)(cid:0)1\n\nb (cid:3) = min\nds(cid:3)T is a matrix of auxiliary variables. Likewise, because the log-\n\ndeterminant term of L((cid:0)) is concave in (cid:0), it can be expressed as a minimum over upper-bounding\nhyperplanes via\n\nwhere X , (cid:2)X T\n\n1 ; : : : ; X T\n\nkXik2\n\n(8)\n\n(cid:0)(cid:0)1\n\n(cid:15)\n\nX 24(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)eB (cid:0)\nZ \" dsXi=1\n\n2\n\n+\n\n(cid:6)(cid:0)1\n\ndsXi=1\n\ndsXi=1\n\nLiXi(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)\ni (cid:0)i(cid:1) (cid:0) h(cid:3)(Z)# ;\ntrace(cid:0)Z T\n\ni 35 ;\n\nlog j(cid:6)bj = min\n\nwhere Z , (cid:2)Z T\n\n1 ; : : : ; Z T\n\nL((cid:0); X; Z) =(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)eB (cid:0)\n\nbelow, we will never actually have to compute h(cid:3)(Z). Dropping the minimizations and combining\nterms from (8) and (9) leads to the modi(cid:2)ed cost function\n\nds(cid:3)T and h(cid:3)(Z) is the concave conjugate of log j(cid:6)bj. For our purposes\ndsXi=1eLiXi(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)\n\ni (cid:0)i(cid:1)i (cid:0) h(cid:3)(Z);\n\n+ trace(cid:0)Z T\n\ndsXi=1hkXik2\n\n(10)\n\n(cid:6)(cid:0)1\n\n(cid:0)(cid:0)1\n\n+\n\n2\n\n(cid:15)\n\ni\n\nwhere by construction L((cid:0)) = minX minZ L((cid:0); X; Z).\nIt is straightforward to show that if\nf^(cid:0); ^X; ^Zg is a local (global) minimum to L((cid:0); X; Z), then ^(cid:0) is a local (global) minimum to L((cid:0)).\nSince direct optimization of L((cid:0)) may be dif(cid:2)cult, we can instead iteratively optimize L((cid:0); X; Z)\nvia coordinate descent over (cid:0), X, and Z. In each case, when two are held (cid:2)xed, the third can be\nglobally minimized in closed form. This ensures that each cycle will reduce L((cid:0); X; Z), but more\nimportantly, will reduce L((cid:0)) (or leave it unchanged if a (cid:2)xed-point or limit cycle is reached). The\nassociated update rules from this process are as follows.\nThe optimal X (with (cid:0) and Z (cid:2)xed) is just the standard weighted minimum-norm solution given by\n(11)\n\nXnew\n\ni ! (cid:0)iLT\n\ni (cid:6)(cid:0)1\n\nfor each i. The minimizing Z equals the slope at the current (cid:0) of log j(cid:6)bj. As such, we have\n\nb eB\n\nZnew\n\ni ! O(cid:0)i log j(cid:6)bj = LT\n\ni (cid:6)(cid:0)1\n\nb Li:\n\nWith Z and X (cid:2)xed, computing the minimizing (cid:0) is a bit more dif(cid:2)cult because of the constraint\n(cid:0)i 2 H + for all i, where H + is the set of positive-semide(cid:2)nite, symmetric dc (cid:2) dc covariance\nmatrices. To obtain each (cid:0)i, we must solve\n\nAn unconstrained solution will satisfy\n\n(cid:0)new\ni ! arg min\n\n(cid:0)i2H +hkXik2\n\n(cid:0)(cid:0)1\n\ni\n\n+ trace(cid:0)Z T\n\ni (cid:0)i(cid:1)i\n\n(12)\n\n(13)\n\n(14)\nwhich, after computing the necessary derivatives and re-arranging terms gives the equivalent condi-\ntion\n\nO(cid:0)iL((cid:0)i; Xi; Zi) = 0;\n\n(15)\nThere are multiple (unconstrained) solutions to this equation; we will choose the unique one that\nsatis(cid:2)es the constraint (cid:0)i 2 H +. This can be found using\n\ni = (cid:0)iZi(cid:0)i:\n\nXiX T\n\nXiX T\n\ni = Z (cid:0)1=2\n\ni\n\ni XiX T\n\ni Z 1=2\n\n(cid:16)Z 1=2\n(cid:16)Z 1=2\n(cid:16)Z 1=2\n\n= Z (cid:0)1=2\n\ni\n\n= (cid:20)Z (cid:0)1=2\n\ni\n\ni XiX T\n\ni Z 1=2\n\ni XiX T\n\ni XiX T\n\ni Z 1=2\n\nZ (cid:0)1=2\n\ni\n\ni (cid:17) Z (cid:0)1=2\ni (cid:17)1=2(cid:16)Z 1=2\ni (cid:17)1=2\n(cid:16)Z 1=2\n\ni\n\nZ (cid:0)1=2\n\ni\n\ni\n\ni Z 1=2\n\ni (cid:17)1=2\n(cid:21) Zi(cid:20)Z (cid:0)1=2\n(cid:16)Z 1=2\ni (cid:17)1=2\n\ni Z 1=2\n\nZ (cid:0)1=2\n\ni\n\n;\n\n(cid:0)new\ni ! Z (cid:0)1=2\n\ni\n\ni XiX T\n\nThis indicates the solution (or update equation)\n\ni XiX T\n\ni Z 1=2\n\ni (cid:17)1=2\n\n(16)\n\nZ (cid:0)1=2\n\ni\n\n(cid:21) :\n\n(17)\n\n\fwhich is satis(cid:2)es the constraint. And since we are minimizing a convex function of (cid:0)i (over the\nconstraint set), we know that this is indeed a minimizing solution.\nIn summary then, to estimate (cid:0), we need simply iterate (11), (12), and (17), and with each pass we\nare guaranteed to reduce (or leave unchanged) L((cid:0)). The per-iteration cost is linear in the number\nof voxels ds so the computational cost is relatively modest (it is quadratic in db, and cubic in dc,\nbut these quantities are relatively small). The convergence rate is orders of magnitude faster than\nEM-based algorithms such as those in [3, 8] (see Figure 1 (right) ).\n\n3.2 Learning the Interference (cid:6)(cid:15)\nThe learning procedure described in the previous section boils down to (cid:2)tting a structured maximum\nlikelihood covariance estimate (cid:6)b = (cid:6)(cid:15) + F (cid:0)F T to the data covariance Cb. The idea here is that\nF (cid:0)F T will re(cid:3)ect the brain signals of interest while (cid:6)(cid:15) will capture all interfering factors, e.g.,\nspontaneous brain activity, sensor noise, muscle artifacts, etc. Since (cid:6)(cid:15) is unknown, it must some-\nhow be estimated or otherwise accounted for. Given access to pre-stimulus data (i.e., data assumed\nto have no signal/sources of interest), stimulus evoked factor analysis (SEFA) provides a powerful\nmeans of decomposing a data covariance matrix Cb into signal and interference components. While\ndetails can be found in [4], SEFA computes the approximation\nCb (cid:25) (cid:3) + EET + AAT ;\n\n(18)\nwhere E represents a matrix of learned interference factors, (cid:3) is a diagonal noise matrix, and A is a\nmatrix of signal factors. There are two ways to utilize this decomposition (more details can be found\nin [11]). First, we can simply set (cid:6)(cid:15) ! (cid:3) + EET and proceed as in Section 3.1. Alternatively,\nwe can set (cid:6)(cid:15) ! 0 and then substitute AAT for Cb, i.e., run the same algorithm on a de-noised\nsignal covariance. For technical reasons beyond the scope of this paper, it appears that algorithm\nperformance may be superior when the latter paradigm is adopted.\n\n4 Analysis of Theoretical Localization/Orientation Bias\nTheoretical support for the proposed algorithm is possible in the context of estimation bias assuming\nsimpli(cid:2)ed source con(cid:2)gurations. For example, substantial import has been devoted to quantifying\nlocalization bias when estimating a single dipolar source. Recently it has been shown, both empiri-\ncally and theoretically [5, 9], that the MVAB and sLORETA algorithms have zero location bias under\nthis condition at high SNR. This has been extended to include certain empirical Bayesian methods\n[8, 12]. However, these results assume a single dipole with (cid:2)xed, known orientation (or alternatively,\nthat dc = 1), and therefore do not formally handle source correlations or multi-component dipoles.\nThe methods from [6, 13] also purport to address these issues, but no formal analyses are presented.\nIn contrast, despite being a complex, non-convex function, we now demonstrate that L((cid:0)) has very\nattractive bias properties regarding both localization and orientation. We will assume that the full\n\nds(cid:3)T represents a suf(cid:2)ciently high sampling of the source space such that\n\nany active dipole component aligns with some lead-(cid:2)eld columns. Unbiasedness can also be shown\nin the continuous case, but the discrete scenario is more straightforward and of course more relevant\nto any practical task.\nSome preliminary de(cid:2)nitions are required to proceed. We de(cid:2)ne the empirical intra-dipole corre-\nlation matrix at the i-th voxel as Cii , 1\ni Si; non-zero off-diagonal elements imply that corre-\nST\ndt\nlations are present. Except in highly contrived situations, this type of correlation will always exist.\nThe empirical inter-dipole correlation matrix between voxels i and j is Cij , 1\ni Sj; any non-\nST\ndt\nzero element implies the existence of a correlation. In practice, this form of correlation may or may\nnot be present. With regard to the lead-(cid:2)eld L, spark is de(cid:2)ned as the smallest number of linearly\ndependent columns [2]. By de(cid:2)nition then, 2 (cid:20) spark(L) (cid:20) db + 1. Finally, da denotes the number\nof active sources, i.e., the number of voxels whereby kSik > 0.\n\nlead-(cid:2)eld L ,(cid:2)LT\n\n1 ; : : : ; LT\n\nTheorem 1. In the limit as (cid:6)(cid:15) ! 0 (high SNR) and assuming dadc < spark(L) (cid:0) 1, the cost\nfunction L((cid:0)) maintains the following two properties:\n\n1. For arbitrary Cii and Cij, the unique global minimum (cid:0)(cid:3) produces a source estimate S(cid:3) =\nEp(SjB;(cid:0)(cid:3)) [S] computed using (4) that equals the generating source matrix S, i.e., it is\n\n\funbiased in both location and orientation for all active dipoles and correctly zeros out the\ninactive ones.\n\n2. If Cij = 0 for all active dipoles (although Cii is still arbitrary), then there are no local\n\nminima, i.e., the cost function is unimodal.\n\nThe proof has been deferred to [11]. In words, this theorem says that intra-dipole correlations do\nnot disrupt the estimation process by creating local minima, and that the global minimum is always\nunbiased. In contrast, inter-dipole correlations can potentially create local minima, but they do not\naffect the global minimum. Empirically, we will demonstrate that the algorithm derived in Section\n3 is effective at avoiding these local minima (see Section 5). With added assumptions these results\ncan be extended somewhat to handle the inclusion of noise.\nThe cost functions from [8, 12] bear the closest resemblance to L((cid:0)); however, neither possesses\nthe second attribute from Theorem 1. This is a very signi(cid:2)cant failing because, as mentioned previ-\nously, intra-dipole correlations are always present in each active dipole. Consequently, localization\nand orientation bias can occur because of convergence to a local minimum. The iterative Bayesian\nscheme from [13], while very different in structure, also directly attempts to estimate (cid:3)exible ori-\nentations and handle, to some extent, source correlations. While details are omitted for brevity, we\ncan prove that the full model upon which this algorithm is based fails to satisfy the (cid:2)rst property\nof the theorem, so the corresponding global minimum can be biased.\nIn contrast, beamformers\nand sLORETA are basically linear methods with no issue of global or local minima. However, the\npopular sLORETA and MVAB solutions will in general display a bias for multi-component dipoles\n(dc > 1) or when multiple dipoles (da > 1) are present, regardless of correlations.\n\n5 Empirical Evaluation\nIn this section we test the performance of our algorithm on both simulated and real data sets. We\nfocus here on localization accuracy assuming strong source correlations and unknown orientations.\nWhile orientation estimates themselves are not shown for space considerations, accurate localization\nimplicitly indicates that this confound has been adequately handled. More comprehensive experi-\nments, including comparisons with additional algorithms, are forthcoming [11].\nSimulated Data: We (cid:2)rst conducted tests using simulated data with realistic source con(cid:2)gurations.\nThe brain volume was segmented into 5mm voxels and a two orientation (dc = 2) forward lead(cid:2)eld\nwas calculated using a spherical-shell model [7]. The data time course was partitioned into pre- and\npost-stimulus periods. In the pre-stimulus period (263 samples) there is only noise and interfering\nbrain activity, while in the post-stimulus period (437 samples) there is the same (statistically) noise\nand interference factors plus source activity of interest. We used two noise conditions - Gaussian-\nnoise and real-brain noise. In the former case, we seeded voxels with Gaussian noise in each orien-\ntation and then projected the activity to the sensors using the lead(cid:2)eld, producing colored Gaussian\nnoise at the sensors. To this activity, we added additional Gaussian sensor noise. For the real-brain\nnoise case, we used resting-state data collected from a human subject that is presumed to have on-\ngoing and spontaneous activity and sensor noise. In both the Gaussian and real-brain noise cases,\nthe pre-stimulus activity was on-going and continued into the post-stimulus period, where the simu-\nlated source signals were added. Sources were seeded at locations in the brain as damped-sinusoids\nand this voxel activity was projected to the sensors. We could adjust both the signal-to-noise-plus-\ninterefence ratio (SNIR) and the correlations between the different voxel time-courses to examine\nthe algorithm performance on correlated sources and unknown dipole orientations.\nWe ran 100 simulations of three randomly seeded sources at different SNIR levels (-5, 0, 5, 10dB).\nThe sources in these simulations always had an inter-dipole correlation coef(cid:2)cient of 0.5; intra-\ndipole correlations were present as well. We ran the simulation with both Gaussian-noise and real\nbrain noise using a MVAB and our proposed method. In order to evaluate performance, we used the\nfollowing test for a hit or miss. We drew spheres around each seeded source location and obtained the\nmaximum voxel value in each sphere. Then we calculated the maximum voxel activation outside the\nthree spheres. If the maximum inside each sphere was greater than the maximum outside all of the\nspheres, it was counted as a hit (in this way, we are implicitly accounting somewhat for false alarms).\nEach simulation could get a score or 0, 1 ,2 , or 3, with 3 being the best. Figure 1 (left) displays\ncomparative results averaged over 100 trials with standard errors. Our method quite signi(cid:2)cantly\noutperforms the MVAB, which is designed to handle unknown orientations but has dif(cid:2)culty with\n\n\fsource correlations. Figure 1 (middle) shows a sample reconstruction on a much more complex\nsource con(cid:2)guration composed of 10 dipolar sources. Finally, Figure 1 (right) gives an example\nof the relative convergence improvement afforded by our method relative to an EM implementation\nanalogous to [3, 8]. We also wanted to test the performance on perfectly correlated sources with\nunknown orientations and compare it to other state-of-the-art Bayesian methods. An example using\nthree such sources and 5 dB SNIR is given in Figure 2.\n\ns\nn\no\n\ni\nt\n\na\nz\n\ni\nl\n\na\nc\no\n\nl\n \nl\n\nu\n\nf\ns\ns\ne\nc\nc\nu\ns\n\n3\n\n2.5\n\n2\n\n1.5\n\n1\n\n0.5\n\n0\n\n)\n\nm\nm\n\n(\n \nz\n\n50\n\n40\n\n30\n\n20\n\n10\n\n0\n\n\u221210\n\n\u221220\n\nGaussian Noise\nProposed Method\n\nReal Brain Noise\nProposed Method\n\nGaussain Noise\nMVAB\n\nReal Brain Noise\nMVAB\n\n\u22126\n\n\u22124\n\n\u22122\n\n0\n\n2\n\n4\n\n6\n\n8\n\n10\n\n12\n\n\u221260\n\n\u221240\n\n\u221220\n\nSNIR\n\n0\n\nx (mm)\n\n20\n\n40\n\nl\n\ne\nu\na\nv\n \n\nn\no\n\ni\nt\nc\nn\nu\n\nf\n \nt\ns\no\nc\n\n\u2212160\n\n\u2212180\n\n\u2212200\n\n\u2212220\n\n\u2212240\n\n\u2212260\n\n\u2212280\n0\n\nEM algorithm\nproposed method\n\n10\n\n20\n\n30\n\n40\n\n50\n\n60\n\n70\n\n80\n\n90\n\n100\n\niteration number\n\nFigure 1: Left: Aggregate localization results for MVAB and the proposed method recovering three\ncorrelated sources with unknown orientations. Middle: Example reconstruction of 10 relatively\nshallow sources (green circles) using proposed method (MVAB performs poorly on this task). Right:\nConvergence rate of proposed method relative to a conventional EM implementation based on [3, 8].\n\n)\n\nm\nm\n\n(\n \nz\n\n50\n\n40\n\n30\n\n20\n\n10\n\n0\n\n\u221210\n\n\u221220\n\n)\n\nm\nm\n\n(\n \nz\n\n50\n\n40\n\n30\n\n20\n\n10\n\n0\n\n\u221210\n\n\u221220\n\n)\n\nm\nm\n\n(\n \nz\n\n50\n\n40\n\n30\n\n20\n\n10\n\n0\n\n\u221210\n\n\u221220\n\n\u221260\n\n\u221240\n\n\u221220\n\n0\n\nx (mm)\n\n20\n\n40\n\n\u221260\n\n\u221240\n\n\u221220\n\n0\n\nx (mm)\n\n20\n\n40\n\n\u221260\n\n\u221240\n\n\u221220\n\n0\n\nx (mm)\n\n20\n\n40\n\nFigure 2: Reconstructions of three perfectly correlated dipoles (green circles) with unknown ori-\nentations using, Left: MVAB, Middle: variational Bayesian method from [13], Right: proposed\nmethod.\n\nReal Data: Two stimulus-evoked data sets were collected from normal, healthy research subjects\non a 275-channel CTF System MEG device. The (cid:2)rst data set was a sensory evoked (cid:2)eld (SEF)\nparadigm, where the subject\u2019s right index (cid:2)nger was tapped for a total of 256 trials. A peak is typ-\nically seen 50ms after stimulation in the contralateral (in this case, the left) somatosensory cortical\narea for the hand, i.e., dorsal region of the postcentral gyrus. The proposed algorithm was able to\nlocalize this activation to the correct area of somatosensory cortex as seen in Figure 3 (left) and the\nestimated time course shows the typical 50ms peak (data not shown). The second data set analyzed\nwas an auditory evoked (cid:2)eld (AEF) paradigm. In this paradigm the subject is presented tones binau-\nrally for a total of 120 trials. There are two typical peaks seen after the presentation of an auditory\nstimulus, one at 50ms and one at 100ms, called the M50 and M100 respectively. The auditory pro-\ncessing of tones is bilateral at early auditory areas and the activations are correlated. The algorithm\nwas able to localize activity in both primary auditory cortices and the time courses for these two\nactivations reveal the M50 and M100. Figure 3 (middle) and (right) displays these results. The\nanalysis of simple auditory paradigms is problematic because many source localization algorithms,\nsuch as the MVAB, do not handle the bilateral correlated sources well. We also ran MVAB on the\nAEF data and it localized activity to the center of the head between the two auditory cortices (data\nnot shown).\n6 Discussion\n\nThis paper derives a novel empirical Bayesian algorithm for MEG source reconstruction that readily\nhandles multiple correlated sources with unknown orientations, a situation that commonly arises\neven with simple imaging tasks. Based on a principled cost function and fast, convergent update\n\n\f)\n0\n0\n0\n1\n-\n0\n(\n \ny\nt\ni\ns\nn\ne\nt\nn\n\nI\n \n\nd\ne\nz\ni\nl\na\nm\nr\no\nN\n\n800\n\n700\n\n600\n\n500\n\n400\n\n300\n\n200\n\n100\n\n0\n\n50\n\nTime (ms)\n\n100\n\n150\n\nFigure 3: Real-world example Left: Somatosensory reconstruction. Middle: Bilateral auditory re-\nconstruction. Right: Recovered timecourse from left auditory cortex (right auditory cortex, not\nshown, is similar).\nrules, this procedure displays signi(cid:2)cant theoretical and empirical advantages over many existing\nmethods. We have restricted most of our exposition and analyses to MEG; however, preliminary\nwork with EEG is also promising. For example, on a real-world passive visual task where subjects\nviewed (cid:3)ashing foreground/background textured images, our method correctly localizes activity to\nthe lateral occipital cortex while two state-of-the-art beamformers fail. This remains an active area\nof research.\nReferences\n[1] J.O. Berger, Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York,\n\n2nd edition, 1985.\n\n[2] D.L. Donoho and M. Elad, (cid:147)Optimally sparse representation in general (nonorthogonal) dictio-\nnaries via \u20181 minimization,(cid:148) Proc. National Academy of Sciences, vol. 100, no. 5, pp. 2197(cid:150)\n2202, March 2003.\n\n[3] K. Friston, L. Harrison, J. Daunizeau, S. Kiebel, C. Phillips, N. Trujillo-Barreto, R. Henson,\nG. Flandin, and J. Mattout, (cid:147)Multiple sparse priors for the MEG/EEG inverse problem,(cid:148) Neu-\nroImage, 2008 (in press).\n\n[4] S.S. Nagarajan, H.T. Attias, K.E. Hild K.E., K. Sekihara, (cid:147)A probabilistic algorithm for robust\ninterference suppression in bioelectromagnetic sensor data,(cid:148) Stat Med. vol. 26, no. 21, pp.\n3886(cid:150)910 Sept. 2007.\n\n[5] R.D. Pascual-Marqui, (cid:147)Standardized low resolution brain electromagnetic tomography (sloreta):\nTechnical details,(cid:148) Methods and Findings in Experimental and Clinical Pharmacology, vol. 24,\nno. Suppl D, pp. 5(cid:150)12, 2002.\n\n[6] M. Sahani and S.S. Nagarajan, (cid:147)Reconstructing MEG sources with unknown correlations,(cid:148)\n\nAdvances in Neural Information Processing Systems 16, 2004.\n\n[7] J. Sarvas, (cid:147)Basic methematical and electromagnetic concepts of the biomagnetic inverse prob-\n\nlem,(cid:148) Phys. Med. Biol., vol. 32, pp. 11(cid:150)22, 1987.\n\n[8] M. Sato, T. Yoshioka, S. Kajihara, K. Toyama, N. Goda, K. Doya, and M. Kawato, (cid:147)Hierarchical\n\nBayesian estimation for MEG inverse problem,(cid:148) NeuroImage, vol. 23, pp. 806(cid:150)826, 2004.\n\n[9] K. Sekihara, M. Sahani, and S.S. Nagarajan, (cid:147)Localization bias and spatial resolution of adaptive\nand non-adaptive spatial (cid:2)lters for MEG source reconstruction,(cid:148) NeuroImage, vol. 25, pp. 1056(cid:150)\n1067, 2005.\n\n[10] K. Uutela, M. Hamalainen, and E. Somersalo, (cid:147)Visualization of magnetoencephalographic\n\ndata using minimum current estimates,(cid:148) NeuroImage, vol. 10, pp. 173(cid:150)180, 1999.\n\n[11] D.P. Wipf, J.P. Owen, H.T. Attias, K. Sekihara, and S.S. Nagarajan (cid:147)Robust Bayesian Estima-\ntion of the Location, Orientation, and Timecourse of Mutliple Correlated Neural Sources using\nMEG,(cid:148) submitted, 2009.\n\n[12] D.P. Wipf, R.R. Ram\u00b7(cid:17)rez, J.A. Palmer, S. Makeig, and B.D. Rao, (cid:147)Analysis of empirical\nBayesian methods for neuroelectromagnetic source localization,(cid:148) Advances in Neural Informa-\ntion Processing Systems 19, 2007.\n\n[13] J.M. Zumer, H.T. Attias, K. Sekihara, and S.S. Nagarajan, (cid:147)A probabilistic algorithm for\ninterference suppression and source reconstruction from MEG/EEG data,(cid:148) Advances in Neural\nInformation Processing System 19, 2007.\n\n\f", "award": [], "sourceid": 863, "authors": [{"given_name": "Julia", "family_name": "Owen", "institution": null}, {"given_name": "Hagai", "family_name": "Attias", "institution": null}, {"given_name": "Kensuke", "family_name": "Sekihara", "institution": null}, {"given_name": "Srikantan", "family_name": "Nagarajan", "institution": null}, {"given_name": "David", "family_name": "Wipf", "institution": null}]}