Towards Bayesian Quantification of Permeability in Micro-scale Porous Structures – The Database of Micro Networks

This article develops a Bayesian framework to quantify the absolute permeability of water in a porous structure from the geometry and clustering parameters of its underlying pore-throat network. These parameters include the network`s diameter, transivity, degree, centrality, assortativity, edge density, K-core decomposition, Kleinberg’s hub centrality scores, Kleinberg's authority centrality scores, length, and porosity. In addition, the incorporated clustering aspects of the networks have been determined with respect to several clustering criteria – edge betweenness, greedy optimization of modularity, multi-level optimization of modularity, and short random walks. As such, the article takes the first footsteps of creating a Database of Micro Networks for micro-scale porous structures, to be used as main input stream for the proposed Bayesian scheme.

In particular, the quantification of the permeability (PE) of a porous structure has gained attractionowing to its significance in the petroleum industry. In this sake, the stochastic modeling [20] and deep learning strategies have notably been used [21][22][23][24][25][26][27][28] with the latter being applied to the log data as well as the 2D/3D CT-scan images. For stochastic modeling, Eugene et al. (2005) [20] applied a combination of lattice Boltzmann method (LBM) and firstorder reliability method (FORM) to construct cumulative distribution functions for randomly-generated porous structures. While reporting on a speed improvement compared to the Monte Carlo simulation, their method bears the same drawbacks inherited with LBMbeing dependent on the choice of the collision operator as well as the relaxation time. For deep learning, the issue was initially treated by applying neural networks to estimate relative permeability curves on a field scale [23], and to further relate the quantity to the petro-physical log data [21]. On a macroscopic scale, Erofeev et al. (2019) [24] tested on several machine learning methods to estimate the change in permeability of porous rocks occurring during desalination process in laboratory, based on which they reported on mixed performance of the employed techniques over parameters considered. Furthermore, the deep learning strategy was applied to core data, on a microscopic scale, where processing of the micro-CT images has been on focus. For permeability prediction, Jinlong et al. (2018) [25] proposed a physics-informed convolutional neural network approach, which involves a series of fluid dynamics simulations in order to build on the training dataset required. On the other hand, geometrical features in binary segmented images were used by some researchers to estimate permeability, applying multilayer neural network (MNN) and convolutional neural network (CNN) methods [26]. Machine learning strategies were also used for the purpose of feature selection to identify the sets of micro structure quantities as well as multiscale complex network features that best characterize permeability [28], and for modeling permeability via a connectivity index attributed to the 3D images of the porous structure [27].
The computational theme used in the pore-scale characterization literature thus far, has followed either of the two major routesdirect or indirect. In the direct scenario, the pore-scale geometry of the structure is explicitly incorporated into the simulation; whereas in the indirect framework a conceptualized interconnected pore-throat network is considered which maintains the same topological features as the original image. In this regards, the former approach has took advantage of both mesh-free and mesh-dependent techniques, such as the lattice Boltzmann method, smoothed particle hydrodynamics (SPH), and finite-volume-based Computational Fluid Dynamics (CFD). In general, the latter approachpore network model (PNM) -puts less computational burden as the solution of partial differential equations for the model reduces to a set of analytical models for flow in each network element [19]. In spite of this simplicity, the PNM results have been successfully validated against the micro-model experiments across a wide range of pore-structure and fluid-flow parameters [19,29]. The fidelity of PNM results against directsimulation counterparts have also been evaluated [19].
As a method being capable in both showing the correlation and causation among different input and output variables, the Bayesian Network (BN) theory has been applied in the context of geosciences for the petro-physical logbased facies and fracture classification, understanding of relationships among geologic features and identification of simple rock facies [30][31][32]. In this sake, Bhattacharya and Mishra (2018) [30] reported on the first usage of Bayesian network for characterization of shale facies and fracture, under limited data availability of common logs. Harnessing the BN capability, the authors were able to reveal the complex interplay amongst multiple petrophysical sensors, based on which Gamma-ray, resistivity, and density logs were determined as being the most influential for the dataset considered. The unique feature of Bayesian Network in demonstrating the complex relationship among parameters, is distinctive to the other machine learning algorithms-posing the theory as being potentially useful in other subsurface applications.
Albeit rich in content, the trend of the current literature on pore-scale modeling is such to apply established simulation techniques or compare their performances. The present article develops upon an intuition of providing an alternative procedure for estimating micro-structural attributes via the Bayesian network theory. As such, the article completes on the first lines of a mega-dataset -the Database of Micro Networks (DMN)which comprises the main input stream for the newly proposed computational scheme. The article makes an added contribution to the existing literature, by conducting the first study to relate the absolute permeability in a porous matrix to the geometry and clustering parameters of its underlying network.

Database of Micro Networks
A cornerstone of the present article is establishment of a Database of Micro Networks, to be used as the main input stream for the Bayesian methods for micro-structural characterization purposes. The initial hypothesis behind this databank was to incorporate different features related to the geometry, and clustering patterns of the networks extracted from micro-scale porous structures.
The DMN entries initially account for several measures on the geometry and clustering parameters of the porethroat network. As for the network geometry, the parameters considered have been the diameter (DI), transivity (TR), degree (DE), centrality (CE), assortativity (AO), edge density (ED), K-core decomposition (KC), Kleinberg's hub centrality scores (HS), and Kleinberg's authority centrality scores (AS) [33]. Table 1 lists a brief description of these parameters. Additional parameters of image length (LE) and porosity (PO) have also been embodied into the DMN entries. As for the clustering; the number of (pore) clusters found in the pore-throat network has been recorded into the DMN. In this regards, several clustering criteria were analyzededge betweenness [55] (NG), greedy optimization of modularity [34] (GO), multi-level optimization of modularity [35] (LV), short random walks [36] (RW). The clusters may signify an important feature in the process of transport within the network. Imaginably, they can be indicative of areas within the network with similar flow regimesareas with similar fluid velocity within laminar domain; nevertheless the issue should be further investigated as the concept is new to the field. For a description of the clustering criteria, however, the reader is referred to the corresponding literature, to keep this article within a reasonable length. The number of links incident upon a node. The CE parameter reported herein is a graph-level centrality score based on node-level centrality measures AO The tendency of nodes to connect to other nodes which are similar on the focal attribute being the node degree ED The ratio of the number of edges and the number of possible edges KC The k-core of graph is a maximal subgraph in which each vertex has at least degree k. The coreness of a vertex is k, if it belongs to the k-core but not to the (k+1)-core

HS
The hub scores of the vertices are defined as the principal eigenvector of A*t(A), where A is the adjacency matrix of the graph.

AS
The authority scores of the vertices are defined as the principal eigenvector of t(A)*A, where A is the adjacency matrix of the graph.
The primitive design of DMN also considers the matter of directionality. Classically, the pore-throat connectivity in an image of a porous media can be established using the standard Maximal-Ball protocol [37]. At this stage, the network obtained is of an undirected nature, since it fails to account for relative positioning of pores with respect to the flow and merely marks them as connected. The modification to this hypothesis comes along with considering the idea of directed networks. In this respect, the connectivity of a given pore to another target one is rendered, only if connected on the undirected-network map as well as meet the directionality requirement. The directionality requirement puts a constraint on the flow -only permitting passage in the direction being considered. As such, for instance, flow passage theoretically occurs only from pores with lower placement in the x-direction to pores with higher placement in the x-direction, when constructing a directed network in the x-direction. The DMN entries report on parameters elicited over both the directed and undirected pore-throat networks. Although borrowed from a supposedly distant topic of network science, these parameters deserve an analysis to study their potential relevance to the transport properties within the porous media.

Bayesian Network Methodology
The central idea behind the present article has been to devise a Bayesian route for estimation of the absolute permeability in porous structure. With a completed DMN databank at hand, this goal is achievable in two steps. At first, the Bayesian Network theory can be used to identify the statistically-significant influencing parameters on PE. Given this information, the BN theory can be applied, a second time, to make approximate inference (on an unknown value). For our problem of interest, the situation is such that a new line of data is appended to the DMN (perhaps through analysis of a new micro-CT scan image), while its absolute permeability is to be predicted.
Since the Bayesian Network theory plays a pivotal role in the present analysis, a description of its methodology is deemed necessary, at this stage. A Bayesian network is an implementation of a graphical model, in which nodes represent (random) variables and arrows represent probabilistic dependencies between the nodes [38]. The BN`s graphical structure is a directed acyclic graph (DAG) which enables estimation of the joint probability distribution. For each variable, DAG defines a factorization of the joint probability distribution, into a set of local probability distributions, where the form of factorization is given by the BN`s Markov propertyassuming a variable to be solely dependent on its parents. In this sake, the methodology seeks to find a structure, along with its parameters. The two classifications of the BN-structure-learning process, either treat the issue by analyzing the probabilistic relationships supervised by the Markov property of Bayesian networks with conditional independence tests and subsequently constructing a graph that satisfies the corresponding d-separation statements (Constraint-based algorithms), or by assigning a score to each BN candidate and maximizing it with a heuristic algorithm (Score-based algorithms) [39].
By taking advantage of the fundamental properties of the Bayesian Networks, approximate inference (on an unknown value) is attainable. This approach should evade the curse of dimensionality, due to its mere usage of the local distributions [40]. Given the BN network structure established, the stochastic simulation can be applied to generate a large number of cases from the distribution network, from which the posterior probability of a target node is estimated. In this regards, the two prominent algorithms are the Logic sampling (LS) and the Likelihood weighting (LW). The former algorithm generates a case by selecting values for each nodeweighed by the probability of that values occurringat random. The nodes are traversed from the parents (root) nodes down to children (leave) nodes. As a consequence, at each step the weighing probability is either the prior or the Conditional Probability Table entry for the sampled parent values. An instantiation of all the nodes in the BN is later on created, once all the structure is visited. The collection of instantiation data enables estimation of the posterior probability for node X given evidence E (Appendix A). The latter algorithm is similar to the former with a slight modificationadding the fractional likelihood of the evidence combination to the run count, instead of one (Appendix B).
In a nutshell, the Bayesian quantification framework proposed in the present article can be visualized according to the flowchart presented in Figure 1. Provided the permeability is sought on a porous structure, the underlying porethroat connectivity is initially assessed (by processing the CT-scan image of the structure). Subsequently, several measures related to the clustering/geometrical characteristics of the pore-throat network (as prescribed in the DMN entries) are determined. The new set of input parameters is later appended to the DMN, to obtain a probabilistic estimate on its unknown (permeability) from the BN theory, using the information recorded in the DMN.

Results
The initial version of the Database of Micro Networks was formed upon the data acquired from different subsections of benchmark micro-CT scan images [41][42][43]. The rationale behind this choice was to facilitate the matter of future comparison for the academics. The subsections were selected by cutting the original images at 50/100-pixel regular intervals in the z-direction. Figure 2 depict the stereolithography surfaces of the pore space of the S1 image, measuring 868.3 micrometers in each direction, which was extracted using an in-house developed code. A subsection of the S1 image -measuring between the first (0-434.15) micrometers in each direction of the image -is also provided in Figure 3. Later, the pore-throat network was detected over each subsection, and the corresponding undirected/directed networks were established.

Start
Obtain the CT-scan data of a porous rock Determine the underlying porenetwock connectivity network Given the undirected/directed networks over the surveyed space, the geometry/clustering parameters of the networks were quantified. For the DE/KC/HS/AS measures, the corresponding mean values were embedded into the body of the DMN. In addition, the total number of clusters detected, under each criterion, in a surveyed network was recorded into the DMN entries. The pore communities were detected for the undirected/directed networks constructed upon the S1-subsection, illustrated in Figure 3, using the GO/LV/RW criteria.
As the DMN requires an initial completion on its absolute permeability entries, the values were computed within selected micro-CT scan images was attempted, using a myriad of PNM-LBM-CFD methods. For PNM, the absolute permeability was estimated using the network model implemented in the OpenPNM code [44]. For each image subsection, the PNM was applied in the three principal directions (x, y, and z) for the constructed directed network, and in the x-direction for the corresponding constructed undirected network. For LBM, the D3Q19 descriptor model [45] was used along with a Bhatnagar-Gross-Krook (BGK) collision operator for the halfway bounce-back scheme at the solid-fluid boundaries; albeit the latter choice may cause numerical instabilities owing to deficiencies such as viscosity-dependent slip at the walls [17,18,46]. The LBM results were generated using the Palabos Parallel Lattice Boltzmann Solver [47]. In addition, the open source CFD toolbox was used to implement a finite-volume CFD scheme to simulate the water flow through selected porous structures [48]. Using the pressure/velocity data collected from the CFD runs, an in-house code was developed to estimate the single-phase permeability after calculating the pressure drop by the method of pressure gradient force proposed by Raeini et al. (2014) [49,56]. Table 2 lists the computed PE results for the S1-subsection (illustrated in Figure 3). In general, the output from the three methods should yield comparable results [50]; yet the values in Table 2 exhibit a degree of discrepancy, which can be accounted to the choice of descriptor models or boundary conditions in the LBM/CFD. For this sake, only the PNM results were incorporated into the body of the DMN framework, as it relies on less influencing parameters. Moreover, the selection of PNM to represent the absolute permeability values should allow speedy generation of data for DMN entries, as it is less computationally expensive. Following the above procedure, the Database of Micro Networks for sandstone were generated and subsequently used as the input stream for the Bayesian network method. The DMN can be obtained from the authors.  [27] for sandstones, were established on the same micro-structures considered in the present work. As such, the PNM entries for the absolute permeability of the sandstone microstructures in this study should be close to the corresponding experimental values, if tested.
With a DMN database available, the Bayesian prediction of absolute permeability was practiced by initially implementing a Bayesian graphical structure-learning. Implementation of the graphical structure-learning of the Bayesian networks was attempted using the bnlearn package [51]. The score-based algorithm was tested, in this work. For the score-based case, a hybrid conditional linear Gaussian log-likelihood score was applied. For BN inference predictions were obtained by applying the LW algorithm and extracting the expected value of the conditional distribution of 500 simulation results. All the available nodes in the structure were taken as evidence, in that situation, except to the node related to the variable being predicted. Figure 4 shows the Bayesian Network of significant/insignificant influencing parameters on PE, obtained through the hybrid conditional linear Gaussian log-likelihood BN method, for the directed networks extracted from micro-CT images. A directed arrow goes from the influencing to the influenced parameter, with its significance level being marked by the thickness of the connecting line. In this BN setting, the significance is rendered on being supported by the data. Assuming a sequence of A→B→C on a Bayesian network graph, then C would be determined based on the probability density function of Cderived from data -while A and B have the specified values. Considering the strength coefficient (between two parameters) as the change (increase/decrease) in the network score caused by the removal of the corresponding arc, its significance is deemed should it fall below a threshold value of zero. As evident, several of the considered parameters form a hierarchy of influencing effect on the absolute permeability. This should be a conspicuous finding as some of the supposedly unrelated parameters, are now found with statistically-significant relationship to the PE parameter. In order to delve more into this issue, the statistical relationship between DMN entries and the PE was also studied by several other methodsthe random forest and entropy filters [52,53]. Nearly the same result was obtained on the set of significant influencing parameters on PE, from the BN and random forest methods applied to the directed networks (Table 3)confirming the robustness of BN methodology to rank feature importance. A similar analogy was also applied to the undirected network mode, in which some disagreement was found on the influencing parameters ( Table 4). As evident, the results comply with the classical view of relating the permeability in a porous matrix to its porosity and length (i.e. pressure drop over length) [54]. In addition, the analysis introduces new parameters to influence the permeability, with relevance to the geometrical/clustering features of the underlying pore-throat network.   Significant influencing parameter   BN  AO, AOU, AS, CE, CEU, DE, DEU, DI, DIU, EDU, GO, GOV, HS, KC, KCU, LE, LV, LVU, NG, NGU, PO, RW, RWU, TR, TRU   Random Forest  ASU, CE, DE, DEU, DI, DIU, GO, HS, HSU, KC, KCU, LV, PO, RW   Entropy  AO, ASU, CE, CEU, DE, DEU, DI, DIU, GO, GOU, HS, HSU, KC, KCU, LV, LVU,

insignificant (dashed lines) of influencing parameters on PE, obtained through the hybrid BN method over the directed networks extracted from micro-CT images
A physical understanding of the newly found parameters can be developed by thinking of the absolute permeability to be constructed over an intelligent path. This path is detected by considering the connectivity of the clusters detected under the GO/NG/RW. Since the number of these clusters has found to be important on PE, the functionality of each cluster can be viewed in a similar mannera replica of the functionality of lungs in the respiratory system, where two lungs work for a same purpose. The results also show the permeability to relate to the corresponding network diameter -the longest of all the shortest paths in the network (Table 1) -which again would fit into the breakthrough concept for its experimental determination. For the new geometrical features detected, the significance of k-core of network (Tables 3 and 4) suggests an analogy between different (supposedly uncorrelated) phenomenonthe passage of flow in a porous matrix, the spreading of an epidemic disease, and the dissemination of information in a social network [57]. The inspection of other geometrical features and their detected statistical significance on PE should signify the existence of some influential spreader nodes within the pore-throat network, for the passage of flow, which can easily be identified given their geometrical scores. This latter finding -existence of influential spreader nodes for flow passage-should strong support the existence of an intelligent path inside the porous matrix, along which the influential spreader nodes are distributed.
Once the influencing parameters are detected, the BN framework is re-applied to make inference on the unknown values. To account for accuracy of the proposed BN methodology, a 10-fold cross validation was implemented, in which the DMN pool was randomly divided into a 75% learning and a 25% test sections, with the PE values to be predicted by the BN method. The process was repeated over different random selections of the learning/test groups. For the case of directed networks, the mean error value obtained from the BN predictions was lower than 7%, which is favorable given the associated subsurface uncertainties. Nevertheless, the BN methodology did not yield a satisfactory result for the networks in the undirected mode pattern. This could be advocated in favor a directed network to truly capture the image of connectivity/flow within micro-structures.

Conclusion
The Database of Micro Networks provides a firm foundation to apply Bayesian methods for quantification of the micro-structural attributes within micro-CT images. The Bayesian Network results unveil the absolute permeability in a micro-structure, to be affected by several parameters amongst the DMN entries. These entries with statisticallysignificant relationship to PE, were found to be amongst the geometry as well as cluster-related parameters of its corresponding network. A 10-fold cross validation of BN-based predictions of PE shows a mean error value of almost 7%, as implemented on the DMN. The BN exhibits less satisfactory results for undirected networks, which can be interpreted in terms of a directed network to better capture the image of connectivity/flow within micro-structures. The DMN is yet to be completed by future research, which enables a better account of the Bayesian predictions within micro-structures

Acknowledgements
The author would like to place his appreciation to Dr. Apostolos Kantzas at the Department of Chemical and Petroleum Engineering at the University of Calgary and Carl Fredrik Berg at the Department of Geoscience and Petroleum at Norwegian University of Science and Technology for providing technical reviews of the article. The author is thankful to the Petroleum Engineering and Rock Mechanics (PERM) group at the Department of Earth Science and Engineering, Imperial College London for providing the CT-scan images of the micro-structures used in the present article.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A: The Logic sampling algorithm
Consider an established Bayesian Network. Assume X to be node in this BN structure, and E as a given evidence. The Logic sampling algorithm for estimation of the posterior probability of node X given evidence E=e, is computed by the following procedure [38]: Step-1 Initialize

Appendix B: The Likelihood weighing algorithm
Consider an established Bayesian Network. Assume X to be node in this BN structure, and E as a given evidence. The Likelihood weighing algorithm for estimation of the posterior probability of node X given evidence E=e, is computed by the following procedure [38]: Step-1 Initialize