Improved well logs clustering algorithm for shale gas identification and formation evaluation

N. P. Szabó; B. A. Braun; M. M. G. Abdelrahman; M. Dobróka

doi:10.1007/s40328-021-00358-0

Szabó, N. P.; Braun, B. A.; Abdelrahman, M. M. G.; Dobróka, M. 2021-12-01

The identification of lithology, fluid types, and total organic carbon content is of high priority in the exploration of unconventional hydrocarbons. As a new alternative, a further developed K-means type clustering method is suggested for the evaluation of shale gas formations. The traditional approach of cluster analysis is mainly based on the use of the Euclidean distance for grouping the objects of multivariate observations into different clusters. The high sensitivity of the $L_2$ norm to non-Gaussian distributed measurement noise is well known; it can be reduced by selecting a more suitable norm as the distance metric. To suppress the harmful effect of non-systematic errors and outlying data, the Most Frequent Value method, a robust statistical estimator, is combined with the K-means clustering algorithm. The Cauchy-Steiner weights calculated by the Most Frequent Value procedure are applied to measure the weighted distance between the objects, which improves the performance of cluster analysis compared to the Euclidean norm. At the same time, the centroids are also calculated as a weighted average (using the Most Frequent Value method) instead of the arithmetic mean. The suggested statistical method is tested using synthetic datasets as well as observed wireline logs, mud-logging data and core samples collected from the Barnett Shale Formation, USA. The synthetic experiment using extremely noisy well logs demonstrates that the newly developed robust clustering procedure is able to separate the geological-lithological units in hydrocarbon formations and provide additional information to standard well log analysis. It is also shown that the Cauchy-Steiner weighted cluster analysis is less affected by outliers, which allows a more efficient processing of poor-quality wireline logs and an improved evaluation of shale gas reservoirs.
Keywords Most Frequent Value · K-means clustering · Robust · Well log · Shale gas · Barnett Shale

* N. P. Szabó norbert.szabo.phd@gmail.com
Department of Geophysics, University of Miskolc, 3515 Miskolc-Egyetemváros, Hungary
Geoengineering Research Group, MTA-ME, University of Miskolc, 3515 Miskolc-Egyetemváros, Hungary

Acta Geodaetica et Geophysica (2021) 56:711–729

1 Introduction

The exploration and production of unconventional reservoirs are performed with ever-increasing intensity all over the world. By using massive hydraulic fracturing and horizontal drilling technology, low-permeability hydrocarbon-bearing formations can be produced more easily and more economically in commercial quantities to supply larger regions with energy. Among the great variety of rock types, this paper focuses mainly on the investigation of shale gas reservoirs. The Barnett Shale Formation, studied in this paper, is one of the earliest discovered onshore shales with a large amount of producible gas reserves (Jarvie et al. 2007). The presence of strata with the potential to store hydrocarbons does not mean that hydrocarbons actually occur; thus, the accurate and reliable determination of fluid content and the volume of reserves is crucial. Since unconventional reservoirs are usually complex multi-mineral formations, the conventional well-log-analysis methods are rarely applicable, because several unknown petrophysical properties may influence the measurements. Empirical estimation methods of water saturation (Archie 1942) and total organic content (Passey et al. 1990) should be revised in the given exploration area (Bibor and Szabó 2016; Xu et al. 2017). Exploratory statistical tools such as principal component, factor and cluster analysis can be used as effective data processing tools for rock typing, the estimation of petrophysical properties and the replacement of missing measurements in well logging applications.
For instance, the factor analysis of wireline logs allows an improved characterization of shaly sand formations (Szabó 2011; Szabó and Dobróka 2013). In this study, a novel approach to cluster analysis is presented to subdivide the lithological units of shale gas reservoirs and infer the fluid content based on well logs from the Barnett Shale Formation.

The traditional method of cluster analysis has been widely used in geosciences, e.g. for the processing of geochemical data acquired from water wells (Vriend et al. 1988). Kazmierczuk and Jarzyna (2006) studied its possibilities in lithology and hydrocarbon saturation determination using well logging measurements in Poland. Paasche and Tronicke (2007) used fuzzy K-means cluster analysis successfully for subsurface zonation and petrophysical modelling. Fuzzy statistics formed the basis of risk and uncertainty analysis in Hungarian bauxite exploration (Bárdossy and Fodor 2004). Sfidari et al. (2014) utilized cluster analysis incorporated in a hybrid approach for lithofacies mapping in sequence stratigraphy in the South Pars gas field, Persian Gulf basin.

The robust non-hierarchical clustering algorithm proposed in this paper makes use of the Most Frequent Value (MFV) technique, which is known as a highly efficient robust statistical estimator (Steiner 1991). The MFV method relies on the minimization of the information loss (relative entropy), practically regardless of the given error distribution. Dobróka et al. (1991) developed a geophysical joint inversion method using the MFV procedure for the joint interpretation of seismic and geoelectric data collected in an underground coal mine. Szűcs et al. (2006) showed the application possibilities of the MFV method in hydrogeological modelling. Dobróka et al. (2014) suggested a series expansion-based inversion method for the calculation of the Fourier transform using the Cauchy-Steiner weights estimated by the MFV method.
It was shown that the MFV method has a high noise rejection capability and gives a better estimate of the frequency spectrum of seismic signals than the conventional Discrete Fourier Transformation method. The MFV technique was also implemented in the factor analysis of engineering geophysical sounding data by Szabó et al. (2017) to give a reliable estimate of the petrophysical properties of shallow unsaturated formations. Nowadays, astrophysicists and cosmologists are also starting to recognize the importance of the MFV method, as they apply it in calculating the abundance of primordial Li (Zhang 2017) and the value of the Hubble constant (Zhang 2018).

In an earlier study, the combination of the MFV method and cluster analysis was used to separate clayey-shaly coal formations in Hungary (Braun et al. 2016). In this procedure, the distance metric was defined using the MFV method, while the centroid coordinates were calculated traditionally as the arithmetic mean. It was demonstrated that the suggested method of cluster analysis gave an outlier-free solution and a better resolution of the lithology than the Euclidean norm-based clustering approach. The noise rejection capability of the MFV-based classification procedure was much better than that of the Euclidean norm-based K-means cluster analysis. In this paper, the previously developed method is improved by also using the Cauchy-Steiner weighted mean in determining the centroids of all clusters. This paper aims to prove the feasibility of the new robust MFV-based clustering method in the Barnett Shale Formation, a typical organic-rich source rock, and to propose its use for more robust processing of wireline logging data and advanced evaluation of shale gas and other types of unconventional reservoirs.
2 Non‑hierarchical cluster analysis of well logs

Cluster analysis (CA) is an exploratory statistical method that aims to order the objects of multivariate observations into groups using a given distance metric. This classification procedure uses only the information found in the dataset to associate the objects by their similarities. The target is to arrange the objects into non-overlapping groups in such a manner that the objects within a group do not differ too much, while objects in different groups do. A successful clustering yields great homogeneity within a group and a large difference between the clusters.

In formulating the statistical problem, let the vectors $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ denote two multivariate objects from the population of N random variables $X_1,\ldots,X_N$. In well log analysis, $X_n$ indicates a physical variable measured along the borehole by the n-th well logging tool ($1 \le n \le N$). In more detail, the i-th and j-th vectors representing two objects in the N-dimensional data space can be written as $\mathbf{x}^{(i)} = [x_1^{(i)},\ldots,x_N^{(i)}]^T$ and $\mathbf{x}^{(j)} = [x_1^{(j)},\ldots,x_N^{(j)}]^T$ ($1 \le i, j \le M$, M being the number of depth points). To group the objects into clusters, a measure of similarity has to be specified. In most cases, this is an appropriate definition of distance. The Minkowski distance between the vectors $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ is defined as the $L_p$ norm of their difference

$$D_p = \left( \sum_{k=1}^{N} \left| x_k^{(i)} - x_k^{(j)} \right|^p \right)^{1/p}. \tag{1}$$

The two most frequently used special cases of the Minkowski norm are the $L_1$ and $L_2$ norms belonging to p = 1 and p = 2, respectively. The $L_1$ norm

$$D_1 = \sum_{k=1}^{N} \left| x_k^{(i)} - x_k^{(j)} \right| \tag{2}$$

is called the "Manhattan" or "City block" distance in cluster analysis. The Euclidean or $L_2$ norm is one of the most frequently used distance metrics in cluster analysis, which is calculated as

$$D_2 = \sqrt{ \sum_{k=1}^{N} \left( x_k^{(i)} - x_k^{(j)} \right)^2 }. \tag{3}$$

The measured variables are usually contaminated by some amount of noise and can be cross-correlated.
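The distance definitions in Eqs. (1)–(3) can be illustrated with a short Python sketch. The function name and the two sample vectors are hypothetical and chosen only for illustration; they are not data from this study. The example compares the $L_1$ and $L_2$ distances of a depth point from a reference point with and without a spike on one log.

```python
def minkowski_distance(x, y, p):
    """L_p (Minkowski) distance between two objects, cf. Eq. (1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Hypothetical standardized readings of three logs at one depth point.
reference = [0.0, 0.0, 0.0]
clean = [0.2, -0.1, 0.4]
spiked = [0.2, -0.1, 6.4]  # the same point with an outlier on the third log

for p in (1, 2):
    d_clean = minkowski_distance(clean, reference, p)
    d_spiked = minkowski_distance(spiked, reference, p)
    print(f"p={p}: clean={d_clean:.3f}, spiked={d_spiked:.3f}")
```

Setting p = 1 gives the Manhattan distance of Eq. (2) and p = 2 the Euclidean distance of Eq. (3). In both cases a single spiked log dominates the distance, which is the behaviour the weighted distances introduced next are meant to dampen.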
For noisy and cross-correlated variables, the Mahalanobis distance is preferable in clustering algorithms

$$D_M = \left[ \left( \mathbf{x}^{(i)} - \mathbf{x}^{(j)} \right)^T \mathbf{C}^{-1} \left( \mathbf{x}^{(i)} - \mathbf{x}^{(j)} \right) \right]^{1/2}, \tag{4}$$

where $\mathbf{C}$ is the covariance matrix of the measured variables, including the data variances in its main diagonal. In the non-correlated case, the Mahalanobis distance implies the use of the weighted distance

$$D_w = \sqrt{ \frac{1}{\sum_{q=1}^{N} w_q} \sum_{k=1}^{N} w_k \left( x_k^{(i)} - x_k^{(j)} \right)^2 } \tag{5}$$

with the weight $w_k$ assigned to the k-th datum. The Euclidean norm is most successfully used when the data noise follows a Gaussian distribution. When the distribution is non-Gaussian (even with outliers in the dataset), the Manhattan distance gives a more robust approach in clustering. For geophysical inverse problems, Amundsen (1991) proved that using the Cauchy weights $w = S^2/(S^2 + e^2)$ in calculating the misfit function can give a robust result (here e is the residual and S is the a priori known scale parameter). In the framework of the Most Frequent Value (MFV) method, a similar weight was given by Steiner (1988)

$$w^{St} = \frac{\varepsilon^2}{\varepsilon^2 + e^2}, \tag{6}$$

where $\varepsilon$, called the dihesion, is derived from the dataset in an inner iteration procedure. It was proved that solving inverse problems with the use of the Steiner weights $w^{St}$ may result in robust parameter estimation (Dobróka et al. 1991; Szűcs et al. 2006; Szabó et al. 2017). Thus, we introduce the Steiner weights in Eq. (5) and give the robust distance definition as

$$D_{St} = \sqrt{ \frac{1}{\sum_{q=1}^{N} w_q^{St}} \sum_{k=1}^{N} w_k^{St} \left( x_k^{(i)} - x_k^{(j)} \right)^2 }. \tag{7}$$

There are several kinds of techniques to perform cluster analysis. One of the most common non-hierarchical methods is partitioning, which handles large datasets quickly; however, the results might be affected by the initial selection of centroids. (The cluster center is traditionally defined as the mean of the cluster elements.) In non-hierarchical clustering, one of the most popular partitioning algorithms is K-means clustering. This prototype-based technique attempts to find a pre-defined number of clusters (K), including their initial elements and centroids. During the processing of the measured data, each object is assigned to the closest centroid, forming a new cluster. This configuration is iteratively improved by re-calculating the centroids and their Euclidean or other distances from the objects. After a sufficient number of iterations, the centroid positions no longer change considerably and the clustering procedure is stopped.

Defining the centroid as the arithmetic mean of the coordinates of the cluster members gives acceptable results only when the data noise follows a Gaussian distribution. A robust centroid definition for the k-th dimension can be given by applying Steiner weights as

$$c_{ki}^{St} = \frac{1}{\sum_{q=1}^{M_i} w_q^{St}} \sum_{p=1}^{M_i} w_p^{St} x_k^{(p)}, \tag{8}$$

where $M_i$ is the number of data belonging to the i-th cluster and $x_k^{(p)}$ is the k-th coordinate of its p-th member. With this centroid, the robust (MFV-based) distance definition in Eq. (7) takes the form

$$D_{St} = \sqrt{ \frac{1}{\sum_{q=1}^{N} w_q^{St}} \sum_{k=1}^{N} w_k^{St} \left( x_k^{(i)} - c_{ki}^{St} \right)^2 }, \tag{9}$$

where $c_{ki}^{St}$ is the Steiner centroid of the i-th cluster in the k-th dimension. In K-means clustering, the Sum of Squared Error (SSE) is usually calculated for estimating the optimal number of clusters; it measures the robust distance of the observations to their closest centroids

$$SSE = \sum_{i=1}^{K} \sum_{m=1}^{M_i} D_{St}^2 \left( \mathbf{x}_m^{(i)}, \mathbf{c}_i \right), \tag{10}$$

where $\mathbf{c}_i$ represents the centroid of the i-th cluster and $M_i$ is the total number of objects belonging to the same group. The centroid is calculated as the weighted mean of the objects forming a cluster. The SSE versus cluster number plot describes the variation within the clusters. The SSE is zero when K equals the number of different data objects. The optimal number of clusters is indicated on the plot where the SSE, while converging to zero, no longer varies considerably (known as the elbow method).
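The interplay of Eqs. (6)–(10) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text above does not spell out the inner MFV iteration, so `mfv` uses a commonly cited alternating update of the location and the dihesion; the per-cluster, per-log dihesion, the use of the whole dataset for the first assignment, and all function names and sample data are hypothetical choices.

```python
import math
from statistics import median


def mfv(values, n_iter=50, eps_floor=1e-9):
    """Most Frequent Value and dihesion of a 1-D sample (assumed iteration)."""
    m = median(values)
    eps = max(math.sqrt(3.0) / 2.0 * (max(values) - min(values)), eps_floor)
    for _ in range(n_iter):
        r2 = [(v - m) ** 2 for v in values]
        num = sum(r / (eps ** 2 + r) ** 2 for r in r2)
        den = sum(1.0 / (eps ** 2 + r) ** 2 for r in r2)
        eps = max(math.sqrt(3.0 * num / den), eps_floor)  # dihesion update
        w = [eps ** 2 / (eps ** 2 + r) for r in r2]       # Steiner weights, Eq. (6)
        m = sum(wi * v for wi, v in zip(w, values)) / sum(w)
    return m, eps


def steiner_distance(x, c, eps):
    """Weighted distance of Eq. (9); eps holds one dihesion per log (assumption)."""
    e2 = [(xk - ck) ** 2 for xk, ck in zip(x, c)]
    w = [ek ** 2 / (ek ** 2 + r) for ek, r in zip(eps, e2)]
    return math.sqrt(sum(wk * r for wk, r in zip(w, e2)) / sum(w))


def steiner_centroid(members):
    """Weighted centroid in the spirit of Eq. (8): MFV of every log in the cluster."""
    pairs = [mfv(column) for column in zip(*members)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def robust_kmeans(data, init_centroids, n_iter=20):
    """K-means loop with Steiner-weighted distances and centroids."""
    centroids = [list(c) for c in init_centroids]
    eps_all = [mfv(column)[1] for column in zip(*data)]  # start-up dihesion
    eps = [list(eps_all) for _ in centroids]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: steiner_distance(x, centroids[j], eps[j]))
            for x in data
        ]
        if new_labels == labels:  # assignments are stable: stop
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j], eps[j] = steiner_centroid(members)
    return labels, centroids, eps


def sse(data, labels, centroids, eps):
    """Sum of squared robust distances to the assigned centroids, Eq. (10)."""
    return sum(steiner_distance(x, centroids[lab], eps[lab]) ** 2
               for x, lab in zip(data, labels))


# Hypothetical two-log data: two tight groups plus one spiked reading.
data = [(0.0, 0.0), (0.1, 0.1), (-0.1, -0.1), (0.05, -0.05), (-0.05, 0.05),
        (5.0, 5.0), (5.1, 5.1), (4.9, 4.9), (5.05, 4.95), (4.95, 5.05),
        (0.0, 30.0)]
labels, centroids, eps = robust_kmeans(data, [(0.0, 0.0), (5.0, 5.0)])
print(labels, centroids, sse(data, labels, centroids, eps))
```

Repeating `robust_kmeans` for several values of K and plotting the returned `sse` against K would give the elbow curve described above. Because the weights are recomputed from the data in every pass, no weighting coefficients have to be preset in this sketch.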
Choosing a greater number of clusters hardly adds any valuable information and makes the interpretation ambiguous or more complicated.

3 Numerical study of the MFV‑based cluster analysis

The performance of the MFV-based clustering method is studied numerically using synthetic data with predefined model parameters (petrophysical parameters, layer boundaries). The petrophysical parameters of the four-layered homogeneous model are assumed to be known (Table 1); the synthetic model supposes the presence of hydrocarbon- and water-bearing zones. The calculated well logs (synthetic data) were contaminated with 3% Gaussian noise and artificial outliers to test the ability of the method to suppress the effect of outliers. The depth interval between adjacent observation points is 0.1 m. The noisy synthetic well logs are shown in Fig. 1.

Table 1 The petrophysical model of an organic-rich tight formation for synthetic data calculation

Layer   Thickness (m)   Ф      S_w   V_cl   V_c     V_k
1       8               0.02   0.9   0.4    0.001   0.002
2       18              0.03   1.0   0.7    0.1     0.017
3       8               0.02   0.7   0.15   0.25    0.12
4       10              0.01   0.6   0.3    0.5     0.032

Denotations: porosity (Ф, [v/v]), water saturation (S_w, [v/v]), clay volume (V_cl, [v/v]), carbonate volume (V_c, [v/v]), and kerogen volume (V_k, [v/v]). (Quartz volume is derived from the material balance equation as V_q = 1 − Ф − V_cl − V_c − V_k.)

Fig. 1 The synthetic well-logging data contaminated by 3% Gaussian distributed noise. Denotations: GR [API] is the natural gamma-ray intensity log, SGR [%, ppm] is the spectral gamma-ray intensity log (the black curve is potassium (K [%]), the red curve is uranium (U [ppm]), and the green curve is thorium concentration (TH [ppm])), Δt [µs/m] is the compressional acoustic slowness log, Φ_N [v/v] is the neutron log, ρ_b [g/cm³] is the bulk density log, PE [barn/e] is the photoelectric absorption index, and RT [ohm-m] is the deep resistivity log

The relationship between the petrophysical properties and the logging data can be represented by a set of equations called response functions. The synthetic data used for testing the traditional and MFV cluster techniques were calculated using the following response functions:

$$GR = \rho_b^{-1} \left( V_k GR_k \rho_k + V_{sh} GR_{sh} \rho_{sh} + \sum_{i=1}^{n} V_{ma,i} GR_{ma,i} \rho_{ma,i} \right), \tag{11}$$

$$K = \rho_b^{-1} \left( V_k K_k \rho_k + V_{sh} K_{sh} \rho_{sh} + \sum_{i=1}^{n} V_{ma,i} K_{ma,i} \rho_{ma,i} \right), \tag{12}$$

$$U = \rho_b^{-1} \left( V_k U_k \rho_k + V_{sh} U_{sh} \rho_{sh} + \sum_{i=1}^{n} V_{ma,i} U_{ma,i} \rho_{ma,i} \right), \tag{13}$$

$$Th = \rho_b^{-1} \left( V_k Th_k \rho_k + V_{sh} Th_{sh} \rho_{sh} + \sum_{i=1}^{n} V_{ma,i} Th_{ma,i} \rho_{ma,i} \right), \tag{14}$$

$$\rho_b = \Phi \left[ \rho_w S_w + \rho_g (1 - S_w) \right] + V_k \rho_k + V_{sh} \rho_{sh} + \sum_{i=1}^{n} V_{ma,i} \rho_{ma,i}, \tag{15}$$

$$\Phi_N = \Phi \left[ \Phi_w S_w + \Phi_g (1 - S_w) \right] + V_k \Phi_k + V_{sh} \Phi_{sh} + \sum_{i=1}^{n} V_{ma,i} \Phi_{ma,i}, \tag{16}$$

$$\Delta t = \Phi \left[ \Delta t_w S_w + (1 - S_w) \Delta t_g \right] + V_k \Delta t_k + V_{sh} \Delta t_{sh} + \sum_{i=1}^{n} V_{ma,i} \Delta t_{ma,i}, \tag{17}$$

$$P_e = \Phi \left[ S_w U_w + (1 - S_w) U_g \right] + V_k U_k + V_{sh} U_{sh} + \sum_{i=1}^{n} V_{ma,i} U_{ma,i}, \tag{18}$$

$$\frac{1}{\sqrt{R_t}} = \left[ \frac{V_{sh}^{(1 - 0.5 V_{sh})}}{\sqrt{R_{sh}}} + \frac{\sqrt{\Phi^m}}{\sqrt{a R_w}} \right] S_w^{n/2}, \tag{19}$$

$$\Phi + V_k + V_{sh} + \sum_{i=1}^{n} V_{ma,i} = 1, \tag{20}$$

where $V_{ma,i}$ (v/v) refers to the fractional volume of the i-th matrix constituent and $V_k$ is the volume of kerogen. The total number of mineral components is denoted by n in the sums, the fractional volume of shale-free pore space is denoted by Φ (v/v), and the water saturation in the uninvaded zone is denoted by $S_w$. The physical properties of mud filtrate (mf), hydrocarbon (h), shale (sh), and the rock matrix (ma) are expressed by the zone parameters in Eqs. (11)–(16). Archie's parameters are the following: a (tortuosity factor), n (saturation exponent), m (cementation exponent). Equation (20) is the material balance equation, which represents a constraint in solving the inverse problem and is used to derive the volume of quartz.

The layer boundaries can be picked manually or from the description of cutting samples, but the cluster techniques can detect them automatically.
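As an illustration of how a noisy synthetic log could be generated from the response functions, the following Python sketch evaluates the material balance (Eq. 20) and the bulk density response (Eq. 15) for the layers of Table 1. The zone parameters in `RHO` are hypothetical round values, not those used in this study; treating clay as the shale term and carbonate and quartz as matrix constituents, reading "3% noise" as a relative standard deviation, and the way outliers are injected are likewise assumptions made only for this example.

```python
import random

# Table 1: (thickness [m], porosity, S_w, V_cl, V_c, V_k)
LAYERS = [
    (8, 0.02, 0.9, 0.40, 0.001, 0.002),
    (18, 0.03, 1.0, 0.70, 0.100, 0.017),
    (8, 0.02, 0.7, 0.15, 0.250, 0.120),
    (10, 0.01, 0.6, 0.30, 0.500, 0.032),
]

# Hypothetical zone parameters (densities in g/cm3), for illustration only.
RHO = {"water": 1.0, "gas": 0.2, "kerogen": 1.3,
       "clay": 2.5, "carbonate": 2.71, "quartz": 2.65}


def quartz_volume(phi, v_cl, v_c, v_k):
    """Material balance, Eq. (20), solved for the quartz volume."""
    return 1.0 - phi - v_cl - v_c - v_k


def bulk_density(phi, s_w, v_cl, v_c, v_k):
    """Bulk density response in the form of Eq. (15)."""
    v_q = quartz_volume(phi, v_cl, v_c, v_k)
    fluid = phi * (RHO["water"] * s_w + RHO["gas"] * (1.0 - s_w))
    return (fluid + v_k * RHO["kerogen"] + v_cl * RHO["clay"]
            + v_c * RHO["carbonate"] + v_q * RHO["quartz"])


def synthetic_density_log(step=0.1, noise=0.03, outlier_every=50, seed=0):
    """Noisy density log sampled every `step` metres with artificial spikes."""
    rng = random.Random(seed)
    log = []
    for thickness, phi, s_w, v_cl, v_c, v_k in LAYERS:
        clean = bulk_density(phi, s_w, v_cl, v_c, v_k)
        for _ in range(round(thickness / step)):
            log.append(clean * (1.0 + rng.gauss(0.0, noise)))
    for i in range(0, len(log), outlier_every):
        log[i] *= 1.5  # assumed outlier model: isolated 50% spikes
    return log


density_log = synthetic_density_log()
print(len(density_log), round(bulk_density(*LAYERS[0][1:]), 5))
```

The same pattern would extend to the other response functions once their zone parameters are specified; the resulting logs could then be fed to a clustering routine to check whether the four layers are recovered.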
The accuracy and reliability of the results of the cluster technique depend on the initial location of the centroids as well as on avoiding the influence of outliers. The clusters obtained by both clustering techniques are shown in Fig. 2. The clusters derived with the Euclidean distance do not separate well, and spurious clusters appear at the points of the outliers. In contrast, the Euclidean distance modified by the Steiner weights gives a smoother solution. The purpose of clustering is to define the different rock types; the robustness of a cluster technique can be measured by how much the assignment of the data points to the different clusters is altered. Consequently, rock typing based on the MFV-based cluster technique can be much more robust than that based on the traditional Euclidean distance clustering.

Fig. 2 The clusters resulting from the traditional K-means clustering and the MFV-based clustering methods

Comparing the results, it can be shown that the MFV-based clustering is more robust and rejects outliers better. In addition, the MFV-based result is independent of the initial location of the centroids. To check the stability of the MFV-based clustering, the test was repeated 100 times with randomly chosen initial centroids to show the change in the convergence of the clustering algorithm and its independence of the initial centroid locations. Figure 3 shows the remarkable stability of the results of the MFV-based method. In the case of the traditional cluster analysis method, by contrast, Fig. 4 shows a high dependency on the initial location of the centroids and a strong influence of the outliers.

Fig. 3 The mean of the results of 100-times repeated tests using the MFV-based cluster method

Fig. 4 The mean of the results of 100-times repeated tests using the traditional K-means cluster method

Table 2 shows the descriptive statistical parameters for both the traditional Euclidean distance and the Euclidean distance modified by the Steiner weights (Steiner distance). The SSE of the Steiner distance is lower than half that of the Euclidean distance (547.7 versus 1522.3). Moreover, the standard deviation of the traditional Euclidean distance is more than twice as large (5.5 versus 1.5), with a much wider range (40.4 versus 9). The descriptive statistical study shows that the centroids are stable in each iteration, whatever their initial location. Furthermore, the MFV-based clustering shows a high capability of outlier rejection, as evidenced by Fig. 5, which shows the statistical distribution of the Euclidean distance and the Steiner distance within the synthetic well logging data.

Table 2 Descriptive statistical parameters for comparing the Euclidean and Steiner distances

                     Mean   Standard deviation   Range   SSE
Euclidean distance   3.4    5.5                  40.4    1522.3
Steiner distance     1.6    1.5                  9       547.7

Fig. 5 Frequency plot of both the Euclidean and Steiner distances

The tests on synthetic datasets show appreciable stability and outlier resistance of the MFV-based robust clustering procedure. Thus, it is straightforward to apply it in a real field case, which is the Barnett Shale Formation in Texas, USA.

4 Geological setting of the in‑field study area

Shale formations mainly consist of quartz, feldspar, clay, carbonate and other minerals. The relative amount of these components can vary; however, most high-quality reservoirs are chiefly silica-bearing. The mineral composition of the greatest shale reservoirs in the USA is illustrated in Fig. 6. The shale-type reservoir rocks were formed by the deposition of fine-grained (clay- to silt-sized) clastic sediments coupled with high organic matter content.
Since the hydrocarbons did not undergo any kind of migration, these rocks are source, reservoir and seal at the same time (Loucks and Ruppel 2007). The intrinsic permeability of these reservoirs is extremely low, generally smaller than 0.1 mD; their effective porosity is at most a few percent and the pore-throat diameters range between 2 and 200 nm. The hydrocarbon can be found in the nanometer-sized pore spaces, linked to the residua of organic matter or on the surface of clay minerals (Bjørlykke 2015).

Fig. 6 Ternary plot showing the average mineral composition of great shale reservoirs in the USA

The Barnett Shale Formation investigated in this study is located in the Bend Arch–Fort Worth Basin (Texas, USA), but some other occurrences are also known in the Hardeman, Kerr and Marfa Basins (Jarvie et al. 2007). It is a siliceous, calcareous and dolomitic shale formation, which is rich in silicates (30–50%) but poor in clay minerals (< 30%). The reservoir is thermally matured and Early and Middle Carboniferous in age (Mississippian epoch); it consists mainly of silicate, carbonate and dolomitic sediments (Montgomery et al. 2005) with abundant pyrite and phosphate content (Hickey and Henk 2007). The formation can be subdivided into five lithofacies based on cores and outcrops, i.e. black shale, calcareous black shale, dolomitic black shale, phosphatic black shale and lime grainstone, owing to extensive early microbial alteration of sufficient organic matter (Hickey and Henk 2007) and to the poor circulation link with the open ocean and the euxinic ocean bottom waters (Loucks and Ruppel 2007). The average depth of the formation is about 2,550 m and its thickness varies between 15 and 122 m (Montgomery et al. 2005); it is overlain and underlain by impermeable limestone. It contains dry natural gas and oil, but the latter is present in much lower quantity.
The Forestburg Limestone Formation divides the Barnett Shale Formation into two parts. The lower and upper parts are similar and there is no significant difference in petrology (Jarvie et al. 2007). The general geological setting of the Barnett Shale Formation is shown in Fig. 7.

Fig. 7 Cross-section of wells showing the general lithostratigraphic position of the Barnett Shale Formation (Courtesy of Loucks and Ruppel 2007)

5 Deterministic interpretation of real well logging data

In the Barnett Shale Formation, the following well log types are used for cluster analysis: natural gamma-ray intensity (GR [API]), compressional wave slowness (Δt [µs/ft]), bulk density ($\rho_b$ [g/cm³]), neutron porosity ($\Phi_n$ [v/v]) and deep resistivity ($R_d$ [ohmm]). The observed length of the investigated borehole is 303 ft (92.35 m), where the general rock type is mainly dolomitic shale with some smaller interbedded clean shale zones. The sampling interval of the well logs is 0.1 ft (3.048 cm). Besides the input well logs, core measurements are also available, among others the porosity, permeability and total organic matter content.

The following petrophysical quantities, used as reference in cluster analysis, are calculated by deterministic well log analysis approaches (Serra 1984). The shale volume ($V_{sh}$ [v/v]) is estimated from the GR log using the non-linear formula of Larionov (1969), which was originally suggested for rocks older than Tertiary

$$V_{sh} = 0.33 \left( 2^{2 i_\gamma} - 1 \right), \tag{29}$$

where $i_\gamma$ is the natural gamma-ray index.
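A short Python sketch of the shale volume estimate may help here. It uses the coefficient 0.33 of the Larionov formula for rocks older than Tertiary, which gives a shale volume close to 1 at a gamma-ray index of 1. The linear gamma-ray index normalisation between a clean and a shale baseline is the usual convention and is an assumption here, since the text does not define it; the function names and the GR readings are hypothetical.

```python
def gamma_ray_index(gr, gr_clean, gr_shale):
    """Linear gamma-ray index between a clean and a shale baseline (assumed form)."""
    return (gr - gr_clean) / (gr_shale - gr_clean)


def larionov_older_rocks(i_gamma):
    """Larionov (1969) shale volume for rocks older than Tertiary, cf. Eq. (29)."""
    return 0.33 * (2.0 ** (2.0 * i_gamma) - 1.0)


# Hypothetical readings in API units, for illustration only.
i_gamma = gamma_ray_index(gr=90.0, gr_clean=30.0, gr_shale=150.0)
print(i_gamma, larionov_older_rocks(i_gamma))
```

Compared with taking the gamma-ray index itself as the shale volume, the non-linear formula returns lower values at intermediate indices, for example 0.33 instead of 0.5 in the case above.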
The effective porosity ($\Phi_{eff,d}$ [v/v]) as a related quantity is derived from the bulk density log

$$\Phi_{eff,d} = \Phi_d - V_{sh} \Phi_{sh,d}, \tag{30}$$

where $\Phi_{sh,d}$ [v/v] denotes the shale porosity derived from the density log, and $\Phi_d$ [v/v] is the density-derived porosity calculated as

$$\Phi_d = \frac{\rho_{ma} - \rho_b}{\rho_{ma} - \rho_f}, \tag{31}$$

where $\rho_{ma}$ is the density of the rock matrix and $\rho_f$ is that of the pore fluid, often assumed to be the mud-filtrate density, both measured in g/cm³. Several empirical models can be used to calculate the water saturation of hydrocarbon reservoirs ($S_w$ [v/v]). In this study, the Simandoux (1963) equation is used

$$S_w = \frac{a R_w}{2 \Phi^m} \left\{ \left[ \left( \frac{V_{sh}}{R_{sh}} \right)^2 + \frac{4 \Phi^m}{a R_w R_t} \right]^{1/2} - \frac{V_{sh}}{R_{sh}} \right\}, \tag{32}$$

where a is the tortuosity factor, m is the cementation exponent, Φ is the neutron porosity, $R_w$ is the formation water resistivity, $R_t$ is the true resistivity, and $R_{sh}$ is the shale resistivity. In organic-rich shale reservoirs, it is advisable to integrate the total organic matter content (TOC) (and conductive minerals, if present) into the calculations to give a more accurate estimate of water/gas saturation (Xu et al. 2017). The TOC [%] can be related to bulk density

$$TOC = \frac{A}{\rho_b} - B, \tag{33}$$

where A and B are gas and formation specific constants (Schmoker and Hester 1983). The above formula can be used in many types of shale reservoirs, mostly when the mineral composition and porosity show no large variation along the formation and the organic matter is of low mass density. By using the density and porosity logs, the apparent matrix density can be easily derived (Asquith and Krygowski 2004)

$$\rho_{a,ma} = \frac{\rho_b - \Phi \rho_f}{1 - \Phi}. \tag{34}$$

The permeability of the reservoir (K' [mD]) is calculated by the model of Coates and Denoo (1981)

$$K'^{1/2} = 100 \, \frac{\Phi^2 \left( 1 - S_{w,irr} \right)}{S_{w,irr}}, \tag{35}$$

where $S_{w,irr}$ [v/v] is the irreducible water saturation.

6 The result of cluster analysis

The MFV-CA method is tested in the Barnett Shale (Fig. 7).
In the first phase, the optimal number of clusters should be specified, which is closely related to the lithological characteristics of the studied formation. To estimate the optimal number of clusters, the SSE curve calculated by Eq. (10) is utilized. The decay of the SSE as a function of the cluster number is plotted in Fig. 8.

Fig. 8 The Sum of Squared Error diagram applied to find the optimal number of clusters in the Barnett Shale Formation

The diagram suggests that a suitable cluster number is K ≅ 5, because there is no significant change in the value of the SSE for more than 5 clusters. In preliminary statistical tests, one can select different cluster numbers in the K-means cluster analysis. With four clusters, the pore-space content could not be separated appropriately, while six groups may result in non-interpretable (non-existing) lithology types.

A comprehensive approach is applied to well log analysis, including deterministic modeling and MFV-based cluster analysis. The former is used for calculating the basic petrophysical parameters (Fig. 9), while the latter provides an improved lithological separation of the unconventional hydrocarbon formation. (We assume the approximation $R_d \approx R_t$.) Shale volume, effective porosity, water (and hydrocarbon) saturation, apparent matrix density, permeability and TOC are calculated using Eqs. (29)–(35). The well log of the ordinal number of clusters (last track) agrees well with the lithological variations and fluid characteristics in the processed interval (tracks 2 and 6). The evaluated values of effective porosity, permeability and TOC are acceptably confirmed by laboratory core measurements in tracks 4, 7 and 8.

Fig. 9 The input well logs, the estimated petrophysical parameters and the result of MFV-CA cluster analysis in the Barnett Shale Formation

The resultant clusters linked to the estimated petrophysical parameters are shown in Fig. 10. The plots present the vertical distribution of the petrophysical parameters, where every measuring point is colored according to its cluster for the hydrocarbon saturation, shale volume and TOC. As can be seen in Figs. 9 and 10, the amount of shaliness correlates strongly with the result of cluster analysis. In the case of the hydrocarbon saturation, the correlation is also good; however, cluster 3 shows relatively high uncertainty (i.e. it covers a wide range of saturation values). The MFV-CA method allows an effective separation of different lithological units, despite the occasional overlap in the individual values of petrophysical parameters. The clusters related to the lithology and pore content are summarized in Table 3. The numerical values also show a good correlation between the shale volume and the gas saturation. For shale volumes higher than 0.2 v/v, the gas saturation significantly increases, too. The oil phase can only be found below a shale volume of 0.15 v/v. In the intervals 8550–8640 ft and 8740–8790 ft, the cluster analysis reveals a change in shale composition relative to its environment.

Fig. 10 Petrophysical parameters associated with clusters determined by the MFV-CA method

Table 3 Shale volume and the saturation of dominant pore fluid-types related to clusters determined by the MFV-CA method

Ordinal number of cluster   Color of cluster   Shale volume (v/v)       Dominant HC type   HC saturation (v/v)
1                           Green              High (0.47 ± 0.13)       Natural gas        High (0.77 ± 0.09)
2                           Yellow             Low (0.13 ± 0.03)        Oil                Medium (0.3 ± 0.15)
3                           Blue               Low (0.18 ± 0.02)        Natural gas        Low to high (0.4 ± 0.4)
4                           Red                Medium (0.27 ± 0.07)     Natural gas        High (0.7 ± 0.13)
5                           Black              Very high (0.8 ± 0.2)    Natural gas        High (0.7 ± 0.1)
The MFV-CA method is applicable to separating not only the shale types but also the fluid types. Cluster 2 identifies the oil intervals well, where some smaller uncertainty may be caused by the varying adsorbed gas content of the Barnett Shale Formation.

7 Discussion

The performance of the newly developed MFV-based clustering (MFV-CA) method is demonstrated in the previous section. In many cases, the classical Euclidean norm-based clustering (E-CA) process shows high sensitivity to outliers; thus, it categorizes geologically similar data points into different clusters. According to our tests made in the Barnett Shale Formation, the result of the MFV-CA method is closer to that of the deterministic evaluation of well logs. The results of the two clustering methods can be compared in Fig. 11. In the figure, two larger zones are emphasized to show that the MFV-CA method usually gives better vertical resolution than E-CA. In addition, because of its high noise sensitivity, the classical E-CA log contains some other, smaller misleading intervals. It must be mentioned that both clustering results were compared to the water saturation log calculated by Eq. (32), which is not an optimal method because of the complexity of the given unconventional reservoir.

Fig. 11 The evaluation results and the vertical resolution of the MFV-CA and the classical E-CA method

8 Conclusions

A novel multivariate statistical approach is suggested for unconventional oilfield well log analysis. The non-hierarchical K-means cluster analysis was improved by the Most Frequent Value method, which provides additional petrophysical information and a robust solution for the exploration of unconventional, geologically complex reservoirs.
By choosing a distance metric using the Cauchy-Steiner weights, one can suppress the harmful effect of outliers and make the cluster analysis of well logs more reliable. One of the advantages of the suggested method is that it does not need the weighting coefficients to be preset, since they are automatically defined for any dataset in the course of the iteration process. Besides effective rock typing, the new technique gives promising results in defining the sub-intervals within the shale formation. In our example, a significant correlation was found between the results of MFV-CA and the lithology and pore content. The water saturation is calculated utilizing a conventional resistivity model, which might influence the results referring to hydrocarbon type and saturation. For a better interpretation, one needs to consider the TOC during the calculation of saturation. The suggested method is relatively quick even in the case of big datasets; the CPU time for the presented dataset was ~25 s on a dual-core personal computer. In the future, the application of more types of well logs may give even more reliable interpretation results, and by the multi-well application of the MFV-CA method, a fast and automated 2D or 3D evaluation of shale gas formations can be performed.

Acknowledgements Our investigation was supported by the National Scientific Research Fund, Project number OTKA K-135323. The second author thanks Michael Holmes, Antony Holmes, Dominic Holmes and Texas Gas Service Plc for the permission to use their digital well logging dataset.

Author contributions Tamás Fancsik: conceptualization, mathematical derivations, inversion. Endre Turai: Induced Polarization methodology. Norbert Péter Szabó: Inversion methodology, review and editing. Judit Somogyiné Molnár: English text editing, visualization. Tünde Edit Dobróka: software developments, tests and figure editing. Mihály Dobróka: inversion, mathematical details, text editing.
Funding Open access funding provided by University of Miskolc. The research was carried out in a project No. K-135323 supported by the National Research, Development and Innovation Office (NKFIH).

Declarations

Conflicts of interest The authors have no conflicts of interest to declare that are relevant to the content of this article.

Availability of data and material Because of the data confidentiality, the experimental data is not published.

Code availability Because of the data confidentiality, the code is not published.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Amundsen L (1991) Comparison of the least-squares criterion and the Cauchy criterion in frequency-wavenumber inversion. Geophysics 56:2027–2038
Archie GE (1942) The electrical resistivity log as an aid in determining some reservoir characteristics. Transactions of the AIME 146:54–62
Asquith G, Krygowski D (2004) Basic Well Log Analysis, 2nd Edition. AAPG Methods in Exploration Series, no. 16
Bárdossy G, Fodor J (2004) Evaluation of uncertainties and risks in geology. New mathematical approaches for their handling. Springer, 221
Bibor I, Szabó NP (2016) Unconventional shale characterization using improved well logging methods. Geosciences and Engineering 5(8):32–50
Bjørlykke K (2015) Unconventional hydrocarbons: oil shales, heavy oil, tar sands, shale oil, shale gas and gas hydrates. Petroleum geoscience - from sedimentary environments to rock physics. Springer-Verlag, Berlin, pp 581–590
Braun BA, Abordán A, Szabó NP (2016) Lithology determination in a coal exploration drill hole using Steiner weighted cluster analysis. Geosciences and Engineering, A Publication of the University of Miskolc 5(8):51–64
Coates GR, Denoo S (1981) The Producibility Answer Product. Schlumberger Technical Review 29(2):55–63
Dobróka M, Gyulai Á, Ormos T, Csókás J, Dresen L (1991) Joint inversion of seismic and geoelectric data recorded in an underground coal mine. Geophys Prospect 39:643–665
Dobróka M, Szegedi H, Somogyiné MJ (2014) A new robust inversion method using Cauchy-Steiner weights and its application in data processing. Near Surface Geoscience 2014 - 20th European Meeting of Environmental and Engineering Geophysics, Athens
Hickey JJ, Henk B (2007) Lithofacies summary of the Mississippian Barnett Shale, Mitchell 2 T.P. Sims well, Wise County, Texas. AAPG Bull 91(4):437–443
Jarvie MD, Hill JR, Ruble ET, Pollastro MR (2007) Unconventional shale-gas systems: the Mississippian Barnett Shale of north-central Texas as one model for thermogenic shale-gas assessment. AAPG Bull 91(4):475–499
Kazmierczuk M, Jarzyna J (2006) Improvement of lithology and saturation determined from well-logging using statistical methods. Acta Geophys 54(4):378–398
Larionov VV (1969) Radiometry of boreholes. Nedra, Moscow (in Russian)
Loucks RG, Ruppel SC (2007) Mississippian Barnett Shale: Lithofacies and depositional setting of a deep-water shale-gas succession in the Fort Worth Basin, Texas. AAPG Bull 91(4):579–601
Montgomery LS, Jarvie MD, Bowker AK, Pollastro MR (2005) Mississippian Barnett Shale, Fort Worth Basin, north-central Texas: Gas-shale play with multi–trillion cubic foot potential. AAPG Bull 89(2):155–175
Paasche H, Tronicke J (2007) Cooperative inversion of 2D geophysical data sets: A zonal approach based on fuzzy c-means cluster analysis. Geophysics 72(3):A35–A39
Passey QR, Creaney S, Kulla BJ, Moretti FJ, Stroud JD (1990) A practical model for organic richness from porosity and resistivity logs. AAPG Bull 74(12):1777–1794
Schmoker JW, Hester TC (1983) Organic carbon in Bakken Formation, United States portion of Williston Basin. AAPG Bull 67(12):2165–2174
Serra O (1984) Fundamentals of well-log interpretation. Elsevier, Amsterdam
Sfidari E, K-Ilkichi A, R-Bbonab H, Soltani B (2014) A hybrid approach for litho-facies characterization in the framework of sequence stratigraphy: A case study from the South Pars gas field, the Persian Gulf basin. J Petrol Sci Eng 121:87–102
Simandoux P (1963) Dielectric measurements in porous media and application to shaly formation. Revue de L’Institut Français du Pétrole, Supplementary Issue 18:193–215
Steiner F (1988) Most frequent value procedures (a short monograf). Geophys Trans 34:139–260
Steiner F (1991) The most frequent value. Introduction to a modern conception of statistics. Academic Press, Budapest
Szabó NP (2011) Shale volume estimation based on the factor analysis of well-logging data. Acta Geophys 59:935–953
Szabó NP, Dobróka M (2013) Extending the application of a shale volume estimation formula derived from factor analysis of wireline logging data. Math Geosci 45:837–850
Szabó NP, Balogh GP, Stickel J (2017) Most frequent value based factor analysis of direct-push logging data. Geophys Prospect 66(3):530–548
Szűcs P, Civan F, Virág M (2006) Applicability of the most frequent value method in groundwater modelling. Hydrogeol J 14:31–43
Vriend SP, van Gaans PFM, Middelburg JJ, de Nijs T (1988) The application of fuzzy c-means cluster analysis and non-linear mapping to geochemical datasets: Examples from Portugal. Appl Geochem 3(2):213–224
Xu J, Xu L, Qin Y (2017) Two effective methods for calculating water saturations in shale-gas reservoirs. Geophysics 82(3):D187–D197
Zhang J (2017) Most frequent value statistics and distribution of 7Li abundance observations. Mon Not R Astron Soc 468(4):5014–5019
Zhang J (2018) Most frequent value statistics and the Hubble constant. Publ Astron Soc Pac 130(990):1538–3873


Publisher: Springer Journals
Copyright: Copyright © The Author(s) 2021
ISSN: 2213-5812
eISSN: 2213-5820
DOI: 10.1007/s40328-021-00358-0

Abstract

The identification of lithology, fluid types, and total organic carbon content is of great priority in the exploration of unconventional hydrocarbons. As a new alternative, a further developed K-means type clustering method is suggested for the evaluation of shale gas formations. The traditional approach of cluster analysis is mainly based on the use of the Euclidean distance for grouping the objects of multivariate observations into different clusters. The high sensitivity of the $L_2$ norm to non-Gaussian distributed measurement noise is well known; it can be reduced by selecting a more suitable norm as the distance metric. To suppress the harmful effect of non-systematic errors and outlying data, the Most Frequent Value method as a robust statistical estimator is combined with the K-means clustering algorithm. The Cauchy-Steiner weights calculated by the Most Frequent Value procedure are applied to measure the weighted distance between the objects, which improves the performance of cluster analysis compared to the Euclidean norm. At the same time, the centroids are also calculated as a weighted average (using the Most Frequent Value method), instead of applying the arithmetic mean. The suggested statistical method is tested using synthetic datasets as well as observed wireline logs, mud-logging data and core samples collected from the Barnett Shale Formation, USA. The synthetic experiment using extremely noisy well logs demonstrates that the newly developed robust clustering procedure is able to separate the geological-lithological units in hydrocarbon formations and provide additional information to standard well log analysis. It is also shown that the Cauchy-Steiner weighted cluster analysis is affected less by outliers, which allows a more efficient processing of poor-quality wireline logs and an improved evaluation of shale gas reservoirs.

Keywords Most Frequent Value · K-means clustering · Robust · Well log · Shale gas · Barnett Shale

* N. P. Szabó, norbert.szabo.phd@gmail.com
Department of Geophysics, University of Miskolc, 3515 Miskolc-Egyetemváros, Hungary
Geoengineering Research Group, MTA-ME, University of Miskolc, 3515 Miskolc-Egyetemváros, Hungary

1 Introduction

The exploration and production of unconventional reservoirs are performed with ever-increasing intensity all over the world. By using massive hydraulic fracturing and horizontal drilling technology, low permeability hydrocarbon-bearing formations may be produced more easily and more economically in commercial quantities to supply greater regions with energy. Among the great variety of rock types, this paper focuses mainly on the investigation of shale gas reservoirs. The Barnett Shale Formation, studied in this paper, is one of the earliest discovered onshore shales having a great amount of producible gas reserves (Jarvie et al. 2007). The presence of strata having the potential to store hydrocarbon does not mean that one has a direct hydrocarbon occurrence; thus, the accurate and reliable determination of fluid content and the volume of reserves is crucial. Since the unconventional reservoirs are usually complex multi-mineral formations, the conventional well-log-analysis methods are rarely applicable, because several petrophysical properties as unknowns may influence the measurements. Empirical estimation methods of water saturation (Archie 1942) and total organic carbon (TOC) content (Passey et al. 1990) should be revised in the given exploration area (Bibor and Szabó 2016; Xu et al. 2017). Exploratory statistical tools such as principal component, factor and cluster analysis can be used as effective data processing tools for rock typing, the estimation of petrophysical properties and the replacement of missing measurements in well logging applications.
For instance, the factor analysis of wireline logs allows an improved characterization of shaly sand formations (Szabó 2011; Szabó and Dobróka 2013). In this study, a novel approach for cluster analysis is presented to subdivide the lithological units of shale gas reservoirs and infer the fluid content based on well logs originating from the Barnett Shale Formation.

The traditional method of cluster analysis has been widely used in geosciences, e.g. for the processing of geochemical data acquired from water wells (Vriend et al. 1988). Kazmierczuk and Jarzyna (2006) studied its possibilities in lithology and hydrocarbon saturation determination using well logging measurements in Poland. Paasche and Tronicke (2007) used the fuzzy K-means cluster analysis successfully for subsurface zonation and petrophysical modelling. The fuzzy statistics formed the basis of risk and uncertainty analysis in Hungarian bauxite exploration (Bárdossy and Fodor 2004). Sfidari et al. (2014) utilized cluster analysis incorporated in a hybrid approach for lithofacies mapping in sequence stratigraphy in the South Pars gas field, Persian Gulf basin.

The robust non-hierarchical clustering algorithm proposed in this paper makes use of the Most Frequent Value (MFV) technique, which is known as a highly efficient robust statistical estimator (Steiner 1991). The MFV method relies on the minimization of the information loss (relative entropy), practically regardless of the given error distribution. Dobróka et al. (1991) developed a geophysical joint inversion method using the MFV procedure for the joint interpretation of seismic and geoelectric data collected in an underground coal mine. Szűcs et al. (2006) showed the application possibilities of the MFV method in hydrogeological modelling. Dobróka et al. (2014) suggested a series expansion-based inversion method for the calculation of the Fourier transform using the Cauchy-Steiner weights estimated by the MFV method.
It was shown that the MFV method has high noise rejection capability and gives a better estimate of the frequency spectrum of seismic signals than the conventional Discrete Fourier Transformation method. The MFV technique was also implemented in factor analysis of engineering geophysical sounding data by Szabó et al. (2017) to give a reliable estimation for the petrophysical properties of shallow unsaturated formations. Nowadays, astrophysicists and cosmologists are also starting to recognize the importance of the MFV method as they apply it in calculating the abundance of primordial Li (Zhang 2017) and the value of the Hubble constant (Zhang 2018).

In an earlier study, the combination of the MFV method and cluster analysis was used to separate clayey-shaly coal formations in Hungary (Braun et al. 2016). In this procedure, the distance metric was defined using the MFV method while the centroid coordinates were calculated traditionally as the arithmetic mean. It was demonstrated that the suggested method of cluster analysis gave an outlier-free solution and better resolution of the lithology compared to the Euclidean norm-based clustering approach. The noise rejection capability of the MFV-based classification procedure was much better than that of the Euclidean norm-based K-means cluster analysis. In this paper, the previously developed method is improved by also using the Cauchy-Steiner weighted mean in determining the centroids of all clusters. This paper aims to prove the feasibility of the new robust MFV-based clustering method in the Barnett Shale Formation as a typical organic-rich source rock and to propose its use for more robust processing of wireline logging data and advanced evaluation of shale gas and other types of unconventional reservoirs.
2 Non-hierarchical cluster analysis of well logs

Cluster analysis (CA) is an exploratory statistical method that aims to order the objects of multivariate observations into groups using a given distance metric. This classification procedure uses only the information found in the dataset to associate the objects by their similarities. The target is to sort the objects into non-overlapping groups in such a manner that the objects within a group do not differ too much, while they do between the groups. A successful clustering involves a great homogeneity within a group and a large difference between the clusters.

In formulating the statistical problem, let vectors $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ denote two multivariate objects from the population of N random variables $X_1,\ldots,X_N$. In well log analysis, $X_n$ indicates a physical variable measured along the borehole by the n-th well logging tool ($1 \le n \le N$). In a more detailed way, the i-th and j-th vectors representing two objects in the N-dimensional data space can be written as $\mathbf{x}^{(i)} = [x_1^{(i)},\ldots,x_N^{(i)}]^T$ and $\mathbf{x}^{(j)} = [x_1^{(j)},\ldots,x_N^{(j)}]^T$ ($1 \le i, j \le M$, M being the number of depth points). To group the objects into clusters, a measure of the similarity has to be specified. In most cases, this is an appropriate definition of distance. The Minkowski distance between the vectors $\mathbf{x}^{(i)}$, $\mathbf{x}^{(j)}$ is defined as the $L_p$ norm of their difference

$$D_p = \left[\sum_{k=1}^{N} \left|x_k^{(i)} - x_k^{(j)}\right|^{p}\right]^{1/p}. \tag{1}$$

The two most frequently used special cases of the Minkowski norm are the $L_1$ and $L_2$ norms belonging to p = 1 and p = 2, respectively. The $L_1$ norm

$$D_1 = \sum_{k=1}^{N} \left|x_k^{(i)} - x_k^{(j)}\right| \tag{2}$$

is called “Manhattan” or “City block” distance in cluster analysis. The Euclidean or $L_2$ norm is one of the most frequently used distance metrics in cluster analysis, which is calculated as

$$D_2 = \sqrt{\sum_{k=1}^{N} \left(x_k^{(i)} - x_k^{(j)}\right)^{2}}. \tag{3}$$

The measured variables are usually contaminated by some amount of noise and can be cross-correlated.
For this case, the Mahalanobis distance can be preferably used in clustering algorithms

$$D_M = \left[\left(\mathbf{x}^{(i)} - \mathbf{x}^{(j)}\right)^{T} \mathbf{C}^{-1} \left(\mathbf{x}^{(i)} - \mathbf{x}^{(j)}\right)\right]^{1/2}, \tag{4}$$

where $\mathbf{C}$ is the covariance matrix of the measured variables including the data variances in its main diagonal. In a non-correlated case, the Mahalanobis distance can imply the use of the weighted distance

$$D_w = \sqrt{\frac{1}{\sum_{q=1}^{N} w_q} \sum_{k=1}^{N} w_k \left(x_k^{(i)} - x_k^{(j)}\right)^{2}} \tag{5}$$

with the weight $w_k$ ordered to the k-th datum. The Euclidean norm is most successfully used when the data noises follow Gaussian distribution. When the distribution is non-Gaussian (even with outliers in the dataset) the Manhattan distance gives a more robust approach in clustering. Solving geophysical inverse problems, it was proved by Amundsen (1991) that the weighted average using the Cauchy weights $w = S^2/(S^2 + e^2)$ in calculating the misfit function can give a robust result (here $e$ is the residual and $S$ is the a priori known scale parameter). In the framework of the Most Frequent Value (MFV) method, a similar weight was given by Steiner (1988)

$$w^{St} = \frac{\varepsilon^{2}}{\varepsilon^{2} + e^{2}}, \tag{6}$$

where the so-called dihesion $\varepsilon$ is derived from the dataset in an inner iteration procedure. It was proved that solving inverse problems with the use of the Steiner weights $w^{St}$ may result in robust parameter estimation (Dobróka et al. 1991; Szűcs et al. 2006; Szabó et al. 2017). Thus, we introduce the Steiner weights in Eq. (5) and give the robust distance definition as

$$D_{St} = \sqrt{\frac{1}{\sum_{q=1}^{N} w_q^{St}} \sum_{k=1}^{N} w_k^{St} \left(x_k^{(i)} - x_k^{(j)}\right)^{2}}. \tag{7}$$

There are several kinds of techniques to perform cluster analysis. One of the most common non-hierarchical methods is partitioning, which handles great datasets quickly; however, the results might be affected by the initial selection of centroids. (The cluster center is traditionally defined as the mean of the cluster elements.) In non-hierarchical clustering, one of the most popular partitioning algorithms is K-means clustering.
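The distance definitions in Eqs. (1), (5), (6) and (7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the dihesion is passed in as a fixed number (`dihesion`) instead of being derived by the inner MFV iteration, the residual in Eq. (6) is taken to be the coordinate difference of the two objects, and all function names are hypothetical.

```python
import numpy as np


def minkowski_distance(x_i, x_j, p=2.0):
    """Eq. (1): L_p norm of the difference of two objects."""
    diff = np.abs(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


def steiner_weights(residuals, dihesion):
    """Eq. (6): w = eps^2 / (eps^2 + e^2).

    Assumption: the dihesion eps is supplied by the caller; the paper
    derives it from the data in an inner iteration.
    """
    e = np.asarray(residuals, dtype=float)
    eps2 = float(dihesion) ** 2
    return eps2 / (eps2 + e ** 2)


def weighted_distance(x_i, x_j, weights):
    """Eq. (5): weight-normalised root of the sum of squared differences."""
    d = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(w * d ** 2) / np.sum(w)))


def steiner_distance(x_i, x_j, dihesion=1.0):
    """Eq. (7): weighted distance with Steiner weights.

    Assumption: the residual e_k is the k-th coordinate difference.
    """
    d = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return weighted_distance(x_i, x_j, steiner_weights(d, dihesion))


# Two objects that agree in one log but differ wildly in the other,
# as happens when a single log carries an outlier.
a = np.array([0.0, 0.0])
b = np.array([1.0, 100.0])
euclid = minkowski_distance(a, b, p=2)
robust = steiner_distance(a, b, dihesion=1.0)
```

In this toy case the large difference in the second coordinate receives a very small weight, so the Steiner distance stays far below the Euclidean one. The sketch assumes the logs have been brought to comparable scales, since a single dihesion is used for all dimensions.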
K-means is a prototype-based technique that attempts to find a pre-defined number of clusters (K) including their initial elements and centroids. During the processing of the measured data, each object is assigned to the closest centroid, forming a new cluster. This given configuration is iteratively improved by re-calculating the centroids and their Euclidean or other distances from the objects. After a required number of iterations, the centroid positions no longer change considerably and the clustering procedure is stopped. Defining the centroid as the arithmetic mean of the coordinates of the members of the cluster gives acceptable results only when the data noise follows Gaussian distribution. A robust centroid definition for the k-th dimension can be given by applying Steiner weights as

$$c_{ki}^{St} = \frac{1}{\sum_{q=1}^{M_i} w_q^{St}} \sum_{p=1}^{M_i} w_p^{St}\, x_k^{(p)}, \tag{8}$$

where $M_i$ is the number of the data belonging to the i-th cluster and the sums run over the objects $\mathbf{x}^{(p)}$ of that cluster. With this centroid, the robust (MFV-based) distance definition in Eq. (7) takes the form

$$D_{St} = \sqrt{\frac{1}{\sum_{q=1}^{N} w_q^{St}} \sum_{k=1}^{N} w_k^{St} \left(x_k^{(j)} - c_{ki}^{St}\right)^{2}}, \tag{9}$$

where $c_{ki}^{St}$ is the Steiner centroid of the i-th cluster in the k-th dimension. In K-means clustering, the Sum of Squared Error (SSE) is usually calculated for estimating the optimal number of clusters, which measures the robust distance of the observations to their closest centroids

$$SSE = \sum_{i=1}^{K} \sum_{m=1}^{M_i} D_{St}^{2}\left(\mathbf{x}_m^{(i)}, \mathbf{c}_i\right), \tag{10}$$

where $\mathbf{x}_m^{(i)}$ is the m-th object of the i-th cluster, $\mathbf{c}_i$ represents the centroid of the i-th cluster and $M_i$ is the total number of objects belonging to the same group. The centroid is calculated as the weighted mean of the objects forming a cluster. The SSE versus cluster number plot describes the variation within the clusters. The SSE is zero when K equals the number of different data objects. The optimal number of clusters is indicated on the plot where there is no longer considerable variation in the value of SSE while it converges to zero (known as the elbow method).
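The clustering loop described above, with the Steiner-weighted centroid of Eq. (8), the robust distance of Eq. (9) and the SSE of Eq. (10), can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: the dihesion is a fixed user-supplied value, the centroid iteration starts from the per-dimension median, and the function names and the toy data are hypothetical.

```python
import numpy as np


def steiner_weights(residuals, dihesion):
    """Eq. (6) with a fixed dihesion (assumption: no inner MFV iteration)."""
    eps2 = dihesion ** 2
    return eps2 / (eps2 + residuals ** 2)


def steiner_centroid(points, dihesion=1.0, n_iter=20):
    """Eq. (8): per-dimension Steiner-weighted mean, iteratively reweighted."""
    pts = np.asarray(points, dtype=float)
    c = np.median(pts, axis=0)  # assumed starting value
    for _ in range(n_iter):
        w = steiner_weights(pts - c, dihesion)
        c = np.sum(w * pts, axis=0) / np.sum(w, axis=0)
    return c


def robust_distance(x, c, dihesion=1.0):
    """Eq. (9): Steiner-weighted distance along the last axis."""
    d = x - c
    w = steiner_weights(d, dihesion)
    return np.sqrt(np.sum(w * d ** 2, axis=-1) / np.sum(w, axis=-1))


def mfv_kmeans(data, k, init=None, dihesion=1.0, n_iter=50, seed=0):
    """K-means loop with robust distances and Steiner-weighted centroids."""
    data = np.asarray(data, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centroids = data[rng.choice(len(data), size=k, replace=False)]
    else:
        centroids = np.asarray(init, dtype=float)
    for _ in range(n_iter):
        # distances of every object to every centroid, shape (M, k)
        dist = robust_distance(data[:, None, :], centroids[None, :, :], dihesion)
        labels = np.argmin(dist, axis=1)
        new = np.array([
            steiner_centroid(data[labels == i], dihesion)
            if np.any(labels == i) else centroids[i]  # keep empty clusters in place
            for i in range(k)
        ])
        converged = np.allclose(new, centroids, atol=1e-8)
        centroids = new
        if converged:
            break
    dist = robust_distance(data[:, None, :], centroids[None, :, :], dihesion)
    labels = np.argmin(dist, axis=1)
    sse = float(np.sum(np.min(dist, axis=1) ** 2))  # Eq. (10)
    return labels, centroids, sse


# Toy data: two tight groups plus one object with a spike in the second log.
data = np.array([[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.05, 0.05], [0.0, 50.0],
                 [10.0, 10.0], [10.1, 9.9], [9.9, 10.1], [10.05, 9.95]])
labels, centroids, sse = mfv_kmeans(data, k=2, init=[[1.0, 1.0], [9.0, 9.0]])
```

In this toy run the spiked object stays in the first group and barely moves its centroid, whereas an arithmetic mean would be pulled strongly towards the spike. Calling `mfv_kmeans` for a range of `k` values and plotting the returned `sse` gives the elbow diagram described above.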
Choosing a greater value for the number of clusters hardly adds any more valuable information and makes the interpretation ambiguous or more complicated.

3 Numerical study of the MFV-based cluster analysis

The performance of the MFV-based clustering method can be studied numerically by using synthetic data with predefined model parameters (petrophysical parameters, layer boundaries). The model assumes that the petrophysical parameters of the four-layered homogeneous model are known (Table 1), where the synthetic model supposes the presence of hydrocarbon and water-bearing zones. The calculated well logs (synthetic data) were contaminated with 3% Gaussian noise and artificial outliers to test the ability of the method to suppress the effect of outliers. The depth interval between the observation points is 0.1 m. The noisy synthetic well logs are shown in Fig. 1.

Table 1 The petrophysical model of an organic rich tight formation for synthetic data calculation

Layer   Thickness (m)   Φ      S_w   V_cl   V_c     V_k
1       8               0.02   0.9   0.4    0.001   0.002
2       18              0.03   1.0   0.7    0.1     0.017
3       8               0.02   0.7   0.15   0.25    0.12
4       10              0.01   0.6   0.3    0.5     0.032

Denotations: porosity (Φ, [v/v]), water saturation (S_w, [v/v]), clay volume (V_cl, [v/v]), carbonate volume (V_c, [v/v]), and kerogen volume (V_k, [v/v]). (Quartz volume is derived from the material balance equation as V_q = 1 − Φ − V_cl − V_c − V_k)

Fig. 1 The synthetic well-logging data contaminated by 3% Gaussian distributed noise. Denotations: GR [API] is natural gamma-ray intensity log, SGR [%, ppm] is spectral gamma-ray intensity log (black curve is potassium (K [%]), the red curve is uranium (U [ppm]), and green is thorium concentration (TH [ppm])), Δt [μs/m] is the compressional acoustic slowness log, Φ_N [v/v] is neutron log, ρ_b [g/cm3] is bulk density log, PE [barn/e] is photoelectric absorption index, and RT [Ohm-m] is deep resistivity log

The relationship between the petrophysical properties and the logging data can be represented by a set of equations called response functions. The synthetic data used for testing the traditional and MFV cluster techniques were calculated using the following response functions:

$$GR = \rho_b^{-1}\left(V_k\, GR_k\, \rho_k + V_{sh}\, GR_{sh}\, \rho_{sh} + \sum_{i=1}^{n} V_{ma,i}\, GR_{ma,i}\, \rho_{ma,i}\right), \tag{11}$$

$$K = \rho_b^{-1}\left(V_k\, K_k\, \rho_k + V_{sh}\, K_{sh}\, \rho_{sh} + \sum_{i=1}^{n} V_{ma,i}\, K_{ma,i}\, \rho_{ma,i}\right), \tag{12}$$

$$U = \rho_b^{-1}\left(V_k\, U_k\, \rho_k + V_{sh}\, U_{sh}\, \rho_{sh} + \sum_{i=1}^{n} V_{ma,i}\, U_{ma,i}\, \rho_{ma,i}\right), \tag{13}$$

$$Th = \rho_b^{-1}\left(V_k\, Th_k\, \rho_k + V_{sh}\, Th_{sh}\, \rho_{sh} + \sum_{i=1}^{n} V_{ma,i}\, Th_{ma,i}\, \rho_{ma,i}\right), \tag{14}$$

$$\rho_b = \Phi\left[\rho_w S_w + \rho_g\left(1 - S_w\right)\right] + V_k \rho_k + V_{sh} \rho_{sh} + \sum_{i=1}^{n} V_{ma,i}\, \rho_{ma,i}, \tag{15}$$

$$\Phi_N = \Phi\left[\Phi_{N,w} S_w + \Phi_{N,g}\left(1 - S_w\right)\right] + V_k \Phi_{N,k} + V_{sh} \Phi_{N,sh} + \sum_{i=1}^{n} V_{ma,i}\, \Phi_{N,ma,i}, \tag{16}$$

$$\Delta t = \Phi\left[\Delta t_w S_w + \left(1 - S_w\right)\Delta t_g\right] + V_k \Delta t_k + V_{sh} \Delta t_{sh} + \sum_{i=1}^{n} V_{ma,i}\, \Delta t_{ma,i}, \tag{17}$$

$$P_e = \Phi\left[S_w U_w + \left(1 - S_w\right) U_g\right] + V_k U_k + V_{sh} U_{sh} + \sum_{i=1}^{n} V_{ma,i}\, U_{ma,i}, \tag{18}$$

$$\frac{1}{\sqrt{R_t}} = \left[\frac{V_{sh}^{\left(1 - 0.5 V_{sh}\right)}}{\sqrt{R_{sh}}} + \frac{\sqrt{\Phi^{m}}}{\sqrt{a R_w}}\right] S_w^{n/2}, \tag{19}$$

$$\Phi + V_k + V_{sh} + \sum_{i=1}^{n} V_{ma,i} = 1, \tag{20}$$

where $V_{ma,i}$ (v/v) refers to the fractional volume of the i-th matrix constituent and $V_k$ is the volume of kerogen. The total number of mineral components is denoted by n, the fractional volume of pore space that is free of shale is labelled by Φ (v/v), and the water saturation fraction in the uninvaded zone is labelled by $S_w$. The physical properties of mud filtrate (mf), hydrocarbon (h), shale (sh), and the rock matrix (ma) are expressed by the zone parameters in Eqs. (11)–(16). Archie’s parameters are the following: a (tortuosity factor), n (saturation exponent), m (cementation exponent). Equation (20) is the material balance equation, which represents a constraint in solving the inverse problem and is used to derive the volume of quartz. The layer boundaries can be picked manually or from the description of cutting samples, but the cluster techniques can detect them automatically.
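As a small worked illustration of the material balance in Eq. (20) and the bulk density response in Eq. (15), the sketch below derives the quartz volume of the four layers of Table 1 and evaluates a synthetic density value for each. The layer volumes come from Table 1; the zone parameters in `RHO` are placeholder textbook-style densities chosen for illustration only and are not the values used in the paper, and treating the clay volume as the shale term with quartz and carbonate as the two matrix constituents is likewise an assumption.

```python
import numpy as np

# Table 1: porosity, water saturation, clay, carbonate and kerogen volume (v/v)
layers = np.array([
    [0.02, 0.9, 0.40, 0.001, 0.002],
    [0.03, 1.0, 0.70, 0.100, 0.017],
    [0.02, 0.7, 0.15, 0.250, 0.120],
    [0.01, 0.6, 0.30, 0.500, 0.032],
])
phi, s_w, v_cl, v_c, v_k = layers.T

# Eq. (20) rearranged for quartz: V_q = 1 - phi - V_cl - V_c - V_k
v_q = 1.0 - phi - v_cl - v_c - v_k

# Hypothetical zone parameters in g/cm3 (illustrative values, not from the paper)
RHO = {"w": 1.0, "g": 0.2, "k": 1.3, "sh": 2.5, "quartz": 2.65, "carbonate": 2.71}


def bulk_density(phi, s_w, v_k, v_sh, v_ma, rho_ma, rho=RHO):
    """Eq. (15): volume-weighted sum of fluid, kerogen, shale and matrix densities."""
    fluid = phi * (rho["w"] * s_w + rho["g"] * (1.0 - s_w))
    matrix = sum(v * r for v, r in zip(v_ma, rho_ma))
    return fluid + v_k * rho["k"] + v_sh * rho["sh"] + matrix


rho_b = bulk_density(phi, s_w, v_k, v_cl,
                     v_ma=(v_q, v_c),
                     rho_ma=(RHO["quartz"], RHO["carbonate"]))
```

The other linear response functions follow the same pattern, with the densities replaced by the corresponding zone parameters of each tool.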
The accuracy and reliability of the results of the cluster technique depend on the initial location of the centroids as well as on how well the influence of outliers is avoided. The resulting clusters from both clustering techniques are shown in Fig. 2. The clusters derived with the Euclidean distance are not well separated, and fake clusters appear at the points of the outliers. In contrast, the Euclidean distance modified by the Steiner weights gives a smoother solution. The purpose of the clustering is to define the different rock types, where the robustness of the cluster technique can be measured by how much the assignment of the data points to the different clusters is altered. Consequently, the rock typing based on the MFV-based cluster technique can be much more robust than that derived from the traditional Euclidean distance clustering.

Fig. 2 The resulting clusters from the traditional K-means clustering and the MFV-based clustering methods

Comparing the results, it can be shown that the MFV-based clustering is more robust and has a better ability to reject the outliers. Besides that, the MFV is independent of the initial location of the centroid. To check the stability of the MFV cluster, the test was repeated 100 times with randomly chosen initial centroids to show the change in the convergence of the clustering algorithm and its independence of the initial location of the centroid. Figure 3 shows remarkable stability of the results of the MFV-based method. Meanwhile, in the case of the traditional cluster analysis method, Fig. 4 shows a high dependency on the initial location of the centroid; furthermore, it proves the high influence of the presence of outliers.

Fig. 3 The mean of the results of 100-times repeated tests using the MFV-based cluster method

Fig. 4 The mean of the results of 100-times repeated tests using the traditional K-means cluster method

Table 2 shows the descriptive statistical parameters for both the traditional Euclidean distance and the Euclidean distance modified by the Steiner weights (Steiner distance). The statistical study shows that the SSE of the Steiner distance is lower than half of that of the Euclidean distance. Moreover, the standard deviation is more than doubled in the case of the traditional Euclidean distance, with a very high range. The descriptive statistical study shows stability in the centroid with each iteration whatever the initial location of the centroids. Furthermore, the MFV-based clustering shows a high capability of outlier rejection, as evidenced by Fig. 5, which shows the statistical distribution of the Euclidean distance and Steiner distance within the synthetic well logging data.

Table 2 Descriptive statistical parameters for comparing the Euclidean and Steiner distances

                     Mean   Standard deviation   Range   SSE
Euclidean distance   3.4    5.5                  40.4    1522.3
Steiner distance     1.6    1.5                  9       547.7

Fig. 5 Frequency plot of both the Euclidean and Steiner distances

The tests on synthetic datasets show appreciable stability and outlier resistance of the MFV-based robust clustering procedure. Thus, it is straightforward to apply it in a real field case, which is the Barnett Shale Formation in Texas, USA.

4 Geological setting of the in-field study area

Shale formations mainly consist of quartz, feldspar, clay, carbonate and other minerals. The relative amount of these components can vary; however, a significant part of the high-quality reservoirs is chiefly silica-bearing. The mineral composition of the greatest shale reservoirs in the USA is illustrated in Fig. 6. The shale-type reservoir rocks were formed by deposition of fine-grained (clay- to silt-sized) clastic sediments coupled with high organic matter content.
Since their hydrocarbons have not undergone any migration, these rocks act simultaneously as source, reservoir and seal (Loucks and Ruppel 2007). The intrinsic permeability of these reservoirs is extremely low, generally smaller than 0.1 mD; their effective porosity is at most a few percent, and the pore-throat diameters range between 2 and 200 nm. The hydrocarbon can be found in the nanometer-sized pore spaces, linked to the residues of organic matter or on the surface of clay minerals (Bjørlykke 2015).

Fig. 6 Ternary plot showing the average mineral composition of great shale reservoirs in the USA

The Barnett Shale Formation investigated in this study is located in the Bend Arch–Fort Worth Basin (Texas, USA), but some other occurrences are also known in the Hardeman, Kerr and Marfa Basins (Jarvie et al. 2007). It is a siliceous, calcareous and dolomitic sedimentary formation, which is rich in silicates (30–50%) but poor in clay minerals (< 30%). The reservoir is thermally mature and Early to Middle Carboniferous in age (Mississippian epoch); it consists mainly of silicate, carbonate and dolomitic sediments (Montgomery et al. 2005) with abundant pyrite and phosphate content (Hickey and Henk 2007). The formation can be subdivided into five lithofacies based on cores and outcrops, i.e. black shale, calcareous black shale, dolomitic black shale, phosphatic black shale and lime grainstone, owing to extensive early microbial alteration of sufficient organic matter (Hickey and Henk 2007) and to the poor circulation link with the open ocean and the euxinic ocean bottom waters (Loucks and Ruppel 2007). The average depth of the formation is about 2,550 m, its thickness varies between 15 and 122 m (Montgomery et al. 2005), and it is overlain and underlain by impermeable limestone. It contains dry natural gas and oil, but the latter is present in much lower quantity.
The Forestburg Limestone Formation divides the Barnett Shale Formation into two parts. The lower and upper parts are similar, and there is no significant difference in their petrology (Jarvie et al. 2007). The general geological setting of the Barnett Shale Formation is shown in Fig. 7.

Fig. 7 Cross-section of wells showing the general lithostratigraphic position of the Barnett Shale Formation (Courtesy of Loucks and Ruppel 2007)

5 Deterministic interpretation of real well logging data

In the Barnett Shale Formation, the following well log types are used for cluster analysis: natural gamma-ray intensity (GR [API]), compressional wave slowness (Δt [µs/ft]), bulk density ($\rho_b$ [g/cm³]), neutron porosity ($\Phi_N$ [v/v]) and deep resistivity ($R_d$ [ohmm]). The observed length of the investigated borehole is 303 ft (92.35 m), where the general rock type is mainly dolomitic shale with some smaller interbedded clean shale zones. The sampling interval of the well logs is 0.1 ft (3.048 cm). Besides the input well logs, core measurements are also available, among others the porosity, permeability and total organic matter content.

The following petrophysical quantities, used as reference in cluster analysis, are calculated by deterministic well log analysis approaches (Serra 1984). The shale volume ($V_{sh}$ [v/v]) is estimated from the GR log using the non-linear formula of Larionov (1969), which was originally suggested for rocks older than Tertiary

$$V_{sh} = 0.33\left(2^{2 i_\gamma} - 1\right), \tag{29}$$

where $i_\gamma$ is the natural gamma-ray index.
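As a small worked example of the shale volume estimate, the Python sketch below computes the gamma-ray index and applies the Larionov formula for rocks older than Tertiary, using the coefficient 0.33 of its standard form. The linear definition of the gamma-ray index from the clean-sand and shale baselines (`gr_min`, `gr_max`) is the usual convention and is an assumption here, since the text does not spell it out; the function names and readings are illustrative only.

```python
def gamma_ray_index(gr, gr_min, gr_max):
    """Linear gamma-ray index, clipped to the interval [0, 1].

    gr_min and gr_max are the assumed clean and shale baselines (API).
    """
    i_gamma = (gr - gr_min) / (gr_max - gr_min)
    return min(max(i_gamma, 0.0), 1.0)


def larionov_older_rocks(i_gamma):
    """Larionov (1969) shale volume for rocks older than Tertiary."""
    return 0.33 * (2.0 ** (2.0 * i_gamma) - 1.0)


# Hypothetical reading halfway between the two baselines.
i_gamma = gamma_ray_index(gr=90.0, gr_min=30.0, gr_max=150.0)
print("gamma-ray index:", i_gamma)
print("shale volume   :", larionov_older_rocks(i_gamma))
```

The non-linear form gives a lower shale volume than the index itself at intermediate readings (0.33 at an index of 0.5) while still reaching approximately 1 in pure shale.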
The effective porosity ($\Phi_{eff,d}$ [v/v]), as a related quantity, is derived from the bulk density log

$$\Phi_{eff,d} = \Phi_d - V_{sh}\,\Phi_{sh,d}, \tag{30}$$

where $\Phi_{sh,d}$ [v/v] denotes the shale porosity derived from the density log, and $\Phi_d$ [v/v] is the density-derived porosity calculated as

$$\Phi_d = \frac{\rho_{ma} - \rho_b}{\rho_{ma} - \rho_f}, \tag{31}$$

where $\rho_{ma}$ is the density of the rock matrix and $\rho_f$ is that of the pore fluid, often assumed to be the mud-filtrate density, both measured in g/cm³. Several empirical models can be used to calculate the water saturation of hydrocarbon reservoirs ($S_w$ [v/v]). In this study, the Simandoux (1963) equation is used

$$S_w = \frac{aR_w}{2\Phi^{m}}\left\{\left[\left(\frac{V_{sh}}{R_{sh}}\right)^{2} + \frac{4\Phi^{m}}{aR_wR_t}\right]^{1/2} - \frac{V_{sh}}{R_{sh}}\right\}, \tag{32}$$

where a is the tortuosity factor, m is the cementation exponent, Φ is the neutron porosity, $R_w$ is the formation water resistivity, $R_t$ is the true resistivity and $R_{sh}$ is the shale resistivity. In organic-rich shale reservoirs, it is advisable to include the total organic matter content (TOC), and conductive minerals if present, in the calculations to give a more accurate estimate of water/gas saturation (Xu et al. 2017). The TOC [%] can be related to the bulk density

$$TOC = \frac{A}{\rho_b} - B, \tag{33}$$

where A and B are gas and formation specific constants (Schmoker and Hester 1983). The above formula can be used in many types of shale reservoirs, mostly when the mineral composition and porosity show little variation along the formation and the organic matter is of low mass density. Using the density and porosity logs, the apparent matrix density can be easily derived (Asquith and Krygowski 2004)

$$\rho_{a,ma} = \frac{\rho_b - \Phi\,\rho_f}{1 - \Phi}. \tag{34}$$

The permeability of the reservoir (K [mD]) is calculated by the model of Coates and Denoo (1981)

$$K^{1/2} = 100\,\frac{\Phi^{2}\left(1 - S_{w,irr}\right)}{S_{w,irr}}, \tag{35}$$

where $S_{w,irr}$ [v/v] is the irreducible water saturation.

6 The result of cluster analysis

The MFV-CA method is tested in the Barnett Shale (Fig. 7).
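As a reminder of what distinguishes MFV-CA from classical K-means, the Python sketch below shows the two robust ingredients in isolation: a centroid computed as the most frequent value of a sample, and a distance in which each squared deviation is multiplied by a Cauchy-Steiner weight ε²/(ε² + r²). The fixed-point iteration follows the commonly cited form of Steiner's procedure; the starting values, the stopping rule and the function names `mfv` and `steiner_distance` are assumptions made for this illustration and do not reproduce the authors' implementation.

```python
import numpy as np


def mfv(x, n_iter=100, tol=1e-10):
    """Return the most frequent value M and the dihesion eps of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)                          # starting location (assumption)
    eps2 = 0.75 * (x.max() - x.min()) ** 2    # (sqrt(3)/2 * range)^2
    if eps2 == 0.0:                           # constant sample: nothing to do
        return float(m), 0.0
    for _ in range(n_iter):
        r2 = (x - m) ** 2
        denom = (eps2 + r2) ** 2
        # Scale update; the floor only guards against division by zero.
        eps2_new = max(3.0 * np.sum(r2 / denom) / np.sum(1.0 / denom), 1e-12)
        w = eps2_new / (eps2_new + r2)        # Cauchy-Steiner weights
        m_new = np.sum(w * x) / np.sum(w)     # weighted average as centroid
        done = abs(m_new - m) < tol and abs(eps2_new - eps2) < tol
        m, eps2 = m_new, eps2_new
        if done:
            break
    return float(m), float(np.sqrt(eps2))


def steiner_distance(x, centroid, eps):
    """Weighted Euclidean distance; eps holds one positive scale per log."""
    r2 = (np.asarray(x, dtype=float) - np.asarray(centroid, dtype=float)) ** 2
    eps2 = np.asarray(eps, dtype=float) ** 2
    w = eps2 / (eps2 + r2)                    # large deviations get small weights
    return float(np.sqrt(np.sum(w * r2)))


# Hypothetical log readings of one cluster with a single outlier.
readings = [0.9, 0.95, 1.0, 1.05, 1.1, 50.0]
m, eps = mfv(readings)
print("arithmetic mean:", np.mean(readings))
print("MFV centroid   :", m, "dihesion:", eps)
print("Steiner distance:", steiner_distance([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))
```

In this toy sample the arithmetic mean is pulled far towards the outlier, whereas the weighted centroid stays with the bulk of the readings, because the weight of the outlying value becomes negligible as the scale ε shrinks during the iteration.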
In the first phase, the optimal number of clusters should be specified, which is highly related to the lithological characteristics of the studied formation. To estimate the optimal number of clusters, the SSE curve calculated by Eq. (10) is utilized. The decay of the initial SSE as a function of the cluster number is plotted in Fig. 8. The diagram suggests that a suitable cluster number is K = 5, because there is no significant change in the value of the SSE for more than 5 clusters. In preliminary statistical tests, one can select different cluster numbers in the K-means cluster analysis. By choosing four clusters, one could not appropriately separate the pore-space content, while six groups may result in non-interpretable (non-existing) lithology types.

Fig. 8 The Sum of Squared Error diagram applied to find the optimal number of clusters in the Barnett Shale Formation

A comprehensive approach is applied to well log analysis, including deterministic modeling and MFV-based cluster analysis. The former is used for calculating the basic petrophysical parameters (Fig. 9), while the latter provides an improved lithological separation of the unconventional hydrocarbon formation. (We assume the approximation $R_d \approx R_t$.) Shale volume, effective porosity, water (and hydrocarbon) saturation, apparent matrix density, permeability and TOC are calculated using Eqs. (29)–(35). The well log of the ordinal number of clusters (last track) agrees well with the lithological variations and fluid characteristics in the processed interval (tracks 2 and 6). The evaluated values of effective porosity, permeability and TOC are acceptably confirmed by laboratory core measurements in tracks 4, 7 and 8.

Fig. 9 The input well logs, the estimated petrophysical parameters and the result of MFV-CA cluster analysis in the Barnett Shale Formation

The resultant clusters linked to the estimated petrophysical parameters are shown in Fig. 10. The plots present the vertical distribution of petrophysical parameters, where every measuring point is colored according to its cluster for the hydrocarbon saturation, shale volume and TOC. As can be seen in Figs. 9 and 10, the amount of shaliness correlates strongly with the result of cluster analysis. In the case of the hydrocarbon saturation, the correlation is also good; however, cluster 3 shows relatively high uncertainty (i.e. it covers a wide range of saturation values). The MFV-CA method allows effective separation of different lithological units, despite occasional overlap in the individual values of petrophysical parameters. The clusters related to the lithology and pore content are summarized in Table 3. The numerical values also show a good correlation between the shale volume and the gas saturation. In the case of shale volumes higher than 0.2 v/v, the gas saturation significantly increases, too. The oil phase can only be found below a shale volume of 0.15 v/v. In the intervals 8550–8640 ft and 8740–8790 ft, the cluster analysis reveals a change in shale composition relative to its environment.

Fig. 10 Petrophysical parameters associated with clusters determined by the MFV-CA method

Table 3 Shale volume and the saturation of dominant pore fluid-types related to clusters determined by the MFV-CA method

Ordinal number of cluster   Color of cluster   Shale volume (v/v)        Dominant HC type   HC saturation (v/v)
1                           Green              High (0.47 ± 0.13)        Natural gas        High (0.77 ± 0.09)
2                           Yellow             Low (0.13 ± 0.03)         Oil                Medium (0.3 ± 0.15)
3                           Blue               Low (0.18 ± 0.02)         Natural gas        Low to high (0.4 ± 0.4)
4                           Red                Medium (0.27 ± 0.07)      Natural gas        High (0.7 ± 0.13)
5                           Black              Very high (0.8 ± 0.2)     Natural gas        High (0.7 ± 0.1)
The MFV-CA method can separate not only the shale types but also the fluid types. Cluster 2 delineates the oil intervals well, where some minor uncertainty may be caused by the varying adsorbed gas content of the Barnett Shale Formation.

7 Discussion

The performance of the newly developed MFV-based clustering (MFV-CA) method is demonstrated in the previous section. In many cases, the classical Euclidean norm-based clustering (E-CA) process shows high sensitivity to outliers; thus, it places geologically similar data points into different clusters. According to our tests made in the Barnett Shale Formation, the result of the MFV-CA method is closer to that of the deterministic evaluation of well logs. The results of the two clustering methods can be compared in Fig. 11. In the figure, two larger zones are highlighted to show that the MFV-CA method usually gives better vertical resolution than E-CA. In addition, because of its high noise sensitivity, the classical E-CA log contains some further, smaller misleading intervals. It must be mentioned that both clustering results were compared to the water saturation log calculated by Eq. (32), which is not an optimal method because of the complexity of the given unconventional reservoir.

Fig. 11 The evaluation results and the vertical resolution of the MFV-CA and the classical E-CA method

8 Conclusions

A novel multivariate statistical approach is suggested for unconventional oilfield well log analysis. The non-hierarchical K-means cluster analysis was improved by the Most Frequent Value method, which provides the exploration of unconventional, geologically complex reservoirs with additional petrophysical information and a robust solution.
By choosing a distance metric based on the Cauchy-Steiner weights, one can exclude the harmful effect of outliers and make the cluster analysis of well logs more reliable. One of the advantages of the suggested method is that the weighting coefficients do not need to be preset, since they are automatically defined for any dataset in the course of the iteration process. Besides effective rock typing, the new technique gives promising results in defining the sub-intervals within the shale formation. In our example, a significant correlation was found between the results of MFV-CA and the lithology and pore content. The water saturation is calculated using a conventional resistivity model, which might influence the results concerning hydrocarbon type and saturation. For a better interpretation, one needs to consider the TOC during the calculation of saturation. The suggested method is relatively quick, even for big datasets; the CPU time for the presented dataset was ~ 25 s on a dual-core personal computer. In the future, the application of more types of well logs may give even more reliable interpretation results, and by the multi-well application of the MFV-CA method, a fast and automated 2D or 3D evaluation of shale gas formations can be performed.

Acknowledgements Our investigation was supported by the National Scientific Research Fund, Project number OTKA K-135323. The second author thanks Michael Holmes, Antony Holmes, Dominic Holmes and Texas Gas Service Plc for the permission to use their digital well logging dataset.

Author contributions Tamás Fancsik: conceptualization, mathematical derivations, inversion. Endre Turai: Induced Polarization methodology. Norbert Péter Szabó: inversion methodology, review and editing. Judit Somogyiné Molnár: English text editing, visualization. Tünde Edit Dobróka: software developments, tests and figure editing. Mihály Dobróka: inversion, mathematical details, text editing.
Funding Open access funding provided by University of Miskolc. The research was carried out in a project No. K-135323 supported by the National Research, Development and Innovation Office (NKFIH).

Declarations

Conflicts of interest The authors have no conflicts of interest to declare that are relevant to the content of this article.

Availability of data and material Because of the data confidentiality, the experimental data is not published.

Code availability Because of the data confidentiality, the code is not published.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Amundsen L (1991) Comparison of the least-squares criterion and the Cauchy criterion in frequency-wavenumber inversion. Geophysics 56:2027–2038
Archie GE (1942) The electrical resistivity log as an aid in determining some reservoir characteristics. Transactions of the AIME 146:54–62
Asquith G, Krygowski D (2004) Basic Well Log Analysis, 2nd Edition. AAPG Methods in Exploration Series, no. 16
Bibor I, Szabó NP (2016) Unconventional shale characterization using improved well logging methods. Geosciences and Engineering 5(8):32–50
Bjørlykke K (2015) Unconventional hydrocarbons: oil shales, heavy oil, tar sands, shale oil, shale gas and gas hydrates. In: Petroleum geoscience: from sedimentary environments to rock physics. Springer-Verlag, Berlin, pp 581–590
Braun BA, Abordán A, Szabó NP (2016) Lithology determination in a coal exploration drill hole using Steiner weighted cluster analysis. Geosciences and Engineering, A Publication of the University of Miskolc 5(8):51–64
Coates GR, Denoo S (1981) The Producibility Answer Product. Schlumberger Technical Review 29(2):55–63
Dobróka M, Gyulai Á, Ormos T, Csókás J, Dresen L (1991) Joint inversion of seismic and geoelectric data recorded in an underground coal mine. Geophys Prospect 39:643–665
Dobróka M, Szegedi H, Somogyiné MJ (2014) A new robust inversion method using Cauchy-Steiner weights and its application in data processing. Near Surface Geoscience 2014 - 20th European Meeting of Environmental and Engineering Geophysics, Athens
Bárdossy Gy, Fodor J (2004) Evaluation of Uncertainties and Risks in Geology. New Mathematical Approaches for their Handling. Springer, 221
Hickey JJ, Henk B (2007) Lithofacies summary of the Mississippian Barnett Shale, Mitchell 2 T.P. Sims well, Wise County, Texas. AAPG Bull 91(4):437–443
Jarvie MD, Hill JR, Ruble ET, Pollastro MR (2007) Unconventional shale-gas systems: the Mississippian Barnett Shale of north-central Texas as one model for thermogenic shale-gas assessment. AAPG Bull 91(4):475–499
Kazmierczuk M, Jarzyna J (2006) Improvement of lithology and saturation determined from well-logging using statistical methods. Acta Geophys 54(4):378–398
Larionov VV (1969) Radiometry of boreholes. Nedra, Moscow (in Russian)
Loucks RG, Ruppel SC (2007) Mississippian Barnett Shale: Lithofacies and depositional setting of a deep-water shale-gas succession in the Fort Worth Basin, Texas. AAPG Bull 91(4):579–601
Montgomery LS, Jarvie MD, Bowker AK, Pollastro MR (2005) Mississippian Barnett Shale, Fort Worth Basin, north-central Texas: Gas-shale play with multi-trillion cubic foot potential. AAPG Bull 89(2):155–175
Paasche H, Tronicke J (2007) Cooperative inversion of 2D geophysical data sets: A zonal approach based on fuzzy c-means cluster analysis. Geophysics 72(3):A35–A39
Passey QR, Creaney S, Kulla BJ, Moretti FJ, Stroud JD (1990) A practical model for organic richness from porosity and resistivity logs. AAPG Bull 74(12):1777–1794
Schmoker JW, Hester TC (1983) Organic carbon in Bakken Formation, United States portion of Williston Basin. AAPG Bull 67(12):2165–2174
Serra O (1984) Fundamentals of well-log interpretation. Elsevier, Amsterdam
Sfidari E, K-Ilkichi A, R-Bbonab H, Soltani B (2014) A hybrid approach for litho-facies characterization in the framework of sequence stratigraphy: A case study from the South Pars gas field, the Persian Gulf basin. J Petrol Sci Eng 121:87–102
Simandoux P (1963) Dielectric measurements in porous media and application to shaly formation. Revue de L'Institut Français du Pétrole, Supplementary Issue 18:193–215
Steiner F (1988) Most frequent value procedures (a short monograph). Geophys Trans 34:139–260
Steiner F (1991) The most frequent value. Introduction to a modern conception of statistics. Academic Press, Budapest
Szabó NP (2011) Shale volume estimation based on the factor analysis of well-logging data. Acta Geophys 59:935–953
Szabó NP, Dobróka M (2013) Extending the application of a shale volume estimation formula derived from factor analysis of wireline logging data. Math Geosci 45:837–850
Szabó NP, Balogh GP, Stickel J (2017) Most frequent value based factor analysis of direct-push logging data. Geophys Prospect 66(3):530–548
Szűcs P, Civan F, Virág M (2006) Applicability of the most frequent value method in groundwater modelling. Hydrogeol J 14:31–43
Vriend SP, van Gaans PFM, Middelburg JJ, de Nijs T (1988) The application of fuzzy c-means cluster analysis and non-linear mapping to geochemical datasets: Examples from Portugal. Appl Geochem 3(2):213–224
Xu J, Xu L, Qin Y (2017) Two effective methods for calculating water saturations in shale-gas reservoirs. Geophysics 82(3):D187–D197
Zhang J (2017) Most frequent value statistics and distribution of Li abundance observations. Mon Not R Astron Soc 468(4):5014–5019
Zhang J (2018) Most frequent value statistics and the Hubble constant. Publ Astron Soc Pac 130(990):1538–3873

Journal

"Acta Geodaetica et Geophysica" – Springer Journals

Published: Dec 1, 2021

Keywords: Most Frequent Value; K-means clustering; Robust; Well log; Shale gas; Barnett Shale
