References

Abdel-Qader J R and Walker R S (2010) Performance evaluation of OpenMP benchmarks on Intel's quad core processors. In: Proceedings of the 14th WSEAS International Conference on Computers
Al-Shaalan T, Klie H, Dogru A and Wheeler M (2009) Studies of robust two stage preconditioners for the solution of fully implicit multiphase flow problems
Appleyard J R and Cheshire I M (1983) Paper SPE 12264 presented at the 7th SPE Symposium on Reservoir Simulation
Appleyard J R, Cheshire I M and Pollard R K (1981) Proceedings of the European Symposium on Enhanced Oil Recovery, Bournemouth, England
Bank R, Chan T, Coughran W and Smith R (1989) The alternate-block-factorization procedure for systems of partial differential equations. BIT Numerical Mathematics, 29
Behie A and Forsyth P (1984) Incomplete factorization methods for fully implicit simulation of enhanced oil recovery. SIAM Journal on Scientific and Statistical Computing, 5
Behie A and Vinsome P (1982) Block iterative methods for fully implicit reservoir simulation. Society of Petroleum Engineers Journal, 22
Bjørstad P, Manne F, Sørevik T and Vajtersic M (1992) Efficient matrix multiplication on SIMD computers. SIAM Journal on Matrix Analysis and Applications, 13
Brandt A, McCormick S and Ruge J (1985) Algebraic multigrid (AMG) for sparse matrix equations. In: Sparsity and Its Applications
Chen Z, Huan G and Ma Y (2006) Computational Methods for Multiphase Flows in Porous Media. Society for Industrial and Applied Mathematics
Christie M A and Blunt M J (2001) Tenth SPE comparative solution project: a comparison of upscaling techniques. SPE Reservoir Evaluation & Engineering, 4. doi: 10.2118/72469-PA
Concus P, Golub G and Meurant G (1985) Block preconditioning for the conjugate gradient method. SIAM Journal on Scientific and Statistical Computing, 6
Dogru A, Fung L, Middya U, Al-Shaalan T and Pita J (2009) A next-generation parallel reservoir simulator for giant reservoirs. Paper SPE 119272 presented at the SPE Reservoir Simulation Symposium
Douglas J, Peaceman D W and Rachford H H (1959) A method for calculating multi-dimensional immiscible displacement. Transactions of the AIME, 216
Dupont T, Kendall R and Rachford H (1968) An approximate factorization procedure for solving self-adjoint elliptic difference equations. SIAM Journal on Numerical Analysis, 5
Falgout R (2006) An introduction to algebraic multigrid. Computing in Science & Engineering, 8
Feng C, Shu S and Yue X (2012) An improvement to the OpenMP version of BoomerAMG. In: Proceedings of CCF HPC China 2012, Zhangjiajie, China
Feng C, Shu S, Xu J and Zhang C (2014) A multi-stage preconditioner for the black oil model and its OpenMP implementation. In: Proceedings of the 21st International Conference on Domain Decomposition Methods (2012, INRIA Rennes-Bretagne-Atlantique)
Han D K (1998) The achievements and challenges of EOR technology for onshore oil fields in China. In: Proceedings of the 15th World Petroleum Congress
Han D K, Yang C Z, Zhang Z Q, Lou Z H and Chang Y I (1999) Recent development of enhanced oil recovery in China. Journal of Petroleum Science and Engineering, 22
Hayder E and Baddourah M (2012) Challenges in high performance computing for reservoir simulation
Hu X, Liu W, Qin G, Xu J and Zhang Z (2011) Development of a fast auxiliary subspace pre-conditioner for numerical reservoir simulators. Paper SPE 148388 presented at the SPE Reservoir Characterization and Simulation Conference
Hu X, Wu S, Wu X H, Xu J, Zhang C, Zhang S and Zikatanov L (2013b) Combined preconditioning with applications in reservoir simulation. Multiscale Modeling & Simulation, 11
Hu X, Xu J and Zhang C (2013a) Application of auxiliary space preconditioning in field-scale reservoir simulation. Science China Mathematics, 56
Lacroix S, Vassilevski Y and Wheeler M (2001) Decoupling preconditioners in the implicit parallel accurate reservoir simulator (IPARS). Numerical Linear Algebra with Applications, 8
Lacroix S, Vassilevski Y, Wheeler J and Wheeler M (2003) Iterative solution methods for modeling multiphase flow in porous media fully implicitly. SIAM Journal on Scientific Computing, 25
Lam M D, Rothberg E E and Wolf M E (1991) The cache performance and optimizations of blocked algorithms. In: Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV)
Leon S (1980) Linear Algebra with Applications
Li Q, Wu S, Wang B, Li X, Li H, Zhang J and Meng L (2013) A new generation reservoir simulator and its application in a mature water-flooding oilfield
Li X B, Wu S H and Li Q Y (2013) An improved approach to simulate low-permeability fractured reservoirs with a dynamic hybrid dual-porosity model. Paper SPE 166665 presented at the Asia Pacific Oil & Gas Conference and Exhibition, Jakarta, Indonesia
Lin C, Nagarajan V, Gupta R and Rajaram B (2012) Efficient sequential consistency via conflict ordering
Meijerink J A (1983) Iterative methods for the solution of linear equations based on incomplete block factorization of the matrix. Paper SPE 12262 presented at the SPE Reservoir Simulation Symposium, Lubbock, TX, Nov. 14-15
Meijerink J A and van der Vorst H A (1977) An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Mathematics of Computation, 31
Oliker L, Li X, Husbands P and Biswas R (2002) Effects of ordering strategies and programming paradigms on sparse matrix computations. SIAM Review, 44
Pavlas E (2002) Fine-scale simulation of complex water encroachment in a large carbonate reservoir in Saudi Arabia. SPE Reservoir Evaluation & Engineering, 5
Ruge J and Stüben K (1987) Algebraic multigrid. In: Multigrid Methods, Frontiers in Applied Mathematics, SIAM
Saad Y (2003) Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics
Stüben K (2001) An introduction to algebraic multigrid. Appendix in: Trottenberg U, Oosterlee C and Schüller A. Multigrid. Academic Press
Stüben K, Clees T, Klie H, Lu B and Wheeler M (2007) Algebraic multigrid methods (AMG) for the efficient solution of fully implicit formulations in reservoir simulation. Paper SPE 105832 presented at the SPE Reservoir Simulation Symposium, Houston, TX, USA
Trangenstein J and Bell J (1989) Mathematical structure of the black-oil model for petroleum reservoir simulation. SIAM Journal on Applied Mathematics, 49
Vuduc R and Demmel J (2003) Automatic performance tuning of sparse matrix kernels
Wallis J (1983) Incomplete Gaussian elimination as a preconditioning for generalized conjugate gradient acceleration
Wallis J, Kendall R and Little T (1985) Constrained residual acceleration of conjugate residual methods
Wang B (2013b) Applications of BILU0-GMRES in reservoir numerical simulation. Acta Petrolei Sinica
Wang B, Wu S, Han D K, Huan G, Li Q, Li X, Li H and Zhou J (2013a) Block compressed storage and computation in large-scale reservoir simulation. Petroleum Exploration and Development, 40
Wang F and Xu J (1999) A crosswind block iterative method for convection-dominated problems. SIAM Journal on Scientific Computing, 21
Watts J and Shaw J (2005) A new method for solving the implicit reservoir simulation matrix equation
Wu S H, Li X B, Li Q Y, Li H, Wang B and Wen Y (2013) A dynamic hybrid model to simulate fractured reservoirs. Paper IPTC 16521 presented at the International Petroleum Technology Conference, Beijing, China
Wu S, Xu J, Zhang C, Li Q, Wang B, Li X and Li H (2013a) Multilevel preconditioners for a new generation reservoir simulator. Paper SPE 166011 presented at the SPE Reservoir Characterisation and Simulation Conference and Exhibition, Abu Dhabi, UAE
(2014) Multilevel iterative methods and solvers for reservoir simulation on CPU-GPU heterogeneous computers
Pet.Sci.(2014)11:540-549, DOI 10.1007/s12182-014-0370-1

A multilevel preconditioner and its shared memory implementation for a new generation reservoir simulator

Wu Shuhong(1,2), Xu Jinchao(3), Feng Chunsheng(4), Zhang Chen-Song(5), Li Qiaoyun(2), Shu Shi(4), Wang Baohua(2), Li Xiaobo(2) and Li Hua(2)

1 State Key Laboratory of Enhanced Oil Recovery, Beijing 100083, China
2 Research Institute of Petroleum Exploration and Development, PetroChina, Beijing 100083, China
3 Department of Mathematics, Penn State University, University Park, USA
4 School of Mathematics and Computational Science, Xiangtan University, Xiangtan, Hunan 411105, China
5 Academy of Mathematics and Systems Science, Beijing 100190, China

© China University of Petroleum (Beijing) and Springer-Verlag Berlin Heidelberg 2014

Abstract: As a result of the interplay between advances in computer hardware, software, and algorithms, we are now in a new era of large-scale reservoir simulation, which focuses on accurate flow description, fine reservoir characterization, efficient nonlinear/linear solvers, and parallel implementation. In this paper, we discuss a multilevel preconditioner in a new-generation simulator and its implementation on multicore computers. This preconditioner relies on the method of subspace corrections to solve large-scale linear systems arising from fully implicit methods in reservoir simulations. We investigate the parallel efficiency and robustness of the proposed method by applying it to million-cell benchmark problems.

Key words: Multilevel, preconditioner, shared memory, large-scale linear system, reservoir simulation

*Corresponding author. email: wush@petrochina.com.cn
Received January 22, 2014

1 Introduction

Simulation-based scientific discovery and engineering design demand extreme computing power and highly efficient algorithms (as well as their implementations). This demand is the main driving force for developing extreme-scale computer hardware and software during the last few decades. Reservoir simulation plays an important role both in designing an efficient development process and in improving recovery factors. Significant research effort has been devoted to creating fine-scale reservoir models and efficient reservoir simulators on high-performance computers; see Hayder and Baddourah, 2012 and references therein for details. Reservoir engineers are building high-resolution reservoir models (Pavlas, 2002; Dogru et al, 2009), on which numerous simulation runs are performed for the purpose of history matching and model validation. According to a previous case study by Saudi Aramco (Pavlas, 2002), a high-resolution model can preserve reservoir heterogeneity at a fine scale and thus maintain reservoir character and describe complex water encroachment. Saudi Aramco obtained an excellent historical validation using a 128-node cluster, and the resulting predictions were in line with the company's expectations.

Despite the increasing availability of more powerful computing resources (such as high-performance clusters), desktop computers and workstations still dominate the work environment for reservoir simulation engineers. Because of the interplay of the three "walls" (the memory wall, the instruction-level parallelism wall, and the power wall, i.e., the chip's overall temperature and power consumption), the peak performance of a single core has almost stopped improving. Even worse, single-core performance has started to deteriorate in some cases. As CPU speeds rise into the 3-4 GHz range, the amount of electrical power required becomes prohibitive; multicore processors help CPU designers avoid the high power consumption that comes with increasing chip frequency. Hence, the trend toward multicore processors started and will continue into the foreseeable future. OpenMP is an application program interface that can be used to explicitly direct multicore (shared memory) parallelism. It is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared memory parallelism in Fortran and C/C++ programs.

Several difficulties can arise when using a multithread implementation for preconditioned Krylov subspace methods: i) some preconditioners use sequential algorithms, like Gauss-Seidel; ii) OpenMP programs sometimes require more memory space than their corresponding sequential versions do. When a numerical algorithm is implemented in OpenMP or any other multithread computer language, it is important to maintain the convergence rate of the corresponding sequential algorithm. However, this is not always possible, as many numerical algorithms are sequential in nature. Moreover, when working with sparse matrices in compressed formats, like the compressed sparse row (CSR) format, we sometimes need to introduce auxiliary memory space; this becomes an increasingly heavy burden as the number of threads increases. We will analyze the parallel interpolation and coarse-grid operators in the setup phase of the algebraic multigrid (AMG) method based on the fact that the coefficient matrices we consider are banded.

In order to meet the increasing demand for high-resolution reservoir simulations in low-end desktop computing environments (multicore CPUs, sometimes with heterogeneous co-processors), we design and develop new cost-effective reservoir simulation techniques, such as different fluid models and discretization methods, for typical desktop computers. In this paper, we discuss efficient implementation of multilevel preconditioners for solving large-scale fully implicit simulations of the black oil model. Some of the materials in this paper have been previously presented in two conference proceedings by the authors (Wu et al, 2013a; Feng et al, 2014) and we repeat them for completeness. The main contribution of this paper is the new numerical study on the OpenMP performance of the multilevel FIM solver, which has not been seen in the literature.

The rest of this paper is organized as follows: In Sec. 2, we briefly review the mathematical model and its fully implicit discretization method. We discuss the data structure for block sparse matrices in Sec. 3, after which we introduce a preconditioner based on a successive subspace correction framework in Sec. 4. We discuss several implementation issues of the proposed algorithm using OpenMP in Sec. 5. We then perform a numerical experiment to test the efficiency, robustness, and multicore speed-up of the proposed preconditioner in Sec. 6. Finally, we summarize the discussion with a few concluding remarks in Sec. 7.

2 Mathematical model and its fully implicit discretization

Most of China's oil fields are located in continental basins and many are characterized by serious heterogeneity, low permeability, and high oil viscosity (Han, 1998; Han et al, 1999). Water breakthrough occurs at an early stage of development in these fields, which results in low recovery efficiency even when enhanced water injection techniques are employed. Higher resolution reservoir modelling is needed to analyze complex flow phenomena in these fields.

The black oil model is often applied in the primary and secondary oil-recovery stages. In this model, the fluid is assumed to have three quasi-components (Water, Oil, and Gas) and they form three respective phases (water, oil, and gas): the water phase does not exchange mass with the other phases, and the liquid and gaseous phases exchange mass with each other. As a widely accepted approach, the isothermal black oil model solves the three-dimensional three-phase equations of the conservation of mass (volume in the standard surface conditions) in porous media, subject to appropriate initial and boundary conditions (Chen et al, 2006). The material balance of the hydrocarbon gaseous (gas), liquid (oil), and water components is described, respectively, by

  ∂/∂t [ φ ( S_g/b_g + R_so S_o/b_o ) ] + ∇·( u_g/b_g + R_so u_o/b_o ) = q_G   (1)

  ∂/∂t ( φ S_o/b_o ) + ∇·( u_o/b_o ) = q_O   (2)

  ∂/∂t ( φ S_w/b_w ) + ∇·( u_w/b_w ) = q_W   (3)

It is assumed that the linear Darcy law governs the fluid flow of each phase in porous media:

  u_α = −( k k_rα / μ_α ) ( ∇P_α − ρ_α g ∇z ),   α = o, g, w   (4)

The phase saturations satisfy the condition

  S_o + S_w + S_g = 1   (5)

We further assume that the capillary pressures characterize the pressure differences between phases:

  P_cgo = P_g − P_o,   P_cow = P_o − P_w   (6)

Remark 1 In this paper, we focus on the fully implicit method for solving the above black oil model. The methods discussed here can be readily extended to other models and numerical discretization schemes. For example, a new-generation simulator can also handle the well-known volatile oil model, in which the oil mass balance equation (Eq. (2)) is replaced by

  ∂/∂t [ φ ( S_o/b_o + R_v S_g/b_g ) ] + ∇·( u_o/b_o + R_v u_g/b_g ) = q_O   (7)

Before we start to discuss how to solve the above equations, we first make a few comments. The phase injection rate and the total liquid production rate constraints can be applied for the wells. When the bottom-hole pressure P_bh cannot sustain the fixed flow rate, the well equation automatically switches to the fixed bottom-hole pressure case. For simplicity, we will not present details pertaining to the treatment of well constraints. When the reservoir pressure drops below the bubble-point pressure (saturated state), the hydrocarbon phase splits into a liquid (oil) phase and a gaseous (gas) phase at the thermodynamical equilibrium. In this case, we choose P_o, S_w, and S_g as the primary variables, with the rest of the unknowns represented by the primary variables using Eqs. (5) and (6).
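The closure just described — recovering the remaining unknowns from the primary variables (P_o, S_w, S_g) via Eqs. (5) and (6) — can be sketched as follows. This is an illustrative Python fragment; the function name and signature are ours, not the simulator's, and the capillary pressures are taken as given (in practice they come from tabulated rock curves):

```python
def secondary_unknowns(P_o, S_w, S_g, P_cgo, P_cow):
    """Recover the secondary unknowns from the primary variables
    (P_o, S_w, S_g) using the saturation constraint (Eq. (5)) and the
    capillary pressure relations (Eq. (6)).  Illustrative sketch only.
    """
    S_o = 1.0 - S_w - S_g   # Eq. (5): S_o + S_w + S_g = 1
    P_g = P_o + P_cgo       # Eq. (6): P_cgo = P_g - P_o
    P_w = P_o - P_cow       # Eq. (6): P_cow = P_o - P_w
    return S_o, P_g, P_w
```

For example, with S_w = 0.3 and S_g = 0.1 the oil saturation is recovered as S_o = 0.6, and the gas/water pressures are shifted from P_o by the two capillary pressures.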
On the other hand, if the gas phase is not present (undersaturated state), we use R_so instead of S_g as a primary variable. However, we will not consider the undersaturated case when presenting the algorithm in this paper.

Among many possible methods for the above model, we will consider only the fully implicit method (FIM). The fully implicit method (Douglas et al, 1959) is a discretization method that is often used to solve the black oil model in petroleum reservoir simulators. In FIM, Newton linearization is combined with first-order upstream-weighting finite difference spatial discretization (for details, see Chapter 8 in Chen et al, 2006). This scheme is accurate and stable, as proved by several decades of practical usage. The main disadvantage of FIM is the computational cost associated with solving the Jacobian systems arising from Newton's method. Very often, solving such linear systems with direct or iterative solvers takes a large share of the computational time in reservoir simulation. Furthermore, the demand for more accurate computer simulation has led to larger and, in turn, more heterogeneous field-scale reservoir models. Such models entail larger and more difficult linear systems.

3 Storage format for block sparse matrices

The Jacobian systems arising from the Newton linearization in FIM are usually large, sparse, nonsymmetric, and ill-conditioned. Krylov subspace methods (Saad, 2003), such as BiCGstab and GMRES, are efficient iterative methods for solving these Jacobian systems. Many preconditioning techniques have been proposed for reservoir simulation (see, for example, Dupont et al, 1968; Meijerink and van der Vorst, 1977; Appleyard et al, 1981; Behie and Vinsome, 1982; Appleyard and Cheshire, 1983; Meijerink, 1983; Wallis, 1983; Behie and Forsyth, 1984; Concus et al, 1985; Wallis et al, 1985; Lacroix et al, 2001; 2003; Watts and Shaw, 2005; Stüben et al, 2007; Al-Shaalan et al, 2009; Hu et al, 2013b; Wang et al, 2013a; 2013b).

When FIM is combined with the cell-centered finite difference method, a fully coupled linear algebraic system

  [ A_ResRes  A_ResWel ] [ u_Res ]   [ f_Res ]
  [ A_WelRes  A_WelWel ] [ u_Wel ] = [ f_Wel ],   i.e.,  A u = f   (8)

must be solved in each Newton step. Here, the subscripts 'Res' and 'Wel' stand for the reservoir and implicit well parts, respectively, of the main solution variables. Let m be the number of unknowns in each grid cell. For example, in FIM for the black oil model, m is equal to 3; and m is equal to 2 for the dead oil case (no gas phase). Assume that there are N active grid cells and M implicit wells. Then the size of the Jacobian matrix is mN+M. Specifically, the solution vector space is V = R^{mN+M}.

Remark 2 Because the well part and the main reservoir part differ in regard to shape, the Jacobian matrix A is sometimes referred to as a bordered matrix. In practice, we have found that many iterative solvers converge slowly or even fail to converge for practical problems. The coupling between the reservoir equations and the well constraints is usually strong. Based on this observation, we extend all the implicit well blocks such that they have the same dimension as the reservoir blocks by introducing artificial auxiliary saturation variables in each implicit well block. At the same time, we pad the right-hand side with zeros at the corresponding positions of these artificial variables to obtain f ∈ V = R^{m(N+M)}. We then group the oil pressure together with the well bottom-hole pressure and further write the local Jacobian matrix (for one grid cell) in the following form:

  A_ij = [ J_PP  J_PS ]
         [ J_SP  J_SS ]  ∈ R^{m×m}   (9)

where P denotes the pressure variables (oil pressure and the well bottom-hole pressure) and S denotes the saturation variables (including physical water and oil saturations for the reservoir blocks and artificial saturations for the implicit well blocks).

This way, we can store the expanded coefficient matrix A = (A_ij) ∈ R^{n×n} with n = m(N+M) in a uniform BSR format. To store the coefficient matrix in a cost-effective way, we employ a block sparse matrix data structure that is often used for numerically simulating PDE systems, i.e., the block compressed sparse row (BSR) data structure. The BSR format is a generalization of the well-known compressed sparse row (CSR) format and is used in many numerical software packages, including the Intel MKL sparse direct solver library and the NIST sparse BLAS library. The difference between BSR and CSR is that in BSR, each non-zero entry is an array of real numbers of size m², instead of one real number as in CSR. This array represents the small, dense Jacobian matrix in each grid cell. Of the several variants of the BSR format, the following triple-array definition is used in the present paper as well as in a new-generation simulator:

• val: A real array that contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in row-major order. All the elements of the non-zero blocks are stored, even the elements that are equal to zero. Within each non-zero block, the elements are stored in row-major order.
• col: Entry i of this integer array is the number of the column in the block matrix that contains the i-th non-zero block.
• row: Entry j of this integer array is the index of the entry in col that is the first non-zero block in the j-th row of the block matrix.

As a pictorial example, we show a simple 3-by-3 grid block system (Fig. 1) with a vertical well located at cells (2, 5, 8), with 2 and 5 perforated. After expansion, we obtain a block sparse matrix where each non-zero block is a 3×3 matrix for the black oil model. We then store this matrix in the BSR format described above. Note that this modification will not introduce much extra storage or computational cost, as the number of implicit wells is usually small compared to the size of the reservoir system.
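The val/col/row layout above can be exercised with a small sketch. This is illustrative Python, not the simulator's kernel; the arrays follow the triple-array definition just given (val holds flat m×m row-major blocks):

```python
def bsr_matvec(val, col, row, x, m):
    """y = A @ x for a matrix stored in the BSR triple-array format:
    val[p] is the p-th non-zero m-by-m block (flat, row-major),
    col[p] is its block-column index, and row[i] points to the first
    non-zero block of block row i.  Illustrative sketch only.
    """
    nb = len(row) - 1              # number of block rows
    y = [0.0] * (nb * m)
    for i in range(nb):            # loop over block rows
        for p in range(row[i], row[i + 1]):
            j = col[p]             # block-column index
            blk = val[p]           # flat m*m block, row-major
            for r in range(m):
                s = 0.0
                for c in range(m):
                    s += blk[r * m + c] * x[j * m + c]
                y[i * m + r] += s
    return y
```

For instance, a 2-block-row matrix with m = 2, val = [[1,0,0,1], [1,1,1,1], [2,0,0,2]], col = [0, 1, 1], row = [0, 2, 3] multiplied by x = [1, 2, 3, 4] gives y = [8, 9, 6, 8]. A production kernel would parallelize the outer block-row loop, which is exactly what makes BSR attractive for OpenMP SpMV.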
An important reason why we choose the BSR format for our implementation is that it can improve the parallel scalability of sparse matrix-vector multiplication (SpMV), which is the most time-consuming part of iterative linear solvers and usually takes most of the CPU time. A lot of research has been devoted to improving SpMV; see Oliker et al, 2002 and references therein for related discussions.

Fig. 1 Left: A 2D 3-by-3 grid (cells numbered 1 to 9) with a vertical well at the center; Right: Expansion of the Jacobian matrix (the last row and column for the well constraint are expanded to the BSR format)

Usually, different computer architectures require different SpMV implementations in order to maximize performance. Here we focus on the sparse Jacobian matrices from the fully implicit reservoir simulation. For general-purpose SpMV implementations, interested readers are referred to Lam et al, 1991; Bjørstad et al, 1993; Vuduc, 2003; Wang et al, 2013a. In Table 1, we use Jacobian matrices arising from a three-phase black oil simulation on a mesh with 3.2 million active cells (about 9.6 million degrees of freedom). In the table, "Ratio CSR/BSR" means the ratio between the wall times taken by 100 CSR and BSR sparse matrix-vector multiplication operations. From this table, we immediately see the advantages of the BSR format over the CSR format: the BSR SpMV not only takes less computation time, but also yields better parallel speed-up.

Table 1 SpMV (100 times) using the CSR and BSR sparse matrix formats

  Number of OpenMP   CSR format            BSR format            Ratio
  threads N_T        Wall time, s Speed-up Wall time, s Speed-up CSR/BSR
  1                  29.52        1.00     26.40        1.00     1.12
  2                  17.43        1.69     13.17        2.00     1.32
  4                  10.68        2.76      8.77        3.01     1.22
  8                   8.21        3.60      6.61        3.99     1.24

4 A successive subspace correction preconditioner for FIM

It is self-evident that different parts of A have different algebraic properties (Trangenstein and Bell, 1989). The part corresponding to the pressure unknowns is elliptic, and the part corresponding to the saturation unknowns is mainly hyperbolic. Based on this understanding, CPR-type preconditioners (Wallis, 1983; Wallis et al, 1985) take advantage of this property and have become a competitive alternative in reservoir simulation. A method of subspace corrections (MSC) has been proposed and discussed by Hu et al (2013b), where each auxiliary space solver takes these algebraic properties into account. In this paper, we focus on the multithread implementation of this preconditioner for solving large-scale linear systems arising from the fully implicit reservoir simulations analyzed by Hu et al (2013b). We now briefly review this preconditioner for completeness.

Remark 3 As suggested by many researchers (Bank et al, 1989; Lacroix et al, 2001; Stüben et al, 2007; Al-Shaalan et al, 2009), in order to weaken the strong coupling between the pressure and saturation unknowns, a decoupling step has to be applied to Eq. (8). This decoupling procedure should be computationally cheap. Here, we choose the so-called alternate block factorization (ABF) strategy (Bank et al, 1989). This strategy is basically block diagonal preconditioning: Ā = D⁻¹A and f̄ = D⁻¹f, where D stands for the block diagonal matrix of the expanded matrix A ∈ R^{n×n}, i.e., D = diag(A_ii) ∈ R^{n×n}. In the rest of the paper, we will abuse notation and write the new matrix as A ∈ R^{n×n} with n = m(N+M). The same convention also applies to the right-hand side vector f ∈ R^n and to the solution u ∈ R^n.

For the black oil model, we consider two subspaces: V_P ⊂ V and V_S ⊂ V. Here, V_P is the vector space for the pressure variables (including P_o for the oil phase and the bottom-hole pressure for the implicit wells) and V_S is the vector space for the saturation variables (S_w and S_g for the reservoir grid cells and artificial variables for the implicit wells, respectively). We have the following multiplicative version of the MSC algorithm:

Algorithm 1 (MSC Preconditioner) Given a vector f, we define the preconditioning action Bf as follows:
  1) u_0 = 0
  2) u_1 = u_0 + Π_S B_S Π_S^T ( f − A u_0 )
  3) u_2 = u_1 + Π_P B_P Π_P^T ( f − A u_1 )
  4) u_3 = u_2 + R ( f − A u_2 )
  5) Bf := u_3

Here Π_P : V_P → V and Π_S : V_S → V are the inclusion operators, and the superscript T denotes the adjoint operator, which is simply the transpose operator if applied to matrices. For example, Π_P^T : V → V_P is the injection operator from the whole space to the pressure variable space. Note that we only apply the operator B as a preconditioner and that there is no reason to solve the sub-problems exactly. Therefore, in practice, we replace A_PP and A_SS by the preconditioners (or simple iterative methods) B_P and B_S, respectively. In Step 4), we introduce a smoother R for the original solution space. Note that different subspace solvers yield different preconditioners. We should choose appropriate subspace solvers according to the characteristics of the problem and the computer hardware.

The saturation variables S = (S_w, S_g) have hyperbolic characteristics. Due to this fact, we solve the saturation block by the block Gauss-Seidel method. To improve the convergence rate, one can apply the Gauss-Seidel method with downwind ordering and crosswind blocks (see Wang and Xu, 1999 for details). This method orders the gridblocks according to the direction of the multiphase flow and has been shown to be efficient for convection-dominated problems. However, in order to obtain better parallel speed-up, we use a simple block Gauss-Seidel method with multicolor ordering (see Feng, 2014 for details). Note that this choice is for better parallel scalability instead of improving the convergence rate. From this we can see the influence of computer architecture on the choice of numerical algorithms.

It is well-known that the equations describing the mass balance in terms of pressure unknowns P are mainly elliptic (Wallis et al, 1985; Lacroix et al, 2003; Stüben et al, 2007; Al-Shaalan et al, 2009; Hu et al, 2011). Therefore, we use algebraic multigrid (AMG) methods (Brandt et al, 1985; Ruge and Stüben, 1987; Stüben, 2001; Falgout, 2006) to solve the pressure block A_PP. In this paper, we use the classical AMG method for simplicity. In practice, the performance and efficiency of AMG may degenerate when the physical and geometric properties of the problems become more complicated. In order to improve the performance of the AMG solver, Hu et al (2013a) have developed an approach that combines an iterative method with some other preconditioner to obtain a new solver for the pressure block.

The smoother R in the algorithm resolves the coupling between the pressure unknowns and the saturation unknowns, as well as the coupling between the reservoir unknowns and well unknowns. The line successive over-relaxation (LSOR) method and the block incomplete factorization (BILU) methods have been applied in reservoir simulations and are often used in practice. The convergence rates of both LSOR and BILU are noticed to deteriorate when the size of the problems increases, or when the porous media become more heterogeneous. LSOR requires geometric information from the underlying mesh. BILU(k) with a larger fill-in level k may become too expensive (in terms of memory usage) in practice for large-scale simulations. Therefore, to reduce computational cost (both CPU time and memory cost), the block Gauss-Seidel method is used as the smoother in this paper.

5 OpenMP implementation and shared memory paradigm

In this section, we discuss OpenMP parallel implementation of the proposed preconditioner in Algorithm 1 on typical desktop computers with multicore CPUs. Compared to message-passing implementations like MPI, the shared memory paradigm can greatly simplify the programming task in a multicore environment. OpenMP parallel programs are relatively easy to implement, as each processor has a global view of the entire memory. Parallelism can be achieved by inserting standard compiler directives into the code to distribute loop iterations among the processors. However, performance may suffer from poor spatial locality of physically distributed shared data.

We now focus on the setup stage of the classical AMG method. Notice that the AMG method is applied to the pressure equation only, and we use the standard CSR sparse matrix format for the coefficient matrices. Feng et al (2014) proposed a simple but efficient algorithm for constructing standard prolongation and coarse-level operators using OpenMP. If the bandwidth of the sparse coefficient matrix A is relatively small, this algorithm can save a large amount of memory.

To simplify the notation, we denote the coefficient matrix A_PP as A ∈ R^{n×n} in this section. Let G = (V, E) be the graph of A, where V is the set of vertices (i.e., unknowns) and E is the set of edges (i.e., connections that correspond to non-zero off-diagonal entries of A). Assume that the index set of vertices is split into two sets, a set C of coarse-level vertices and a set F of fine-level vertices, such that

  V = C ∪ F  and  C ∩ F = ∅

We denote n_c as the cardinality of C, i.e., the number of C-vertices, and assume that [·] maps a C-vertex to its index among the coarse-level vertices. Define N_i, the set of neighboring variables of i, as

  N_i := { j ∈ V : A_ij ≠ 0, j ≠ i }

For a fixed real number θ ∈ [0, 1), we denote the strongly connected variables as

  S_i(θ) := { j ∈ N_i : −A_ij ≥ θ · max_{k≠i} (−A_ik) }   (10)

Let D_i^{F,s} := S_i(θ) ∩ F, D_i^{C,s} := S_i(θ) ∩ C, and D_i^w := N_i \ (D_i^{F,s} ∪ D_i^{C,s}). We can now define

  F_i^c := { j ∈ D_i^{F,s} : i and j do not depend on a common C-vertex }

Let Â_ij := 0 if A_ii A_ij > 0, and Â_ij := A_ij otherwise. We denote the standard prolongation (or interpolation) matrix as P = (P_ij) ∈ R^{n×n_c}, where its entries are determined as follows:

  P_{i,[j]} = −( A_ij + Σ_{k ∈ D_i^{F,s}\F_i^c} A_ik Â_kj / Σ_{m ∈ D_i^{C,s}} Â_km ) / ( A_ii + Σ_{k ∈ D_i^w ∪ F_i^c} A_ik ),  if i ∈ F and j ∈ D_i^{C,s};
  P_{i,[i]} = 1.0,  if i ∈ C;
  P_ij = 0.0,  otherwise.

The matrix P is sparse and is usually stored in the CSR format, so we need an auxiliary integer marker array M_P to locate the column index of each non-zero entry. To generate the i-th row of P, we define, for 0 ≤ j ≤ n−1,

  M_P^i[j] := { j_c, if j ∈ D_i^{C,s} and [j] = j_c;  2, if j ∈ D_i^{F,s} \ F_i^c;  1, otherwise }   (11)

where j_c is the position of the corresponding entry of P in the column index array of the CSR storage of P.
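The strong-connection test of Eq. (10) can be sketched as follows. This is illustrative Python on a dense row-of-lists toy matrix; a production code would scan the CSR row of vertex i instead:

```python
def strong_neighbors(A, i, theta):
    """S_i(theta) from Eq. (10): the neighbors j of i with
    -A[i][j] >= theta * max_{k != i}(-A[i][k]).
    Dense toy representation; illustrative sketch only.
    """
    n = len(A)
    m = max(-A[i][k] for k in range(n) if k != i)
    return {j for j in range(n)
            if j != i and A[i][j] != 0.0 and -A[i][j] >= theta * m}
```

On the 1D-Laplacian-like matrix below, with θ = 0.25, vertex 0 is strongly connected only to vertex 1 (its weak −0.1 link to vertex 3 fails the test), while vertex 1 is strongly connected to both of its unit-strength neighbors.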
5 OpenMP implementation and shared memory paradigm

In this section, we discuss the OpenMP parallel implementation of the proposed preconditioner in Algorithm 1 on typical desktop computers with multicore CPUs. Compared with message-passing implementations such as MPI, the shared memory paradigm can greatly simplify the programming task in a multicore environment.

The matrix P is sparse and is usually stored in the CSR format, so we need an auxiliary integer array, the marker M_P, to locate the column index of each non-zero entry. To generate the i-th row of P, we define, for 0 ≤ j ≤ n-1,

M_P[j] := j_c, if j ∈ D_i^{C,s};  -2, if j ∈ D_i^{F,s}\F_i^{s,c};  -1, otherwise,    (11)

where j_c is the position of the corresponding entry of P in the column index array of the CSR storage of P. In the OpenMP implementation, we have to allocate an integer array for the marker M_P for each OpenMP thread. The length of each M_P is n, and the total length of M_P over all threads is then N_T × n, where N_T is the total number of threads.

Pet.Sci.(2014)11:540-549

Assume that b is the bandwidth of A, where b_l and b_r are the left and right bandwidths of the coefficient matrix A, respectively. When the parallel partition of V is distributed contiguously and in a balanced fashion to the OpenMP threads (i.e., the size difference between threads does not exceed one), we can easily see that the number of entries of M_P that are actually used by the program is much smaller than n (see Fig. 2 for an example).
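The marker M_P is an instance of the classic "sparse accumulator" technique: a (thread-local) integer array that maps a column index to its position in the row currently being assembled, and that is reset cheaply by clearing only the entries actually touched. A minimal sketch; the helper name and the (col, val) row format are illustrative, not from the paper's code:

```python
def merge_rows_with_marker(rows, n_cols, marker=None):
    """Merge several sparse rows (lists of (col, val) pairs) into one
    CSR-style row using an integer marker array.

    marker[j] holds the position of column j in the output row, or -1 if
    column j has not been seen yet.  Reusing one preallocated marker per
    thread avoids an O(n) reset: only touched entries are cleared."""
    if marker is None:
        marker = [-1] * n_cols
    cols, vals = [], []
    for row in rows:
        for j, v in row:
            if marker[j] == -1:          # first time we see column j
                marker[j] = len(cols)
                cols.append(j)
                vals.append(v)
            else:                        # accumulate into the existing slot
                vals[marker[j]] += v
    for j in cols:                       # cheap reset: touched entries only
        marker[j] = -1
    return cols, vals
```

Merging the rows of A selected by a row of P^T in this way is exactly how a CSR triple product can be assembled row by row.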
Taking into account the fact that the matrix A is banded, we can get the following estimates of the length and the minimal offset of the marker M_t(P) on the t-th thread (Feng et al, 2014):

L_{M_t(P)} ≤ min( n, n/N_T + 2b )  and  offset( M_t(P) ) ≥ max( 0, (t-1) n/N_T - 2b ).    (12)

Fig. 2 Construction of the prolongation P for A. M_l(P) and M_u(P) are the lower and upper column indices, respectively, of the non-zero entries handled by the t-th OpenMP thread

The coarse grid operator of multigrid methods can be built using the Galerkin relation A_c := P^T A P, where

(A_c)_ij = Σ_{k=1}^{n} Σ_{l=1}^{n} P_ki A_kl P_lj,  i, j = 1, …, n_c.    (13)

Similar to the implementation of the prolongation operator, we need to allocate two auxiliary integer arrays, M_A and M_P (see Fig. 3 for a pictorial demonstration). The lengths of M_A and M_P are n and n_c, respectively. By noticing the banded structure of the sparse matrices involved in the coarse operator, we can estimate the actually needed lengths and offsets of M_A and M_P analogously to Eq. (12):

L_{M_t(A)} ≤ min( n, n/N_T + 2b )  and  offset( M_t(A) ) ≥ max( 0, (t-1) n/N_T - 2b ).    (14)

Fig. 3 Construction of the Galerkin coarse-level operator A_c = P^T A P. M_l(A) and M_u(A) are the lower and upper column indices of the non-zero entries in A of the t-th OpenMP thread
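As a quick sanity check of the Galerkin relation (13): with a 1D Poisson matrix and linear interpolation, the triple product reproduces the coarse-grid Poisson operator (up to a scaling) and stays banded, which is what bandwidth-based length estimates like Eq. (12) rely on. A dense sketch (the paper's implementation instead builds A_c row by row in CSR with the marker arrays; `linear_interpolation_1d` is a hypothetical helper):

```python
def galerkin_coarse_operator(A, P):
    """Dense Galerkin product of Eq. (13):
    (A_c)_{ij} = sum_k sum_l P_{ki} * A_{kl} * P_{lj}."""
    n, nc = len(A), len(P[0])
    Ac = [[0.0] * nc for _ in range(nc)]
    for i in range(nc):
        for j in range(nc):
            Ac[i][j] = sum(P[k][i] * A[k][l] * P[l][j]
                           for k in range(n) for l in range(n))
    return Ac

def linear_interpolation_1d(n_fine):
    """Prolongation with linear interpolation for a 1D grid whose coarse
    points are the odd-indexed fine points (illustrative choice)."""
    n_coarse = n_fine // 2
    P = [[0.0] * n_coarse for _ in range(n_fine)]
    for c in range(n_coarse):
        f = 2 * c + 1                 # fine index of the c-th coarse point
        P[f][c] = 1.0                 # injection at the coarse point itself
        P[f - 1][c] += 0.5            # interpolate to the left neighbor
        if f + 1 < n_fine:
            P[f + 1][c] += 0.5        # and to the right neighbor
    return P
```

For tridiag(-1, 2, -1) of size 7 this yields A_c = 0.5 · tridiag(-1, 2, -1) of size 3, i.e., the coarse operator is again tridiagonal (banded).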
6 Numerical experiments

We use the second model from the Tenth SPE Comparative Solution Project (Christie and Blunt, 2001), which was designed to compare the ability of upscaling approaches used by various participants to predict the performance of water flooding in a highly heterogeneous black-oil reservoir with simple geometry, described by a fine-scale regular Cartesian geological model. The model described herein was originally generated for use in the PUNQ project; the problem statement specified that the competition's purpose was to compare the respective solutions in regard to accuracy. There is one injector at the center of the field and four producers, one at each of the four corners. The total simulation time is 2,000 days.

The model has no top structure or faults and has a uniform initial water-oil interface. The depth of the reservoir is 3,657.6 m, and the initial field pressure is 41.37 MPa. Oil density at the standard condition is 0.849 g/cm³, and oil viscosity at the reservoir condition is 3 mPa·s. The field has a low saturation pressure, and there are only two phases (water and oil) during the whole simulation. The top 21.35 m (35 layers) represents the Tarbert formation, and the bottom 30.5 m (50 layers) represents the Upper Ness. The top part of the model represents a prograding near-shore environment, and the lower part is fluvial. The permeability in this model problem is highly heterogeneous, and the ratio between the vertical and horizontal permeabilities, k_v/k_h, varies from 0.001 to 0.3. The maximal porosity value of the field is 0.5. See Fig. 4 for the porosity of each of the four sample horizontal layers.

Fig. 4 Porosity of four sample horizontal layers in SPE10 (Model 2)

The total wall time for a single simulation run using a new-generation simulator, HiSim, is less than 45 minutes for the SPE10 problem (1.1 M grid cells, 2.2 M degrees of freedom) using one single thread (detailed numerical results will be reported in Table 2 and further discussed later). The simulation was performed first on a Dell desktop PC with an Intel Core i7 3.33 GHz CPU (4 cores) and 8 GB DDR3 RAM. This test platform (Platform A) cost about $1,250 USD when bought new in early 2011. The Intel Core i7 utilizes hyper-threading (HT) technology, which was developed to improve parallel performance by duplicating certain sections of the processor.
However, some experiments have indicated that HT might cause a loss of efficiency for some applications (Abdel-Qader and Walker, 2010), so in our experiments we disable the HT feature of the i7 CPU. HiSim is an in-house reservoir simulator developed by RIPED, PetroChina, with several fluid-flow models and the multilevel preconditioner discussed in this paper implemented therein; see Li et al, 2013a; 2013b; Wu et al, 2013a; 2013b. The linear solver alone takes up most of the total simulation time of a new-generation simulator when using one core only. It is well known that the parallel efficiency depends heavily on the algorithm, the implementation, and the hardware architecture. We use another computer (Platform B) for comparison: an HP Z800 server with two Intel Xeon X5590 CPUs (4 cores each) and 24 GB DDR3 RAM. This computer was purchased early in 2010, and the market price at that time was $7,000 USD. A single core of the Intel Xeon X5590 is much less powerful than the Intel i7 CPU.

We compared our numerical results with the benchmark results by Landmark, Geoquest, Chevron, and Streamsim reported in Christie and Blunt, 2001 (Figs. 5-6). The curves of field oil rate, field average pressure, well oil rate, and well water cut in the figures show good agreement with the reported results obtained using other simulators.

Fig. 5 Comparison of field oil rate (Left) and average pressure (Right) by different simulators
Fig. 6 Comparison of oil rate (Left) and water cut (Right) in Producer 1 by different simulators

The large problem size and heterogeneous nature of the benchmark make it very challenging; as a result, it is suitable for testing the algorithm efficiency, robustness, and parallel speed-up of the proposed preconditioner. As we mentioned earlier, Algorithm 1 results in various preconditioning strategies by choosing different subspace solvers or smoothers. In this section, we only compare the performance of three simple choices: the original preconditioner B in Algorithm 1; a simplified version B_1, obtained by neglecting Step 4 of Algorithm 1 (i.e., without the global smoothing step or R); and another simplified version B_2, obtained by neglecting Step 2 of Algorithm 1 (i.e., B_S = 0). We note that B_2 is in fact the CPR method if the global smoother R is chosen to be the ILU or block ILU method.

We set the stopping criterion to be a relative residual in the Euclidean norm less than 10⁻³. In Table 2, we summarize the performance of a new-generation simulator, in which #Timesteps is the total number of time steps, #Newton is the total number of Newton iterations, #Linear is the total number of linear iterations, wall time is the total wall time for the whole simulation (including I/O operations), Average #Newton is the average number of Newton iterations in each time step, and Average #Linear is the average number of linear iterations in each Newton iteration.

Table 2 Comparison of the preconditioned GMRes methods for SPE10

Preconditioner  #Timesteps  #Newton  #Linear  Average #Newton  Average #Linear  Wall time, min
B               161         254      2508     1.58             9.87             41.58
B_1             161         286      3773     1.78             13.19            53.50
B_2             161         269      4462     1.67             16.59            59.19

From Table 2, we find that each component of the preconditioner B plays a role in the convergence of the iterative method.
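The two average columns of Table 2 are derived from the raw counters, which is easy to verify (a small sketch; the dictionary layout is ours, not from the paper's code):

```python
# Raw counters transcribed from Table 2 of the paper.
table2 = {
    "B":  {"timesteps": 161, "newton": 254, "linear": 2508},
    "B1": {"timesteps": 161, "newton": 286, "linear": 3773},
    "B2": {"timesteps": 161, "newton": 269, "linear": 4462},
}

def derived_averages(row):
    """Average #Newton per time step and average #Linear per Newton step."""
    return (round(row["newton"] / row["timesteps"], 2),
            round(row["linear"] / row["newton"], 2))
```

For all three preconditioners the derived values reproduce the Average #Newton and Average #Linear columns of Table 2 exactly.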
Removing B_S or the smoother R will not only cause the average number of linear iterations (Average #Linear) to increase, but also cause the total number of nonlinear iterations (#Newton) to increase slightly. Although our choice of components for the proposed algorithm might not yield the best preconditioner for all problems, it is quite efficient and robust for this challenging benchmark problem.

Next, we investigate the OpenMP speed-up of the preconditioned GMRes method discussed in Sec. 5. The implementation is done by adding OpenMP directives to our simulator code, and this has only been done for the most time-consuming parts of the linear solver. The numerical results (total number of Newton steps, average number of linear iterations, total wall time in minutes, and parallel speed-up) are reported in Table 3. The parallel speed-up (the ratio of the simulation time using one core to the simulation time on multiple cores) is 1.37 when using 4 threads on the 4-core i7 CPU, and the speed-up of the solver part is about 1.5. Note that the solver part is the only place where OpenMP directives are employed.

Table 3 OpenMP performance of the preconditioned GMRes solver for SPE10 on Platform A

N_T  Total Newton steps  Average linear iterations  Wall time, min  Linear solver time, min  Parallel speed-up (linear solver)
1    254                 9.87                       41.58           35.36                    1.00
2    262                 10.19                      32.60           25.61                    1.38
4    260                 10.00                      30.45           23.23                    1.52

We can expect that, when using one thread, the simulation on Platform B will take more CPU time than on Platform A. The numerical results confirm this expectation; see Tables 3 and 4. However, we also notice that Platform B has much better parallel efficiency, and the solver parallel speed-up is substantially improved; see Table 4. This example shows the importance for users of taking full advantage of modern computers by exploiting parallelism in their algorithms and implementations.
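The speed-up figures quoted above follow directly from Table 3: the overall 4-thread speed-up is 41.58/30.45 ≈ 1.37 and the solver speed-up is 35.36/23.23 ≈ 1.52. A minimal check (timings transcribed from Table 3; the helper is ours):

```python
# Wall-clock and linear-solver times in minutes, transcribed from Table 3.
wall_time = {1: 41.58, 2: 32.60, 4: 30.45}
solver_time = {1: 35.36, 2: 25.61, 4: 23.23}

def speedup(times, n_threads):
    """Parallel speed-up: one-thread time divided by n-thread time."""
    return round(times[1] / times[n_threads], 2)
```

Since only the solver is threaded, the overall speed-up is capped by the serial remainder of roughly 41.58 - 35.36 ≈ 6.2 minutes, in the spirit of Amdahl's law, which is why the overall 1.37 lags the solver's 1.52.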
Table 4 OpenMP performance of the preconditioned GMRes solver for SPE10 on Platform B

N_T  Total Newton steps  Average linear iterations  Wall time, min  Linear solver time, min  Parallel speed-up (linear solver)
1    254                 9.87                       50.08           46.13                    1.00
2    262                 10.19                      41.25           30.28                    1.52
4    260                 10.00                      32.78           20.82                    2.22
8    261                 10.67                      32.40           20.42                    2.26

7 Summary and conclusions

We discussed a practical and efficient preconditioner for the large sparse linear systems arising from the black-oil model discretized by the fully implicit method. The method of subspace corrections was used to construct a new preconditioner: the original, highly coupled Jacobian system is decomposed into several sub-problems, and suitable solution techniques are chosen to approximate these sub-problems according to their analytic characteristics. The new method can be used as a preconditioner for Krylov subspace iterative methods. The results of the preliminary numerical experiments show that the linear algebraic solver is quite efficient and robust for highly heterogeneous benchmark and field-scale problems. This new solution technique can achieve a turnaround time of less than an hour for a million-cell model with a new-generation simulator on a mainstream desktop computer. The performance of the solution method in a shared-memory multicore environment is reasonably good for relatively large reservoir simulation models.

Nomenclature

α  oil (o), gas (g) and water (w) phases
β  Oil (O), Gas (G) and Water (W) components
P_α  pressure of the α phase, MPa
S_α  saturation, fraction
u_α  velocity, m/s
φ  porosity, fraction
k  absolute permeability, 10⁻³ μm²
k_rα  relative permeability, fraction
μ_α  viscosity, mPa·s
b_α  formation volume factor, m³/m³
ρ  fluid density, kg/m³
q  source/sink term (wells), m³/d
P_cow, P_cgo  capillary pressures, MPa
R_s  solution gas-oil ratio, m³/m³
R_v  oil volatility, m³/m³
g  gravitational acceleration, m/s²
z  depth, m
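Comparing Tables 3 and 4, the solver speed-up on Platform B can be checked the same way, and dividing by the thread count gives the parallel efficiency that the text refers to (timings transcribed from the tables; the helper names are ours):

```python
# Linear-solver times in minutes from Table 3 (Platform A) and Table 4 (Platform B).
solver_time = {
    "A": {1: 35.36, 4: 23.23},
    "B": {1: 46.13, 2: 30.28, 4: 20.82, 8: 20.42},
}

def solver_speedup(platform, n_threads):
    """Solver speed-up relative to the single-thread run on the same platform."""
    t = solver_time[platform]
    return round(t[1] / t[n_threads], 2)

def solver_efficiency(platform, n_threads):
    """Parallel efficiency: speed-up divided by the number of threads."""
    return solver_speedup(platform, n_threads) / n_threads
```

At 4 threads the solver efficiency on Platform B (about 2.22/4 ≈ 0.56) is well above Platform A's (about 1.52/4 ≈ 0.38), while going from 4 to 8 threads on Platform B adds little (2.26 vs 2.22).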
Further code optimization is required to improve the parallel efficiency.

Acknowledgements

The authors would like to thank RIPED, PetroChina, for providing data for the numerical tests and for support through the PetroChina New-generation Reservoir Simulation Software project (2011A-1010), the Program of Research on Continental Sedimentary Oil Reservoir Simulation (z121100004912001) funded by the Beijing Municipal Science & Technology Commission, and PetroChina Joint Research Funding 12HT1050002654. Feng is partially supported by NSFC Grant 11201398, Hunan Provincial Natural Science Foundation of China Grant 14JJ2063, and the Specialized Research Fund for the Doctoral Program of Higher Education of China Grant 20124301110003. Zhang is partially supported by the Dean's Startup Fund, Academy of Mathematics and System Sciences, and the State High Tech Development Plan of China (863 Program) 2012AA01A309. Shu is partially supported by NSFC Grant 91130002, the Program for Changjiang Scholars and Innovative Research Team in University of China Grant IRT1179, and the Scientific Research Fund of the Hunan Provincial Education Department of China Grant 12A138.

References

Abdel-Qader J H and Walker R S. Performance evaluation of OpenMP benchmarks on Intel's quad core processors. Proceedings of the 14th WSEAS International Conference on Computers, 348-355, 2010
Al-Shaalan T M, Klie H, Dogru A H, et al. Studies of robust two stage preconditioners for the solution of fully implicit multiphase flow problems. Paper SPE 118722 presented at the SPE Reservoir Simulation Symposium, The Woodlands, TX, USA, 2009
Appleyard J R, Cheshire I M and Pollard R K. Special techniques for fully implicit simulators. Proc. European Symposium on Enhanced Oil Recovery, Bournemouth, England, 395-408, 1981
Appleyard J R and Cheshire I M. Nested factorization. Paper SPE 12264 presented at the 7th SPE Symposium on Reservoir Simulation
Bank R E, Chan T F, Coughran J W M, et al. The alternate-block-factorization procedure for systems of partial differential equations. BIT. 1989. 29(4): 938-954
Behie A and Forsyth P A Jr. Incomplete factorization methods for fully implicit simulation of enhanced oil recovery. SIAM J. Sci. Stat. Comp. 1984. 5: 543-561
Behie G and Vinsome P. Block iterative methods for fully implicit reservoir simulation. Soc. Pet. Eng. J. 1982. 22(5): 658-668
Bjørstad P E, Manne F, Sørevik T, et al. Efficient matrix multiplication on SIMD computers. SIAM J. Matrix Anal. Appl. 1992. 13(1): 386-401
Brandt A, McCormick S and Ruge J. Algebraic Multigrid (AMG) for Sparse Matrix Equations. In: Sparsity and Its Applications. Cambridge Univ. Press, Cambridge. 1985. 257-284
Chen Z, Huan G and Ma Y. Computational Methods for Multiphase Flows in Porous Media. Society for Industrial Mathematics, 2006
Christie M A and Blunt M J. Tenth SPE comparative solution project: a comparison of upscaling techniques. SPE Reservoir Evaluation & Engineering. 2001. 4: 308-317 (paper SPE 72469)
Concus P, Golub G H and Meurant G. Block preconditioning for the conjugate gradient method. SIAM J. Sci. Stat. Comput. 1985. 6: 220-252
Dogru A, Fung L, Middya U, et al. A next-generation parallel reservoir simulator for giant reservoirs. Paper SPE 119272 presented at the SPE Reservoir Simulation Symposium, 2009
Douglas J Jr, Peaceman D W and Rachford H H Jr. A method of calculating multi-dimensional immiscible displacement. SPE AIME. 1959. 216: 297-396
Dupont T, Kendall R P and Rachford H H Jr. An approximate factorization procedure for solving self-adjoint elliptic difference equations. SIAM J. Numer. Anal. 1968. 5: 559-573
Falgout R. An introduction to algebraic multigrid. Computing in Science and Engineering. 2006. 8: 24-33
Feng C, Shu S and Yue X. An improvement for the OpenMP version of BoomerAMG. Proceedings of CCF HPC China 2012, Zhangjiajie, China. 2012. 321-328
Feng C, Shu S, Xu J, et al. A multi-stage preconditioner for the black oil model and its OpenMP implementation. 21st International Conference on Domain Decomposition Methods (2012, INRIA Rennes-Bretagne-Atlantique), LNCSE, Springer Berlin Heidelberg, 2014. 127-138
Feng C. Multilevel Iterative Methods and Solvers for Reservoir Simulation on CPU-GPU Heterogeneous Computers. Ph.D. Thesis, Xiangtan University, Hunan, China, 2014
Han D K. The achievements and challenges of EOR technology for onshore oil fields in China. Proceedings of the World Petroleum Congress, 363-372, 1998
Han D K, Yang C Z, Zhang Z Q, et al. Recent development of enhanced oil recovery in China. Journal of Petroleum Science and Engineering. 1999. 22: 181-188
Hayder M E and Baddourah M. Challenges in high performance computing for reservoir simulation. Paper SPE 152414 presented at the EAGE Annual Conference & Exhibition incorporating SPE Europec, Copenhagen, Denmark, 4-7 June 2012
Hu X, Liu W, Qin G, et al. Development of a fast auxiliary subspace preconditioner for numerical reservoir simulators. Paper SPE 148388 presented at the SPE Reservoir Characterization and Simulation Conference, 2011
Hu X, Wu S H, Wu X H, et al. Combined preconditioning with applications in reservoir simulation. SIAM Multiscale Modeling and Simulation. 2013a. 11: 507-521
Hu X, Xu J and Zhang C S. Application of auxiliary space preconditioning in field-scale reservoir simulation. Science China Mathematics. 2013b. 56: 2737-2751
Hypre: A scalable linear solver library. URL: http://acts.nersc.gov/hypre/
Lacroix S, Vassilevski Y and Wheeler M. Decoupling preconditioners in the implicit parallel accurate reservoir simulator (IPARS). Numer. Linear Algebra with Applications. 2001. 8: 537-549
Lacroix S, Vassilevski Y, Wheeler J, et al. Iterative solution methods for modeling multiphase flow in porous media fully implicitly. SIAM J. Sci. Comput. 2003. 25: 905-926
Lam M D, Rothberg E E and Wolf M E. The cache performance and optimizations of blocked algorithms. Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV), 1991. 63-74
Li Q Y, Wu S H, Wang B H, et al. A new generation reservoir simulator and its application in a mature water flooding oilfield. Paper presented at the SPE Asia Pacific Oil & Gas Conference and Exhibition, Jakarta, Indonesia, 2013a
Li X B, Wu S H, Li Q Y, et al. An improved approach to simulate low-permeability fractured reservoirs with a dynamic hybrid dual-porosity model. Paper presented at the SPE Asia Pacific Oil & Gas Conference and Exhibition, Jakarta, Indonesia, 2013b
Meijerink J A and van der Vorst H A. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Math. Comp. 1977. 31: 148-162
Meijerink J A. Iterative methods for the solution of linear equations based on the incomplete block factorization of the matrix. Paper SPE 12262 presented at the SPE Reservoir Simulation Symposium, Lubbock, TX, Nov. 14-15, 1983
Oliker L, Li X, Husbands P, et al. Effects of ordering strategies and programming paradigms on sparse matrix computations. SIAM Review. 2002. 44(3): 373-393
Pavlas E J Jr. Fine-scale simulation of complex water encroachment in a large carbonate reservoir in Saudi Arabia. SPE Reservoir Evaluation & Engineering. 2002. 5(5): 346-354 (paper SPE 79718)
Ruge J and Stüben K. Algebraic multigrid. In: Multigrid Methods, Frontiers Appl. Math. Vol. 3, 73-130. SIAM, Philadelphia, PA, 1987
Saad Y. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, 2003
Stüben K. An introduction to algebraic multigrid. In: Trottenberg U, Oosterlee C and Schüller A. Multigrid. Academic Press. 2001. 413-532
Stüben K, Clees T, Klie H, et al. Algebraic multigrid methods (AMG) for the efficient solution of fully implicit formulations in reservoir simulation. Paper SPE 105832 presented at the SPE Reservoir Simulation Symposium, Houston, TX, USA, 2007
Trangenstein J A and Bell J B. Mathematical structure of the black-oil model for petroleum reservoir simulation. SIAM Journal on Applied Mathematics. 1989. 49: 749-783
Vuduc R. Automatic Performance Tuning of Sparse Matrix Kernels. Ph.D. Thesis. University of California, Berkeley, CA, USA, 2003
Wallis J R. Incomplete Gaussian elimination as a preconditioning for generalized conjugate gradient acceleration. Paper SPE 12265 presented at the SPE Reservoir Simulation Symposium, San Francisco, California, November 15-18, 1983
Wallis J R, Kendall R P and Little T E. Constrained residual acceleration of conjugate residual methods. Paper SPE 13536 presented at the SPE Reservoir Simulation Symposium, Dallas, TX, February 10-13, 1985
Wang B H, Wu S H, Han D K, et al. Block compressed storage and computation in the large-scale reservoir simulation. Petroleum Exploration and Development. 2013a. 40: 495-500 (in Chinese)
Wang B H, Wu S H, Li Q Y, et al. Applications of BILU0-GMRES in reservoir numerical simulation. ACTA Petrolei Sinica. 2013b. 34: 954-958 (in Chinese)
Wang F and Xu J. A crosswind block iterative method for convection-dominated problems. SIAM Journal on Scientific Computing. 1999. 21: 620-645
Watts J W and Shaw J S. A new method for solving the implicit reservoir simulation matrix equation. Paper SPE 93068 presented at the SPE Reservoir Simulation Symposium, Texas, USA, 2005
Wu S H, Xu J, Zhang C S, et al. Multilevel preconditioners for a new generation reservoir simulator. Paper SPE 166011 presented at the SPE Reservoir Characterisation and Simulation Conference and Exhibition, Abu Dhabi, UAE, 2013a
Wu S H, Li X B, Li Q Y, et al. A dynamic hybrid model to simulate fractured reservoirs. Paper IPTC 16521 presented at the International Petroleum Technology Conference, Beijing, China, 2013b

(Edited by Sun Yanhua)
Petroleum Science – Springer Journals
Published: Oct 4, 2014