The General Parallel File System (GPFS), the parallel file system of IBM Spectrum Scale, has a 20-year development history with over 100 contributing developers. Its ability to support strict POSIX semantics across more than 10K clients leads to a complex design with intricate interactions between the cluster nodes. Tracing has proven to be a vital tool for understanding the behavior and anomalies of such a complex software product. However, the necessary trace information is often buried in hundreds of gigabytes of by-product trace records. Further, the overhead of tracing can significantly impact running applications and file system performance, limiting the use of tracing in a production system. In this research article, we discuss the evolution of the mature and highly scalable GPFS tracing tool and present an exploratory study of GPFS's new tracing interface, FlexTrace, which allows developers and users to specify precisely what to trace for the problem they are trying to solve. We evaluate our methodology and prototype, demonstrating that the proposed approach has negligible overhead, even under intensive I/O workloads and with low-latency storage devices.
ACM Transactions on Storage (TOS) – Association for Computing Machinery
Published: Apr 12, 2018
Keywords: GPFS