Wednesday, August 24, 2011

Fetching hard drive temperatures in FreeBSD

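#!/usr/local/bin/bash
# Print the SMART temperature of every da(4) disk as "device<TAB>temperature".
for i in /dev/da{?,??}; do
    [ -e "$i" ] || continue    # skip a pattern that matched no device
    printf '%s\t' "$i"
    # In the attribute table, the raw value is the 10th column.
    smartctl -A "$i" |
    awk 'tolower($2) == "temperature_celsius" { print $10 }'
done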

/dev/da0        33
/dev/da1        31
/dev/da2        32
/dev/da3        32
/dev/da4        32
/dev/da5        27
/dev/da6        25
/dev/da7        24
/dev/da8        30
/dev/da9        31
/dev/da10       32
/dev/da11       30
/dev/da12       23
/dev/da13       23
/dev/da14       30
/dev/da16       36
/dev/da17       40
/dev/da18       42
/dev/da19       41
/dev/da20       40
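
With twenty disks in the list it is easy to overlook a warm one. Here is a minimal sketch of a filter that keeps only the disks at or above a threshold. The function name hot_disks and the 40 degree limit are my own picks for illustration, and it assumes the input is one device and one temperature per line, as in the output above.

```sh
#!/bin/sh
# hot_disks LIMIT: read "device temperature" lines on stdin and print only
# the disks whose temperature is at or above LIMIT (degrees Celsius).
# Lines without a temperature field are dropped.
hot_disks() {
    awk -v limit="$1" 'NF >= 2 && $2 + 0 >= limit + 0 { print $1 "\t" $2 }'
}

# Example with three lines taken from the output above.
printf '/dev/da16\t36\n/dev/da17\t40\n/dev/da18\t42\n' | hot_disks 40
```

In practice the temperature script would be piped straight into hot_disks instead of the printf line.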

Sunday, June 26, 2011

FreeBSD 8-STABLE ZFS v28 Benchmarks

I decided to run iozone on my file server today to see what performance looks like using the latest 8-STABLE and ZFS v28, across 15 disks. I'm pretty satisfied with the results, and they give me a good starting point for evaluating performance as I add disks. I have 5 more 1TB disks lying around that I need to hook up, and room for an additional 25 disks.

Let's start with the zpool configuration, which is 3 raidz sets of 5 drives each, striped:

config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da13    ONLINE       0     0     0
            da12    ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            da11    ONLINE       0     0     0
            da10    ONLINE       0     0     0
            da14    ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          raidz1-2  ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da4     ONLINE       0     0     0

And here's the command used to benchmark:

iozone -R -a -i 0 -i 1 -g 16g -f /tank/test/test -b /root/raidz.xls

And the pretty graphs which I managed to throw together in Excel 2010 ...

[Four graphs, from iozone 06262011]

Sunday, March 6, 2011

ZFS Benchmarks; Prefetch enabled vs disabled

FreeBSD foghornleghorn.res.openband.net 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #13: Tue Feb 22 17:39:03 EST 2011 root@foghornleghorn.res.openband.net:/usr/obj/usr/src/sys/FOGHORNLEGHORN amd64

/boot/loader.conf
======================
vfs.zfs.zil_disable="1"
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"
vm.kmem_size="8192M"
vfs.zfs.arc_max="6144M"
vfs.zfs.prefetch_disable="0"
vfs.zfs.txg.timeout="5"

/etc/sysctl.conf
======================
kern.maxfiles=65536
kern.maxfilesperproc=32768
vfs.read_max=32
vfs.ufs.dirhash_maxmem=16777216
kern.maxvnodes=250000
vfs.zfs.txg.write_limit_override=1073741824

ZFS details
======================
# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  18.2T  7.73T  10.4T  42%  ONLINE  -

# zfs list
NAME                          USED  AVAIL  REFER  MOUNTPOINT
tank                         6.17T  8.10T  36.7K  /tank
tank/downloads               4.89T  8.10T  2.30T  /tank/downloads
tank/downloads/movies        2.59T  8.10T  2.59T  /tank/downloads/movies
tank/usr                     1.29T  8.10T  32.0K  /tank/usr
tank/usr/home                1.29T  8.10T  69.5K  /usr/home
tank/usr/home/josh           1.29T  8.10T  13.4G  /usr/home/josh
tank/usr/home/josh/hellanzb  32.0K  8.10T  32.0K  /usr/home/josh/hellanzb
tank/usr/home/josh/rtorrent  1.27T  8.10T  1.27T  /usr/home/josh/rtorrent
tank/usr/home/josh/watch     8.00M  8.10T  8.00M  /usr/home/josh/watch

# zpool status tank
  pool: tank
 state: ONLINE
 scrub: scrub completed after 7h43m with 0 errors on Sun Mar 6 07:43:56 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da18    ONLINE       0     0     0
            da19    ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da11    ONLINE       0     0     0
            da10    ONLINE       0     0     0
            da17    ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da4     ONLINE       0     0     0

errors: No known data errors

Controller details
======================
mpt0: port 0x6000-0x60ff mem 0xf75fc000-0xf75fffff,0xf75e0000-0xf75effff irq 18 at device 0.0 on pci1
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.20.0
mpt1: port 0x7000-0x70ff mem 0xf78fc000-0xf78fffff,0xf78e0000-0xf78effff irq 19 at device 0.0 on pci2
mpt1: [ITHREAD]
mpt1: MPI Version=1.5.20.0
mpt2: port 0xd000-0xd0ff mem 0xf7ffc000-0xf7ffffff,0xf7fe0000-0xf7feffff irq 19 at device 0.0 on pci6
mpt2: [ITHREAD]
mpt2: MPI Version=1.5.19.0

Disk details
======================
da8 at mpt0 bus 0 scbus0 target 0 lun 0
da8: Fixed Direct Access SCSI-5 device
da8: 300.000MB/s transfers
da8: Command Queueing enabled
da8: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da9 at mpt0 bus 0 scbus0 target 1 lun 0
da9: Fixed Direct Access SCSI-5 device
da9: 300.000MB/s transfers
da9: Command Queueing enabled
da9: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da0 at mpt1 bus 0 scbus1 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da10 at mpt0 bus 0 scbus0 target 2 lun 0
da10: Fixed Direct Access SCSI-5 device
da10: 300.000MB/s transfers
da10: Command Queueing enabled
da10: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da11 at mpt0 bus 0 scbus0 target 3 lun 0
da11: Fixed Direct Access SCSI-5 device
da11: 300.000MB/s transfers
da11: Command Queueing enabled
da11: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da1 at mpt1 bus 0 scbus1 target 1 lun 0
da1: Fixed Direct Access SCSI-5 device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da2 at mpt1 bus 0 scbus1 target 2 lun 0
da2: Fixed Direct Access SCSI-5 device
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da3 at mpt1 bus 0 scbus1 target 3 lun 0
da3: Fixed Direct Access SCSI-5 device
da3: 300.000MB/s transfers
da3: Command Queueing enabled
da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da4 at mpt1 bus 0 scbus1 target 4 lun 0
da4: Fixed Direct Access SCSI-5 device
da4: 300.000MB/s transfers
da4: Command Queueing enabled
da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da5 at mpt1 bus 0 scbus1 target 5 lun 0
da5: Fixed Direct Access SCSI-5 device
da5: 300.000MB/s transfers
da5: Command Queueing enabled
da5: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da6 at mpt1 bus 0 scbus1 target 6 lun 0
da6: Fixed Direct Access SCSI-5 device
da6: 300.000MB/s transfers
da6: Command Queueing enabled
da6: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da7 at mpt1 bus 0 scbus1 target 7 lun 0
da7: Fixed Direct Access SCSI-5 device
da7: 300.000MB/s transfers
da7: Command Queueing enabled
da7: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da16 at mpt2 bus 0 scbus2 target 82 lun 0
da16: Fixed Direct Access SCSI-5 device
da16: 300.000MB/s transfers
da16: Command Queueing enabled
da16: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da17 at mpt2 bus 0 scbus2 target 83 lun 0
da17: Fixed Direct Access SCSI-5 device
da17: 300.000MB/s transfers
da17: Command Queueing enabled
da17: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da18 at mpt2 bus 0 scbus2 target 84 lun 0
da18: Fixed Direct Access SCSI-5 device
da18: 300.000MB/s transfers
da18: Command Queueing enabled
da18: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da19 at mpt2 bus 0 scbus2 target 85 lun 0
da19: Fixed Direct Access SCSI-5 device
da19: 300.000MB/s transfers
da19: Command Queueing enabled
da19: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
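
As a sanity check, the pool size can be estimated from the vdev layout in zpool status and the sector counts above: two raidz1 vdevs of five 1953525168-sector disks and one of five 3907029168-sector disks. The sketch below is a rough estimate only. The function name pool_tib is my own, and it assumes 512-byte sectors, one parity disk per raidz1 vdev, and no ZFS metadata or padding overhead.

```sh
#!/bin/sh
# pool_tib DISKS:SECTORS ...: estimate raw and usable capacity in TiB.
# Each argument describes one raidz1 vdev: the number of disks and the
# 512-byte sector count of each disk.
pool_tib() {
    for vdev in "$@"; do
        echo "$vdev"
    done | awk -F: '
        {
            bytes   = $2 * 512
            raw    += $1 * bytes          # every disk counts toward the raw size
            usable += ($1 - 1) * bytes    # raidz1 gives up one disk per vdev to parity
        }
        END { printf "raw %.1f TiB, usable %.1f TiB\n", raw / 2^40, usable / 2^40 }'
}

pool_tib 5:1953525168 5:1953525168 5:3907029168
```

The raw figure agrees with the 18.2T SIZE from zpool list. The usable estimate comes out a little above the 6.17T + 8.10T that zfs list reports for tank, presumably because of the overhead the estimate ignores.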

Benchmark results #1 (prefetch disabled)
======================
Version  1.96       -------Sequential Output-------- --Sequential Input--- --Random--
Concurrency   1     -Per Chr-- --Block--- -Rewrite-- -Per Chr-- --Block--- --Seeks---
Machine        Size  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP   /sec %CP
foghornleghorn. 16G    213  99 266782  53  90296  19    480  95 218719  24  229.7   6
Latency                43348us    37929us      242ms      102ms    68306us      462ms
Version  1.96       -------Sequential Create-------- ----------Random Create---------
foghornleghorn.res. -Create--- --Read---- -Delete--- -Create--- --Read---- -Delete---
              files   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
                 16  15220  44  +++++ +++  19214  56  22371  58  +++++ +++  22133  66
Latency                10658us       60us       82us     6540us       39us     1677us

Benchmark results #2 (prefetch enabled)
======================
Version  1.96       -------Sequential Output-------- --Sequential Input--- --Random--
Concurrency   1     -Per Chr-- --Block--- -Rewrite-- -Per Chr-- --Block--- --Seeks---
Machine        Size  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP   /sec %CP
foghornleghorn. 16G    201  99 276506  56 198428  38    459  97 627451  73  252.0   5
Latency                45695us    35953us      265ms    69630us    42440us      389ms
Version  1.96       -------Sequential Create-------- ----------Random Create---------
foghornleghorn.res. -Create--- --Read---- -Delete--- -Create--- --Read---- -Delete---
              files   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
                 16  14988  50  +++++ +++  18693  59  18535  51  +++++ +++  20827  67
Latency                13309us       93us      116us     8165us       36us     1046us
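
To put the two runs side by side, here is a small sketch that divides the prefetch-enabled throughput by the prefetch-disabled throughput for three of the block columns. The helper name ratio is my own, and the figures are the K/sec values from the two result tables. This is a single run of each configuration, so treat the ratios as indicative only.

```sh
#!/bin/sh
# ratio NEW OLD: print NEW divided by OLD, to two decimal places.
ratio() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.2f\n", new / old }'
}

# K/sec with prefetch enabled, then with prefetch disabled.
echo "block write: $(ratio 276506 266782)x"
echo "rewrite:     $(ratio 198428 90296)x"
echo "block read:  $(ratio 627451 218719)x"
```

Block writes barely move, while rewrite roughly doubles and block reads nearly triple with prefetch enabled.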

Wednesday, July 19, 2006

Grid File Systems: A Forensic Analysis

Joshua Boyd
College of Information Science and Technology, Radford University
Radford, Virginia 24142, United States of America

and

William Leonard
College of Information Science and Technology, Radford University
Radford, Virginia 24142, United States of America

and

Brian Nash
College of Information Science and Technology, Radford University
Radford, Virginia 24142, United States of America

and

Chen-Chi Shing
College of Information Science and Technology, Radford University
Radford, Virginia 24142, United States of America

ABSTRACT

Because grid file systems are becoming more widespread, and because of the unique nature of such systems, it is important that the security risks of storing sensitive data on them are thoroughly evaluated and tested. This paper describes what a grid file system is, its potential security vulnerabilities, and how these vulnerabilities should be evaluated and tested. Further investigation will be made into the application of forensic tools to grid file systems. The problem with investigating these types of systems is that data may be spread across multiple computer systems, perhaps distributed across the planet, making the task of investigating computer crimes that much more difficult. The reader should come away with a good sense of the background of these types of file systems and be better informed about the concerns of deploying such a system where sensitive data is to be stored.

Keywords: Grid File System, Gfarm, Security, Forensics, Grid Computing

1. INTRODUCTION

As grid computing moves towards the forefront of computing, it opens up a new set of challenges for investigators. Because grids allow us not only to process more data faster, but also to simultaneously utilize storage devices that may be located anywhere around the world, the sheer complexity of a grid environment can make an investigation an overwhelmingly daunting task. In addition, the nature of a grid environment gives rise to a number of vulnerabilities that are unique to this type of system.

When evaluating the vulnerabilities of a grid system, it is important to consider not only the security limitations of the system as a whole, but also the potential vulnerabilities in the host operating system and the supporting software and protocols contained therein. While grid environments are still in the initial stages of development and implementation, the majority of grid systems rely on the same core underlying technologies, such as OpenLDAP or OpenSSL, in order to function. The unique challenge in securing such an environment is that a single vulnerability in one of these technologies may multiply as the number of hosts across the system increases. This situation is most pronounced in a public grid environment where hosts are permitted to join the grid at will, as central security management is no longer an option and a single host may by itself compromise the integrity of the entire grid.

While securing vast computer networks that mix public and private systems is no new task, grid systems typically rely on putting pieces of files, or whole individual files, across multiple nodes in order to speed up access or to provide reliability, two key attributes of a grid system. A new challenge arises in that private files may exist on host machines where the proprietor of a particular node hosts individual pieces or whole files that they are not intended or permitted to access. While the system itself may not permit the proprietor to access these files, the data may nonetheless still exist on the individual’s system at a lower level.

While vulnerabilities are certainly a key concern with grid systems, investigation and forensic data analysis in a grid file system is the primary focus of our research. One of the unique aspects of investigating a grid file system is that a single file may exist across multiple hosts at multiple physical sites. In conventional investigations, investigators can typically narrow their focus to a specific host or hosts involved in an incident, whereas in a grid environment a single incident may span thousands of nodes whose proprietors may be completely ignorant of the incident itself. This is but one of the many unique challenges that investigators will face as grid technology moves towards the mainstream.

As grid systems become more widespread, it is safe to assume that data on these systems will need to be recovered or investigated by law enforcement, systems administrators, or other interested parties. It is important for administrators to develop a set of security and investigative policies particular to their specific grid environment, as each such system varies significantly from the next.


2. WHAT IS A GRID FILE SYSTEM?

A Grid File System is, in essence, a Global Virtual File System. It has the ability to span multiple network nodes and appears as a single fixed disk volume to the end-user. User data is split into clusters that are distributed across many client nodes. Data is duplicated across the nodes but at the same time, not all of the data can be found on one node. This is to maximize throughput of the system while files are being retrieved, as well as to increase the security of the system such that if one node is compromised there is no usable data available on the hard drive.

A central master server, called a metadata server, manages the entire system. This metadata server stores information about all of the various nodes connected to it and where different files in the system can be found on the Grid. This server has the ability to support more than 10,000 clients and file server nodes, respectively. Grid file systems are an abstract technology, which is still in development, but several implementations are being used and tested today, such as Gfarm, which we will be looking at in depth later in this paper.
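
The placement idea described in this section can be illustrated with a small sketch. It is not Gfarm's actual algorithm: the function name place_chunks, the round-robin policy, and the chunk and node labels are all assumptions made purely for illustration. The sketch prints the kind of chunk-to-node mapping that a metadata server would have to keep track of.

```sh
#!/bin/sh
# place_chunks CHUNKS NODES REPLICAS
# Hypothetical round-robin placement: chunk c is stored on REPLICAS
# consecutive nodes starting at node (c mod NODES). Prints one line per
# chunk, listing the nodes that hold a copy of it.
place_chunks() {
    pc_chunks=$1 pc_nodes=$2 pc_replicas=$3
    pc_c=0
    while [ "$pc_c" -lt "$pc_chunks" ]; do
        pc_line="chunk$pc_c:"
        pc_r=0
        while [ "$pc_r" -lt "$pc_replicas" ]; do
            pc_line="$pc_line node$(( (pc_c + pc_r) % pc_nodes ))"
            pc_r=$((pc_r + 1))
        done
        echo "$pc_line"
        pc_c=$((pc_c + 1))
    done
}

# Four chunks, three nodes, two copies of every chunk.
place_chunks 4 3 2
```

In this example every chunk is stored twice, so the loss of any one node loses no data, yet no single node holds all four chunks.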

3. DATA GRIDS VERSUS GRID COMPUTING

An important concept to realize while working with grid systems is that not all grids are created equal. When researching topics such as this, many authors do not make any distinction between the various sorts of grid systems and simply refer to grid systems as a whole. This becomes problematic when we begin to investigate the various subsystems that have become part of the definition of a grid itself.

Data grid systems and computing grids are similar, yet fundamentally different at the most basic level. A data grid system is a system that is in place to allow access to large amounts of data across many networked nodes. Computing grids, on the other hand, distribute computational processes across many nodes. Both utilize networked nodes to accomplish the system goals, but beyond this, there is not much in common. Computing grids will often utilize data grid systems as a way to store the results returned from nodes once computation is complete.

4. IMPLEMENTATION

There are several current implementations of grid file systems. The grid file system (GFS) is a very new architecture, with the earliest open source prototype of Gfarm released in 2001. A second version of Gfarm was released in March 2005 and a third version is currently under development. Current implementations include Gfarm, the Globus Toolkit, the Lustre File System, Nirvana’s Storage Resource Broker, and the Google file system.

Many different institutions and organizations are currently using grid file systems. Grid file systems have several advantages over conventional networked file systems. Grids scale very well, and it is very easy to add additional file system and client nodes to an existing system. It is also possible to combine grid file systems and grid computing systems so that client nodes serve both types of systems. Users are able to access the file system without being concerned with the physical location of a file, and all users have access to the same resources, regardless of where they are physically located in the world.

Some of the advantages of grid file systems are that they are large-scale systems, they eliminate the high cost associated with data servers, and they are designed to be extremely reliable. Due to the scalability of these systems, more than 10,000 clients and servers are able to connect to, manipulate, and access the file system. Expensive data servers are no longer required, since regular workstation machines can be utilized thanks to the redundancy built into these systems. Fault tolerance is a major goal of grids, and as such the loss of multiple nodes, or even entire datacenters, will not affect the system as a whole. By spreading nodes throughout different geographical locations, an entire location could be taken offline and the end-user would not lose any data or have any problem accessing their data.

A few disadvantages of storing data in a grid file system are electrical consumption and limits on reliability and scalability. Electrical consumption in large-scale implementations can become very costly; however, when weighed against the advantages of such a system, the cost is generally accepted. Many implementations of GFS use only a single master file server, which creates a central point of failure that limits both reliability and scalability. In the event that this central server goes down, or is utilized to the point where it can no longer respond to requests in a timely fashion, the entire system may become unavailable to the end-user. A solution to this is to set up synchronized master file servers with failover; however, this solution is not currently implemented in any of the open source systems that we have found.

When designing or implementing a grid file system, there are three criteria that must be met, according to Ian Foster in “What is the Grid? A Three Point Checklist” (http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf). The system:

Coordinates resources that are not subject to centralized control
Uses standard, open, general-purpose protocols and interfaces
Delivers nontrivial qualities of service

5. GFARM

Gfarm is a reference implementation of the Grid Datafarm architecture. It provides several services as a means to access the file system, including Samba, GridFTP, and NFS. A Gfarm metadata server stores the locations of files across the various file server nodes, and computer nodes can support up to a petabyte of storage, depending on the operating system that the nodes are running. The Gfarm file system daemon (gfsd) facilitates remote file operations such as creation, deletion, retrieval, and editing of files stored across nodes.

The Grid Datafarm architecture is based on four primary ideas: global petascale data-intensive computing, global parallel processing, scalable parallel processing, and scalable I/O bandwidth. This architecture allows processing of large amounts of data at multiple regional clusters, enables high-speed access to data using file access locality, and provides fault tolerance against hard disk and network failures through data replication across multiple nodes.

The Grid Datafarm architecture was first tested in the SC2002 grid experiment. This experiment involved seven grids between the United States and Japan. SC2002 was able to store up to eighteen terabytes of data and had a maximum access rate of up to 6,600 megabytes per second. The grid also had a computing power of 962 gigaflops, which is about twice as fast as an SR8000 supercomputer.

6. SECURITY CONCERNS

Gfarm relies on many different protocols and software packages and, as such, has a broad range of potential security problems. We will be investigating and testing every aspect of this system in order to evaluate Gfarm as a whole. Gfarm was created with performance in mind, not security. This is a problem with many systems and is by no means unique to grid systems. Performance must always be weighed against security, and more often than not performance seems to win out during the development phase, with security added in after the initial implementation and deployment of a system.

The areas that we will be evaluating include support software, underlying operating systems, and the network protocols that are utilized by Gfarm. The support software utilized by Gfarm is OpenLDAP and OpenSSL. Current implementations of Gfarm require older versions of these software packages, which do not have the latest security patches applied to them. UNIX and Linux operating systems are used for server operations, primarily Fedora, RedHat, Debian, Solaris 9, FreeBSD, and NetBSD. On the client side just about any operating system can be used; however, the ones that we will be investigating are Linux, UNIX, Microsoft Windows, and Macintosh OS X. Rather than investigate these systems as a whole, we will only be investigating the client applications used within these operating systems. Network protocol security concerns are becoming rarer as time progresses; however, we do not want to simply dismiss the protocols being utilized as “good enough”. TCP/IP and UDP are used by Gfarm, and we will be taking a closer look at exactly how Gfarm uses these protocols and what, if any, encryption is being used to safeguard traffic across these channels.

8. REFERENCES

[1] Gfarm Datafarm: Development, http://datafarm.apgrid.org

[2] GLOBUS Alliance, http://www.globus.org

[3] Gfarm File System Wiki, http://www.hpcc.nectec.or.th/wiki/index.php/Gfarm_files_system

[4] Ian Foster, “What is the Grid? A Three Point Checklist”, http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf