Network File System (NFS)
Distributed File System (DFS)
A distributed file system (DFS) is a file system that is spread across multiple servers or locations. It allows clients to access files as if they were stored locally, even though they may be physically located on different servers. DFSs are used to improve scalability, performance, and fault tolerance.
Some popular DFS systems include:
- Network File System (NFS): NFS is a popular DFS that is widely used in Unix and Linux environments. It uses a client-server architecture, where the server exports directories to clients, and the clients can mount the exported directories.
- Hadoop Distributed File System (HDFS): HDFS is a DFS that is designed for large-scale data processing. It uses a master-slave architecture, where the master node manages the metadata for the files, and the slave nodes store the data.
- GlusterFS: GlusterFS is a distributed file system that is built on top of TCP/IP. It uses a distributed hash table (DHT) to distribute the data across multiple servers.
- Ceph: Ceph is a unified distributed storage system that provides object storage, block storage, and file system storage. It uses a RADOS (Reliable Autonomic Distributed Object Store) architecture to distribute the data across multiple servers.
The desirable properties of a DFS include:
- Transparency: The DFS should be transparent to the client, meaning that the client should be able to access files in the DFS as if they were local files.
- Support for concurrent clients: The DFS should be able to support multiple client processes accessing the same file concurrently.
- Replication: The DFS should store multiple copies of the file on different servers to improve fault tolerance and performance.
Additional desirable properties:
- Scalability: The DFS should be able to scale to support a large number of clients and files.
- Security: The DFS should provide security features to protect files from unauthorized access.
- Performance: The DFS should provide good performance for both read and write operations.
Concurrent Accesses in DFS
1. One-Copy Update Semantics:
- This refers to the idea that when a file is replicated within a Distributed File System, the contents of the file, as visible to clients, should not appear any different than when the file has only one replica.
- In other words, whether a file is replicated or not, clients accessing the file should see consistent and coherent data. This consistency ensures that the behavior of the file system is predictable for users, regardless of the number of replicas.
2. At Most Once Operation vs. At Least Once Operation:
- At most once operation: An at most once operation is an operation that is guaranteed to be executed at most once, even if there are failures in the system. At most once operations are useful for applications where operations are not idempotent, so that a repeated execution would cause harm, such as financial applications.
- At least once operation: An at least once operation is an operation that is guaranteed to be executed at least once, even if there are failures in the system, typically by retrying until it succeeds. At least once operations are useful where an operation must not be lost and repeating it is tolerable, such as logging applications.
The choice between “At Most Once” and “At Least Once” semantics depends on the nature of the operation and the desired behavior.
- At Most Once: Suitable for operations where repetition could lead to undesirable consequences, such as duplication of data.
- At Least Once: Appropriate for idempotent operations where repetition does not introduce issues.
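Here is a minimal sketch of the difference, using a toy deposit operation. The Account class, the request ids, and the duplicate-request cache are all hypothetical names made up for illustration. They are not part of NFS or any real library.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Toy server-side state with a non-idempotent operation."""
    balance: int = 0
    # Hypothetical duplicate-request cache: request id -> saved reply.
    seen: dict = field(default_factory=dict)

    def deposit(self, amount: int) -> int:
        # Not idempotent: every execution changes the balance.
        self.balance += amount
        return self.balance

    def deposit_at_most_once(self, request_id: str, amount: int) -> int:
        # A retransmitted request gets the saved reply instead of running again.
        if request_id in self.seen:
            return self.seen[request_id]
        reply = self.deposit(amount)
        self.seen[request_id] = reply
        return reply


# At least once: the client retries after a lost reply,
# so the server executes the deposit twice.
naive = Account()
naive.deposit(100)
naive.deposit(100)  # retry of the same logical request

# At most once: the retry carries the same request id and is filtered out.
safe = Account()
safe.deposit_at_most_once("req-1", 100)
safe.deposit_at_most_once("req-1", 100)  # retry of the same logical request

print(naive.balance, safe.balance)  # 200 100
```

The retry is harmless only in the second case, which is why non-idempotent operations usually need at most once semantics.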
Security in DFS
Authentication is the process of verifying that a given user is who they claim to be. This is typically done by requiring the user to provide a username and password, or by using another form of authentication, such as biometric authentication.
Authorization is the process of verifying that a user is allowed to access a particular resource.
- Access control lists (ACLs) are a way of specifying who has access to a particular resource and what type of access they have. ACLs are typically associated with individual resources, such as files and directories.
- Capability lists are a way of specifying what resources a particular user has access to and what type of access they have. Capability lists are typically associated with individual users. Capabilities are indivisible units of authority. Each capability grants the holder permission to perform a specific operation on a specific resource.
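The two approaches store the same information, just indexed differently. A rough sketch, with made-up users, paths, and helper names (all hypothetical), could look like this:

```python
# Access control list: stored with each resource, maps users to rights.
acls = {
    "/reports/q1.txt": {"alice": {"read", "write"}, "bob": {"read"}},
}

# Capability list: stored with each user, maps resources to rights.
capabilities = {
    "alice": {"/reports/q1.txt": {"read", "write"}},
    "bob": {"/reports/q1.txt": {"read"}},
}


def allowed_by_acl(user, resource, op):
    # Start from the resource and look up the user.
    return op in acls.get(resource, {}).get(user, set())


def allowed_by_capability(user, resource, op):
    # Start from the user and look up the resource.
    return op in capabilities.get(user, {}).get(resource, set())


print(allowed_by_acl("bob", "/reports/q1.txt", "write"))           # False
print(allowed_by_capability("alice", "/reports/q1.txt", "write"))  # True
```

With an ACL it is easy to answer "who can access this file?", while with capability lists it is easy to answer "what can this user access?".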
Before we jump to NFS, I want to cover another interesting topic: “mount”.
Mount in Unix-like operating systems
A file system is a way of organizing and storing files on a storage device, such as a hard disk or a partition. Common file systems in Unix-like systems include ext4, XFS, and NFS.
In Unix-like operating systems, the term “mount” refers to the process of associating a file system with a particular directory in the system’s directory tree. This allows the files and directories within that file system to be accessed by users and applications as if they were part of the overall directory structure.
A mount point is an existing directory in the file system hierarchy where the contents of a separate file system are attached. When a file system is mounted at a specific mount point, the files and directories in that file system become accessible through that directory.
Once a file system is mounted, you can navigate to the specified mount point and access the files and directories within that file system as if they were part of the local file system. This flexibility allows Unix-like operating systems to manage various storage devices and network resources seamlessly.
For example, if a client mounts the partition /dev/sda1 at the directory /mnt/external/sda1 (for example, using sudo mount -t ntfs /dev/sda1 /mnt/external/sda1), then the files on that partition become accessible under the mount point, e.g., /mnt/external/sda1/foo.txt.
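To make the idea of a mount point more concrete, here is a small sketch of how a path could be resolved against a mount table by picking the longest matching mount point. The mount_table contents and the resolve helper are hypothetical, and the root entry on /dev/sda2 is an assumption added for the example. A real kernel does this very differently internally.

```python
import posixpath

# Hypothetical mount table: mount point -> description of the mounted file system.
mount_table = {
    "/": "rootfs on /dev/sda2",
    "/mnt/external/sda1": "ntfs on /dev/sda1",
}


def resolve(path):
    """Return (file system, path relative to its mount point)."""
    path = posixpath.normpath(path)
    best = "/"
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        # The longest mount point that contains the path wins.
        if (path == mount_point or path.startswith(prefix)) and len(mount_point) > len(best):
            best = mount_point
    return mount_table[best], posixpath.relpath(path, best)


print(resolve("/mnt/external/sda1/foo.txt"))  # ('ntfs on /dev/sda1', 'foo.txt')
print(resolve("/etc/hosts"))                  # ('rootfs on /dev/sda2', 'etc/hosts')
```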
Network File System (NFS)
The Network File System (NFS) is a distributed file system that allows users to access files over a computer network as if they were local files. It uses a client-server architecture, where the server exports directories to clients, and the clients can mount the exported directories.
NFS uses a remote procedure call (RPC) protocol to communicate between the client and server. This makes it relatively easy to implement and deploy.
NFS Client System
The NFS client system is integrated with the kernel (OS). It performs RPCs to the NFS server system for DFS operations, such as reading and writing files, creating and deleting files, and moving and renaming files.
NFS Server System
The NFS server system plays the role of both flat file service and directory service. It allows mounting of files and directories.
Mounting a file or directory means that the client makes the file or directory appear as if it is part of the client’s local file system. For example, if a client mounts the server directory /usr/edison/inventions at the local directory /usr/tesla/my_competitors, then the client can access the file /usr/tesla/my_competitors/foo as if it were the file /usr/edison/inventions/foo.
Mounting files and directories does not clone (copy) the files. It simply makes the mount point refer to the files on the server, much like a pointer (it is not a literal symbolic link). This means that the client can access the files without first copying them to the client machine.
Virtual File System (VFS) Module
The VFS is a layer of abstraction inside the kernel, between the system call interface and the individual file system implementations. It allows processes to access files via file descriptors, just like local Unix files. This makes local and remote files indistinguishable to processes, which gives transparency.
The VFS keeps a data structure for each mounted file system. This data structure contains information about the file system, such as its type and location.
The VFS also keeps a data structure called a v-node for all open files. The v-node is a cache of information about the file, such as its size, permissions, and location. If the file is local, the v-node points to the local disk i-node. If the file is remote, the v-node contains the address of the remote NFS server.
For a given file access, the VFS decides whether to route the request to the local file system or to the NFS client system. This decision is made based on the v-node of the file: for a remote file, the v-node holds the file handle. The file handle is a unique identifier for a file. It is generated by the NFS server and is returned to the client when the client looks up the file. The NFS client sends the file handle with each request so that the server can identify the file system and the file that the operation applies to.
The VFS is a critical component of the Unix operating system. It allows processes to access files in a transparent and efficient manner.
Here is an example of how the VFS works:
- A process opens a file using the open() system call.
- The kernel uses the file path to find the file system that the file is located on.
- The kernel then calls the VFS to open the file.
- The VFS checks to see if the file is local or remote.
- If the file is local, the VFS calls the local file system to open the file.
- If the file is remote, the VFS calls the NFS client system to open the file.
- Once the file is open, the VFS returns a file descriptor to the process.
- The process can then use the file descriptor to read and write to the file.
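The VFS hides the details of how the file is located and accessed from the process. This makes the process code more portable and easier to maintain. The chain of references is: file descriptor → v-node → i-node (for a local file) or file handle (for a remote file). This means that the process holds only the file descriptor, the kernel maps it to a v-node, and the v-node leads either to the local i-node or to the remote NFS server.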
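Here is a toy model of that routing decision. The VNode and VFS classes and the two read helpers are hypothetical names for illustration only. A real VFS lives in the kernel and a real NFS client would issue an RPC instead of returning a string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VNode:
    # Exactly one of these is set in this sketch.
    inode: Optional[int] = None        # local file: i-node number
    file_handle: Optional[str] = None  # remote file: opaque NFS file handle


def local_read(inode):
    return f"local read of i-node {inode}"


def nfs_client_read(file_handle):
    # A real NFS client would send an RPC to the server here.
    return f"RPC read for handle {file_handle}"


class VFS:
    def __init__(self):
        self.fd_table = {}  # file descriptor -> v-node
        self.next_fd = 3    # 0, 1, 2 are conventionally stdin/stdout/stderr

    def open(self, vnode):
        fd = self.next_fd
        self.next_fd += 1
        self.fd_table[fd] = vnode
        return fd

    def read(self, fd):
        vnode = self.fd_table[fd]
        # Route to the local file system or to the NFS client.
        if vnode.inode is not None:
            return local_read(vnode.inode)
        return nfs_client_read(vnode.file_handle)


vfs = VFS()
fd_local = vfs.open(VNode(inode=42))
fd_remote = vfs.open(VNode(file_handle="fh-7"))
print(vfs.read(fd_local))   # local read of i-node 42
print(vfs.read(fd_remote))  # RPC read for handle fh-7
```

The process only ever sees the two file descriptors, so it cannot tell which file is remote.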
Server Optimisations
Server caching is one of the big reasons NFS is so fast with reads. Server caching stores some of the recently-accessed blocks (of files and directories) in memory. This means that when a client requests a block that is already in the cache, the server can return the block immediately without having to read it from disk. This can significantly improve the performance of read operations.
Most programs tend to have locality of access. This means that blocks accessed recently will likely be accessed soon in the future. Server caching takes advantage of this locality of access by keeping recently-accessed blocks in memory.
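A small sketch shows why locality makes caching pay off. It assumes a least-recently-used (LRU) eviction policy, which is a common choice but is my assumption here, not something NFS mandates. BlockCache and read_from_disk are hypothetical names.

```python
from collections import OrderedDict


class BlockCache:
    """Toy in-memory block cache with LRU eviction (assumed policy)."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk
        self.blocks = OrderedDict()  # block id -> data, oldest first
        self.disk_reads = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        # Cache miss: go to disk, then remember the block.
        self.disk_reads += 1
        data = self.read_from_disk(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return data


cache = BlockCache(capacity=2, read_from_disk=lambda b: f"data-{b}")
for block in [1, 1, 2, 1, 3, 2]:
    cache.read(block)
print(cache.disk_reads)  # 4 disk reads for 6 requests
```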
There are two main flavours of writes in NFS:
- Delayed write: With delayed write, the server writes the data to memory and then flushes it to disk every 30 seconds (or via the Unix sync operation). This is faster than write-through, but it is not as consistent. If the server crashes before the data is flushed to disk, the data may be lost.
- Write-through: With write-through, the server writes the data to disk immediately before acknowledging the client request. This is more consistent than delayed write, but it may be slower.
The choice of which write flavour to use depends on the application. Applications that require consistency, such as financial applications, should use write-through. Applications that require speed, such as streaming media applications, may be able to use delayed write.
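The trade-off can be sketched with a toy server that either writes to disk before acknowledging or only marks the block dirty. The Server class and its crash and sync methods are hypothetical and only simulate memory and disk with dictionaries.

```python
class Server:
    def __init__(self, write_through):
        self.write_through = write_through
        self.memory = {}    # block id -> data in the server's memory cache
        self.disk = {}      # block id -> data that has reached stable storage
        self.dirty = set()  # blocks written to memory but not yet to disk

    def write(self, block_id, data):
        self.memory[block_id] = data
        if self.write_through:
            self.disk[block_id] = data  # on disk before the acknowledgement
        else:
            self.dirty.add(block_id)    # flushed later by sync()
        return "ack"

    def sync(self):
        # Stands in for the periodic flush or the Unix sync operation.
        for block_id in self.dirty:
            self.disk[block_id] = self.memory[block_id]
        self.dirty.clear()

    def crash(self):
        # Everything that was only in memory is gone.
        self.memory.clear()
        self.dirty.clear()


delayed = Server(write_through=False)
delayed.write(1, "x")
delayed.crash()

through = Server(write_through=True)
through.write(1, "x")
through.crash()

print(1 in delayed.disk, 1 in through.disk)  # False True
```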
Other server optimisations in NFS:
- Asynchronous I/O: NFS servers can use asynchronous I/O to improve performance. Asynchronous I/O allows the server to continue processing other requests while it is waiting for I/O operations to complete.
- Multiple threads: NFS servers can use multiple threads to improve performance. This allows the server to handle multiple client requests simultaneously.
- Load balancing: NFS servers can be load balanced to distribute the load across multiple servers. This can improve performance and reliability.
By using server optimisations such as caching, asynchronous I/O, multiple threads, and load balancing, NFS servers can provide high performance and reliability for a variety of applications.
Client Caching
Client caching is similar to server caching in that it stores some of the recently-accessed blocks (of files and directories) in memory. However, there are a few key differences:
- Client caching is performed by the NFS client, while server caching is performed by the NFS server.
- Client caching is used to improve the performance of read operations, while server caching is used to improve the performance of both read and write operations.
- Client caching can lead to inconsistencies between clients, because each client holds its own copy of a block, while the server cache is a single copy shared by all clients.
Each block in the client cache is tagged with the following information:
- Tc: The time when the cache entry was last validated.
- Tm: The time when the block was last modified at the server.
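A cache entry at time T is valid if (T - Tc) < t, where t is the freshness interval. If this check fails, the client asks the server for the block's current Tm and compares it with the Tm stored in the cache: if they match, the entry is still valid and Tc is updated to T; otherwise the block is fetched again. The freshness interval is a compromise between consistency and efficiency. Sun Solaris sets t adaptively between 3 and 30 seconds for files and between 30 and 60 seconds for directories.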
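The validity check can be sketched as follows. CacheEntry, is_fresh, and validate are hypothetical names. The fallback that compares Tm with the server follows the usual textbook description of NFS client caching and is an assumption beyond the freshness formula itself. For simplicity the server's Tm is passed in as an argument, whereas a real client would fetch it over RPC only when the entry is stale.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    tc: float  # Tc: time the entry was last validated
    tm: float  # Tm: server modification time recorded when the block was cached


def is_fresh(entry, now, freshness_interval):
    # The condition from the text: (T - Tc) < t
    return (now - entry.tc) < freshness_interval


def validate(entry, now, freshness_interval, server_tm):
    """Return True if the cached block may still be used."""
    if is_fresh(entry, now, freshness_interval):
        return True  # no need to contact the server
    if entry.tm == server_tm:
        entry.tc = now  # unchanged on the server: revalidate
        return True
    return False  # modified on the server: the block must be fetched again


entry = CacheEntry(data=b"x", tc=100.0, tm=50.0)
print(validate(entry, now=102.0, freshness_interval=3.0, server_tm=50.0))  # True
print(validate(entry, now=110.0, freshness_interval=3.0, server_tm=50.0))  # True
print(validate(entry, now=120.0, freshness_interval=3.0, server_tm=60.0))  # False
```

A smaller t means more checks against the server and better consistency, while a larger t means fewer checks and a higher chance of reading stale data.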
When a block is written, the client performs a delayed write to the server. This means that the client writes the data to its cache, marks the block as dirty, and sends it to the server later, typically when the file is closed or a sync occurs on the client. This approach is faster than write-through, but it is not as consistent. If the client crashes before the data is flushed to the server, the data is lost.
Client caching can significantly improve the performance of read operations, especially for applications that access the same files frequently. However, it is important to be aware of the potential for inconsistencies. Applications that require consistency should not rely on client caching.
Here is an example of how client caching can lead to inconsistencies:
- A client opens a file and reads a block into the cache.
- The client modifies the block and writes it back to the cache.
- The client crashes before it can flush the changes to the server.
- Another client opens the same file and reads the block from the server.
- The second client sees the old version of the block, even though the first client modified it.
To avoid this type of inconsistency, applications should use write-through semantics or flush the cache before closing a file.
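The scenario above can be simulated in a few lines. FileServer, Client, and flush are hypothetical names, and the sketch ignores freshness checks so that it only shows the effect of unflushed dirty blocks.

```python
class FileServer:
    def __init__(self):
        self.blocks = {0: "old"}  # block id -> data stored on the server


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}     # block id -> locally cached data
        self.dirty = set()  # blocks modified locally but not yet sent

    def read(self, block_id):
        if block_id not in self.cache:
            self.cache[block_id] = self.server.blocks[block_id]
        return self.cache[block_id]

    def write(self, block_id, data):
        # Delayed write: only the local cache is updated.
        self.cache[block_id] = data
        self.dirty.add(block_id)

    def flush(self):
        # Push every dirty block to the server.
        for block_id in sorted(self.dirty):
            self.server.blocks[block_id] = self.cache[block_id]
        self.dirty.clear()


server = FileServer()
first = Client(server)
first.read(0)
first.write(0, "new")

before_flush = Client(server).read(0)  # a second client, before the flush
first.flush()
after_flush = Client(server).read(0)   # a third client, after the flush

print(before_flush, after_flush)  # old new
```

If the first client had crashed instead of calling flush, the server would have kept "old" forever.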
Hope you enjoyed reading. I’m always open to suggestions and new ideas. Please write to me :)