- 25 Aug, 2023 16 commits
-
-
Ameer Hamza authored
-
Ameer Hamza authored
-
Ameer Hamza authored
-
Tony Hutter authored
If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is enabled in zed.rc, then power off the drive's slot in the enclosure if the drive becomes FAULTED. This can help silence misbehaving drives. This assumes your drive enclosure fully supports slot power control via sysfs. Reviewed-by: @AllKind Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Tony Hutter <hutter2@llnl.gov> Closes #15200
-
Rob N authored
In 019dea0a we removed the conversion from EAGAIN->EXDEV inside zfs_clone_range(), but forgot to add a test for EAGAIN to the copy_file_range() entry points to trigger fallback to a content copy. This commit fixes that. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <robn@despairlabs.com> Closes #15170 Closes #15172
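The fallback decision described in this commit can be sketched as a small predicate. This is a minimal illustration, not the OpenZFS code: the function name is hypothetical, and the set of error codes is an assumption drawn from the codes mentioned in this log (EXDEV, EOPNOTSUPP, and the newly tested EAGAIN).

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not an OpenZFS function): decide whether a failed
 * clone attempt should be retried as an ordinary content copy.
 * err is a positive errno value.
 */
bool
example_clone_error_wants_fallback(int err)
{
	switch (err) {
	case EXDEV:		/* historical "cannot clone" result */
	case EOPNOTSUPP:	/* cloning is not supported here */
	case EAGAIN:		/* the case this commit adds a test for */
		return (true);
	default:
		return (false);	/* real failures are reported as-is */
	}
}
```

An entry point would call the clone path first and only attempt the content copy when this predicate is true, so that genuine I/O errors still reach the caller.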
-
Umer Saleem authored
For native Debian packaging, the zinject binary and man page are packaged in the ZFS test package. zinject is not directly related to ZTS and should be packaged with the other utilities, as it is in the zfs_<ver>.rpm/deb packages. This commit moves the zinject binary and man page from the openzfs-zfs-test package to the openzfs-zfsutils package. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Ameer Hamza <ahamza@ixsystems.com> Signed-off-by:
Umer Saleem <usaleem@ixsystems.com> Closes #15160
-
Rafael Kitover authored
Support mountpoint=legacy for the root dataset in the dracut zfs support scripts. mountpoint=/ or mountpoint=/sysroot also works. Change zfs-env-bootfs.service to add zfsutil to BOOTFSFLAGS only for root datasets with mountpoint != legacy. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by:
Rafael Kitover <rkitover@gmail.com> Closes #15149
-
oromenahar authored
Return more descriptive error codes instead of `EXDEV` when the parameters don't match the requirements of the clone function. Updated the comments in `brt.c` accordingly. The first three errors are just invalid parameters, which ZFS cannot handle. The fourth error indicates that the block to be cloned was created and cloned or modified in the same transaction group (`txg`). Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Rob Norris <rob.norris@klarasystems.com> Signed-off-by:
Kay Pedersen <mail@mkwg.de> Closes #15148
-
наб authored
POSIX timers target the process, not the thread (as does SIGINFO), so we need to block it in the main thread which will die if interrupted. Ref: https://101010.pl/@ed1conf@bsd.network/110731819189629373 Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Jorgen Lundman <lundman@lundman.net> Signed-off-by:
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #15113
-
Ryan Lahfa authored
When compiling a kernel with both bcachefs and ZFS, the two macros collide, making it impossible to have both filesystems. It is sufficient to just undefine the macro before calling it. As for why this should be in ZFS rather than bcachefs: bcachefs is not currently an in-tree filesystem, but it has a reasonably high chance of being included soon. This avoids the breakage in ZFS early; the patch may be distributed downstream in NixOS and is already used there. Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Ryan Lahfa <ryan@lahfa.xyz> Closes #15144
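The general undefine-then-redefine pattern can be sketched as follows. The commit message does not name the colliding macro, so `EXAMPLE_CLAMP` and both of its definitions are hypothetical stand-ins chosen only for illustration.

```c
/* Stand-in for another subsystem's header that defines the name first. */
#define EXAMPLE_CLAMP(x) ((x) < 0 ? 0 : (x))

/*
 * Our own header: drop any earlier definition before providing ours,
 * so the compiler does not see a conflicting redefinition.
 */
#ifdef EXAMPLE_CLAMP
#undef EXAMPLE_CLAMP
#endif
#define EXAMPLE_CLAMP(x) ((x) > 100 ? 100 : (x))

/* Uses whichever definition is in effect here, which is now ours. */
int
example_clamp_to_limit(int value)
{
	return (EXAMPLE_CLAMP(value));
}
```

The trade-off of this approach is that code following the `#undef` can no longer reach the other definition, which is acceptable only when that code never needs it.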
-
Mateusz Piotrowski authored
Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Mateusz Piotrowski <0mp@FreeBSD.org> Closes #15141
-
Alexander Motin authored
The previous patch #14841 turned out to have a significant flaw, causing deadlocks if the zl_get_data callback got blocked waiting for TXG sync. I already handled some such cases in the original patch, but issue #14982 showed cases that were impossible to solve in that design. This patch fixes the problem by postponing log block allocation until the very end, just before the zios are issued, leaving nothing blocking after that point to cause deadlocks. Before that point, though, any sleeps are now allowed and do not block the sync thread. This requires a slightly more complicated lwb state machine to allocate blocks and issue zios in the proper order. But with the removal of the special early-issue workarounds the new code is much cleaner, and should even be more efficient. Since this patch uses null zios between writes, I found that null zios do not wait for the ready status of logical children in zio_ready(), which lets the parent write proceed prematurely, producing incorrect log blocks. Adding ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children() fixes it. Reviewed-by:
Rob Norris <rob.norris@klarasystems.com> Reviewed-by:
Mark Maybee <mark.maybee@delphix.com> Reviewed-by:
George Wilson <george.wilson@delphix.com> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15122
-
Alexander Motin authored
If we get an allocation error for the next block during a log write, we trigger a transaction commit. But the block we have just completed is still written, and the transactions it covers will be acknowledged normally. If we then ignore that block during replay just because it is the last in the chain, we may fail to replay some transactions that we have acknowledged as synced, which is not right. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15132
-
Alexander Motin authored
In most cases dmu_sync() works with dirty records directly and does not need actual data. The only exception is dmu_sync_late_arrival(). To save some CPU time use dmu_buf_hold_noread*() in z*_get_data() and explicitly call dbuf_read() in dmu_sync_late_arrival(). There is also a chance that by that time TXG will already be synced and we won't have to do it at all. Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15153
-
Alexander Motin authored
Fastwrite was introduced many years ago to improve the spread of ZIL writes between multiple top-level vdevs by tracking the number of allocated but not yet written blocks and choosing the vdev with the smaller count. It was supposed to reduce ZIL knowledge about allocation, but actually made the ZIL report its allocations to the allocation code even more actively, complicating both the ZIL and metaslab code. On top of that, it seems the ZIO_FLAG_FASTWRITE setting in dmu_sync() was lost many years ago, although that was one of the declared benefits. In addition, the introduction of the embedded log metaslab class solved another problem, the allocation rotor accounting for both normal and log allocations, since in most cases those are now in different metaslab classes. After all that, I'd prefer to simplify the already too complicated ZIL, ZIO and metaslab code if the benefit of the complexity is not obvious. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
George Wilson <george.wilson@delphix.com> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15107
-
Alexander Motin authored
The transaction there does not produce any dirty data or log blocks, so it should not be throttled. All other cases wait for TXG sync, by which time the log block we are writing will be obsolete, so we can skip waiting and just return error here instead. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15096
-
- 09 Aug, 2023 2 commits
-
-
Alexander Motin authored
NAS-123369 / 24.04 / Packaging: Move zinject from openzfs-zfs-test to openzfs-zfsutils package
-
Umer Saleem authored
For native Debian packaging, the zinject binary and man page are packaged in the ZFS test package. zinject is not directly related to ZTS and should be packaged with the other utilities, as it is in the zfs_<ver>.rpm/deb packages. This commit moves the zinject binary and man page from the openzfs-zfs-test package to the openzfs-zfsutils package. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Ameer Hamza <ahamza@ixsystems.com> Signed-off-by:
Umer Saleem <usaleem@ixsystems.com> Closes #15160
-
- 02 Aug, 2023 6 commits
-
-
Serapheim Dimitropoulos authored
When the vdev properties feature was merged, an extra check was added in `spa_vdev_remove_top_check()` which checks whether the vdev that we want to remove is already being removed and, if so, returns an EALREADY error.
```
static int
spa_vdev_remove_top_check(vdev_t *vd)
{
... <snip> ...
	/*
	 * This device is already being removed
	 */
	if (vd->vdev_removing)
		return (SET_ERROR(EALREADY));
```
Before that change we'd still fail with an error, but it was a more generic one. Here is the check that failed later in the same function:
```
	/*
	 * There can not be a removal in progress.
	 */
	if (spa->spa_removing_phys.sr_state == DSS_SCANNING)
		return (SET_ERROR(EBUSY));
```
Changing the error code returned from that function changed the behavior of the removal's library interface exposed to userland: `spa_vdev_remove()` now returns `EZFS_UNKNOWN` instead of the `EZFS_EBUSY` that it returned before. This patch adds logic to make `spa_vdev_remove()` mindful of the new EALREADY code and propagate `EZFS_EBUSY`, reverting to the previously established semantics of that function. Reviewed-by:
Mark Maybee <mark.maybee@delphix.com> Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Signed-off-by:
Serapheim Dimitropoulos <serapheim@delphix.com> Closes #15013 Closes #15129
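The restored semantics amount to treating both kernel error codes as the same library-level "busy" result. A minimal sketch, assuming a simplified error enum: the type, enumerator, and function names below are hypothetical and do not reproduce the real library definitions.

```c
#include <errno.h>

/* Hypothetical stand-ins for library-level error codes. */
typedef enum {
	EXAMPLE_LIB_SUCCESS = 0,
	EXAMPLE_LIB_BUSY,
	EXAMPLE_LIB_UNKNOWN
} example_lib_error_t;

/*
 * Map the errno reported for a vdev removal request onto a
 * library-level error, as described in the commit message.
 */
example_lib_error_t
example_remove_errno_to_lib_error(int err)
{
	switch (err) {
	case 0:
		return (EXAMPLE_LIB_SUCCESS);
	case EBUSY:	/* some removal is already in progress */
	case EALREADY:	/* this particular vdev is already being removed */
		return (EXAMPLE_LIB_BUSY);
	default:
		return (EXAMPLE_LIB_UNKNOWN);
	}
}
```

Without the `EALREADY` case, that value would fall through to the default branch, which is the regression the commit describes.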
-
наб authored
If looking up a snapdir inode failed, hold pool config – hold the snapshot – get its creation property – release it – release it, then use that as the [amc]time in the allocated inode. If that fails then fall back to current time. No performance impact since this is only done when allocating a new snapdir inode. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #15110 Closes #15117
-
Zach Dykstra authored
glibc includes sys/types.h from stdlib.h. This is not the case for MUSL, so explicitly include it. Fixes usage of uint_t. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Zach Dykstra <dykstra.zachary@gmail.com> Closes #15130
-
oromenahar authored
Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Rob Norris <rob.norris@klarasystems.com> Signed-off-by:
Kay Pedersen <mail@mkwg.de> Closes #15128
-
Rob N authored
Before Linux 5.3, the filesystem's copy_file_range handler had to signal back to the kernel that it can't fulfill the request and that the kernel should fall back to a content copy. This is done by returning -EOPNOTSUPP. This commit converts the EXDEV return from zfs_clone_range to EOPNOTSUPP, to force the kernel to fall back for all the valid reasons it might be unable to clone. Without it, the copy_file_range() syscall will return EXDEV to userspace, breaking its semantics. Add test for copy_file_range fallbacks. copy_file_range should always fall back to a content copy whenever ZFS can't service the request with cloning. Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <robn@despairlabs.com> Closes #15131
-
Rob N authored
This gives `zdb -b` support for clone blocks. Previously, it didn't know what clones were, so would count their space allocation multiple times and then report leaked space (or, in debug, would assert trying to claim blocks a second time). This commit fixes those bugs, and reports the number of clones and the space "used" (saved) by them. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15123
-
- 31 Jul, 2023 2 commits
-
-
Ameer Hamza authored
Merge after upstream zfs-2.2-release rc3 tag
-
Ameer Hamza authored
Merge after rc3 tag
-
- 27 Jul, 2023 4 commits
-
-
Brian Behlendorf authored
Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov>
-
oromenahar authored
Return the more descriptive EOPNOTSUPP instead of EXDEV when the storage pool doesn't support block cloning. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Rob Norris <rob.norris@klarasystems.com> Signed-off-by:
Kay Pedersen <mail@mkwg.de> Closes #15097
-
Ameer Hamza authored
Truenas/zfs 2.2 release rc3 no tag
-
Ameer Hamza authored
-
- 26 Jul, 2023 10 commits
-
-
Rob Norris authored
Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050 Closes #405 Closes #13349
-
Rob Norris authored
Red Hat has backported copy_file_range and clone_file_range to the EL7 kernel using an "extended file operations" wrapper structure. This connects all that up to let cloning work there too. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050
-
Rob Norris authored
Prior to Linux 4.5, the FICLONE etc ioctls were specific to BTRFS, and were implemented as regular filesystem-specific ioctls. This implements those ioctls directly in OpenZFS, allowing cloning to work on older kernels. There's no need to gate these behind version checks; on later kernels Linux will simply never deliver these ioctls, instead calling the appropriate VFS op. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050
-
Rob Norris authored
This implements the Linux VFS ops required to service the file copy/clone APIs:
  .copy_file_range (4.5+)
  .clone_file_range (4.5-4.19)
  .dedupe_file_range (4.5-4.19)
  .remap_file_range (4.20+)
Note that dedupe_file_range() and remap_file_range(REMAP_FILE_DEDUP) are hooked up here, but are not implemented yet. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050
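The version ranges listed in this commit can be expressed as a small lookup. This is only an illustrative sketch of the table: the struct and function names are hypothetical, and real builds would typically select the hooks at compile time rather than at run time.

```c
#include <stdbool.h>

/* Hypothetical summary type: which VFS hooks a given kernel uses. */
typedef struct {
	bool copy_file_range;
	bool clone_file_range;
	bool dedupe_file_range;
	bool remap_file_range;
} example_vfs_ops_t;

/* True when kernel major.minor is at least want_major.want_minor. */
bool
example_kernel_at_least(int major, int minor, int want_major, int want_minor)
{
	return (major > want_major ||
	    (major == want_major && minor >= want_minor));
}

/* Apply the ranges from the commit message: 4.5+, 4.5-4.19, 4.20+. */
example_vfs_ops_t
example_ops_for_kernel(int major, int minor)
{
	bool ge_4_5 = example_kernel_at_least(major, minor, 4, 5);
	bool ge_4_20 = example_kernel_at_least(major, minor, 4, 20);
	example_vfs_ops_t ops = {
		.copy_file_range = ge_4_5,
		.clone_file_range = ge_4_5 && !ge_4_20,
		.dedupe_file_range = ge_4_5 && !ge_4_20,
		.remap_file_range = ge_4_20,
	};

	return (ops);
}
```

For a 4.19 kernel this selects the separate clone and dedupe hooks, while 4.20 and later select the single remap hook instead.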
-
Rob Norris authored
Block cloning introduced a new state transition from DB_NOFILL to DB_READ. This occurs when a block is cloned and then read on the current txg. In this case, the clone will move the dbuf to DB_NOFILL, and then the read will be issued for the overridden block pointer. If that read is still outstanding when it comes time to write, the dbuf will be in DB_READ, which is not handled by the checks in dbuf_sync_leaf, thus tripping the assertions. This updates those checks to allow DB_READ as a valid state iff the dirty record is for a BRT write and there is an override block pointer. This is a safe situation because the block already exists, so there's nothing that could change from underneath the read. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Original-patch-by:
Kay Pedersen <mail@mkwg.de> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050
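The new exception can be sketched as a predicate. This models only the DB_READ case described in the commit message, not the other states the real checks accept; the enum, struct, and function names are hypothetical simplifications rather than the actual dbuf types.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of dbuf states. */
typedef enum {
	EXAMPLE_DB_READ,
	EXAMPLE_DB_NOFILL,
	EXAMPLE_DB_CACHED
} example_db_state_t;

/* Hypothetical view of the two dirty-record facts the check needs. */
typedef struct {
	bool is_brt_write;	/* the dirty record came from a block clone */
	bool has_override_bp;	/* an override block pointer is set */
} example_dirty_info_t;

/*
 * True only for the newly permitted situation: the dbuf is still in
 * DB_READ at sync time, and the dirty record is a BRT write with an
 * override block pointer.
 */
bool
example_db_read_allowed_at_sync(example_db_state_t state,
    const example_dirty_info_t *info)
{
	if (state != EXAMPLE_DB_READ)
		return (false);
	return (info->is_brt_write && info->has_override_bp);
}
```

Both conditions are required ("iff" in the message), so an in-flight read on an ordinary write would still trip the assertion.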
-
Rob Norris authored
dbuf_undirty() will (correctly) only remove dirty records for the given (open) txg. If there is a dirty record for an earlier closed txg that has not been synced out yet, then db_dirty_records will still have entries on it, tripping the assertion. Instead, change the assertion to only consider the current txg. To some extent this is redundant, as it's really just saying "did dbuf_undirty() work?", but it doesn't hurt and accurately expresses our expectations. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Original-patch-by:
Kay Pedersen <mail@mkwg.de> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050
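The difference between "no dirty records at all" and "no dirty record for this txg" can be sketched with a simple linked list. The node type and function below are hypothetical stand-ins for the real dirty-record list, used only to show why the narrower check is the right one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dirty-record node: one entry per txg that dirtied the buffer. */
typedef struct example_dr_node {
	uint64_t txg;
	struct example_dr_node *next;
} example_dr_node_t;

/* True if the list still holds a dirty record for the given txg. */
bool
example_has_dirty_record_for_txg(const example_dr_node_t *head, uint64_t txg)
{
	for (; head != NULL; head = head->next) {
		if (head->txg == txg)
			return (true);
	}
	return (false);
}
```

After undirtying the open txg, a record for an earlier, not yet synced txg may remain, so asserting that the list is empty would fail even though asserting that this function returns false for the open txg holds.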
-
Rob Norris authored
bv_entcount can be a relatively large allocation (see comment for BRT_RANGESIZE), so get it from the big allocator. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050
-
Rob Norris authored
Just silencing the warning about large allocations. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050
-
Brian Behlendorf authored
For large JBODs the log message "zfs_iter_vdev: no match" can account for the bulk of the log messages (over 70%). Since this message is purely informational and not that useful we remove it. Reviewed-by:
Olaf Faaland <faaland1@llnl.gov> Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov> Closes #15086 Closes #15094
-
Brian Behlendorf authored
Update the META file to reflect compatibility with the 6.4 kernel. Reviewed-by:
George Melikov <mail@gmelikov.ru> Reviewed-by:
Rob Norris <rob.norris@klarasystems.com> Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov> Closes #15095
-