- 26 Jul, 2023 3 commits
-
-
Rob Norris authored
Just silencing the warning about large allocations. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Kay Pedersen <mail@mkwg.de> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Closes #15050
-
Brian Behlendorf authored
For large JBODs the log message "zfs_iter_vdev: no match" can account for the bulk of the log messages (over 70%). Since this message is purely informational and not that useful we remove it. Reviewed-by:
Olaf Faaland <faaland1@llnl.gov> Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov> Closes #15086 Closes #15094
-
Brian Behlendorf authored
Update the META file to reflect compatibility with the 6.4 kernel. Reviewed-by:
George Melikov <mail@gmelikov.ru> Reviewed-by:
Rob Norris <rob.norris@klarasystems.com> Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov> Closes #15095
-
- 25 Jul, 2023 3 commits
-
-
Alexander Motin authored
This locking was recently added as part of #14979. But appears it is illegal to take zl_issuer_lock while holding dp_config_rwlock, taken by dsl_pool_hold(). It causes deadlock with sync thread in spa_sync_upgrades(). On a second thought, we should not need this locking, since zil_commit_impl() we call below takes zl_issuer_lock, that should sufficiently protect zl_suspend reads, combined with other logic from #14979. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15103
-
Alexander Motin authored
When we have some LWBs closed and their ZIOs ready to be issued, we can not afford sleeping on config lock if somebody else try to lock it as writer, or it will cause a deadlock. To solve it, move spa_config_enter() from zil_lwb_write_issue() to zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering with other threads. Now if we can't immediately lock config, issue all previously closed LWBs so that they could drop their config locks after completion, and only then allow sleeping on our lock. Reviewed-by:
Mark Maybee <mark.maybee@delphix.com> Reviewed-by:
Prakash Surya <prakash.surya@delphix.com> Reviewed-by:
George Wilson <george.wilson@delphix.com> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15078 Closes #15080
-
Umer Saleem authored
This commit updates changelog for native Debian packages for OpenZFS 2.2.0 release. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Umer Saleem <usaleem@ixsystems.com> Closes #15104
-
- 21 Jul, 2023 19 commits
-
-
Brian Behlendorf authored
Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov>
-
Rob N authored
This new check in 0.9.0 appears to have some issues with various forms of "early return", like trap, exit and return. This is tripping up (at least): cmd/zed/zed.d/history_event-zfs-list-cacher.sh /etc/zfs/zfs-functions Its not obvious what its complaining about or what the remedy is, so it seems sensible to disable this check for now. See also: https://www.shellcheck.net/wiki/SC2317 https://github.com/koalaman/shellcheck/issues/2542 https://github.com/koalaman/shellcheck/issues/2613 Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Rob Norris <robn@despairlabs.com> Closes #15089
-
Rob N authored
metaslab_force_ganging isn't enough to actually force ganging, because it still only forces 3% of the time. This adds metaslab_force_ganging_pct so we can configure how often to force ganging. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #15088
-
Alexander Motin authored
- Reduce maximum prefetch distance for 32bit platforms to 8MB as it was previously. Those systems didn't grow much probably, so better stay conservative there. - Retire array_rd_sz tunable, blocking prefetch for large requests. We should not penalize applications trying to be more efficient. The speculative prefetcher by itself has reasonable distance limits, and 1MB is not much at all these days. Reviewed-by:
Allan Jude <allan@klarasystems.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15072
-
Alexander Motin authored
To simplify error handling bpobj_iterate_blkptrs() iterates through the list of block pointers backwards. Unfortunately speculative prefetcher is currently unable to detect such patterns, that makes each block read there synchronous and very slow on HDD pools. According to my tests, added explicit prefetch reduces time needed to asynchronously delete 8 snapshots of 4 million blocks each from 20 seconds to less than one, that should free sync thread for other useful work, such as async writes, scrub, etc. While there, plug one memory leak in case of bpobj_open() error and harmonize some variable names. Reviewed-by:
Allan Jude <allan@klarasystems.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15071
-
Alan Somers authored
With anything but fletcher-4, even a tiny change in the input will cause the checksum value to change completely. So knowing the actual and expected checksums doesn't provide much more information than "they don't match". The harm in sending them is simply that they bloat the event. In particular, on FreeBSD the event must fit into a 1016 byte buffer. Fixes #14717 for mirrored pools. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Rich Ercolani <rincebrain@gmail.com> Signed-off-by:
Alan Somers <asomers@gmail.com> Sponsored-by: Axcient Closes #14717 Closes #15052
-
Alan Somers authored
The checksum histograms were intended to be used with ATA and parallel SCSI, which are obsolete. With modern storage hardware, they will almost always look like white noise; all bits will be wrong. They only serve to bloat the event. That's a particular problem on FreeBSD, where events must fit into a 1016 byte buffer. This fixes issue #14717 for RAIDZ pools, but not for mirror pools. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Rich Ercolani <rincebrain@gmail.com> Signed-off-by:
Alan Somers <asomers@gmail.com> Sponsored-by: Axcient Closes #15052
-
Tony Hutter authored
We would see zed assert on one of our systems if we powered off a slot. Further examination showed zfs_retire_recv() was reporting a GUID of 0, which in turn would return a NULL nvlist. Add in a check for a zero GUID. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Tony Hutter <hutter2@llnl.gov> Closes #15084
-
Chunwei Chen authored
We cannot call zpl_enter in zpl_test_super, because zpl_test_super is under spinlock so we can't sleep, and also because zpl_test_super is called without sb->s_umount taken, so it's possible we would race with zfs_umount and call zpl_enter on freed zfsvfs. Here's an stack trace when this happens: [ 2379.114837] VERIFY(cvp->cv_magic == CV_MAGIC) failed [ 2379.114845] PANIC at spl-condvar.c:497:__cv_broadcast() [ 2379.114854] Kernel panic - not syncing: VERIFY(cvp->cv_magic == CV_MAGIC) failed [ 2379.115012] Call Trace: [ 2379.115019] dump_stack+0x74/0x96 [ 2379.115024] panic+0x114/0x2f6 [ 2379.115035] spl_panic+0xcf/0xfc [spl] [ 2379.115477] __cv_broadcast+0x68/0xa0 [spl] [ 2379.115585] rrw_exit+0xb8/0x310 [zfs] [ 2379.115696] rrm_exit+0x4a/0x80 [zfs] [ 2379.115808] zpl_test_super+0xa9/0xd0 [zfs] [ 2379.115920] sget+0xd1/0x230 [ 2379.116033] zpl_mount+0xdc/0x230 [zfs] [ 2379.116037] legacy_get_tree+0x28/0x50 [ 2379.116039] vfs_get_tree+0x27/0xc0 [ 2379.116045] path_mount+0x2fe/0xa70 [ 2379.116048] do_mount+0x80/0xa0 [ 2379.116050] __x64_sys_mount+0x8b/0xe0 [ 2379.116052] do_syscall_64+0x35/0x50 [ 2379.116054] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 2379.116057] RIP: 0033:0x7f9912e8b26a Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Chunwei Chen <david.chen@nutanix.com> Closes #15077
-
Ameer Hamza authored
Since spa_min_alloc may not be a power of 2, unlike ashifts, in the case of DRAID, we should not select the minimal value among several vdevs. Rounding to a multiple of it is unlikely to work for other vdevs. Instead, using the greatest common divisor produces smaller yet more reasonable results. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Signed-off-by:
Ameer Hamza <ahamza@ixsystems.com> Closes #15067
-
Yuri Pankov authored
Check that vdev has valid zap and bail out early. While here, move objid selection out of the loop, it's not going to change. Reviewed-by:
Allan Jude <allan@klarasystems.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Yuri Pankov <yuripv@FreeBSD.org> Closes #15063
-
Ameer Hamza authored
Ashift can be set for a vdev only during its creation, and the top-level vdev does not change when a vdev is attached or replaced. The ashift property should not be used during attachment, as it does not allow attaching/replacing a vdev if the pool's ashift property is increased after the existing vdev was created. Instead, we should be able to attach the vdev if the attached vdev can satisfy the ashift requirement with its parent. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Signed-off-by:
Ameer Hamza <ahamza@ixsystems.com> Closes #15061
-
Wojciech Małota-Wójcik authored
On my machines I observe random failures caused by rollback happening after zfs root is mounted. I've observed two types of failures: - zfs-rollback-bootfs.service fails saying that rollback must be done just before mounting the dataset - boot process fails and rescue console is entered. After making this modification and testing it for couple of days none of those problems have been observed anymore. I don't know if `dracut-mount.service` is still needed in the `After` directive. Maybe someone else is able to address this? Reviewed-by:
Gregory Bartholomew <gregory.lee.bartholomew@gmail.com> Signed-off-by:
Wojciech Małota-Wójcik <59281144+outofforest@users.noreply.github.com> Closes #15025
-
Alexander Motin authored
Set ARC_FLAG_NO_BUF when prefetching data L1 buffers for scan. We do not prefetch data L0 buffers, so we do not need the L1 buffers, only want them to be ready in ARC. This saves some CPU time on the buffers decompression. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15029
-
Coleman Kane authored
The disk_check_media_change() function was added which replaces bdev_check_media_change. This change was introduced in 6.5rc1 444aa2c58cb3b6cfe3b7cc7db6c294d73393a894 and the new function takes a gendisk* as its argument, no longer a block_device*. Thus, bdev->bd_disk is now used to pass the expected data. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Coleman Kane <ckane@colemankane.org> Closes #15060
-
Coleman Kane authored
This change was introduced in Linux commit 7ba150834b840f6f5cdd07ca69a4ccf39df59a66 Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Signed-off-by:
Coleman Kane <ckane@colemankane.org> Closes #15059
-
Coleman Kane authored
Make the version here match that elsewhere in the kernel and system headers. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Signed-off-by:
Coleman Kane <ckane@colemankane.org> Closes #15058 Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov>
-
Yuri Pankov authored
As it turns out having autotrim default to 'on' on FreeBSD never really worked due to mess with defines where userland and kernel module were getting different default values (userland was defaulting to 'off', module was thinking it's 'on'). Reviewed-by:
Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Signed-off-by:
Yuri Pankov <yuripv@FreeBSD.org> Closes #15079
-
Alan Somers authored
My analysis in PR #14716 was incorrect. Each histogram bucket contains the number of incorrect bits, by position in a 64-bit word, over the entire record. 8-bit buckets can overflow for record sizes above 2k. To forestall that, saturate each bucket at 255. That should still get the point across: either all bits are equally wrong, or just a couple are. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alan Somers <asomers@gmail.com> Sponsored-by: Axcient Closes #15049
-
- 20 Jul, 2023 5 commits
-
-
Alexander Motin authored
Unlike regular receive, raw receive require destination to have the same block structure as the source. In case of dnode reclaim this triggers two special cases, requiring special handling: - If dn_nlevels == 1, we can change the ibs, but dnode_set_blksz() should not dirty the data buffer if block size does not change, or durign receive dbuf_dirty_lightweight() will trigger assertion. - If dn_nlevels > 1, we just can't change the ibs, dnode_set_blksz() would fail and receive_object would trigger assertion, so we should destroy and recreate the dnode from scratch. Reviewed-by:
Paul Dagnelie <pcd@delphix.com> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15039 (cherry picked from commit c4e87421)
-
Alexander Motin authored
Since we are already iterating the ZAP, we have exact string key to remove, we do not need to call zap_remove_int() with the int key we just converted, we can call zap_remove() for the original string. This should make no functional change, only a micro-optimization. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15056 (cherry picked from commit fdba8cbb)
-
Alexander Motin authored
It seems 9c5167d1 "Project Quota on ZFS" missed to add prefetch for DMU_PROJECTUSED_OBJECT during scan (scrub/resilver). It should not cause visible problems, but may affect scub/resilver performance. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15024
-
Mateusz Guzik authored
Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Signed-off-by:
Mateusz Guzik <mjguzik@gmail.com> Closes #15036
-
Alexander Motin authored
Starting approximately from version 1302506 vn_lock_pair() grown two additional arguments following head. There is a one week hole, but that is closet reference point we have. Reviewed-by:
Mateusz Guzik <mjguzik@gmail.com> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15047
-
- 30 Jun, 2023 10 commits
-
-
Brian Behlendorf authored
New features: - Fully adaptive ARC eviction (#14359) - Block cloning (#13392) - Scrub error log (#12812, #12355) - Linux container support (#14070, #14097, #12263) - BLAKE3 Checksums (#12918) - Corrective "zfs receive" (#9372) Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov>
-
Prakash Surya authored
The default timeout for ZVOL opens may not be sufficient for all cases, so we should enable the value to be more easily tuned to account for systems where the default value is insufficient. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Signed-off-by:
Prakash Surya <prakash.surya@delphix.com> Closes #15023
-
Brian Behlendorf authored
This reverts commit 77a3bb1f . Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov>
-
Rich Ercolani authored
The DDT is really inefficient on 4k and up vdevs, because it always allocates 4k blocks, and while compression could save us somewhat at ashift 9, that stops being true. So let's change the default to 32 KiB, which seems like a reasonable compromise between improved space savings and inflated write sizes for DDT updates. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Rich Ercolani <rincebrain@gmail.com> Closes #14654
-
Rob N authored
The previous comment wondered if this case could happen; it turns out that it really can't. This block can only be entered if dde_type and dde_class are "real"; that only happens when a ddt entry has been previously synced to a ddt store, that is, it was created on a previous txg. Since its gone through that sync, its dde_refcount must be >0. ddt_addref() is called from brt_pending_apply(), which is called at the beginning of spa_sync(), before pending DMU writes/frees are issued. Freeing a dedup block is the only thing that can decrement dde_refcount, so there's no way for it to drop to zero before applying the clone bumps it. Further, even if it _could_ go to zero, it wouldn't be necessary to fill the entry from the block. The phys content is not cleared until the free is issued, which happens when the refcount goes to zero, when the last real free comes through. The cloned block should be identical to what's in the phys already, so the fill should be a no-op anyway. I've replaced this with an assertion because this is all very dependent on the ordering in which BRT and DDT changes are applied, and that might change in the future. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Rob Norris <rob.norris@klarasystems.com> Sponsored-By: Klara, Inc. Closes #15004
-
Alexander Motin authored
With zl_suspend read in zil_commit() not protected by any locks it is possible for new ZIL writes to be in progress while zil_destroy() called by zil_suspend() freeing them. This patch closes the race by taking zl_issuer_lock in zil_suspend() and adding the second zl_suspend check to zil_get_commit_list(), protected by the lock. It allows all already queued transactions to be logged normally, while blocks any new ones, calling txg_wait_synced() for the TXGs. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14979
-
Alexander Motin authored
- Pack struct zio_prop by 4 bytes from 84 to 80. - Skip new child ZIO locking while linking to parent. The newly allocated ZIO is not externally visible yet, so nobody should care. - Skip io_bp_copy writes when not used (write && non-debug). Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #14985
-
Alexander Motin authored
Scan process may skip blocks based on their birth time, DVA, etc. Traditionally those blocks were accounted as issued, that caused reporting of hugely over-inflated numbers, having nothing to do with actual disk I/O. This change utilizes never used field in struct dsl_scan_phys to account such skipped bytes, allowing to report how much data were actually scrubbed/resilvered and what is the actual I/O speed. While formally it is an on-disk format change, it should be compatible both ways, so should not need a feature flag. This should partially address the same issue as c85ac731 , but from a different perspective, complementing it. Reviewed-by:
Tony Hutter <hutter2@llnl.gov> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Akash B <akash-b@hpe.com> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15007
-
Arshad Hussain authored
This patch changes the passing of "size" to snprintf from hard-coded (openended) to sizeof(errbuf). This is bringing to standard with rest of the code where- ever 'errbuf' is used. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Arshad Hussain <arshad.hussain@aeoncomputing.com> Closes #15003
-
Alexander Motin authored
The previous code was checking zfs_is_namespace_prop() only for the last property on the list. If one was not "namespace", then remount wasn't called. To fix that move zfs_is_namespace_prop() inside the loop and remount if at least one of properties was "namespace". Reviewed-by:
Umer Saleem <usaleem@ixsystems.com> Reviewed-by:
Ameer Hamza <ahamza@ixsystems.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15000
-