- 22 Feb, 2024 1 commit
Ameer Hamza authored
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
-
- 16 Feb, 2024 2 commits
Brian Behlendorf authored
On Linux, block devices used for vdevs will be partitioned. The block device must be large enough for a 64M partition starting at an offset of 2048 sectors (part1), and a second 64M reserved partition at the end of the device (part9). This commit adds a capacity check when creating the GPT label to immediately detect a device which is too small. With the existing code this would be caught slightly later, when attempting to use the partition. Catching it sooner lets us print a more useful error.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15898
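A minimal sketch of the size check described above, using only the figures given in the commit message (part1 at sector 2048, 64M each for part1 and part9). The helper names are hypothetical, and a real check may also need to account for GPT header space, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Figures as stated in the commit message; helper names are hypothetical. */
#define PART1_START_SECTORS	2048ULL		/* part1 begins at sector 2048 */
#define PART1_SIZE_BYTES	(64ULL << 20)	/* 64M data partition (part1) */
#define PART9_SIZE_BYTES	(64ULL << 20)	/* 64M reserved partition (part9) */

/* Smallest device that fits part1 at its fixed offset plus part9 at the end. */
static uint64_t
min_label_capacity(uint64_t sector_size)
{
	return (PART1_START_SECTORS * sector_size +
	    PART1_SIZE_BYTES + PART9_SIZE_BYTES);
}

/* Reject a too-small device before writing the label, so the error is clear. */
static bool
device_large_enough(uint64_t capacity_bytes, uint64_t sector_size)
{
	return (capacity_bytes >= min_label_capacity(sector_size));
}
```

With 512-byte sectors the offset is 1 MiB, so under these assumptions the smallest usable device is 129 MiB.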
-
Tony Hutter authored
Skip cross-filesystem block cloning tests on FreeBSD if running a version earlier than 14.0. Cross-filesystem copy_file_range() was added in FreeBSD 14.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15901
-
- 15 Feb, 2024 15 commits
Rob Norris authored
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
Most values in zio_checksum can never be used for dedup, partly because the dedup= property only offers a limited list, but also because some values (e.g. ZIO_CHECKSUM_OFF) aren't real and will never be seen. A true flag would be better than a hardcoded list, but that's more cleanup elsewhere than I want to do right now.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
Only a single bit is needed to track entry state, and definitely not two whole bytes. Some light refactoring in ddt_lookup() is needed to support this, but it reads a lot better now.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
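A sketch of tracking entry state with a single flag bit instead of a multi-byte field. The names are hypothetical; the commit message does not say which bit or field the real DDT entry uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit: one bit records whether the entry has been loaded. */
#define ENTRY_FLAG_LOADED	(1u << 0)

typedef struct entry {
	uint8_t	flags;	/* one byte holds up to eight independent states */
} entry_t;

static inline bool
entry_is_loaded(const entry_t *e)
{
	return ((e->flags & ENTRY_FLAG_LOADED) != 0);
}

static inline void
entry_set_loaded(entry_t *e)
{
	e->flags |= ENTRY_FLAG_LOADED;
}
```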
-
Rob Norris authored
Nothing uses it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
Store objects store keys and values, so have them take those types and nothing more. This way, they don't need to be concerned about the "kind" of entry being operated on; the dispatch layer can take care of the appropriate conversions. This adds a "contains" op to see if a particular entry exists without loading it, which makes a couple of things easier to do; in particular, it allows us to avoid an allocation in ddt_class_contains().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
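A sketch of a backend ops table that deals only in keys and values and offers a contains operation alongside lookup, so an existence check never has to copy out a value. The types, names, and toy array backend are hypothetical, not the actual DDT store interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key/value types; the real DDT types differ. */
typedef struct { uint64_t k; } store_key_t;
typedef struct { uint64_t v; } store_val_t;

/* A backend sees only keys and values, never higher-level "entries". */
typedef struct store_ops {
	int	(*lookup)(void *store, const store_key_t *key, store_val_t *val);
	bool	(*contains)(void *store, const store_key_t *key);
} store_ops_t;

/* Toy backend: a small fixed-size array. */
typedef struct {
	size_t		n;
	store_key_t	keys[8];
	store_val_t	vals[8];
} array_store_t;

static int
array_lookup(void *s, const store_key_t *key, store_val_t *val)
{
	array_store_t *as = s;
	for (size_t i = 0; i < as->n; i++) {
		if (as->keys[i].k == key->k) {
			*val = as->vals[i];
			return (0);
		}
	}
	return (-1);	/* not found */
}

/* Answers existence without needing a value buffer from the caller. */
static bool
array_contains(void *s, const store_key_t *key)
{
	array_store_t *as = s;
	for (size_t i = 0; i < as->n; i++)
		if (as->keys[i].k == key->k)
			return (true);
	return (false);
}

static const store_ops_t array_store_ops = {
	.lookup = array_lookup,
	.contains = array_contains,
};
```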
-
Rob Norris authored
ddt_get_dedup_histogram() was actually checking it, just in an extremely cursed way. ddt_get_dedup_object_stats() wasn't, but wasn't being called from a dangerous place, so no one noticed. These checks are necessary because spa_ddt[] is not populated until spa_load(), but the spa can exist before that, while being created, and as vdevs and metaslabs are initialised the space accounting functions will be called to update pool space counts. Probably the whole create path doesn't need to go asking for space accounting from metadata subsystems until after the pool is created. This will at least catch misuse.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
Things get confused when there's more than one name for a thing. Note that we don't do this for ddt_object_t, ddt_histogram_t and ddt_stat_t because they're part of the public ZFS interface.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
Mostly for consistency, so the reader is less likely to wonder why these things look different.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
Just to make it easier to know which bits to pay attention to.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
It was a weird and confusing name, because it wasn't actually returning the number of DVAs in the entry (as in, in the value/phys part) but the maximum number of possible DVAs in a BP generated from the entry, based on the encrypt bit in the key. This is unlike the similarly named BP_GET_NDVAS, which really does return the number of DVAs. Since it's only used in this one place, and for a specific purpose, it seemed more sensible to just write it in place and remove the name.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
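What the removed name computed fits in a one-line expression. This sketch assumes OpenZFS's block pointer layout, in which a BP has SPA_DVAS_PER_BP (3) DVA slots and an encrypted BP reuses the third slot for encryption parameters, leaving at most two; the helper name is hypothetical.

```c
#include <stdbool.h>

#define SPA_DVAS_PER_BP	3	/* DVA slots in a block pointer */

/*
 * Maximum number of DVAs a BP generated from an entry may carry, based on
 * whether the entry's key has the encrypt bit set. (Hypothetical helper.)
 */
static inline int
max_dvas_for_entry(bool encrypted)
{
	return (encrypted ? SPA_DVAS_PER_BP - 1 : SPA_DVAS_PER_BP);
}
```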
-
Rob Norris authored
We want to add other kinds of dedup-related objects and keep stats for them. This makes those functions easier to use from outside ddt.c.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
We're about to have different kinds of things that we'll compare on key, so generalise this function to support that. (It actually worked fine because of the way the casts work out, but it requires the key to be at the start of the object so the cast through ddt_entry_t works, and even then it reads strangely for anything that's not a ddt_entry_t.)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
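A sketch of comparing on the key itself rather than casting whole objects: one key comparator plus thin per-type wrappers, so the key can sit at any offset in each container type. Types and names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical key type; the real ddt_key_t has a different layout. */
typedef struct {
	uint64_t word[2];
} demo_key_t;

/* Compare keys directly; returns -1, 0 or 1, as tree comparators expect. */
static int
demo_key_compare(const demo_key_t *a, const demo_key_t *b)
{
	for (int i = 0; i < 2; i++) {
		if (a->word[i] < b->word[i])
			return (-1);
		if (a->word[i] > b->word[i])
			return (1);
	}
	return (0);
}

/* Two container types; neither needs the key at offset 0. */
typedef struct { int refcnt; demo_key_t key; } demo_entry_t;
typedef struct { char tag; demo_key_t key; } demo_other_t;

/* Per-type comparators just hand the embedded keys to the key comparator. */
static int
demo_entry_compare(const void *x, const void *y)
{
	const demo_entry_t *a = x, *b = y;
	return (demo_key_compare(&a->key, &b->key));
}

static int
demo_other_compare(const void *x, const void *y)
{
	const demo_other_t *a = x, *b = y;
	return (demo_key_compare(&a->key, &b->key));
}
```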
-
Rob Norris authored
Always do them on the heap, and when we know how much we need, only that much.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
I think I can say with some confidence that anyone making a new storage type in 2023 is doing their own thing with compression, not this.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
Rob Norris authored
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15887
-
- 13 Feb, 2024 2 commits
Alexander Motin authored
This changes taskq_thread_should_stop() to limit the maximum exit rate for idle threads to one per 5 seconds. I believe the previous logic was broken, not allowing any thread exits for tasks arriving more than one at a time and so completing while others are running.
Also while there:
  - Remove taskq_thread_spawn() calls on task allocation errors.
  - Remove extra taskq_thread_should_stop() call.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15873
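A single-threaded sketch of the rate limit described above: an idle worker may exit only when the taskq is above its minimum size and at least 5 seconds have passed since the previous exit. The structure and function names are hypothetical, and the sketch ignores locking and the other state a real taskq tracks.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXIT_INTERVAL_SEC	5	/* at most one idle-thread exit per 5s */

/* Hypothetical per-taskq state. */
typedef struct {
	int	nthreads;
	int	minthreads;
	int64_t	last_exit;	/* time (seconds) of the last permitted exit */
} demo_taskq_t;

/*
 * Called by an idle worker at time `now`; returns true if it may exit.
 * Exits are allowed only above the minimum thread count and only once
 * EXIT_INTERVAL_SEC has passed since the previous exit.
 */
static bool
demo_thread_should_stop(demo_taskq_t *tq, int64_t now)
{
	if (tq->nthreads <= tq->minthreads)
		return (false);
	if (now - tq->last_exit < EXIT_INTERVAL_SEC)
		return (false);
	tq->last_exit = now;
	tq->nthreads--;
	return (true);
}
```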
-
Bi11 authored
Fix a misreport in 'zdb -d' where it falsely marked BRT objects as leaked.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuxin Wang <yuxinwang9999@gmail.com>
Closes #15882
-
- 12 Feb, 2024 1 commit
Bi11 authored
Similar to deduplication, the size of data duplicated by block cloning should not be included in the slop space calculation.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuxin Wang <yuxinwang9999@gmail.com>
Closes #15874
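A simplified sketch of the idea, assuming that the pool's reported space already includes the "virtual" space gained from dedup and block cloning, so both are subtracted before taking the slop fraction. The shift and clamp values are hypothetical inputs, not ZFS's actual tunables or defaults.

```c
#include <stdint.h>

/*
 * Simplified slop calculation (hypothetical). Space that exists only
 * because blocks are shared (dedup or cloning) is excluded first.
 */
static uint64_t
demo_slop_space(uint64_t dspace, uint64_t dedup_saved, uint64_t clone_saved,
    int slop_shift, uint64_t min_slop, uint64_t max_slop)
{
	uint64_t excess = dedup_saved + clone_saved;
	uint64_t space = (dspace > excess) ? dspace - excess : 0;
	uint64_t slop = space >> slop_shift;	/* e.g. shift 5 = 1/32 */

	if (slop < min_slop)
		slop = min_slop;
	if (slop > max_slop)
		slop = max_slop;
	return (slop);
}
```

For a 1 TiB pool where 256 GiB of the reported space comes from sharing, a 1/32 slop drops from 32 GiB to 24 GiB in this model.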
-
- 09 Feb, 2024 1 commit
Kevin Greene authored
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kevin Greene <kevin.greene@delphix.com>
Closes #15868
-
- 08 Feb, 2024 3 commits
Shawn Bayern authored
Fixes a small inaccuracy in the description of snapshot atomicity. zfs-snapshot(8) appears to contain a small error. The existing version reads "Snapshots are taken atomically, so that all snapshots correspond to the same moment in time." Per zfs_main.c, whose do_snapshot() simply loops over argv, this does not appear to be correct when multiple snapshots are specified explicitly on the command line. I believe the intent of the man page was to say that *recursive* snapshots are all created atomically. This proposed change fixes that error. Because the existing statement may confuse some readers anyway, the commit also adds a small amount of general explanatory information that may be helpful. The change also adds an introductory sentence that summarizes what 'zfs snapshot' does in the first place. In that sentence, the text "different datasets" is intended to indicate that (again per the code) the same dataset cannot be specified multiple times on the command line.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shawn Bayern <sbayern@law.fsu.edu>
Closes #15857
-
Rob N authored
Because "filesystem" and "volume" are just too long!
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15864
-
Don Brady authored
Slow disk response times can be indicative of a failing drive. ZFS currently tracks slow I/Os (slower than zio_slow_io_ms) and generates events (ereport.fs.zfs.delay). However, no action is taken by ZED, unlike for checksum or I/O errors. This change adds slow disk diagnosis to ZED, which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T
If multiple VDEVs in a pool are undergoing slow I/Os, then it skips the zpool_vdev_degrade().
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469
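A sketch of an "N slow I/Os within T seconds" rule, plus the guard that skips degrading when several vdevs in the pool are slow. The tracker type and functions are hypothetical; ZED's actual diagnosis engine is structured differently, and reading VDEV_PROP_SLOW_IO_N and VDEV_PROP_SLOW_IO_T as a count and a window in seconds is an assumption based on their names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_N	64	/* hypothetical cap; caller ensures 1 <= n <= MAX_N */

/* Hypothetical tracker: n delay events within t seconds marks a vdev slow. */
typedef struct {
	unsigned	n;		/* threshold count (assumed SLOW_IO_N) */
	int64_t		t;		/* window in seconds (assumed SLOW_IO_T) */
	int64_t		times[MAX_N];	/* ring buffer of event timestamps */
	unsigned	count;		/* events stored so far, at most n */
	unsigned	next;		/* next ring-buffer write position */
} slow_io_tracker_t;

/* Record one delay event at `now`; true once the last n fall within t. */
static bool
slow_io_record(slow_io_tracker_t *s, int64_t now)
{
	s->times[s->next] = now;
	s->next = (s->next + 1) % s->n;
	if (s->count < s->n)
		s->count++;
	if (s->count < s->n)
		return (false);
	/* Buffer is full, so s->next now indexes the oldest of the last n. */
	return (now - s->times[s->next] <= s->t);
}

/* Degrade only when this vdev is the sole slow one in its pool. */
static bool
should_degrade(bool vdev_is_slow, unsigned slow_vdevs_in_pool)
{
	return (vdev_is_slow && slow_vdevs_in_pool == 1);
}
```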
-
- 07 Feb, 2024 2 commits
the-Chain-Warden-thresh authored
CVE-2020-24370 is a security vulnerability in Lua. Although the CVE-2020-24370 description says that it only affects Lua 5.4.0, according to Lua the bug has actually existed since Lua 5.2. The root cause of this CVE is the negation overflow that occurs when taking the negative of 0x80000000. Thus, this CVE also exists in OpenZFS. This commit attempts to backport the fix to the Lua in OpenZFS; the original fix is for 5.4, and several of the affected functions have changed since 5.2.
https://github.com/advisories/GHSA-gfr4-c37g-mm3v
https://nvd.nist.gov/vuln/detail/CVE-2020-24370
https://www.lua.org/bugs.html#5.4.0-11
https://github.com/lua/lua/commit/a585eae6e7ada1ca9271607a4f48dfb1786
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChenHao Lu <18302010006@fudan.edu.cn>
Closes #15847
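As a generic illustration of the overflow (not the actual Lua patch), the sketch below shows two ways to avoid negating 0x80000000 in a 32-bit int: refuse to negate INT_MIN, or rewrite a range check so it negates the bounded, non-negative quantity instead of the untrusted index. Function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* -INT_MIN is not representable, and signed overflow is undefined in C. */
static bool
checked_negate(int x, int *out)
{
	if (x == INT_MIN)
		return (false);
	*out = -x;
	return (true);
}

/*
 * Is negative index n within [-count, -1]? Writing the check as
 * "n >= -count" (count >= 0) never overflows, whereas "-n <= count"
 * overflows when n == INT_MIN.
 */
static bool
neg_index_in_range(int n, int count)
{
	return (n < 0 && n >= -count);
}
```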
-
Cameron Harr authored
When very large pools are present, it can be laborious to find the reasons why a pool is degraded and/or where an unhealthy vdev is. This option filters out vdevs that are ONLINE and have no errors, to make it easier to see where the issues are. The root and the parents of unhealthy vdevs will always be printed.
Testing: ZFS errors and drive failures for multiple vdevs were simulated with zinject. Sample vdev listings with the '-e' option:
  - All vdevs healthy
    NAME          STATE     READ WRITE CKSUM
    iron5         ONLINE       0     0     0
  - ZFS errors
    NAME          STATE     READ WRITE CKSUM
    iron5         ONLINE       0     0     0
      raidz2-5    ONLINE       1     0     0
        L23       ONLINE       1     0     0
        L24       ONLINE       1     0     0
        L37       ONLINE       1     0     0
  - Vdev faulted
    NAME          STATE     READ WRITE CKSUM
    iron5         DEGRADED     0     0     0
      raidz2-6    DEGRADED     0     0     0
        L67       FAULTED      0     0     0  too many errors
  - Vdev faults and data errors
    NAME          STATE     READ WRITE CKSUM
    iron5         DEGRADED     0     0     0
      raidz2-1    DEGRADED     0     0     0
        L2        FAULTED      0     0     0  too many errors
      raidz2-5    ONLINE       1     0     0
        L23       ONLINE       1     0     0
        L24       ONLINE       1     0     0
        L37       ONLINE       1     0     0
      raidz2-6    DEGRADED     0     0     0
        L67       FAULTED      0     0     0  too many errors
  - Vdev missing
    NAME          STATE     READ WRITE CKSUM
    iron5         DEGRADED     0     0     0
      raidz2-6    DEGRADED     0     0     0
        L67       UNAVAIL      3     1     0
  - Slow devices when -s is provided with -e
    NAME          STATE     READ WRITE CKSUM  SLOW
    iron5         DEGRADED     0     0     0     -
      raidz2-5    DEGRADED     0     0     0     -
        L10       FAULTED      0     0     0     0  external device fault
        L51       ONLINE       0     0     0    14
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #15769
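The filtering rule (always print the root; otherwise print a vdev only if it or a descendant is unhealthy) can be sketched as a recursive walk. The vnode_t type and functions are hypothetical, not zpool's internals, and "unhealthy" is taken here to mean not ONLINE or with any non-zero error counter.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal vdev tree node. */
typedef struct vnode {
	const char	*name;
	const char	*state;		/* e.g. "ONLINE", "DEGRADED", "FAULTED" */
	unsigned long	read, write, cksum;
	struct vnode	**children;
	int		nchildren;
} vnode_t;

static bool
vnode_unhealthy(const vnode_t *v)
{
	return (strcmp(v->state, "ONLINE") != 0 ||
	    v->read != 0 || v->write != 0 || v->cksum != 0);
}

/* True if v or any of its descendants is unhealthy. */
static bool
subtree_unhealthy(const vnode_t *v)
{
	if (vnode_unhealthy(v))
		return (true);
	for (int i = 0; i < v->nchildren; i++)
		if (subtree_unhealthy(v->children[i]))
			return (true);
	return (false);
}

/* Print the filtered tree; returns the number of rows printed. */
static int
print_filtered(const vnode_t *v, int depth, bool is_root)
{
	int printed = 0;

	if (!is_root && !subtree_unhealthy(v))
		return (0);	/* healthy subtree: skip it entirely */
	printf("%*s%-10s %-8s %5lu %5lu %5lu\n", depth * 2, "", v->name,
	    v->state, v->read, v->write, v->cksum);
	printed++;
	for (int i = 0; i < v->nchildren; i++)
		printed += print_filtered(v->children[i], depth + 1, false);
	return (printed);
}
```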
-
- 06 Feb, 2024 1 commit
Brian Behlendorf authored
On Linux, the ioctl_ficlonerange() and ioctl_ficlone() system calls are expected to either fully clone the specified range or return an error. The range may be for an entire file. While ZFS internally supports cloning partial ranges, there's no way to return the length cloned to the caller, so we need to make this all or nothing. As part of this change, support for the REMAP_FILE_CAN_SHORTEN flag has been added. When REMAP_FILE_CAN_SHORTEN is set, zfs_clone_range() will return a shortened range when encountering pending dirty records. When it's clear, zfs_clone_range() will block and wait for the records to be written out, allowing the blocks to be cloned. Furthermore, the file range lock is held over the region being cloned to prevent it from being modified while cloning. This doesn't quite provide atomic semantics, since if an error is encountered only a portion of the range may be cloned. This will be converted to an error if REMAP_FILE_CAN_SHORTEN was not provided and returned to the caller. However, the destination file range is left in an undefined state. A test case has been added which exercises this functionality by verifying that `cp --reflink=never|auto|always` works correctly.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15728
Closes #15842
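A sketch of the all-or-nothing rule, modelling the source range as an array of per-block states. With a stand-in for REMAP_FILE_CAN_SHORTEN, cloning stops at the first dirty block and reports the shortened length; without it, dirty blocks are waited for (instantly, in this model), and a failure after partial progress becomes an error. All names are hypothetical, and the choice of EIO is an assumption.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_CAN_SHORTEN	0x1	/* stand-in for REMAP_FILE_CAN_SHORTEN */

enum { BLK_CLEAN, BLK_DIRTY, BLK_ERROR };	/* per-block source state */

/*
 * Clone blocks [0, nblks); *cloned receives the number cloned. Without
 * DEMO_CAN_SHORTEN, the caller only learns success or failure, so any
 * partial result is an error (uncloned destination data is undefined).
 */
static int
demo_clone_range(const int *blk, size_t nblks, unsigned flags,
    size_t *cloned)
{
	size_t i;

	for (i = 0; i < nblks; i++) {
		if (blk[i] == BLK_DIRTY) {
			if (flags & DEMO_CAN_SHORTEN)
				break;	/* return the shortened range */
			/* else: wait for the dirty record to be written out */
		} else if (blk[i] == BLK_ERROR) {
			*cloned = i;
			if ((flags & DEMO_CAN_SHORTEN) && i > 0)
				return (0);	/* partial result is acceptable */
			return (EIO);
		}
	}
	*cloned = i;
	return (0);
}
```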
-
- 05 Feb, 2024 1 commit
Rich Ercolani authored
Step 1 in trying to slowly rip the zdb functions out of zdb.c, to allow people to play with more flexible things that leverage zdb's functionality. No promises on any functions or structs being stable, now or probably in general, unless someone builds a more polished abstraction; the goal at the moment is to slowly untangle the global state usage in zdb...
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15804
-
- 02 Feb, 2024 1 commit
Umer Saleem authored
On Linux, ZFS uses blkdev_issue_discard in vdev_disk_io_trim to issue the trim command, which is synchronous. This commit updates vdev_disk_io_trim to use __blkdev_issue_discard, which is asynchronous. Unfortunately there isn't an asynchronous version of blkdev_issue_secure_erase, so the performance of secure trim will still suffer.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15843
-
- 29 Jan, 2024 10 commits
Rob Norris authored
struct mnt_idmap no longer has a struct user_namespace within it. Work around this by creating a temporary one holding a copy of the map we need, taken from the idmap.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
-
Rob Norris authored
The name inode_permission is now defined in the kernel. Rename ours to test_permission, in line with most of our other tests.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
-
Rob Norris authored
MAX_ORDER has been renamed to MAX_PAGE_ORDER. Rather than just redefining it, instead define our own name and set it consistently from the start.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
-
Rob Norris authored
Linux has removed strlcpy in favour of strscpy. This adds a fallback implementation of strlcpy for this case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
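A minimal sketch of a strlcpy() fallback with the usual BSD semantics, in plain C. It is named demo_strlcpy to avoid clashing with a libc or kernel strlcpy; the fallback actually added to OpenZFS may be implemented differently (for example, on top of strscpy()).

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most size - 1 bytes, always NUL-terminate when size > 0, and
 * return strlen(src) so callers can detect truncation (result >= size).
 */
static size_t
demo_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size != 0) {
		size_t n = (srclen >= size) ? size - 1 : srclen;
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}
```

Unlike the kernel's strscpy(), which returns a negative error on truncation, this keeps strlcpy's contract of returning the full source length.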
-
Rob Norris authored
blkdev_get_by_path() and blkdev_put() have been replaced by bdev_open_by_path() and bdev_release(), which return a "handle" object with the bdev object itself inside. This adds detection for the new functions, and macros to handle the old and new forms consistently.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
-
Rob Norris authored
The kernel is now being compiled with -Wmissing-prototypes. Most of our test stub functions had no prototype, and failed to compile. Since they don't need to be visible anywhere else, just make them all static.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
-
Brian Behlendorf authored
Update the META file to reflect compatibility with the 6.7 kernel.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15833
-
Paul Dagnelie authored
During device removal stress tests, we noticed that we were tripping the assertion that mg_initialized was true. After investigation, it was determined that the mg in question was the embedded log metaslab group for a newly added vdev; the normal mg had been initialized (by metaslab_sync_reassess, via vdev_sync_done). However, because the spa config alloc lock is not held as writer across both calls to metaslab_sync_reassess, it is possible for an allocation to happen between the two metaslab_groups being initialized. Because the metaslab code doesn't check the group in question, just the vdev's main mg, it is possible to get past the initial check in vdev_allocatable and later fail due to the assertion. We simply remove the assertions. We could also consider locking the ALLOC lock around the reassess calls in vdev_sync_done, but that risks deadlocks. We could check the actual target mg in vdev_allocatable, but that risks racing with a passivation that comes in after that check but before the assertion. We still won't be able to actually allocate from the metaslab group if no metaslabs are ready, so this change shouldn't break anything.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15818
-
Richard Kojedzinszky authored
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #15793
-
Richard Kojedzinszky authored
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Kojedzinszky <richard@kojedz.in>
Closes #15793
-