- 21 Feb, 2021 2 commits
-
-
Tony Hutter authored
A multipathed disk will have several 'underlying' paths to the disk. For example, multipath disk 'dm-0' may be made up of paths: /dev/{sda,sdb,sdc,sdd}. On many enclosures those underlying sysfs paths will have a symlink back to their enclosure device entry (like 'enclosure_device0/slot1'). This is used by the statechange-led.sh script to set/clear the fault LED for a disk, and by 'zpool status -c'. However, on some enclosures, those underlying paths may not all have symlinks back to the enclosure device. For example, only two out of four of them might. This patch updates zfs_get_enclosure_sysfs_path() to favor returning paths that have symlinks back to their enclosure devices, rather than just returning the first path. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Tony Hutter <hutter2@llnl.gov> Closes #11617
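The selection rule described above can be sketched in Python. This is only an illustration of the idea, not the C code of zfs_get_enclosure_sysfs_path(): the helper names, the `has_link` predicate, and the fallback to the first path when no path has a symlink are all assumptions made for the example.

```python
import os


def has_enclosure_link(sysfs_dir):
    """Return True if sysfs_dir contains an entry named like 'enclosure_device*'."""
    try:
        return any(name.startswith("enclosure_device")
                   for name in os.listdir(sysfs_dir))
    except OSError:
        # Missing or unreadable directory: treat it as having no link.
        return False


def pick_enclosure_path(paths, has_link=has_enclosure_link):
    """Prefer the first path that links back to an enclosure device.

    Falls back to the first path (assumed behavior for this sketch) when
    none of the underlying paths has such a link.
    """
    for path in paths:
        if has_link(path):
            return path
    return paths[0] if paths else None
```

Passing the predicate as a parameter keeps the sketch testable without a real sysfs tree.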
-
Brian Atkinson authored
Make uio_impl.h the common header interface between Linux and FreeBSD so that both OSes share a single header file. This also reduces the duplication of zfs_uio_t code for each OS. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Brian Atkinson <batkinson@lanl.gov> Closes #11622
-
- 20 Feb, 2021 6 commits
-
-
Christian Schwarz authored
I think this is the behavior that most users expect. Future work: have a separate flag, e.g., -O, to specify separate set_global_vars for the zdb child than for the ztest children. Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by:
Christian Schwarz <me@cschwarz.com> Closes #11602
-
Christian Schwarz authored
Without set_global_var() in the child processes the -o option provides little use. Before this change set_global_var() was called as a side-effect of getopt processing which only happens for the parent ztest process. This change limits the set of options that can be set and makes them available to the child through ztest_shared_opts_t. Future work: support arbitrary option count and length. Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by:
Christian Schwarz <me@cschwarz.com> Closes #11602
-
Christian Schwarz authored
Also fixes leak of the dlopen handle in the error case. Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by:
Christian Schwarz <me@cschwarz.com> Closes #11602
-
Christian Schwarz authored
Without this patch I get the error "Setting global variables is only supported on little-endian systems" when using `zdb -o` on my amd64 machine. Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by:
Christian Schwarz <me@cschwarz.com> Closes #11602
-
Ryan Moeller authored
Add zfs_racct_* interfaces for platform-dependent read/write accounting. Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Signed-off-by:
Ryan Moeller <ryan@iXsystems.com> Closes #11613
-
Don Brady authored
Fix regression seen in issue #11545 where checksum errors were not being counted or showing up in a zpool event. Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Don Brady <don.brady@delphix.com> Closes #11609
-
- 18 Feb, 2021 4 commits
-
-
Mark Johnston authored
First, the crypto request completion handler contains a bug in that it fails to reset fs_done correctly after the request is completed. This is only a problem for asynchronous drivers. Second, some hardware drivers have input constraints which ZFS does not satisfy. For instance, ccp(4) apparently requires the AAD length for AES-GCM to be a multiple of the cipher block size, and with qat(4) the AES-GCM AAD length may not be longer than 240 bytes. FreeBSD's generic crypto framework doesn't have a mechanism to automatically fall back to a software implementation if a hardware driver cannot process a request, and ZFS does not tolerate such errors. The plan is to implement such a fallback mechanism, but with FreeBSD 13.0 approaching we should simply disable the use of hardware drivers for now. Reviewed-by:
Ryan Moeller <ryan@iXsystems.com> Reviewed-by:
Alexander Motin <mav@FreeBSD.org> Signed-off-by:
Mark Johnston <markj@FreeBSD.org> Closes #11612
-
Andriy Gapon authored
That happens because of an off-by-one mistake. share_mount_one_cb() calls report_mount_progress(current=sm_done) after having incremented sm_done by one. Then report_mount_progress() increments the parameter again. It appears that this logic became obsolete after commit a10d50f9, parallel zfs mount. On FreeBSD I observe that `zfs mount -a -v` prints, for example, "(null): (201/248)". That happens because set_progress_header() is never called. With this change the output becomes correct: "Mounting ZFS filesystems: (209/248)". Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Andriy Gapon <avg@FreeBSD.org> Closes #11607
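The double increment can be reproduced with a small Python model. This is a sketch of the described logic only; the function names are made up for the illustration, and dropping the second increment is one possible way to remove the off-by-one, not necessarily how the patch does it.

```python
def report_progress_old(current, total):
    # Old helper: treats `current` as a zero-based index and adds one.
    return f"({current + 1}/{total})"


def report_progress_fixed(current, total):
    # Fixed helper: `current` already counts the finished items.
    return f"({current}/{total})"


def mount_all(total, report):
    """Model of a caller that increments its counter before reporting."""
    done = 0
    lines = []
    for _ in range(total):
        done += 1  # incremented first, as described for share_mount_one_cb()
        lines.append(report(done, total))
    return lines
```

With three filesystems, the old helper ends on "(4/3)" while the fixed one ends on "(3/3)".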
-
Ryan Libby authored
Remove a function that became unused after the refactoring in e2af2acc. Reviewed-by:
Ryan Moeller <ryan@ixsystems.com> Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Ryan Libby <rlibby@FreeBSD.org> Closes #11614
-
Colm authored
Property to allow sets of features to be specified, for compatibility with specific versions / releases / external systems. Influences the behavior of 'zpool upgrade' and 'zpool create'. Initial man page changes and test cases included.

Brief synopsis:
zpool create -o compatibility=off|legacy|file[,file...] pool vdev...
compatibility = off : disable compatibility mode (enable all features)
compatibility = legacy : request that no features be enabled
compatibility = file[,file...] : read features from specified files. Only features present in *all* files will be enabled on the resulting pool. Filenames may be absolute, or relative to /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc checked first).

Only affects zpool create, zpool upgrade and zpool status.

ABI changes in libzfs:
* New function "zpool_load_compat" to load and parse compat sets.
* Add "zpool_compat_status_t" typedef for compatibility parse status.
* Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum
* Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum

An initial set of base compatibility sets is included in cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is modified to install these in $pkgdatadir/compatibility.d and to create symbolic links to a reasonable set of aliases. Reviewed-by: ericloewe Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Richard Laager <rlaager@wiktel.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Colm Buckley <colm@tuatha.org> Closes #11468
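The "only features present in all files" rule and the lookup order can be sketched in Python. This is not the libzfs implementation: the function names are hypothetical, and the file format assumed here (feature names separated by whitespace or commas, with `#` starting a comment) is an assumption for the example.

```python
import posixpath

# Search order stated in the commit message: /etc first, then /usr/share.
SEARCH_DIRS = ("/etc/zfs/compatibility.d", "/usr/share/zfs/compatibility.d")


def resolve_compat_file(name, exists=posixpath.exists):
    """Return an absolute name as-is, else the first existing candidate."""
    if posixpath.isabs(name):
        return name
    for directory in SEARCH_DIRS:
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate
    return None


def parse_feature_file(text):
    """Collect feature names, ignoring '#' comments (assumed format)."""
    features = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]
        features.update(line.replace(",", " ").split())
    return features


def enabled_features(file_texts):
    """Only features listed in every file stay enabled."""
    sets = [parse_feature_file(text) for text in file_texts]
    return set.intersection(*sets) if sets else set()
```

For instance, two files listing `async_destroy lz4_compress` and `lz4_compress embedded_data` leave only `lz4_compress` enabled.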
-
- 17 Feb, 2021 2 commits
-
-
Brian Behlendorf authored
Rather than conditionally compiling out the edonr code for FreeBSD, update zfs_mod_supported_feature() to indicate this feature is unsupported. This ensures that all spa features are defined on every platform, even if they are not supported. Reviewed-by:
Ryan Moeller <ryan@ixsystems.com> Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov> Closes #11605 Issue #11468
-
José Luis Salvador Rufo authored
There are two issues that don't allow ZFS to be compiled using uClibc: `backtrace()`, and `program_invocation_short_name` being declared as a `const`. This patch adds uClibc to the conditionals in the same way as is already done for Glibc for `backtrace()`, and removes the external param `program_invocation_short_name` because it is only used here in the whole project. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
José Luis Salvador Rufo <salvador.joseluis@gmail.com> Closes #11600
-
- 15 Feb, 2021 1 commit
-
-
Ryan Moeller authored
FreeBSD's zfsd fails to build after e2af2acc due to strict type checking errors from the implicit conversion between bool and boolean_t in the inline predicate definitions in abd.h. Use conditionals to return the correct value type from these functions. Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Igor Kozhukhov <igor@dilos.org> Signed-off-by:
Ryan Moeller <freqlabs@FreeBSD.org> Closes #11592
-
- 10 Feb, 2021 1 commit
-
-
Brian Behlendorf authored
Increase the Linux-Maximum version in the META file to 5.11. All of the required compatibility patches have been merged. Reviewed-by:
George Melikov <mail@gmelikov.ru> Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov> Closes #11586
-
- 09 Feb, 2021 3 commits
-
-
Arshad Hussain authored
Within function sas_handler(), userspace commands like '/usr/sbin/multipath' have been replaced with sourcing device details from within sysfs, which significantly reduces overhead and processing time. Multiple JBOD enclosures and their order are sourced from the bsg driver (/sys/class/enclosure) to isolate chassis top-level expanders, which are then dynamically indexed based on host channel of the multipath subordinate disk member device being processed. Additionally added a "mixed" mode for slot identification for environments where a ZFS server system may contain SAS disk slots where there is no expander (direct connect to HBA) while an attached external JBOD with an expander has different slot identifier methods.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~
Testing was performed on an AMD EPYC based dual-server high-availability multipath environment with multiple HBAs per ZFS server and four SAS JBODs. The two primary JBODs were multipath/cross-connected between the two ZFS-HA servers. The secondary JBODs were daisy-chained off of the primary JBODs using aligned SAS expander channels (JBOD-0 expanderA--->JBOD-1 expanderA, JBOD-0 expanderB--->JBOD-1 expanderB, etc). Pools were created, exported and re-imported, imported globally with 'zpool import -a -d /dev/disk/by-vdev'. Low level udev debug outputs were traced to isolate and resolve errors.

Result:
~~~~~~~
Initial testing of a previous version of this change showed how reliance on userspace utilities like '/usr/sbin/multipath' and '/usr/bin/lsscsi' was exacerbated by increasing numbers of disks and JBODs. With four 60-disk SAS JBODs and 240 disks the time to process a udevadm trigger was 3 minutes 30 seconds, during which nearly all CPU cores were above 80% utilization. By switching reliance on userspace utilities to sysfs in this version, the udevadm trigger processing time was reduced to 12.2 seconds and negligible CPU load.

This patch also fixes a few shellcheck complaints. Reviewed-by:
Gabriel A. Devenyi <gdevenyi@gmail.com> Reviewed-by:
Tony Hutter <hutter2@llnl.gov> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by:
Jeff Johnson <jeff.johnson@aeoncomputing.com> Signed-off-by:
Jeff Johnson <jeff.johnson@aeoncomputing.com> Signed-off-by:
Arshad Hussain <arshad.hussain@aeoncomputing.com> Closes #11526
-
khng300 authored
zfs_znode_update_vfs is a more platform-agnostic name than zfs_inode_update. Besides that, the function's prototype is moved to include/sys/zfs_znode.h as the function is also used in common code. Reviewed-by:
Ryan Moeller <ryan@ixsystems.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Ka Ho Ng <khng300@gmail.com> Sponsored by: The FreeBSD Foundation Closes #11580
-
Kleber Tarcísio authored
The first time through the loop prevdb and prevhdl are NULL. They are then both set, but only prevdb is checked. Add an ASSERT to make it clear that prevhdl must be set when prevdb is. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Kleber <klebertarcisio@yahoo.com.br> Closes #10754 Closes #11575
-
- 08 Feb, 2021 1 commit
-
-
Antonio Russo authored
3d40b655 refactored zfs_vnops.c, which shared much code verbatim between Linux and BSD. After a successful write, the suid/sgid bits are reset, and the mode to be written is stored in newmode. On Linux, this was propagated to both the in-memory inode and znode, which is then updated with sa_update. 3d40b655 accidentally removed the initialization of newmode, which happened to occur on the same line as the inode update (which has been moved out of the function). The uninitialized newmode can be saved to disk, leading to a crash on stat() of that file, in addition to a merely incorrect file mode. Reviewed-by:
Ryan Moeller <ryan@ixsystems.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Antonio Russo <aerusso@aerusso.net> Closes #11474 Closes #11576
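The value that newmode is meant to carry can be sketched in Python. This models only the bit arithmetic: the function name and the `may_retain_setid` flag are hypothetical, and the exact conditions under which the real code clears the bits (here: an execute bit is set and the caller may not retain setid bits) are an assumption for the example.

```python
import stat

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
SETID_BITS = stat.S_ISUID | stat.S_ISGID


def mode_after_write(mode, may_retain_setid):
    """Return the mode that should be stored after a successful write."""
    if (mode & EXEC_BITS) and (mode & SETID_BITS) and not may_retain_setid:
        # Compute the new mode before it is used anywhere else; storing
        # a value that was never assigned is the failure described above.
        newmode = mode & ~SETID_BITS
        return newmode
    return mode
```

A setuid regular file with mode 0o104755 would be stored as 0o100755 after an unprivileged write.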
-
- 05 Feb, 2021 2 commits
-
-
наб authored
When all pools are exported ZFS will generate an empty cache file. This will cause the import service to fail, which is sub-optimal, since this means that dracut fails, and it is necessary to run `zpool import -a` to boot, delete the file, and regenerate+reinstall the initrd. This resolves the issue by treating a zero-length cache file the same as a missing cache file. This aligns the behavior with that of the `zpool` command itself. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Richard Laager <rlaager@wiktel.com> Signed-off-by:
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11568
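The check described above amounts to "exists and is non-empty". A minimal Python sketch follows; the function name is hypothetical and the actual fix lives in the import service scripts rather than in code like this.

```python
import os


def cache_file_usable(path):
    """True only for an existing, non-empty regular file.

    A zero-length cache file is treated exactly like a missing one.
    """
    try:
        return os.path.isfile(path) and os.path.getsize(path) > 0
    except OSError:
        # The file vanished or is unreadable: treat it as missing.
        return False
```

In a shell script the equivalent test is `[ -s "$file" ]`, which is true only when the file exists and has a size greater than zero.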
-
nssrikanth authored
The pool guid and vdev guid received by zfs_agent_post_event(), which calls zfs_retire_recv(), are normally non-zero. However, later in this same method they may be unconditionally reset to zero by the code which is intended to handle multipath, spare and l2arc vdevs. This will result in the EC_dev_remove not being handled. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by:
Vipin Kumar Verma <vipin.verma@hpe.com> Signed-off-by:
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> Closes #11564
-
- 04 Feb, 2021 1 commit
-
-
Brian Behlendorf authored
Clarify how to include snapshots in the `zfs list` output by referencing the full name of the `listsnapshots` pool property, and the `zfs list -t snapshot` option. Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
George Melikov <mail@gmelikov.ru> Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov> Closes #11562 Closes #11565
-
- 02 Feb, 2021 4 commits
-
-
Christian Schwarz authored
Expand the comments to make it clear exactly what is guaranteed by dmu_tx_assign() and txg_hold_open(). Additionally, update the comment which refers to txg_exit() when it should reference txg_rele_to_sync(). Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Christian Schwarz <me@cschwarz.com> Closes #11521
-
George Melikov authored
Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
George Melikov <mail@gmelikov.ru> Closes #11554
-
George Melikov authored
Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
George Melikov <mail@gmelikov.ru> Closes #11554
-
George Melikov authored
Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
George Melikov <mail@gmelikov.ru> Closes #11554
-
- 30 Jan, 2021 2 commits
-
-
Brian Behlendorf authored
This compatibility code is no longer needed. For a while iov_iter_init_compat() was used by zfs_uio_prefaultpages(), but this code should have been dropped as part of commit 83b91ae1. Take care of that oversight and remove it. Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Signed-off-by:
Brian Behlendorf <behlendorf1@llnl.gov> Closes #11543
-
Matthew Ahrens authored
ABD's currently track their parent/child relationship. This applies to `abd_get_offset()` and `abd_borrow_buf()`. However, nothing depends on knowing this relationship; it's only used for consistency checks to verify that we are not destroying an ABD that's still in use. When we are creating/destroying ABD's frequently, the performance impact of maintaining these data structures (in particular the atomic increment/decrement operations) can be measurable. This commit removes this verification code on production builds, but keeps it when ZFS_DEBUG is set. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Signed-off-by:
Matthew Ahrens <mahrens@delphix.com> Closes #11535
-
- 29 Jan, 2021 2 commits
-
-
nssrikanth authored
In the ZED zfs_retire agent, added a check to also handle Distributed Spare replacement for a Faulted VDEV. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by:
Vipin Kumar Verma <vipin.verma@hpe.com> Signed-off-by:
Mark Maybee <mark.maybee@hpe.com> Closes #11354 Closes #11355
-
Brian Atkinson authored
I originally applied a fix in #11539 to fix a parent's child references when a gang ABD is free'd. However, I did not take into account abd_gang_add_gang(). We still need to make sure to update the child references in this function as well. In order to resolve this I removed decreasing the gang ABD's size in abd_free_gang() and moved zfs_refcount_remove_many() back to its original placement in abd_free(). Reviewed-by:
Mark Maybee <mark.maybee@delphix.com> Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Brian Atkinson <batkinson@lanl.gov> Closes #11542
-
- 28 Jan, 2021 8 commits
-
-
George Melikov authored
All tests need to be included in the Makefiles. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
George Melikov <mail@gmelikov.ru> Closes #11541
-
Matthew Ahrens authored
`__vdev_disk_physio()` uses `abd_nr_pages_off()` to allocate a bio with a sufficient number of iovec's to process this zio (i.e. `nr_iovecs`/`bi_max_vecs`). If there are not enough iovec's in the bio, then additional bio's will be allocated. However, this is a sub-optimal code path. In particular, it requires several abd calls (to `abd_nr_pages_off()` and `abd_bio_map_off()`) which will have to walk the constituents of the ABD (the pages or the gang children) because they are looking for offsets > 0. For gang ABD's, `abd_nr_pages_off()` returns the number of iovec's needed for the first constituent, rather than the sum of all constituents (within the requested range). This always under-estimates the required number of iovec's, which causes us to always need several bio's. The end result is that `__vdev_disk_physio()` is usually O(n^2) for gang ABD's (and occasionally O(n^3), when more than 16 bio's are needed). This commit fixes `abd_nr_pages_off()`'s handling of gang ABD's, to correctly determine how many iovec's are needed, by adding up the number of iovec's for each of the gang children in the requested range. Reviewed-by:
Mark Maybee <mark.maybee@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Brian Atkinson <batkinson@lanl.gov> Signed-off-by:
Matthew Ahrens <mahrens@delphix.com> Closes #11536
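The difference between counting only the first constituent and summing over all gang children in the requested range can be sketched in Python. This is a simplified model, not the kernel code: it assumes 4 KiB pages, assumes every child buffer starts page-aligned, and uses made-up function names.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


def pages_in_range(off, size):
    """Pages touched by [off, off + size) in a page-aligned buffer."""
    if size == 0:
        return 0
    return (off + size - 1) // PAGE_SIZE - off // PAGE_SIZE + 1


def gang_nr_pages_first_only(child_sizes, off, size):
    """Under-estimate described above: stops at the first child in range."""
    for child in child_sizes:
        if off >= child:
            off -= child  # range starts beyond this child
            continue
        return pages_in_range(off, min(size, child - off))
    return 0


def gang_nr_pages(child_sizes, off, size):
    """Sum the pages of every child overlapping the requested range."""
    total = 0
    for child in child_sizes:
        if size == 0:
            break
        if off >= child:
            off -= child
            continue
        take = min(size, child - off)
        total += pages_in_range(off, take)
        size -= take
        off = 0  # later children are read from their start
    return total
```

With children of 8192, 4096 and 4096 bytes and a request for 12288 bytes at offset 4096, the first-only count is 1 page while the summed count is 3, which is why the old estimate forced extra bio allocations.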
-
George Amanakis authored
If we do not write any buffers to the cache device and the evict hand has not advanced do not update the cache device header. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
George Amanakis <gamanakis@gmail.com> Closes #11522 Closes #11537
-
Brian Atkinson authored
Moving the call to zfs_refcount_remove_many() in abd_free() to be called before any of the ABD free variants are called. This is necessary because abd_free_gang() adjusts the abd_size for the gang ABD. If the parent's child references are removed after free'ing the gang ABD the refcount is not adjusted correctly for the parent's children. I also removed some stray abd_put() in comments and changed abd_free_gang_abd() -> abd_free_gang(). Reviewed-by:
Mark Maybee <mark.maybee@delphix.com> Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Brian Atkinson <batkinson@lanl.gov> Closes #11539
-
Allan Jude authored
While you can use zdb -R poolname vdev:offset:[<lsize>/]<psize>[:flags] to extract individual DVAs from a vdev, it would be handy to be able to copy an entire file out of the pool. Given a file or object number, add support to copy the contents to a file. Useful for debugging and recovery. Reviewed-by:
Jorgen Lundman <lundman@lundman.net> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Allan Jude <allan@klarasystems.com> Closes #11027
-
Mark Maybee authored
Before a hash table was added on top of the nvlist code, there were cases where the nvlist allocation was changed from fnvlist_alloc() to nvlist_alloc() to avoid expensive NV_UNIQUE_NAME checks. Now this is no longer necessary. These changes should be reverted to be consistent with other code. There are some cases where this change will also reduce the number of iterations. Reviewed-by:
Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Mark Maybee <mark.maybee@delphix.com> Closes #11464
-
Paul Dagnelie authored
There is a race condition in zfs_zrele_async when we are checking if we would be the one to evict an inode. This can lead to a txg sync deadlock. Instead of calling into iput directly, we attempt to perform the atomic decrement ourselves, unless that would set the i_count value to zero. In that case, we dispatch a call to iput to run later, to prevent a deadlock from occurring. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by:
Matthew Ahrens <mahrens@delphix.com> Signed-off-by:
Paul Dagnelie <pcd@delphix.com> Closes #11527 Closes #11530
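The "decrement unless that would reach zero, otherwise defer" pattern can be modelled in Python. This is a sketch of the idea only: a lock stands in for the kernel's atomic operation, the class and function names are hypothetical, and a plain queue stands in for the deferred dispatch.

```python
import threading
from collections import deque


class RefCount:
    """Reference count with a 'decrement unless last' operation."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def dec_unless_last(self):
        """Drop one reference unless it is the last one.

        Check and decrement happen under one lock, so no other thread
        can slip in between them.
        """
        with self._lock:
            if self._value <= 1:
                return False
            self._value -= 1
            return True


def release_async(ref, final_release, queue):
    """Drop a reference cheaply, or defer the final release."""
    if ref.dec_unless_last():
        return "dropped"
    # The last reference is never released inline; it is handed to a
    # queue that runs later, outside the context that could deadlock.
    queue.append(lambda: final_release(ref))
    return "deferred"
```

Starting from a count of 2, the first `release_async` call returns "dropped" and the second returns "deferred", leaving the final release to whoever drains the queue.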
-
George Melikov authored
If there is no scsi_debug module, then this test must be skipped. In this case the cleanup routine should be prepared for an absent pool. Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
George Melikov <mail@gmelikov.ru> Closes #11534
-
- 27 Jan, 2021 1 commit
-
-
Alan Somers authored
Need to destroy the pthread mutex created in uu_avl_pool_create. https://svnweb.freebsd.org/base?view=revision&revision=262912 Obtained from: FreeBSD Sponsored by: Spectra Logic Corporation Reviewed-by:
Ryan Moeller <ryan@ixsystems.com> Reviewed-by:
Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by:
Alan Somers <asomers@gmail.com> Closes #11528
-