- 10 Jan, 2023 24 commits
-
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Fedor Uporov authored
Add mmp uberblock actualization from scratch object side
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Fedor Uporov authored
Improve comment.
-
Fedor Uporov authored
This reverts commit d99d9a30760dbe75be74717be8ef29e1720ddc0d.
-
Fedor Uporov authored
The MMP uberblock could be overwritten by the scratch object if a raidz expansion is in progress.
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Matthew Ahrens authored
-
Matthew Ahrens authored
-
Matthew Ahrens authored
-
Matthew Ahrens authored
The "shadow block" repair write was not actually being executed because it was bypassed in lower layers.
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Fedor Uporov authored
-
Matthew Ahrens authored
-
Fedor Uporov authored
Blocks that were synced in the same txg as raidz_reflow_complete_sync() can have an incorrect logical width. Increasing the txg value that is added to the expand txgs array helps in this case.
-
Matthew Ahrens authored
This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks).

== Initiating expansion ==

A new device (disk) can be attached to an existing RAIDZ vdev by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes.

The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs cannot be imported by older releases of the ZFS software.

== During expansion ==

The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device).

The expansion progress can be monitored with `zpool status`.

Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete).

The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off.

== After expansion ==

When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc).

Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion).

A RAIDZ vdev can be expanded multiple times.

After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. a 5-wide RAIDZ2 has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools.

Sponsored-by: The FreeBSD Foundation
Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com>
Contributions-by: Stuart Maybee <stuart.maybee@comcast.net>
Contributions-by: Thorsten Behrens <tbehrens@outlook.com>
Contributions-by: Fmstrat <nospam@nowsci.com>
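
The data-to-parity arithmetic in the message above can be worked through with a small sketch. It is a simplification, not the ZFS accounting code: it considers only full-width stripes and ignores padding and small blocks. The function names are hypothetical, and the reporting model (raw allocation scaled by the unchanged pre-expansion ratio) is an assumption made to illustrate why newly written blocks may be reported as slightly smaller than expected.

```python
from fractions import Fraction


def data_fraction(width: int, parity: int) -> Fraction:
    """Fraction of a full-width RAIDZ stripe that holds data (padding ignored)."""
    if not 0 < parity < width:
        raise ValueError("need 0 < parity < width")
    return Fraction(width - parity, width)


def reported_fraction(old_width: int, new_width: int, parity: int) -> Fraction:
    """Reported size of a new block relative to its logical size.

    Assumption: tools report raw allocated space scaled by the *original*
    data fraction, while the block is actually written at the new ratio.
    """
    raw_per_data = 1 / data_fraction(new_width, parity)
    return raw_per_data * data_fraction(old_width, parity)


if __name__ == "__main__":
    old = data_fraction(5, 2)  # 3 data : 2 parity -> 3/5
    new = data_fraction(6, 2)  # 4 data : 2 parity -> 2/3
    print(f"old blocks: {old} of raw space is data")
    print(f"new blocks: {new} of raw space is data")
    print(f"new blocks reported at {reported_fraction(5, 6, 2)} of logical size")
```

Under these assumptions, a block written after expanding a 5-wide RAIDZ2 to 6-wide would be reported at 9/10 of its logical size.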
-
- 09 Jan, 2023 1 commit
-
-
Matthew Ahrens authored
-
- 06 Jan, 2023 5 commits
-
-
Coleman Kane authored
The bi_rw member of struct bio was renamed to bi_opf in Linux 6.2. As well, Linux's implementation of bio_set_op_attrs(...) has been removed. The HAVE_BIO_BI_OPF macro already appears to be defined, but the removal of the bio_set_op_attrs(...) implementation makes the build fall back on the locally-defined implementation, which isn't updated for the bio->bi_opf change. This commit adds that update.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14324
Closes #14331
-
Coleman Kane authored
Linux 6.2 renamed the get_acl() operation to get_inode_acl() in the inode_operations struct. This should fix Issue #14323.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14323
Closes #14331
-
Antonio Russo authored
Currently, if API tests fail, we either ignore the failures, or unconditionally halt the kernel build. This leads to situations where incompatibilities with existing APIs may develop, but not trip the configure compatibility checks.

This introduces a new mechanism to require APIs for kernels above a particular version. While not perfect, this at least guarantees mainline kernels do not break existing APIs without at least providing some warning.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14343
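
The policy described above can be sketched as a small decision function. This is only an illustration of the idea, not the configure macros themselves: the function name, the outcome strings, and the choice of an inclusive version threshold are all assumptions.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (6, 2)


def api_check_outcome(api_found: bool, kernel: Version,
                      required_from: Optional[Version] = None) -> str:
    """Decide what a failed API check should do (hypothetical policy sketch).

    required_from is the first kernel version for which the API is treated
    as mandatory; None means the check is never mandatory.
    """
    if api_found:
        return "ok"
    # Assumption: the threshold is inclusive.
    if required_from is not None and kernel >= required_from:
        return "error"    # halt: a required API is missing
    return "ignored"      # older kernel or optional API


if __name__ == "__main__":
    print(api_check_outcome(False, (6, 2), required_from=(6, 0)))
    print(api_check_outcome(False, (5, 15), required_from=(6, 0)))
```

The point of the threshold is that a missing API halts the build only on kernels that are expected to provide it, instead of being either always ignored or always fatal.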
-
Antonio Russo authored
Linux 863f144 modified the .tmpfile interface to pass a struct file, rather than a struct dentry, and expect the tmpfile implementation to open inside of tmpfile().

This patch implements a configuration test that checks for this new API and appropriately sets a HAVE_TMPFILE_DENTRY flag that tracks this old API. Contingent on this flag, the appropriate API is implemented.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14301
Closes #14343
-
Antonio Russo authored
mmapwrite is used during the ZTS to identify issues with mmap-ed files. This helper program exercises this pathway by continuously writing to a file. ee6bf97c modified the writing threads to terminate after a set amount of total data is written. This change allows standard program execution to reach the end of a writer thread without closing the file descriptor, introducing a resource "leak."

This patch appeases resource leak analyses by close()-ing the file at the end of the thread.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14353
-
- 05 Jan, 2023 10 commits
-
-
Antonio Russo authored
mmapwrite spawns several threads, all of which perform writes on a file for the purpose of testing the behavior of mmap(2)-ed files. One thread performs an mmap and a write to the beginning of that region, while the others perform regular writes after lseek(2)-ing the end of the file.

Because these regular writes are set in a while (1) loop, they will write an unbounded amount of data to disk. The mmap_write_001_pos test script SIGKILLs them after 30 seconds, but on fast testbeds, this may be enough time to exhaust the available space in the filesystem, leading to spurious test failures.

Instead, limit the total file size by checking that the lseek return value is no greater than 250 * 1024*1024 bytes, which is less than the default minimum vdev size defined in includes/default.cfg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14277
Closes #14345
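
The bounded write loop described above can be sketched as follows. This is a Python illustration of the technique, not the C test helper itself: `bounded_append` and its parameters are hypothetical names, and only the 250 * 1024 * 1024 byte cap comes from the commit message. The sketch also closes the descriptor on exit, which is simply good practice here.

```python
import os
import tempfile

LIMIT_BYTES = 250 * 1024 * 1024  # cap quoted in the commit message


def bounded_append(path: str, chunk: bytes, limit: int = LIMIT_BYTES) -> int:
    """Append chunk repeatedly until the end-of-file offset exceeds limit.

    Returns the final end-of-file offset seen by lseek.
    """
    if not chunk:
        raise ValueError("chunk must be non-empty, or the loop never ends")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        while True:
            # Seek to the end; the returned offset is the current file size.
            end = os.lseek(fd, 0, os.SEEK_END)
            if end > limit:
                return end
            os.write(fd, chunk)
    finally:
        os.close(fd)  # never leave the descriptor open on exit


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "mmapwrite-demo")
        # Small limit so the demonstration finishes immediately.
        print(bounded_append(target, b"x" * 4096, limit=10_000))
```

Because the size is checked before each write, the file can exceed the limit by at most one chunk, which keeps the total bounded regardless of how fast the testbed is.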
-
Clemens Lang authored
systemd-ask-password has a default timeout of 90 seconds, which means that dracut will fall back to the rescue shell 4.5 minutes after boot if no password is entered. This is undesirable when combined with, for example, unlocking remotely using dracut-sshd and systemd-tty-ask-password-agent.

See also https://github.com/gsauthof/dracut-sshd#timeout and https://bugzilla.redhat.com/show_bug.cgi?id=868421 .

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Lang <neverpanic@gmail.com>
Closes #14341
-
Richard Yao authored
Authored by: Dan McDonald <danmcd@mnx.io>
Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Ported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>

Illumos-issue: https://www.illumos.org/issues/15286
Illumos-commit: https://github.com/illumos/illumos-gate/commit/f137b22e734e85642da3e56e8b94da3f5f027c73

Porting Notes:

The patch in illumos did not have much of a commit message, and did not provide attribution to the reporter, while the original patch proposed to OpenZFS did, so I am listing the reporter (myself) and original patch author (also myself) below while including the original commit message with some minor corrections as part of the porting notes:

In do_composition(), we have:

    size = u8_number_of_bytes[*p];
    if (size <= 1 || (p + size) > oslast)
        break;

There, we have type promotion from int8_t to size_t, which is unsigned. C will sign extend the value as part of the widening before treating the value as unsigned, and the negative values we can encounter are error values from U8_ILLEGAL_CHAR and U8_OUT_OF_RANGE_CHAR, which are -1 and -2 respectively. The unsigned versions of these under two's complement are SIZE_MAX and SIZE_MAX-1 respectively. The bounds check is written under the assumption that `size <= 1` does a signed comparison. This is followed by a pointer comparison to see if the string has the correct length, which is fine. A little further down we have:

    for (i = 0; i < size; i++)
        tc[i] = *p++;

When an error condition is encountered, this will attempt to iterate at least SIZE_MAX-1 times, which will massively overflow the buffer, which is not fine.

The kernel will kill the loop as soon as it hits the kernel stack guard on Linux systems built with CONFIG_VMAP_STACK=y, which should be just about all of them. That prevents arbitrary code execution and just about any other bad thing that a black hat attacker might attempt with knowledge of this buffer overflow. Other systems' kernels have mitigations for unbounded in-kernel buffer overflows that will catch this too.

Also, the patch in illumos-gate made an effort to fix C style issues that had been fixed in the OpenZFS/ZFSOnLinux repository. Those issues had been mentioned in the email that I originally sent them about this issue. One of the fixes had not already been done, so it is included. Another, to collect_a_seq()'s arguments, was handled differently in OpenZFS. For the sake of avoiding unnecessary differences, it has been adopted. This has the interesting effect that if you correct the paths in the illumos-gate patch to match the current OpenZFS repository, you can reverse apply it cleanly.

Original-patch-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Co-authored-by: Dan McDonald <danmcd@mnx.io>
Closes #14318
Closes #14342
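
The sign-extension behaviour described in the porting notes can be reproduced outside the kernel. The sketch below uses Python's `ctypes` to mimic the C conversion from `int8_t` to `size_t`; it only illustrates the arithmetic and is not the patched code. `as_size_t` is a hypothetical helper, and the two error constants take the values quoted in the message.

```python
import ctypes

# Largest value a size_t can hold on this platform.
SIZE_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_size_t)) - 1

# Error values quoted in the commit message.
U8_ILLEGAL_CHAR = -1
U8_OUT_OF_RANGE_CHAR = -2


def as_size_t(value: int) -> int:
    """Mimic C converting an int8_t to size_t (sign-extend, then wrap)."""
    signed = ctypes.c_int8(value).value       # value as stored in int8_t
    return ctypes.c_size_t(signed).value      # reinterpreted as unsigned


if __name__ == "__main__":
    for name, err in (("U8_ILLEGAL_CHAR", U8_ILLEGAL_CHAR),
                      ("U8_OUT_OF_RANGE_CHAR", U8_OUT_OF_RANGE_CHAR)):
        size = as_size_t(err)
        # A signed check (err <= 1) would catch the error value; the
        # unsigned check (size <= 1) does not, so the loop bound is huge.
        print(name, "signed check:", err <= 1, "unsigned check:", size <= 1)
```

With the value held in an unsigned variable, `size <= 1` is false for both error codes, which is why the later copy loop runs with a bound near `SIZE_MAX`.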
-
Matthew Ahrens authored
The 22.0 release of the Python `packaging` package removed the `LegacyVersion` class, causing ZFS to no longer compile. This commit replaces the sections of `ax_python_dev.m4` that rely on `LegacyVersion` with updated implementations from the upstream `autoconf-archive`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14297
-
Mateusz Guzik authored
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14328
-
Martin Rüegg authored
Shebang was missing the `!` between `#` and the actual path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #14339
-
Martin Rüegg authored
Fix #14338: building deb-utils failed if the existing `$PATH` variable contained whitespace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #14339
-
Alexander Motin authored
On FreeBSD this reduces this structure size from 64 to 56 bytes. dnode_handle_t correspondingly shrinks from 72 to 64 bytes. It sounds like a waste to need 72 bytes to be able to relocate 808 bytes of dnode_t, especially as relocation is not even supported on FreeBSD.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14317
-
Alexander Motin authored
Recent ARC commits added new statistic counters, such as iohits, uncached state, etc. Represent those. Also, some of the previously reported numbers were confusing or even made no sense. Clean up and restructure the existing reports.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Issue #14115
Issue #14123
Issue #14243
Closes #14320
-
Alexander Motin authored
This saves 40 bytes per full ARC header, reducing it on FreeBSD from 240 to 200 bytes on production bits.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14315
-