1. 12 Apr, 2019 1 commit
  2. 11 Apr, 2019 1 commit
    • Allow zfs-tests to recover from hibernation · b31cf30a
      Alek P authored
      
      When a system sleeps during a zfs-test run, the time spent
      hibernating is counted against the test's runtime even though
      the test cannot make progress while the system is asleep.
      This patch detects timeouts caused by hibernation and reruns
      tests that timed out because the system was sleeping.
      The existing behavior of returning non-zero when a test is
      killed is preserved: with this patch applied we still return
      non-zero, and we also automatically rerun any test we suspect
      was killed due to system hibernation.
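
      A minimal sketch of the detection idea (illustrative only; the real
      change lives in the test harness, and the clock comparison below is
      just one way to tell that a machine slept, with an arbitrary example
      threshold):

      ```c
      /*
       * Illustrative sketch: on Linux, CLOCK_MONOTONIC does not advance
       * while the system is suspended, but CLOCK_BOOTTIME does, so a
       * large divergence between the two across a test's lifetime
       * suggests the machine slept.
       */
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <time.h>

      static double elapsed(clockid_t id, const struct timespec *start)
      {
          struct timespec now;
          clock_gettime(id, &now);
          return (now.tv_sec - start->tv_sec) +
              (now.tv_nsec - start->tv_nsec) / 1e9;
      }

      int main(void)
      {
          struct timespec mono_start, boot_start;
          clock_gettime(CLOCK_MONOTONIC, &mono_start);
          clock_gettime(CLOCK_BOOTTIME, &boot_start);

          /* ... the test would run here ... */

          double mono = elapsed(CLOCK_MONOTONIC, &mono_start);
          double boot = elapsed(CLOCK_BOOTTIME, &boot_start);

          /* If boot time advanced much more than monotonic time, the
           * machine was asleep; a timeout in that window is suspect,
           * so the test should be rerun rather than failed. */
          if (boot - mono > 1.0)
              printf("suspected hibernation: rerun the test\n");
          return 0;
      }
      ```
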
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Reviewed by: John Kennedy <john.kennedy@delphix.com>
      Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
      Closes #8575 
      b31cf30a
  3. 10 Apr, 2019 3 commits
  4. 08 Apr, 2019 3 commits
  5. 06 Apr, 2019 1 commit
  6. 05 Apr, 2019 3 commits
  7. 04 Apr, 2019 2 commits
  8. 03 Apr, 2019 1 commit
  9. 02 Apr, 2019 1 commit
  10. 01 Apr, 2019 2 commits
  11. 29 Mar, 2019 4 commits
    • Fix systemd-import services · e03b25a5
      Michael Niewöhner authored
      
      On Debian, systemd complains about the missing /bin/awk because
      the binary actually lives at /usr/bin/awk. It is not a good idea
      to hardcode binary paths, because different Linux distributions
      place them in different locations. According to systemd's man
      page, it is safe to omit the path for binaries located in
      standard locations (/bin, /sbin, /usr/bin, ...).
      
      Further, replace the somewhat convoluted awk command with grep.
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
      Issue #8510
      e03b25a5
    • Remove hard dependency on bash · 3b261892
      Michael Niewöhner authored
      
      The zfs-import-* services have a hard dependency on bash, but not
      everyone has bash installed. At this point /bin/sh is sufficient,
      so use that instead.
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
      Issue #8510
      3b261892
    • Update raw send documentation · dd29864b
      Tom Caputi authored
      
      This patch simply clarifies some of the limitations related to
      raw sends in the man page. No functional changes.
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Reviewed-by: Jason Cohen <jwittlincohen@gmail.com>
      Signed-off-by: Tom Caputi <tcaputi@datto.com>
      Closes #8503
      Closes #8544 
      dd29864b
    • Add TRIM support · 1b939560
      Brian Behlendorf authored
      
      UNMAP/TRIM support is a frequently-requested feature to help
      prevent performance from degrading on SSDs and on various other
      SAN-like storage back-ends.  By issuing UNMAP/TRIM commands for
      sectors which are no longer allocated the underlying device can
      often more efficiently manage itself.
      
      This TRIM implementation is modeled on the `zpool initialize`
      feature which writes a pattern to all unallocated space in the
      pool.  The new `zpool trim` command uses the same vdev_xlate()
      code to calculate what sectors are unallocated, the same per-
      vdev TRIM thread model and locking, and the same basic CLI for
      a consistent user experience.  The core difference is that
      instead of writing a pattern it will issue UNMAP/TRIM commands
      for those extents.
      
      The zio pipeline was updated to accommodate this by adding a new
      ZIO_TYPE_TRIM type and an associated spa taskq.  This new type
      makes it straightforward to add the platform-specific TRIM/UNMAP
      calls to vdev_disk.c and vdev_file.c.  These new ZIO_TYPE_TRIM
      zios are handled largely the same way as ZIO_TYPE_READ and
      ZIO_TYPE_WRITE zios.  This makes it possible to largely avoid
      changing the pipeline; the one exception is that TRIM zios may
      exceed the 16M block size limit since they contain no data.
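
      For context, a hedged sketch of the same primitive exercised from
      user space on Linux (not the in-kernel vdev_disk.c code path): a
      discard (TRIM/UNMAP) for a byte range of a block device can be
      issued with the BLKDISCARD ioctl.  The device path and range below
      are placeholders.

      ```c
      /* Issue a discard (TRIM/UNMAP) for a byte range of a block device
       * using the Linux BLKDISCARD ioctl.  Placeholder device and range. */
      #include <fcntl.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <sys/ioctl.h>
      #include <linux/fs.h>
      #include <unistd.h>

      int main(void)
      {
          int fd = open("/dev/sdX", O_WRONLY);    /* placeholder device */
          if (fd < 0) {
              perror("open");
              return 1;
          }

          /* { start offset, length } in bytes; both sector aligned. */
          uint64_t range[2] = { 1ULL << 20, 8ULL << 20 };

          if (ioctl(fd, BLKDISCARD, &range) != 0)
              perror("BLKDISCARD");   /* not all devices support discard */

          close(fd);
          return 0;
      }
      ```

      In-kernel, the equivalent request is carried by the new
      ZIO_TYPE_TRIM zios described above.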
      
      In addition to the manual `zpool trim` command, a background
      automatic TRIM was added and is controlled by the 'autotrim'
      property.  It relies on the exact same infrastructure as the
      manual TRIM.  However, instead of relying on the extents in a
      metaslab's ms_allocatable range tree, a ms_trim tree is kept
      per metaslab.  When 'autotrim=on', ranges added back to the
      ms_allocatable tree are also added to the ms_trim tree.  The
      ms_trim tree is then periodically consumed by an autotrim
      thread which systematically walks a top level vdev's metaslabs.
      
      Since the automatic TRIM will skip ranges it considers too small
      there is value in occasionally running a full `zpool trim`.  This
      may occur when the freed blocks are small and not enough time
      was allowed to aggregate them.  An automatic TRIM and a manual
      `zpool trim` may be run concurrently, in which case the automatic
      TRIM will yield to the manual TRIM.
      Reviewed-by: Jorgen Lundman <lundman@lundman.net>
      Reviewed-by: Tim Chase <tim@chase2k.com>
      Reviewed-by: Matt Ahrens <mahrens@delphix.com>
      Reviewed-by: George Wilson <george.wilson@delphix.com>
      Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
      Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com>
      Contributions-by: Tim Chase <tim@chase2k.com>
      Contributions-by: Chunwei Chen <tuxoko@gmail.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #8419 
      Closes #598 
      1b939560
  12. 28 Mar, 2019 1 commit
  13. 27 Mar, 2019 1 commit
    • Fix issues with truncated files in raw sends · 5dbf8b4e
      Tom Caputi authored
      
      This patch fixes a few issues with raw receives involving
      truncated files:
      
      * dnode_reallocate() now calls dnode_set_blksz() instead of
        dnode_setdblksz(). This ensures that any remaining dbufs with
        blkid 0 are resized along with their containing dnode upon
        reallocation.
      
      * One of the calls to dmu_free_long_range() in receive_object()
        needs to check that the object whose contents it is about to
        free hasn't already been completely removed by a previous call
        to dmu_free_long_object() in the same function.
      
      * The same call to dmu_free_long_range() in the previous point
        needs to ensure it uses the object's current block size and
        not the new block size. This ensures the blocks of the object
        that are supposed to be freed are completely removed and not
        simply partially zeroed out.
      
      This patch also adds handling for DRR_OBJECT_RANGE records to
      dprintf_drr() for debugging purposes.
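
      A standalone arithmetic sketch of the block-size point above (not
      the dmu_free_long_range() implementation, and all sizes are
      illustrative): the same free request looks like 24 freeable 4K
      blocks under a new, smaller block size, but measured against the
      object's current 128K block size it does not cover even one whole
      block, which is exactly the confusion the fix avoids.

      ```c
      /* Given a free request [off, off+len) and a block size, compute how
       * many whole blocks it covers.  Anything not wholly covered can only
       * be zeroed, not freed, so the object's *current* block size matters. */
      #include <stdint.h>
      #include <stdio.h>

      static uint64_t whole_blocks(uint64_t off, uint64_t len, uint64_t blksz)
      {
          uint64_t first = (off + blksz - 1) / blksz;  /* first full block */
          uint64_t last = (off + len) / blksz;         /* one past last full block */
          return (last > first) ? last - first : 0;
      }

      int main(void)
      {
          uint64_t off = 32 * 1024, len = 96 * 1024;

          /* Against the current 128K block size, no whole block is covered. */
          printf("128K blocks fully covered: %llu\n",
              (unsigned long long)whole_blocks(off, len, 128 * 1024));

          /* Against a new 4K block size, the request appears to cover 24
           * whole blocks, which misrepresents the on-disk layout. */
          printf("4K blocks fully covered: %llu\n",
              (unsigned long long)whole_blocks(off, len, 4 * 1024));
          return 0;
      }
      ```
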
      Reviewed-by: Matt Ahrens <mahrens@delphix.com>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Tom Caputi <tcaputi@datto.com>
      Closes #7378 
      Closes #8528 
      5dbf8b4e
  14. 26 Mar, 2019 3 commits
  15. 25 Mar, 2019 1 commit
  16. 22 Mar, 2019 3 commits
  17. 21 Mar, 2019 5 commits
    • MMP interval and fail_intervals in uberblock · 060f0226
      Olaf Faaland authored
      
      When multihost is enabled and a pool is imported, uberblock writes
      include ub_mmp_delay to allow an importing node to calculate the
      duration of an activity test.  This value alone, however, is not
      enough information.
      
      If zfs_multihost_fail_intervals > 0 on the node with the pool imported,
      the safe minimum duration of the activity test is well defined, but it
      does not depend on ub_mmp_delay:
      
      zfs_multihost_fail_intervals * zfs_multihost_interval
      
      If zfs_multihost_fail_intervals == 0 on that node, there is no such
      well-defined safe duration, and the importing host cannot tell whether
      mmp_delay is high due to I/O delays or due to a very large
      zfs_multihost_interval setting on the host which last imported the pool.
      As a result, it may use a far longer period for the activity test than
      is necessary.
      
      This patch renames ub_mmp_sequence to ub_mmp_config and uses it to
      record the zfs_multihost_interval and zfs_multihost_fail_intervals
      values, as well as the mmp sequence.  This allows a shorter activity
      test duration to be calculated by the importing host in most situations.
      These values are also added to the multihost_history kstat records.
      
      The activity test duration is calculated differently depending on
      whether the new fields are present.  For importing pools whose
      uberblocks contain only ub_mmp_delay, it uses
      
      (zfs_multihost_interval + ub_mmp_delay) * zfs_multihost_import_intervals
      
      which results in an activity test duration that is less sensitive to
      the leaf count.
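
      A worked example of the two formulas above, using made-up tunable
      values (all times in milliseconds):

      ```c
      /* Worked example of the two activity-test duration formulas,
       * with illustrative tunable values in milliseconds. */
      #include <stdio.h>

      int main(void)
      {
          unsigned long zfs_multihost_interval = 1000;        /* ms */
          unsigned long zfs_multihost_fail_intervals = 10;
          unsigned long zfs_multihost_import_intervals = 20;
          unsigned long ub_mmp_delay = 300;                    /* ms, observed */

          /* With the recorded ub_mmp_config values, the safe minimum
           * duration is well defined: */
          unsigned long new_style =
              zfs_multihost_fail_intervals * zfs_multihost_interval;

          /* With only ub_mmp_delay (older uberblocks), the importer falls
           * back to the more conservative estimate: */
          unsigned long old_style =
              (zfs_multihost_interval + ub_mmp_delay) *
              zfs_multihost_import_intervals;

          printf("new-style activity test: %lu ms\n", new_style);  /* 10000 */
          printf("old-style activity test: %lu ms\n", old_style);  /* 26000 */
          return 0;
      }
      ```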
      
      In addition, it makes a few other improvements:
      * It updates the "sequence" part of ub_mmp_config when MMP writes
        in between syncs occur.  This allows an importing host to detect MMP
        on the remote host sooner, when the pool is idle, as it is not limited
        to the granularity of ub_timestamp (1 second).
      * It issues writes immediately when zfs_multihost_interval is changed
        so remote hosts see the updated value as soon as possible.
      * It fixes a bug where setting zfs_multihost_fail_intervals = 1 results
        in immediate pool suspension.
      * Update tests to verify activity check duration is based on recorded
        tunable values, not tunable values on importing host.
      * Update tests to verify the expected number of uberblocks have valid
        MMP fields - fail_intervals, mmp_interval, mmp_seq (sequence number),
        that sequence number is incrementing, and that uberblock values match
        tunable settings.
      Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Reviewed-by: Tony Hutter <hutter2@llnl.gov>
      Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
      Closes #7842 
      060f0226
    • Mutex leak in dsl_dataset_hold_obj() · d10b2f1d
      Jorgen Lundman authored
      
      In addition to dsl_dataset_evict_async() releasing a hold, there is
      an error case in dsl_dataset_hold_obj() which was missing 4 additional
      release calls.  This was introduced in a1d477c2.
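
      The underlying idiom is ordinary error-path cleanup: every early
      exit must release the holds already taken.  A generic, hedged
      sketch of that pattern (made-up types and helpers, not the actual
      dsl_dataset_hold_obj() code):

      ```c
      /* Generic goto-based cleanup so that every error path releases the
       * holds taken so far.  Types and helpers are made up. */
      #include <stdlib.h>

      struct hold { int dummy; };

      static struct hold *take_hold(void)      { return calloc(1, sizeof(struct hold)); }
      static void release_hold(struct hold *h) { free(h); }
      static int do_work(void)                 { return -1; /* pretend failure */ }

      int hold_and_work(void)
      {
          int err = 0;
          struct hold *a = NULL, *b = NULL;

          if ((a = take_hold()) == NULL) { err = -1; goto out; }
          if ((b = take_hold()) == NULL) { err = -1; goto out; }

          err = do_work();    /* on failure we still fall through to cleanup */

      out:
          /* The bug class fixed here: forgetting these releases on one of
           * the error paths leaks the hold (and whatever it pins). */
          if (b != NULL)
              release_hold(b);
          if (a != NULL)
              release_hold(a);
          return err;
      }

      int main(void) { return hold_and_work() ? 1 : 0; }
      ```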
      
      openzfsonosx-commit: https://github.com/openzfsonosx/zfs/commit/63ff7f1c
      
      Authored by: Jorgen Lundman <lundman@lundman.net>
      Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
      Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
      Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #8517
      d10b2f1d
    • QAT: Allocate digest_buffer using QAT_PHYS_CONTIG_ALLOC() · 45001b94
      cfzhu authored
      
      If the buffer 'digest_buffer' is allocated on the qat_checksum()
      stack, there is no guarantee that its address is physically
      contiguous, and the DMA result for the buffer may be handled
      incorrectly.  Using QAT_PHYS_CONTIG_ALLOC() ensures a physically
      contiguous allocation.
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Reviewed-by: Tom Caputi <tcaputi@datto.com>
      Signed-off-by: Chengfei, Zhu <chengfeix.zhu@intel.com>
      Closes #8323 
      Closes #8521 
      45001b94
    • Report holes when there are only metadata changes · ec4f9b8f
      Brian Behlendorf authored
      
      Update the dirty check in dmu_offset_next() such that dnodes
      are only considered dirty for the purpose of reporting holes
      when there are pending data blocks or frees to be synced.  This
      ensures that holes are still reported when only metadata updates
      (e.g. atime) need to be synced.
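
      The user-visible interface this feeds is lseek(2)'s
      SEEK_HOLE/SEEK_DATA; a small standalone example that probes a file
      for its first hole, which is what a dnode dirtied only by an atime
      update previously prevented from being reported:

      ```c
      /* Probe a file for its first hole using lseek(SEEK_HOLE).  Before
       * this fix, a sparse file whose dnode was dirty only for an atime
       * update could be reported as one solid data region. */
      #define _GNU_SOURCE
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
          if (argc != 2) {
              fprintf(stderr, "usage: %s <file>\n", argv[0]);
              return 1;
          }

          int fd = open(argv[1], O_RDONLY);
          if (fd < 0) {
              perror("open");
              return 1;
          }

          off_t hole = lseek(fd, 0, SEEK_HOLE);
          if (hole < 0)
              perror("SEEK_HOLE");
          else
              printf("first hole at byte offset %lld\n", (long long)hole);

          close(fd);
          return 0;
      }
      ```
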
      Reviewed-by: Debabrata Banerjee <dbanerje@akamai.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #6958 
      Closes #8505 
      ec4f9b8f
    • Improve `zpool labelclear` · 066da71e
      Brian Behlendorf authored
      
      1) As implemented, the `zpool labelclear` command overwrites
      the calculated offsets of all four vdev labels even when only a
      single valid label is found.  If the device has been re-purposed
      but still contains a valid label, this can result in space no
      longer owned by ZFS being zeroed.  Prevent this by verifying that
      every label to be removed is intact before it's overwritten.
      
      2) Address a small bug in zpool_do_labelclear() which prevented
      labelclear from working on file vdevs.  Only block devices support
      BLKFLSBUF, so try the ioctl() but do not treat it as fatal when it
      is reported as unsupported (see the sketch after this list).
      
      3) Fix `zpool labelclear` so it can be run on vdevs which were
      removed from the pool with `zpool remove`.  Additionally, allow
      intact but partial labels to be cleared as in the case of a failed
      `zpool attach` or `zpool replace`.
      
      4) Remove LABELCLEAR and LABELREAD variables for test cases.
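
      A hedged sketch of the non-fatal ioctl handling described in point
      2: attempt BLKFLSBUF, and treat only errors other than "not
      supported" as real failures (placeholder path; not the actual
      zpool_do_labelclear() code).

      ```c
      /* Flush the block-device buffer cache with BLKFLSBUF, but do not
       * treat the ioctl as fatal on file vdevs (or anything else that
       * does not support it). */
      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/ioctl.h>
      #include <linux/fs.h>
      #include <unistd.h>

      static int flush_if_supported(const char *path)
      {
          int fd = open(path, O_RDWR);
          if (fd < 0)
              return -1;

          if (ioctl(fd, BLKFLSBUF) != 0 &&
              errno != ENOTTY && errno != EOPNOTSUPP && errno != EINVAL) {
              perror("BLKFLSBUF");    /* a real failure on a block device */
              close(fd);
              return -1;
          }

          close(fd);
          return 0;                   /* "unsupported" is simply ignored */
      }

      int main(void)
      {
          /* "/path/to/vdev" is a placeholder for a disk or file vdev. */
          return flush_if_supported("/path/to/vdev") ? 1 : 0;
      }
      ```
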
      Reviewed-by: Matt Ahrens <mahrens@delphix.com>
      Reviewed-by: Tim Chase <tim@chase2k.com>
      Reviewed-by: Tony Hutter <hutter2@llnl.gov>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #8500 
      Closes #8373 
      Closes #6261 
      066da71e
  18. 20 Mar, 2019 1 commit
    • Add missing dmu_zfetch_fini() in dnode_move_impl() · 304d469d
      Julian Heuking authored
      
      As it turns out, on the Windows platform, when rw_init() is called
      (or rather its underlying call, ExInitializeResourceLite), the lock
      is placed on an active list of locks and is removed from that list
      at rw_destroy() time.
      
      dnode_move() has logic to copy the old dnode over to the new dnode,
      including calling dmu_zfetch_init(new-dnode).  But due to the missing
      dmu_zfetch_fini(old-dnode), kmem will call dnode_dest() to release
      the memory (and, in debug builds, write the fill pattern 0xdeadbeef)
      over the Windows active lock's prev/next list pointers, making
      Windows sad.
      
      On other platforms, dmu_zfetch_fini() amounts to one call to
      list_destroy() and one to rw_destroy(), which are effectively no-ops
      there, so the missing call is not strictly required.  This commit is
      therefore mostly for correctness and could be skipped on those
      platforms.
      
      Porting Notes:
      * This leak exists on Linux but currently can never happen because
        the dnode_move() functionality is not supported.
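
      As a user-space analogy of the missing pairing (hypothetical types,
      not the ZFS code): every lock init needs a matching destroy before
      the memory containing the lock is freed or recycled.

      ```c
      /* Every pthread_rwlock_init() needs a matching
       * pthread_rwlock_destroy() before the containing memory is freed or
       * recycled -- the same contract the missing dmu_zfetch_fini() broke
       * for the Windows rw_destroy(). */
      #include <pthread.h>
      #include <stdlib.h>

      struct zfetch_like {
          pthread_rwlock_t lock;
          /* ... other state ... */
      };

      static struct zfetch_like *obj_create(void)
      {
          struct zfetch_like *z = malloc(sizeof (*z));
          if (z == NULL)
              abort();
          pthread_rwlock_init(&z->lock, NULL);
          return z;
      }

      static void obj_destroy(struct zfetch_like *z)
      {
          /* Omitting this destroy before free() is the analog of the bug. */
          pthread_rwlock_destroy(&z->lock);
          free(z);
      }

      int main(void)
      {
          struct zfetch_like *z = obj_create();
          obj_destroy(z);
          return 0;
      }
      ```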
      
      openzfsonosx-commit: openzfsonosx/zfs@d95fe517
      
      Authored by: Julian Heuking <JulianH@beckhoff.com>
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Reviewed-by: Matt Ahrens <mahrens@delphix.com>
      Signed-off-by: Jorgen Lundman <lundman@lundman.net>
      Closes #8519
      304d469d
  19. 19 Mar, 2019 1 commit
  20. 15 Mar, 2019 2 commits
    • Fix l2arc_evict() destroy race · ca6c7a94
      Brian Behlendorf authored
      
      When destroying an arc_buf_hdr_t, its identity cannot be discarded
      until it is entirely undiscoverable.  This not only includes being
      unhashed, but also being removed from the l2arc header list.
      Discarding the header's identity prematurely renders the hash
      lock useless because it will always hash to bucket zero.
      
      This change resolves a race with l2arc_evict() by discarding the
      identity only after the header has been removed from the l2arc
      header list.  This ensures that the header is either not on the
      list or contains the correct identity.
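
      The general rule being applied, sketched generically (made-up types,
      not the ARC code): unlink the object from every structure that can
      find it before clearing the identity fields those lookups key on.

      ```c
      /* Teardown ordering: make the header undiscoverable everywhere
       * before discarding its identity.  Lookup helpers are stubs. */
      #include <pthread.h>
      #include <string.h>

      struct hdr {
          unsigned long  key;        /* the "identity" */
          struct hdr    *hash_next;  /* hash-table chain */
          struct hdr    *list_next;  /* l2arc-style list */
      };

      static void hash_remove(struct hdr *h) { (void)h; /* unlink by h->key */ }
      static void list_remove(struct hdr *h) { (void)h; /* unlink from list */ }

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

      static void hdr_destroy(struct hdr *h)
      {
          pthread_mutex_lock(&lock);

          /* 1. Remove the header from every lookup structure first. */
          hash_remove(h);
          list_remove(h);

          /* 2. Only now discard its identity; clearing h->key earlier
           *    would make lookups land in the wrong (zero) bucket, which
           *    is the race l2arc_evict() was hitting. */
          memset(h, 0, sizeof (*h));

          pthread_mutex_unlock(&lock);
      }

      int main(void)
      {
          struct hdr h = { .key = 42 };
          hdr_destroy(&h);
          return 0;
      }
      ```
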
      Reviewed-by: Tom Caputi <tcaputi@datto.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #7688 
      Closes #8144 
      ca6c7a94
    • Multiple DVA Scrubbing Fix · ab7615d9
      Tom Caputi authored
      
      Currently, there is an issue in the sequential scrub code which
      prevents self healing from working in some cases. The scrub code
      will split up all DVA copies of a bp and issue each of them
      separately. The problem is that, since each of the DVAs is no
      longer associated with the others, the self healing code doesn't
      have the opportunity to repair problems that show up in one of the
      DVAs with the data from the others.
      
      This patch fixes this issue by ensuring that all IOs issued by the
      sequential scrub code include all DVAs. Initially, only the first
      DVA of each block is attempted. If an issue arises, the IO is retried
      with all available copies, giving the self-healing code a chance
      to correct the issue.
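
      A hedged sketch of the retry policy described above (stubbed
      checksums, not the scan code): verify the first copy alone, and only
      on failure read every copy so a good one can repair the bad ones.

      ```c
      /* Scrub retry policy: check the first copy of a block, and only if
       * it fails examine all copies so a good one can repair the bad
       * one(s).  Checksums and repair are stubbed out. */
      #include <stdbool.h>
      #include <stdio.h>

      #define MAX_COPIES 3

      /* Stub: pretend copy 0 is corrupt and the others are fine. */
      static bool copy_ok(int block, int copy) { (void)block; return copy != 0; }

      static void repair_copy(int block, int copy)
      {
          printf("block %d: repairing copy %d from a good copy\n", block, copy);
      }

      static void scrub_block(int block, int ncopies)
      {
          if (copy_ok(block, 0))
              return;                 /* fast path: first DVA is good */

          /* Retry with every copy so self healing has data to work with. */
          bool have_good = false;
          for (int c = 0; c < ncopies; c++)
              if (copy_ok(block, c))
                  have_good = true;

          if (have_good) {
              for (int c = 0; c < ncopies; c++)
                  if (!copy_ok(block, c))
                      repair_copy(block, c);
          } else {
              printf("block %d: unrecoverable, all copies bad\n", block);
          }
      }

      int main(void)
      {
          scrub_block(7, MAX_COPIES);
          return 0;
      }
      ```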
      
      To test this change, this patch also adds the ability for zinject
      to specify individual DVAs to inject read errors into. We then
      add a new test case that utilizes this functionality to ensure
      scrubs and self-healing reads can handle and transparently fix
      issues with individual copies of blocks.
      Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Reviewed-by: Matt Ahrens <mahrens@delphix.com>
      Signed-off-by: Tom Caputi <tcaputi@datto.com>
      Closes #8453 
      ab7615d9