1. 20 Jul, 2016 2 commits
    • Illumos Crypto Port module added to enable native encryption in zfs · 0b04990a
      Tom Caputi authored
      
      A port of the Illumos Crypto Framework to a Linux kernel module (found
      in module/icp). This is needed to do the actual encryption work. We cannot
      use the Linux kernel's built-in crypto API because it is only exported to
      GPL-licensed modules. Having the ICP also means the crypto code can run on
      any of the other kernels under OpenZFS. I ended up porting over most of the
      internals of the framework, which means that porting over other API calls (if
      we need them) should be fairly easy. Specifically, I have ported over the API
      functions related to encryption, digests, macs, and crypto templates. The ICP
      is able to use assembly-accelerated encryption on amd64 machines and AES-NI
      instructions on Intel chips that support it. There are place-holder
      directories for similar assembly optimizations for other architectures
      (although they have not been written).
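      
      As a rough sketch (illustrative only, not an actual zfs call site), a
      consumer of the ported KCF-style API might look roughly like this, with
      structure setup and error handling abbreviated:
      
        #include <sys/crypto/api.h>
        
        crypto_mechanism_t mech = { 0 };
        crypto_key_t key = { 0 };
        crypto_data_t plain = { 0 }, cipher = { 0 };
        int ret;
        
        /* resolve a mechanism name to an id registered with the ICP */
        mech.cm_type = crypto_mech2id(SUN_CKM_AES_CCM);
        
        /* ... fill in key material, mechanism params, and data buffers ... */
        
        /* synchronous encrypt; template and async request args left NULL */
        ret = crypto_encrypt(&mech, &plain, &key, NULL, &cipher, NULL);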
      Signed-off-by: Tom Caputi <tcaputi@datto.com>
      Signed-off-by: Tony Hutter <hutter2@llnl.gov>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Issue #4329
    • Fix NULL pointer in zfs_preumount from 1d9b3bd8 · be88e733
      Chunwei Chen authored
      
      When zfs_domount fails, zsb will be freed, and its caller
      mount_nodev/get_sb_nodev will do deactivate_locked_super, which calls
      into zfs_preumount.
      
      To make sure we don't touch anything that has already been freed, we must
      set s_fs_info to NULL in the failure path so zfs_preumount can easily
      check for it.
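      
      A minimal sketch of the resulting guard (illustrative, not the literal
      patch):
      
        void
        zfs_preumount(struct super_block *sb)
        {
                zfs_sb_t *zsb = sb->s_fs_info;
        
                /* zfs_domount() failed and already freed zsb */
                if (zsb == NULL)
                        return;
        
                /* ... normal preumount teardown ... */
        }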
      Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #4867
      Issue #4854
  2. 19 Jul, 2016 5 commits
  3. 15 Jul, 2016 2 commits
  4. 14 Jul, 2016 4 commits
  5. 13 Jul, 2016 2 commits
    • Add RAID-Z routines for SSE2 instruction set, in x86_64 mode. · ae25d222
      Gvozden Neskovic authored
      
      The patch covers low-end and older x86 CPUs.  Parity generation is
      equivalent to the SSSE3 implementation, but reconstruction is somewhat
      slower.  The previous 'sse' implementation is renamed to 'ssse3' to
      indicate the highest instruction set used.
      
      Benchmark results:
      scalar_rec_p                    4    720476442
      scalar_rec_q                    4    187462804
      scalar_rec_r                    4    138996096
      scalar_rec_pq                   4    140834951
      scalar_rec_pr                   4    129332035
      scalar_rec_qr                   4    81619194
      scalar_rec_pqr                  4    53376668
      
      sse2_rec_p                      4    2427757064
      sse2_rec_q                      4    747120861
      sse2_rec_r                      4    499871637
      sse2_rec_pq                     4    522403710
      sse2_rec_pr                     4    464632780
      sse2_rec_qr                     4    319124434
      sse2_rec_pqr                    4    205794190
      
      ssse3_rec_p                     4    2519939444
      ssse3_rec_q                     4    1003019289
      ssse3_rec_r                     4    616428767
      ssse3_rec_pq                    4    706326396
      ssse3_rec_pr                    4    570493618
      ssse3_rec_qr                    4    400185250
      ssse3_rec_pqr                   4    377541245
      
      original_rec_p                  4    691658568
      original_rec_q                  4    195510948
      original_rec_r                  4    26075538
      original_rec_pq                 4    103087368
      original_rec_pr                 4    15767058
      original_rec_qr                 4    15513175
      original_rec_pqr                4    10746357
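      
      If the zfs_vdev_raidz_impl module parameter from the earlier RAID-Z SIMD
      work is available (an assumption, not stated in this commit), a specific
      implementation can be pinned for benchmarking, e.g.:
      
       # echo sse2 > /sys/module/zfs/parameters/zfs_vdev_raidz_impl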
      Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #4783
    • Fix handling of errors nvlist in zfs_ioc_recv_new() · 1bf3bf0e
      Gvozden Neskovic authored
      
      zfs_ioc_recv_impl() is changed to always allocate the 'errors' nvlist;
      its callers are responsible for freeing it.
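      
      A small sketch of the ownership contract (the other receive arguments are
      elided and marked in comments; only the 'errors' handling is the point):
      
        nvlist_t *errors = NULL;
        int error;
        
        /* other arguments elided; 'errors' is now allocated unconditionally */
        error = zfs_ioc_recv_impl( /* ... */ &errors);
        
        /* ... report any per-property errors ... */
        
        nvlist_free(errors);    /* the caller owns and must free 'errors' */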
      Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #4829
  6. 12 Jul, 2016 9 commits
    • Fix PANIC: metaslab_free_dva(): bad DVA X:Y:Z · 81edd3e8
      Peng authored
      
      The following scenario can result in garbage in the dn_spill field.
      The db->db_blkptr must be set to NULL when DNODE_FLAG_SPILL_BLKPTR
      is clear to ensure the dn_spill field is cleared.
      
      Current txg = A.
      * A new spill buffer is created. Its dbuf is initialized with
        db_blkptr = NULL and it's dirtied.
      
      Current txg = B.
      * The spill buffer is modified. It's marked as dirty in this txg.
      * Additional changes make the spill buffer unnecessary because the
        xattr fits into the bonus buffer, so it's removed. The dbuf is
        undirtied in this txg, but it's still referenced and cannot be
        destroyed.
      
      Current txg = C.
      * Starts syncing of txg A
      * dbuf_sync_leaf() is called for the spill buffer. Since db_blkptr
        is NULL, dbuf_check_blkptr() is called.
      * The dbuf starts being written and it reaches the ready state
        (not done yet).
      * A new change makes the spill buffer necessary again.
        sa_build_layouts() ends up calling dbuf_find() to locate the
        dbuf.  It finds the old dbuf because it has not been destroyed yet
        (it will be destroyed when the previous write is done and there
        are no more references). The old dbuf has db_blkptr != NULL.
      * txg A write is complete and the dbuf is released. However, it's still
        referenced, so it's not destroyed.
      
      Current txg = D.
      * Starts syncing of txg B
      * dbuf_sync_leaf() is called for the bonus buffer. Its contents are
        directly copied into the dnode, overwriting the blkptr area because,
        in txg B, the bonus buffer was big enough to hold the entire xattr.
      * At this point, the db_blkptr of the spill buffer used in txg C
        gets corrupted.
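      
      A hedged sketch of the idea behind the fix (not the literal patch): when
      the spill flag is clear, drop the stale pointer so the sync path derives
      it again instead of reusing garbage.
      
        if (!(dn->dn_phys->dn_flags & DNODE_FLAG_SPILL_BLKPTR))
                db->db_blkptr = NULL;   /* keeps dn_spill from inheriting garbage */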
      Signed-off-by: Peng <peng.hse@xtaotech.com>
      Signed-off-by: Tim Chase <tim@chase2k.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #3937
    • Fix RAIDZ_TEST tests · 62b2d54b
      Brian Behlendorf authored
      
      Remove stray trailing } which prevented the raidz stress tests from
      running in-tree.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    • Kill zp->z_xattr_parent to prevent pinning · 31b6111f
      Chunwei Chen authored
      zp->z_xattr_parent will pin the parent. This causes a huge issue when
      unlinking a file with xattrs. Because the unlinked file is pinned, it will
      never get purged immediately. And because of that, the xattr objects will
      never be marked as unlinked, so the whole unlinked set will stay there
      until cache shrink or umount.
      
      This change partially reverts e89260a1.  This is safe because only the
      zp->z_xattr_parent optimization is removed; zpl_xattr_security_init() is
      still called from the zpl outside the inode lock.
      Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Chris Dunlop <chris@onthe.net.au>
      Issue #4359
      Issue #3508
      Issue #4413
      Issue #4827
    • xattr dir doesn't get purged during iput · ddae16a9
      Chunwei Chen authored
      
      We need to set inode->i_nlink to zero so iput will purge it. Without this, it
      will get purged during cache shrink or umount, which would likely result in
      a deadlock due to zfs_zget waiting forever on its children, which are in
      the dispose_list of the same thread.
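      
      A minimal sketch of the approach; clear_nlink() is the standard Linux VFS
      helper for zeroing i_nlink:
      
        /* mark the xattr dir inode unlinked so the final iput() evicts it now */
        clear_nlink(ip);
        iput(ip);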
      Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Chris Dunlop <chris@onthe.net.au>
      Issue #4359
      Issue #3508
      Issue #4413
      Issue #4827
    • fh_to_dentry should return ESTALE when generation mismatch · 6c253064
      Chunwei Chen authored
      
      When the generation numbers mismatch, it usually means the file pointed to
      by the file handle was deleted. We should return ESTALE to indicate this.
      We return ENOENT in zfs_vget since zpl_fh_to_dentry will convert it to
      ESTALE.
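      
      An illustrative sketch of the conversion in zpl_fh_to_dentry (error-value
      conventions simplified):
      
        error = zfs_vget(sb, &ip, fid);
        if (error == ENOENT)
                error = ESTALE;         /* generation mismatch: stale handle */
        if (error)
                return (ERR_PTR(-error));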
      Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Issue #4828
    • Add configure result for xattr_handler · d4701011
      Chunwei Chen authored
      Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Issue #4828
    • Fix Large kmem_alloc in vdev_metaslab_init · bffb68a2
      Chunwei Chen authored
      
      This allocation can go way over 1MB, so we should use vmem_alloc
      instead of kmem_alloc.
      
        Large kmem_alloc(1430784, 0x1000), please file an issue...
        Call Trace:
         [<ffffffffa0324aff>] ? spl_kmem_zalloc+0xef/0x160 [spl]
         [<ffffffffa17d0c8d>] ? vdev_metaslab_init+0x9d/0x1f0 [zfs]
         [<ffffffffa17d46d0>] ? vdev_load+0xc0/0xd0 [zfs]
         [<ffffffffa17d4643>] ? vdev_load+0x33/0xd0 [zfs]
         [<ffffffffa17c0004>] ? spa_load+0xfc4/0x1b60 [zfs]
         [<ffffffffa17c1838>] ? spa_tryimport+0x98/0x430 [zfs]
         [<ffffffffa17f28b1>] ? zfs_ioc_pool_tryimport+0x41/0x80 [zfs]
         [<ffffffffa17f5669>] ? zfsdev_ioctl+0x4a9/0x4e0 [zfs]
         [<ffffffff811bacdf>] ? do_vfs_ioctl+0x2cf/0x4b0
         [<ffffffff811baf41>] ? SyS_ioctl+0x81/0xa0
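      
      The change is roughly of the following shape (variable names illustrative):
      
        metaslab_t **mspp;
        
        /* this array can exceed the 1MB kmem_alloc() warning threshold */
        mspp = vmem_zalloc(newc * sizeof (metaslab_t *), KM_SLEEP);
        
        /* the matching free must switch allocators as well */
        vmem_free(mspp, newc * sizeof (metaslab_t *));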
      Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #4752
    • Don't allow accessing XATTR via export handle · 7938c2ac
      Chunwei Chen authored
      
      Allowing access to XATTRs through an export handle is a very bad idea. It
      would allow a user to write whatever they want in fields where they
      otherwise could not.
      Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Issue #4828
    • Fix get_zfs_sb race with concurrent umount · 061460df
      Chunwei Chen authored
      
      Certain ioctl operations will call get_zfs_sb, which holds an active
      count on the sb without checking whether it's active or not. This can
      result in a use-after-free. We fix this by using atomic_inc_not_zero to
      make sure we get an active sb.
      
      P1                                          P2
      ---                                         ---
      deactivate_locked_super(): s_active = 0
                                                  zfs_sb_hold()
                                                  ->get_zfs_sb(): s_active = 1
      ->zpl_kill_sb()
      -->zpl_put_super()
      --->zfs_umount()
      ---->zfs_sb_free(zsb)
                                                  zfs_sb_rele(zsb)
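      
      A sketch of the fix: only take the reference when the superblock still has
      a nonzero active count (the error value here is illustrative):
      
        if (sb == NULL || !atomic_inc_not_zero(&sb->s_active)) {
                /* sb is already being torn down by a concurrent umount */
                return (SET_ERROR(EBUSY));
        }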
      Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
  7. 11 Jul, 2016 1 commit
  8. 29 Jun, 2016 2 commits
    • Merge branch 'illumos-2605' · 5c27b296
      Brian Behlendorf authored
      
      Adds support for resuming interrupted zfs send streams and includes
      all related send/recv bug fixes from upstream OpenZFS.
      
      Unlike the upstream implementation this branch does not change
      the existing ioctl interface.  Instead a new ZFS_IOC_RECV_NEW ioctl
      was added to support resuming zfs send streams.  This was done by
      applying the original upstream patch and then reverting the ioctl
      changes in a follow-up patch.  For this reason there are a handful
      of commits between the relevant patches on this branch which are
      not interoperable.  This was done to make it easier to extract
      the new ZFS_IOC_RECV_NEW and submit it upstream.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #4742
    • Vectorized fletcher_4 must be 128-bit aligned · 0dab2e84
      Brian Behlendorf authored
      
      The fletcher_4_native() and fletcher_4_byteswap() functions may only
      safely use the vectorized implementations when the buffer is 128-bit
      aligned.  This is because both the AVX2 and SSE implementations process
      four 32-bit words per iteration.  Fall back to the scalar implementation,
      which only processes a single 32-bit word at a time, for unaligned buffers.
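      
      The dispatch is roughly as follows (function names illustrative):
      
        /* the SIMD variants consume four 32-bit words (16 bytes) per iteration */
        if (IS_P2ALIGNED(buf, 16))
                fletcher_4_simd_native(buf, size, zcp);
        else
                fletcher_4_scalar_native(buf, size, zcp);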
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
      Issue #4330
  9. 28 Jun, 2016 10 commits
  10. 24 Jun, 2016 3 commits
    • Sync DMU_BACKUP_FEATURE_* flags · 669cf0ab
      Brian Behlendorf authored
      
      Flag 20 was used in OpenZFS as DMU_BACKUP_FEATURE_RESUMING.  The
      DMU_BACKUP_FEATURE_LARGE_DNODE flag must be shifted to 21 and
      then reserved in the upstream OpenZFS implementation.
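      
      The flag layout implied by the text above (values shown for illustration):
      
        #define DMU_BACKUP_FEATURE_RESUMING     (1 << 20)  /* matches upstream */
        #define DMU_BACKUP_FEATURE_LARGE_DNODE  (1 << 21)  /* shifted from 20 */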
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Ned Bass <bass6@llnl.gov>
      Closes #4795
    • Implement large_dnode pool feature · 50c957f7
      Ned Bass authored
      
      Justification
      -------------
      
      This feature adds support for variable length dnodes. Our motivation is
      to eliminate the overhead associated with using spill blocks.  Spill
      blocks are used to store system attribute data (i.e. file metadata) that
      does not fit in the dnode's bonus buffer. By allowing a larger bonus
      buffer area the use of a spill block can be avoided.  Spill blocks
      potentially incur an additional read I/O for every dnode in a dnode
      block. As a worst case example, reading 32 dnodes from a 16k dnode block
      and all of the spill blocks could issue 33 separate reads. Now suppose
      those dnodes have size 1024 and therefore don't need spill blocks.  Then
      the worst case number of blocks read is reduced from 33 to two, one
      per dnode block. In practice spill blocks may tend to be co-located on
      disk with the dnode blocks so the reduction in I/O would not be this
      drastic. In a badly fragmented pool, however, the improvement could be
      significant.
      
      ZFS-on-Linux systems that make heavy use of extended attributes would
      benefit from this feature. In particular, ZFS-on-Linux supports the
      xattr=sa dataset property which allows file extended attribute data
      to be stored in the dnode bonus buffer as an alternative to the
      traditional directory-based format. Workloads such as SELinux and the
      Lustre distributed filesystem often store enough xattr data to force
      spill blocks when xattr=sa is in effect. Large dnodes may therefore
      provide a performance benefit to such systems.
      
      Other use cases that may benefit from this feature include files with
      large ACLs and symbolic links with long target names. Furthermore,
      this feature may be desirable on other platforms in case future
      applications or features are developed that could make use of a
      larger bonus buffer area.
      
      Implementation
      --------------
      
      The size of a dnode may be a multiple of 512 bytes up to the size of
      a dnode block (currently 16384 bytes). A dn_extra_slots field was
      added to the current on-disk dnode_phys_t structure to describe the
      size of the physical dnode on disk. The 8 bits for this field were
      taken from the zero filled dn_pad2 field. The field represents how
      many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
      This convention results in a value of 0 for 512 byte dnodes which
      preserves on-disk format compatibility with older software.
      
      Similarly, the in-memory dnode_t structure has a new dn_num_slots field
      to represent the total number of dnode_phys_t slots consumed on disk.
      Thus dn->dn_num_slots is 1 greater than the corresponding
      dnp->dn_extra_slots. This difference in convention was adopted
      because, unlike on-disk structures, backward compatibility is not a
      concern for in-memory objects, so we used a more natural way to
      represent size for a dnode_t.
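      
      For example, a 1k dnode occupies two 512-byte slots on disk, so:
      
        dnp->dn_extra_slots = 1;                     /* on-disk: slots beyond the first */
        dn->dn_num_slots = dnp->dn_extra_slots + 1;  /* in-memory: total slots, i.e. 2 */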
      
      The default size for newly created dnodes is determined by the value of
      a new "dnodesize" dataset property. By default the property is set to
      "legacy" which is compatible with older software. Setting the property
      to "auto" will allow the filesystem to choose the most suitable dnode
      size. Currently this just sets the default dnode size to 1k, but future
      code improvements could dynamically choose a size based on observed
      workload patterns. Dnodes of varying sizes can coexist within the same
      dataset and even within the same dnode block. For example, to enable
      automatically-sized dnodes, run
      
       # zfs set dnodesize=auto tank/fish
      
      The user can also specify literal values for the dnodesize property.
      These are currently limited to powers of two from 1k to 16k. The
      power-of-2 limitation is only for simplicity of the user interface.
      Internally the implementation can handle any multiple of 512 up to 16k,
      and consumers of the DMU API can specify any legal dnode value.
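      
      For example, to request 4k dnodes on the same hypothetical dataset:
      
       # zfs set dnodesize=4k tank/fish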
      
      The size of a new dnode is determined at object allocation time and
      stored as a new field in the znode in-memory structure. New DMU
      interfaces are added to allow the consumer to specify the dnode size
      that a newly allocated object should use. Existing interfaces are
      unchanged to avoid having to update every call site and to preserve
      compatibility with external consumers such as Lustre. The new
      interface names are given below. The versions of these functions that
      don't take a dnodesize parameter now just call the _dnsize() versions
      with a dnodesize of 0, which means use the legacy dnode size.
      
      New DMU interfaces:
        dmu_object_alloc_dnsize()
        dmu_object_claim_dnsize()
        dmu_object_reclaim_dnsize()
      
      New ZAP interfaces:
        zap_create_dnsize()
        zap_create_norm_dnsize()
        zap_create_flags_dnsize()
        zap_create_claim_norm_dnsize()
        zap_create_link_dnsize()
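      
      A sketch of the compatibility shim described above, where the legacy entry
      point forwards a dnodesize of 0 ("use the legacy size"); the argument list
      shown here is illustrative:
      
        uint64_t
        dmu_object_alloc(objset_t *os, dmu_object_type_t ot, int blocksize,
            dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *tx)
        {
                return (dmu_object_alloc_dnsize(os, ot, blocksize, bonustype,
                    bonuslen, 0, tx));
        }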
      
      The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
      spa_maxdnodesize() function should be used to determine the maximum
      bonus length for a pool.
      
      These are a few noteworthy changes to key functions:
      
      * The prototype for dnode_hold_impl() now takes a "slots" parameter.
        When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
        ensure the hole at the specified object offset is large enough to
        hold the dnode being created. The slots parameter is also used
        to ensure a dnode does not span multiple dnode blocks. In both of
        these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
        these failure cases are only possible when using DNODE_MUST_BE_FREE.
      
        If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
        dnode_hold_impl() will check if the requested dnode is already
        consumed as an extra dnode slot by a large dnode, in which case
        it returns ENOENT.
      
      * The function dmu_object_alloc() advances to the next dnode block
        if dnode_hold_impl() returns an error for a requested object.
        This is because the beginning of the next dnode block is the only
        location it can safely assume to either be a hole or a valid
        starting point for a dnode.
      
      * dnode_next_offset_level() and other functions that iterate
        through dnode blocks may no longer use a simple array indexing
        scheme. These now use the current dnode's dn_num_slots field to
        advance to the next dnode in the block. This is to ensure we
        properly skip the current dnode's bonus area and don't interpret it
        as a valid dnode.
      
      zdb
      ---
      The zdb command was updated to display a dnode's size under the
      "dnsize" column when the object is dumped.
      
      For ZIL create log records, zdb will now display the slot count for
      the object.
      
      ztest
      -----
      Ztest chooses a random dnodesize for every newly created object. The
      random distribution is more heavily weighted toward small dnodes to
      better simulate real-world datasets.
      
      Unused bonus buffer space is filled with non-zero values computed from
      the object number, dataset id, offset, and generation number.  This
      helps ensure that the dnode traversal code properly skips the interior
      regions of large dnodes, and that these interior regions are not
      overwritten by data belonging to other dnodes. A new test visits each
      object in a dataset. It verifies that the actual dnode size matches what
      was stored in the ztest block tag when it was created. It also verifies
      that the unused bonus buffer space is filled with the expected data
      patterns.
      
      ZFS Test Suite
      --------------
      Added six new large dnode-specific tests, and integrated the dnodesize
      property into existing tests for zfs allow and send/recv.
      
      Send/Receive
      ------------
      ZFS send streams for datasets containing large dnodes cannot be received
      on pools that don't support the large_dnode feature. A send stream with
      large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
      unrecognized by an incompatible receiving pool so that the zfs receive
      will fail gracefully.
      
      While not implemented here, it may be possible to generate a
      backward-compatible send stream from a dataset containing large
      dnodes. The implementation may be tricky, however, because the send
      object record for a large dnode would need to be resized to a 512
      byte dnode, possibly kicking in a spill block in the process. This
      means we would need to construct a new SA layout and possibly
      register it in the SA layout object. The SA layout is normally just
      sent as an ordinary object record. But if we are constructing new
      layouts while generating the send stream we'd have to build the SA
      layout object dynamically and send it at the end of the stream.
      
      For sending and receiving between pools that do support large dnodes,
      the drr_object send record type is extended with a new field to store
      the dnode slot count. This field was repurposed from unused padding
      in the structure.
      
      ZIL Replay
      ----------
      The dnode slot count is stored in the uppermost 8 bits of the lr_foid
      field. The bits were unused as the object id is currently capped at
      48 bits.
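      
      Illustrative helpers for that packing (macro names hypothetical):
      
        #define LR_FOID_SLOT_SHIFT      56
        #define LR_FOID_OBJ_MASK        ((1ULL << 48) - 1)
        
        /* object id in the low 48 bits, slot count in the top 8 bits */
        #define LR_FOID_GET_OBJ(foid)   ((foid) & LR_FOID_OBJ_MASK)
        #define LR_FOID_GET_SLOTS(foid) ((foid) >> LR_FOID_SLOT_SHIFT)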
      
      Resizing Dnodes
      ---------------
      It should be possible to resize a dnode when it is dirtied if the
      current dnodesize dataset property differs from the dnode's size, but
      this functionality is not currently implemented. Clearly a dnode can
      only grow if there are sufficient contiguous unused slots in the
      dnode block, but it should always be possible to shrink a dnode.
      Growing dnodes may be useful to reduce fragmentation in a pool with
      many spill blocks in use. Shrinking dnodes may be useful to allow
      sending a dataset to a pool that doesn't support the large_dnode
      feature.
      
      Feature Reference Counting
      --------------------------
      The reference count for the large_dnode pool feature tracks the
      number of datasets that have ever contained a dnode of size larger
      than 512 bytes. The first time a large dnode is created in a dataset
      the dataset is converted to an extensible dataset. This is a one-way
      operation and the only way to decrement the feature count is to
      destroy the dataset, even if the dataset no longer contains any large
      dnodes. The complexity of reference counting on a per-dnode basis was
      too high, so we chose to track it on a per-dataset basis similarly to
      the large_block feature.
      Signed-off-by: Ned Bass <bass6@llnl.gov>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #3542
    • Backfill metadnode more intelligently · 68cbd56e
      Ned Bass authored
      
      Only attempt to backfill lower metadnode object numbers if at least
      4096 objects have been freed since the last rescan, and at most once
      per transaction group. This avoids a pathology in dmu_object_alloc()
      that caused O(N^2) behavior for create-heavy workloads and
      substantially improves object creation rates.  As summarized by
      @mahrens in #4636:
      
      "Normally, the object allocator simply checks to see if the next
      object is available. The slow calls happened when dmu_object_alloc()
      checks to see if it can backfill lower object numbers. This happens
      every time we move on to a new L1 indirect block (i.e. every 32 *
      128 = 4096 objects).  When re-checking lower object numbers, we use
      the on-disk fill count (blkptr_t:blk_fill) to quickly skip over
      indirect blocks that don’t have enough free dnodes (defined as an L2
      with at least 393,216 of 524,288 dnodes free). Therefore, we may
      find that a block of dnodes has a low (or zero) fill count, and yet
      we can’t allocate any of its dnodes, because they've been allocated
      in memory but not yet written to disk. In this case we have to hold
      each of the dnodes and then notice that it has been allocated in
      memory.
      
      The end result is that allocating N objects in the same TXG can
      require CPU usage proportional to N^2."
      
      Add a tunable dmu_rescan_dnode_threshold to define the number of
      objects that must be freed before a rescan is performed. Don't bother
      to export this as a module option because testing doesn't show a
      compelling reason to change it. The vast majority of the performance
      gain comes from limiting the rescan to at most once per TXG.
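      
      A rough sketch of the gating policy (field names hypothetical; only the
      tunable name comes from the text above):
      
        /*
         * Backfill lower object numbers at most once per txg, and only after
         * dmu_rescan_dnode_threshold (default 4096) objects have been freed.
         */
        if (os->os_freed_dnodes >= dmu_rescan_dnode_threshold &&
            os->os_last_rescan_txg != tx->tx_txg) {
                os->os_obj_next = 1;            /* restart the low-object search */
                os->os_last_rescan_txg = tx->tx_txg;
                os->os_freed_dnodes = 0;
        }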
      Signed-off-by: Ned Bass <bass6@llnl.gov>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>