Commit Graph

1307 Commits

Author SHA1 Message Date
Chris von Recklinghausen ee2e68420b fuse: ioctl: translate ENOSYS in outarg
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 6a567e920fd0451bf29abc418df96c3365925770
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Jun 7 17:49:21 2023 +0200

    fuse: ioctl: translate ENOSYS in outarg

    Fuse shouldn't return ENOSYS from its ioctl implementation. If userspace
    responds with ENOSYS it should be translated to ENOTTY.

    There are two ways to return an error from the IOCTL request:

     - fuse_out_header.error
     - fuse_ioctl_out.result

    Commit 02c0cab8e734 ("fuse: ioctl: translate ENOSYS") already fixed this
    issue for the first case, but missed the second case.  This patch fixes the
    second case.

    Reported-by: Jonathan Katz <jkatz@eitmlabs.org>
    Closes: https://lore.kernel.org/all/CALKgVmcC1VUV_gJVq70n--omMJZUb4HSh_FqvLTHgNBc+HCLFQ@mail.gmail.com/
    Fixes: 02c0cab8e734 ("fuse: ioctl: translate ENOSYS")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:16:20 -04:00
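
The translation described in the commit above can be sketched in user-space C. This is a minimal illustration, not the kernel patch: `sketch_translate_enosys()` and `sketch_ioctl_result()` are hypothetical names, and the two integer parameters only stand in for `fuse_out_header.error` and `fuse_ioctl_out.result`.

```c
#include <assert.h>
#include <errno.h>

/* Map -ENOSYS to -ENOTTY; every other value passes through unchanged. */
static int sketch_translate_enosys(int err)
{
    return err == -ENOSYS ? -ENOTTY : err;
}

/*
 * Hypothetical model of the two error paths named in the commit message:
 * header_error stands in for fuse_out_header.error, outarg_result for
 * fuse_ioctl_out.result. Returns what the ioctl caller would see.
 */
static int sketch_ioctl_result(int header_error, int outarg_result)
{
    assert(header_error <= 0);  /* the header carries 0 or a negative errno */

    if (header_error)
        return sketch_translate_enosys(header_error);  /* first case */
    return sketch_translate_enosys(outarg_result);     /* second case */
}
```

Both paths go through the same helper, so a server that reports ENOSYS in either place yields ENOTTY.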
Chris von Recklinghausen b838054cd9 fuse: convert fuse_try_move_page() to use folios
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 063aaad792eef49a11d7575dc9914b43c0fa3792
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date:   Tue Nov 1 10:53:23 2022 -0700

    fuse: convert fuse_try_move_page() to use folios

    Converts the function to try to move folios instead of pages. Also
    converts fuse_check_page() to fuse_get_folio() since this is its only
    caller. This change removes 15 calls to compound_head().

    Link: https://lkml.kernel.org/r/20221101175326.13265-3-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Acked-by: Miklos Szeredi <mszeredi@redhat.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:30 -04:00
Chris von Recklinghausen c29656bded filemap: convert replace_page_cache_page() to replace_page_cache_folio()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 3720dd6dcac38d03424d6ba38107f39af5318bcf
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date:   Tue Nov 1 10:53:22 2022 -0700

    filemap: convert replace_page_cache_page() to replace_page_cache_folio()

    Patch series "Removing the lru_cache_add() wrapper".

    This patchset replaces all calls of lru_cache_add() with the folio
    equivalent: folio_add_lru().  This allows us to get rid of the wrapper.
    The series passes xfstests and the userfaultfd selftests.

    This patch (of 5):

    Eliminates 7 calls to compound_head().

    Link: https://lkml.kernel.org/r/20221101175326.13265-1-vishal.moola@gmail.com
    Link: https://lkml.kernel.org/r/20221101175326.13265-2-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:29 -04:00
Chris von Recklinghausen 2af7596eac mm: multi-gen LRU: groundwork
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit ec1c86b25f4bdd9dce6436c0539d2a6ae676e1c4
Author: Yu Zhao <yuzhao@google.com>
Date:   Sun Sep 18 02:00:02 2022 -0600

    mm: multi-gen LRU: groundwork

    Evictable pages are divided into multiple generations for each lruvec.
    The youngest generation number is stored in lrugen->max_seq for both
    anon and file types as they are aged on an equal footing. The oldest
    generation numbers are stored in lrugen->min_seq[] separately for anon
    and file types as clean file pages can be evicted regardless of swap
    constraints. These three variables are monotonically increasing.

    Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits
    in order to fit into the gen counter in folio->flags. Each truncated
    generation number is an index to lrugen->lists[]. The sliding window
    technique is used to track at least MIN_NR_GENS and at most
    MAX_NR_GENS generations. The gen counter stores a value within [1,
    MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it
    stores 0.

    There are two conceptually independent procedures: "the aging", which
    produces young generations, and "the eviction", which consumes old
    generations.  They form a closed-loop system, i.e., "the page reclaim".
    Both procedures can be invoked from userspace for the purposes of working
    set estimation and proactive reclaim.  These techniques are commonly used
    to optimize job scheduling (bin packing) in data centers [1][2].

    To avoid confusion, the terms "hot" and "cold" will be applied to the
    multi-gen LRU, as a new convention; the terms "active" and "inactive" will
    be applied to the active/inactive LRU, as usual.

    The protection of hot pages and the selection of cold pages are based
    on page access channels and patterns. There are two access channels:
    one through page tables and the other through file descriptors. The
    protection of the former channel is by design stronger because:
    1. The uncertainty in determining the access patterns of the former
       channel is higher due to the approximation of the accessed bit.
    2. The cost of evicting the former channel is higher due to the TLB
       flushes required and the likelihood of encountering the dirty bit.
    3. The penalty of underprotecting the former channel is higher because
       applications usually do not prepare themselves for major page
       faults like they do for blocked I/O. E.g., GUI applications
       commonly use dedicated I/O threads to avoid blocking rendering
       threads.

    There are also two access patterns: one with temporal locality and the
    other without.  For the reasons listed above, the former channel is
    assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is
    present; the latter channel is assumed to follow the latter pattern unless
    outlying refaults have been observed [3][4].

    The next patch will address the "outlying refaults".  Three macros, i.e.,
    LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in
    this patch to make the entire patchset less diffy.

    A page is added to the youngest generation on faulting.  The aging needs
    to check the accessed bit at least twice before handing this page over to
    the eviction.  The first check takes care of the accessed bit set on the
    initial fault; the second check makes sure this page has not been used
    since then.  This protocol, AKA second chance, requires a minimum of two
    generations, hence MIN_NR_GENS.

    [1] https://dl.acm.org/doi/10.1145/3297858.3304053
    [2] https://dl.acm.org/doi/10.1145/3503222.3507731
    [3] https://lwn.net/Articles/495543/
    [4] https://lwn.net/Articles/815342/

    Link: https://lkml.kernel.org/r/20220918080010.2920238-6-yuzhao@google.com
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Acked-by: Brian Geffon <bgeffon@google.com>
    Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
    Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Acked-by: Steven Barrett <steven@liquorix.net>
    Acked-by: Suleiman Souhlal <suleiman@google.com>
    Tested-by: Daniel Byrne <djbyrne@mtu.edu>
    Tested-by: Donald Carr <d@chaos-reins.com>
    Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
    Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
    Tested-by: Sofia Trinh <sofia.trinh@edi.works>
    Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Barry Song <baohua@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michael Larabel <Michael@MichaelLarabel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Qi Zheng <zhengqi.arch@bytedance.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:45 -04:00
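
The generation-number truncation described in the commit above can be sketched as follows. This is an illustration under stated assumptions: `SKETCH_MAX_NR_GENS` is assumed to be 4, the modulo mapping and the "index plus one" encoding are assumptions chosen to match the ranges given in the message, and all `sketch_*` names are hypothetical.

```c
#include <assert.h>

#define SKETCH_MAX_NR_GENS 4UL  /* assumed value, for illustration only */

/* Truncate a monotonically increasing sequence number to a list index. */
static unsigned long sketch_gen_from_seq(unsigned long seq)
{
    return seq % SKETCH_MAX_NR_GENS;
}

/*
 * Value kept in the gen counter: within [1, MAX_NR_GENS] while the page
 * is on one of the lists, 0 otherwise (assumed encoding: index + 1).
 */
static unsigned long sketch_gen_counter(unsigned long seq, int on_list)
{
    return on_list ? sketch_gen_from_seq(seq) + 1 : 0;
}

/* Smallest number of bits b such that (1 << b) >= n, i.e. ceil(log2(n)). */
static unsigned int sketch_order_base_2(unsigned long n)
{
    unsigned int bits = 0;

    assert(n > 0);
    while ((1UL << bits) < n)
        bits++;
    return bits;
}
```

With the assumed value of 4, the counter has to hold the five values 0..4, which is why the width is computed from MAX_NR_GENS+1.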
Chris von Recklinghausen 48cb06d2f2 iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
Conflicts: fs/cifs/file.c, fs/cifs/misc.c - We don't have
	38c8a9a52082 ("smb: move client and server files to common directory fs/smb")
	so modify them instead of fs/smb/client/file.c and fs/smb/client/misc.c
	like upstream

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 1ef255e257173f4bc44317ef2076e7e0de688fdf
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Jun 9 10:28:36 2022 -0400

    iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()

    Most of the users immediately follow successful iov_iter_get_pages()
    with advancing by the amount it had returned.

    Provide inline wrappers doing that, convert trivial open-coded
    uses of those.

    BTW, iov_iter_get_pages() never returns more than it had been asked
    to; such checks in cifs ought to be removed someday...

    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:04 -04:00
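
The "get, then advance by the returned amount" pattern from the commit above can be sketched with a toy iterator. Everything here is hypothetical (`struct sketch_iter` and the `sketch_*` functions are not the kernel's `iov_iter` API); the point is only how a wrapper folds the advance into the call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy iterator: a position and the number of bytes still available. */
struct sketch_iter {
    size_t pos;
    size_t remaining;
};

/* Non-advancing getter: never returns more than it was asked for. */
static size_t sketch_get(const struct sketch_iter *it, size_t maxsize)
{
    return maxsize < it->remaining ? maxsize : it->remaining;
}

/* Move the iterator forward by n bytes. */
static void sketch_advance(struct sketch_iter *it, size_t n)
{
    assert(n <= it->remaining);
    it->pos += n;
    it->remaining -= n;
}

/* Advancing variant: get, then advance by exactly the amount returned. */
static size_t sketch_get_and_advance(struct sketch_iter *it, size_t maxsize)
{
    size_t n = sketch_get(it, maxsize);

    sketch_advance(it, n);
    return n;
}
```

Callers that previously open-coded the two steps can call the wrapper alone.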
Chris von Recklinghausen bac38e7d31 new iov_iter flavour - ITER_UBUF
Conflicts: include/linux/uio.h - We already have
	de4eda9de2d9 ("use less confusing names for iov_iter direction initializers")
	so we have a preexisting definition of ITER_SOURCE (context)

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit fcb14cb1bdacec5b4374fe161e83fb8208164a85
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun May 22 14:59:25 2022 -0400

    new iov_iter flavour - ITER_UBUF

    Equivalent of single-segment iovec.  Initialized by iov_iter_ubuf(),
    checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
    ones.

    We are going to expose things like ->write_iter() et al. to those
    in subsequent commits.

    New predicate (user_backed_iter()) that is true for ITER_IOVEC and
    ITER_UBUF; places like direct-IO handling should use that for
    checking that pages we modify after getting them from iov_iter_get_pages()
    would need to be dirtied.

    DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
    will solve all problems - there's code that uses iter_is_iovec() to
    decide how to poke around in iov_iter guts and for that the predicate
    replacement obviously won't suffice.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:12:58 -04:00
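
The two predicates named in the commit above can be sketched over a toy type tag. The enum and the `sketch_*` functions are hypothetical stand-ins, and the two non-user-backed flavours are listed only as assumed examples of iterators for which the second predicate is false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy iterator flavours; the last two are assumed kernel-backed examples. */
enum sketch_iter_type {
    SKETCH_ITER_IOVEC,
    SKETCH_ITER_UBUF,
    SKETCH_ITER_KVEC,
    SKETCH_ITER_BVEC,
    SKETCH_ITER_NR_TYPES
};

/* True only for the single-segment user buffer flavour. */
static bool sketch_iter_is_ubuf(enum sketch_iter_type t)
{
    assert(t < SKETCH_ITER_NR_TYPES);
    return t == SKETCH_ITER_UBUF;
}

/* True for both user-backed flavours: IOVEC and UBUF. */
static bool sketch_user_backed_iter(enum sketch_iter_type t)
{
    assert(t < SKETCH_ITER_NR_TYPES);
    return t == SKETCH_ITER_IOVEC || t == SKETCH_ITER_UBUF;
}
```

As the commit message warns, the broader predicate only answers "is this user memory"; code that inspects the iterator's internal layout still needs the flavour-specific check.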
Chris von Recklinghausen 6978529595 fuse: ioctl: translate ENOSYS
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 02c0cab8e7345b06f1c0838df444e2902e4138d3
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Jul 21 16:06:18 2022 +0200

    fuse: ioctl: translate ENOSYS

    Overlayfs may fail to complete updates when a filesystem lacks
    fileattr/xattr syscall support and responds with an ENOSYS error code,
    resulting in an unexpected "Function not implemented" error.

    This bug may occur with FUSE filesystems, such as davfs2.

    Steps to reproduce:

      # install davfs2, e.g., apk add davfs2
      mkdir /test /test/lower /test/upper /test/work /test/mnt
      yes '' | mount -t davfs -o ro http://some-web-dav-server/path \
        /test/lower
      mount -t overlay -o upperdir=/test/upper,lowerdir=/test/lower \
        -o workdir=/test/work overlay /test/mnt

      # when "some-file" exists in the lowerdir, this fails with "Function
      # not implemented", with dmesg showing "overlayfs: failed to retrieve
      # lower fileattr (/some-file, err=-38)"
      touch /test/mnt/some-file

    The underlying cause of this regression is actually in FUSE, which fails to
    translate the ENOSYS error code returned by userspace filesystem (which
    means that the ioctl operation is not supported) to ENOTTY.

    Reported-by: Christian Kohlschütter <christian@kohlschutter.com>
    Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags")
    Fixes: 59efec7b90 ("fuse: implement ioctl support")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:12:57 -04:00
Chris von Recklinghausen 57e2278ca7 fuse: limit nsec
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 47912eaa061a6a81e4aa790591a1874c650733c0
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Jul 21 16:06:18 2022 +0200

    fuse: limit nsec

    Limit nanoseconds to 0..999999999.

    Fixes: d8a5ba4545 ("[PATCH] FUSE - core")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:12:57 -04:00
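
The limit from the "fuse: limit nsec" commit above amounts to a clamp. The sketch below is a user-space illustration with hypothetical names (`sketch_limit_nsec`, `SKETCH_NSEC_PER_SEC`), not the kernel code; it assumes the nanosecond field is an unsigned 32-bit value supplied by the server.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000u

/* Clamp a server-supplied nanosecond value into 0..999999999. */
static uint32_t sketch_limit_nsec(uint32_t nsec)
{
    uint32_t limited = nsec < SKETCH_NSEC_PER_SEC
                           ? nsec
                           : SKETCH_NSEC_PER_SEC - 1;

    assert(limited < SKETCH_NSEC_PER_SEC);
    return limited;
}
```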
Chris von Recklinghausen a3dc88f645 fuse: fix fileattr op failure
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit a679a61520d8a7b0211a1da990404daf5cc80b72
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Fri Feb 18 11:47:51 2022 +0100

    fuse: fix fileattr op failure

    The fileattr API conversion broke lsattr on ntfs3g.

    Previously the ioctl(... FS_IOC_GETFLAGS) returned an EINVAL error, but
    after the conversion the error returned by the fuse filesystem was not
    propagated back to the ioctl() system call, resulting in success being
    returned with bogus values.

    Fix by checking for outarg.result in fuse_priv_ioctl(), just as generic
    ioctl code does.

    Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
    Fixes: 72227eac17 ("fuse: convert to fileattr")
    Cc: <stable@vger.kernel.org> # v5.13
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:12:35 -04:00
Miklos Szeredi 02e101a809 fuse: optional supplementary group in create requests
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134128
Upstream status: Linus
Conflicts: version log removed since version is not bumped

commit 8ed7cb3f279fe67a93f407ee2ec3ea661a483a65
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Nov 10 15:46:33 2022 +0100

    fuse: optional supplementary group in create requests

    Permission to create an object (create, mkdir, symlink, mknod) needs to
    take supplementary groups into account.

    Add a supplementary group request extension.  This can contain an arbitrary
    number of group IDs and can be added to any request.  This extension is not
    added to any request by default.

    Add FUSE_CREATE_SUPP_GROUP init flag to enable supplementary group info in
    creation requests.  This adds just a single supplementary group that
    matches the parent group in the case described above.  In other cases the
    extension is not added.

    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-08 12:45:50 +02:00
Miklos Szeredi 0faa080d2d fuse: add request extension
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2134128
Upstream status: Linus
Conflicts: version log removed since version is not bumped

commit 15d937d7ca8c55d2b0ce9116e20c780fdd0b67cc
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Nov 10 15:46:33 2022 +0100

    fuse: add request extension

    Will need to add supplementary groups to create messages, so add the
    general concept of a request extension.  A request extension is appended to
    the end of the main request.  It has a header indicating the size and type
    of the extension.

    The create security context (fuse_secctx_*) is similar to the generic
    request extension, so include that as well in a backward compatible manner.

    Add the total extension length to the request header.  The offset of the
    extension block within the request can be calculated by:

      inh->len - inh->total_extlen * 8

    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-08 12:41:10 +02:00
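
The offset formula from the "fuse: add request extension" commit above can be worked through in a few lines. The function name and parameter widths are assumptions for illustration; only the arithmetic `len - total_extlen * 8` comes from the commit message, which implies that the extension length is counted in 8-byte units.

```c
#include <assert.h>
#include <stdint.h>

/*
 * len:          total request length in bytes (stands in for inh->len)
 * total_extlen: total extension length in 8-byte units
 *               (stands in for inh->total_extlen)
 * Returns the byte offset of the extension block within the request.
 */
static uint32_t sketch_ext_offset(uint32_t len, uint16_t total_extlen)
{
    uint32_t ext_bytes = (uint32_t)total_extlen * 8;

    assert(ext_bytes <= len);  /* extensions cannot exceed the request */
    return len - ext_bytes;
}
```

For example, a 120-byte request with `total_extlen` of 2 carries 16 bytes of extensions, so the extension block starts at byte 104.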
Herton R. Krzesinski 28876ec068 Merge: fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2828

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188181
Tested: with xfstests and bz reproducer

Signed-off-by: Pavel Reichl <preichl@redhat.com>

Omitted-fix: 5cadfbd5a11e5495cac217534c5f788168b1afd7

Omitting fix as this is basically the 1st patch of this MR.

Approved-by: Miklos Szeredi <mszeredi@redhat.com>
Approved-by: Carlos Maiolino <cmaiolino@redhat.com>

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
2023-08-07 23:14:56 +00:00
Pavel Reichl c4ea092a2a fuse: add feature flag for expire-only
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188181
Tested: with xfstests and bz reproducer
Upstream Status: RHEL only

Signed-off-by: Pavel Reichl <preichl@redhat.com>

Add an init flag indicating whether the FUSE_EXPIRE_ONLY flag of
FUSE_NOTIFY_INVAL_ENTRY is effective.

This is needed for backports of this feature, otherwise the server could
just check the protocol version.

This patch is not yet in upstream, original author is Miklos Szeredi.

Fixes: 4f8d37020e1f ("fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-07-12 16:18:49 +02:00
Pavel Reichl 328cc1ffc2 fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188181
Tested: with xfstests and bz reproducer
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Signed-off-by: Pavel Reichl <preichl@redhat.com>

Add a flag to entry expiration that lets the filesystem expire a dentry
without kicking it out from the cache immediately.

This makes a difference for overmounted dentries, where plain invalidation
would detach all submounts before dropping the dentry from the cache.  If
only expiry is set on the dentry, then any overmounts are left alone
until ->d_revalidate() is called.

Note: ->d_revalidate() is not called for the case of following a submount,
so invalidation will only be triggered for the non-overmounted case.  The
dentry could also be mounted in a different mount instance, in which case
any submounts will still be detached.

Suggested-by: Jakob Blomer <jblomer@cern.ch>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 4f8d37020e1fd0bf6ee9381ba918135ef3712efd)
2023-07-12 15:58:37 +02:00
Jan Stancek 619fecce5d Merge: fuse: fix deadlock between atomic O_TRUNC and page invalidation
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2650

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2207472

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Approved-by: German Maglione <gmaglion@redhat.com>
Approved-by: John B. Wyatt IV <jwyatt@redhat.com>

Signed-off-by: Jan Stancek <jstancek@redhat.com>
2023-07-12 09:49:09 +02:00
Miklos Szeredi bded166390 fuse: allow non-extending parallel direct writes on the same file
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2216046

In general, as of now, in FUSE, direct writes on the same file are
serialized over the inode lock, i.e., we hold the inode lock for the full
duration of the write request.  I could not find in fuse code and git
history a comment which clearly explains why this exclusive lock is taken
for direct writes.  The following might be the reasons for acquiring an
exclusive lock, though the list may not be exhaustive:

 1) Our guess is some USER space fuse implementations might be relying on
    this lock for serialization.

 2) The lock protects against file read/write size races.

 3) Ruling out any issues arising from partial write failures.

This patch relaxes the exclusive lock for direct non-extending writes only.
File size extending writes might not need the lock either, but we are not
entirely sure if there is a risk to introduce any kind of regression.
Furthermore, benchmarking with fio does not show a difference between patch
versions that take on file size extension a) an exclusive lock and b) a
shared lock.

A possible example of an issue with i_size extending writes are write error
cases.  Some writes might succeed and others might fail for file system
internal reasons - for example ENOSPC.  With parallel file size extending
writes it _might_ be difficult to revert the action of the failing write,
especially to restore the right i_size.

With these changes, we allow non-extending parallel direct writes on the
same file with the help of a flag called FOPEN_PARALLEL_DIRECT_WRITES.  If
this flag is set on the file (flag is passed from libfuse to fuse kernel as
part of file open/create), we do not take exclusive lock anymore, but
instead use a shared lock that allows non-extending writes to run in
parallel.  FUSE implementations which rely on this inode lock for
serialization can continue to do so and serialized direct writes are still
the default.  Implementations that do not do write serialization need to be
updated and need to set the FOPEN_PARALLEL_DIRECT_WRITES flag in their file
open/create reply.

On patch review there were concerns that network file systems (or vfs
multiple mounts of the same file system) might have issues with parallel
writes.  We believe this is not the case, as this is just a local lock,
which network file systems could not rely on anyway.  I.e. this lock is
just for local consistency.

Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 153524053bbb0d27bb2e0be36d1b46862e9ce74c)
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-21 17:03:17 +02:00
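
The locking decision described in the commit above can be sketched as a single predicate. This is a hedged user-space model, not the kernel function: `sketch_needs_exclusive_lock()` is a hypothetical name, `parallel_writes` stands in for FOPEN_PARALLEL_DIRECT_WRITES being set on the file, and treating append writes as size-extending is an assumption made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a direct write must take the exclusive inode lock.
 * Returns false only when the server opted in and the write stays
 * within the current file size, in which case a shared lock suffices.
 */
static bool sketch_needs_exclusive_lock(bool parallel_writes, bool append,
                                        uint64_t pos, uint64_t count,
                                        uint64_t i_size)
{
    assert(count <= UINT64_MAX - pos);  /* pos + count must not wrap */

    if (!parallel_writes)
        return true;   /* default: serialized direct writes */
    if (append)
        return true;   /* assumption: append may extend the file */
    return pos + count > i_size;  /* extending writes stay exclusive */
}
```

A write that ends exactly at the current file size does not extend it, so it falls on the shared-lock side.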
Miklos Szeredi 9afe2bf7b5 fuse: fix deadlock between atomic O_TRUNC and page invalidation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2207472
Upstream status: Linus

commit 2fdbb8dd01556e1501132b5ad3826e8f71e24a8b
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Fri Apr 22 15:48:53 2022 +0200

    fuse: fix deadlock between atomic O_TRUNC and page invalidation
    
    fuse_finish_open() will be called with FUSE_NOWRITE set in case of atomic
    O_TRUNC open(), so commit 76224355db75 ("fuse: truncate pagecache on
    atomic_o_trunc") replaced invalidate_inode_pages2() by truncate_pagecache()
    in such a case to avoid the A-A deadlock. However, we found another A-B-B-A
    deadlock related to the case above, which causes the xfstests
    generic/464 testcase to hang in our virtio-fs test environment.
    
    For example, consider two processes concurrently opening the same file, one
    with O_TRUNC and another without O_TRUNC. The deadlock case is described
    below: if open(O_TRUNC) has already called fuse_set_nowrite() (acquired A)
    and is trying to lock a page (acquiring B), open() could already hold the
    page lock (acquired B) and be waiting on the page writeback (acquiring A).
    This would lead to a deadlock.
    
    open(O_TRUNC)
    ----------------------------------------------------------------
    fuse_open_common
      inode_lock            [C acquire]
      fuse_set_nowrite      [A acquire]
    
      fuse_finish_open
        truncate_pagecache
          lock_page         [B acquire]
          truncate_inode_page
          unlock_page       [B release]
    
      fuse_release_nowrite  [A release]
      inode_unlock          [C release]
    ----------------------------------------------------------------
    
    open()
    ----------------------------------------------------------------
    fuse_open_common
      fuse_finish_open
        invalidate_inode_pages2
          lock_page         [B acquire]
            fuse_launder_page
              fuse_wait_on_page_writeback [A acquire & release]
          unlock_page       [B release]
    ----------------------------------------------------------------
    
    Besides this case, all calls of invalidate_inode_pages2() and
    invalidate_inode_pages2_range() in fuse code also can deadlock with
    open(O_TRUNC).
    
    Fix by moving the truncate_pagecache() call outside the nowrite protected
    region.  The nowrite protection is only for delayed writeback
    (writeback_cache) case, where inode lock does not protect against
    truncation racing with writes on the server.  Write syscalls racing with
    page cache truncation still get the inode lock protection.
    
    This patch also changes the order of filemap_invalidate_lock()
    vs. fuse_set_nowrite() in fuse_open_common().  This new order matches the
    order found in fuse_file_fallocate() and fuse_do_setattr().
    
    Reported-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
    Tested-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
    Fixes: e4648309b8 ("fuse: truncate pending writes on O_TRUNC")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 2fdbb8dd01556e1501132b5ad3826e8f71e24a8b)
2023-06-09 16:28:59 +02:00
Miklos Szeredi 36e12ff997 fuse: truncate pagecache on atomic_o_trunc
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2207472
Upstream status: Linus

commit 76224355db7570cbe6b6f75c8929a1558828dd55
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Tue Aug 17 21:05:16 2021 +0200

    fuse: truncate pagecache on atomic_o_trunc
    
    fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic
    O_TRUNC.  This can deadlock with fuse_wait_on_page_writeback() in
    fuse_launder_page() triggered by invalidate_inode_pages2().
    
    Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a
    truncate_pagecache() call.  This makes sense regardless of FOPEN_KEEP_CACHE
    or fc->writeback_cache, so do it unconditionally.
    
    Reported-by: Xie Yongji <xieyongji@bytedance.com>
    Reported-and-tested-by: syzbot+bea44a5189836d956894@syzkaller.appspotmail.com
    Fixes: e4648309b8 ("fuse: truncate pending writes on O_TRUNC")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-09 16:28:59 +02:00
Brian Foster 89a82054e6 fuse: wait for writepages in syncfs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2189470
Upstream Status: linux.git

commit 660585b56e63ca034ad506ea53c807c5cdca3196
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Sep 1 12:39:02 2021 +0200

    fuse: wait for writepages in syncfs

    In case of fuse the MM subsystem doesn't guarantee that page writeback
    completes by the time ->sync_fs() is called.  This is because fuse
    completes page writeback immediately to prevent DoS of memory reclaim by
    the userspace file server.

    This means that fuse itself must ensure that writes are synced before
    sending the SYNCFS request to the server.

    Introduce sync buckets, that hold a counter for the number of outstanding
    write requests.  On syncfs replace the current bucket with a new one and
    wait until the old bucket's counter goes down to zero.

    It is possible to have multiple syncfs calls in parallel, in which case
    there could be more than one waited-on bucket.  Descendant buckets must
    not complete until the parent completes.  Add a count to the child (new)
    bucket until the (parent) old bucket completes.

    Use RCU protection to dereference the current bucket and to wake up an
    emptied bucket.  Use fc->lock to protect against parallel assignments to
    the current bucket.

    This leaves just the counter to be a possible scalability issue.  The
    fc->num_waiting counter has a similar issue, so both should be addressed at
    the same time.

    Reported-by: Amir Goldstein <amir73il@gmail.com>
    Fixes: 2d82ab251e ("virtiofs: propagate sync() to file server")
    Cc: <stable@vger.kernel.org> # v5.14
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2023-04-25 07:42:41 -04:00
Brian Foster be43cb8e11 virtio_fs: Modify format for virtio_fs_direct_access
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2189470
Upstream Status: linux.git

commit 73fb2c8b61783e2e8a87f91d141bf72a12404566
Author: Deming Wang <wangdeming@inspur.com>
Date:   Wed Jun 22 17:17:58 2022 -0400

    virtio_fs: Modify format for virtio_fs_direct_access

    We should isolate operators with spaces.

    Signed-off-by: Deming Wang <wangdeming@inspur.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2023-04-25 07:42:33 -04:00
Brian Foster a120256608 virtiofs: delete unused parameter for virtio_fs_cleanup_vqs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2189470
Upstream Status: linux.git

commit 1e5b9e048cda4d284827d22546a4cb0904689c5d
Author: Deming Wang <wangdeming@inspur.com>
Date:   Thu Jun 9 22:08:38 2022 -0400

    virtiofs: delete unused parameter for virtio_fs_cleanup_vqs

    The fs parameter is not used, so delete it.

    Signed-off-by: Deming Wang <wangdeming@inspur.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2023-04-25 07:42:25 -04:00
Brian Foster 9e3be807f1 virtiofs: use strscpy for copying the queue name
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2189470
Upstream Status: linux.git

commit 7c594bbd2de9f03e7c8d808004045696bc9c1a67
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Tue Nov 2 11:08:19 2021 +0100

    virtiofs: use strscpy for copying the queue name

    Always null terminate fsvq->name.
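
A minimal userspace sketch of the guarantee, assuming a hypothetical helper bounded_copy() that mimics the strscpy() contract (always NUL-terminate, report truncation). It is not the kernel implementation, and it returns -1 where the kernel helper returns -E2BIG:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strscpy(): copies at most size - 1 bytes,
 * always NUL-terminates a non-empty destination, and returns the copied
 * length, or -1 when the source had to be truncated. */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;

	len = strlen(src);
	if (len >= size) {
		/* Source does not fit: keep what fits and terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	/* Source fits, including its terminator. */
	memcpy(dst, src, len + 1);
	return (long)len;
}
```

By contrast, strncpy() leaves the destination unterminated whenever the source is at least as long as the destination buffer.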

    Reported-by: kernel test robot <lkp@intel.com>
    Fixes: b43b7e81eb ("virtiofs: provide a helper function for virtqueue initialization")
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Brian Foster <bfoster@redhat.com>
2023-04-25 07:42:05 -04:00
Jeff Moyer 358fa83614 Merge branch 'main' into 'guilt/pmem-9.2'
Several patches to this file were backported out of order.  The result of this merge resolution matches upstream after the inclusion of all of the patches we have backported.

# Conflicts:
#   fs/iomap/buffered-io.c
2023-03-30 20:35:46 +00:00
Chris von Recklinghausen 76657b6608 fuse: Convert fuse to read_folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit 5efd00e4899e0a9b294b435d7c7bf53f42343e99
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Apr 29 11:12:16 2022 -0400

    fuse: Convert fuse to read_folio

    This is a "weak" conversion which converts straight back to using pages.
    A full conversion should be performed at some point, hopefully by
    someone familiar with the filesystem.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:59 -04:00
Chris von Recklinghausen c1e99d68fb fs: Remove flags parameter from aops->write_begin
Conflicts: drop changes to fs/ntfs3/inode.c, fs/jffs2/file.c -
	unsupported configs

Bugzilla: https://bugzilla.redhat.com/2160210

commit 9d6b0cd7579844761ed68926eb3073bab1dca87b
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Feb 22 14:31:43 2022 -0500

    fs: Remove flags parameter from aops->write_begin

    There are no more aop flags left, so remove the parameter.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:58 -04:00
Chris von Recklinghausen ef6a91bc18 fs: Remove aop flags parameter from grab_cache_page_write_begin()
Conflicts: drop changes to fs/ntfs3/inode.c, fs/jffs2/file.c -
	unsupported configs

Conflicts: drop changes to fs/ntfs3/inode.c - unsupported config

Bugzilla: https://bugzilla.redhat.com/2160210

commit b7446e7cf15f0926866c8e5de90ab278998bf8c8
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Feb 22 11:25:12 2022 -0500

    fs: Remove aop flags parameter from grab_cache_page_write_begin()

    There are no more aop flags left, so remove the parameter.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:58 -04:00
Jeff Moyer dd24bc140d dax: introduce DAX_RECOVERY_WRITE dax access mode
Bugzilla: https://bugzilla.redhat.com/2162211

commit e511c4a3d2a1f64aafc1f5df37a2ffcf7ef91b55
Author: Jane Chu <jane.chu@oracle.com>
Date:   Fri May 13 15:10:58 2022 -0700

    dax: introduce DAX_RECOVERY_WRITE dax access mode
    
    Up till now, dax_direct_access() is used implicitly for normal
    access, but for the purpose of recovery write, dax range with
    poison is requested.  To make the interface clear, introduce
            enum dax_access_mode {
                    DAX_ACCESS,
                    DAX_RECOVERY_WRITE,
            }
    where DAX_ACCESS is used for normal dax access, and
    DAX_RECOVERY_WRITE is used for dax recovery write.
    
    Suggested-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Jane Chu <jane.chu@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Mike Snitzer <snitzer@redhat.com>
    Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
    Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:55:13 -04:00
Jeff Moyer 81e85c3d95 dax: remove the copy_from_iter and copy_to_iter methods
Bugzilla: https://bugzilla.redhat.com/2162211

commit 7ac5360cd4d02cc7e0eaf10867f599e041822f12
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Dec 15 09:45:08 2021 +0100

    dax: remove the copy_from_iter and copy_to_iter methods
    
    These methods indirect the actual DAX read/write path.  In the end pmem
    uses magic flush and mc safe variants and fuse and dcssblk use plain ones
    while device mapper redirects to the underlying device.
    
    Add set_dax_nocache() and set_dax_nomc() APIs to control which copy
    routines are used to remove indirect call from the read/write fast path
    as well as a lot of boilerplate code.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs]
    Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:54:21 -04:00
Jeff Moyer 1352fd9762 dax: remove the DAXDEV_F_SYNC flag
Bugzilla: https://bugzilla.redhat.com/2162211

commit 30c6828a17a572aeb9e3a3bacce05fdcf1106541
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Dec 15 09:45:07 2021 +0100

    dax: remove the DAXDEV_F_SYNC flag
    
    Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
    just let the drivers call set_dax_synchronous directly.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-14 10:54:21 -04:00
Jeff Moyer 1b8a5ffdb9 dax: simplify the dax_device <-> gendisk association
Bugzilla: https://bugzilla.redhat.com/2162211
Conflicts: Minor context differences.

commit fb08a1908cb119a4585611d91461ab6d27756b14
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:38 2021 +0100

    dax: simplify the dax_device <-> gendisk association
    
    Replace the dax_host_hash with an xarray indexed by the pointer value
    of the gendisk, and require explicit calls from the block drivers that
    want to associate their gendisk with a dax_device.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Mike Snitzer <snitzer@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-09 03:47:06 -05:00
Jeff Moyer 0856a93408 dax: remove CONFIG_DAX_DRIVER
Bugzilla: https://bugzilla.redhat.com/2162211

commit afd586f0d06ce3d81b7c474499630fec88833828
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Nov 29 11:21:37 2021 +0100

    dax: remove CONFIG_DAX_DRIVER
    
    CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Link: https://lore.kernel.org/r/20211129102203.2243509-4-hch@lst.de
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2023-03-09 03:46:06 -05:00
Frantisek Hrbata 2bd21972e5 Merge: fuse: add file_modified() to fallocate
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1634

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2112621
Upstream status: Linus

commit 4a6f278d4827b59ba26ceae0ff4529ee826aa258
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Fri Oct 28 14:25:20 2022 +0200

    fuse: add file_modified() to fallocate

    Add missing file_modified() call to fuse_file_fallocate().  Without this
    fallocate on fuse failed to clear privileges.

    Fixes: 05ba1f0823 ("fuse: add FALLOCATE operation")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Approved-by: Ian Kent <ikent@redhat.com>
Approved-by: Brian Foster <bfoster@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-12-06 02:53:16 -05:00
Miklos Szeredi cb3b02e858 fuse: lock inode unconditionally in fuse_fallocate()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2112621
Upstream status: Linus

commit 44361e8cf9ddb23f17bdcc40ca944abf32e83e79
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Nov 23 09:10:42 2022 +0100

    fuse: lock inode unconditionally in fuse_fallocate()
    
    file_modified() must be called with inode lock held.  fuse_fallocate()
    didn't lock the inode in case of just FALLOC_KEEP_SIZE flags value, which
    resulted in a kernel Warning in notify_change().
    
    Lock the inode unconditionally, like all other fallocate implementations
    do.
    
    Reported-by: Pengfei Xu <pengfei.xu@intel.com>
    Reported-and-tested-by: syzbot+462da39f0667b357c4b6@syzkaller.appspotmail.com
    Fixes: 4a6f278d4827 ("fuse: add file_modified() to fallocate")
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-11-29 10:37:12 +01:00
Miklos Szeredi 0d4f37f9d2 fuse: add file_modified() to fallocate
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2112621
Upstream status: Linus

commit 4a6f278d4827b59ba26ceae0ff4529ee826aa258
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Fri Oct 28 14:25:20 2022 +0200

    fuse: add file_modified() to fallocate
    
    Add missing file_modified() call to fuse_file_fallocate().  Without this
    fallocate on fuse failed to clear privileges.
    
    Fixes: 05ba1f0823 ("fuse: add FALLOCATE operation")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-11-15 14:20:04 +01:00
Miklos Szeredi 7923e0792b fuse: fix readdir cache race
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2142657
Upstream status: Linus
Conflicts: context only

commit 9fa248c65bdbf5af0a2f74dd38575acfc8dfd2bf
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Oct 20 17:18:58 2022 +0200

    fuse: fix readdir cache race
    
    There's a race in fuse's readdir cache that can result in an uninitialized
    page being read.  The page lock is supposed to prevent this from happening
    but in the following case it doesn't:
    
    Two fuse_add_dirent_to_cache() start out and get the same parameters
    (size=0,offset=0).  One of them wins the race to create and lock the page,
    after which it fills in data, sets rdc.size and unlocks the page.
    
    In the meantime the page gets evicted from the cache before the other
    instance gets to run.  That one also creates the page, but finds the
    size to be mismatched, bails out and leaves the uninitialized page in the
    cache.
    
    Fix by marking a filled page uptodate and ignoring non-uptodate pages.
    
    Reported-by: Frank Sorenson <fsorenso@redhat.com>
    Fixes: 5d7bc7e868 ("fuse: allow using readdir cache")
    Cc: <stable@vger.kernel.org> # v4.20
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-11-15 11:29:32 +01:00
Jeff Moyer 0cca5b4b0c fs: get rid of the res2 iocb->ki_complete argument
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2107743
Conflicts: There are various context differences from the upstream patch
           due to differences in patch application ordering.  The fscache
	   update changed the calling signature of cachefiles_read/write_
	   complete.  That change also got rid of the ret2 argument from
	   the _enter call, so that piece was dropped from this patch.
	   There was a change in the block layer update that streamlined
	   the dio structure, which results in dio->op being translated
	   to dio_op in this patch.  The write hint support code was also
	   dropped by the block layer update, so there is a context diff
	   in fs.h, where you see ki_ioprio instead of ki_hint.

commit 6b19b766e8f077f29cdb47da5003469a85bbfb9c
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Oct 21 09:22:35 2021 -0600

    fs: get rid of the res2 iocb->ki_complete argument
    
    The second argument was only used by the USB gadget code, yet everyone
    pays the overhead of passing a zero to be passed into aio, where it
    ends up being part of the aio res2 value.
    
    Now that everybody is passing in zero, kill off the extra argument.
    
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
2022-10-27 12:59:04 -04:00
Chris von Recklinghausen 0ccb9258f5 fs: Remove ->readpages address space operation
Bugzilla: https://bugzilla.redhat.com/2120352

commit 704528d895dd3e7b173e672116b4eb2b0a0fceb0
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Mar 23 21:29:04 2022 -0400

    fs: Remove ->readpages address space operation

    All filesystems have now been converted to use ->readahead, so
    remove the ->readpages operation and fix all the comments that
    used to refer to it.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:04 -04:00
Chris von Recklinghausen 77da4a630d fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
Bugzilla: https://bugzilla.redhat.com/2120352

commit 46de8b979492e1377947700ecb1e3169088668b2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:22:13 2022 +0000

    fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio

    This is a mechanical change.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:48 -04:00
Chris von Recklinghausen 65044383cc fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio
Conflicts:
	Drop changes to fs/jfs/jfs_metapage.c - CONFIG_JFS_FS is not set
	fs/ext4/inode.c - Comment block in conflict was first modified
		by
		2bb8dd401a4f ("ext4: warn when dirtying page w/o buffers in data=journal mode")
		and an intermediary form was in merge commit
		9b03992f0c88 ("Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4")
		and that is the form this patch expects.
		Also, change 'WARN_ON_ONCE(!page_has_buffers(page));' to
		WARN_ON_ONCE(!folio_buffers(folio));
	fs/nfs/file.c - We already have
		8786fde8421c ("Convert NFS from readpages to readahead")
		so keep the line that sets .readahead

Bugzilla: https://bugzilla.redhat.com/2120352

commit 187c82cb03808ede4ee6f36aabbeb74213cd4928
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:22:03 2022 +0000

    fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio

    These filesystems use __set_page_dirty_nobuffers() either directly or
    with a very thin wrapper; convert them en masse.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:48 -04:00
Chris von Recklinghausen f32f899795 fuse: Convert from launder_page to launder_folio
Bugzilla: https://bugzilla.redhat.com/2120352

commit 2bf06b8e64280251775011f63d44e7bfc48dbdfd
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:21:56 2022 +0000

    fuse: Convert from launder_page to launder_folio

    Straightforward conversion although the helper functions still assume
    a single page.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:48 -04:00
Chris von Recklinghausen 05678e63b5 fs: Remove noop_invalidatepage()
Bugzilla: https://bugzilla.redhat.com/2120352

commit 5660a8630dab61a28e07ec00c42bf605b182d725
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:21:35 2022 +0000

    fs: Remove noop_invalidatepage()

    We used to have to use noop_invalidatepage() to prevent
    block_invalidatepage() from being called, but that behaviour is now gone.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:48 -04:00
Al Stone 9510a7b9d5 virtio: wrap config->reset calls
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071830
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=2071840
Tested: This is one of a series of patch sets to enable Arm SystemReady IR
 support in the kernel for NXP i.MX8 platforms.  At this stage, this
 has been tested by ensuring we can survive the CI/CD loop -- i.e.,
 that we have not broken anything else, and a simple boot test.  When
 sufficient drivers have been brought in for i.MX8M, we will be able
 to run further tests.

Conflicts:
    drivers/char/hw_random/virtio-rng.c

    Context differences, but the change is a straightforward replacement
    of a function invocation.

    drivers/gpio/gpio-virtio.c
    drivers/i2c/busses/i2c-virtio.c

    These virtio files are not in the current tree.  All of these changes
    have been dropped as a result.

commit d9679d0013a66849f23057978f92e76b255c50aa
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Wed Oct 13 06:55:44 2021 -0400

    virtio: wrap config->reset calls

    This will enable cleanups down the road.
    The idea is to disable cbs, then add "flush_queued_cbs" callback
    as a parameter, this way drivers can flush any work
    queued after callbacks have been disabled.

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    (cherry picked from commit d9679d0013a66849f23057978f92e76b255c50aa)

Signed-off-by: Al Stone <ahs3@redhat.com>
2022-08-25 10:45:04 -06:00
Patrick Talbert 379ca607c0 Merge: mm: folio backports part 2
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1097

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Omitted-fix: a04cd1600b831a16625b45226b90a292c8f6e8d9

This is the second part of folio backports for 9.1. Like the first part, I tried
to avoid touching other subsystems as much as possible. Since folio conversions
leave the original functions as compatibility layer, other teams can bring their
subsystems changes whenever they want.

These are not all folio changes for 9.1 and the work will continue in 9.2.

adb11e78c5dc5 was not backported due to b74355078b not being present
a04cd1600b831 fixes an issue already fixed by ec4858e07ed62eceb, which is strange because ec4858e07ed62eceb was committed earlier

v2:
- added missing fixes and dependencies
- fixed a backport error on "mm/truncate: Split invalidate_inode_page() into mapping_evict_folio()"
- added Conflicts for everything to keep scripts happy

v3:
- included 3ed4bb77156d patchset as requested

v4:
- fixed bisect build problems

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>

Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Lyude Paul <lyude@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>

Conflicts:
- drivers/gpu/drm/drm_cache.c: context differs due to !717.
- drivers/gpu/drm/nouveau/nouveau_dmem.c: context differs due to !717.

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-07-15 10:00:05 +02:00
Patrick Talbert 645c70d483 Merge: virtiofs: Add support for SELinux
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1079

virtiofs: Add support for SELinux
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101526
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>

Patches backported from upstream and these applied with some fuzz but no failures. Built a kernel and tested it with SELinux enabled with virtiofs. Ran a modified version of SELinux test suite to run filesystem tests on virtiofs and it passed.

Approved-by: Miklos Szeredi <mszeredi@redhat.com>
Approved-by: Jeffrey Layton <jlayton@redhat.com>
Approved-by: Benjamin Coddington <bcodding@redhat.com>
Approved-by: Ondrej Mosnáček <omosnacek@gmail.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-07-14 12:07:51 +02:00
Aristeu Rozanski 91c006ec4a mm: don't include <linux/memremap.h> in <linux/mm.h>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
Conflicts: context differences due to missing 730ff52194cdb32 and c4386bd8ee3a92

commit dc90f0846df4870b6cc8528c31e5c60f18fb68be
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Feb 16 15:31:36 2022 +1100

    mm: don't include <linux/memremap.h> in <linux/mm.h>

    Move the check for the actual pgmap types that need the free at refcount
    one behavior into the out of line helper, and thus avoid the need to
    pull memremap.h into mm.h.

    Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

    Cc: Alex Deucher <alexander.deucher@amd.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Ben Skeggs <bskeggs@redhat.com>
    Cc: Chaitanya Kulkarni <kch@nvidia.com>
    Cc: Karol Herbst <kherbst@redhat.com>
    Cc: Lyude Paul <lyude@redhat.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:13 -04:00
Vivek Goyal 6428576133 fuse: send security context of inode on file
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101526

commit 3e2b6fdbdc9ab5a02d9d5676a005f30780b97553
Author: Vivek Goyal <vgoyal@redhat.com>
Date: Thu, 11 Nov 2021 09:32:49 -0500

When a new inode is created, send its security context to server along with
creation request (FUSE_CREATE, FUSE_MKNOD, FUSE_MKDIR and FUSE_SYMLINK).
This gives server an opportunity to create new file and set security
context (possibly atomically).  In all the configurations it might not be
possible to set context atomically.

Like nfs and ceph, use security_dentry_init_security() to determine security
context of inode and send it with create, mkdir, mknod, and symlink
requests.

Following is the information sent to server.

fuse_secctx_header, fuse_secctx, xattr_name, security_context

 - struct fuse_secctx_header
   This contains total number of security contexts being sent and total
   size of all the security contexts (including size of
   fuse_secctx_header).

 - struct fuse_secctx
   This contains size of security context which follows this structure.
   There is one fuse_secctx instance per security context.

 - xattr name string
   This string represents name of xattr which should be used while setting
   security context.

 - security context
   This is the actual security context whose size is specified in
   fuse_secctx struct.
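
The layout above can be sketched in userspace as follows. This is only an illustration: the struct and field names are local assumptions rather than copies of the uapi header, exactly one context is packed, and no alignment padding is modeled.

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of the two structures named above; the field names are
 * assumptions of this sketch. */
struct secctx_header {
	uint32_t size;      /* total bytes, including this header */
	uint32_t nr_secctx; /* number of security contexts that follow */
};

struct secctx {
	uint32_t size;      /* bytes of the security context value */
	uint32_t padding;
};

/* Pack one context as: header, secctx, NUL-terminated xattr name,
 * context value. Returns the bytes written, or 0 if buf is too small. */
static size_t pack_one_secctx(void *buf, size_t buflen, const char *name,
			      const void *ctx, uint32_t ctxlen)
{
	struct secctx_header hdr;
	struct secctx fctx;
	size_t namelen = strlen(name) + 1;
	size_t total = sizeof(hdr) + sizeof(fctx) + namelen + ctxlen;
	unsigned char *p = buf;

	if (total > buflen)
		return 0;

	hdr.size = (uint32_t)total;
	hdr.nr_secctx = 1;
	fctx.size = ctxlen;
	fctx.padding = 0;

	memcpy(p, &hdr, sizeof(hdr));
	p += sizeof(hdr);
	memcpy(p, &fctx, sizeof(fctx));
	p += sizeof(fctx);
	memcpy(p, name, namelen);
	p += namelen;
	memcpy(p, ctx, ctxlen);
	return total;
}
```

A receiver would read the header first, then walk nr_secctx records, using each record's size field to find the end of the context value that follows the name string.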

Also add the FUSE_SECURITY_CTX flag for the `flags` field of the
fuse_init_out struct.  When this flag is set the kernel will append the
security context for a newly created inode to the request (create, mkdir,
mknod, and symlink).  The server is responsible for ensuring that the inode
appears atomically (preferably) with the requested security context.

For example, if the server is using SELinux and backed by a "real" linux
file system that supports extended attributes it can write the security
context value to /proc/thread-self/attr/fscreate before making the syscall
to create the inode.

This patch is based on patch from Chirantan Ekbote <chirantan@chromium.org>

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-06-30 08:52:26 -04:00
Vivek Goyal cb73490675 fuse: extend init flags
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101526

commit 53db28933e952a8536b002ba8b8c9443ccc0e939
Author: Miklos Szeredi <mszeredi@redhat.com>
Date: Thu, 25 Nov 2021 14:05:18 +0100

FUSE_INIT flags are close to running out, so add another 32bits worth of
space.

Add FUSE_INIT_EXT flag to the old flags field in fuse_init_in.  If this
flag is set, then fuse_init_in is extended by 48bytes, in which a flags_hi
field is allocated to contain the high 32bits of the flags.

A flags_hi field is also added to fuse_init_out, allocated out of the
remaining unused fields.
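
A minimal sketch of how a 64-bit feature mask could be carried in the two 32-bit fields. The bit position used for the EXT flag and the struct name are assumptions of this sketch; the real values should be taken from the uapi header.

```c
#include <stdint.h>

/* Bit position is an assumption of this sketch. */
#define SKETCH_INIT_EXT (1u << 30)

struct init_flags {
	uint32_t flags;    /* original 32-bit field */
	uint32_t flags_hi; /* high 32 bits, valid only with SKETCH_INIT_EXT */
};

/* Split a 64-bit feature mask into the two 32-bit wire fields and
 * announce the extension in the old field. */
static struct init_flags split_flags(uint64_t wanted)
{
	struct init_flags f;

	f.flags = (uint32_t)wanted | SKETCH_INIT_EXT;
	f.flags_hi = (uint32_t)(wanted >> 32);
	return f;
}

/* Recombine; the high word is ignored if the sender did not set the
 * EXT bit, which is how an old peer would be treated. */
static uint64_t join_flags(struct init_flags f)
{
	uint64_t all = f.flags;

	if (f.flags & SKETCH_INIT_EXT)
		all |= (uint64_t)f.flags_hi << 32;
	return all;
}
```

Gating the high word on the EXT bit keeps a peer that never heard of the extension from having stale or garbage bytes interpreted as feature flags.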

Known userspace implementations of the fuse protocol have been checked to
accept the extended FUSE_INIT request, but this might cause problems with
other implementations.  If that happens to be the case, the protocol
negotiation will have to be extended with an extra initialization request
roundtrip.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2022-06-30 08:52:25 -04:00
Vivek Goyal 4d6cff9117 fuse: add FOPEN_NOFLUSH
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2101526

commit a390ccb316beb8ea594b8695d53926710ca454a3
Author: Amir Goldstein <amir73il@gmail.com>
Date: Sun, 24 Oct 2021 16:26:07 +0300

Add flag returned by FUSE_OPEN and FUSE_CREATE requests to avoid flushing
data cache on close.

Different filesystems implement ->flush() in different ways:
 - Most disk filesystems do not implement ->flush() at all
 - Some network filesystems (e.g. nfs) flush local write cache of
   FMODE_WRITE file and send a "flush" command to server
 - Some network filesystems (e.g. cifs) flush local write cache of
   FMODE_WRITE file without sending an additional command to server

FUSE flushes local write cache of ANY file, even non FMODE_WRITE
and sends a "flush" command to server (if server implements it).

The FUSE implementation of ->flush() seems overly aggressive and
arbitrary and does not make a lot of sense when writeback caching is
disabled.

Instead of deciding on another arbitrary implementation that makes
sense, leave the choice of per-file flush behavior in the hands of
the server.
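
The resulting decision can be sketched as a single check on the flags the server returned at open or create time. The bit value and the helper name are assumptions of this sketch, not the kernel's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value is an assumption of this sketch; check the uapi header. */
#define SKETCH_FOPEN_NOFLUSH (1u << 5)

/* Decide on close whether the local cache is flushed and a "flush"
 * request is sent. open_flags is the per-file value from the server. */
static bool should_flush_on_close(uint32_t open_flags)
{
	return !(open_flags & SKETCH_FOPEN_NOFLUSH);
}
```

Because the flag travels with each open reply, a server can opt out of flushing for some files and keep the old behavior for others.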

Link: https://lore.kernel.org/linux-fsdevel/CAJfpegspE8e6aKd47uZtSYX8Y-1e1FWS0VL0DH2Skb9gQP5RJQ@mail.gmail.com/
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2022-06-30 08:52:25 -04:00
Miklos Szeredi d40487f7fc fuse: clean up error exits in fuse_fill_super()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit 964d32e512670c7b87870e30cfed2303da86d614
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Oct 21 10:01:39 2021 +0200

    fuse: clean up error exits in fuse_fill_super()
    
    Instead of "goto err", return error directly, since there's no error
    cleanup to do now.
    
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:45 +02:00
Miklos Szeredi f9652e3473 fuse: always initialize sb->s_fs_info
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit 80019f1138324b6f35ae728b4f25eeb08899b452
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Oct 21 10:01:39 2021 +0200

    fuse: always initialize sb->s_fs_info
    
    Syzkaller reports a null pointer dereference in fuse_test_super() that is
    caused by sb->s_fs_info being NULL.
    
    This is due to the fact that fuse_fill_super() is initializing s_fs_info,
    which is too late: it's already on the fs_supers list.  The initialization
    needs to be done in sget_fc() with the sb_lock held.
    
    Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into
    fuse_get_tree().
    
    After this ->kill_sb() will always be called with non-NULL ->s_fs_info,
    hence fuse_mount_destroy() can drop the test for non-NULL "fm".
    
    Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com
    Fixes: 5d5b74aa9c76 ("fuse: allow sharing existing sb")
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:45 +02:00
Miklos Szeredi 054f8ed2cd fuse: clean up fuse_mount destruction
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit c191cd07ee948c93081d8e4cba43d23b18b2f3da
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Oct 21 10:01:39 2021 +0200

    fuse: clean up fuse_mount destruction
    
    1. call fuse_mount_destroy() for open coded variants
    
    2. before deactivate_locked_super() don't need fuse_mount destruction since
    that will now be done (if ->s_fs_info is not cleared)
    
    3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the
    regular pattern can be used
    
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:45 +02:00
Miklos Szeredi 6b7aef450b fuse: get rid of fuse_put_super()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit a27c061a49afd7ad2d935e6ac734e2a9f62861b8
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Oct 21 10:01:38 2021 +0200

    fuse: get rid of fuse_put_super()
    
    The ->put_super callback is called from generic_shutdown_super() in case of
    a fully initialized sb.  This is called from kill_***_super(), which is
    called from ->kill_sb instances.
    
    Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the
    reference to the fuse_conn, while it does the same on each error case
    during sb setup.
    
    This patch moves the destruction from fuse_put_super() to
    fuse_mount_destroy(), called at the end of all ->kill_sb instances.  A
    followup patch will clean up the error paths.
    
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:45 +02:00
Miklos Szeredi d25b678d05 fuse: check s_root when destroying sb
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit d534d31d6a45d71de61db22090b4820afb68fddc
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Oct 21 10:01:38 2021 +0200

    fuse: check s_root when destroying sb
    
    Checking "fm" works because currently sb->s_fs_info is cleared on error
    paths; however, sb->s_root is what generic_shutdown_super() checks to
    determine whether the sb was fully initialized or not.
    
    This change will allow cleanup of sb setup error paths.
    
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:44 +02:00
Miklos Szeredi 044b67fbbf fuse: allow sharing existing sb
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit 5d5b74aa9c766f0dd37d5cc1a2a7a94586130501
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Aug 5 05:57:27 2021 +0200

    fuse: allow sharing existing sb
    
    Make it possible to create a new mount from an already working server.
    
    Here's a detailed description of the problem from Jakob:
    
      "The background for this question is occasional problems we see with our
       fuse filesystem [1] and mount namespaces. On a usual client, we have
       system-wide, autofs managed mountpoints. When a new mount namespace is
       created (which can be done unprivileged in combination with user
       namespaces), it can happen that a mountpoint is used inside the new
       namespace but idle in the root mount namespace. So autofs unmounts the
       parent, system-wide mountpoint. But the fuse module stays active and
       still serves mountpoint in the child mount namespace. Because the fuse
       daemon also blocks other system wide resources corresponding to the
       mountpoint, this situation effectively prevents new mounts until the
       child mount namespaces closes.
    
       [1] https://github.com/cvmfs/cvmfs"
    
    Reported-by: Jakob Blomer <jblomer@cern.ch>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:44 +02:00
Miklos Szeredi f8130bb61f fuse: move fget() to fuse_get_tree()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit 62dd1fc8cc6b22e3e568be46ebdb817e66f5d6a5
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Thu Aug 5 05:57:27 2021 +0200

    fuse: move fget() to fuse_get_tree()
    
    Affected call chains:
    
    fuse_get_tree
       -> get_tree_(bdev|nodev)
          -> fuse_fill_super
    
    Needed for following patch.
    
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:44 +02:00
Miklos Szeredi 791e7f6911 fuse: move option checking into fuse_fill_super()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit badc741459f42f51e244533ce1df1cd9ac5ac6d7
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Aug 4 13:22:58 2021 +0200

    fuse: move option checking into fuse_fill_super()
    
    Checking whether the "fd=", "rootmode=", "user_id=" and "group_id=" mount
    options are present can be moved from fuse_get_tree() into
    fuse_fill_super() where the values of the options are consumed.
    
    This relaxes semantics of reusing a fuse blockdev mount using the device
    name.  Before this patch presence of these options was enforced but values
    ignored, after this patch these options are completely ignored in this
    case.
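
The change in semantics can be sketched in userspace C as follows. This is
only an illustration under assumptions: the struct, its field names and both
function names are hypothetical and are not taken from fs/fuse/.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical record of which mount options were seen while parsing. */
struct example_mount_opts {
    bool fd_present;
    bool rootmode_present;
    bool user_id_present;
    bool group_id_present;
};

/*
 * Fresh mount: every option is mandatory, because its value is consumed
 * while the superblock is being filled in.
 */
int example_fill_super_check(const struct example_mount_opts *o)
{
    if (!o->fd_present || !o->rootmode_present ||
        !o->user_id_present || !o->group_id_present)
        return -EINVAL;
    return 0;
}

/*
 * Reusing an existing blockdev superblock: the fill step never runs, so
 * with the check moved there the options are neither required nor read.
 */
int example_get_tree(const struct example_mount_opts *o, bool reuse_existing_sb)
{
    if (reuse_existing_sb)
        return 0;
    return example_fill_super_check(o);
}
```

With the check living in the fill step, a reuse mount that passes none of the
four options succeeds in this model, while a fresh mount without them still
fails with -EINVAL.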
    
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:44 +02:00
Miklos Szeredi 293ac5349d fuse: name fs_context consistently
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2021800
Upstream status: Linus
Testing: reproducer in bugzilla

commit 84c215075b5723ab946708a6c74c26bd3c51114c
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Wed Aug 4 13:22:58 2021 +0200

    fuse: name fs_context consistently
    
    Naming convention under fs/fuse/:
    
            struct fuse_conn *fc;
            struct fs_context *fsc;
    
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-05-26 15:34:44 +02:00
Patrick Talbert 0e06ec8e0e Merge: mm: Optimize list lru memory consumption
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/690

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2013413
Omitted-fix: b9663a6ff828 ("tools: Add kmem_cache_alloc_lru()")
	The tools/include/linux/slab.h and the radix-tree tests have not
	been merged into CS9 yet.

This MR backports the upstream patch series "Optimize list lru memory
consumption" to reduce memory consumption of kmalloc-32 slab cache
on systems with a large number of memory cgroups (containers). In the
extreme case, this patch series can save GBs of memory.

Signed-off-by: Waiman Long <longman@redhat.com>
~~~
Waiman Long (26):
  Compiler Attributes: add __alloc_size() for better bounds checking
  slab: clean up function prototypes
  slab: add __alloc_size attributes for better bounds checking
  mm/list_lru.c: prefer struct_size over open coded arithmetic
  memcg, kmem: further deprecate kmem.limit_in_bytes
  mm: list_lru: remove holding lru lock
  mm: list_lru: fix the return value of list_lru_count_one()
  mm: list_lru: only add memcg-aware lrus to the global lru list
  memcg: add per-memcg vmalloc stat
  memcg: add per-memcg total kernel memory stat
  mm: list_lru: transpose the array of per-node per-memcg lru lists
  mm: introduce kmem_cache_alloc_lru
  fs: introduce alloc_inode_sb() to allocate filesystems specific inode
  fs: allocate inode by using alloc_inode_sb()
  mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
  xarray: use kmem_cache_alloc_lru to allocate xa_node
  mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online()
  mm: list_lru: allocate list_lru_one only when needed
  mm: list_lru: rename memcg_drain_all_list_lrus to
    memcg_reparent_list_lrus
  mm: list_lru: replace linear array with xarray
  mm: memcontrol: reuse memory cgroup ID for kmem ID
  mm: memcontrol: fix cannot alloc the maximum memcg ID
  mm: list_lru: rename list_lru_per_memcg to list_lru_memcg
  mm: memcontrol: rename memcg_cache_id to memcg_kmem_id
  slab: remove __alloc_size attribute from __kmalloc_track_caller
  NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation

 .../admin-guide/cgroup-v1/memory.rst          |  11 +-
 Documentation/admin-guide/cgroup-v2.rst       |   8 +
 Documentation/filesystems/porting.rst         |   6 +
 Makefile                                      |  15 +
 block/bdev.c                                  |   2 +-
 drivers/dax/super.c                           |   2 +-
 fs/adfs/super.c                               |   2 +-
 fs/affs/super.c                               |   2 +-
 fs/afs/super.c                                |   2 +-
 fs/befs/linuxvfs.c                            |   2 +-
 fs/bfs/inode.c                                |   2 +-
 fs/btrfs/inode.c                              |   2 +-
 fs/ceph/inode.c                               |   2 +-
 fs/cifs/cifsfs.c                              |   2 +-
 fs/coda/inode.c                               |   2 +-
 fs/dcache.c                                   |   3 +-
 fs/ecryptfs/super.c                           |   2 +-
 fs/efs/super.c                                |   2 +-
 fs/erofs/super.c                              |   2 +-
 fs/exfat/super.c                              |   2 +-
 fs/ext2/super.c                               |   2 +-
 fs/ext4/super.c                               |   2 +-
 fs/fat/inode.c                                |   2 +-
 fs/freevxfs/vxfs_super.c                      |   2 +-
 fs/fuse/inode.c                               |   2 +-
 fs/gfs2/super.c                               |   2 +-
 fs/hfs/super.c                                |   2 +-
 fs/hfsplus/super.c                            |   2 +-
 fs/hostfs/hostfs_kern.c                       |   2 +-
 fs/hpfs/super.c                               |   2 +-
 fs/hugetlbfs/inode.c                          |   2 +-
 fs/inode.c                                    |   2 +-
 fs/isofs/inode.c                              |   2 +-
 fs/jffs2/super.c                              |   2 +-
 fs/jfs/super.c                                |   2 +-
 fs/minix/inode.c                              |   2 +-
 fs/nfs/inode.c                                |   2 +-
 fs/nfs/nfs42xattr.c                           |   2 +-
 fs/nilfs2/super.c                             |   2 +-
 fs/ntfs/inode.c                               |   2 +-
 fs/ocfs2/dlmfs/dlmfs.c                        |   2 +-
 fs/ocfs2/super.c                              |   2 +-
 fs/openpromfs/inode.c                         |   2 +-
 fs/orangefs/super.c                           |   2 +-
 fs/overlayfs/super.c                          |   2 +-
 fs/proc/inode.c                               |   2 +-
 fs/qnx4/inode.c                               |   2 +-
 fs/qnx6/inode.c                               |   2 +-
 fs/reiserfs/super.c                           |   2 +-
 fs/romfs/super.c                              |   2 +-
 fs/squashfs/super.c                           |   2 +-
 fs/sysv/inode.c                               |   2 +-
 fs/ubifs/super.c                              |   2 +-
 fs/udf/super.c                                |   2 +-
 fs/ufs/super.c                                |   2 +-
 fs/vboxsf/super.c                             |   2 +-
 fs/xfs/xfs_icache.c                           |   2 +-
 fs/zonefs/super.c                             |   2 +-
 include/linux/compiler-gcc.h                  |   8 +
 include/linux/compiler_attributes.h           |  10 +
 include/linux/compiler_types.h                |  12 +
 include/linux/fs.h                            |  11 +
 include/linux/list_lru.h                      |  17 +-
 include/linux/memcontrol.h                    |  63 ++-
 include/linux/slab.h                          | 101 ++--
 include/linux/swap.h                          |   5 +-
 include/linux/xarray.h                        |   9 +-
 ipc/mqueue.c                                  |   2 +-
 lib/xarray.c                                  |  10 +-
 mm/list_lru.c                                 | 464 ++++++++----------
 mm/memcontrol.c                               | 213 ++------
 mm/shmem.c                                    |   2 +-
 mm/slab.c                                     |  39 +-
 mm/slab.h                                     |  25 +-
 mm/slob.c                                     |   6 +
 mm/slub.c                                     |  42 +-
 mm/vmalloc.c                                  |  13 +-
 mm/workingset.c                               |   2 +-
 net/socket.c                                  |   2 +-
 net/sunrpc/rpc_pipe.c                         |   2 +-
 scripts/checkpatch.pl                         |   3 +-
 81 files changed, 600 insertions(+), 610 deletions(-)

Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Aristeu Rozanski <arozansk@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-05-09 09:48:03 +02:00
Waiman Long bda0da4d09 fs: allocate inode by using alloc_inode_sb()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2013413
Conflicts:
 1) Merge conflict in fs/xfs/xfs_icache.c due to missing upstream commit
    182696fb021f ("xfs: rename _zone variables to _cache").
 2) The hunk for fs/9p/vfs_inode.c is dropped due to merge conflict and
    9P filesystem not supported in RHEL9.
 3) The hunk for fs/ntfs3/super.c is dropped due to file not currently
    present.

commit fd60b28842df833477c42da6a6d63d0d114a5fcc
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Tue, 22 Mar 2022 14:41:03 -0700

    fs: allocate inode by using alloc_inode_sb()

    The inode allocation is supposed to use alloc_inode_sb(), so convert
    kmem_cache_alloc() of all filesystems to alloc_inode_sb().

    Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Acked-by: Theodore Ts'o <tytso@mit.edu>         [ext4]
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Alex Shi <alexs@kernel.org>
    Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Fam Zheng <fam.zheng@bytedance.com>
    Cc: Jaegeuk Kim <jaegeuk@kernel.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Kari Argillander <kari.argillander@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Qi Zheng <zhengqi.arch@bytedance.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Wei Yang <richard.weiyang@gmail.com>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2022-04-07 14:11:13 -04:00
Miklos Szeredi 06c9ee83ae fuse: fix pipe buffer lifetime for direct_io
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2064862
Upstream status: Linus
CVE: CVE-2022-1011

commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Mon Mar 7 16:30:44 2022 +0100

    fuse: fix pipe buffer lifetime for direct_io
    
    In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
    fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
    imports the write buffer with fuse_get_user_pages(), which uses
    iov_iter_get_pages() to grab references to userspace pages instead of
    actually copying memory.
    
    On the filesystem device side, these pages can then either be read to
    userspace (via fuse_dev_read()), or splice()d over into a pipe using
    fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
    
    This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
    the userspace filesystem can mark the request as completed, causing write()
    to return. At that point, the userspace filesystem should no longer have
    access to the pipe buffer.
    
    Fix by copying pages coming from the user address space to new pipe
    buffers.
    
    Reported-by: Jann Horn <jannh@google.com>
    Fixes: c3021629a0 ("fuse: support splice() reading from fuse device")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-03-18 15:43:40 +01:00
Andreas Gruenbacher 6368967997 iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable
Bugzilla: https://bugzilla.redhat.com/1958140
Tested: xfstests, QE tests
Upstream Status: upstream

commit a6294593e8a1290091d0b078d5d33da5e0cd3dfe
Author: Andreas Gruenbacher <agruenba@redhat.com>
Date:   Mon Aug 2 14:54:16 2021 +0200

    iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable

    Turn iov_iter_fault_in_readable into a function that returns the number
    of bytes not faulted in, similar to copy_to_user, instead of returning a
    non-zero value when any of the requested pages couldn't be faulted in.
    This supports the existing users that require all pages to be faulted in
    as well as new users that are happy if any pages can be faulted in.

    Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
    sure this change doesn't silently break things.
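
A minimal userspace sketch of the new return convention, assuming a made-up
model in which only the first `accessible` bytes of a buffer can be faulted
in. All names below are hypothetical; the real function operates on a
struct iov_iter inside the kernel.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the new convention: try to make `len` bytes
 * readable and return the number of bytes that could NOT be faulted in
 * (0 means complete success), in the style of copy_to_user().
 */
size_t example_fault_in_readable(size_t len, size_t accessible)
{
    size_t faulted = len < accessible ? len : accessible;

    return len - faulted;
}

/* Existing kind of caller: any shortfall is treated as a failure. */
int example_caller_needs_all(size_t len, size_t accessible)
{
    if (example_fault_in_readable(len, accessible) != 0)
        return -EFAULT;
    return 0;
}

/* New kind of caller: partial progress is fine, only zero progress fails. */
long example_caller_accepts_partial(size_t len, size_t accessible)
{
    size_t left = example_fault_in_readable(len, accessible);

    if (len != 0 && left == len)
        return -EFAULT;
    return (long)(len - left);
}
```

Both caller styles are served by the same return value: the first compares it
with zero, the second compares it with the requested length.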

    [Adjusted to remove ntfs3 bits because ntfs3 doesn't exist in RHEL9]

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-03 14:36:01 +01:00
Dan Williams 96dcb97d0a Merge branch 'for-5.14/dax' into libnvdimm-fixes
Pick up some small dax cleanups that make some of Ira's follow on work
easier.
2021-08-11 12:04:43 -07:00
Ira Weiny 2e29be2e49 fs/fuse: Remove unneeded kaddr parameter
fuse_dax_mem_range_init() does not need the address or the pfn of the
memory requested in dax_direct_access().  It is only calling direct
access to get the number of pages.

Remove the unused variables and stop requesting the kaddr and pfn from
dax_direct_access().

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20210525172428.3634316-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-07-07 22:10:03 -07:00
Linus Torvalds 8e4f3e1517 fuse update for 5.14
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYORvYQAKCRDh3BK/laaZ
 PCfvAQCbU+PW2RbwlqjZMet6w9qorh29XYe786P5pNRVbMYCygD+N45l66Sbd/Rz
 7M7ioVDseyTW4dnLhb8SzSNB0zr6jQs=
 =MDvD
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Fixes for virtiofs submounts

 - Misc fixes and cleanups

* tag 'fuse-update-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  virtiofs: Fix spelling mistakes
  fuse: use DIV_ROUND_UP helper macro for calculations
  fuse: fix illegal access to inode with reused nodeid
  fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)
  fuse: Make fuse_fill_super_submount() static
  fuse: Switch to fc_mount() for submounts
  fuse: Call vfs_get_tree() for submounts
  fuse: add dedicated filesystem context ops for submounts
  virtiofs: propagate sync() to file server
  fuse: reject internal errno
  fuse: check connected before queueing on fpq->io
  fuse: ignore PG_workingset after stealing
  fuse: Fix infinite loop in sget_fc()
  fuse: Fix crash if superblock of submount gets killed early
  fuse: Fix crash in fuse_dentry_automount() error path
2021-07-06 11:17:41 -07:00
Linus Torvalds d3acb15a3a Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter updates from Al Viro:
 "iov_iter cleanups and fixes.

  There are followups, but this is what had sat in -next this cycle. IMO
  the macro forest in there became much thinner and easier to follow..."

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller
  clean up copy_mc_pipe_to_iter()
  pipe_zero(): we don't need no stinkin' kmap_atomic()...
  iov_iter: clean csum_and_copy_...() primitives up a bit
  copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases
  copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases
  iterate_xarray(): only of the first iteration we might get offset != 0
  pull handling of ->iov_offset into iterate_{iovec,bvec,xarray}
  iov_iter: make iterator callbacks use base and len instead of iovec
  iov_iter: make the amount already copied available to iterator callbacks
  iov_iter: get rid of separate bvec and xarray callbacks
  iov_iter: teach iterate_{bvec,xarray}() about possible short copies
  iterate_bvec(): expand bvec.h macro forest, massage a bit
  iov_iter: unify iterate_iovec and iterate_kvec
  iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec
  iterate_and_advance(): get rid of magic in case when n is 0
  csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter()
  iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant
  [xarray] iov_iter_npages(): just use DIV_ROUND_UP()
  iov_iter_npages(): don't bother with iterate_all_kinds()
  ...
2021-07-03 11:30:04 -07:00
Matthew Wilcox (Oracle) 3a6b216200 mm: move page dirtying prototypes from mm.h
These functions implement the address_space ->set_page_dirty operation and
should live in pagemap.h, not mm.h so that the rest of the kernel doesn't
get funny ideas about calling them directly.

Link: https://lkml.kernel.org/r/20210615162342.1669332-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:48 -07:00
Matthew Wilcox (Oracle) b82a96c925 fs: remove noop_set_page_dirty()
Use __set_page_dirty_no_writeback() instead.  This will set the dirty bit
on the page, which will be used to avoid calling set_page_dirty() in the
future.  It will have no effect on actually writing the page back, as the
pages are not on any LRU lists.

[akpm@linux-foundation.org: export __set_page_dirty_no_writeback() to modules]

Link: https://lkml.kernel.org/r/20210615162342.1669332-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:48 -07:00
Zheng Yongjun c4e0cd4e0c virtiofs: Fix spelling mistakes
Fix some spelling mistakes in comments:
refernce  ==> reference
happnes  ==> happens
threhold  ==> threshold
splitted  ==> split
mached  ==> matched

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:36 +02:00
Wu Bo 6c88632be3 fuse: use DIV_ROUND_UP helper macro for calculations
Replace open coded divisor calculations with the DIV_ROUND_UP kernel macro
for better readability.
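
For illustration only (this is not the patch itself), the macro below has the
same shape as the kernel's DIV_ROUND_UP(). The two helper functions and their
names are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel macro: integer division, rounded up. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Open-coded form of the kind such a cleanup replaces. */
size_t pages_needed_open_coded(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Equivalent form using the helper macro. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return DIV_ROUND_UP(bytes, page_size);
}
```

Both forms compute the same value; the macro just names the intent.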

Signed-off-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:36 +02:00
Amir Goldstein 15db16837a fuse: fix illegal access to inode with reused nodeid
Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
with outarg containing nodeid and generation.

If a fuse inode is found in inode cache with the same nodeid but different
generation, the existing fuse inode should be unhashed and marked "bad" and
a new inode with the new generation should be hashed instead.
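
The rule can be sketched with a toy direct-mapped cache in userspace C. This
is an assumption-laden model, not the kernel's inode hash: every type and
function name here is hypothetical, and the caller supplies storage for the
fresh inode to keep the example free of allocation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_CACHE_SLOTS 8

struct example_inode {
    uint64_t nodeid;
    uint64_t generation;
    bool hashed;   /* still reachable through the cache */
    bool bad;      /* marked unusable after its nodeid was reused */
};

struct example_cache {
    struct example_inode *slot[EXAMPLE_CACHE_SLOTS];
};

/*
 * Look up (nodeid, generation). A cached inode is reused only if both
 * match. Same nodeid with a different generation means the server recycled
 * the id: the stale inode is unhashed and marked bad, and `fresh` is
 * hashed in its place.
 */
struct example_inode *example_iget(struct example_cache *c,
                                   struct example_inode *fresh,
                                   uint64_t nodeid, uint64_t generation)
{
    size_t i = (size_t)(nodeid % EXAMPLE_CACHE_SLOTS);
    struct example_inode *old = c->slot[i];

    if (old && old->nodeid == nodeid) {
        if (old->generation == generation)
            return old;
        old->bad = true;
    }
    if (old)
        old->hashed = false;   /* evicted from this toy cache */

    fresh->nodeid = nodeid;
    fresh->generation = generation;
    fresh->hashed = true;
    fresh->bad = false;
    c->slot[i] = fresh;
    return fresh;
}
```

A dentry still holding the old inode would then see it flagged as bad instead
of silently reaching the new object.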

This can happen, for example, with passthrough fuse filesystem that returns
the real filesystem ino/generation on lookup and where real inode numbers
can get recycled due to real files being unlinked not via the fuse
passthrough filesystem.

With current code, this situation will not be detected and an old fuse
dentry that used to point to an older generation real inode, can be used to
access a completely new inode, which should be accessed only via the new
dentry.

Note that because the FORGET message carries the nodeid w/o generation, the
server should wait to get FORGET counts for the nlookup counts of the old
and reused inodes combined, before it can free the resources associated to
that nodeid.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:36 +02:00
Richard W.M. Jones 6b1bdb56b1 fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)
The current fuse module filters out fallocate(FALLOC_FL_ZERO_RANGE)
returning -EOPNOTSUPP.  libnbd's nbdfuse would like to translate
FALLOC_FL_ZERO_RANGE requests into the NBD command
NBD_CMD_WRITE_ZEROES which allows NBD servers that support it to do
zeroing efficiently.

This commit treats this flag exactly like FALLOC_FL_PUNCH_HOLE.
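
As a rough sketch of what such a mode filter looks like, assuming the flag
constants from <linux/falloc.h>: the function name is hypothetical and the
code is a userspace model, not the code in fs/fuse/.

```c
#include <errno.h>
#include <linux/falloc.h>

/*
 * Accept only the modes the filesystem is prepared to forward to the
 * server. FALLOC_FL_ZERO_RANGE is allowed alongside FALLOC_FL_PUNCH_HOLE;
 * anything else is rejected up front.
 */
int example_check_fallocate_mode(int mode)
{
    const int supported = FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
                          FALLOC_FL_ZERO_RANGE;

    if (mode & ~supported)
        return -EOPNOTSUPP;
    return 0;
}
```

In this model a request such as FALLOC_FL_COLLAPSE_RANGE is still refused,
while zero-range requests pass the filter and can reach the server.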

A way to test this, requiring fuse >= 3, nbdkit >= 1.8 and the latest
nbdfuse from https://gitlab.com/nbdkit/libnbd/-/tree/master/fuse is to
create a file containing some data and "mirror" it to a fuse file:

  $ dd if=/dev/urandom of=disk.img bs=1M count=1
  $ nbdkit file disk.img
  $ touch mirror.img
  $ nbdfuse mirror.img nbd://localhost &

(mirror.img -> nbdfuse -> NBD over loopback -> nbdkit -> disk.img)

You can then run commands such as:

  $ fallocate -z -o 1024 -l 1024 mirror.img

and check that the content of the original file ("disk.img") stays
synchronized.  To show NBD commands, export LIBNBD_DEBUG=1 before
running nbdfuse.  To clean up:

  $ fusermount3 -u mirror.img
  $ killall nbdkit

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:36 +02:00
Greg Kurz 1b53991737 fuse: Make fuse_fill_super_submount() static
This function used to be called from fuse_dentry_automount(). This code
was moved to fuse_get_tree_submount() in the same file since then. It
is unlikely there will ever be another user. No need to be extern in
this case.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:35 +02:00
Greg Kurz 29e0e4df9d fuse: Switch to fc_mount() for submounts
fc_mount() already handles the vfs_get_tree(), sb->s_umount
unlocking and vfs_create_mount() sequence. Using it greatly
simplifies fuse_dentry_automount().

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:35 +02:00
Greg Kurz 266eb3f2fa fuse: Call vfs_get_tree() for submounts
We recently fixed an infinite loop by setting the SB_BORN flag on
submounts along with the write barrier needed by super_cache_count().
This is the job of vfs_get_tree() and FUSE shouldn't have to care
about the barrier at all.

Split out some code from fuse_dentry_automount() to the dedicated
fuse_get_tree_submount() handler for submounts and call vfs_get_tree().

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:35 +02:00
Greg Kurz fe0a7bd81b fuse: add dedicated filesystem context ops for submounts
The creation of a submount is open-coded in fuse_dentry_automount().
This brings a lot of complexity and we recently had to fix bugs
because we weren't setting SB_BORN or because we were unlocking
sb->s_umount before sb was fully configured. Most of these could
have been avoided by using the mount API instead of open-coding.

Basically, this means coming up with a proper ->get_tree()
implementation for submounts and call vfs_get_tree(), or better
fc_mount().

The creation of the superblock for submounts is quite different from
the root mount. Especially, it doesn't require to allocate a FUSE
filesystem context, nor to parse parameters.

Introduce a dedicated context ops for submounts to make this clear.
This is just a placeholder for now, fuse_get_tree_submount() will
be populated in a subsequent patch.

Only visible change is that we stop allocating/freeing a useless FUSE
filesystem context with submounts.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:35 +02:00
Greg Kurz 2d82ab251e virtiofs: propagate sync() to file server
Even if POSIX doesn't mandate it, Linux users legitimately expect sync() to
flush all data and metadata to physical storage when it is located on the
same system.  This isn't happening with virtiofs though: sync() inside the
guest returns right away even though data still needs to be flushed from
the host page cache.

This is easily demonstrated by doing the following in the guest:

$ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
sync()                                  = 0 <0.024068>

and start the following in the host when the 'dd' command completes
in the guest:

$ strace -T -e fsync /usr/bin/sync virtiofs/foo
fsync(3)                                = 0 <10.371640>

There are no good reasons not to honor the expected behavior of sync()
actually: it gives an unrealistic impression that virtiofs is super fast
and that data has safely landed on HW, which isn't the case obviously.

Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS
request type for this purpose.  Provision a 64-bit placeholder for possible
future extensions.  Since the file server cannot handle the wait == 0 case,
we skip it to avoid a gratuitous roundtrip.  Note that this is
per-superblock: a FUSE_SYNCFS is sent for the root mount and for each
submount.

Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in
the file server is treated as permanent success.  This ensures
compatibility with older file servers: the client will get the current
behavior of sync() not being propagated to the file server.
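
The fallback described above can be modelled in userspace C roughly like
this. The struct, its fields, the callback type and the two sample servers
are all hypothetical names chosen for the sketch.

```c
#include <errno.h>
#include <stdbool.h>

struct example_conn {
    bool sync_fs;        /* feature enabled for this connection */
    bool no_syncfs;      /* server reported that it lacks the request */
    int requests_sent;   /* roundtrips actually made */
};

/* Hypothetical transport: returns 0 or a negative errno from the server. */
typedef int (*example_send_fn)(void);

int example_server_without_syncfs(void) { return -ENOSYS; }
int example_server_with_syncfs(void) { return 0; }

int example_sync_fs(struct example_conn *c, int wait, example_send_fn send)
{
    int err;

    if (!wait)                       /* skip the gratuitous roundtrip */
        return 0;
    if (!c->sync_fs || c->no_syncfs)
        return 0;

    c->requests_sent++;
    err = send();
    if (err == -ENOSYS) {
        c->no_syncfs = true;         /* remember: permanent success */
        err = 0;
    }
    return err;
}
```

After the first -ENOSYS reply the flag short-circuits every later call, so an
old server costs one failed roundtrip per connection in this model.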

Note that such an operation allows the file server to DoS sync().  Since a
typical FUSE file server is an untrusted piece of software running in
userspace, this is disabled by default.  Only enable it with virtiofs for
now since virtiofsd is supposedly trusted by the guest kernel.

Reported-by: Robert Krawitz <rlk@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:35 +02:00
Miklos Szeredi 49221cf86d fuse: reject internal errno
Don't allow userspace to report errors that could be kernel-internal.
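
A small sketch of such a range check, relying on the fact that
kernel-internal error codes start at 512 (ERESTARTSYS). The macro and
function names are hypothetical, and this is a userspace model rather than
the kernel code.

```c
#include <stdbool.h>

/* First errno value reserved for kernel-internal use (ERESTARTSYS). */
#define EXAMPLE_INTERNAL_ERRNO_START 512

/*
 * A reply header from the userspace server carries either 0 or a negative
 * errno. Positive values and values in the internal range are rejected.
 */
bool example_reply_error_is_valid(int error)
{
    return error <= 0 && error > -EXAMPLE_INTERNAL_ERRNO_START;
}
```

Ordinary codes such as -ENOENT pass the check; -512 and below do not.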

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Fixes: 334f485df8 ("[PATCH] FUSE - device functions")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:35 +02:00
Miklos Szeredi 80ef08670d fuse: check connected before queueing on fpq->io
A request could end up on the fpq->io list after fuse_abort_conn() has
reset fpq->connected and aborted requests on that list:

Thread-1			  Thread-2
========			  ========
->fuse_simple_request()           ->shutdown
  ->__fuse_request_send()
    ->queue_request()		->fuse_abort_conn()
->fuse_dev_do_read()                ->acquire(fpq->lock)
  ->wait_for(fpq->lock) 	  ->set err to all req's in fpq->io
				  ->release(fpq->lock)
  ->acquire(fpq->lock)
  ->add req to fpq->io

After the userspace copy is done the request will be ended, but
req->out.h.error will remain uninitialized.  Also the copy might block
despite being already aborted.

Fix both issues by not allowing the request to be queued on the fpq->io
list after fuse_abort_conn() has processed this list.
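
The idea can be shown with a single-threaded userspace model. Everything here
is an assumption for illustration: the lock is a plain flag standing in for
fpq->lock, the list is reduced to a counter, and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct example_pqueue {
    bool locked;      /* stand-in for the queue lock, not a real lock */
    bool connected;
    int io_count;     /* number of requests on the io list */
};

static void example_lock(struct example_pqueue *pq)
{
    assert(!pq->locked);
    pq->locked = true;
}

static void example_unlock(struct example_pqueue *pq)
{
    assert(pq->locked);
    pq->locked = false;
}

/* Abort: mark the queue dead and end everything on the list, under the lock. */
void example_abort(struct example_pqueue *pq)
{
    example_lock(pq);
    pq->connected = false;
    pq->io_count = 0;
    example_unlock(pq);
}

/* Queueing re-checks `connected` under the same lock before adding. */
int example_queue_on_io(struct example_pqueue *pq)
{
    int err = 0;

    example_lock(pq);
    if (!pq->connected)
        err = -ECONNABORTED;
    else
        pq->io_count++;
    example_unlock(pq);
    return err;
}
```

Because the flag is tested under the lock that the abort path also takes, a
request can no longer land on the list after the abort has swept it.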

Reported-by: Pradeep P V K <pragalla@codeaurora.org>
Fixes: fd22d62ed0 ("fuse: no fc->lock for iqueue parts")
Cc: <stable@vger.kernel.org> # v4.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-22 09:15:35 +02:00
Miklos Szeredi b89ecd60d3 fuse: ignore PG_workingset after stealing
Fix the "fuse: trying to steal weird page" warning.

Description from Johannes Weiner:

  "Think of it as similar to PG_active. It's just another usage/heat
   indicator of file and anon pages on the reclaim LRU that, unlike
   PG_active, persists across deactivation and even reclaim (we store it in
   the page cache / swapper cache tree until the page refaults).

   So if fuse accepts pages that can legally have PG_active set,
   PG_workingset is fine too."

Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Fixes: 1899ad18c6 ("mm: workingset: tell cache transitions from workingset thrashing")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-18 21:16:42 +02:00
Al Viro f0b65f39ac iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant
Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the
callers do *not* need to do iov_iter_advance() after it.  If they end up
consuming less than they were given, they need to do iov_iter_revert() on
everything they did not consume.  That, however, needs to be done only on slow
paths.

All in-tree callers converted.  And that kills the last user of iterate_all_kinds().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:14 -04:00
Greg Kurz e4a9ccdd1c fuse: Fix infinite loop in sget_fc()
We don't set the SB_BORN flag on submounts. This is wrong as these
superblocks are then considered as partially constructed or dying
in the rest of the code and can break some assumptions.

One such case is when you have a virtiofs filesystem with submounts
and you try to mount it again: virtio_fs_get_tree() tries to obtain
a superblock with sget_fc(). The logic in sget_fc() is to loop until
it has either found an existing matching superblock with SB_BORN set
or to create a brand new one. It is assumed that a superblock without
SB_BORN is transient and the loop is restarted. Forgetting to set
SB_BORN on submounts hence causes sget_fc() to retry forever.
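
A toy model of that retry logic, written as a bounded loop so it terminates.  struct toy_sb and toy_sget() are hypothetical names, and the real sget_fc() has no retry limit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy superblock: only the two properties the loop looks at. */
struct toy_sb {
	bool matches;	/* matches the requested filesystem context */
	bool born;	/* stand-in for the SB_BORN flag */
};

/*
 * Returns the number of retries needed to settle, or -1 if the loop
 * was still retrying after max_retries (the "forever" case).
 */
static int toy_sget(const struct toy_sb *existing, int max_retries)
{
	for (int tries = 0; tries < max_retries; tries++) {
		if (existing == NULL || !existing->matches)
			return tries;	/* would create a new superblock */
		if (existing->born)
			return tries;	/* reuse the existing superblock */
		/* Matching but not born: assumed transient, so retry. */
	}
	return -1;
}
```

A matching superblock that never gets the born flag makes toy_sget() exhaust its retry budget, which models the infinite loop described above.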

Setting SB_BORN requires special care, i.e. a write barrier for
super_cache_count() which can check SB_BORN without taking any lock.
We should call vfs_get_tree() to deal with that but this requires
to have a proper ->get_tree() implementation for submounts, which
is a bigger piece of work. Go for a simple bug fix in the meantime.

Fixes: bf109c6404 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-09 15:33:40 +02:00
Greg Kurz e3a43f2a95 fuse: Fix crash if superblock of submount gets killed early
As soon as fuse_dentry_automount() does up_write(&sb->s_umount), the
superblock can theoretically be killed. If this happens before the
submount was added to the &fc->mounts list, fuse_mount_remove() later
crashes in list_del_init() because it assumes the submount to be
already there.

Add the submount before dropping sb->s_umount to fix the inconsistency.
It is okay to nest fc->killsb under sb->s_umount, we already do this
on the ->kill_sb() path.

Signed-off-by: Greg Kurz <groug@kaod.org>
Fixes: bf109c6404 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-09 15:33:40 +02:00
Greg Kurz d92d88f056 fuse: Fix crash in fuse_dentry_automount() error path
If fuse_fill_super_submount() returns an error, the error path
triggers a crash:

[   26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
[   26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90
[...]
[   26.247938] Call Trace:
[   26.248300]  fuse_mount_remove+0x2c/0x70 [fuse]
[   26.248892]  virtio_kill_sb+0x22/0x160 [virtiofs]
[   26.249487]  deactivate_locked_super+0x36/0xa0
[   26.250077]  fuse_dentry_automount+0x178/0x1a0 [fuse]

The crash happens because fuse_mount_remove() assumes that the FUSE
mount was already added to the list under the FUSE connection, but this
is only done after fuse_fill_super_submount() has returned success.

This means that until fuse_fill_super_submount() has returned success,
the FUSE mount isn't actually owned by the superblock. We should thus
reclaim ownership by clearing sb->s_fs_info, which will skip the call
to fuse_mount_remove(), and perform rollback, like virtio_fs_get_tree()
already does for the root sb.

Fixes: bf109c6404 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-06-09 15:33:40 +02:00
Al Viro 8959a23924 fuse_fill_write_pages(): don't bother with iov_iter_single_seg_count()
Another rudiment of fault-in originally having been limited to the
first segment, same as in generic_perform_write() and friends.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-03 10:34:55 -04:00
Linus Torvalds 27787ba3fa Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted stuff all over the place"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  useful constants: struct qstr for ".."
  hostfs_open(): don't open-code file_dentry()
  whack-a-mole: kill strlen_user() (again)
  autofs: should_expire() argument is guaranteed to be positive
  apparmor:match_mn() - constify devpath argument
  buffer: a small optimization in grow_buffers
  get rid of autofs_getpath()
  constify dentry argument of dentry_path()/dentry_path_raw()
2021-05-02 09:14:01 -07:00
Linus Torvalds 9ec1efbf9d fuse update for 5.13
Merge tag 'fuse-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Fix a page locking bug in write (introduced in 2.6.26)

 - Allow sgid bit to be killed in setacl()

 - Miscellaneous fixes and cleanups

* tag 'fuse-update-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  cuse: simplify refcount
  cuse: prevent clone
  virtiofs: fix userns
  virtiofs: remove useless function
  virtiofs: split requests that exceed virtqueue size
  virtiofs: fix memory leak in virtio_fs_probe()
  fuse: invalidate attrs when page writeback completes
  fuse: add a flag FUSE_SETXATTR_ACL_KILL_SGID to kill SGID
  fuse: extend FUSE_SETXATTR request
  fuse: fix matching of FUSE_DEV_IOC_CLONE command
  fuse: fix a typo
  fuse: don't zero pages twice
  fuse: fix typo for fuse_conn.max_pages comment
  fuse: fix write deadlock
2021-04-30 15:23:16 -07:00
Linus Torvalds a4f7fae101 Merge branch 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fileattr conversion updates from Miklos Szeredi via Al Viro:
 "This splits the handling of FS_IOC_[GS]ETFLAGS from ->ioctl() into a
  separate method.

  The interface is reasonably uniform across the filesystems that
  support it and gives nice boilerplate removal"

* 'miklos.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
  ovl: remove unneeded ioctls
  fuse: convert to fileattr
  fuse: add internal open/release helpers
  fuse: unsigned open flags
  fuse: move ioctl to separate source file
  vfs: remove unused ioctl helpers
  ubifs: convert to fileattr
  reiserfs: convert to fileattr
  ocfs2: convert to fileattr
  nilfs2: convert to fileattr
  jfs: convert to fileattr
  hfsplus: convert to fileattr
  efivars: convert to fileattr
  xfs: convert to fileattr
  orangefs: convert to fileattr
  gfs2: convert to fileattr
  f2fs: convert to fileattr
  ext4: convert to fileattr
  ext2: convert to fileattr
  btrfs: convert to fileattr
  ...
2021-04-27 11:18:24 -07:00
Linus Torvalds d1466bc583 Merge branch 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode type handling updates from Al Viro:
 "We should never change the type bits of ->i_mode or the method tables
  (->i_op and ->i_fop) of a live inode.

  Unfortunately, not all filesystems took care to prevent that"

* 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  spufs: fix bogosity in S_ISGID handling
  9p: missing chunk of "fs/9p: Don't update file type when updating file attributes"
  openpromfs: don't do unlock_new_inode() until the new inode is set up
  hostfs_mknod(): don't bother with init_special_inode()
  cifs: have cifs_fattr_to_inode() refuse to change type on live inode
  cifs: have ->mkdir() handle race with another client sanely
  do_cifs_create(): don't set ->i_mode of something we had not created
  gfs2: be careful with inode refresh
  ocfs2_inode_lock_update(): make sure we don't change the type bits of i_mode
  orangefs_inode_is_stale(): i_mode type bits do *not* form a bitmap...
  vboxsf: don't allow to change the inode type
  afs: Fix updating of i_mode due to 3rd party change
  ceph: don't allow type or device number to change on non-I_NEW inodes
  ceph: fix up error handling with snapdirs
  new helper: inode_wrong_type()
2021-04-27 10:57:42 -07:00
Al Viro 80e5d1ff5d useful constants: struct qstr for ".."
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-04-15 22:36:45 -04:00
Miklos Szeredi 3c9c14338c cuse: simplify refcount
Put extra reference early in cuse_channel_open().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:58 +02:00
Miklos Szeredi 8217673d07 cuse: prevent clone
For cloned connections cuse_channel_release() will be called more than
once, resulting in use after free.

Prevent device cloning for CUSE, which does not make sense at this point
and is highly unlikely to be used in real life.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:58 +02:00
Miklos Szeredi 0a7419c68a virtiofs: fix userns
get_user_ns() is done twice (once in virtio_fs_get_tree() and once in
fuse_conn_init()), resulting in a reference leak.

It also looks better to use fsc->user_ns (which *should* be the
current_user_ns() at this point).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:58 +02:00
Jiapeng Chong 07595bfa24 virtiofs: remove useless function
Fix the following clang warning:

fs/fuse/virtio_fs.c:130:35: warning: unused function 'vq_to_fpq'
[-Wunused-function].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Connor Kuehl a7f0d7aab0 virtiofs: split requests that exceed virtqueue size
If an incoming FUSE request can't fit on the virtqueue, the request is
placed onto a workqueue so a worker can try to resubmit it later, when
there will (hopefully) be space for it.

This is fine for requests that aren't larger than a virtqueue's maximum
capacity.  However, if a request's size exceeds the maximum capacity of the
virtqueue (even if the virtqueue is empty), it will be doomed to a life of
being placed on the workqueue, removed, found not to fit, and placed
on the workqueue yet again.

Furthermore, from section 2.6.5.3.1 (Driver Requirements: Indirect
Descriptors) of the virtio spec:

  "A driver MUST NOT create a descriptor chain longer than the Queue
  Size of the device."

To fix this, limit the number of pages FUSE will use for an overall
request.  This way, each request can realistically fit on the virtqueue
when it is decomposed into a scattergather list and avoid violating section
2.6.5.3.1 of the virtio spec.
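
A rough sketch of such a limit, assuming each page takes one scattergather entry and that a fixed number of entries is reserved for request headers.  toy_limit_pages() and its header_overhead parameter are hypothetical names for illustration:

```c
#include <assert.h>

/*
 * Clamp the per-request page count so that pages plus header entries
 * never exceed the virtqueue size.  All three inputs are illustrative;
 * header_overhead is an assumed count of non-page descriptors.
 */
static unsigned int toy_limit_pages(unsigned int max_pages,
				    unsigned int vq_size,
				    unsigned int header_overhead)
{
	unsigned int room = 0;

	if (vq_size > header_overhead)
		room = vq_size - header_overhead;

	return max_pages < room ? max_pages : room;
}
```

For example, a 128-entry queue with 4 entries assumed for headers caps a 256-page limit at 124 pages, while a 32-page limit is left unchanged.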

Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Luis Henriques c79c5e0178 virtiofs: fix memory leak in virtio_fs_probe()
When accidentally passing the same tag twice to qemu, kmemleak ended up
reporting a memory leak in virtiofs.  Also, looking at the log I saw the
following error (that's when I realised the tag was duplicated):

  virtiofs: probe of virtio5 failed with error -17

Here's the kmemleak log for reference:

unreferenced object 0xffff888103d47800 (size 1024):
  comm "systemd-udevd", pid 118, jiffies 4294893780 (age 18.340s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 80 90 02 a0 ff ff ff ff  ................
  backtrace:
    [<000000000ebb87c1>] virtio_fs_probe+0x171/0x7ae [virtiofs]
    [<00000000f8aca419>] virtio_dev_probe+0x15f/0x210
    [<000000004d6baf3c>] really_probe+0xea/0x430
    [<00000000a6ceeac8>] device_driver_attach+0xa8/0xb0
    [<00000000196f47a7>] __driver_attach+0x98/0x140
    [<000000000b20601d>] bus_for_each_dev+0x7b/0xc0
    [<00000000399c7b7f>] bus_add_driver+0x11b/0x1f0
    [<0000000032b09ba7>] driver_register+0x8f/0xe0
    [<00000000cdd55998>] 0xffffffffa002c013
    [<000000000ea196a2>] do_one_initcall+0x64/0x2e0
    [<0000000008f727ce>] do_init_module+0x5c/0x260
    [<000000003cdedab6>] __do_sys_finit_module+0xb5/0x120
    [<00000000ad2f48c6>] do_syscall_64+0x33/0x40
    [<00000000809526b5>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Fixes: a62a8ef9d9 ("virtio-fs: add virtiofs filesystem")
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Vivek Goyal 3466958beb fuse: invalidate attrs when page writeback completes
In fuse, when a direct/write-through write happens, we invalidate attrs
because the write might have updated mtime/ctime on the server, leaving
the cached mtime/ctime stale.

What about the page writeback path?  It looks like we don't invalidate
attrs there.  To be consistent, invalidate attrs in the writeback path as
well.  The only exception is when writeback_cache is enabled.  In that case
we trust the local mtime/ctime and there is no need to invalidate attrs.

Recently users started experiencing failures of xfstests generic/080,
generic/215 and generic/614 on virtiofs.  This happened only with the newer
"stat" utility and not the older one.  This patch fixes the issue.

So what's the root cause of the issue?  Here is a detailed explanation.

The generic/080 test does an mmap write to a file, closes the file and then
checks whether mtime has been updated.  When the file is closed, dirty
pages are flushed (and that should update mtime/ctime on the server).
But we did not explicitly invalidate attrs after writeback finished.  Still,
generic/080 has passed so far because we invalidated atime in
fuse_readpages_end().  This is called in the fuse_readahead() path and
always seems to trigger before the mmaped write.

So after the mmaped write, when lstat() is called, it sees that at least
one of the fields being asked for is invalid (atime), which results in a
GETATTR being sent to the server; mtime/ctime also get updated and the test
passes.

But newer /usr/bin/stat seems to have moved to using the statx() syscall
(instead of lstat()).  And statx() allows it to query only ctime or
mtime (and not the rest of the basic stat fields).  That means when querying
for mtime, fuse_update_get_attr() sees that mtime is not invalid (only
atime is invalid).  So it does not generate a new GETATTR and fills stat
with the cached mtime/ctime.  And that means the updated mtime is not seen
by xfstests and the tests start failing.

Invalidating attrs after writeback completion should solve this problem in
a generic manner.
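
The mask logic behind this can be sketched as below.  The TOY_* bits and toy_needs_getattr() are hypothetical stand-ins, not the kernel's STATX_* constants or its actual cache code:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative attribute bits. */
#define TOY_ATIME 0x1u
#define TOY_MTIME 0x2u
#define TOY_CTIME 0x4u
#define TOY_BASIC (TOY_ATIME | TOY_MTIME | TOY_CTIME)

/*
 * A fresh GETATTR is needed only if the caller asks for at least one
 * field that is currently marked invalid in the cache.
 */
static bool toy_needs_getattr(unsigned int request_mask,
			      unsigned int invalid_mask)
{
	return (request_mask & invalid_mask) != 0;
}
```

With only atime invalid, an lstat()-style request for TOY_BASIC triggers a refresh but a statx()-style request for TOY_MTIME alone does not.  Once writeback completion also marks mtime and ctime invalid, the narrow request triggers a refresh as well.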

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Vivek Goyal 550a7d3bc0 fuse: add a flag FUSE_SETXATTR_ACL_KILL_SGID to kill SGID
When a POSIX access ACL is set, it can have an effect on the file mode and
it may also need to clear SGID if:

- None of the caller's group/supplementary groups match the file owner group.
AND
- The caller is not privileged (no CAP_FSETID).

As of now the FUSE server is responsible for changing the file mode as
well. But it does not know whether to clear SGID or not.

So add a flag FUSE_SETXATTR_ACL_KILL_SGID and send this info with SETXATTR
to let file server know that sgid needs to be cleared as well.
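
The two conditions above can be sketched as a small mode filter.  toy_acl_mode() and its boolean parameters are hypothetical; in the real flow the client evaluates the conditions and the server acts on the flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return the mode to apply after setting an access ACL: SGID is
 * dropped only when the caller is in none of the owner's groups AND
 * lacks the privilege (modelled here as a plain boolean).
 */
static mode_t toy_acl_mode(mode_t mode, bool caller_in_owner_group,
			   bool has_cap_fsetid)
{
	if (!caller_in_owner_group && !has_cap_fsetid)
		mode &= ~(mode_t)S_ISGID;
	return mode;
}
```

For instance, toy_acl_mode(02755, false, false) yields 0755, while either group membership or the capability leaves 02755 intact.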

Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Vivek Goyal 52a4c95f4d fuse: extend FUSE_SETXATTR request
The FUSE client needs to send additional information to the file server
when it calls SETXATTR(system.posix_acl_access), so add an extra flags
field to the structure.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Alessio Balsini 6076f5f341 fuse: fix matching of FUSE_DEV_IOC_CLONE command
With commit f8425c9396 ("fuse: 32-bit user space ioctl compat for fuse
device") the matching constraints for the FUSE_DEV_IOC_CLONE ioctl command
were relaxed to test only the command type and number.  As Arnd
noticed, this is wrong as it does not ensure the correctness of the data
size or direction for the received FUSE device ioctl.

Fix by bringing back the comparison of the ioctl received by the FUSE
device to the originally generated FUSE_DEV_IOC_CLONE.
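
The difference between the two matching strategies can be sketched with a toy command encoding.  The TOY_* macros below are modelled on the common Linux ioctl layout (number, type, size, direction) but are defined locally as assumptions, as are the magic value and the helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy encoding: nr in bits 0-7, type in 8-15, size in 16-29, dir in 30-31. */
#define TOY_IOC(dir, type, nr, size) \
	(((uint32_t)(dir) << 30) | ((uint32_t)(size) << 16) | \
	 ((uint32_t)(type) << 8) | (uint32_t)(nr))
#define TOY_IOC_TYPE(cmd)	(((cmd) >> 8) & 0xffu)
#define TOY_IOC_NR(cmd)		((cmd) & 0xffu)
#define TOY_IOC_READ		2u

/* Illustrative clone command: read direction, 32-bit argument. */
#define TOY_CLONE TOY_IOC(TOY_IOC_READ, 229, 0, sizeof(uint32_t))

/* Relaxed match: looks only at type and number. */
static bool toy_match_loose(uint32_t cmd)
{
	return TOY_IOC_TYPE(cmd) == TOY_IOC_TYPE(TOY_CLONE) &&
	       TOY_IOC_NR(cmd) == TOY_IOC_NR(TOY_CLONE);
}

/* Strict match: the whole command, including size and direction. */
static bool toy_match_strict(uint32_t cmd)
{
	return cmd == TOY_CLONE;
}
```

A command with the same type and number but an 8-byte argument passes toy_match_loose() and fails toy_match_strict(), which is the gap the fix closes.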

Fixes: f8425c9396 ("fuse: 32-bit user space ioctl compat for fuse device")
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00
Bhaskar Chowdhury aa6ff555f0 fuse: fix a typo
s/reponsible/responsible/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-14 10:40:57 +02:00