172 lines
		
	
	
		
			7.1 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
			
		
		
	
	
			172 lines
		
	
	
		
			7.1 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
| This document gives a brief introduction to the caching
 | |
| mechanisms in the sunrpc layer that is used, in particular,
 | |
| for NFS authentication.
 | |
| 
 | |
| CACHES
 | |
| ======
 | |
| The caching replaces the old exports table and allows for
 | |
| a wide variety of values to be caches.
 | |
| 
 | |
| There are a number of caches that are similar in structure though
 | |
| quite possibly very different in content and use.  There is a corpus
 | |
| of common code for managing these caches.
 | |
| 
 | |
| Examples of caches that are likely to be needed are:
 | |
|   - mapping from IP address to client name
 | |
|   - mapping from client name and filesystem to export options
 | |
|   - mapping from UID to list of GIDs, to work around NFS's limitation
 | |
|     of 16 gids.
 | |
|   - mappings between local UID/GID and remote UID/GID for sites that
 | |
|     do not have uniform uid assignment
 | |
|   - mapping from network identify to public key for crypto authentication.
 | |
| 
 | |
| The common code handles such things as:
 | |
|    - general cache lookup with correct locking
 | |
|    - supporting 'NEGATIVE' as well as positive entries
 | |
|    - allowing an EXPIRED time on cache items, and removing
 | |
|      items after they expire, and are no longe in-use.
 | |
| 
 | |
|    Future code extensions are expect to handle
 | |
|    - making requests to user-space to fill in cache entries
 | |
|    - allowing user-space to directly set entries in the cache
 | |
|    - delaying RPC requests that depend on as-yet incomplete
 | |
|      cache entries, and replaying those requests when the cache entry
 | |
|      is complete.
 | |
|    - maintaining last-access times on cache entries
 | |
|    - clean out old entries when the caches become full
 | |
| 
 | |
| The code for performing a cache lookup is also common, but in the form
 | |
| of a template.  i.e. a #define.
 | |
| Each cache defines a lookup function by using the DefineCacheLookup
 | |
| macro, or the simpler DefineSimpleCacheLookup macro
 | |
| 
 | |
| Creating a Cache
 | |
| ----------------
 | |
| 
 | |
| 1/ A cache needs a datum to cache.  This is in the form of a
 | |
|    structure definition that must contain a
 | |
|      struct cache_head
 | |
|    as an element, usually the first.
 | |
|    It will also contain a key and some content.
 | |
|    Each cache element is reference counted and contains
 | |
|    expiry and update times for use in cache management.
 | |
| 2/ A cache needs a "cache_detail" structure that
 | |
|    describes the cache.  This stores the hash table, and some
 | |
|    parameters for cache management.
 | |
| 3/ A cache needs a lookup function.  This is created using
 | |
|    the DefineCacheLookup macro.  This lookup function is used both
 | |
|    to find entries and to update entries.  The normal mode for
 | |
|    updating an entry is to replace the old entry with a new
 | |
|    entry.  However it is possible to allow update-in-place
 | |
|    for those caches where it makes sense (no atomicity issues
 | |
|    or indirect reference counting issue)
 | |
| 4/ A cache needs to be registered using cache_register().  This
 | |
|    includes in on a list of caches that will be regularly
 | |
|    cleaned to discard old data.  For this to work, some
 | |
|    thread must periodically call cache_clean
 | |
|    
 | |
| Using a cache
 | |
| -------------
 | |
| 
 | |
| To find a value in a cache, call the lookup function passing it a the
 | |
| datum which contains key, and possibly content, and a flag saying
 | |
| whether to update the cache with new data from the datum.   Depending
 | |
| on how the cache lookup function was defined, it may take an extra
 | |
| argument to identify the particular cache in question.
 | |
| 
 | |
| Except in cases of kmalloc failure, the lookup function
 | |
| will return a new datum which will store the key and
 | |
| may contain valid content, or may not.
 | |
| This datum is typically passed to cache_check which determines the
 | |
| validity of the datum and may later initiate an upcall to fill
 | |
| in the data.
 | |
| 
 | |
| cache_check can be passed a "struct cache_req *".  This structure is
 | |
| typically embedded in the actual request and can be used to create a
 | |
| deferred copy of the request (struct cache_deferred_req).  This is
 | |
| done when the found cache item is not uptodate, but the is reason to
 | |
| believe that userspace might provide information soon.  When the cache
 | |
| item does become valid, the deferred copy of the request will be
 | |
| revisited (->revisit).  It is expected that this method will
 | |
| reschedule the request for processing.
 | |
| 
 | |
| 
 | |
| Populating a cache
 | |
| ------------------
 | |
| 
 | |
| Each cache has a name, and when the cache is registered, a directory
 | |
| with that name is created in /proc/net/rpc
 | |
| 
 | |
| This directory contains a file called 'channel' which is a channel
 | |
| for communicating between kernel and user for populating the cache.
 | |
| This directory may later contain other files of interacting
 | |
| with the cache.
 | |
| 
 | |
| The 'channel' works a bit like a datagram socket. Each 'write' is
 | |
| passed as a whole to the cache for parsing and interpretation.
 | |
| Each cache can treat the write requests differently, but it is
 | |
| expected that a message written will contain:
 | |
|   - a key
 | |
|   - an expiry time
 | |
|   - a content.
 | |
| with the intention that an item in the cache with the give key
 | |
| should be create or updated to have the given content, and the
 | |
| expiry time should be set on that item.
 | |
| 
 | |
| Reading from a channel is a bit more interesting.  When a cache
 | |
| lookup fail, or when it suceeds but finds an entry that may soon
 | |
| expiry, a request is lodged for that cache item to be updated by
 | |
| user-space.  These requests appear in the channel file.
 | |
| 
 | |
| Successive reads will return successive requests.
 | |
| If there are no more requests to return, read will return EOF, but a
 | |
| select or poll for read will block waiting for another request to be
 | |
| added.
 | |
| 
 | |
| Thus a user-space helper is likely to:
 | |
|   open the channel.
 | |
|     select for readable
 | |
|     read a request
 | |
|     write a response
 | |
|   loop.
 | |
| 
 | |
| If it dies and needs to be restarted, any requests that have not be
 | |
| answered will still appear in the file and will be read by the new
 | |
| instance of the helper.
 | |
| 
 | |
| Each cache should define a "cache_parse" method which takes a message
 | |
| written from user-space and processes it.  It should return an error
 | |
| (which propagates back to the write syscall) or 0.
 | |
| 
 | |
| Each cache should also define a "cache_request" method which
 | |
| takes a cache item and encodes a request into the buffer
 | |
| provided.
 | |
| 
 | |
| 
 | |
| Note: If a cache has no active readers on the channel, and has had not
 | |
| active readers for more than 60 seconds, further requests will not be
 | |
| added to the channel but instead all looks that do not find a valid
 | |
| entry will fail.  This is partly for backward compatibility: The
 | |
| previous nfs exports table was deemed to be authoritative and a
 | |
| failed lookup meant a definite 'no'.
 | |
| 
 | |
| request/response format
 | |
| -----------------------
 | |
| 
 | |
| While each cache is free to use it's own format for requests
 | |
| and responses over channel, the following is recommended are
 | |
| appropriate and support routines are available to help:
 | |
| Each request or response record should be printable ASCII
 | |
| with precisely one newline character which should be at the end.
 | |
| Fields within the record should be separated by spaces, normally one.
 | |
| If spaces, newlines, or nul characters are needed in a field they
 | |
| much be quotes.  two mechanisms are available:
 | |
| 1/ If a field begins '\x' then it must contain an even number of
 | |
|    hex digits, and pairs of these digits provide the bytes in the
 | |
|    field.
 | |
| 2/ otherwise a \ in the field must be followed by 3 octal digits
 | |
|    which give the code for a byte.  Other characters are treated
 | |
|    as them selves.  At the very least, space, newlines nul, and
 | |
|    '\' must be quoted in this way.
 | |
|    
 |