Anatomy of ObjectContext, the Ceph in core representation of an object

An ObjectContext is created when a ReplicatedPG applies operations on an object.

read/write mutual exclusion

The C_OSD_OndiskWriteUnlock callback is registered to be called after a transaction (read in this case) completes. It will signal the writes and reads waiting if all writes are done.
Before adding an entry that will write an object to a transaction ( for instance when ReplicatedPG::mark_object_lost sets the object_info_t::lost data member to true ) the ObjectContext::ondisk_write_lock method is called and will Cond::Wait until all reads complete. The caller of mark_object_lost adds the ObjectContext to a list that is used to build the C_OSD_OndiskWriteUnlockList callback that will be called when the transaction completes and call ondisk_write_unlock on each object.

The logic is similar when reverting an object to a prior version, applying a replica operation, or RepliatedPG::handle_pull_response.

ondisk_read_lock is called by ReplicatedPG::do_op and unlocked after calling prepare_transaction. ReplicatedPG::recover_object_replicas and ReplicatedPG::push_backfill_object do the same.

ondisk_write_lock will wait until there are no more read operations waiting ( readers_waiting ) or read being processed ( readers ). ondisk_read_lock will wait for ongoing writes to finish ( unstable_writes ) but will take the lock even if writers are waiting ( writers_waiting ) therefore taking precedence over write.There can be any number of simultaneous write ( unstable_writes > 1 ) as long as there are no ongoing reads ( readers < 1 ). There can be any number of simultaneous readers ( readers > 1 ) as long as there are no ongoing writes ( unstable_writes < 1 ). ondisk_read_unlock will signal waiting writers if there is no more readers ( !readers ). ondisk_write_unlock will signal waiting readers if there is no more writers ( !unstable_writers ).

blocking and blocked_by

ObjectContext has a blocking and a blocked_by data members. When an operation on an object is made of multiple operations, all of them must be about an object by the same name but can be about different versions. If a variation of the object is degraded, it is blocked by the degraded object and the is added to the list blocked by the degraded object.
Before peering the ReplicatedPG::on_change is called and for each object in the waiting_for_degraded_object list it will loop over the objects it is blocking, remove it from the list and unblock it. The same happens whenever an object has been pushed.

How does AccessMode controls read/write processing in Ceph ?

An operation ( read, write etc. ) may be added to the mode.waiting queue if the ReplicatedPG::AccessMode does not allow of it, yet. For instance, if an operation may_write but AccessMode::try_write finds the current state to be RMW_FLUSHING, it will return false and the operation will be added to mode.waiting. However, if it finds that it is IDLE, it will change to RMW and return true
When handling a write message eval_repop is called at the end to figure out if the operation must be sent to other OSDs in the acting set. If mode.is_rmw_mode(), it will call apply_repop(repop); which will create a transaction to write to the ObjectStore and give it a C_OSD_OpApplied callback which will call ReplicatedPG::op_applied when it completes. ReplicatedPG::op_applied will then call mode.write_applied() and if there is no pending write operation it will set wake = true.
The put_object_context is called immediately after mode.write_applied() to release the ObjectContext and it also checks for mode.wake and will requeue the operations that were previously added to the mode.waiting list because mode.state did not allow them to be processed. The put_object_context method is called in many places in ReplicatedPG, each of them is an opportunity to requeue the operations found in mode.waiting