Pull in changes from https://github.com/googleapis/gapic-generator-python/pull/1755 and https://github.com/googleapis/python-api-core/pull/527 to improve RPC performance. I measured a 6% throughput improvement on the point read benchmark in a prod environment.