In my experience maximizing throughput requires running multiple clients. You should be able to at least double if not triple your throughput by running 16 clients - assuming you have the computational capacity on your laptop. You may also want to try the trick mentioned here to see if it gives you a small boost: https://github.com/hyperledger/fabric/wiki/Go-Performance-Portal#morestack
We created a noop system chain code that does nothing but store an invoke transaction. Since it is a system chain code written in go and deployed at startup, it does not require the grpc-docker roundtrip at invoke.
Nevertheless we were only able to push 400 tx/s (few hundred bytes of random tx data each) with no consensus module activate on a single node on a laptop.
Do you have a more promising result or an idea how to reach magnitudes higher rates?