Javaアプリケーションのコンテナに割り当てたCPUリソースの影響

結論：垂直方向（CPUコア数の増減）より、水平方向（コンテナ数の増減）で検討するほうが有意義だと思う

TL;DR

PostgreSQLへ接続するWeb API（Spring Bootアプリケーション）のOCIイメージを、CPUコア数を指定して実行したとき、スループットやレイテンシに影響するかどうかを調べたかった
- GCスレッドやFork/Joinフレームワークなど、CPUコア数に応じて割り当て数の変化する内部要素があるから
- Servletエンジンはスレッドベースの処理モデルなので、性能に直接影響すると思ったから
スループットやレイテンシへの影響は限定的だった
垂直方向（CPUコア数の増減）より、水平方向（コンテナ数の増減）で検討するほうが有意義だと思う

詳細

PostgreSQLへ接続するWeb API（Spring Bootアプリケーション）のOCIイメージ

yujiorama/spring-petclinic-rest
spring-petclinic/spring-petclinic-restのfork
- 依存ライブラリのバージョンを調整 (pom.xml)
  - Spring Boot: 2.4.2 -> 2.5.3
- アプリケーションプロパティを調整 (src/main/resources/application-postgresql.properties)
  - spring.sql.init.schema-locations と spring.sql.init.data-locations を追加
Spring Boot Maven プラグインの build-image ゴールでOCIイメージを作成
コネクションプールサイズは環境変数で指定
スレッドプールサイズは環境変数で指定

以下のコマンドを実行すると、springcommunity/spring-petclinic-rest:latest というOCIイメージを作成する。

mvnw clean spring-boot:build-image

CPUコア数を指定してOCIイメージを実行

Windows 10 の Docker Desktop で実行
Paketo Buildpacksで構築したOCIイメージは、paketo-buildpacks/libjvmが、-XX:ActiveProcessorCountに指定する値を、Goのruntime.NumCPU()で計算する
- JVMの計算式（JDK-8146115）と違う値になるので指定するパラメータをいろいろ確認…
  - cpu_shares - 一致しない（変わらない）
  - cpus - 一致しない（変わらない）
  - cpuset - 一致する（同じ値）
mem_limit は 1024m とする（適当）
- ヒープサイズはpaketo-buildpacks/libjvmが計算して 592m になった（mem_limit の 60% 弱）
コネクションプールサイズは CPUコア数*20 とする（適当）
スレッドプールサイズは CPUコア数*60 とする（適当）

以下のコマンドを実行すると、OCIイメージを実行する。

# CPUコア数1(cpuset: 0,1)
$ docker compose -f analysis/docker-compose.yml up app1
[+] Running 3/3
 - Network analysis_default       Created      0.8s
 - Container analysis_postgres_1  Started      2.1s
 - Container analysis_app1_1      Started      5.5s
Attaching to app1_1
app1_1  | Setting Active Processor Count to 2
app1_1  | Calculated JVM Memory Configuration: -XX:MaxDirectMemorySize=10M -Xmx606261K -XX:MaxMetaspaceSize=124874K -XX:ReservedCodeCacheSize=240M -Xss1M (Total Memory: 1G, Thread Count: 60, Loaded Class Count: 19633, Headroom: 0%)
app1_1  | Adding 129 container CA certificates to JVM truststore
app1_1  | Spring Cloud Bindings Enabled
app1_1  | Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Djava.security.properties=/layers/paketo-buildpacks_bellsoft-liberica/java-security-properties/java-security.properties -agentpath:/layers/paketo-buildpacks_bellsoft-liberica/jvmkill/jvmkill-1.16.0-RELEASE.so=printHeapHistogram=1 -XX:ActiveProcessorCount=2 -XX:MaxDirectMemorySize=10M -Xmx606261K -XX:MaxMetaspaceSize=124874K -XX:ReservedCodeCacheSize=240M -Xss1M -Dorg.springframework.cloud.bindings.boot.enable=true

# CPUコア数2(cpusets: 0,1,2,3)
$ docker compose -f analysis/docker-compose.yml up app2
[+] Running 2/2
 - Container analysis_postgres_1  Started      1.6s
 - Container analysis_app2_1      Started      5.2s
Attaching to app2_1
app2_1  | Calculated JVM Memory Configuration: -XX:MaxDirectMemorySize=10M -Xmx544821K -XX:MaxMetaspaceSize=124874K -XX:ReservedCodeCacheSize=240M -Xss1M (Total Memory: 1G, Thread Count: 120, Loaded Class Count: 19633, Headroom: 0%)
app2_1  | Adding 129 container CA certificates to JVM truststore
app2_1  | Spring Cloud Bindings Enabled
app2_1  | Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Djava.security.properties=/layers/paketo-buildpacks_bellsoft-liberica/java-security-properties/java-security.properties -agentpath:/layers/paketo-buildpacks_bellsoft-liberica/jvmkill/jvmkill-1.16.0-RELEASE.so=printHeapHistogram=1 -XX:ActiveProcessorCount=4 -XX:MaxDirectMemorySize=10M -Xmx544821K -XX:MaxMetaspaceSize=124874K -XX:ReservedCodeCacheSize=240M -Xss1M -Dorg.springframework.cloud.bindings.boot.enable=true

# CPUコア数3(cpusets: 0,1,2,3,4,5)
$ docker compose -f analysis/docker-compose.yml up app3
[+] Running 2/2
 - Container analysis_postgres_1  Started      1.5s
 - Container analysis_app3_1      Started      3.4s
Attaching to app3_1
app3_1  | Calculated JVM Memory Configuration: -XX:MaxDirectMemorySize=10M -Xmx483381K -XX:MaxMetaspaceSize=124874K -XX:ReservedCodeCacheSize=240M -Xss1M (Total Memory: 1G, Thread Count: 180, Loaded Class Count: 19633, Headroom: 0%)
app3_1  | Adding 129 container CA certificates to JVM truststore
app3_1  | Spring Cloud Bindings Enabled
app3_1  | Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Djava.security.properties=/layers/paketo-buildpacks_bellsoft-liberica/java-security-properties/java-security.properties -agentpath:/layers/paketo-buildpacks_bellsoft-liberica/jvmkill/jvmkill-1.16.0-RELEASE.so=printHeapHistogram=1 -XX:ActiveProcessorCount=6 -XX:MaxDirectMemorySize=10M -Xmx483381K -XX:MaxMetaspaceSize=124874K -XX:ReservedCodeCacheSize=240M -Xss1M -Dorg.springframework.cloud.bindings.boot.enable=true

# CPUコア数4(cpusets: 0,1,2,3,4,5,6,7)
$ docker compose -f analysis/docker-compose.yml up app4
[+] Running 2/2
 - Container analysis_postgres_1  Started      1.5s
 - Container analysis_app4_1      Started      3.5s
Attaching to app4_1
app4_1  | Calculated JVM Memory Configuration: -XX:MaxDirectMemorySize=10M -Xmx421941K -XX:MaxMetaspaceSize=124874K -XX:ReservedCodeCacheSize=240M -Xss1M (Total Memory: 1G, Thread Count: 240, Loaded Class Count: 19633, Headroom: 0%)
app4_1  | Adding 129 container CA certificates to JVM truststore
app4_1  | Spring Cloud Bindings Enabled
app4_1  | Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Djava.security.properties=/layers/paketo-buildpacks_bellsoft-liberica/java-security-properties/java-security.properties -agentpath:/layers/paketo-buildpacks_bellsoft-liberica/jvmkill/jvmkill-1.16.0-RELEASE.so=printHeapHistogram=1 -XX:ActiveProcessorCount=8 -XX:MaxDirectMemorySize=10M -Xmx421941K -XX:MaxMetaspaceSize=124874K -XX:ReservedCodeCacheSize=240M -Xss1M -Dorg.springframework.cloud.bindings.boot.enable=true

$ docker compose down -v --remove-orphans
[+] Running 6/6
 - Container analysis_app1_1      Removed      0.2s
 - Container analysis_app2_1      Removed      0.2s
 - Container analysis_app4_1      Removed      0.2s
 - Container analysis_app3_1      Removed      0.2s
 - Container analysis_postgres_1  Removed      1.1s
 - Network analysis_default       Removed

スループットやレイテンシに影響するかどうか

tsenart/vegetaで負荷をかけた結果を分析
- CPUコア数は1から4まで
- リクエストレートは100rps刻みで最小100rpsから最大1000rpsまで
以下の処理をCPUコア数とリクエストレートの組み合わせごとに繰り返し
- 既存のコンテナとリソースを全て破棄
- PostgreSQLのコンテナを起動
- アプリケーションコンテナを起動
  - SQL スクリプトでデータベーススキーマを初期化し、初期データを登録
  - ログ出力を監視して起動が完了するのを確認
- 暖機運転のつもりで負荷をかける
  - 継続時間: 120秒
  - リクエストレート: 10rps
  - 以下の順にHTTPリクエストを送信
    - /petclinic/api/vets
    - /petclinic/api/specialties
    - /petclinic/api/pettypes
    - /petclinic/api/owners
    - /petclinic/api/pets
    - /petclinic/api/visits
  - 結果は /dev/null へ捨てる
- 計測用の負荷をかける
  - 継続時間: 30秒
  - 以下の順にHTTPリクエストを送信
    - /petclinic/api/vets
    - /petclinic/api/specialties
    - /petclinic/api/pettypes
    - /petclinic/api/owners
    - /petclinic/api/pets
    - /petclinic/api/visits
  - 結果はファイルへ保存

以下のコマンドを実行すると、起動したアプリケーションに負荷をかけて結果をファイルに保存していく。

$ cd analysis
$ ./exp.sh
$ ls *.bin
cpus-1-r-100.bin
cpus-1-r-1000.bin
cpus-1-r-200.bin
cpus-1-r-300.bin
cpus-1-r-400.bin
cpus-1-r-500.bin
cpus-1-r-600.bin
cpus-1-r-700.bin
cpus-1-r-800.bin
cpus-1-r-900.bin
cpus-2-r-100.bin
cpus-2-r-1000.bin
cpus-2-r-200.bin
cpus-2-r-300.bin
cpus-2-r-400.bin
cpus-2-r-500.bin
cpus-2-r-600.bin
cpus-2-r-700.bin
cpus-2-r-800.bin
cpus-2-r-900.bin
cpus-3-r-100.bin
cpus-3-r-1000.bin
cpus-3-r-200.bin
cpus-3-r-300.bin
cpus-3-r-400.bin
cpus-3-r-500.bin
cpus-3-r-600.bin
cpus-3-r-700.bin
cpus-3-r-800.bin
cpus-3-r-900.bin
cpus-4-r-100.bin
cpus-4-r-1000.bin
cpus-4-r-200.bin
cpus-4-r-300.bin
cpus-4-r-400.bin
cpus-4-r-500.bin
cpus-4-r-600.bin
cpus-4-r-700.bin
cpus-4-r-800.bin
cpus-4-r-900.bin

以下のコマンドを実行すると、結果ファイルを整形できる。

$ vegeta report -type=json cpus-1-r-100.bin | jq -r .
{
  "latencies": {
    "total": 30420977900,
    "mean": 10140325,
    "50th": 7314004,
    "95th": 25419793,
    "99th": 50560487,
    "max": 210357100
  },
  "bytes_in": {
    "total": 3924500,
    "mean": 1308.1666666666667
  },
  "bytes_out": {
    "total": 0,
    "mean": 0
  },
  "earliest": "2021-08-23T09:04:16.3703487+09:00",
  "latest": "2021-08-23T09:04:46.3560457+09:00",
  "end": "2021-08-23T09:04:46.3615908+09:00",
  "duration": 29985697000,
  "wait": 5545100,
  "requests": 3000,
  "rate": 100.04769940815449,
  "throughput": 100.02920152480114,
  "success": 1,
  "status_codes": {
    "200": 3000
  },
  "errors": []
}

スループットやレイテンシへの影響

次のコマンドを実行すると、json 形式のデータをシェルスクリプトへ置き換えて桁数指定した数値を出力する。

throughput はそのまま。小数点以下は3桁にする
latencies はナノ秒なので、1e6=10^6 で割り算してミリ秒へ変換。小数点以下は3桁にする
- 雑な計測なので 50%tile を取る
success はそのまま

for c in $(seq 1 1 4); do
for r in $(seq 100 100 1000); do
vegeta report -type=json cpus-${c}-r-${r}.bin | jq -r --arg c $c --arg r $r '["printf", "%d %d %.3f %.3f %.2f\\n", $c, $r, .throughput, .latencies."50th"/1e6, .success] | @sh'
done
done | cat - | bash
1 100 100.029 7.314 1.00
1 200 200.062 7.517 1.00
1 300 295.395 1240.134 1.00
1 400 385.287 1021.586 1.00
1 500 315.498 14293.348 1.00
1 600 183.968 16563.441 0.92
1 700 164.379 21230.119 0.70
1 800 193.639 20411.846 0.63
1 900 197.498 18327.375 0.56
1 1000 209.119 16120.388 0.52
...

整形したのが以下。

boot-apps-performance-by-container-cpus

具体的な計測環境

コンポーネント	環境1	環境 2
ホストOS	Microsoft Windows 21H1 (19043.1165)	Microsoft Windows 21H1 (19043.1165)
CPU	i7-8565U	i7-8650U
Docker Desktop	3.6.0(67351)	3.6.0(67351)
Engine	20.10.8	20.10.8
メモリ	8 GB	4 GB
ストレージ	40 GB	20 GB