AMD Ryzen AI Tutorial: TorchVision Inference

2025/04/04

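This is the TorchVision Inference installment of my notes on learning AI with amd/RyzenAI-SW, git cloned onto the first mini PC I have ever bought: an AOOSTAR GEM12 Pro MAX with a Ryzen 7 8845HS running Windows 11 Pro, with AMD Ryzen AI Software and the NPU driver installed.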

Overview of torchvision_inference
Before running torchvision_inference
Running the Ryzen AI Software tutorial TorchVision Inference
Don't forget conda deactivate when you're done

Overview of torchvision_inference

For details, see README.md directly under the torchvision_inference root.

torchvision_inference is a tutorial in which Torch and TorchVision, drawing on a huge image dataset, infer what a given image shows.

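Before running torchvision_inference

At first I activated the virtual environment in torchvision_inference, ran pip install on requirements.txt, downloaded the data that is said to be required from Hugging Face, extracted it under calib_data/, and ran classification.py. It complained, in effect, "This data isn't classified! I can't use this!" I wanted to answer, "Going by your file name, aren't you the one who is supposed to classify it?", but that got me nowhere and the run simply failed.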

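At that point I had not yet touched advanced quark quantize or yolov8. The number of images is so vast that I am embarrassed that sorting them by hand even crossed my mind for a moment; surely this is exactly where AI should step in. Chasing it any further would have meant never finishing the tutorial, though, so I decided to try the other two instead... and that turned out to pay off.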

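As for yolov8, there are success stories, and while they are not that old in terms of time, they appear to be quite old in terms of specification. Ryzen AI Software seems to have gone through 1.0/1.0.1/1.1/1.2/1.3/1.4 in about a year, and along the way things have disappeared, been renamed, been moved, or need to be checked in a different way. Official documents that turn up in searches are dead links, and not just one or two of them. The problem looked deep-rooted, so I put yolov8 off.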

So, with quark quantization already done, I started on advanced quark quantize, which looked like it would go smoothly anyway.

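Once I had worked through one of its two models, I had extracted the files downloaded from Hugging Face into val_data/, used them as the source data for quantization, and the calibration step had also classified them and written the result under calib_data/... Wait. If I used this as the source data for torchvision_inference, wouldn't classification.py go through? I tried it, and bingo, it finished without a hitch.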

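Was the tutorial designed with that in mind while the README kept quiet about it? Or is it an oversight in the torchvision_inference README, which is missing an explanation along the lines of "prepare the data first, for example by using the output of advanced quark quantize"?

Either way, as of today, the data downloaded from Hugging Face for torchvision_inference cannot be used as is. The trap is that it has to be classified first.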
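To see at a glance whether a data folder is in the "classified" shape, a small check like the following may help. It is only a sketch under my own assumption that "classified" means images sorted into one subfolder per class (the layout torchvision's ImageFolder-style loaders expect); the README does not spell this out, and the function names and the suffix list are mine.

```python
from pathlib import Path

# Assumption: these are the image suffixes worth counting.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_layout(root):
    """Return (loose_images, {subfolder_name: image_count}) for one folder level."""
    loose = 0
    per_class = {}
    for entry in Path(root).iterdir():
        if entry.is_file() and entry.suffix.lower() in IMAGE_SUFFIXES:
            # Image lying directly under root, not inside any class folder.
            loose += 1
        elif entry.is_dir():
            count = sum(
                1
                for f in entry.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            if count:
                per_class[entry.name] = count
    return loose, per_class


def looks_classified(root):
    """True when every image sits inside a subfolder and at least one exists."""
    loose, per_class = summarize_layout(root)
    return loose == 0 and len(per_class) > 0
```

Calling looks_classified("calib_data") on a freshly extracted archive would be expected to return False if the images sit directly in the folder, and True once they have been sorted into class subfolders.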

Needless to say, I ended up moving the Hugging Face data I had downloaded for torchvision_inference over to advanced quark quantize (the quark quantization folder) and copying the resulting calib_data/ into the torchvision_inference folder.

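One more thing: vaip_config.json is said to live under the Ryzen AI Software install path. I glanced at the top level, did not see it, and, without searching properly, copied the one from the yolov8 folder into torchvision_inference. When I later found it in [C:\Program Files\RyzenAI\1.4.0\voe-4.0-win_amd64], its size was slightly different, so I copied the one from that folder over again.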
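Searching properly and comparing the copies can be done in a few lines. This is a minimal sketch using only the standard library; the function names are my own, and the root folder to search is whatever your install path happens to be.

```python
import hashlib
from pathlib import Path


def find_configs(root, name="vaip_config.json"):
    """Recursively list every file called `name` below root, sorted by path."""
    return sorted(Path(root).rglob(name))


def describe(path):
    """Return (size_in_bytes, sha256_hex) for one file."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def same_content(a, b):
    """True when both files have identical size and SHA-256 digest."""
    return describe(a) == describe(b)
```

For example, find_configs(r"C:\Program Files\RyzenAI\1.4.0") lists the candidates, and same_content(candidate, "vaip_config.json") tells you whether the copy in the project folder matches.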

Running the Ryzen AI Software tutorial TorchVision Inference

path\to\torchvision_inference> conda activate ryzen-ai-1.4.0

(ryzen-ai-1.4.0) path\to\torchvision_inference>

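As usual, and as with quicktest, create a virtual environment and activate it before starting.

ryzen-ai-1.4.0 is the default conda environment name offered during Ryzen AI Software installation, or whatever name you changed it to from that default.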

(ryzen-ai-1.4.0) path\to\torchvision_inference> pip install -r requirements.txt

Next, install the packages this project requires.

(ryzen-ai-1.4.0) path\to\torchvision_inference> cp "C:\Program Files\RyzenAI\1.4.0\voe-4.0-win_amd64\vaip_config.json" .

(ryzen-ai-1.4.0) path\to\torchvision_inference>

Copy vaip_config.json over from [C:\Program Files\RyzenAI\1.4.0\voe-4.0-win_amd64].
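For orientation, here is roughly how an ONNX Runtime session could be pointed at the CPU, the iGPU, or the NPU, with the copied vaip_config.json handed to the NPU case. This is a sketch, not the code in classification.py, which I have not reproduced here: the helper names are hypothetical, and using DmlExecutionProvider (DirectML) for the iGPU is my assumption. CPUExecutionProvider and VitisAIExecutionProvider are the provider names ONNX Runtime uses for the CPU and for the Vitis AI EP that appears in the log.

```python
def provider_args(device, config_file="vaip_config.json"):
    """Return (providers, provider_options) suitable for onnxruntime.InferenceSession."""
    if device == "cpu":
        return ["CPUExecutionProvider"], [{}]
    if device == "igpu":
        # Assumption: the iGPU is reached through DirectML.
        return ["DmlExecutionProvider"], [{}]
    if device == "npu":
        # The Vitis AI EP is given the config file copied into the project root.
        return ["VitisAIExecutionProvider"], [{"config_file": config_file}]
    raise ValueError(f"unknown device: {device!r}")


def create_session(model_path, device):
    """Create an inference session for one device (requires onnxruntime)."""
    import onnxruntime as ort  # imported lazily so provider_args works without it

    providers, options = provider_args(device)
    return ort.InferenceSession(
        model_path, providers=providers, provider_options=options
    )
```

With this shape, create_session(r"models\resnet50_quantized.onnx", "npu") would be the NPU variant; whether the provider is actually available depends on the installed onnxruntime build.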

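Following the README, the next step would be to download the specified data from Hugging Face... but as I said at the start, that data causes an error if used as downloaded. So first prepare calib_data/ and so on, classified and calibrated by advanced_quark_quantize.py in quark_quantization.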

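As described above, I was trying advanced_quark_quantize and, around the same time, torchvision_inference, so I had already downloaded the same archive. I moved it to quark_quantization, used it as the source data for advanced_quark_quantize.py, copied the resulting calib_data/ into torchvision_inference, and used that as the source data there. The steps that follow, jupyter notebook classification.ipynb and classification.py, then went through without a hitch, which settled everything in one go.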
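The copy step itself can be scripted. A minimal sketch, assuming both tutorial folders sit wherever you cloned them; the function name is mine, and the folder names passed in are up to you.

```python
import shutil
from pathlib import Path


def copy_calib_data(src_project, dst_project, folder="calib_data"):
    """Copy <src_project>/<folder> into <dst_project>/<folder>; return the file count."""
    src = Path(src_project) / folder
    dst = Path(dst_project) / folder
    if not src.is_dir():
        raise FileNotFoundError(f"{src} does not exist; run the quantization step first")
    # dirs_exist_ok lets the copy merge into an existing destination folder.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())
```

Something like copy_calib_data("quark_quantization", "torchvision_inference") mirrors what I did by hand.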

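Next, download the val_images.tar.gz file from ILSVRC/imagenet-1k on Hugging Face.

Note that, as of today, this requires registering with Hugging Face, logging in, and accepting the terms that are then displayed.

To register, look for the [Log in] or [Sign up] buttons a little way down that page and click the latter.

After registering, the terms and an [Accept] button should appear in about the same place. Click it, then download from the list below it.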

(ryzen-ai-1.4.0) path\to\torchvision_inference> mkdir val_data && tar -xzf val_images.tar.gz -C val_data

(ryzen-ai-1.4.0) path\to\torchvision_inference> python prepare_data.py val_data calib_data

In the torchvision_inference project root, extract the val_images.tar.gz downloaded from Hugging Face into the val_data folder as shown, then run prepare_data.py, which processes val_data and saves the result in a folder named calib_data.
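The `&&` in the extraction command works in cmd.exe and PowerShell 7, but not in Windows PowerShell 5.1. If that bites you, the same mkdir-and-extract step can be done from Python. This is a sketch with the standard tarfile module; the function name is mine, and it should only be used on archives you trust, since it extracts whatever paths the archive contains.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest):
    """Create dest if needed, extract a .tar.gz into it, and return the file count."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return sum(1 for p in dest.rglob("*") if p.is_file())
```

extract_archive("val_images.tar.gz", "val_data") is then the counterpart of the shell line above, and the returned count is a quick sanity check before running prepare_data.py.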

(ryzen-ai-1.4.0) path\to\torchvision_inference> python classification.py

[QUARK-INFO]: Checking custom ops library ...

[QUARK-WARNING]: The custom ops library C:\Users\reg\anaconda3\envs\ryzen-ai-1.4.0\lib\site-packages\quark\onnx\operators\custom_ops\lib\custom_ops.dll does NOT exist.

[QUARK-INFO]: Start compiling CPU version of custom ops library.

[QUARK-INFO]: CPU version of custom ops library compiled successfully.

[QUARK-INFO]: removing file: [WinError 5] アクセスが拒否されました。: 'C:\\Users\\reg\\anaconda3\\envs\\ryzen-ai-1.4.0\\lib\\site-packages\\quark\\onnx\\operators\\custom_ops\\build_cpu\\libcustom_ops.pyd'

[QUARK-INFO]: Checked custom ops library.

Model exported to ONNX at: models\resnet50.onnx

[QUARK_INFO]: Time information:

2025-04-09 10:54:25.793720

[QUARK_INFO]: OS and CPU information:

system --- Windows

node --- gem12promax

release --- 10

version --- 10.0.26100

machine --- AMD64

processor --- AMD64 Family 25 Model 117 Stepping 2, AuthenticAMD

[QUARK_INFO]: Tools version information:

python --- 3.10.0

onnx --- 1.16.1

onnxruntime --- 1.20.1

quark.onnx --- 0.8+2fc870b

[QUARK_INFO]: Quantized Configuration information:

model_input --- models\resnet50.onnx

model_output --- models\resnet50_quantized.onnx

calibration_data_reader --- <utils_custom.ImageDataReader object at 0x0000018E01DD91E0>

calibration_data_path --- None

quant_format --- QDQ

input_nodes --- []

output_nodes --- []

op_types_to_quantize --- []

extra_op_types_to_quantize --- []

per_channel --- False

reduce_range --- False

activation_type --- QUInt8

weight_type --- QInt8

nodes_to_quantize --- []

nodes_to_exclude --- []

subgraphs_to_exclude --- []

optimize_model --- True

use_external_data_format --- False

calibrate_method --- PowerOfTwoMethod.MinMSE

execution_providers --- ['CPUExecutionProvider']

enable_npu_cnn --- True

enable_npu_transformer --- False

specific_tensor_precision --- False

debug_mode --- False

convert_fp16_to_fp32 --- False

convert_nchw_to_nhwc --- False

include_cle --- False

include_sq --- False

include_rotation --- False

include_fast_ft --- False

extra_options --- {'ActivationSymmetric': True}

[QUARK-INFO]: The input ONNX model models\resnet50.onnx can create InferenceSession successfully

[QUARK-INFO]: Obtained calibration data with 60 iters

[QUARK-INFO]: Removed initializers from input

[QUARK-INFO]: Simplified model sucessfully

[QUARK-INFO]: Duplicate the shared initializers in the model for separate quantization use across different nodes!

[QUARK-INFO]: Loading model...

[QUARK-INFO]: The input ONNX model C:/Users/reg/AppData/Local/Temp/vai.cpinit.i1tqjeu3/model_cpinit.onnx can run inference successfully

[QUARK-INFO]: optimize the model for better hardware compatibility.

[QUARK-WARNING]: The opset version is 13 < 17. Skipping fusing layer normalization.

[QUARK-WARNING]: The opset version is 13 < 20. Skipping fusing Gelu.

[QUARK-INFO]: Start calibration...

[QUARK-INFO]: Start collecting data, runtime depends on your model size and the number of calibration dataset.

[QUARK-INFO]: Finding optimal threshold for each tensor using PowerOfTwoMethod.MinMSE algorithm ...

[QUARK-INFO]: Use all calibration data to calculate min mse

Computing range: 100%|█████████████████████████████████████████████████| 123/123 [18:13<00:00, 8.89s/tensor]

[QUARK-INFO]: Finished the calibration of PowerOfTwoMethod.MinMSE which costs 1230.4s

[QUARK-INFO]: Remove QuantizeLinear & DequantizeLinear on certain operations(such as conv-relu).

[QUARK-INFO]: Rescale GlobalAveragePool /avgpool/GlobalAveragePool with factor 1.0048828125 to simulate DPU behavior.

[QUARK-INFO]: Adjust the quantize info to meet the compiler constraints

[QUARK-INFO]: Input pos of pooling layer /avgpool/GlobalAveragePool is 1. Output pos of pooling layer /avgpool/GlobalAveragePool is 4.Modify opos from 4 to 1.

[QUARK-INFO]: Adjust the quantize info to meet the compiler constraints

The operation types and their corresponding quantities of the input float model is shown in the table below.

┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Op Type              ┃ Float Model                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Conv                 │ 53                             │
│ Relu                 │ 49                             │
│ MaxPool              │ 1                              │
│ Add                  │ 16                             │
│ GlobalAveragePool    │ 1                              │
│ Flatten              │ 1                              │
│ Gemm                 │ 1                              │
├──────────────────────┼────────────────────────────────┤
│ Quantized model path │ models\resnet50_quantized.onnx │
└──────────────────────┴────────────────────────────────┘

The quantized information for all operation types is shown in the table below.

The discrepancy between the operation types in the quantized model and the float model is due to the application of graph optimization.

┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Op Type           ┃ Activation ┃ Weights  ┃ Bias     ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ Conv              │ UINT8(53)  │ INT8(53) │ INT8(53) │
│ MaxPool           │ UINT8(1)   │          │          │
│ Add               │ UINT8(16)  │          │          │
│ GlobalAveragePool │ UINT8(1)   │          │          │
│ Flatten           │ UINT8(1)   │          │          │
│ Gemm              │ UINT8(1)   │ INT8(1)  │ INT8(1)  │
└───────────────────┴────────────┴──────────┴──────────┘

Quark Quantized model saved at: models\resnet50_quantized.onnx

Image size: (224, 224)

----------------------------------------

Final top prediction is: Golden Retriever

----------------------------------------

Inference time: 55.22 ms

----------------------------------------

------------ Top 5 labels are: ----------------------------

['Golden Retriever' 'Labrador Retriever' 'Curly-coated Retriever'

'Norwich Terrier' 'Standard Poodle']

-----------------------------------------------------------

----------------------------------------

Final top prediction is: Golden Retriever

----------------------------------------

Inference time: 581.93 ms

----------------------------------------

------------ Top 5 labels are: ----------------------------

['Golden Retriever' 'Labrador Retriever' 'Curly-coated Retriever'

'Norwich Terrier' 'Standard Poodle']

-----------------------------------------------------------

APU Type: PHX/HPT

Setting environment for PHX/HPT

XLNX_VART_FIRMWARE= C:\Program Files\RyzenAI\1.4.0\voe-4.0-win_amd64\xclbins\phoenix\1x4.xclbin

NUM_OF_DPU_RUNNERS= 1

XLNX_TARGET_NAME= AMD_AIE2_Nx4_Overlay

WARNING: Logging before InitGoogleLogging() is written to STDERR

I20250409 11:15:14.207437 18696 vitisai_compile_model.cpp:1144] Vitis AI EP Load ONNX Model Success

I20250409 11:15:14.208303 18696 vitisai_compile_model.cpp:1145] Graph Input Node Name/Shape (1)

I20250409 11:15:14.208303 18696 vitisai_compile_model.cpp:1149] input : [-1x3x224x224]

I20250409 11:15:14.208303 18696 vitisai_compile_model.cpp:1155] Graph Output Node Name/Shape (1)

I20250409 11:15:14.208303 18696 vitisai_compile_model.cpp:1159] output : [-1x1000]

[Vitis AI EP] No. of Operators : CPU 2 NPU 393

[Vitis AI EP] No. of Subgraphs : NPU 1 Actually running on NPU 1

2025-04-09 11:15:37.6553798 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.

2025-04-09 11:15:37.6626815 [W:onnxruntime:, session_state.cc:1170 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

----------------------------------------

Final top prediction is: Golden Retriever

----------------------------------------

Inference time: 12.96 ms

----------------------------------------

------------ Top 5 labels are: ----------------------------

['Golden Retriever' 'Labrador Retriever' 'Norwich Terrier'

'Curly-coated Retriever' 'Flat-Coated Retriever']

-----------------------------------------------------------

(ryzen-ai-1.4.0) path\to\torchvision_inference>

This is the result of running python classification.py in PowerShell.

Comparing inference times across CPU/iGPU/NPU, the NPU is far ahead, followed by the CPU, then the iGPU.

(CPU) Inference time: 55.22 ms

(iGPU) Inference time: 581.93 ms

(NPU) Inference time: 12.96 ms
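The log itself does not label its three "Inference time" lines; the CPU/iGPU/NPU tags above are attached by order of appearance. A small parser can pull the numbers out of a saved log and relate each device to the NPU. It is a sketch with names of my own choosing, and the device order is an assumption passed in as a parameter.

```python
import re

# Matches lines such as "Inference time: 55.22 ms".
TIME_RE = re.compile(r"Inference time:\s*([0-9.]+)\s*ms")


def inference_times(log_text, devices=("CPU", "iGPU", "NPU")):
    """Map device names to the inference times found in log_text, in order."""
    values = [float(v) for v in TIME_RE.findall(log_text)]
    if len(values) != len(devices):
        raise ValueError(f"expected {len(devices)} timings, found {len(values)}")
    return dict(zip(devices, values))


def ratio_to(times, baseline="NPU"):
    """Return each device's time divided by the baseline device's time."""
    base = times[baseline]
    return {dev: round(t / base, 2) for dev, t in times.items()}
```

Fed the three timings from this run, ratio_to gives how many times longer the CPU and the iGPU took than the NPU.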

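In PowerShell, ~~after pip installing jupyter~~ (jupyter should already be present if you are in a conda environment of the kind specified during Ryzen AI Software installation), I ran jupyter notebook classification.ipynb and exported the result as HTML; this is a full-page screenshot of it.

On this second run, the .ipynb after the .py, the CPU/iGPU/NPU inference times were as follows. All three improved: the CPU and NPU got slightly faster and the iGPU improved considerably, but the ranking did not change, with the NPU far ahead, then the CPU, then the iGPU.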

(CPU) Inference time: 49.15 ms

(iGPU) Inference time: 91.21 ms

(NPU) Inference time: 11.51 ms
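To put numbers on "slightly" and "considerably", the two runs can be compared directly. The values below are the ones reported for the .py run and the .ipynb run; the helper names are mine, and a single run each is of course a thin basis for conclusions.

```python
# Timings in milliseconds as reported for each run.
RUN_PY = {"CPU": 55.22, "iGPU": 581.93, "NPU": 12.96}
RUN_IPYNB = {"CPU": 49.15, "iGPU": 91.21, "NPU": 11.51}


def percent_change(before, after):
    """Percentage change per device; negative means the second run was faster."""
    return {
        dev: round((after[dev] - before[dev]) / before[dev] * 100, 1)
        for dev in before
    }


def ranking(times):
    """Device names ordered from fastest to slowest."""
    return sorted(times, key=times.get)


if __name__ == "__main__":
    print(percent_change(RUN_PY, RUN_IPYNB))
    print(ranking(RUN_PY), ranking(RUN_IPYNB))
```

The ranking comes out the same for both runs, which is the point made above: the iGPU closes much of the gap but stays last.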

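It correctly recognizes a golden retriever.

Perhaps the slight curl in the coat of the dog in the sample image in the data/ folder, a bit like naturally curly hair, meant it could not quite rule out a poodle?

That aside, and though it surely depends on the case, for inference on a single image all three reached the same conclusion, and the NPU is quite impressive.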
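As a reminder of where a "Top 5 labels" list comes from: a classifier emits one score per class, and the list is simply the five highest. The sketch below shows that step in plain Python with softmax and a sort. It is not the tutorial's own post-processing, which I have not reproduced here, and any scores you feed it in an experiment are yours to make up.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, scores, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Because the lower ranks are often separated by small margins, a slight shift in the scores is enough to reorder them, which is consistent with the NPU run in the log listing a slightly different order and fifth entry than the CPU and iGPU runs while agreeing on the top prediction.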

Don't forget conda deactivate when you're done

(ryzen-ai-1.4.0) path\to\torchvision_inference> conda deactivate

path\to\torchvision_inference>

Once you have finished working in the conda virtual environment, run conda deactivate (deactivate being the opposite of activate).

Forget this and you will usually get burned, so take care.

conda info -e

# conda environments

base path\to\anaconda3

ryzen-ai-1.4.0 path\to\anaconda3\envs\ryzen-ai-1.4.0

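After conda deactivate, it is reassuring to run conda info -e each time and confirm that no environment is marked with [*].

The reason is that even when the (ryzen-ai-1.4.0) prefix has disappeared from the front of the working directory path, conda info -e may still show [*] on it. You may also somehow have landed in the (base) environment instead, and the same can still be true after unknowingly closing the Anaconda command prompt and coming back.

When you end up in base, it is shown at the front of the directory path, and yet, perhaps because you are sure you ran deactivate on the environment you were actually using, it is surprisingly easy to miss...

Either way, carrying on without realizing you are still inside a conda environment is an easy way to get stuck, so take care.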
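The check can be automated by reading the `conda info -e` output and looking for the `*` marker. A minimal sketch, with function names of my own; it parses text you pass in, so you can paste the output or capture it yourself, and it also shows the CONDA_DEFAULT_ENV environment variable that conda sets while an environment is active.

```python
import os


def active_conda_env(info_output):
    """Return the environment marked with '*' in `conda info -e` output, or None."""
    for line in info_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split()
        if "*" in parts:
            # Normally: name, '*', path. An unnamed env shows only '*' and path.
            return parts[0] if parts[0] != "*" else parts[1]
    return None


def env_from_variable():
    """Return the active environment name according to CONDA_DEFAULT_ENV, or None."""
    return os.environ.get("CONDA_DEFAULT_ENV") or None
```

If active_conda_env(...) returns anything other than None after you thought you had deactivated, you are still inside an environment, base included.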

2025/04/09 Correction and addendum

(ryzen-ai-1.4.0) path\to\torchvision_inference> python -m virtualenv .venv

(ryzen-ai-1.4.0) path\to\torchvision_inference> .\.venv\Scripts\activate

First, as with quicktest, create a virtual environment and activate it before starting.

Here I use pip rather than conda, and assume pip install virtualenv has already been done.

(ryzen-ai-1.4.0) path\to\torchvision_inference> pip install -r requirements.txt

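Next, install the packages this project requires.

Next, download the val_images.tar.gz file from ILSVRC/imagenet-1k on Hugging Face.

Note that, as of today, this requires registering with Hugging Face, logging in, and accepting the terms that are then displayed.

To register, look for the [Log in] or [Sign up] buttons a little way down that page and click the latter.

After registering, the terms and an [Accept] button should appear in about the same place. Click it, then download from the list below it.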

(ryzen-ai-1.4.0) path\to\torchvision_inference> mkdir val_data && tar -xzf val_images.tar.gz -C val_data

(ryzen-ai-1.4.0) path\to\torchvision_inference> python prepare_data.py val_data calib_data

(venv) PS path\to\torchvision_inference> python .\classification.py

[QUARK-INFO]: Checking custom ops library ...

[QUARK-WARNING]: The custom ops library path\to\torchvision_inference\venv\Lib\site-packages\quark\onnx\operators\custom_ops\lib\custom_ops.dll does NOT exist.

[QUARK-INFO]: Start compiling CPU version of custom ops library.

path\to\torchvision_inference\venv\Lib\site-packages\torch\utils\cpp_extension.py:414: UserWarning: Error checking compiler version for cl: [WinError 2] 指定されたファイルが見つかりません。

warnings.warn(f'Error checking compiler version for {compiler}: {error}')

INFO: Could not find files for the given pattern(s).

[QUARK-ERROR]: CPU version of custom ops library compilation failed:Command '['where', 'cl']' returned non-zero exit status 1.

[QUARK-WARNING]: Custom ops library compilation failed: CPU version of custom ops library compilation failed:Command '['where', 'cl']' returned non-zero exit status 1..

[QUARK-INFO]: Checked custom ops library.

Model exported to ONNX at: models\resnet50.onnx

[QUARK_INFO]: Time information:

2025-04-06 12:53:27.317460

[QUARK_INFO]: OS and CPU information:

system --- Windows

node --- gem12promax

release --- 11

version --- 10.0.26100

machine --- AMD64

processor --- AMD64 Family 25 Model 117 Stepping 2, AuthenticAMD

[QUARK_INFO]: Tools version information:

python --- 3.12.0

onnx --- 1.17.0

onnxruntime --- 1.20.1

quark.onnx --- 0.8+103c340fe2

[QUARK_INFO]: Quantized Configuration information:

model_input --- models\resnet50.onnx

model_output --- models\resnet50_quantized.onnx

calibration_data_reader --- <utils_custom.ImageDataReader object at 0x000001D0687E8410>

calibration_data_path --- None

quant_format --- QDQ

input_nodes --- []

output_nodes --- []

op_types_to_quantize --- []

extra_op_types_to_quantize --- []

per_channel --- False

reduce_range --- False

activation_type --- QUInt8

weight_type --- QInt8

nodes_to_quantize --- []

nodes_to_exclude --- []

subgraphs_to_exclude --- []

optimize_model --- True

use_external_data_format --- False

calibrate_method --- PowerOfTwoMethod.MinMSE

execution_providers --- ['CPUExecutionProvider']

enable_npu_cnn --- True

enable_npu_transformer --- False

specific_tensor_precision --- False

debug_mode --- False

convert_fp16_to_fp32 --- False

convert_nchw_to_nhwc --- False

include_cle --- False

include_sq --- False

include_rotation --- False

include_fast_ft --- False

extra_options --- {'ActivationSymmetric': True}

[QUARK-INFO]: The input ONNX model models\resnet50.onnx can create InferenceSession successfully

[QUARK-INFO]: Obtained calibration data with 60 iters

[QUARK-INFO]: Removed initializers from input

[QUARK-INFO]: Simplified model sucessfully

[QUARK-INFO]: Duplicate the shared initializers in the model for separate quantization use across different nodes!

[QUARK-INFO]: Loading model...

[QUARK-INFO]: The input ONNX model C:/Users/reg/AppData/Local/Temp/vai.cpinit.yxju0wi1/model_cpinit.onnx can run inference successfully

[QUARK-INFO]: optimize the model for better hardware compatibility.

[QUARK-WARNING]: The opset version is 13 < 17. Skipping fusing layer normalization.

[QUARK-WARNING]: The opset version is 13 < 20. Skipping fusing Gelu.

[QUARK-INFO]: Start calibration...

[QUARK-INFO]: Start collecting data, runtime depends on your model size and the number of calibration dataset.

[QUARK-INFO]: Finding optimal threshold for each tensor using PowerOfTwoMethod.MinMSE algorithm ...

[QUARK-INFO]: Use all calibration data to calculate min mse

Computing range: 100%|███████████████████████████████████████████████████████████| 123/123 [15:16<00:00, 7.45s/tensor]

[QUARK-INFO]: Finished the calibration of PowerOfTwoMethod.MinMSE which costs 1046.7s

[QUARK-INFO]: Remove QuantizeLinear & DequantizeLinear on certain operations(such as conv-relu).

[QUARK-INFO]: Rescale GlobalAveragePool /avgpool/GlobalAveragePool with factor 1.0048828125 to simulate DPU behavior.

[QUARK-INFO]: Adjust the quantize info to meet the compiler constraints

[QUARK-INFO]: Input pos of pooling layer /avgpool/GlobalAveragePool is 1. Output pos of pooling layer /avgpool/GlobalAveragePool is 4.Modify opos from 4 to 1.

[QUARK-INFO]: Adjust the quantize info to meet the compiler constraints

The operation types and their corresponding quantities of the input float model is shown in the table below.

┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Op Type              ┃ Float Model                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Conv                 │ 53                             │
│ Relu                 │ 49                             │
│ MaxPool              │ 1                              │
│ Add                  │ 16                             │
│ GlobalAveragePool    │ 1                              │
│ Flatten              │ 1                              │
│ Gemm                 │ 1                              │
├──────────────────────┼────────────────────────────────┤
│ Quantized model path │ models\resnet50_quantized.onnx │
└──────────────────────┴────────────────────────────────┘

The quantized information for all operation types is shown in the table below.

The discrepancy between the operation types in the quantized model and the float model is due to the application of graph optimization.

┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃ Op Type           ┃ Activation ┃ Weights  ┃ Bias     ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ Conv              │ UINT8(53)  │ INT8(53) │ INT8(53) │
│ MaxPool           │ UINT8(1)   │          │          │
│ Add               │ UINT8(16)  │          │          │
│ GlobalAveragePool │ UINT8(1)   │          │          │
│ Flatten           │ UINT8(1)   │          │          │
│ Gemm              │ UINT8(1)   │ INT8(1)  │ INT8(1)  │
└───────────────────┴────────────┴──────────┴──────────┘

Quark Quantized model saved at: models\resnet50_quantized.onnx

Image size: (224, 224)

----------------------------------------

Final top prediction is: Golden Retriever

----------------------------------------

Inference time: 42.2 ms

----------------------------------------

------------ Top 5 labels are: ----------------------------

['Golden Retriever' 'Labrador Retriever' 'Curly-coated Retriever'

'Norwich Terrier' 'Standard Poodle']

-----------------------------------------------------------

----------------------------------------

Final top prediction is: Golden Retriever

----------------------------------------

Inference time: 37.03 ms

----------------------------------------

------------ Top 5 labels are: ----------------------------

['Golden Retriever' 'Labrador Retriever' 'Curly-coated Retriever'

'Norwich Terrier' 'Standard Poodle']

-----------------------------------------------------------

APU Type: PHX/HPT

Setting environment for PHX/HPT

XLNX_VART_FIRMWARE= C:\Program Files\RyzenAI\1.4.0\voe-4.0-win_amd64\xclbins\phoenix\1x4.xclbin

NUM_OF_DPU_RUNNERS= 1

XLNX_TARGET_NAME= AMD_AIE2_Nx4_Overlay

----------------------------------------

Final top prediction is: Golden Retriever

----------------------------------------

Inference time: 38.0 ms

----------------------------------------

------------ Top 5 labels are: ----------------------------

['Golden Retriever' 'Labrador Retriever' 'Curly-coated Retriever'

'Norwich Terrier' 'Standard Poodle']

-----------------------------------------------------------

(venv) PS path\to\torchvision_inference>

This is the result of running python classification.py in PowerShell.
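The log above shows the custom ops build giving up because `where cl` found nothing, that is, the MSVC compiler `cl` was not on PATH in that shell. A quick pre-flight check along these lines would surface that before a long run. It is a sketch with a function name of my own, and which tools are worth checking is an assumption you should adjust.

```python
import shutil


def missing_tools(names):
    """Return the subset of names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

For instance, missing_tools(["cl"]) returning ["cl"] would mean the compiler is not reachable from the current shell.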

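It correctly recognizes a golden retriever.

Perhaps the slight curl in the coat of the dog in the sample image in the data/ folder, a bit like naturally curly hair, meant it could not quite rule out a poodle?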
