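takao-y  Thursday, November 12th, 2020 at 11:48:20 AM GMT+09:00
This is Yoshikane. I have summarized the workflow so far and the plan from here on. Yin-san and I will be in charge of step 3. I would be grateful if you could tell us the details of the execution environment on JAXA's server. If possible, I would like to hold a handover meeting with only the people doing the work (Kobayashi-san, Yin-san, myself, and others?). What do you think? Since step 3 also relates to step 4 (Kobayashi-san in charge?), I would like to confirm the detailed work items. Thank you in advance.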
Workflow
Restec
1. Obtain the optimal values of hyper-parameters
Before estimating the precipitation in all grids, we have to investigate the following four factors: three hyper-parameters and the size of the feature vector (the simulated precipitation distribution).
Hyper-parameters: Gamma (Gm), Cost (C), Epsilon (Ep)
The size of the feature vector (Fv): initially 21x21 (rg2=10), where rg2 is the number of grid points from the center of the FV
Initial values and assumed ranges are obtained from a random search at some points.
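As a rough illustration of how rg2 relates to the size of the FV, here is a minimal Python sketch. The names fv_side and extract_fv and the list-of-lists field layout are assumptions made for this note and are not taken from "ml-bin". The handling of points near the domain edge is also an assumption (such windows are simply rejected here).

```python
from typing import List


def fv_side(rg2: int) -> int:
    """Side length of the square FV: rg2 points on each side of the center."""
    return 2 * rg2 + 1


def extract_fv(field: List[List[float]], ci: int, cj: int, rg2: int) -> List[float]:
    """Flatten the (2*rg2+1) x (2*rg2+1) window centered on (ci, cj).

    Assumption: windows that cross the domain edge are rejected.
    """
    n_rows, n_cols = len(field), len(field[0])
    if ci - rg2 < 0 or cj - rg2 < 0 or ci + rg2 >= n_rows or cj + rg2 >= n_cols:
        raise ValueError("window extends beyond the domain")
    return [field[i][j]
            for i in range(ci - rg2, ci + rg2 + 1)
            for j in range(cj - rg2, cj + rg2 + 1)]


# Side length and number of features for each FV size examined in this workflow.
for rg2 in (4, 6, 8, 10, 12, 14):
    print(rg2, fv_side(rg2), fv_side(rg2) ** 2)
```

For the initial setting rg2=10 this gives a 21x21 window, i.e. 441 values per feature vector.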
Step 1: For hyper-parameters
1.1 Gm (Change the values), C (Fixed), Ep (Fixed), Fv (Fixed)
If the minimum or maximum value of the range is selected, please extend the range and try again.
1.2 Gm (Fixed), C (Change), Ep (Fixed), Fv (Fixed)
If the minimum or maximum value of the range is selected, please extend the range and try again.
1.3 Gm (Fixed), C (Fixed), Ep (Change), Fv (Fixed)
If the minimum or maximum value of the range is selected, please extend the range and try again.
1.4 Gm (Change), C (Fixed), Ep (Fixed), Fv (Fixed)
If the Gm value does not differ much from the value obtained in 1.1, the optimal values are determined.
If not, please repeat the cycle from 1.2 to 1.4.
We confirmed that the sensitivity to C and Ep is not large.
For this reason, we fixed C and Ep at the optimal values obtained in Step 1.
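The cycle 1.1 to 1.4 can be sketched roughly as below. This is only an illustration under assumptions, not the code in "ml-bin": the score function (lower is better, e.g. an error measure), the factor-of-10 range extension, the tolerance for "does not differ much", and the toy score used at the end are all hypothetical.

```python
import math
from typing import Callable, Dict, List

Params = Dict[str, float]


def best_on_grid(score: Callable[[Params], float], params: Params, name: str,
                 grid: List[float], factor: float = 10.0, max_extend: int = 5) -> float:
    """Vary one parameter over `grid` with the others fixed.

    If the best value is the minimum or maximum of the grid, extend the grid
    on that side and try again (assumes positive values, lower score is better).
    """
    grid = sorted(grid)
    for attempt in range(max_extend + 1):
        scores = [score({**params, name: v}) for v in grid]
        i = min(range(len(grid)), key=lambda k: scores[k])
        interior = 0 < i < len(grid) - 1
        if interior or attempt == max_extend:
            return grid[i]
        if i == 0:
            grid.insert(0, grid[0] / factor)   # extend below the minimum
        else:
            grid.append(grid[-1] * factor)     # extend above the maximum
    return grid[i]


def tune(score: Callable[[Params], float], start: Params,
         grids: Dict[str, List[float]], rel_tol: float = 0.1,
         max_cycles: int = 5) -> Params:
    """Steps 1.1 to 1.4; rel_tol is a placeholder for 'not largely different'."""
    p = dict(start)
    p["Gm"] = best_on_grid(score, p, "Gm", grids["Gm"])      # 1.1
    for _ in range(max_cycles):
        gm_prev = p["Gm"]
        p["C"] = best_on_grid(score, p, "C", grids["C"])     # 1.2
        p["Ep"] = best_on_grid(score, p, "Ep", grids["Ep"])  # 1.3
        p["Gm"] = best_on_grid(score, p, "Gm", grids["Gm"])  # 1.4
        if abs(p["Gm"] - gm_prev) <= rel_tol * gm_prev:
            break                                            # Gm is stable
    return p


def toy_score(p: Params) -> float:
    """Hypothetical error surface with its minimum at Gm=0.01, C=100, Ep=0.1."""
    return ((math.log10(p["Gm"]) + 2) ** 2
            + (math.log10(p["C"]) - 2) ** 2
            + (math.log10(p["Ep"]) + 1) ** 2)


start = {"Gm": 1.0, "C": 1.0, "Ep": 0.1}
grids = {"Gm": [0.1, 1.0, 10.0], "C": [1.0, 10.0, 100.0], "Ep": [0.01, 0.1, 1.0]}
best = tune(toy_score, start, grids)
print(best)
```

In the toy case the initial Gm and C ranges do not contain the optimum in their interior, so the ranges are extended automatically before a value is accepted.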
Step 2: For the size of FV
2.1 Gm (Change the values), C (Fixed at the optimal value), Ep (Fixed at the optimal value), Fv (rg2=4)
If the minimum or maximum value of the range is selected, please extend the range and try again.
2.2 Gm (Change the values), C (Fixed), Ep (Fixed), Fv (rg2=6)
If the minimum or maximum value of the range is selected, please extend the range and try again.
2.3 Gm (Change the values), C (Fixed), Ep (Fixed), Fv (rg2=8)
If the minimum or maximum value of the range is selected, please extend the range and try again.
2.4 Gm (Change the values), C (Fixed), Ep (Fixed), Fv (rg2=12)
If the minimum or maximum value of the range is selected, please extend the range and try again.
If the performance does not improve much compared with rg2=10, the size of the FV is determined as rg2=10.
2.5 Gm (Change the values), C (Fixed), Ep (Fixed), Fv (rg2=14)
If the minimum or maximum value of the range is selected, please extend the range and try again.
If the performance does not improve much compared with rg2=12, the size of the FV is determined as rg2=12.
We confirmed that, in all cases, the performance with rg2=12 does not differ much from that with rg2=10.
For this reason, we set rg2 to 10.
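The stopping rule of 2.4 and 2.5 can be written down roughly as follows. This is a sketch under assumptions: the performance measure is taken to be an error where lower is better, and the 2% threshold for "does not improve much" and the numbers in the usage example are placeholders, not results from this project.

```python
from typing import Dict


def choose_rg2(error_by_rg2: Dict[int, float], start: int = 10,
               min_gain: float = 0.02) -> int:
    """Enlarge the FV from `start` only while the relative gain is at least min_gain."""
    sizes = sorted(s for s in error_by_rg2 if s >= start)
    if not sizes:
        raise ValueError("no FV size at or above the starting rg2")
    chosen = sizes[0]
    for larger in sizes[1:]:
        gain = (error_by_rg2[chosen] - error_by_rg2[larger]) / error_by_rg2[chosen]
        if gain < min_gain:
            break              # the larger FV does not help enough: keep `chosen`
        chosen = larger
    return chosen


# Placeholder errors: rg2=12 is only 0.5% better than rg2=10, so rg2=10 is kept.
print(choose_rg2({10: 1.000, 12: 0.995, 14: 0.990}))
```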
It is necessary to obtain the optimal values for January, April, July, and October in all regions (A, B, C, D, and E).
----- Almost finished? (2020.11.12)
Step 3: Estimate the precipitation at all grid points by ML + CDF (transform?) using the optimal values
(Yin-san and I are in charge.)
This process will be conducted on JAXA’s server.
We’d appreciate it if you could tell us about the environment settings on JAXA’s server.
3.1 We need to modify “ml-bin” to match the environment of JAXA’s server.
How can we get the input data and adjust the domains?
3.2 Check the server performance.
Is concurrent execution (e.g., using 10 cores) of the machine learning system possible?
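One possible way to check this on the server, as a sketch only: count the cores the process is allowed to use and time a CPU-bound dummy job with 1 worker and with 10 workers. The function names and the dummy workload are assumptions; the real check should of course run the actual ML system.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def usable_workers(requested: int = 10) -> int:
    """Number of workers we can actually use, capped by the cores available."""
    if hasattr(os, "sched_getaffinity"):          # Linux: respects CPU affinity
        available = len(os.sched_getaffinity(0))
    else:
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


def burn(n: int) -> int:
    """CPU-bound stand-in for one ML job (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_parallel(n_jobs: int, n_workers: int, size: int = 2_000_000) -> float:
    """Wall-clock seconds to run n_jobs dummy jobs on n_workers processes."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(burn, [size] * n_jobs))
    return time.perf_counter() - start


print("usable workers:", usable_workers(10))
```

Running time_parallel(10, 1) and time_parallel(10, usable_workers(10)) from a script (under an `if __name__ == "__main__":` guard) and comparing the two times gives a first impression of whether 10 jobs really run side by side.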
3.3 Execute the ML system using the analysis data of MSM-GPV from 2007 to 2018.
Estimation of precipitation in all months:
From December to February, using the optimal parameters for January.
From March to May, using the optimal parameters for April.
From June to August, using the optimal parameters for July.
From September to November, using the optimal parameters for October.
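The assignment of months to the four tuned months could be coded as a simple lookup, for example as below. The table layout, the function name params_for, and the placeholder strings standing in for the tuned (Gm, C, Ep) values are assumptions for illustration.

```python
from typing import Dict, Tuple

# Calendar month -> month whose optimal parameters are used.
TUNING_MONTH: Dict[int, int] = {
    12: 1, 1: 1, 2: 1,      # December to February -> January
    3: 4, 4: 4, 5: 4,       # March to May         -> April
    6: 7, 7: 7, 8: 7,       # June to August       -> July
    9: 10, 10: 10, 11: 10,  # September to November -> October
}


def params_for(region: str, month: int, optimal: Dict[Tuple[str, int], object]) -> object:
    """Look up the tuned parameters for a region and any calendar month."""
    return optimal[(region, TUNING_MONTH[month])]


# Placeholder table: one entry per region (A-E) and tuned month (20 in total).
optimal = {(region, m): f"params tuned for region {region}, month {m}"
           for region in "ABCDE" for m in (1, 4, 7, 10)}

print(params_for("A", 12, optimal))
```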
3.4 Execute the ML system using the forecast data of MSM-GPV in 2019?
Estimation of precipitation in all months by ML with the classifiers produced by the training from 2007 to 2018.
(The CDF-transform method would be applied.)
From December to February, using the optimal parameters for January.
From March to May, using the optimal parameters for April.
From June to August, using the optimal parameters for July.
From September to November, using the optimal parameters for October.
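To make the discussion of the CDF step concrete, here is a minimal sketch of plain empirical quantile mapping: an ML estimate is converted to its rank in the training-period ML estimates and then replaced by the observed value at the same rank. Please note that this is a simplified stand-in and may differ from the CDF-transform method intended here; the function names and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def fractional_rank(sorted_sample: List[float], x: float) -> float:
    """Position of x within the sorted sample, interpolated between neighbors."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return float(n - 1)
    i = bisect_right(sorted_sample, x) - 1     # sorted_sample[i] <= x < sorted_sample[i + 1]
    return i + (x - sorted_sample[i]) / (sorted_sample[i + 1] - sorted_sample[i])


def quantile_map(x: float, model_train: Sequence[float],
                 obs_train: Sequence[float]) -> float:
    """Map x from the model distribution onto the observed distribution."""
    m, o = sorted(model_train), sorted(obs_train)
    if len(m) < 2 or len(o) < 2:
        raise ValueError("need at least two values in each training sample")
    p = fractional_rank(m, x) / (len(m) - 1)   # non-exceedance probability in [0, 1]
    pos = p * (len(o) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(o) - 1)
    frac = pos - lo
    return o[lo] * (1 - frac) + o[hi] * frac   # interpolated observed quantile


# Hypothetical samples: the model underestimates the observations by half.
model_train = [0.0, 1.0, 2.0, 3.0, 4.0]
obs_train = [0.0, 2.0, 4.0, 6.0, 8.0]
print(quantile_map(2.5, model_train, obs_train))
```

Values outside the range of the training sample are mapped to the smallest or largest observed value in this sketch, so extreme precipitation beyond the training period would need separate treatment.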
3.5 Check the performance
Advantages and disadvantages of this method
Which month is the best? And why?
How can we address the problems?
3.6 Submit a paper to a scientific journal (Yin-san).
Step 4: Construction of the ML system for TE-Japan
(Kobayashi-san is in charge?)
4.1 Construction of the ML system using the classifiers of the old version.
4.2 The ML system is improved using the classifiers produced by step 3.4.
The classifiers are produced by using the data of MSM-GPV and Radar_AMeDAS from 2007 to 2018.
4.3 The ML system is improved by applying the CDF-transform method?