Data-driven automated control algorithm for floating-zone crystal growth derived by reinforcement learning

Scientific Reports 13, 7517 (2023)

The complete automation of materials manufacturing with high productivity is a key challenge in some materials processing. In floating zone (FZ) crystal growth, a manufacturing process for semiconductor wafers such as silicon, an operator adaptively controls the input parameters in accordance with the state of the crystal growth process. Since the operation dynamics of FZ crystal growth are complicated, automation is often difficult, and usually the process is controlled manually. Here we demonstrate automated control of FZ crystal growth by reinforcement learning using the dynamics predicted by Gaussian mixture modeling (GMM) from a small number of trajectories. Our proposed method of constructing the control model is completely data-driven. Using an emulator program for FZ crystal growth, we show that the control model constructed by our proposed method can follow the ideal growth trajectory more accurately than demonstration trajectories created by human operation. Furthermore, we reveal that policy optimization near the demonstration trajectories realizes accurate control that follows the ideal trajectory.

The application of informatics has enabled efficient optimization, automation, and advances in materials processing [1,2,3,4,5,6,7,8,9]. The design of conditions and environments for materials processing has been efficiently optimized using surrogate models built by neural networks or other machine learning algorithms [1,2,6,10,11,12,13]. Bayesian optimization can successfully reduce the number of trials required to find favorable conditions for materials processing [14,15,16,17]. On the other hand, some materials processing requires manual control according to information obtained during operation and is difficult to automate. For example, in floating-zone (FZ) crystal growth, which is used to produce silicon wafers and various kinds of crystalline materials such as semiconductors, oxides, metals, and intermetallic compounds, an operator adaptively controls the input parameters to maintain conditions favorable for single-crystal growth by monitoring the status of the melt in the chamber [18,19,20,21,22,23,24,25,26,27,28]. In the present study, we aimed to construct a control model for automated operation of FZ crystal growth from a small number of operation trajectories.

FZ crystal growth was developed to produce high-purity silicon single crystals without the molten zone touching any foreign material. Despite this advantage in growing high-purity crystals, enlarging the crystal diameter is more difficult than in other crystal growth techniques such as the Czochralski method, and relatively small silicon wafers are manufactured by FZ crystal growth using RF heating. Figure 1 shows a schematic illustration of FZ crystal growth. In this method, part of a polycrystalline rod is heated to create an FZ melt, the upper (feed) rod and lower (seed) rod are moved downwards so that the FZ melt is maintained by surface tension, and the crystal grows on the seed rod. An operator controls the input parameters, such as the heating power and the speed of the feed rod, so that the FZ melt does not separate or drip off. In addition, the operator must form a specific shape in which the crystal diameter is first reduced (called "necking") and then increased to obtain a single crystal. Since the dynamics of the melt state as a function of the input parameters are non-linear and complicated, it is difficult to simulate the FZ crystal growth process in the way that has been achieved for other crystal growth methods [29,30,31,32,33]. Thus, it is necessary to predict the dynamics of FZ crystal growth from operation trajectories. Because acquiring numerous operation trajectories for FZ crystal growth is difficult, we recently proposed adapting the Gaussian mixture model (GMM) to predict the dynamics of FZ crystal growth, and demonstrated that a GMM can precisely predict the operation trajectories from only five trajectories used for training [34]. In the present study, we constructed a control model by reinforcement learning using proximal policy optimization (PPO) and the dynamics predicted by GMM.

Figure 1. Schematic illustration of floating-zone crystal growth. A floating-zone melt with height \(h\) is formed by the heater power \(P\). A feed rod with diameter \(d_0\) and the crystal are moved downward with speeds \(v\) and \(u_0\), respectively; as a result, a crystal with diameter \(d\) is grown.

For control of FZ crystal growth with a small number of demonstration trajectories, we applied reinforcement learning by PPO with the dynamics predicted by GMM. Here we describe how to construct a control model for FZ crystal growth combining GMM and PPO, based on the literature [35]. The state of the floating-zone melt at time t + 1, which is assumed to be composed of the height h and the diameter of the grown crystal d and is written as \(s_{t+1} = (h_{t+1}, d_{t+1})\), is determined by the state of the melt at time t, \(s_t\), and the input parameters \(a_t = (P_t, v_t)\), which include, for example, the power P and the movement speed of the feed v:

\[ s_{t+1} = f(s_t, a_t). \quad (1) \]

Here f stands for the true dynamics of FZ crystal growth. Once the GMM is constructed from the demonstration trajectories, the state of the melt at time t + 1 can be predicted from the state of the melt and the input parameters at time t:

\[ \hat{s}_{t+1} = f_{\mathrm{GMM}}(s_t, a_t), \quad (2) \]

where the circumflex (^) indicates a predicted value and \(f_{\mathrm{GMM}}\) stands for the dynamics model trained by GMM. The details of the GMM training are described in Ref. 34.
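As a concrete illustration of the one-step prediction in Eq. (2), the following sketch fits a Gaussian mixture on joint vectors \((s_t, a_t, s_{t+1})\) collected from demonstration trajectories and predicts the next state as the conditional mean given \((s_t, a_t)\). This is a minimal sketch under assumed state/action dimensions and function names, not the implementation of Ref. 34.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Dimensions assumed for illustration: s_t = (h_t, d_t), a_t = (P_t, v_t).
DIM_S, DIM_A = 2, 2
DIM_X = DIM_S + DIM_A          # conditioning variables (s_t, a_t)

def fit_gmm_dynamics(states, actions, n_components=50, seed=0):
    """Fit a GMM on joint vectors [s_t, a_t, s_{t+1}] from demonstration data.

    states : (T, DIM_S) array, actions : (T, DIM_A) array for one trajectory
    (concatenate several trajectories along axis 0 before calling).
    """
    joint = np.hstack([states[:-1], actions[:-1], states[1:]])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full",
                           random_state=seed).fit(joint)

def predict_next_state(gmm, s_t, a_t):
    """Conditional mean of s_{t+1} given x = (s_t, a_t) under the fitted GMM (cf. Eq. (2))."""
    x = np.concatenate([s_t, a_t])
    cond_means, resp = [], []
    for mu, cov, w in zip(gmm.means_, gmm.covariances_, gmm.weights_):
        mu_x, mu_y = mu[:DIM_X], mu[DIM_X:]
        cov_xx = cov[:DIM_X, :DIM_X]
        cov_yx = cov[DIM_X:, :DIM_X]
        # Gaussian conditioning: E[y | x] = mu_y + cov_yx cov_xx^{-1} (x - mu_x)
        cond_means.append(mu_y + cov_yx @ np.linalg.solve(cov_xx, x - mu_x))
        # Responsibility of this component for the observed x
        resp.append(w * multivariate_normal.pdf(x, mean=mu_x, cov=cov_xx))
    resp = np.array(resp)
    resp /= resp.sum()
    return np.einsum("k,kd->d", resp, np.array(cond_means))
```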

In PPO, the parameterized policy \(\pi_{\theta_p}(a_t \mid s_t)\) with parameter vector \(\theta_p\), which generates the input values \(a_t\) from the current state \(s_t\) as a probability distribution, is iteratively optimized using a clipped surrogate objective \(L^{CLIP}(\theta_p)\) instead of a plain policy gradient [35,36,37]. The probability ratio and the clipped surrogate objective are

\[ r_t(\theta_p) = \frac{\pi_{\theta_p}(a_t \mid s_t)}{\pi_{\theta_{p,\mathrm{old}}}(a_t \mid s_t)}, \quad (3) \]

\[ L^{CLIP}(\theta_p) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta_p)\, A(s_t, a_t),\ \mathrm{clip}\!\left(r_t(\theta_p),\, 1-\epsilon,\, 1+\epsilon\right) A(s_t, a_t) \right) \right], \quad (4) \]

where \(\epsilon\) is a hyper-parameter determining the clipping region and \(A(s_t, a_t)\) is the advantage function, defined as

\[ A(s_t, a_t) = Q(s_t, a_t) - V(s_t), \quad (5) \]

where \(Q(s_t, a_t)\) is the state-action value function and \(V(s_t)\) is the state-value function. Here we approximately represent \(Q(s_t, a_t)\) as

\[ Q(s_t, a_t) \approx R_t(s_t, a_t) + \gamma V(s_{t+1}), \quad (6) \]

where \(R_t(s_t, a_t)\) and \(\gamma\) are the reward function and the discount factor, respectively. The advantage function expresses whether setting the input value \(a_t\) under the melt state \(s_t\) is preferable. When the action is preferable, the advantage function takes a positive value and the policy is updated to increase the probability ratio \(r_t(\theta_p)\) by maximizing the surrogate objective; when the action is not preferable, the advantage function takes a negative value and the policy is updated to decrease the probability ratio.
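As a small numerical sketch of Eqs. (5) and (6), the advantage at each step can be estimated from the immediate reward and the predicted values of consecutive states; the array names and toy numbers below are illustrative only.

```python
import numpy as np

def estimate_advantages(rewards, values, gamma=0.99):
    """Per-step advantage via Eqs. (5) and (6):
    Q(s_t, a_t) ~ R_t + gamma * V(s_{t+1}),  A = Q - V(s_t).

    rewards : (T,) rewards R_t along a generated trajectory
    values  : (T + 1,) predicted state values V(s_0) ... V(s_T)
    """
    q_values = rewards + gamma * values[1:]        # Eq. (6)
    return q_values - values[:-1]                  # Eq. (5)

# Toy usage with made-up numbers.
adv = estimate_advantages(np.array([0.1, -0.2, 0.05]),
                          np.array([1.0, 0.9, 0.8, 0.7]))
```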

Given a policy and the dynamics, state sequences are generated as a probability distribution, and the state-value function can be calculated as

\[ V_{\pi}(s_t) = \mathbb{E}\left[ \sum_{t'=t}^{T} \gamma^{t'-t} R_{t'}(s_{t'}, a_{t'}) \right], \quad (7) \]

where T is the length of the trajectories and the expectation is taken over the probability distribution of the state sequences. In PPO, the state-value function is predicted from the training data without assigning a policy. Thus, the predicted state-value function \(\hat{V}_{\theta_v}(s_t)\), parameterized by \(\theta_v\), is optimized using the square-error loss \(L^{VF}(\theta_v)\):

\[ L^{VF}(\theta_v) = \hat{\mathbb{E}}_t\left[ \left( \hat{V}_{\theta_v}(s_t) - \sum_{t'=t}^{T} \gamma^{t'-t} R_{t'}(s_{t'}, a_{t'}) \right)^{2} \right]. \quad (8) \]

Once the state-value function is predicted, the action-value function \(\hat{Q}(s_t, a_t)\) and the advantage function \(\hat{A}_t\) are also predicted by Eqs. (6) and (5), respectively. In addition to the clipped surrogate objective and the state-value-function error, an entropy bonus is added to ensure sufficient exploration, and the following objective is maximized at each iteration of PPO [38]:

\[ L(\theta_p, \theta_v) = \hat{\mathbb{E}}_t\left[ L^{CLIP}(\theta_p) - c_1 L^{VF}(\theta_v) + c_2 S[\pi_{\theta_p}](s_t) \right], \quad (9) \]

where \(c_1\) and \(c_2\) are weights. Maximizing \(L^{CLIP}(\theta_p)\) corresponds to acquiring the optimized policy \(\pi_{\theta_p}(a_t \mid s_t)\), as described by Eqs. (3) and (4). Minimizing \(L^{VF}(\theta_v)\) means that the state-value function is predicted without assuming a policy, as described by Eq. (8). \(S[\pi_{\theta_p}](s_t)\) is the entropy of the policy and acts as a regularization term for training. In PPO, \(\theta_p\) and \(\theta_v\) are optimized simultaneously in each iteration. Although \(L^{CLIP}\) depends on \(\theta_v\) via \(A(s_t, a_t)\) and \(L^{VF}\) depends on \(\theta_p\) via \(V_{\pi}(s_t)\), in the iterative optimization \(\theta_v\) in \(L^{CLIP}\) and \(\theta_p\) in \(L^{VF}\) are regarded as constants taken from the previous step and are not optimized.
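The combined objective of Eq. (9) can be assembled as in the following PyTorch sketch, written as a loss to be minimized (the negative of the maximized objective). The tensor names are assumptions, and the per-step entropy is taken as an input rather than derived from a specific policy parameterization.

```python
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages,
             value_pred, value_target, entropy,
             clip_eps=0.2, c1=0.5, c2=0.01):
    """Negative of the PPO objective in Eq. (9):
    L = E[ L_CLIP - c1 * L_VF + c2 * S ] is maximized, so -L is returned for a minimizer.

    new_log_probs / old_log_probs : log pi_theta(a_t | s_t) under the new / old policy
    advantages   : estimated A(s_t, a_t) from Eqs. (5)-(6)
    value_pred   : predicted V(s_t); value_target : empirical return, cf. Eqs. (7)-(8)
    entropy      : per-step entropy S[pi_theta](s_t) of the stochastic policy
    """
    ratio = torch.exp(new_log_probs - old_log_probs)                     # Eq. (3)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    l_clip = torch.min(ratio * advantages, clipped * advantages).mean()  # Eq. (4)
    l_vf = ((value_pred - value_target) ** 2).mean()                     # Eq. (8)
    return -(l_clip - c1 * l_vf + c2 * entropy.mean())
```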

In order to optimize the policy, it is necessary to specify the dynamics used to calculate the state-value function by Eq. (7). In our algorithm, the GMM dynamics were used for this calculation. The algorithm is therefore completely data-driven and requires no simulations, which distinguishes it from approaches such as "sim-to-real" [39,40]. However, the GMM dynamics can reliably predict the actual dynamics only in the vicinity of the training trajectories. We therefore propose a method that optimizes the policy near the training trajectories, where the GMM dynamics reliably predict the actual dynamics, and thereby obtains a policy that can be transferred to actual FZ crystal growth. To search the policy space near the training trajectories, we first performed pretraining to bring the policy closer to the training trajectories. Second, we introduced the error from the averaged action sequences into the reward function, in addition to the error from the ideal diameter trajectory \(d_t^{ideal}\); the reward function used in our proposed algorithm, Eq. (11), combines these two terms.

\(\overline{a_t^*}\) and \(\lambda\) denote the averaged action sequences of the training trajectories and the weight of the second term, respectively.
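Eq. (11) itself is not reproduced here; the sketch below shows one plausible form of such a reward, assuming squared-error penalties, with the first term penalizing the deviation of the diameter from the ideal trajectory and the second, weighted by λ, penalizing the deviation of the action from the averaged demonstration actions.

```python
import numpy as np

def reward(d_t, d_ideal_t, a_t, a_bar_t, lam=0.1):
    """Illustrative two-term reward (squared errors assumed): a penalty on the
    diameter error from the ideal trajectory plus a lambda-weighted penalty on the
    deviation of the action from the averaged demonstration action."""
    diameter_term = (d_t - d_ideal_t) ** 2
    action_term = np.sum((np.asarray(a_t) - np.asarray(a_bar_t)) ** 2)
    return -(diameter_term + lam * action_term)
```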

To validate the automated control of FZ crystal growth by the algorithm using PPO with GMM dynamics, we prepared datasets for training, \(D = \{ (s_t^*, a_t^*)_1, (s_t^*, a_t^*)_2, \ldots, (s_t^*, a_t^*)_N \}\), where N is the number of training datasets, by means of an emulator program for FZ crystal growth with a given set of dynamics [34]. We prepared 12 datasets aiming to create the ideal crystal shape \(d_t^{ideal}\) shown in Fig. 2a, which takes the necking process required for single-crystal growth into account. Figure 2b–d show the prepared datasets aiming to create the ideal shape. The trajectories differ from each other and do not perfectly follow the ideal shape because they were prepared manually.

Figure 2. (a) An ideal trajectory for the diameter of the crystal, (b) trajectories of the diameter used for training, and (c,d) operation trajectories of the power and the movement speed of the feed.
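For concreteness, the training set D can be organized as N = 12 pairs of state and action arrays, one pair per demonstration trajectory. The layout, the placeholder trajectory length, and the random placeholder values below are assumptions for illustration, not the emulator's actual format.

```python
import numpy as np

# Each demonstration trajectory: states s*_t = (h_t, d_t) and actions a*_t = (P_t, v_t).
# D = [(states_1, actions_1), ..., (states_N, actions_N)] with N = 12 trajectories.
rng = np.random.default_rng(0)
T, N = 800, 12          # placeholder trajectory length and number of trajectories
D = [(rng.normal(size=(T, 2)), rng.normal(size=(T, 2))) for _ in range(N)]

# Averaged action sequence over the demonstrations, used in pretraining and in the reward.
a_bar = np.mean(np.stack([actions for _, actions in D]), axis=0)   # shape (T, 2)
```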

Prior to the reinforcement learning, we constructed a data-driven prediction model of FZ crystal growth by GMM, as we previously reported [34]. The number of Gaussian mixtures, a hyper-parameter of the GMM, was set to 50. Since the prediction of the dynamics by GMM is reliable only near the training trajectories, the prediction accuracy deteriorates significantly when the trajectories deviate greatly from the ideal trajectory, as discussed in the "Results and discussion" section (in particular Fig. 4). If we were to start the optimization from a randomly initialized policy, the state sequences generated with the GMM would be far from the actual state sequences and would fail to reach the ideal trajectory shown in Fig. 2a. Thus, we performed pretraining using the training trajectories before optimizing the policy by PPO. In the pretraining, the policy was trained to approach the averaged action sequences of the training trajectories by minimizing a pretraining loss function.

In this loss function, σ and \(\hat{\mu}_{\theta_p}(s_t)\) represent the variance parameter and the predicted averaged values of the input values under the state \(s_t^*\) in a training trajectory, respectively. \(\hat{\mu}_{\theta_p}(s_t)\) and \(\hat{V}_{\theta_v}(s_t)\) are modeled by neural networks. The number of hidden layers, the number of nodes per layer, and the activation function are 2, 64, and the hyperbolic tangent (tanh), respectively. A sigmoid function is used as the activation function of the output layer of the policy network, and the output layer of the state-value network has no activation function. The two networks share weight values, except for the output layers. Training of the neural networks was performed with the Adam method [41] with a learning rate of 1 × 10⁻⁵ and a batch size of 128. The probabilistic policy was generated from \(\hat{\mu}_{\theta_p}(s_t)\) and the variance parameters.
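The network description above corresponds to a small shared-trunk actor–critic module, sketched below in PyTorch. The Gaussian negative log-likelihood used here as the pretraining loss is an assumed form consistent with the fixed variance parameter σ described in the text; it is a sketch, not a reproduction of the paper's loss function.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared-trunk policy/value networks: 2 hidden layers of 64 tanh units each,
    sigmoid output for the policy mean, no activation on the value output."""

    def __init__(self, state_dim=2, action_dim=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                   nn.Linear(64, 64), nn.Tanh())
        self.mu_head = nn.Sequential(nn.Linear(64, action_dim), nn.Sigmoid())
        self.v_head = nn.Linear(64, 1)

    def forward(self, s):
        z = self.trunk(s)
        return self.mu_head(z), self.v_head(z).squeeze(-1)

def pretraining_loss(model, states, actions, sigma=0.1):
    """Assumed pretraining objective: Gaussian negative log-likelihood of the
    demonstration actions a*_t under the policy mean mu_hat(s*_t) with fixed variance."""
    mu, _ = model(states)
    return ((actions - mu) ** 2).sum(dim=-1).mean() / (2.0 * sigma ** 2)

model = ActorCritic()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # learning rate as stated in the text
```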

The detailed algorithm for pretraining the policy and the state-value function is shown in Algorithm 1. After the pretraining of the policy, the policy was optimized by PPO while maximizing the objective shown in Eq. (9). The hyper-parameters used for the pretraining and for the training by PPO are summarized in Table 1. Our PPO program for the FZ crystal growth trajectory is available on GitHub [42].
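During PPO training, the GMM dynamics model plays the role of the environment: trajectories are generated by alternately sampling actions from the policy and stepping the predicted dynamics. The following self-contained sketch illustrates that roll-out and the computation of discounted returns; the callables passed in are placeholders, not the released implementation.

```python
import torch

def rollout(policy, dynamics_fn, reward_fn, s0, horizon, sigma=0.1, gamma=0.99):
    """Roll the policy out through a one-step dynamics model (e.g. the GMM predictor),
    which acts as the environment during PPO training.

    policy      : callable s -> (mu, value) for a diagonal-Gaussian policy
    dynamics_fn : callable (s, a) -> next state, cf. Eq. (2)
    reward_fn   : callable (s, a) -> scalar reward
    Returns per-step log-probabilities, rewards, and discounted returns.
    """
    s = s0
    log_probs, rewards = [], []
    for _ in range(horizon):
        mu, _ = policy(s)
        dist = torch.distributions.Normal(mu, sigma)
        a = dist.sample()
        log_probs.append(dist.log_prob(a).sum())
        rewards.append(reward_fn(s, a))
        s = dynamics_fn(s, a)
    returns, g = [], 0.0
    for r in reversed(rewards):            # discounted return, cf. Eq. (7)
        g = r + gamma * g
        returns.insert(0, g)
    return torch.stack(log_probs), torch.tensor(rewards), torch.tensor(returns)

# Trivial stand-ins, only to show the calling convention.
toy_policy = lambda s: (torch.sigmoid(s), torch.zeros(()))
toy_dynamics = lambda s, a: s + 0.01 * (a - 0.5)
toy_reward = lambda s, a: float(-(s[1] - 1.0) ** 2)
lp, rw, ret = rollout(toy_policy, toy_dynamics, toy_reward, torch.zeros(2), horizon=10)
```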

Figure 3 shows the results of automated control by the policy trained with our proposed algorithm. Note that the training of the policy was performed with the dynamics predicted by GMM from only the training trajectories. The obtained trajectory follows the ideal trajectory well in terms of diameter. Table 2 summarizes the mean square error (MSE) from the ideal trajectory in diameter d for control by PPO and by humans (training trajectories). The deviation from the ideal trajectory for control by PPO is smaller than that for human control. Thus, we successfully constructed a control algorithm for FZ crystal growth with a defined ideal shape from only several training trajectories.

Figure 3. Trajectory of the diameter generated by the control model trained by our proposed algorithm.

Pretraining of the policy before PPO is crucially important: without pretraining, learning of the policy does not progress at all. Figure 4 shows the evolution of the averaged absolute error from the ideal trajectory in diameter d during training, starting either after pretraining or from randomly set initial values. With pretraining, the policy was well trained; the error decreased with increasing iteration and then saturated. Without pretraining, on the other hand, the error from the ideal trajectory never decreased with increasing iteration. Furthermore, the error of the GMM dynamics from the true dynamics along the generated trajectory was consistently higher without pretraining than after pretraining. These results indicate that, after pretraining, the policy space was searched in a region where the GMM dynamics remain highly accurate.

Figure 4. (a) Mean absolute error (MAE) from the ideal trajectory and (b) MAE of the GMM dynamics along the generated trajectory during training, with and without pretraining.

The design of the reward function, which adds the error from the averaged action sequences to the error from the ideal trajectory, is also important for policy optimization. Without the second term in Eq. (11), the deviation from the ideal trajectory is larger than with our proposed reward, especially around t = 400 and for t > 600 (Fig. 5a). In these periods, the error of the GMM dynamics along the trajectory generated by the reward without the second term is higher than that along the trajectory generated by our full reward function (Fig. 5b). These results indicate that adding the second term in Eq. (11), i.e., a proper setting of the reward function, keeps the policy optimization within the region where the GMM dynamics remain highly accurate.

Figure 5. (a) Absolute errors from the ideal trajectory and (b) absolute errors of the GMM dynamics along the trajectory generated with and without the second term in Eq. (11) in the reward function.

The current demonstration shows that automated control of FZ crystal growth from a small number of demonstration trajectories is possible with our proposed method. Since our method determines the policy based on the dynamics predicted by GMM, it is necessary to keep the generated trajectory close to the demonstration trajectories during policy optimization. Pretraining of the policy and proper design of the reward function successfully keep the policy optimization within the range of reliable GMM prediction. Our proposed method should be applicable to other materials processes that require adaptive control according to the process status. Although the present demonstration was based on data obtained with an emulator program, our proposed methodology is expected to work with actual FZ crystal growth.

We have constructed a control model for FZ crystal growth by reinforcement learning using PPO with dynamics predicted by GMM. Our proposed method is completely data-driven and can construct the control model from only a small number of demonstration trajectories. We verified our method by a virtual experiment using the emulator program for FZ crystal growth. As a result, the control model was shown to follow an ideal trajectory in crystal diameter more accurately than the demonstration trajectories created by human operation. Since our method determines the policy based on the dynamics predicted by GMM, it is necessary to keep the generated trajectory close to the demonstration trajectories during policy optimization. Pretraining of the policy near the training trajectories and proper design of the reward function successfully kept the policy optimization within the range of reliable GMM prediction. Our proposed method will lead to the automation of materials processing in which adaptive operation is required and will help realize high productivity in materials manufacturing. It is expected that the actual FZ crystal growth process can be automated from a small number of demonstration trajectories operated by humans.

The data that support the findings of this study are available from the corresponding author, SH, upon reasonable request.

References

1. Tsunooka, Y. et al. High-speed prediction of computational fluid dynamics simulation in crystal growth. CrystEngComm 20, 47 (2018).
2. Dropka, N. & Holena, M. Optimization of magnetically driven directional solidification of silicon using artificial neural networks and Gaussian process models. J. Cryst. Growth 471, 53–61 (2017).
3. Wang, L. et al. Optimal control of SiC crystal growth in the RF-TSSG system using reinforcement learning. Crystals (Basel) 10, 791 (2020).
4. Takehara, Y., Sekimoto, A., Okano, Y., Ujihara, T. & Dost, S. Bayesian optimization for a high- and uniform-crystal growth rate in the top-seeded solution growth process of silicon carbide under applied magnetic field and seed rotation. J. Cryst. Growth 532, 125437 (2020).
5. Wang, C., Tan, X. P., Tor, S. B. & Lim, C. S. Machine learning in additive manufacturing: State-of-the-art and perspectives. Addit. Manuf. 36, 101538 (2020).
6. Yu, W. et al. Geometrical design of a crystal growth system guided by a machine learning algorithm. CrystEngComm 23, 2695–2702 (2021).
7. Kawata, A., Murayama, K., Sumitani, S. & Harada, S. Design of automatic detection algorithm for dislocation contrasts in birefringence images of SiC wafers. Jpn. J. Appl. Phys. 60, SBBD06 (2021).
8. Harada, S., Tsujimori, K. & Matsushita, Y. Automatic detection of basal plane dislocations in a 150-mm SiC epitaxial wafer by photoluminescence imaging and template-matching algorithm. J. Electron. Mater. 52, 1243–1248 (2022).
9. Tsujimori, K., Hirotani, J. & Harada, S. Application of Bayesian super-resolution to spectroscopic data for precise characterization of spectral peak shape. J. Electron. Mater. 51, 712–717 (2022).
10. Dropka, N., Holena, M., Ecklebe, S., Frank-Rotsch, C. & Winkler, J. Fast forecasting of VGF crystal growth process by dynamic neural networks. J. Cryst. Growth 521, 9–14 (2019).
11. Dang, Y. et al. Adaptive process control for crystal growth using machine learning for high-speed prediction: Application to SiC solution growth. CrystEngComm 23, 1982–1990 (2021).
12. Isono, M. et al. Optimization of flow distribution by topological description and machine learning in solution growth of SiC. Adv. Theory Simul. 5, 202200302 (2022).
13. Honda, T. et al. Virtual experimentations by deep learning on tangible materials. Commun. Mater. 2, 1–8 (2021).
14. Shimizu, R., Kobayashi, S., Watanabe, Y., Ando, Y. & Hitosugi, T. Autonomous materials synthesis by machine learning and robotics. APL Mater. 8, 111110 (2020).
15. Miyagawa, S., Gotoh, K., Kutsukake, K., Kurokawa, Y. & Usami, N. Application of Bayesian optimization for improved passivation performance in TiOx/SiOy/c-Si heterostructure by hydrogen plasma treatment. Appl. Phys. Express 14, 025503 (2021).
16. Osada, K. et al. Adaptive Bayesian optimization for epitaxial growth of Si thin films under various constraints. Mater. Today Commun. 25, 101538 (2020).
17. Wakabayashi, Y. K. et al. Machine-learning-assisted thin-film growth: Bayesian optimization in molecular beam epitaxy of SrRuO3 thin films. APL Mater. 7, 101114 (2019).
18. Campbell, T. A., Schweizer, M., Dold, P., Cröll, A. & Benz, K. W. Float zone growth and characterization of Ge1−xSix (x ≤ 10 at%) single crystals. J. Cryst. Growth 226, 231–239 (2001).
19. Calverley, A. & Lever, R. F. The floating-zone melting of refractory metals by electron bombardment. J. Sci. Instrum. 34, 142 (1957).
20. Inui, H., Oh, M. H., Nakamura, A. & Yamaguchi, M. Room-temperature tensile deformation of polysynthetically twinned (PST) crystals of TiAl. Acta Metall. Mater. 40, 3095–3104 (1992).
21. Hirano, T. & Mawari, T. Unidirectional solidification of Ni3Al by a floating zone method. Acta Metall. Mater. 41, 1783–1789 (1993).
22. Balbashov, A. M. & Egorov, S. K. Apparatus for growth of single crystals of oxide compounds by floating zone melting with radiation heating. J. Cryst. Growth 52, 498–504 (1981).
23. Koohpayeh, S. M., Fort, D. & Abell, J. S. The optical floating zone technique: A review of experimental procedures with special reference to oxides. Prog. Cryst. Growth Charact. Mater. 54, 121–137 (2008).
24. Harada, S. et al. Crossover from incoherent to coherent thermal conduction in bulk titanium oxide natural superlattices. Scr. Mater. 208, 114326 (2022).
25. Christensen, A. N. The crystal growth of the transition metal compounds TiC, TiN, and ZrN by a floating zone technique. J. Cryst. Growth 33, 99–104 (1976).
26. Nørlund Christensen, A. Crystal growth and characterization of the transition metal silicides MoSi2 and WSi2. J. Cryst. Growth 129, 266–268 (1993).
27. Harada, S. et al. Crystal structure refinement of ReSi1.75 with an ordered arrangement of silicon vacancies. Philos. Mag. 91, 3108–3127 (2011).
28. Harada, S. et al. Direct observation of vacancies and local thermal vibration in thermoelectric rhenium silicide. Appl. Phys. Express 5, 035203 (2012).
29. Muiznieks, A., Virbulis, J., Lüdge, A., Riemann, H. & Werner, N. Floating zone growth of silicon. In Handbook of Crystal Growth: Bulk Crystal Growth 2nd edn, Vol. 2, 241–279 (Elsevier, 2015).
30. Derby, J. J. & Brown, R. A. Thermal-capillary analysis of Czochralski and liquid encapsulated Czochralski crystal growth: I. Simulation. J. Cryst. Growth 74, 605–624 (1986).
31. Meziere, J. et al. Modeling and simulation of SiC CVD in the horizontal hot-wall reactor concept. J. Cryst. Growth 267, 436–451 (2004).
32. Karpov, S. Yu., Makarov, Yu. N. & Ramm, M. S. Simulation of sublimation growth of SiC single crystals. Phys. Status Solidi B 202, 201–220 (1997).
33. Dang, Y. et al. Numerical investigation of solute evaporation in crystal growth from solution: A case study of SiC growth by TSSG method. J. Cryst. Growth 579, 126448 (2022).
34. Omae, R., Sumitani, S., Tosa, Y. & Harada, S. Prediction of operating dynamics in floating-zone crystal growth using Gaussian mixture model. Sci. Technol. Adv. Mater. Methods 2, 294–301 (2022).
35. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. https://doi.org/10.48550/arxiv.1707.06347 (2017).
36. Schulman, J., Levine, S., Abbeel, P., Jordan, M. & Moritz, P. Trust region policy optimization. Proc. Mach. Learn. Res. 37, 1889–1897 (2015).
37. Sutton, R. S., McAllester, D., Singh, S. & Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 12, 447 (1999).
38. Mnih, V. et al. Asynchronous methods for deep reinforcement learning. In Proc. 33rd International Conference on Machine Learning, ICML 2016 4, 2850–2869 (2016).
39. Christiano, P. et al. Transfer from simulation to real world through learning deep inverse dynamics model. https://doi.org/10.48550/arxiv.1610.03518 (2016).
40. Peng, X. B., Andrychowicz, M., Zaremba, W. & Abbeel, P. Sim-to-real transfer of robotic control with dynamics randomization. Proc. IEEE Int. Conf. Robot. Autom. https://doi.org/10.1109/ICRA.2018.8460528 (2018).
41. Kingma, D. P. & Ba, J. L. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015. https://doi.org/10.48550/arxiv.1412.6980 (2014).
42. https://github.com/AnamorResearch/fz_rl


This work was supported by JSPS KAKENHI Grant Number JP21H01681. The authors are grateful to Mr. Okuno and his colleagues at Sanko Co. Ltd. for fruitful discussions on the application to actual FZ crystal growth furnaces.

Anamorphosis Networks, 50 Higashionmaeda-Cho, Nishishichijo, Shimogyo-Ku, Kyoto, 600-8898, Japan

Yusuke Tosa, Ryo Omae, Ryohei Matsumoto & Shogo Sumitani

Center for Integrated Research of Future Electronics (CIRFE), Institute of Materials and Systems for Sustainability (IMaSS), Nagoya University, Furo-Cho, Chikusa-Ku, Nagoya, 464-8601, Japan

Shunta Harada

Department of Materials Process Engineering, Nagoya University, Furo-Cho, Chikusa-Ku, Nagoya, 464-8603, Japan

Shunta Harada


S.H. and S.S. conceptualized the basic idea and its application to the materials process. Y.T. constructed the algorithms and programs for the analysis under the guidance of S.S., with the assistance of R.O., and in continuous discussion with all authors. The manuscript was written by S.H. and Y.T. in discussion with all other authors.

Correspondence to Shunta Harada.

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


Tosa, Y., Omae, R., Matsumoto, R. et al. Data-driven automated control algorithm for floating-zone crystal growth derived by reinforcement learning. Sci Rep 13, 7517 (2023). https://doi.org/10.1038/s41598-023-34732-5


Received: 07 March 2023

Accepted: 06 May 2023

Published: 09 May 2023

DOI: https://doi.org/10.1038/s41598-023-34732-5
