In personal identity recognition systems, detecting a speaker's age, gender, and language from voice signal characteristics is a crucial problem, particularly where security is a concern. These classification problems matter in signal processing because their results are used to analyze and understand human behavior, interactions, and preferences, which is especially useful in human-computer interaction, psychology, and social science research. This paper presents a new deep learning system for detecting a speaker's age, gender, and language. Deep learning models have proven effective in many areas of signal processing. A range of deep models was tested, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and a fine-tuned ResNet34 architecture. Transfer learning was also applied to improve the efficiency of the proposed system. The input voice signals are preprocessed with a spectro-temporal transform to obtain additional features, which are then fed to the ResNet34 model, tailored specifically to voice signal processing. The dataset was sourced from the Mozilla Common Voice initiative, which is dedicated to advancing speech recognition and language identification technologies. The robustness of the proposed algorithm was evaluated in the presence of Gaussian noise. The experimental results showed that the proposed algorithm significantly outperformed baseline algorithms and other deep neural networks in age and gender recognition from voice signals.
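
As an illustration of the preprocessing and robustness steps described above, the following is a minimal sketch, not the implementation used in this work. It assumes white Gaussian noise injected at a fixed signal-to-noise ratio (SNR) and uses a plain short-time Fourier magnitude as a stand-in for the spectro-temporal transform. The function names, the frame length, the hop size, and the SNR value are all hypothetical choices made for the example.

```python
import cmath
import math
import random


def add_gaussian_noise(signal, snr_db, seed=None):
    """Return the signal plus white Gaussian noise scaled to a target SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) actually obtained, given the clean reference."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Short-time Fourier magnitudes: frame_len // 2 + 1 bins per frame.

    Assumed stand-in for the spectro-temporal transform; a direct DFT is
    used for clarity, so this is only suitable for short signals.
    """
    # Symmetric Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


if __name__ == "__main__":
    sample_rate = 8000  # Hz, hypothetical
    tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(2048)]
    noisy = add_gaussian_noise(tone, snr_db=20.0, seed=0)
    spec = magnitude_spectrogram(noisy)
    print(len(spec), "frames of", len(spec[0]), "frequency bins")
    print("measured SNR (dB):", round(measured_snr_db(tone, noisy), 2))
```

In a pipeline of this kind, the resulting time-frequency array would typically be log-scaled and resized before being passed to an image-style network such as ResNet34, and the SNR would be swept over several values to trace how accuracy degrades with noise.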