Deep learning approach for singer classification on Vietnamese popular music

Pham Van Toan - Tran Ngo Quang Ngoc - Ta Minh Thanh SoICT 2019 - The 10th International Symposium on Information and Communication Technology

Technical: Speech Separation, Vocal Segmentation, Singer Classification, Deep Neural Network

Singer voice classification is a meaningful task in the digital era. Given the huge number of songs available today, identifying the singer is very helpful for music information retrieval, music property indexing, and similar tasks. In this paper, we propose a method to identify the singer in Vietnamese popular music. We use vocal segment detection and singing voice separation as pre-processing steps, whose purpose is to extract the singer's voice from the mixed audio. To build the singer classifier, we propose a neural network architecture that takes Mel Frequency Cepstral Coefficients (MFCCs) extracted from the vocals as input features.

Self Attention based for Recommendation System

Pham Hoang Anh SoICT 2019 - The 10th International Symposium on Information and Communication Technology

Technical: Self Attention, Recommendation System, Deep Neural Network

We propose a neural network architecture for session-based recommendation that exploits the Transformer's strength on sequential problems. In particular, we adapt the Transformer to session-based recommendation by adding or modifying several of its elements. We also investigate a model that combines two separately trained Transformer components, each with its own responsibility: the first extracts features of the users' sequential behavior, and the other captures the main purpose of the current session.
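As an illustration of the self-attention operation that the Transformer builds on, the following is a minimal sketch of causal scaled dot-product self-attention over the items of one session. It is not the architecture from the paper: the function names, the toy embeddings, and the use of identity projections (queries, keys, and values all equal to the input) are assumptions made to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x):
    """Scaled dot-product self-attention with a causal mask.

    x is a list of item embeddings, one per interaction in the session.
    Assumption: Q = K = V = x (no learned projections), for brevity.
    Position i attends only to positions 0..i, so a prediction never
    looks at later interactions in the session.
    """
    d = len(x[0])
    outputs, all_weights = [], []
    for i, query in enumerate(x):
        scores = [
            sum(q * k for q, k in zip(query, x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        context = [
            sum(weights[j] * x[j][k] for j in range(i + 1))
            for k in range(d)
        ]
        outputs.append(context)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional embeddings for a session of three items.
session = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(session)
```

In a session-based recommender, the output at the last position would typically be scored against candidate item embeddings to rank the next item.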

Building a Vietnamese speech synthesis system using Tacotron 2

Pham Huu Quang International workshop on Vietnamese Language and Speech Processing (VLSP 2019)

Technical: Text-to-Speech, Vietnamese speech synthesis, Deep learning, Tacotron, Tacotron2, Signal Processing, Transfer learning, Fine-tuning

Building a traditional speech synthesis system often requires many people with extensive domain expertise, and the result may contain brittle design choices. In this paper, we describe how we built a Vietnamese text-to-speech (TTS) system based on deep learning techniques. We built two speech synthesis systems for the text-to-speech shared task of VLSP 2019: one on BigCorpus (MOS of 3.32) and one on SmallCorpus (MOS of 4.11).

Automated Hate Speech Detection on Vietnamese Social Networks

Pham Huu Quang - Pham Hoang Anh - Nguyen Trung Son International workshop on Vietnamese Language and Speech Processing (VLSP 2019)

Technical: Vietnamese Hate Speech Detection, Natural Language Processing, Text Mining, Supervised Learning, Social Networks, Feature engineering, Text Processing

On social network sites (SNSs) such as Facebook and Twitter, "hate content" is defined as content that is rude, disrespectful, or otherwise likely to make someone leave a discussion or feel unpleasant when reading it. In this shared task, we aim to detect hate content on SNSs in order to support more effective conversations. We propose a multi-class classification model that assigns each piece of content one of 3 labels: HATE, OFFENSIVE, or CLEAN. On the Vietnamese dataset of the VLSP-SHARED Task competition, our results ranked first on the final leaderboard.

Simultaneous convolutional neural network for highly efficient image steganography

Pham Van Toan - Hoang Dinh Thoi - Do Hoang Thai Duong - Ta Minh Thanh 2019 19th International Symposium on Communications and Information Technologies (ISCIT)

Technical: Information Security, Image Steganography, Secure Data Transmission, Deep Convolutional Neural Network

In this paper, we address image steganography with deep learning models. The main task is to hide one image (the secret image) inside another image of the same size (the cover image). Our tests show that the method performs well, and a comparison with work from Google Research and Shanghai University shows that it has advantages over similar research.

Deep Neural Networks based Invisible Steganography for Audio-into-Image Algorithm

Pham Huu Quang - Hoang Dinh Thoi - Pham Van Toan - Ta Minh Thanh 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE 2019)

Technical: Information Security, Steganography, Secure Data Transmission, Deep Convolutional Neural Network.

Steganography is the science of concealing secret information inside ordinary forms of data. In this paper, we propose using deep learning techniques to hide secret audio inside digital images. Extensive experiments are carried out with a set of 24K images and an audio dataset named VIVOS Corpus. The experimental results confirm that our method is more effective than traditional approaches. The integrity of both image and audio is well preserved, while the length of audio that can be hidden is significantly increased.

Proposal of feature matching technique using similarity features filtering for image alignment

Pham Van Toan, Ta Minh Thanh, Nguyen Thanh Trung, Pham Thi Hong Anh Proceedings of the ISSAT International Conference on Data Science in Business, Finance and Industry (DSBFI 2019)

Technical: Image alignment, similarity features filtering, feature matching, feature extraction.

In this paper, we propose a new feature matching approach called similarity features filtering, together with several pre-processing techniques for invoice images, to improve image alignment accuracy. The experimental results show that our proposed approach achieves better results than other feature-based methods.

Improving Phonetic Recognition with Sequence-length Standardized MFCC Features and Deep Bi-directional LSTM

Pham Van Toan, Nguyen Thanh Hau and Ta Minh Thanh 2018 5th NAFOSTED Conference on Information and Computer Science (NICS)

Technical: Natural language processing, audio processing with MFCC, sequence length, recurrent neural network with TensorFlow.

The paper proposes a novel deep learning approach to the problem of phonetic recognition. Specifically, we combine Mel Frequency Cepstral Coefficients (MFCC) with sequence-length standardization to represent the acoustic features of speech, and we use different RNN architectures for phonetic classification. The well-known TIMIT dataset is used in both the training and evaluation phases. At the time of writing, we achieved the lowest phone error rate (13.05% PER) with a bidirectional LSTM, the best result on the TIMIT dataset and a reduction of about 3.5% compared to the previous best result.
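For readers unfamiliar with the metric, the following is a minimal sketch of how a phone error rate is commonly computed: the edit distance between the reference and predicted phone sequences, divided by the reference length. The function names and the toy phone sequences are assumptions for illustration, and the paper's exact scoring setup (for example, how TIMIT phones are grouped before scoring) is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, counting
    substitutions, insertions, and deletions with unit cost."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """PER = edit distance / number of reference phones."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical reference and predicted phone sequences.
reference = ["sh", "iy", "hh", "ae", "d"]
predicted = ["sh", "iy", "ae", "t"]
per = phone_error_rate(reference, predicted)  # one deletion + one substitution
```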

Large scale fashion search system with deep learning and quantization indexing

Pham Van Toan, Hoang Dinh Thoi, Pham Hoang Anh, Nguyen Thanh Hau, Ta Minh Thanh Proceedings of the Ninth International Symposium on Information and Communication Technology. ACM, 2018.

Technical: Object detection with SSD MobileNetV2, Triplet loss, Quantization indexing, Similarity learning, image retrieval.

In this paper, we propose a fashion search system that automatically recognizes clothes and suggests multiple similar clothing items with low latency. Extensive experiments verify that our system outperforms the existing systems we compared against in terms of clothing item retrieval time.
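The triplet loss listed among the techniques above can be sketched as follows. This is a generic formulation, not the paper's training code: the function names, the margin value, and the toy 2-dimensional embeddings are assumptions, and some formulations use squared distances instead of the plain Euclidean distance used here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss.

    The loss is zero once the anchor is closer to the positive (a similar
    item) than to the negative (a dissimilar item) by at least `margin`.
    The margin value here is an arbitrary illustrative choice.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# Hypothetical embeddings of three clothing images.
anchor, similar, different = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
satisfied = triplet_loss(anchor, similar, different)  # already well separated
violated = triplet_loss(anchor, different, similar)   # roles swapped
```

Embeddings trained this way place similar items close together, which is what makes nearest-neighbor retrieval over an index meaningful.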

A Practical Solution to the ACM RecSys Challenge 2018

Pham Thi Hong Anh ACM RecSys challenge 2018

Technical: Recommendation with Collaborative Filtering and SVD, Matrix Factorization, Content-based learning.

In the ACM RecSys Challenge 2018, the goal is to build a recommendation system that automatically recommends multiple suitable songs to users. Using the dataset provided by Spotify, we employed a range of algorithms and techniques and achieved a top-15 result.

Deep learning ASR-based approach to non-native learner mispronunciation detection

Pham Van Toan - Ta Minh Thanh - Nguyen Thanh Hau The 2018 Vietnam joint Conference on Artificial Intelligence for Life (AI4Life-2018)

Technical: Speech Recognition, Mispronunciation Evaluation, Goodness of Pronunciation Estimation.

In this paper, we tried several deep learning models, such as CNNs, RNNs, and combinations of the two, for Japanese phonetic classification. The study was applied in Talky Bird, a mobile application that detects the pronunciation errors of learners of Japanese.

Aggregation of non linear features LASSO in real estate pricing

Pham Van Toan, Nguyen Hoang Huy Vietnam Mathematics and Applications 2016

Technical: Lasso Regression, Combine Features, Feature Extraction for Real Estate data.

In this paper, we propose a new technique to predict real estate prices in Long Bien district, Viet Nam, and Montreal district, Canada. The experimental results verify that our proposed method produces better real estate price predictions than both the traditional linear regression algorithm and the support vector machine (SVM).
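The Lasso regression underlying this work can be sketched with plain coordinate descent. This shows only the standard Lasso, not the paper's feature aggregation step: the function names and the toy data are assumptions, and the objective assumed here is (1/2n) * sum of squared residuals + lam * sum of absolute coefficients, without an intercept.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k]
                                     for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta


# Toy data: the target depends only on the first feature (y = 2 * x1).
X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

On this toy data the penalty drives the coefficient of the uninformative second feature to exactly zero, which is the feature selection behavior that makes Lasso useful when many candidate features are combined.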

Vietnamese Text Classification based on BoW and Keywords Extraction with Neural Network

Pham Van Toan, Ta Minh Thanh The 21st Asia Pacific Symposium on Intelligent and Evolutionary Systems Conference 2017

Technical: Bag of Words, Keywords Extraction, Neural Network, Text Classification.

Text classification has become one of the main applications in the field of natural language processing. Many approaches have been proposed to address this problem; however, most of them have only been applied to English documents. In this paper, we employ Bag of Words (BoW), a keyword extraction technique, and a neural network to classify Vietnamese news. In our experimental evaluation, the reported accuracy is 99.75%.
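The Bag of Words representation mentioned above can be sketched as follows. The function names and the two toy documents are assumptions for illustration. Splitting on whitespace is a simplification: Vietnamese words often span several whitespace-separated syllables, so a real pipeline would normally run a word segmenter first, and the paper's keyword extraction step is not shown.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map every distinct lowercase token in the corpus to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bow_vector(doc, vocab):
    """Count how often each vocabulary token occurs in the document.

    The vocabulary dict was filled in index order, so iterating over it
    yields the counts in column order. Unknown tokens are ignored.
    """
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocab]


# Hypothetical two-document corpus of Vietnamese news snippets.
docs = ["bóng đá việt nam", "kinh tế việt nam"]
vocab = build_vocabulary(docs)
vectors = [bow_vector(doc, vocab) for doc in docs]
```

Each resulting vector has one entry per vocabulary token and can be fed directly to a neural network classifier as its input layer.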