- vừa được xem lúc

NeuralNotes — Music generation using Transformers

0 0 13

Người đăng: Minh Le

Theo Viblo Asia

Introduction

Currently, I have pursued a topic that using deep learning for automatic music generation during my bachelor thesis at VGU. Indeed, because of its generality, deep learning and more generally machine learning manners are being utilized to produce musical material.

In my opinion, exploiting potentials of deep learning on music generation task, it allows people to create new and original music without having to start from scratch, saving time and effort in the creative process. Furthermore, it helps people to explore different musical possibilities and ideas. However, it is difficult to deny that current AI-generated music products do not bring the harmony expected in a piece of music, such as timbre compatibility, note repetition, or rhythm, melodies and wandering music have no direction, are imitative and have the risk of plagiarism.

When talking about development, it is talking about a whole process. With the current rate of technology development, I do believe that an AI can compose music at the same level as an artist is a conceivable prospect.

In this post, I would talk about what I have achieved and what I have done during the process. The content of the post will be:

Implementation: I will discuss about libraries, frameworks, a brief introduction about a deep learning architecture that I used. System Architecture: I would go into detail about the system I propose so that users of all computer skill levels can interact with and experience the model. Future works and Conclusion: I would give several comments on my proposed method. I would welcome you to experience my project.

Implementation

To develop the project, since Python is the programming language the I used, I choose:

  • FastAPI to deploy web application. It has very high performance, also, it is designed to be simple to understand and utilize, able to lessen the quantity of redundant code. Recently, ML-projects that used FastAPI are growing.
  • Jinja, which is a well-liked option for template engines for Python web frameworks.
  • MongoDB. It is a NoSQL database so does not have a fixed schema, making it simple to add and remove fields from documents without changing the database schema.
  • PyTorch, which is an open-source machine learning framework. It is known for its ease of use and flexibility. It is also very efficient, making it a good choice for production environments.

About the core technology which is deep learning based, responsible for generating music, I choose MuseMorphose. The MuseMorphose architecture is composed of a KL divergence regularized latent space for the representation of musical bars between a standard Transformer encoder acting at the bar level and a standard Transformer decoder accepting segment-level conditions via the in-attention mechanism.

In my settings, since the Decoder is in charge of generation, I makes a little modification from the its positional encoding, allowing it achieves better performances. image.png

One more reason that I choose this model is because it also introduce a new methods to represent music in an symbolical approach other than MIDI, namely, REMI (Revamped MIDI). REMI has more attributes than its ancestor, it was born to benefit Transformer-based models from the human knowledge of music.

System Architecture

The flow of application is simple to understand. The user makes a request from a web interface to a back-end service. The agent is prompted, and as a result, songs are generated and returned to the user. image.png

As can be seen from the Figure, within the procedure, there are two descriptions that are depicted by the solid and dashed lines. I want the system to utilize the data that it receives over time. The ”Data Storage” would be timed, automatically update the local data, and broaden the range of the data.

  • Updating the data set at a local point provides diversity in songs and music genres, improving the user experience.
  • The computing and storage should be located near the data source. To the best of my understanding, this deployment can minimize latency by putting computing and storage closer to the data sources and can increase performance by minimizing the distance that data needs to travel before being processed.
  • It enhances security since attack strategies that may be used to exploit data during transmission are removed by keeping data near the source.

The Figure also depicts three clusters, which I have labeled as the application, the lab, and the storage. The application has a 3-tier architecture which includes an interface for user communication, the application server, and the data source. The lab includes the agent and local storage, which in my configuration is a directory that stores receive, result, and resource data. The storage’s primary function is to hold data.

For instance, a person with extensive experience in model research will work in ”the lab,” where their only concentration is on investigating and making use of models and data to produce successful outcomes. Alternately, ”the storage” will be the location where users or engineers skilled in information processing use their understanding of system users to analyze recent activities, musical genres the user appears to be interested in, or whether the user’s musical preferences are polarized or concentrated in particular profiles

Early deployment of the system may only require a small amount of knowledge, but as it is developed, the complexity of the required expertise will distinguish the skills of developers in each cluster. In other words, the system has to be separated into smaller clusters for future development due to the disparity in knowledge across each module. This makes the division into three clusters reasonable, making the system easy to manage, maintain, and grow.

Future works and Conlusion

Overall, I have proposed a method that using deep learning to generate music automatically, also, an application architecture that ensures a diversity in musical data, so that, during the generation process, users can experience different tastes from a given piece of music.

In the future, I would focus on employing RLHF to my application, so that my agent can produce music “locally”.


Thank you for reading this article; I hope it added something to your knowledge bank! Just before you leave:

👉 Be sure to press the like button and follow me. It would be a great motivation for me.

👉 Follow me: Linkedin | Github

Bình luận

Bài viết tương tự

- vừa được xem lúc

Tấn công và phòng thủ bậc nhất cực mạnh cho các mô hình học máy

tấn công bậc nhất cực mạnh = universal first-order adversary. Update: Bleeding edge của CleverHans đã lên từ 3.1.0 đến 4.

0 0 42

- vừa được xem lúc

[Deep Learning] Key Information Extraction from document using Graph Convolution Network - Bài toán trích rút thông tin từ hóa đơn với Graph Convolution Network

Các nội dung sẽ được đề cập trong bài blog lần này. . Tổng quan về GNN, GCN. Bài toán Key Information Extraction, trích rút thông tin trong văn bản từ ảnh.

0 0 219

- vừa được xem lúc

Trích xuất thông tin bảng biểu cực đơn giản với OpenCV

Trong thời điểm nhà nước đang thúc đẩy mạnh mẽ quá trình chuyển đổi số như hiện nay, Document Understanding nói chung cũng như Table Extraction nói riêng đang trở thành một trong những lĩnh vực được quan tâm phát triển và chú trọng hàng đầu. Vậy Table Extraction là gì? Document Understanding là cái

0 0 230

- vừa được xem lúc

Con đường AI của tôi

Gần đây, khá nhiều bạn nhắn tin hỏi mình những câu hỏi đại loại như: có nên học AI, bắt đầu học AI như nào, làm sao tự học cho đúng, cho nhanh, học không bị nản, lộ trình học AI như nào... Sau nhiều lần trả lời, mình nghĩ rằng nên viết hẳn một bài để trả lời chi tiết hơn, cũng như để các bạn sau này

0 0 157

- vừa được xem lúc

[B5'] Smooth Adversarial Training

Đây là một bài trong series Báo khoa học trong vòng 5 phút. Được viết bởi Xie et. al, John Hopkins University, trong khi đang intern tại Google. Hiện vẫn là preprint do bị reject tại ICLR 2021.

0 0 45

- vừa được xem lúc

Deep Learning với Java - Tại sao không?

Muốn tìm hiểu về Machine Learning / Deep Learning nhưng với background là Java thì sẽ như thế nào và bắt đầu từ đâu? Để tìm được câu trả lời, hãy đọc bài viết này - có thể kỹ năng Java vốn có sẽ giúp bạn có những chuyến phiêu lưu thú vị. DJL là tên viết tắt của Deep Java Library - một thư viện mã ng

0 0 139