
Text Summarization — What is it?


Posted by: Minh Le

Source: Viblo Asia

Introduction

Recently, text summarization with deep learning caught my attention, and I have been obsessed with it for days and nights. In this post, I would like to share what I know about the topic and explain how to perform the task in a black-box manner, thanks to Hugging Face.


It is evident that as technology progresses, the flow of information grows at an astounding rate. This benefit, however, comes with an unsettling side effect: anyone who needs to conduct research, or whose work involves searching through documents, can easily drown in it. We genuinely need a program that can extract the most crucial elements from documents.


This is where automatic text summarization comes in handy. Summarizing means conveying the key points of one or more source texts. The goal of automatic text summarization is to create a succinct, readable summary that preserves the main ideas and content of the original text. Because the task requires human-like understanding and linguistic flexibility, its feasibility is often questioned. Nonetheless, numerous studies have shown that, while the quality may not yet be outstanding, automatic summarization is practical to implement and has plenty of room to improve.

Personally, I find this topic very intriguing. When I do research, there are tons of documents I have to deal with, and I thought to myself, “there must be an easier way to do this.” Having a machine perform the task struck me as a potential breakthrough in the NLP field. For a moment, I believed I had become an inventor, a game changer in the field. However, reality beats fantasy. In the end, I fell in love with the topic and decided to delve deeper into text summarization with deep learning.

Many companies specialize in this feature, which makes their products hard to compete with in the NLP field; Quillbot and Grammarly are two examples. You should try them at least once!

Main Schools

There are 2 main schools in this interesting field:

Extractive approach: this technique creates summaries by selecting and combining existing sentences from the original document. The sentences are chosen based on their relevance and importance to the overall theme of the text. In other words, extractive summarization picks out the most important pieces of information from each document and puts them together into a concise overview of the entire topic. Here are some key features of extractive summarization:

  • Simple and efficient: It is relatively easy to implement and works well on short texts.
  • Preserves factual accuracy: The sentences are taken directly from the source text, so the summary stays faithful to what the source says.
  • Maintains coherence: When the selected sentences are close together in the original text, the summary generally flows smoothly.
  • May lack originality: The summary contains no new insights or ideas, since it simply reflects the content of the original text.
  • Potentially misses key points: If important information is not expressed in a single, complete sentence, the selection process may miss it.
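To make the extractive idea concrete, here is a minimal sketch of one classic heuristic, frequency-based sentence scoring: each sentence is scored by the average document frequency of its content words, and the top-scoring sentences are returned in their original order. The tiny stop-word list, the naive regex sentence splitter, and the function names are illustrative assumptions; this is not the method of any particular library or product.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use a much larger one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
              "it", "that", "this", "for", "on", "with", "as", "by", "be"}


def split_sentences(text):
    """Naively split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Lower-cased words with punctuation and stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    """Return the num_sentences highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    # Document-level word frequencies act as a crude importance signal.
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences are not favoured just for length.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order for readability
    return " ".join(sentences[i] for i in chosen)
```

Because every output sentence is copied verbatim, the result inherits the faithfulness property above, and also the coherence caveat: sentences chosen from distant parts of the text may not connect smoothly.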

Extractive summarization is widely used for summarizing news articles, scientific papers, and other factual texts. It is a good choice when you need a quick, reliable summary that stays faithful to the source.

Abstractive approach: this is a more sophisticated way of summarizing text that goes a step further than its extractive counterpart. Instead of simply picking existing sentences, it tries to understand the main ideas and concepts of the original text and then rephrases them in its own words, producing a concise and informative summary. Here are some key features of abstractive summarization:

  • Highly insightful: Can capture the main ideas and concepts beyond the surface level of the text, providing a deeper understanding.
  • More concise: Often produces shorter summaries than extractive methods, focusing on the most essential information.
  • Original and creative: Generates new sentences and phrases, potentially expressing insights not explicitly stated in the source text.
  • More complex to implement: Requires advanced models and training techniques, typically deep neural networks.
  • May introduce inaccuracies: The generated text, while creative, can deviate from the factual content of the original source.
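One simple way to see how "original" a summary is, and where it might deviate from the source, is to measure the share of its word n-grams (bigrams here) that never appear in the source. This proportion of novel n-grams is a commonly used proxy for abstractiveness; the tokenizer and function names below are illustrative assumptions. A summary copied from one contiguous stretch of the source scores 0.0, heavily rephrased output approaches 1.0, and bigrams spanning two joined extracted sentences may still count as novel.

```python
import re


def _tokenize(text):
    """Lower-cased word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(source, summary, n=2):
    """Fraction of the summary's distinct n-grams that never occur in the source."""
    summary_ngrams = ngrams(_tokenize(summary), n)
    if not summary_ngrams:
        return 0.0  # summary too short to form any n-gram
    source_ngrams = ngrams(_tokenize(source), n)
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)
```

A high ratio does not mean a summary is wrong, but it marks the phrases worth checking against the source.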

Abstractive summarization is well suited to complex and creative texts where capturing the essence of the content is crucial. It shines in applications such as summarizing news articles, research papers, or even literary works, where understanding the main points and conveying them succinctly is paramount.

In daily life, we tend to summarize extractively whenever a job calls for it. For instance, when we do research for our companies, we want it done as fast as possible while staying concise, so we look through multiple available sources and pull out the most valuable information. Abstractive summarization demands more than that: it has to deal with problems that data-driven techniques such as sentence extraction can largely sidestep, namely semantic representation, inference, and natural language generation.

Code Implementation

Hugging Face recently released a noteworthy feature that makes it much easier for learners without deep learning experience to approach LLMs.

Imagine having a vast library of the world’s most advanced AI models at your fingertips, ready to tackle your unique tasks with just a few lines of code. That’s precisely what the Hugging Face Inference API offers, letting you harness the power of machine learning without the complexities of training and infrastructure.

Note that you will need a Hugging Face access token, so register for one in your account settings first. Once you have it, you are ready to stand on the shoulders of giants.
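Rather than pasting the token straight into your script, you may prefer to keep it in an environment variable so it never ends up in version control. This sketch assumes the variable is named HF_TOKEN (the same name the huggingface_hub library reads, although here it is read directly with os.environ); the helper name load_hf_token is hypothetical.

```python
import os


def load_hf_token(var_name="HF_TOKEN"):
    """Read the access token from an environment variable instead of hard-coding it."""
    token = os.environ.get(var_name)
    if not token:
        # Fail early with a clear message rather than sending an empty Bearer token.
        raise RuntimeError(f"Set the {var_name} environment variable to your Hugging Face access token")
    return token
```

With this helper, the token line in the script can become API_TOKEN = load_hf_token().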

To begin with, choose your favorite model and write the necessary code, for example:

import requests
model_id = "facebook/bart-large-cnn"  # https://huggingface.co/facebook/bart-large-cnn
API_TOKEN = "..."  # Replace with your own access token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = f"https://api-inference.huggingface.co/models/{model_id}"

def query(payload):
    # POST the JSON payload to the hosted model and return the decoded JSON response
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()
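The query function returns whatever JSON the endpoint sends back, so a small helper can pull out the summary string and fail loudly otherwise. This is a sketch that assumes a successful response is a list of dicts with a summary_text key (the shape shown in this post's example output) and that failures, such as an invalid token or a model that is still loading, come back as a dict with an "error" key; the helper name extract_summary is hypothetical.

```python
def extract_summary(response_json):
    """Return the summary string from a summarization response.

    Assumed shapes: [{"summary_text": "..."}] on success, {"error": "..."} on failure.
    """
    if isinstance(response_json, dict) and "error" in response_json:
        raise RuntimeError(f"Inference API error: {response_json['error']}")
    if not isinstance(response_json, list) or not response_json:
        raise ValueError(f"Unexpected response: {response_json!r}")
    return response_json[0]["summary_text"]
```

Usage would look like summary = extract_summary(query({"inputs": text})).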

Taking the second section of this post (Main Schools) as input, the call and its result look like this:

text = """
There are 2 main schools in this interesting field:
Extractive approach: this approach is a technique for creating summaries of texts by selecting and combining existing sentences from the original document. These sentences are chosen based on their relevance and importance to the overall theme of the text. In other words, Extractive summarization picks out the most important pieces of information from each document and puts them together to create a concise overview of the entire topic. Here are some key features of extractive summarization:
Simple and efficient: It's relatively easy to implement and works well on short texts.
Preserves factual accuracy: The sentences are taken directly from the source text, so they are guaranteed to be factual.
Maintains coherence: By selecting sentences that are close together in the original text, the summary generally flows smoothly.
May lack originality: The summary may not contain any new insights or ideas, as it simply reflects the content of the original text.
Potentially misses key points: If important information is not contained in complete sentences, it may be missed by the summarization process. Abstractive approach: this approach is a sophisticated approach to summarizing text that takes things a step further than its extractive counterpart. Instead of simply picking and choosing existing sentences, it delves deeper, understanding the main ideas and concepts of the original text, and then rephrases them in its own words, creating a concise and informative summary. Here are some key features of abstractive summarization:
Highly insightful: Can capture the main ideas and concepts beyond the surface level of the text, providing deeper understanding.
More concise: Often produces shorter summaries compared to extractive methods, focusing on the most essential information.
Original and creative: Generates new sentences and phrases, potentially expressing insights not explicitly stated in the source text.
More complex to implement: Requires advanced models and training techniques, typically involving deep learning methods like neural networks.
May introduce potential inaccuracies: The generated text, while creative, can deviate from the factual content of the original source. Comparing to our daily life, we can see that we tend to do extractive summarize whenever a job requires the task. For instance, we have to do research for our companies. We want the research to be done as fast as possible but also be concise. Therefore, we would look up multiple available resources and search for the most valuable information. Practicing abstractive approach requires more than that. This is due to the fact that abstractive summarization techniques handle issues that data-driven techniques like sentence extraction find easier to handle, such as semantic representation, inference, and natural language production.
"""
data = query({"inputs": text, "parameters": {"do_sample": False}})  # disable sampling for deterministic output
print(data)
# >> [{'summary_text': 'There are 2 main schools in this interesting field: extractive and abstractive. Extractive summarization picks out the most important pieces of information from each document and puts them together to create a concise overview of the entire topic. The abstractive approach delves deeper, understanding the main ideas and concepts of the original text, and then rephrases them in its own words.'}]
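BART models such as facebook/bart-large-cnn accept at most 1024 input tokens, so a much longer document may be truncated, depending on how the endpoint handles it. A common workaround is to split the document into chunks, summarize each one, and join the partial summaries. The sketch below assumes a budget of 600 words per chunk (a hypothetical value, since words are only a rough proxy for tokens) and takes the summarizer as a callable, so the logic can be tried without network access; both function names are hypothetical.

```python
def chunk_words(text, max_words=600):
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text, summarize_fn, max_words=600):
    """Summarize each chunk with summarize_fn and join the partial summaries in order."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_words(text, max_words))


# Hypothetical usage with the query() function defined earlier in this post:
# summary = summarize_long(long_text, lambda c: query({"inputs": c})[0]["summary_text"])
```

Joining partial summaries can repeat or lose cross-chunk context; a second summarization pass over the joined text is one option if the result is still too long.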

Conclusion

Text summarization is a very interesting topic. In fact, it has become hugely popular and is widely seen as a must-have feature for LLM-based products.

Thank you for reading this article; I hope it added something to your knowledge bank! Just before you leave:

👉 Be sure to press the like button and follow me. It would be a great motivation for me.

👉 Follow me: LinkedIn | GitHub
