Read docx file
In this post, I show you how to read a Microsoft Word (.docx) file with Ruby.
In your project folder, create a file vy.docx (the name the script below opens) with the following content:
Fdsf Dsfds
Fsd
F
Sd
Fsd
F
Ds
f
Create a Gemfile with the following content:
source "https://rubygems.org"
ruby "3.2.3"
# Bundle edge Rails instead: gem "rails", github: "rails/rails", branch: "main"
gem "docx"
Then run
bundle install
to install the dependency, docx (https://github.com/donhuvy/docx).
Then create a file hehe.rb with the following content:
require 'docx'

doc = Docx::Document.open('vy.docx')

doc.paragraphs.each do |p|
  puts p
end

doc.bookmarks.each_pair do |bookmark_name, bookmark_object|
  puts bookmark_name
end
Run the program. The output looks like this:
C:/Ruby32-x64/bin/ruby.exe -x C:\Ruby32-x64\bin\bundle exec C:\Ruby32-x64\bin\ruby.exe C:\Ruby32-x64\lib\ruby\gems\3.2.0\gems\ruby-debug-ide-3.0.0.beta.17\bin\rdebug-ide --key-value --step-over-in-blocks --disable-int-handler --evaluation-timeout 10 --evaluation-control --time-limit 100 --memory-limit 0 --full-value-time-limit 20000 --full-value-memory-limit 0 --rubymine-protocol-extensions --port 57243 --host 0.0.0.0 --dispatcher-port 57244 -- C:/Users/dnvy/RubymineProjects/vy1/hehe.rb
Fast Debugger (ruby-debug-ide 3.0.0.beta.17, debase 3.0.0.beta.11, file filtering is supported, block breakpoints supported, smart steps supported, obtaining return values supported, partial obtaining of instance variables supported) listens on 0.0.0.0:57243
Fdsf Dsfds
Fsd
F
Sd
Fsd
F
Ds
f
dan
gd
gg
Process finished with exit code 0
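The paragraph loop above prints every paragraph, including empty ones. As a minimal sketch, the helper below collects the paragraph text into an array and skips blank entries. `paragraph_lines` and `FakeParagraph` are hypothetical names introduced here for illustration and are not part of the docx gem; the only assumption is that each paragraph responds to `to_s`, which is what `puts p` relies on in the script.

```ruby
# Hypothetical helper (not part of the docx gem): return the text of each
# paragraph as a stripped string, dropping paragraphs that are blank.
def paragraph_lines(paragraphs)
  paragraphs.map { |paragraph| paragraph.to_s.strip }.reject(&:empty?)
end

# Stand-in for a docx paragraph so the sketch runs without a .docx file.
FakeParagraph = Struct.new(:text) do
  def to_s
    text
  end
end

sample = [
  FakeParagraph.new('Fdsf Dsfds'),
  FakeParagraph.new(''),
  FakeParagraph.new('  Fsd  ')
]

puts paragraph_lines(sample)
```

With the gem installed, the same helper should accept `Docx::Document.open('vy.docx').paragraphs` in place of `sample`.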
Read bookmark points in a docx file
Add bookmark(s) inside Microsoft Word (2016)
Ruby snippet that gets the bookmark(s):
doc.bookmarks.each_pair do |bookmark_name, bookmark_object|
  puts bookmark_name
end
result
dan
xxx
gd
yyy
gg
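The snippet above only uses the bookmark name and ignores the bookmark object. A small sketch of collecting the names into an array, assuming only that `doc.bookmarks` supports `each_pair` as shown above: `bookmark_names` is a hypothetical helper, and the plain Hash with symbol values is a stand-in for real bookmark objects.

```ruby
# Hypothetical helper (not part of the docx gem): gather bookmark names from
# any object that yields (name, bookmark) pairs through each_pair.
def bookmark_names(bookmarks)
  names = []
  # The leading underscore marks the bookmark object as intentionally unused.
  bookmarks.each_pair { |name, _bookmark| names << name }
  names
end

# Stand-in data: a plain Hash keyed by the bookmark names from this post.
sample = { 'dan' => :bookmark_a, 'gd' => :bookmark_b, 'gg' => :bookmark_c }

puts bookmark_names(sample)
```

With the gem installed, passing `doc.bookmarks` instead of `sample` should list the names of the bookmarks added in Word.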
Read docx table
require 'docx'

# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('van_table.docx')

first_table = doc.tables[0]
puts first_table.row_count
puts first_table.column_count
puts first_table.rows[0].cells[0].text
puts first_table.columns[0].cells[0].text

# Iterate through tables
doc.tables.each do |table|
  table.rows.each do |row| # Row-based iteration
    row.cells.each do |cell|
      puts cell.text
    end
  end

  table.columns.each do |column| # Column-based iteration
    column.cells.each do |cell|
      puts cell.text
    end
  end
end
result
Source code: https://github.com/donhuvy/vy_ruby_docx_table
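The row-based iteration in the table example prints one cell per line, which loses the table shape. As a rough sketch, the rows can be gathered into an array of arrays and printed one row per line. `table_to_matrix` and the `Fake*` structs are hypothetical names used only for illustration; the helper relies on nothing beyond the `rows`, `cells`, and `text` calls already used in the table example.

```ruby
# Hypothetical helper (not part of the docx gem): convert a table into an
# array of rows, where each row is an array of cell text.
def table_to_matrix(table)
  table.rows.map { |row| row.cells.map(&:text) }
end

# Stand-ins for table, row, and cell objects so the sketch runs on its own.
FakeCell  = Struct.new(:text)
FakeRow   = Struct.new(:cells)
FakeTable = Struct.new(:rows)

sample = FakeTable.new([
  FakeRow.new([FakeCell.new('A1'), FakeCell.new('B1')]),
  FakeRow.new([FakeCell.new('A2'), FakeCell.new('B2')])
])

# Print each row on one line, with cells separated by tabs.
table_to_matrix(sample).each { |cells| puts cells.join("\t") }
```

With the gem installed, `table_to_matrix(doc.tables[0])` should give the same structure for the first table in van_table.docx.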