Converting Markdown to PDF with Python (clean HTML conversion, with LaTeX, table rendering, and more)

白色玫瑰 (programmer)

Date: 2023-07-11

The approach is Markdown to HTML to PDF; the HTML step is what gives good rendering, including advanced elements such as flowcharts, Gantt charts, and KaTeX.

Contents

<a href="#_3">Problem description</a> <a href="#_24">Solution</a> <a href="#_89">Adding more extensions</a> <a href="#_128">Adding math support</a> <a href="#PDF_161">Web page to PDF</a> <a href="#_175">Wrapping it in a function</a> <a href="#_312">Extras</a>

<a href="#_315">Code line numbers</a> <a href="#_384">Progress bar</a> <a href="#_513">References</a> <a href="#_532">Rendered result</a>

Problem description

Convert Markdown to PDF.

The approach in this article is fairly involved; you can also try <a href="https://www.pandoc.org/">Pandoc</a>.
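
For readers who want to try the Pandoc route first, the call can be scripted from Python. This is a minimal sketch, assuming `pandoc` is installed and on PATH; the helper names `pandoc_command` and `convert_with_pandoc` are made up for illustration, and the optional `--pdf-engine` value is whatever PDF engine is installed locally.

```python
import shutil
import subprocess


def pandoc_command(src, dst, pdf_engine=None):
    """Build the argument list for a `pandoc SRC -o DST` call."""
    cmd = ['pandoc', src, '-o', dst]
    if pdf_engine:
        # --pdf-engine selects the program pandoc uses to produce the PDF
        cmd.append(f'--pdf-engine={pdf_engine}')
    return cmd


def convert_with_pandoc(src, dst, pdf_engine=None):
    """Run pandoc; raises if pandoc is missing or exits with an error."""
    if shutil.which('pandoc') is None:
        raise FileNotFoundError('pandoc was not found on PATH')
    subprocess.run(pandoc_command(src, dst, pdf_engine), check=True)
```

Calling `convert_with_pandoc('test.md', 'test.pdf', pdf_engine='wkhtmltopdf')` would reuse the same wkhtmltopdf binary that the rest of this article relies on.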

<a href="https://download.csdn.net/download/lly1122334/12741869">Download link for all the code and CSS in this article</a>

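Solution

Either use Typora together with <a href="https://wkhtmltopdf.org/downloads.html">wkhtmltopdf</a>, or use the markdown and pdfkit libraries, which is the approach followed here.

  1. Install markdown and pdfkit
pip install markdown
pip install pdfkit
  2. Install wkhtmltopdf

<a href="https://wkhtmltopdf.org/downloads.html">wkhtmltopdf download page</a>

Add it to the Path environment variable (an absolute path to the executable also works).

  3. Code

For test.md, see:

<a href="https://www.zybuluo.com/mdeditor">the default Markdown template of Zybuluo (作业部落)</a>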

import pdfkit
from markdown import markdown

input_filename = 'test.md'
output_filename = 'test.pdf'

with open(input_filename, encoding='utf-8') as f:
   text = f.read()

html = markdown(text, output_format='html')  # Markdown to HTML
pdfkit.from_string(html, output_filename, options={'encoding': 'utf-8'})  # HTML to PDF

# wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'  # explicit path to wkhtmltopdf
# configuration = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf)
# pdfkit.from_string(html, output_filename, configuration=configuration, options={'encoding': 'utf-8'})  # HTML to PDF

If the environment variable is not configured, use the commented-out code instead.
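
The choice between "on Path" and "explicit path" can also be made at run time. The sketch below uses only the standard library; `find_executable` is a hypothetical helper, and the fallback locations you pass in are your own guesses about where the program was installed.

```python
import os
import shutil


def find_executable(name, fallbacks=()):
    """Return a path to `name`, or None if nothing usable is found."""
    found = shutil.which(name)  # search the Path environment variable first
    if found:
        return found
    for candidate in fallbacks:
        # fall back to explicit install locations supplied by the caller
        if os.path.isfile(candidate):
            return candidate
    return None
```

A result of `find_executable('wkhtmltopdf', [r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'])` that is not None can be passed to `pdfkit.configuration(wkhtmltopdf=...)` exactly as in the commented-out lines.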

  4. Result

Not supported at this point: annotations, tables, LaTeX, code blocks, and flowcharts, sequence diagrams, or Gantt charts.

Adding more extensions

test.md

|Item|Price|Quantity|
|---|---|---|
|Computer|$1600|5|
|Phone|$12|12|
|Pipe|$1|234|

Without extensions, this table is not rendered correctly.

Enable the tables extension:

import pdfkit
from markdown import markdown

text = '''|Item|Price|Quantity|
|---|---|---|
|Computer|$1600|5|
|Phone|$12|12|
|Pipe|$1|234|'''
html = markdown(text, output_format='html', extensions=['tables'])  # Markdown to HTML
pdfkit.from_string(html, 'test.pdf', options={'encoding': 'utf-8'})  # HTML to PDF

Further reading: <a href="https://python-markdown.github.io/extensions/">Available Extensions</a>
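
Before producing a PDF, it can help to check whether an extension actually changed the HTML. The sketch below counts opening tags with the standard library's `html.parser`; `count_tags` and `table_rendered` are hypothetical helper names, and `table_rendered` assumes the markdown package from the install step is available.

```python
from collections import Counter
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts every opening tag seen in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def table_rendered(md_text, extensions=()):
    """True if converting md_text with the given extensions yields a <table>."""
    # imported lazily so count_tags stays usable without the markdown package
    from markdown import markdown
    html = markdown(md_text, output_format='html', extensions=list(extensions))
    return count_tags(html)['table'] > 0
```

With the table from test.md, `table_rendered(text)` and `table_rendered(text, ['tables'])` should differ, which makes the effect of the extension visible without opening the PDF.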

Adding math support

Load KaTeX in the HTML page, and enable mdx_math in the markdown library.

Install

pip install python-markdown-math
import pdfkit
from markdown import markdown

input_filename = 'test.md'
output_filename = 'test.pdf'
html = '<!DOCTYPE html><html><body><script src="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.js" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/katex/dist/contrib/mathtex-script-type.min.js" defer></script>{}</body></html>'
text = '$$E=mc^2$$'
text = markdown(text, output_format='html', extensions=['mdx_math'])  # Markdown to HTML
html = html.format(text)
pdfkit.from_string(html, output_filename, options={'encoding': 'utf-8'})  # HTML to PDF

Further reading: <a href="https://github.com/Python-Markdown/markdown/wiki/Third-Party-Extensions">Third-Party Extensions</a>
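
Filling the page with `html.format(text)` works only while the template contains no other braces; an inline `<style>` rule would make `str.format` fail. One possible alternative, sketched here with the standard library's `string.Template`, uses `$name` placeholders instead. The template text and the `build_page` name are illustrative, not part of the original code.

```python
from string import Template

# $title and $body are placeholders; CSS braces need no escaping here
PAGE = Template('''<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>$title</title>
<style>
body { margin: 0; }
</style>
</head>
<body>
$body
</body>
</html>''')


def build_page(title, body):
    """Insert the title and the converted Markdown into the page template."""
    # substituted values are inserted verbatim, so `$` inside body is safe
    return PAGE.substitute(title=title, body=body)
```

Only the template itself must avoid stray `$` characters; dollar signs in the converted Markdown, such as math delimiters, pass through unchanged.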

Web page to PDF

If the generated HTML does not look right, you can use the HTML export feature of <a href="https://www.zybuluo.com/mdeditor#">Zybuluo</a> and then convert that file to PDF.

import pdfkit

pdfkit.from_file('test.html', 'test.pdf', options={'encoding': 'utf-8'})  # HTML to PDF
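
The `options` dictionary is passed through to wkhtmltopdf as command-line flags, so page size and margins can be set the same way as the encoding. A hedged sketch follows: `pdf_options` and `html_file_to_pdf` are hypothetical helpers, the default values are arbitrary, and `enable-local-file-access` is assumed to be understood by your wkhtmltopdf build (older builds may reject it), which is why it is off by default.

```python
def pdf_options(page_size='A4', margin='20mm', local_files=False):
    """Build an options dict for pdfkit (keys are wkhtmltopdf flag names)."""
    options = {
        'encoding': 'utf-8',
        'page-size': page_size,
        'margin-top': margin,
        'margin-bottom': margin,
        'margin-left': margin,
        'margin-right': margin,
    }
    if local_files:
        # value-less flag: pdfkit passes only the flag name when the value is None
        options['enable-local-file-access'] = None
    return options


def html_file_to_pdf(src, dst, **kwargs):
    """Convert an exported HTML file to PDF with the options above."""
    import pdfkit  # imported lazily; also requires wkhtmltopdf to be installed
    pdfkit.from_file(src, dst, options=pdf_options(**kwargs))
```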

Wrapping it in a function

Use the <a href="https://python-markdown.github.io/extensions/">official extensions</a> and some <a href="https://github.com/Python-Markdown/markdown/wiki/Third-Party-Extensions">third-party extensions</a>.

Install the libraries

pip install python-markdown-math
pip install pygments
pip install pymdown-extensions
pip install markdown-checklist

Download the CSS

<a href="https://github.com/sindresorhus/github-markdown-css">github-markdown.css</a>, renamed to markdown.css

codehilite.css, generated with the command: pygmentize -S default -f html -a .highlight > codehilite.css

linenum.css

[data-linenos]:before {
  content: attr(data-linenos);
}

tasklist.css

.markdown-body .task-list-item {
  list-style-type: none !important;
}

.markdown-body .task-list-item input[type="checkbox"] {
  margin: 0 4px 0.25em -20px;
  vertical-align: middle;
}
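
When the HTML is handed to wkhtmltopdf as a string, relative stylesheet links may not resolve, so one option is to embed the CSS files directly in the page. The following is a sketch under that assumption; `inline_css` and `embed_css` are hypothetical helpers, and the embedding should happen after any `str.format` call on the page, because CSS braces would otherwise be read as format fields.

```python
from pathlib import Path


def inline_css(paths, encoding='utf-8'):
    """Concatenate CSS files into one <style> element; missing files are skipped."""
    parts = []
    for path in map(Path, paths):
        if path.is_file():
            parts.append(f'/* {path.name} */\n{path.read_text(encoding=encoding)}')
    return '<style>\n' + '\n'.join(parts) + '\n</style>'


def embed_css(html, paths, encoding='utf-8'):
    """Insert the combined <style> element just before the first </head>."""
    return html.replace('</head>', inline_css(paths, encoding) + '\n</head>', 1)
```

For example, `embed_css(html, ['markdown.css', 'codehilite.css', 'linenum.css', 'tasklist.css'])` would produce a self-contained page from the four files listed above.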

Code

import os
import pdfkit
from markdown import markdown
from pymdownx import superfences


def markdown2pdf(input, output='test.pdf', encoding='utf-8', savehtml=False):
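   # The stylesheet links assume the CSS files listed above sit next to this script
   html = '''
   <!DOCTYPE html>
   <html>
      <head>
         <meta charset="UTF-8">
         <meta name="viewport" content="width=device-width, initial-scale=1, minimal-ui">
         <title>{}</title>
         <link rel="stylesheet" href="markdown.css">
         <link rel="stylesheet" href="codehilite.css">
         <link rel="stylesheet" href="linenum.css">
         <link rel="stylesheet" href="tasklist.css">
         <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.css" crossorigin="anonymous">
         <script src="https://unpkg.com/mermaid@8.7.0/dist/mermaid.min.js"></script>
         <script src="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.js" crossorigin="anonymous"></script>
         <script src="https://cdn.jsdelivr.net/npm/katex/dist/contrib/mathtex-script-type.min.js" defer></script>
      </head>
      <body>
         <article class="markdown-body">
            {}
         </article>
      </body>
   </html>
   '''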

   with open(input, encoding=encoding) as f:
      text = f.read()

   extensions = [
      'toc',  # table of contents, [toc]
      'extra',  # abbreviations, attribute lists, definition lists, fenced code blocks, footnotes, Markdown in HTML, tables
   ]
   third_party_extensions = [
      'mdx_math',  # KaTeX math, $E=mc^2$ and $$E=mc^2$$
      'markdown_checklist.extension',  # checklists, - [ ] and - [x]
      'pymdownx.magiclink',  # turn bare URLs into links automatically
      'pymdownx.caret',  # superscript and inserted text
      'pymdownx.superfences',  # nestable fenced blocks, used for the diagrams
      'pymdownx.betterem',  # better handling of emphasis (bold and italic)
      'pymdownx.mark',  # highlighted text
      'pymdownx.highlight',  # code highlighting
      'pymdownx.tasklist',  # task lists
      'pymdownx.tilde',  # subscript and strikethrough
   ]
   extensions.extend(third_party_extensions)
   extension_configs = {
      'mdx_math': {
         'enable_dollar_delimiter': True  # allow single $ delimiters
      },
      'pymdownx.superfences': {
         "custom_fences": [
            {
               'name': 'mermaid',  # enable flowcharts and other diagrams
               'class': 'mermaid',
               'format': superfences.fence_div_format
            }
         ]
      },
      'pymdownx.highlight': {
         'linenums': True,  # show line numbers
         'linenums_style': 'pymdownx-inline'  # keep line numbers separate from the code text
      },
      'pymdownx.tasklist': {
         'clickable_checkbox': True,  # make task list checkboxes clickable
      }
   }  # extension configuration
   title = '.'.join(os.path.basename(input).split('.')[:-1])  # file name without its extension
   text = markdown(text, output_format='html', extensions=extensions,
               extension_configs=extension_configs)  # Markdown to HTML
   html = html.format(title, text)
   print(html)
   if savehtml:
      with open(input.replace('.md', '.html'), 'w', encoding=encoding) as f:
         f.write(html)
   pdfkit.from_string(html, output, options={'encoding': 'utf-8'})  # HTML to PDF


if __name__ == '__main__':
   markdown2pdf('test.md', 'test.pdf', savehtml=True)
   print('Done')
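
The function derives the title and the .html file name with string operations; `str.replace('.md', '.html')` rewrites every occurrence of `.md` in the path, including one in a directory name. A possible alternative, sketched here with `pathlib`, touches only the final suffix. The `derive_paths` helper is hypothetical and not part of the original function.

```python
from pathlib import Path


def derive_paths(md_path):
    """Return (title, html_path, pdf_path) derived from a Markdown file path."""
    src = Path(md_path)
    title = src.stem  # file name without its last suffix
    return title, src.with_suffix('.html'), src.with_suffix('.pdf')
```

For `'docs/my.notes.md'` the title keeps its inner dot (`my.notes`), which matches what the `split('.')[:-1]` expression above produces.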

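See the end of the article for the result.

PS:

If a feature is missing, look for an extension that provides it; if none exists, you can write your own. Because the pipeline is Markdown to HTML to PDF and pdfkit's rendering is limited, the final result is limited too. You can also convert with <a href="https://acrobat.adobe.com/cn/zh-Hans/free-trial-download.html">Adobe Acrobat Pro</a>, but flowcharts and similar diagrams still do not convert perfectly.

Extras

This part is not standard Markdown.

Code line numbers

linenum.md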

```python
if __name__ == '__main__':
   print('Hello World!')
import math

print(math.pi)  # pi
```

linenum.css

[data-linenos]:before {
  content: attr(data-linenos);
}

linenum.py

import pdfkit
from markdown import markdown

filename = 'linenum.md'
# The stylesheet links assume codehilite.css and linenum.css sit next to this script
html = '''
<!DOCTYPE html>
<html>
   <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1, minimal-ui">
      <title>linenum</title>
      <link rel="stylesheet" href="codehilite.css">
      <link rel="stylesheet" href="linenum.css">
   </head>
   <body>
      <article class="markdown-body">
         {}
      </article>
   </body>
</html>
'''
encoding = 'utf-8'
with open(filename, encoding=encoding) as f:
   text = f.read()

extensions = [
   'pymdownx.superfences',  # nestable fenced blocks, used for the diagrams
   'pymdownx.highlight'  # code highlighting
]
extension_configs = {
   'pymdownx.highlight': {
      'linenums': True,  # show line numbers
      'linenums_style': 'pymdownx-inline'  # keep line numbers separate from the code text
   }
}  # extension configuration
text = markdown(text, output_format='html', extensions=extensions,
            extension_configs=extension_configs)  # Markdown to HTML
html = html.format(text)
print(html)
with open(filename.replace('.md', '.html'), 'w', encoding=encoding) as f:
   f.write(html)

pdfkit.from_string(html, filename.replace('.md', '.pdf'), options={'encoding': 'utf-8'})  # HTML to PDF

print('Done')

Result

![Rendered code block with line numbers](https://img-blog.csdnimg.cn/20200824140228516.png)

Because the line numbers are drawn by CSS from the data-linenos attribute, copying the code does not pick them up.

Progress bar

progressbar.css

.progress-label { position: absolute; text-align: center; font-weight: 700; width: 100%; margin: 0; line-height: 1.2rem; white-space: nowrap; overflow: hidden; }

.progress-bar { height: 1.2rem; float: left; background-color: #2979ff; }

.progress { display: block; width: 100%; margin: 0.5rem 0; height: 1.2rem; background-color: #eeeeee; position: relative; }

.progress.thin { margin-top: 0.9rem; height: 0.4rem; }

.progress.thin .progress-label { margin-top: -0.4rem; }

.progress.thin .progress-bar { height: 0.4rem; }

.progress-100plus .progress-bar { background-color: #00e676; }

.progress-80plus .progress-bar { background-color: #fbc02d; }

.progress-60plus .progress-bar { background-color: #ff9100; }

.progress-40plus .progress-bar { background-color: #ff5252; }

.progress-20plus .progress-bar { background-color: #ff1744; }

.progress-0plus .progress-bar { background-color: #f50057; }

progressbar.py

import pdfkit
from markdown import markdown

filename = 'progressbar.md'
# The stylesheet link assumes progressbar.css sits next to this script
html = '''
<!DOCTYPE html>
<html>
   <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1, minimal-ui">
      <title>progressbar</title>
      <link rel="stylesheet" href="progressbar.css">
   </head>
   <body>
      {}
   </body>
</html>
'''
encoding = 'utf-8'
with open(filename, encoding=encoding) as f:
   text = f.read()

extensions = [
   'markdown.extensions.attr_list',
   'pymdownx.progressbar'
]
text = markdown(text, output_format='html', extensions=extensions)  # Markdown to HTML
html = html.format(text)
print(html)
with open(filename.replace('.md', '.html'), 'w', encoding=encoding) as f:
   f.write(html)

pdfkit.from_string(html, filename.replace('.md', '.pdf'), options={'encoding': 'utf-8'})  # HTML to PDF

print('Done')

progressbar.md

[=0% "0%"]
[=5% "5%"]
[=25% "25%"]
[=45% "45%"]
[=65% "65%"]
[=85% "85%"]
[=100% "100%"]
[=85% "85%"]{: .candystripe}
[=100% "100%"]{: .candystripe .candystripe-animate}

[=0%]{: .thin}
[=5%]{: .thin}
[=25%]{: .thin}
[=45%]{: .thin}
[=65%]{: .thin}
[=85%]{: .thin}
[=100%]{: .thin}

Result: [screenshot of the rendered progress bars]
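
The class names in progressbar.css (progress-0plus up to progress-100plus) suggest that each bar is colored by the 20-point bucket its percentage falls into. The sketch below shows that mapping as plain Python; it is an illustration inferred from the CSS class names, not the extension's actual code, and `level_class` is a hypothetical name.

```python
def level_class(percent):
    """Map a percentage to the CSS class of its 20-point bucket."""
    clamped = max(0.0, min(float(percent), 100.0))  # keep the value within 0..100
    bucket = int(clamped // 20) * 20  # 0, 20, 40, 60, 80 or 100
    return f'progress-{bucket}plus'
```

Under this assumption, the 85% bar in progressbar.md would pick up the `.progress-80plus` color and the 5% bar the `.progress-0plus` color.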

References

  1. <a href="https://www.markdownguide.org/">Markdown Guide</a>
  2. <a href="https://www.cnblogs.com/Wshile/p/13194019.html">Converting Markdown to PDF with Python (python 将markdown转换为pdf)</a>
  3. <a href="https://github.com/wkhtmltopdf/wkhtmltopdf">wkhtmltopdf: Convert HTML to PDF using Webkit (QtWebKit)</a>
  4. <a href="https://github.com/JazzCore/python-pdfkit/">python-pdfkit: Wkhtmltopdf python wrapper to convert html to pdf</a>
  5. <a href="https://github.com/Python-Markdown/markdown">Python-Markdown/markdown: A Python implementation of John Gruber’s Markdown with Extension support.</a>
  6. <a href="https://github.com/trentm/python-markdown2/">python-markdown2: markdown2: A fast and complete implementation of Markdown in Python</a>
  7. <a href="https://github.com/mitya57/python-markdown-math">python-markdown-math: Math extension for Python-Markdown</a>
  8. <a href="https://github.com/FND/markdown-checklist">markdown-checklist: Python Markdown extension for lists of tasks with checkboxes</a>
  9. <a href="https://gitlab.com/ayblaq/prependnewline">prependnewline: prepends a new line to a list if marked as a paragraph by the markdown</a>
  10. <a href="https://github.com/facelessuser/pymdown-extensions/">pymdown-extensions: Extensions for Python Markdown</a>
  11. <a href="https://github.com/sindresorhus/github-markdown-css">github-markdown-css: The minimal amount of CSS to replicate the GitHub Markdown style</a>
  12. <a href="https://github.com/richleland/pygments-css">pygments-css: css files created from pygment’s built-in styles</a>

Rendered result

<a href="https://www.zybuluo.com/mdeditor">Click here to compare with Zybuluo's rendering</a>

Original article: https://blog.csdn.net/lly1122334/article/details/107936390?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522168905959516800227485936%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=168905959516800227485936&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~first_rank_ecpm_v1~rank_v31_ecpm-4-107936390-null-null.142^v88^control,239^v2^insert_chatgpt&utm_term=markdown

URL of this article: https://www.sjxi.cn/detil/5b5dc67d066b4ee5b9eadf542ae3f32f
