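First, you need a working Apache environment. Create a directory named pip to hold all the pip packages.
There are two ways to build the mirror. One is to download the whole public pip index to a local machine and keep it in sync with a crontab job. The advantage is completeness; the drawback is that on a low-bandwidth connection the sync drags on endlessly.
The method described here is probably a better fit for individual developers: collect the contents of all your requirements files into one list, fetch those packages with pip download, and serve the result as a mirror.
Below is a script that downloads the packages; I'll call it download.sh for now.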
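#!/bin/bash
# File listing the packages to mirror, one requirement per line
PIP_REQUIRE="pip-requires"
# Directory that Apache serves as the mirror
CACHE_PATH="/opt/pip"
while read -r LINE
do
    # Only process lines that start with a letter (skips comments and blank lines)
    if [[ $LINE =~ ^[a-zA-Z] ]]
    then
        echo "$LINE"
        # Download the package without installing it; answer "w" (wipe) if pip prompts
        yes w | pip download "$LINE" -d "$CACHE_PATH"
    fi
done < "$PIP_REQUIRE"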
Here CACHE_PATH is the directory where the downloaded pip packages are stored.
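The filtering step of the script can also be sketched in Python. This is only an illustration under stated assumptions, not the author's method: `select_requirements` and `build_download_commands` are hypothetical helper names, and the commands are built as argument lists rather than executed.

```python
import re

# Same test as the shell script's [[ $LINE =~ ^[a-zA-Z] ]]
_STARTS_WITH_LETTER = re.compile(r"^[a-zA-Z]")


def select_requirements(text):
    """Return the requirement lines that start with a letter."""
    selected = []
    for line in text.splitlines():
        stripped = line.strip()  # `read` in the shell also trims surrounding whitespace
        if _STARTS_WITH_LETTER.match(stripped):
            selected.append(stripped)
    return selected


def build_download_commands(requirements, cache_path="/opt/pip"):
    """Build one `pip download` argument list per requirement (nothing is run)."""
    return [["pip", "download", req, "-d", cache_path] for req in requirements]


if __name__ == "__main__":
    sample = "# comment\nrequests>=2.0\n\n-r other.txt\nlxml\n"
    for command in build_download_commands(select_requirements(sample)):
        print(" ".join(command))
```

Each argument list could be handed to `subprocess.run` if you prefer to drive the download from Python instead of the shell.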
The Apache virtual host that serves /opt is configured as follows:
<VirtualHost *:80>
ServerAdmin jimjiang@gmail.com
ServerName test.jimflying.com
DocumentRoot /opt
<Directory /opt>
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
ErrorLog logs/mirrors-error_log
CustomLog logs/mirrors-access_log common
</VirtualHost>
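Once Apache is configured, the server side is done. (On Apache 2.4, replace the Order and Allow lines with Require all granted.) On each client, point pip and easy_install at the mirror: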
vi $HOME/.pip/pip.conf
[global]
find-links = http://192.168.0.30/pip
no-index = true
vi $HOME/.pydistutils.cfg
[easy_install]
index_url = http://192.168.0.30/pip
After this, pip install -r requirements runs much faster.
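If several clients need the same settings, the pip.conf content can be generated rather than typed by hand. A minimal sketch, assuming the same two options shown above; `render_pip_conf` is a hypothetical helper name, and writing the result to $HOME/.pip/pip.conf is left to the caller.

```python
import configparser
import io


def render_pip_conf(mirror_url):
    """Return pip.conf text that makes pip use only the find-links mirror."""
    config = configparser.ConfigParser()
    config["global"] = {"find-links": mirror_url, "no-index": "true"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_pip_conf("http://192.168.0.30/pip"))
```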
pip install pydocx
from pydocx import PyDocX
html = PyDocX.to_html("test.docx")
# Write the converted HTML out; the with block closes the file automatically
with open("test.html", "w", encoding="utf-8") as f:
    f.write(html)
To upload Word documents through a web page and accept only .docx files:
<form method="post" enctype="multipart/form-data">
<input type="file" name="file" accept="application/vnd.openxmlformats-officedocument.wordprocessingml.document">
</form>
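The accept attribute only filters what the browser's file dialog offers, so the server should check the upload as well. The sketch below is one possible heuristic, not part of the original post: `looks_like_docx` is a hypothetical helper that relies on the fact that a .docx file is a ZIP archive containing word/document.xml.

```python
import io
import zipfile


def looks_like_docx(filename, data):
    """Heuristic: .docx extension plus a ZIP archive holding word/document.xml."""
    if not filename.lower().endswith(".docx"):
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return "word/document.xml" in archive.namelist()
    except zipfile.BadZipFile:
        # Not a ZIP archive at all, so it cannot be a .docx file
        return False
```

How `filename` and `data` are obtained depends on the web framework handling the form, which the post does not specify.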
On Windows, convert .doc to .docx:
pip3 install pypiwin32
from win32com import client

word = client.Dispatch("Word.Application")
doc = word.Documents.Open(r"D:\ \ .doc")  # absolute path to the source .doc file
doc.SaveAs(r"D:\ \ .docx", 16)  # absolute path for the output; 16 = wdFormatDocumentDefault (.docx)
doc.Close()
word.Quit()
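The conversion can be wrapped so that Word is always closed, even if opening or saving fails. This is a sketch under assumptions: `docx_target` and `convert_doc_to_docx` are hypothetical names, the COM calls are the same ones used above, and the conversion only runs on Windows with Word installed.

```python
from pathlib import PureWindowsPath

WD_FORMAT_DOCUMENT_DEFAULT = 16  # same value passed to SaveAs above


def docx_target(doc_path):
    """Return the .docx path next to the given .doc file (absolute paths only)."""
    path = PureWindowsPath(doc_path)
    if not path.is_absolute():
        raise ValueError("an absolute path is required: %r" % (doc_path,))
    return str(path.with_suffix(".docx"))


def convert_doc_to_docx(doc_path):
    """Convert one .doc file through Word's COM interface."""
    from win32com import client  # imported lazily so docx_target works without pywin32

    word = client.Dispatch("Word.Application")
    try:
        doc = word.Documents.Open(doc_path)
        try:
            doc.SaveAs(docx_target(doc_path), WD_FORMAT_DOCUMENT_DEFAULT)
        finally:
            doc.Close()
    finally:
        word.Quit()
```

For example, `docx_target(r"D:\docs\report.doc")` gives `D:\docs\report.docx`; that path is made up for illustration.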
Make sure beautifulsoup4 is installed. If it is not, run the following commands on the command line to install it:
pip install beautifulsoup4
pip install lxml
pip install html5lib
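To see which of these packages are already importable before installing anything, a small check along these lines may help. `missing_modules` is a hypothetical helper; note that beautifulsoup4 is imported under the name bs4.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names for beautifulsoup4, lxml and html5lib
    print(missing_modules(["bs4", "lxml", "html5lib"]))
```

An empty list means all three are available.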
Then create a Python file with any name and put the following in it:
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters and their names were
<a class="sister" href="...">Elsie</a>,
<a class="sister" href="...">Lacie</a> and
<a class="sister" href="...">Tillie</a>
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())
If the program prints the following,
# Output (the # marks a comment and can be ignored)
# <html>
# <head>
# <title>
# The Dormouse's story
# </title>
# </head>
# <body>
# <p class="title">
# <b>
# The Dormouse's story
# </b>
# </p>
# <p class="story">
# Once upon a time there were three little sisters and their names were
# <a class="sister" href="...">
# Elsie
# </a>
# ,
# <a class="sister" href="...">
# Lacie
# </a>
# and
# <a class="sister" href="...">
# Tillie
# </a>
# and they lived at the bottom of a well.
# </p>
# <p class="story">
# ...
# </p>
# </body>
# </html>
then the installation succeeded.