calibre-web is a superb personal library application. Much like Video Station on a Synology NAS, it lets you use a metadata scraper to manage and read your books, which is fantastic.
1. Download the calibre-web image. In the Synology Docker package, search for calibre and download the second result. We pick that one because it supports book format conversion.
Then wait for the image to finish downloading.
2. Configure the calibre-web launch parameters. On the Synology, create two folders, config and books, and map them to the /calibre-web/config and /books paths of the Docker image respectively (check the image description if you are unsure; it is straightforward). Give both folders generous permissions so that any user can access them. If that is awkward to do from DSM, connect to the Synology over SSH and use chmod -R 777, as sketched below.
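A minimal sketch of that permission fix over SSH. The /volume1/e-book/library path is an assumption based on the library location mentioned later; adjust it to your own share layout:

# on the NAS, over SSH: create the folders and open up their permissions
sudo mkdir -p /volume1/e-book/library/config /volume1/e-book/library/books
sudo chmod -R 777 /volume1/e-book/library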
Map the ports:
In the launch environment variables, add two entries, PUID and PGID. This is a little hard to explain precisely; just fill them in as shown in the screenshot. They make the container run with the permissions of the admin user you log in with. The whole launch configuration is sketched as a docker run command below.
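For reference, a hedged docker run equivalent of the UI settings above. The PUID/PGID values, host paths, image name, and the container-side web port are all assumptions; take the authoritative values from the screenshots and the image description:

# assumed values throughout: PUID/PGID of the DSM admin user, host paths
# under /volume1/e-book/library, and a web UI listening on 8083 inside the container
docker run -d --name calibre-web \
  -e PUID=1026 \
  -e PGID=100 \
  -p 8089:8083 \
  -v /volume1/e-book/library/config:/calibre-web/config \
  -v /volume1/e-book/library/books:/books \
  YOUR_CALIBRE_WEB_IMAGE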
Copy an initial metadata.db file into the config folder created in the step above; in my case that is e-book/library/config.
If you don't know where to get a metadata.db, install calibre on a Windows 11 machine; during installation it prompts you to choose a library directory, and metadata.db is created right under that directory. One way to copy it over is sketched below.
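A hedged one-liner for getting the file onto the NAS, assuming the Windows machine has the OpenSSH client and SSH is enabled on the Synology (username, address, and paths are placeholders):

scp "C:\Users\<you>\Calibre Library\metadata.db" admin@<nas-ip>:/volume1/e-book/library/config/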
Once installed, just start it up. The default username and password are admin/admin123.
Database configuration: if you followed my steps and copied metadata.db into the config directory, simply enter /calibre-web/config here; a quick check is shown below.
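A quick sanity check from the container's terminal, using the path implied by the volume mapping above:

ls /calibre-web/config/metadata.db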
Next, let's do some light calibre-web configuration. Click the settings button in the top-right corner, open the feature configuration page, and tick the enable-uploads checkbox.
In the Synology Docker package, open the calibre container's terminal, locate the scholar.py file, and add the Douban provider's .py file in the same directory.
Run the following commands:
# go back up one directory
cd ../
# locate the scholar.py file
find -name scholar.py
# change into the directory containing scholar.py
cd ./app/cps/metadata_provider/
# create a new Newdouban.py file
vi Newdouban.py
The content of Newdouban.py is as follows (the file is copied over from the original calibre-web image's repository):
import re
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse, unquote
from lxml import etree
from functools import lru_cache
from cps.services.Metadata import Metadata

DOUBAN_SEARCH_JSON_URL = "https://www.douban.com/j/search"
DOUBAN_BOOK_CAT = "1001"
DOUBAN_BOOK_CACHE_SIZE = 500  # maximum number of cached books
DOUBAN_CONCURRENCY_SIZE = 5  # number of concurrent detail queries
DEFAULT_HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3573.0 Safari/537.36'
}
PROVIDER_NAME = "New Douban Books"
PROVIDER_ID = "new_douban"


class NewDouban(Metadata):
    __name__ = PROVIDER_NAME
    __id__ = PROVIDER_ID

    def __init__(self):
        self.searcher = DoubanBookSearcher()
        super().__init__()

    def search(self, query, generic_cover=""):
        if self.active:
            return self.searcher.search_books(query)


class DoubanBookSearcher:

    def __init__(self):
        self.book_loader = DoubanBookLoader()
        self.thread_pool = ThreadPoolExecutor(max_workers=10, thread_name_prefix='douban_async')

    def calc_url(self, href):
        # search results link through a redirect; the real book URL sits in the `url` query parameter
        query = urlparse(href).query
        params = {item.split('=')[0]: item.split('=')[1] for item in query.split('&')}
        url = unquote(params['url'])
        return url

    def load_book_urls(self, query):
        url = DOUBAN_SEARCH_JSON_URL
        params = {"start": 0, "cat": DOUBAN_BOOK_CAT, "q": query}
        res = requests.get(url, params, headers=DEFAULT_HEADERS)
        book_urls = []
        if res.status_code in [200, 201]:
            book_list_content = res.json()
            for item in book_list_content['items'][0:DOUBAN_CONCURRENCY_SIZE]:  # keep only the first few results, 5 by default
                html = etree.HTML(item)
                a = html.xpath('//a[@class="nbg"]')
                if len(a):
                    href = a[0].attrib['href']
                    parsed = self.calc_url(href)
                    book_urls.append(parsed)
        return book_urls

    def search_books(self, query):
        book_urls = self.load_book_urls(query)
        books = []
        futures = [self.thread_pool.submit(self.book_loader.load_book, book_url) for book_url in book_urls]
        for future in as_completed(futures):
            book = future.result()
            if book is not None:
                books.append(book)
        return books


class DoubanBookLoader:

    def __init__(self):
        self.book_parser = DoubanBookHtmlParser()

    @lru_cache(maxsize=DOUBAN_BOOK_CACHE_SIZE)
    def load_book(self, url):
        book = None
        start_time = time.time()
        res = requests.get(url, headers=DEFAULT_HEADERS)
        if res.status_code in [200, 201]:
            print("downloaded book {} in {:.0f}ms".format(url, (time.time() - start_time) * 1000))
            book_detail_content = res.content
            book = self.book_parser.parse_book(url, book_detail_content.decode("utf8"))
        return book


class DoubanBookHtmlParser:

    def __init__(self):
        self.id_pattern = re.compile(r".*/subject/(\d+)/?")

    def parse_book(self, url, book_content):
        book = {}
        html = etree.HTML(book_content)
        title_element = html.xpath("//span[@property='v:itemreviewed']")
        book['title'] = self.get_text(title_element)
        share_element = html.xpath("//a[@data-url]")
        if len(share_element):
            url = share_element[0].attrib['data-url']
        book['url'] = url
        id_match = self.id_pattern.match(url)
        if id_match:
            book['id'] = id_match.group(1)
        img_element = html.xpath("//a[@class='nbg']")
        if len(img_element):
            cover = img_element[0].attrib['href']
            if not cover or cover.endswith('update_image'):
                book['cover'] = ''
            else:
                book['cover'] = cover
        rating_element = html.xpath("//strong[@property='v:average']")
        book['rating'] = self.get_rating(rating_element)
        elements = html.xpath("//span[@class='pl']")
        book['authors'] = []
        book['publisher'] = ''
        for element in elements:
            text = self.get_text(element)
            if text.startswith("作者"):  # author
                book['authors'].extend([self.get_text(author_element) for author_element in element.findall("..//a")])
            elif text.startswith("译者"):  # translator
                book['authors'].extend([self.get_text(author_element) for author_element in element.findall("..//a")])
            elif text.startswith("出版社"):  # publisher
                book['publisher'] = self.get_tail(element)
            elif text.startswith("出版年"):  # publication year
                book['publishedDate'] = self.get_tail(element)
            elif text.startswith("丛书"):  # series
                book['series'] = self.get_text(element.getnext())
        summary_element = html.xpath("//div[@id='link-report']//div[@class='intro']")
        book['description'] = ''
        if len(summary_element):
            book['description'] = etree.tostring(summary_element[-1], encoding="utf8").decode("utf8").strip()
        tag_elements = html.xpath("//a[contains(@class, 'tag')]")
        if len(tag_elements):
            book['tags'] = [tag_element.text.strip() for tag_element in tag_elements]
        book['source'] = {
            "id": PROVIDER_ID,
            "description": PROVIDER_NAME,
            "link": "https://book.douban.com/"
        }
        return book

    def get_rating(self, rating_element):
        return float(self.get_text(rating_element, '0')) / 2

    def get_text(self, element, default_str=''):
        text = default_str
        if len(element) and element[0].text:
            text = element[0].text.strip()
        elif isinstance(element, etree._Element) and element.text:
            text = element.text.strip()
        return text if text else default_str

    def get_tail(self, element, default_str=''):
        text = default_str
        if isinstance(element, etree._Element) and element.tail:
            text = element.tail.strip()
        return text if text else default_str
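If you want to verify the scraping logic without touching the container, here is a minimal standalone sketch of the same search flow, using only requests and lxml (it assumes the Douban search endpoint still behaves as the provider above expects and that your IP isn't rate limited):

import requests
from lxml import etree
from urllib.parse import urlparse, parse_qs

DOUBAN_SEARCH_JSON_URL = "https://www.douban.com/j/search"
HEADERS = {"user-agent": "Mozilla/5.0"}

def search_book_urls(query, limit=5):
    # same JSON endpoint and book category ("1001") the provider queries
    res = requests.get(DOUBAN_SEARCH_JSON_URL,
                       params={"start": 0, "cat": "1001", "q": query},
                       headers=HEADERS)
    urls = []
    for item in res.json()["items"][:limit]:
        # each item is an HTML fragment; the real book link is wrapped
        # inside the `url` query parameter of a redirect href
        anchors = etree.HTML(item).xpath('//a[@class="nbg"]')
        if anchors:
            qs = parse_qs(urlparse(anchors[0].attrib["href"]).query)
            if "url" in qs:
                urls.append(qs["url"][0])
    return urls

if __name__ == "__main__":
    print(search_book_urls("三体"))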
The old file should be douban.py, but it seems somewhat broken these days, presumably because it relies on the retired Douban v2 API reached through a self-hosted proxy (the doubanUrl below):
import requests
from cps.services.Metadata import Metadata


class Douban(Metadata):
    __name__ = "Douban Books"
    __id__ = "douban"
    # points at a self-hosted Douban v2 API proxy (the public API is no longer open)
    doubanUrl = "http://YOUR_NAS_IP:8085"
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3573.0 Safari/537.36'
    }

    def search(self, query, generic_cover=""):
        if self.active:
            val = list()
            result = requests.get(self.doubanUrl + "/v2/book/search?q=" + query.replace(" ", "+"),
                                  headers=self.headers)
            for r in result.json()['books']:
                v = dict()
                v['id'] = r['id']
                v['title'] = r['title']
                v['authors'] = r.get('authors', [])
                v['description'] = r.get('summary', "")
                v['publisher'] = r.get('publisher', "")
                v['publishedDate'] = r.get('pubdate', "")
                v['tags'] = [tag.get('name', '') for tag in r.get('tags', [])]
                rating = r['rating'].get('average', '0')
                if not rating:
                    rating = '0'
                v['rating'] = float(rating) / 2
                if r.get('image'):
                    v['cover'] = r.get('image')
                else:
                    v['cover'] = generic_cover
                v['source'] = {
                    "id": self.__id__,
                    "description": self.__name__,
                    "link": "https://book.douban.com/"
                }
                v['url'] = "https://book.douban.com/subject/" + r['id']
                val.append(v)
            return val
Then press Esc, type :wq to save and quit, and restart the container; a shell alternative is sketched below.
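If you prefer doing the restart from the NAS shell, a sketch (assuming the container is named calibre-web):

docker restart calibre-web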
At this point you can see the Douban scraper we just added. Since I took the screenshot after everything was already set up, the picture shows book covers:
The name can be anything; choose TCP as the protocol. The external port is the one used for access from outside, the internal IP address is the Synology's LAN IP, and the internal port is the port the container exposes on the host. For convenience we set them all uniformly to 8089.
And with that, you can happily access it from the internet.