NLG for Fun - A Quick Automated Headline Generator Example in Python

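Module: markovify
The Python module we use here is markovify.
Description of markovify:
markovify is a simple, extensible Markov chain generator. Right now, its main use is building Markov models of large text corpora and generating random sentences from them. In theory, however, it could be used for other applications.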
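About the dataset:
The dataset can be downloaded from Kaggle Datasets. The file used below is abcnews-date-text.csv.
Load the required packages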
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import markovify  # Markov chain generator
# Any results you write to the current directory are saved as output.
Read the input text file
inp = pd.read_csv('../input/abcnews-date-text.csv')
inp.head(3)
   publish_date  headline_text
0  20030219      aba decides against community broadcasting lic…
1  20030219      act fire witnesses must be aware of defamation
2  20030219      a g calls for infrastructure protection summit
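Build a text model with a Markov chain
# NewlineText treats each line (here, each headline) as one sentence;
# state_size=2 means the next word is chosen from the previous two words.
text_model = markovify.NewlineText(inp.headline_text, state_size=2)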
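To make the role of state_size more concrete, here is a minimal pure-Python sketch of a word-level Markov chain in which each state is the tuple of the previous two words. This is not markovify's implementation: the names build_chain and generate, the BEGIN/END markers, and the three sample headlines are assumptions made up for illustration.

```python
import random
from collections import defaultdict

# Illustrative start/end markers (hypothetical, chosen for this sketch).
BEGIN, END = "__BEGIN__", "__END__"

def build_chain(lines, state_size=2):
    """Map each tuple of `state_size` words to the words seen right after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        if not words:
            continue
        padded = [BEGIN] * state_size + words + [END]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i:i + state_size])
            chain[state].append(padded[i + state_size])
    return chain

def generate(chain, state_size=2, rng=random):
    """Walk the chain from the start state until the end marker is drawn."""
    state = (BEGIN,) * state_size
    out = []
    while True:
        word = rng.choice(chain[state])
        if word == END:
            return " ".join(out)
        out.append(word)
        state = state[1:] + (word,)

# Made-up sample headlines, not taken from the dataset.
headlines = [
    "council backs new water plan",
    "minister rejects new water tax",
    "council backs new hospital funding",
]
chain = build_chain(headlines)
print(generate(chain, rng=random.Random(0)))
```

Because the state ("new", "water") was followed by both "plan" and "tax" in the samples, the walk can produce a headline such as "council backs new water tax" that appears in none of the inputs. A larger state_size makes output more coherent but closer to the source text.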
Automatically generated headlines
# Print five randomly generated sentences
for i in range(5):
    print(text_model.make_sentence())
iron magnate poised to storm cleanup
meet the png government defends stockdale appointment
the twitter exec charged with animal cruelty trial
pm denies role in pregnancy
shoalhaven business boosts hunter
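make_sentence() returns None when markovify cannot build a sentence that differs enough from the source text within its allotted tries, so the loop above can occasionally print None. A small sketch of a wrapper that skips such results follows. collect_sentences and max_attempts are hypothetical names, and the stand-in generator exists only so the example runs without the dataset.

```python
def collect_sentences(make_sentence, n, max_attempts=100):
    """Call `make_sentence` until `n` non-None results are collected.

    Gives up after `max_attempts` calls, so it may return fewer than `n`.
    """
    results = []
    for _ in range(max_attempts):
        if len(results) == n:
            break
        sentence = make_sentence()
        if sentence is not None:
            results.append(sentence)
    return results

# Stand-in for text_model.make_sentence: returns None on every odd call.
_calls = {"count": 0}

def fake_make_sentence():
    _calls["count"] += 1
    if _calls["count"] % 2:
        return None
    return f"headline {_calls['count']}"

print(collect_sentences(fake_make_sentence, 3))
# prints ['headline 2', 'headline 4', 'headline 6']
```

With the model built earlier, the equivalent call would be collect_sentences(text_model.make_sentence, 5).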