How do I convert text into machine-readable form? - Programming board - MITBBS archive

F****3 (posts: 1504) #1 What keywords should I search for in CS if I want to turn the information in passages like the one below into a machine-readable relational database? I know this may be a deep question that requires reading books, but I don't know which books are the right ones. What is this usually called in CS jargon?
I want to extract the following:
1. Company name
2. The period the announcement covers
3. Financial measure: earnings or revenue?
4. Value of the financial measure, such as $45,360,000
Example: Veris Gold Corp. announced unaudited consolidated earnings and operating results for the first quarter ended March 31, 2013. For the quarter, the company reported revenue of $45,360,000 against $20,889,000 a year ago.
1. Veris Gold Corp.
2. the first quarter ended March 31, 2013
3. revenue
4. $45,360,000
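As a concrete picture of the target the question describes, here is a minimal sketch of a relational table holding the four requested fields, using Python's standard `sqlite3` module. The table name `announcements`, its column names, the helper `build_db`, and the choice to store the amount as a whole number of dollars are all hypothetical choices for illustration; the single row is the Veris Gold example typed in by hand, not extracted automatically.

```python
import sqlite3

# Hypothetical schema: one row per financial figure mentioned in an announcement.
SCHEMA = """
CREATE TABLE announcements (
    company  TEXT NOT NULL,     -- 1. company name
    period   TEXT NOT NULL,     -- 2. reporting period, kept as the original phrase
    measure  TEXT NOT NULL,     -- 3. 'revenue' or 'earnings'
    amount   INTEGER NOT NULL   -- 4. value in US dollars
)
"""


def build_db():
    """Create an in-memory database and insert the hand-labelled example row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO announcements (company, period, measure, amount) VALUES (?, ?, ?, ?)",
        ("Veris Gold Corp.", "the first quarter ended March 31, 2013", "revenue", 45360000),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_db()
    for row in conn.execute("SELECT company, period, measure, amount FROM announcements"):
        print(row)
```

Every further announcement would become another row, so once the fields can be pulled out of the text automatically, ordinary SQL queries work on them.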
E*****m (posts: 25615) #2 Information Retrieval. If you're using Python, you can read this book (Natural Language Processing with Python): http://www.amazon.com/Natural-Language-Processing-Python-Steven Ch. 7
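F****3 (posts: 1504) #3 Thanks! I've downloaded and installed it, and borrowed the book too... I also downloaded the Stanford CoreNLP package from the Stanford NLP site, but it's written in Java: http://www-nlp.stanford.edu/software/corenlp.shtml Can I call this package from Python just by using that Python wrapper? Sorry, my terminology isn't very professional.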
E*****m (posts: 25615) #4 You can use the Stanford parser, but you don't have to. NLTK has its own traditional grammars; they just aren't as powerful as a statistical parser like Stanford's. [In reply to F****3]
F****3 (posts: 1504) #5 OK, I'll get a good grasp of NLTK first... Thanks! [In reply to E*****m]
l*******s 发帖数: 1258	6 这个，叫unstructured data to structured data 如果要搞好，基本上NLP是唯一解决方案。不知道你要达到什么样的精确度。有一些现成的包，比如opennlp之类的，但是没法抽取出你要求的所有内容，或许只能搞定公司名而已。因为那些都是基于machine learning的用wall street journey语料训练的除非你自己标注一堆data然后重新训练模型，不过听你的意思这方面不擅长，还是工作量很大的。要是能凑活的话，不妨试试写一堆regex，搞rule based，或许能对付一阵子，就看你们需求如何了。另外，考虑下一些网上的API，比如Alchemy API等
F****3 (posts: 1504) #7 Thank you so much for telling me all this! Very helpful! If I want to use the Stanford NLP package, can I do it with only Python? I'm a humanities student and don't know Java at all... What I want to do is convert text like the following into machine-readable form: Transat A.T. Inc. announced the signing of an agreement with International Lease Finance Corporation for the long-term (eight-year) leasing of four Boeing B737-800 aircraft. These planes will be introduced in summer 2014 and become the core of Air Transat's permanent narrow-body fleet. They will be used on sun-destination routes to Mexico, the Caribbean and Florida. The agreement also includes the... [In reply to l*******s]
l***i (posts: 16) #8 Try information extraction.
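Information extraction, as laid out in chapter 7 of the NLTK book recommended earlier, is usually a pipeline: split sentences, tokenize, tag parts of speech, detect named entities, then detect relations between pairs of entities. The sketch below covers only that last step in plain Python: it looks for pairs of entities whose intervening words match a pattern, applied to the first sentence of the Transat A.T. Inc. passage quoted above. It assumes the entities were already found by some named-entity recognizer; here they are hard-coded, and `ENTITIES`, `AGREEMENT_FILLER` and `extract_rels` are hypothetical names for illustration, not NLTK's API.

```python
import re

# Hypothetical entity list: in a real pipeline these (text, type) pairs would come
# from a named-entity recognizer; here they are hard-coded for illustration.
ENTITIES = [
    ("Transat A.T. Inc.", "ORG"),
    ("International Lease Finance Corporation", "ORG"),
]

# Assumed pattern for the words between two organisations that signals an agreement.
AGREEMENT_FILLER = re.compile(r"\bagreement with\b")


def extract_rels(sentence, entities, filler_re, subj_type="ORG", obj_type="ORG"):
    """Return (subject, filler, object) triples where filler_re matches the text between two entities."""
    # Locate the first occurrence of each known entity in the sentence.
    spans = []
    for text, etype in entities:
        start = sentence.find(text)
        if start != -1:
            spans.append((start, start + len(text), text, etype))
    spans.sort()  # order entities by position in the sentence

    triples = []
    # Check every ordered pair (earlier entity, later entity).
    for i, (_, end_a, text_a, type_a) in enumerate(spans):
        for start_b, _, text_b, type_b in spans[i + 1:]:
            filler = sentence[end_a:start_b].strip()
            if type_a == subj_type and type_b == obj_type and filler_re.search(filler):
                triples.append((text_a, filler, text_b))
    return triples


if __name__ == "__main__":
    sentence = ("Transat A.T. Inc. announced the signing of an agreement with International Lease "
                "Finance Corporation for the long-term (eight-year) leasing of four Boeing B737-800 aircraft.")
    for triple in extract_rels(sentence, ENTITIES, AGREEMENT_FILLER):
        print(triple)
```

Each triple could then be stored as a row (subject, relation, object) in a table like the one sketched for the Veris Gold example; the hard part in practice is the entity detection this sketch skips.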
l*******s 发帖数: 1258	9 那就有点麻烦，很多NLP的包都是java的那就试试nltk吧，是python的。不过话又说回来了，为什么你非要搞这个东西呢？是不是老板让干的？跟老板谈谈，就说自己是文科生，确实不擅长这个，能不能找个人合作？然后自己负责其他部分，毕竟完整任务是最终目的，没必要所有事情都自己干，尤其是自己不擅长的。文科生，就应该有点文科生解决问题的思路吗。哈哈虽然我也是文科生。。。 --Dishes Map，基于餐馆Review的美食发现引擎 https://play.google.com/store/apps/details?id=dishesmap.mobile and be 【在 F****3 的大作中提到】 : 太谢谢你告诉我这些！非常有用！ : 请问要用StanfordNLP那个包，我只会用python可不可以。我是文科生，java根本不会 : 。。。 : 想做的事情就是把类似下面这样的东西转换成机器可读形式： : Transat A.T. Inc. announced the signing of an agreement with International : Lease Finance Corporation for the long-term (eight-year) leasing of four : Boeing B737-800 aircraft. These planes will be introduced in summer 2014 and : become the core of Air Transat's permanent narrow-body fleet. They will be : used on sun-destination routes to Mexico, the Caribbean and Florida. The : agreement also includes the... More >>
