Tools for data visualization

A picture is worth a thousand words, but creating cool infographics can be time-consuming. So we've found 20 amazing tools to make it easier.

It’s often said that data is the new world currency, and the web is the exchange bureau through which it’s traded. As consumers, we’re positively swimming in data; it’s everywhere from food labels to World Health Organization reports. As a result, it’s becoming increasingly difficult for designers to present data in a way that stands out from the mass of competing data streams.

One of the best ways to get your message across is to use a visualisation that quickly draws attention to the key points. Presenting data visually can also uncover surprising patterns and observations that wouldn’t be apparent from the statistics alone. As author, data journalist and information designer David McCandless said in his TED talk: "By visualising information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful."

There are many different ways of telling a story, but everything starts with an idea. So to help you get started we’ve rounded up 20 of the most awesome data visualisation tools available on the web.

Google Search Tips

A list of Google search tips

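  • link:URL = lists the pages that link to the target URL.
  • related:URL = lists pages related to the target URL.
  • site:domain.com = restricts the search to the target site.
  • allinurl:WORDS = shows only pages whose URL contains the search terms.
  • inurl:WORD = similar to allinurl, but only the first word is matched against the URL.
  • allintitle:WORD = searches page titles.
  • intitle:WORD = similar to allintitle, but only the first word is matched against the title.
  • cache:URL = shows Google's cached copy of the URL (not available in China).
  • info:URL = shows a page containing these elements: links to similar results, backlinks, and pages that contain this URL. Typing the URL directly into the search box has the same effect.
  • filetype:SOMEFILETYPE = restricts results to the given file type.
  • -filetype:SOMEFILETYPE = excludes the given file type.
  • site:www.somesite.net “+www.somesite.net” = shows how many of the site's pages Google has indexed.
  • allintext: = searches body text, excluding page titles and links.
  • allinlinks: = searches links, excluding body text and titles.
  • WordA OR WordB = finds pages containing either of the two keywords.
  • “Word” OR “Phrase” = requires an exact match of the word or phrase.
  • WordA -WordB = finds pages that contain word A but not word B.
  • WordA +WordB = finds pages that contain both words.
  • ~WORD = finds the word and its synonyms.
  • ~WORD -WORD = searches for synonyms only, excluding the original word.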
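
The operators above are plain text inside the query, so they can be combined programmatically. The following is a minimal sketch in Python rather than part of the original list: the helper names build_query and search_url are hypothetical, and the https://www.google.com/search?q=... address is an assumption about the usual search URL. It only covers a few of the operators (site:, filetype:, intitle: and the minus sign).

```python
from urllib.parse import urlencode


def build_query(*terms, site=None, filetype=None, intitle=None, exclude=()):
    """Compose a search string from plain terms and a few operators.

    Hypothetical helper for illustration; it simply joins text fragments.
    """
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    # A leading minus excludes a word, as in "WordA -WordB".
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def search_url(query):
    """URL-encode the query (assumed endpoint: the standard search page)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("data visualization", site="domain.com",
                    filetype="pdf", exclude=["template"])
print(query)       # data visualization site:domain.com filetype:pdf -template
print(search_url(query))
```

Keeping the query builder separate from the URL encoding means the same string can be pasted straight into the search box.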

A simple guide to getting started with data science

There are many articles on this subject from renowned data scientists (Dataspora, Gigaom, Quora, Hilary Mason). This post captures my own journey, as a software engineer, in learning statistics and data visualization.

I'm midway through a five-year journey to become proficient in data science. My learning program has included self-study (books, blogs, toy problems), projects at work, classroom training (Stanford), teaching and presentations, and conferences (UseR, Strata). Here's what I've done so far, what worked and what didn't...

R Packages

Text mining

Rwordseg

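A Chinese word segmentation tool for R that uses rJava to call the Java segmenter Ansj. Ansj is itself an open-source Java tool for Chinese word segmentation, based on the ICTCLAS segmentation algorithm from the Chinese Academy of Sciences and built on a Hidden Markov Model (HMM). Its author, Sun Jian, rewrote the algorithm as a Java version and released all of it as open source, so Ansj can be used for person-name recognition, place-name recognition, organization-name recognition, multi-level part-of-speech tagging, keyword extraction, fingerprint extraction and similar tasks. It supports domain dictionaries and user-defined dictionaries. For details, see the interview with the author Sun Jian and the project's GitHub page.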
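
To give a feel for what HMM-based segmentation means, here is a toy sketch in Python. It is not Ansj's or ICTCLAS's actual implementation, and it does not use the Rwordseg API. It assumes the common character-tagging formulation, in which each character is labelled B, M, E or S (begin, middle, end of a word, or single-character word) and the best label sequence is found with the Viterbi algorithm. All probabilities are hand-set illustrative weights, not trained values, and the names viterbi and segment are hypothetical.

```python
import math

STATES = "BMES"  # Begin / Middle / End of a word, or Single-character word
NEG_INF = float("-inf")

# Hand-set toy weights (assumptions for illustration, not trained values).
START = {"B": 0.6, "S": 0.4}            # a sentence cannot start with M or E
TRANS = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT = {
    "B": {"数": 0.5, "分": 0.3},
    "M": {},
    "E": {"据": 0.5, "析": 0.3},
    "S": {"的": 0.6},
}
FLOOR = 1e-6  # emission weight for characters not listed above


def log_prob(p):
    """Log of a probability, with impossible events mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(text):
    """Return the most likely B/M/E/S tag string for text."""
    if not text:
        return ""
    # best[s] = (log score, tag path) of the best path ending in state s
    best = {
        s: (log_prob(START.get(s, 0.0)) + math.log(EMIT[s].get(text[0], FLOOR)), s)
        for s in STATES
    }
    for ch in text[1:]:
        step = {}
        for s in STATES:
            score, path = max(
                (best[p][0] + log_prob(TRANS[p].get(s, 0.0)), best[p][1])
                for p in STATES
            )
            step[s] = (score + math.log(EMIT[s].get(ch, FLOOR)), path + s)
        best = step
    # The text must end on a complete word, so the last tag is E or S.
    return max(best[s] for s in "ES")[1]


def segment(text):
    """Cut text into words wherever the tag sequence closes a word."""
    words, current = [], ""
    for ch, tag in zip(text, viterbi(text)):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    return words


print(segment("数据的分析"))  # ['数据', '的', '分析']
```

A real segmenter would estimate these tables from a tagged corpus and combine them with dictionaries, which is where the domain and user-defined dictionaries mentioned above come in.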

rmmseg4j (not recommended; use Rwordseg instead)