



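At the invitation of @Tetrad因果推断, this article explains in detail how to use the GUI version of Tetrad to load a dataset, run a causal analysis, and generate the final causal graph. Part 1 presents the problem statement and the software packages on offer; we chose the first one listed, Tetrad. Part 2 describes the dataset we chose, the key steps, and the resulting causal graphs. Part 3 walks through each Tetrad operation needed to obtain the results. The first two parts are excerpts from my report, which was written in English; the remaining part was written separately for this article.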


Apply one causal discovery algorithm on a real world problem. You need to specify the details of the problem, collect the data by yourself or from a public website, briefly summarize what algorithm you use, and explain the results. You may use any causal discovery algorithm described in the following paper [Spirtes et al., 2016], and use the software packages in Page 26 of the paper.

  • Peter Spirtes and Kun Zhang. Causal discovery and inference: concepts and recent methodological advances. Applied Informatics, 3:3, 2016. https://applied-informatics-j.springeropen.com/track/pdf/10.1186/s40535-016-0018-x

Page 26: The following software packages are available online:

  • The Tetrad project webpage (Tetrad implements a large number of causal discovery methods, including PC and its variants, FCI, and LiNGAM): http://www.phil.cmu.edu/tetrad/.

  • Kernel-based conditional independence test, Zhang et al. (2011): http://people.tuebingen.mpg.de/kzhang/KCI-test.zip.

  • LiNGAM and its extensions, Shimizu et al. (2006, 2011): https://sites.google.com/site/sshimizu06/lingam.

  • Fitting the nonlinear additive noise model, Hoyer et al. (2009): http://webdav.tuebingen.mpg.de/causality/additive-noise.tar.gz

  • Distinguishing cause from effect based on the PNL causal model, Zhang and Hyvärinen (2009, 2010): http://webdav.tuebingen.mpg.de/causality/CauseOrEffect_NICA.rar

  • Probabilistic latent variable models for distinguishing between cause and effect, Mooij et al. (2010): http://webdav.tuebingen.mpg.de/causality/nips2010-gpi-code.tar.gz

  • Information-geometric causal inference, Daniusis et al. (2010); Janzing et al. (2012): http://webdav.tuebingen.mpg.de/causality/igci.tar.gz


According to figures released in 2020 by the World Health Organization (WHO), the world’s biggest killer is ischaemic heart disease, responsible for 16% of the world’s total deaths. This disease has also seen the largest increase in deaths since 2000, rising by more than 2 million to 8.9 million deaths in 2019.

Medical researchers have published numerous articles on factors associated with heart disease. In recent years, with the development of machine learning, studies have emerged that use machine learning methods to predict heart disease. Since heart disease may be causally related to many factors, causal discovery algorithms are well suited to analyzing them. The dataset used in this paper comes from https://archive.ics.uci.edu/dataset/45/heart+disease; it contains a total of 303 instances and involves 13 valid features. The individual features are listed in Table 2.
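A minimal preprocessing sketch follows, assuming the raw data is a header-less, comma-separated file in which missing values are written as `?` (the layout of the `processed.cleveland.data` file in the UCI archive). The column names follow the UCI documentation, with `num` as the diagnosis column next to the 13 features. The function names are hypothetical, and dropping incomplete rows is just one possible choice, not necessarily what the report did.

```python
import csv
import io

# Column names as given in the UCI documentation: 13 features plus the label "num".
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def clean_rows(raw_text):
    """Parse header-less CSV text and drop rows that are incomplete or contain '?'."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        fields = [f.strip() for f in fields]
        if len(fields) != len(COLUMNS) or "?" in fields:
            continue  # skip blank, malformed, or incomplete rows
        rows.append(fields)
    return rows


def to_csv_text(rows):
    """Serialize the cleaned rows with a header line, ready to save as a .csv file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Illustrative rows in the assumed format, not real records.
    sample = ("60,1,1,140,230,1,2,150,0,2.0,3,0,6,0\n"
              "55,0,3,130,200,0,0,160,0,1.0,2,?,3,1\n")
    print(to_csv_text(clean_rows(sample)))
```

Writing the returned text to a file gives a `.csv` with a header row, so that variable names are available when the file is loaded.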

In this experiment, we used the Tetrad platform to run causal discovery algorithms and generate causal graphs. Tetrad is a software platform for causal discovery and statistical analysis that provides a range of algorithms and tools to help researchers identify causal relationships between variables. We employed three algorithms in total: the PC, FCI, and FAS algorithms.
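To give a feel for what these searches do, here is a simplified sketch of a PC-style adjacency (skeleton) search, the step that FAS performs and that PC builds on. It is not Tetrad's implementation: the independence test is passed in as a hypothetical `is_independent(x, y, cond)` function, and the example uses a hand-written oracle for a toy chain A → B → C instead of a statistical test on data.

```python
from itertools import combinations


def adjacency_search(variables, is_independent, max_depth=None):
    """Start from a complete undirected graph and remove the edge x - y whenever
    x and y test as independent given some subset of x's other neighbours."""
    adj = {v: set(variables) - {v} for v in variables}
    sepsets = {}
    limit = len(variables) - 2 if max_depth is None else max_depth
    for depth in range(limit + 1):  # size of the conditioning set
        for x in variables:
            for y in sorted(adj[x]):  # sorted() copies, so adj[x] can shrink safely
                others = sorted(adj[x] - {y})
                if len(others) < depth:
                    continue
                for cond in combinations(others, depth):
                    if is_independent(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepsets[frozenset((x, y))] = set(cond)
                        break
    edges = {frozenset((x, y)) for x in variables for y in adj[x]}
    return edges, sepsets


def chain_oracle(x, y, cond):
    """Oracle for the toy chain A -> B -> C: only A and C are independent, given B."""
    return {x, y} == {"A", "C"} and "B" in cond


if __name__ == "__main__":
    edges, sepsets = adjacency_search(["A", "B", "C"], chain_oracle)
    print(sorted(sorted(e) for e in edges))  # [['A', 'B'], ['B', 'C']]
```

PC would go on to orient edges using the recorded separating sets, and FCI additionally allows for latent confounders; both steps are omitted here.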

First, we need to construct a network within Tetrad consisting of data blocks, knowledge blocks, and search blocks. The network structure is depicted in Figure 5. The data block imports the heart disease dataset. The knowledge block adds prior knowledge: it defines the order of causal relationships by assigning variables to tiers, as shown in Figure 6. Finally, the search block applies the different algorithms to obtain a graph representing the causal relationships.
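The effect of such a tiered ordering can be sketched as follows, assuming the usual reading that a variable in a later tier may not cause a variable in an earlier tier. The tier assignment below is hypothetical and is not the one shown in Figure 6.

```python
def forbidden_by_tiers(tiers):
    """Return the directed edges (cause, effect) ruled out by a tier ordering:
    an edge is forbidden when the cause sits in a later tier than the effect."""
    tier_of = {var: i for i, tier in enumerate(tiers) for var in tier}
    return {(a, b) for a in tier_of for b in tier_of if tier_of[a] > tier_of[b]}


# Hypothetical tiers: demographics first, measurements next, diagnosis last.
TIERS = [["age", "sex"], ["chol", "trestbps", "fbs"], ["num"]]

if __name__ == "__main__":
    forbidden = forbidden_by_tiers(TIERS)
    print(("num", "age") in forbidden)  # True: the diagnosis cannot cause age
    print(("age", "num") in forbidden)  # False: age may cause the diagnosis
```

Edges within a tier are left unconstrained in this sketch.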

The resulting graphs obtained from the search are shown in Figure 7, and the outcomes from the three algorithms are similar. From the results, it is evident that only fbs (fasting blood sugar) and restecg (resting electrocardiographic results) do not correlate with other features, indicating no apparent causal relationship. This suggests that factors such as fasting blood sugar levels do not have a significant causal relationship with heart disease. Additionally, ca (number of major vessels) and thal (type of thalassemia) are directly related to the severity of the disease, indicating that the number of major vessels and other physical signs are highly correlated with heart disease. Furthermore, basic characteristics such as age and gender have a close causal relationship with a large number of other features, which is a result that aligns with our intuition.



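Download Tetrad. Many tutorials online cover this, so only the main steps are listed here. The official site is hosted by CMU at https://www.cmu.edu/dietrich/philosophy/tetrad/. Once there, choose Use Tetrad and then the GUI version, as shown in the figure below:

Choose the GUI version of Tetrad, the leftmost option

Choose Get The Latest Executable, then pick the launch version on the download page: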



In Windows PowerShell, change to the directory where the .jar file is saved and launch the .jar file from there.
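The exact command is not preserved in this article. As a sketch under the assumption that the download is an executable jar started with the standard `java -jar` invocation, the helper below (a hypothetical name) builds that command for the first `.jar` file it finds among a list of file names. It only prints the command and does not start anything.

```python
import os


def build_launch_command(filenames):
    """Return ['java', '-jar', <name>] for the first .jar in filenames, or None."""
    jars = sorted(name for name in filenames if name.lower().endswith(".jar"))
    if not jars:
        return None
    return ["java", "-jar", jars[0]]


if __name__ == "__main__":
    # "." stands in for the directory where the download was saved.
    cmd = build_launch_command(os.listdir("."))
    print("No .jar file found here." if cmd is None else " ".join(cmd))
```

Typing the printed command in the shell, or passing the list to `subprocess.run`, would start the program, provided Java is installed and on the PATH.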





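Click File in the upper-left corner and choose Load Data to load the .csv file that was just preprocessed.

The loading dialog is shown below. Choose the loading options that match your dataset; here, for example, my dataset requires changing Data type from Continuous to Discrete.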
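One rough way to decide between Continuous and Discrete before loading is to count the distinct values in each column. The sketch below does this for CSV text with a header row; the function name and the `max_levels` cut-off are assumptions chosen for illustration, not a Tetrad rule.

```python
import csv
import io


def suggest_data_types(csv_text, max_levels=10):
    """Label each column 'discrete' if it has at most max_levels distinct values,
    otherwise 'continuous'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    levels = {name: set() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in levels:
            levels[name].add(row[name])
    return {name: ("discrete" if len(values) <= max_levels else "continuous")
            for name, values in levels.items()}


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    sample = "sex,chol\n1,230\n0,250\n1,204\n"
    print(suggest_data_types(sample, max_levels=2))
    # {'sex': 'discrete', 'chol': 'continuous'}
```

A column with only a few levels, such as a binary indicator, is a natural candidate for Discrete, while a measurement with many distinct values could either stay Continuous or be binned during preprocessing.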










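At this point, the left side shows the filters for choosing an algorithm and the right side shows the description. After choosing an algorithm, click Set Parameter, as shown below: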

set parameters

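After setting the required parameters, click Run Search & Generate Graph to obtain the final causal graph. To try a different algorithm, add a new search block, connect it, and run the search again. On the final Graph page, you can drag each variable's name to reposition it and arrange the graph in the layout you need.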

