Gene Expression Studies Using Affymetrix Microarrays

Published: January 2012
Publisher: Science Press
Authors: Hinrich Göhlmann and Willem Talloen (USA); translated by Zhang Chunxiu
Pages: 327

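Summary

The Affymetrix GeneChip system is currently the most widely used biochip platform. Because Affymetrix arrays carry an enormous amount of information, however, many users tend to rely on the default analysis settings, and the conclusions they reach are often not optimal. Drawing on more than ten years of practical experience with gene expression profiling experiments and data analysis, molecular biologists and biostatisticians wrote Gene Expression Studies Using Affymetrix Microarrays. The book explains the whole process of a gene expression study with Affymetrix arrays, from theoretical concepts to experimental results, and breaks down the ubiquitous language barriers between molecular biology, bioinformatics, and biostatistics.
The book is authoritative and practical. It introduces the key technology behind Affymetrix arrays along with common statistical pitfalls and problems, and it also covers general rules and applications that apply to other array platforms. Worked examples and full-color figures explain the concepts behind the technical and statistical methods, giving beginners detailed guidance. Experts in the field can learn about the other disciplines that microarray work involves and broaden their understanding of gene expression profiling studies.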

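Table of contents

List of figures
List of tables
List of BioBoxes
List of StatsBoxes
Preface
Abbreviations and terminology
1 Biological question
1.1 Why gene expression?
1.1.1 Biotechnological advancements
1.1.2 Biological relevance
1.2 Research question
1.2.1 Correlational vs. experimental research
1.3 Main types of research questions
1.3.1 Comparing two groups
1.3.2 Comparing multiple groups
1.3.3 Comparing different treatments
1.3.4 Comparing multiple groups with a control group
1.3.5 Investigating within-subject changes
1.3.6 Classifying and predicting samples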
2 Affymetrix microarray technology
2.1 Probes
2.2 Probesets
2.2.1 Standard probeset definitions
2.2.2 Alternative chip description files (CDFs)
2.3 Array types
2.3.1 Standard expression monitoring arrays
2.3.2 Exon arrays
2.3.3 Gene arrays
2.3.4 Tiling arrays
2.3.5 Focused arrays
2.4 Standard laboratory processes
2.4.1 In vitro transcription assay
2.4.2 Whole-transcript sense target labeling
2.5 Affymetrix data quality
2.5.1 Reproducibility
2.5.2 Robustness
2.5.3 Sensitivity
3 Running the experiment
3.1 Biological experiment
3.1.1 Biological background
3.1.1.1 Aim/hypothesis
3.1.1.2 Technology platform
3.1.1.3 Expected changes in mRNA levels
3.1.2 Sample
3.1.2.1 Selecting an appropriate sample/tissue
3.1.2.2 Sample types
3.1.2.3 Sample heterogeneity
3.1.2.4 Gender
3.1.2.5 Time point
3.1.2.6 Dissection artifacts
3.1.2.7 Artifacts due to animal handling
3.1.2.8 RNA quality
3.1.2.9 RNA quantity
3.1.3 Pilot experiment
3.1.4 Main experiment
3.1.4.1 Control experiment
3.1.4.2 Treatment
3.1.4.3 Blocking
3.1.4.4 Randomization
3.1.4.5 Standardization
3.1.4.6 Choosing controls
3.1.4.7 Sample size/replicates/costs
3.1.4.8 Balanced design
3.1.4.9 Control samples
3.1.4.10 Sample pooling
3.1.4.11 Documentation
3.1.5 Verifying the data analysis results
3.2 Microarray experiment
3.2.1 External RNA controls
3.2.2 Target synthesis
3.2.3 Batch effect
3.2.4 Whole-genome vs. focused arrays
4 Data analysis preparation
4.1 Data preprocessing
4.1.1 Probe intensity
4.1.2 Log2 transformation
4.1.3 Background correction
4.1.4 Normalization
4.1.5 Summarization
4.1.5.1 Perfect match (PM) and mismatch (MM) techniques
4.1.5.2 PM-only techniques
4.1.6 All-in-one solutions
4.1.7 Detection calls
4.1.7.1 Microarray Analysis Suite (MAS) 5.0
4.1.7.2 Detection above background (DABG)
4.1.7.3 Presence/absence calls (PANP)
4.1.8 Standardization
4.2 Quality control
4.2.1 Technical data
4.2.2 Pseudo images
4.2.3 Evaluating reproducibility
4.2.3.1 Measures for evaluating reproducibility
4.2.3.2 A worked example
4.2.4 Batch effects
4.2.5 Batch effect correction
5 Data analysis
5.1 Why do we need statistics?
5.1.1 The need for data interpretation
5.1.2 The need for a good experimental design
5.1.3 Statistics vs. bioinformatics
5.2 The problem of high-dimensional data
5.2.1 Reproducibility of analysis results
5.2.2 Data mining and validation
5.3 Gene filtering
5.3.1 Filtering approaches
5.3.1.1 Signal intensity
5.3.1.2 Variation between samples
5.3.1.3 Absent/present calls
5.3.1.4 Informative/non-informative calls
5.3.2 Impact of filtering on testing and multiplicity correction
5.3.3 Comparison of filtering approaches
5.4 Unsupervised data analysis
5.4.1 Reasons for unsupervised analysis
5.4.1.1 Batch effects
5.4.1.2 Technical or biological outliers
5.4.1.3 Quality check of phenotypic data
5.4.1.4 Identification of co-regulated genes
5.4.2 Clustering
5.4.2.1 Distance and linkage
5.4.2.2 Clustering algorithms
5.4.2.3 Quality check of clustering
5.4.3 Multivariate projection methods
5.4.3.1 Types of multivariate projection methods
5.4.3.2 Biplot of genes and samples
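5.5 Detecting differential expression
5.5.1 A simple solution for a complex question
5.5.2 Test statistics
5.5.2.1 Fold change
5.5.2.2 Types of t-tests
5.5.2.3 From t-statistics to p-values
5.5.2.4 Comparison of methods
5.5.2.5 Linear models
5.5.3 Correction for multiple testing
5.5.3.1 The problem of multiple testing
5.5.3.2 Multiplicity correction procedures
5.5.3.3 Comparison of methods
5.5.3.4 Post-hoc comparisons
5.5.4 Statistical significance vs. biological relevance
5.5.5 Sample size estimation
5.6 Supervised prediction
5.6.1 Classification vs. hypothesis testing
5.6.2 Challenges of microarray classification
5.6.2.1 Overfitting
5.6.2.2 The bias-variance trade-off
5.6.2.3 Cross-validation
5.6.2.4 Non-uniqueness of classification solutions
5.6.3 Feature selection methods
5.6.4 Classification methods
5.6.4.1 Discriminant analysis
5.6.4.2 Nearest neighbor classification
5.6.4.3 Logistic regression
5.6.4.4 Neural networks
5.6.4.5 Support vector machines
5.6.4.6 Classification trees
5.6.4.7 Ensemble methods
5.6.4.8 Prediction analysis of microarrays (PAM)
5.6.4.9 Comparison of methods
5.6.5 Complex prediction problems
5.6.5.1 Multiclass problems
5.6.5.2 Survival prediction
5.6.6 Sample sizes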
5.7 Pathway analysis
5.7.1 Statistical approaches in pathway analysis
5.7.1.1 Over-representation analysis
5.7.1.2 Functional class scoring
5.7.1.3 Gene set analysis
5.7.1.4 Comparison of methods
5.7.2 Databases
5.7.2.1 Gene Ontology
5.7.2.2 Kyoto Encyclopedia of Genes and Genomes (KEGG)
5.7.2.3 Gene Map Annotator and Pathway Profiler (GenMAPP)
5.7.2.4 AU-rich element database (ARED)
5.7.2.5 cMAP
5.7.2.6 BioCarta
5.7.2.7 Chromosomal location
5.8 Other analysis approaches
5.8.1 Gene network analysis
5.8.2 Meta-analysis
5.8.3 Chromosomal location
6 Presentation of results
6.1 Data visualization
6.1.1 Heatmap
6.1.2 Intensity plot
6.1.3 Gene list plot
6.1.4 Venn diagram
6.1.5 Scatter plots
6.1.5.1 Volcano plot
6.1.5.2 MA plot
6.1.5.3 Scatter plots for high-dimensional data
6.1.6 Histogram
6.1.7 Box plot
6.1.8 Violin plot
6.1.9 Density plot
6.1.10 Dendrogram
6.1.11 Pathways with gene expression
6.1.12 Figures for publication
6.2 Biological interpretation
6.2.1 Important databases
6.2.1.1 Entrez Gene
6.2.1.2 NetAffx (the Affymetrix website)
6.2.1.3 OMIM
6.2.2 Literature mining
6.2.3 Data integration
6.2.3.1 Data from multiple molecular screenings
6.2.3.2 Systems biology
6.2.4 Verification by real-time quantitative PCR (RTqPCR)
6.3 Data publishing
6.3.1 ArrayExpress
6.3.2 Gene Expression Omnibus (GEO)
6.4 Reproducible research
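7 Pharmaceutical R&D
7.1 The need for early indicators
7.2 Critical Path Initiative
7.3 Drug discovery
7.3.1 Differences between normal and diseased tissues
7.3.2 Disease subclass discovery
7.3.3 Identification of molecular targets
7.3.4 Molecular profiling
7.3.5 Characterization of a disease model
7.3.6 Compound profiling
7.3.7 Dose-response treatment
7.4 Drug development
7.4.1 Biomarkers
7.4.2 Response signatures
7.4.3 Toxicogenomics
7.5 Clinical trials
7.5.1 Efficacy markers
7.5.2 Signatures for outcome prognosis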
8 Using R and Bioconductor
8.1 R and Bioconductor
8.2 R and Sweave
8.3 R and Eclipse
8.4 Automated array analysis
8.4.1 Loading packages
8.4.2 Gene filtering
8.4.3 Unsupervised exploration
8.4.4 Testing for differential expression
8.4.5 Supervised classification
8.5 Other software for microarray analysis
9 Future perspectives
9.1 Co-analyzing different data types
9.2 The microarrays of the future
9.3 Next-generation sequencing: the end of microarrays?
References
Index
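List of figures
2.1 Image of a standard Affymetrix array
2.2 Effect of GC content on signal intensity
2.3 Differences in signal intensity between probes of the same probeset
2.4 Differences due to probeset size when using alternative CDFs
2.5 Probe coverage of exon arrays compared with 3′ arrays
2.6 Transcript annotation for exon arrays
3.1 The gender-specific gene Xist (X-inactive specific transcript)
3.2 Example of a dissection artifact
3.3 Expression of thyroxine in mouse striatum
3.4 Dissection artifacts in mouse colon samples
3.5 Degraded vs. non-degraded RNA
3.6 RNA degradation plot showing 3′ bias
3.7 Batch effects between different array batches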
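4.1 Corner of a scanned array image
4.2 Effect of log transformation on the distribution
4.3 Two noise components in microarray data
4.4 Effect of normalization on intensity-dependent variance
4.5 Effect of normalization on MA plots
4.6 MAS 5.0 background calculation
4.7 Pseudo images produced by affyPLM
4.8 Assessing reproducibility by correlating two replicates
4.9 Pairwise concordance before and after centering
4.10 Assessing reproducibility with a spectral map
4.11 Box plots of Affymetrix data from the MAQC (MicroArray Quality Control) study before normalization
4.12 Spectral map (SPM) of Affymetrix array data from the MAQC study
4.13 Intensity plots of differentially expressed genes affected by batch effects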
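5.1 Probes of an informative and a non-informative probeset compared
5.2 Effect of gene filtering on the p-value distribution
5.3 Percentage of genes excluded by different filtering techniques
5.4 Differences between two filtering techniques
5.5 Distributional differences between gene filtering techniques
5.6 Euclidean and Pearson distance in clustering
5.7 Hierarchical clustering of the ALL data based on Euclidean and Pearson distance
5.8 Schematic of the hierarchical clustering algorithm
5.9 Schematic of the k-means algorithm
5.10 Principal component analysis of the ALL data
5.11 Spectral map of the ALL data
5.12 Variability in the t-test
5.13 The t-test
5.14 A problematic t-test: effect of variance on significance
5.15 SAM plot for Δ = 0.75
5.16 The t distribution
5.17 Comparison of two tests for differential expression on a large-sample dataset (30 vs. 30)
5.18 Comparison of two tests for differential expression on a small-sample dataset (3 vs. 3)
5.19 Hypothetical scenarios for various interaction effects
5.20 Interaction effects illustrated by four genes with different expression patterns in the GLUCO data
5.21 Multiple testing correction methods and how they handle false positives and false negatives
5.22 Adjusted and unadjusted p-values for the ALL dataset
5.23 Relationship between high dimensionality and overfitting in class separation
5.24 The overfitting problem
5.25 Nested-loop cross-validation
5.26 Top-ranked gene combination using PAM
5.27 Top-ranked gene combination using LASSO
5.28 Feature ranking in cross-validation
5.29 Optimal number of genes for classification
5.30 Penalized regression: relationship between penalty and coefficients
5.31 Schematic of a neural network
5.32 Two-dimensional visualization of a support vector machine model
5.33 GO pathways containing top-ranked gene sets using MLP
5.34 GO pathways containing top-ranked gene sets using GSA
5.35 BioCarta pathway
5.36 Identifying differentially expressed chromosomal regions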
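6.1 Heatmap
6.2 Intensity plot
6.3 Gene list plot
6.4 Venn diagram
6.5 Volcano plot
6.6 MA plot
6.7 Smoothed scatter plot
6.8 Histogram
6.9 Box plot of the HD dataset
6.10 Violin plot
6.11 Density plot
6.12 Dendrogram
6.13 GO pathways for significant gene sets
7.1 Gene expression profiling in drug development
7.2 Dose-response profile of Fos
9.1 Possible errors in next-generation sequencing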
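List of tables
1.1 Two-way ANOVA design
2.1 Types and names of Affymetrix probesets
2.2 Affymetrix probeset types and names that are no longer in use
2.3 Annotation grades of the original Affymetrix probesets
2.4 Rules for generating alternative CDFs
2.5 Probe usage on the HG U133 Plus 2.0 array based on the Ensembl Gene database
3.1 RNA yields from different samples
4.1 Effect of small differences in background
5.1 Calculation of adjusted p-values
5.2 Classification vs. hypothesis testing
5.3 Top genes selected by LASSO and PAM
5.4 Penalized regression: gene selection
5.5 Top genes selected by MLP
5.6 Top five up-regulated and top five down-regulated gene sets selected by GSA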
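List of BioBoxes
1.1 Gene expression microarrays
1.2 Central dogma of molecular biology
1.3 siRNA
1.4 Phenotype
2.1 Splice variants
2.2 Gene
3.1 Northern blotting
3.2 Transcription factors
3.3 Blood
3.4 Cell culture
3.5 X-chromosome inactivation: Xist
3.6 Gel electrophoresis
3.7 RNA analysis with the Bioanalyzer
3.8 RTqPCR (quantitative real-time PCR)
5.1 Housekeeping genes
7.1 Biomarkers
7.2 EC50, ED50, IC50, LC50 and LD50
7.3 Biomarkers and clinical relevance
7.4 Gene expression signatures
9.1 An example of epigenetics: DNA methylation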
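List of StatsBoxes
1.1 Two interpretations of correlation
3.1 Power
4.1 Accuracy and precision
4.2 Bayesian statistics
4.3 Reproducibility
4.4 Correlation assumptions
5.1 Parameter, variable, statistic
5.2 Perfect fit
5.3 Supervised and unsupervised learning
5.4 Resampling techniques
5.5 Neural networks
5.6 Steps in multivariate projection methods
5.7 Steps in determining differential expression
5.8 Log of a ratio = difference of logs
5.9 Null hypothesis and p-value
5.10 Variance, standard deviation and standard error
5.11 Empirical Bayes methods
5.12 Significance level and power
5.13 Parametric vs. non-parametric tests
5.14 Explanatory and response variables
5.15 General linear models
5.16 Measurement scales
5.17 Interaction effects
5.18 Regularization or penalization
5.19 Sensitivity and specificity
5.20 Multiple testing correction procedures
5.21 More information is not always better
5.22 Kernel techniques
5.23 Jackknife and bootstrap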

Excerpt

Chapter 1 Biological question

All experimental work starts in principle with a question. This also applies to the field of molecular biology. A molecular scientist uses a certain technique to answer a specific question such as, “Does the cell produce more of a given protein when treated in a certain way?” Questions in molecular biology are indeed regularly focused on specific proteins or genes, often because the applied technique cannot measure more. Gene expression studies that make use of microarrays also start with a biological question. The largest difference from many other molecular biology approaches is, however, the type of question that is being asked. Scientists will typically not run arrays to find out whether the expression of a specific messenger RNA is altered in a certain condition. More often they will focus their question on the treatment or the condition of interest. Centering the question on a biological phenomenon or a treatment has the advantage of allowing the researcher to discover hitherto unknown alterations. On the other hand, it poses the problem that one needs to define when an “interesting” alteration occurs.

1.1 Why gene expression?

1.1.1 Biotechnological advancements

Research evolves and advances not only through the compilation of knowledge but also through the development of new technologies. Traditionally, researchers were able to measure only a relatively small number of genes at a time. The emergence of microarrays (see BioBox 1.1) now allows scientists to analyze the expression of many genes in a single experiment quickly and efficiently.

BioBox 1.1: Gene expression microarrays
In microarrays, thousands to millions of probes are fixed to or synthesized on a solid surface, being either glass or a silicon chip. The latter explains why microarrays are also often referred to as chips. The targets of the probes, the mRNA samples, are labelled with fluorescent dyes and are hybridized to their matching probes. The hybridization intensity, which estimates the relative amounts of the target transcripts, can afterwards be measured by the amount of fluorescent emission on their respective spots. There are various microarray platforms differing in array fabrication, the nature and length of the probes, the number of fluorescent dyes that are being used, etc.

1.1.2 Biological relevance

Living organisms contain information on how to develop their form and structure and how to build the tools that are responsible for all biological processes that need to be carried out by the organism. This information, the genetic content, is encoded in information units referred to as genes. The whole set of genes of an organism is referred to as its genome. The vast majority of genomes are encoded in the sequence of chemical building blocks made from deoxyribonucleic acid (DNA), and a smaller number of genomes are composed of ribonucleic acid (RNA), e.g., for certain types of viruses. The genetic information is encoded in a specific sequence made from four different nucleotide bases: adenine, cytosine, guanine and thymine. A slightly different composition of building blocks is present in mRNA, where the base thymine is replaced by uracil. Genetic information encoding the building plan for proteins is transferred from DNA to mRNA to proteins. The gene sequence can range in length typically between hundreds and thousands of nucleotides up to even millions of bases. The number of genes that contain protein-coding information is expected to be between 25,000 and 30,000 when looking at the human genome. A protein is made by constructing a string of protein building blocks (amino acids). The order of the amino acids in a protein matches the sequence of the nucleotides in the gene.
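The flow from DNA to mRNA to protein described above can be sketched in a few lines of Python. This is a minimal illustration and not part of the book: the function names, the example sequence, and the five-entry codon table are assumptions chosen for brevity (the full genetic code has 64 codons), and the input is taken to be the coding strand, so transcription reduces to replacing thymine with uracil.

```python
# Illustrative sketch only: names, the sequence, and the tiny codon table
# are assumptions made for this example, not taken from the book.
CODON_TABLE = {
    "AUG": "Met",   # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": "Stop",
}


def transcribe(coding_dna):
    """Return the mRNA for a DNA coding strand: thymine (T) becomes uracil (U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases (one codon) at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGTTTGGCGCTTAA"      # made-up coding sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print(protein)
```

For the made-up sequence above, the sketch prints the mRNA `AUGUUUGGCGCUUAA` and the amino acid chain Met, Phe, Gly, Ala, which shows how the order of amino acids follows the order of nucleotides.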
Messenger RNA thus interconnects DNA and protein, and it also has some important practical advantages compared to both DNA and proteins (see BioBox 1.2). Increasing our knowledge about the dynamics of the genome, as manifested in the alterations in gene expression of a cell upon treatment, disease, development or other external stimuli, should enable us to transform this knowledge into better tools for the diagnosis and treatment of diseases.

BioBox 1.2: Central dogma of molecular biology
The dogma of molecular biology explains how the information to build proteins is transferred in living organisms. The general flow of biological information (green arrows) has three major components: (1) DNA to DNA (replication) occurs in the cell nucleus (drawn in yellow) prior to cell division, (2) DNA to mRNA (transcription) takes place whenever the cell (drawn in light red) needs to make a protein (drawn as chain of red dots), and (3) mRNA to proteins (translation) is the actual protein synthesis step in the ribosomes (drawn in green). Besides these general transfers that occur normally in most cells, there are also some special information transfers that are known to occur in some viruses or in a laboratory experimental setting.

DNA is made of two strands that together form a chemical structure called the “double helix.” The two strands are connected with one another via pairs of bases that form hydrogen bonds between both strands. Such pairing of so-called “complementary” bases occurs only between certain pairs. Hydrogen bonds can be formed between cytosine and guanine or between adenine and thymine. The pairing of the two strands occurs in a process called “hybridization.”

Compared to DNA, mRNA is more dynamic and less redundant. The information that is encoded in the DNA is made available for processing in a step called “gene expression” or “transcription.” Gene expression is a highly complex and tightly regulated process by which a working copy of the original sequence information is made.
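The base-pairing rule described above (cytosine with guanine, adenine with thymine) is what lets a microarray probe find its target. A minimal Python sketch follows, assuming perfect complementarity and made-up sequences. The helper names are hypothetical, and real hybridization also depends on temperature, probe length, and tolerated mismatches, none of which is modeled here.

```python
# Illustrative sketch only: sequences and helper names are assumptions.
# Complementary pairs as stated in the text: C-G and A-T.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the strand that pairs with seq, read in the opposite direction."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def can_hybridize(probe, target):
    """True if the probe is perfectly complementary to a stretch of the target."""
    return reverse_complement(probe) in target.upper()


target = "GGATCCATGCAAGT"    # made-up target sequence
probe = "TTGCATGG"           # complementary to the stretch CCATGCAA
mismatch_probe = "TTGCTTGG"  # same probe with one base changed

print(can_hybridize(probe, target))
print(can_hybridize(mismatch_probe, target))
```

The exact-match test is a deliberate simplification: it treats hybridization as all-or-nothing, whereas measured hybridization intensity varies continuously with how well probe and target pair.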
Gene expression allows a cell to respond dynamically both to environmental stimuli and to its own changing needs, while DNA is relatively invariable. Furthermore, as mRNA constitutes only the expressed part of the DNA, it focuses more directly on processes underlying biological activity. This filtering is convenient as the functionality of most DNA sequences is irrelevant for the study at hand.

Compared to proteins, mRNA is much more measurable. Proteins are 3D conglomerates of multiple molecules and cannot benefit from the hybridising nature of the base pairs in the 2D, single molecule, structure of mRNA and DNA. Furthermore, proteins are very unstable due to denaturation, and cannot be preserved even with very laborious methods for sample extraction and storage.

When using microarrays to study alterations in gene expression, people normally will only want to study the types of RNA that code for proteins: the messenger RNA (mRNA). It is however important to keep in mind that RNA does not consist only of mRNA, a copy of a section of the genomic DNA carrying the information of how to build proteins. Besides the code for the synthesis of ribosomal RNA, there are other non-coding genes that, e.g., contain information for the synthesis of RNA molecules. These RNAs have different functions that range from enzymatic activities to regulating transcription of mRNAs and translation of mRNA sequences to proteins. The number of these functional RNAs that are encoded in the genome is not known. Initial studies looking at the overall transcriptional activity along the DNA predict that the number will most likely be larger than the number of protein-coding genes. People used to say that a large portion of the genomic information encoded in the DNA is useless (“junk DNA”). Over the last years scientific evidence has accumulated that a large proportion of the genome is being transcribed into RNAs, of which a small portion constitutes messenger RNAs.
All these other non-coding RNAs are divided into two main groups depending on their size. While short RNAs are defined to have sizes below 200 bases, the long RNAs are thought to be mere precursors for the generation of small RNAs, of which the function is currently still unknown, in contrast to the known small RNAs such as microRNAs or siRNAs[6] (see BioBox 1.3 for an overview of different types of RNA). Microarrays are also being made to study differences in abundance of these kinds of RNA.

BioBox 1.3: siRNA
RNA. In contrast to mRNA (messenger RNA), which contains the information of how to assemble a protein, there are also different types of non-coding RNA (sometimes abbreviated as ncRNA) [a]. Here are the types that are most relevant in the context of this book:
miRNA (microRNA) [...] in length, which regulate gene expression.
long ncRNA (long non-coding RNA) are long RNA molecules that perform regulatory roles. An example is XIST, which can also be used for data quality control to identify the gender of a subject (see BioBox 3.5).
rRNA (ribosomal RNA) are long RNA molecules that make up the central component of the ribosome [b]. They are responsible for decoding mRNA into amino acids and are used for RNA quality control purposes (see Section 3.1.2.8).
siRNA (small interfering RNA) are small double-stranded RNA molecules of about 20-25 nucleotides in length and play a variety of roles in biology. The most commonly known function is a process called RNA interference (RNAi). In this process siRNAs interfere with the expression of a specific gene, leading to a downregulation of the synthesis of new protein encoded by that gene [c].
tRNA (transfer RNA) are small single-stranded RNA molecules of about 74-95 nucleotides in length, which transfer a single amino acid to a growing polypeptide chain at the ribosomal site of protein synthesis. Each type of tRNA molecule can be attached to only one type of amino acid.
[a] Non-coding RNA refers to RNA molecules that are transcribed from DNA but not translated into protein.
[b] Ribosomes can be seen as the protein manufacturing machinery of all living cells.
[c] There are, however, also processes known as small RNA-induced gene activation whereby double-stranded RNAs target gene promoters to induce transcriptional activation of associated genes.

In this book we will focus on studying mRNA. However, most likely many remarks given on the experimental design and the data analysis will apply to the study of small RNA as well.

1.2 Research question

The key to optimal data analysis lies in a clear formulation of the research question. Being aware of having to define what one considers to be a “relevant” finding in the data analysis step will help in asking the right question and in designing the experiment properly so that the question can really be answered. A well-thought-out and focused research question leads directly into hypotheses, which are both testable and measurable by proposed experiments. Furthermore, a well-formulated hypothesis helps to choose the most appropriate test statistic out of the plethora of available statistical procedures and helps to set up the design of the study in a carefully considered manner. To formulate the right question, one needs to disentangle the research topic into testable hypotheses and to put it in a wider framework to reflect on potentially confounding factors. Some of the most commonly used study designs in microarray research will be introduced here by means of real-life examples. For each type of study, research questions are formulated and example datasets described. These datasets will be used throughout the book to illustrate some technical and statistical issues.

1.2.1 Correlational vs. experimental research

Microarray research can either be correlational or experimental. In correlational research, scientists generally do not apply a treatment or stimulus to provoke an effect on, e.g., gene expression (influence variables), but measure them and look for correlations with mRNA (see StatsBox 1.1). Typical examples are cohort studies, where individuals of populations with specific characteristics (like diseased patients and healthy controls) are sampled and analysed.
In experimental research, scientists manipulate certain variables (e.g., apply a compound to a cell line) and then measure the effects of this manipulation on mRNA. Experiments are designed studies where individuals are assigned to specifically chosen conditions, and mRNA is afterwards collected and compared. It is important to comprehend that only experimental data can conclusively demonstrate causal relations between variables. For example, if we found that a certain treatment A affects the expression levels of gene X, then we can conclude that treatment A influences the expression of gene X. Data from



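User reviews (2 in total)

I am about to start microarray experiments, so I bought this book to read.
A very specialized book. Only the opening of each chapter is translated, and the English original is somewhat hard going.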
