Mastering File-Reading Methods in R

Goal

Master the file-reading methods in R.

Study notes

Data Input in the utils package
readLines in the base package
stri_read_lines in the stringi package
Data Input in the xlsx package

1. Data Input in the utils package

Take read.table as an example.

For a detailed description of the read.table parameters, see http://www.360doc.com/showweb/0/0/1029326103.aspx

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = FALSE,
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

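The file parameter

Form 1: "file name". If no path is given, the file is read from the current working directory. Use getwd() to get the current working directory and setwd("path") to change it.
Form 2: an absolute path plus the file name, for example "D: \…\test.xlsx".
Form 3: "clipboard", which reads data that has just been copied to the clipboard.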

getwd()
setwd("....\...")  # enter the path you want to set
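The following is a minimal sketch of how the working directory affects a bare file name. It assumes tempdir() as a stand-in for "the path you want to set", and demo.txt is a hypothetical file name that does not come from the original notes. Note that inside an R string a backslash has to be written as "\\"; forward slashes also work on Windows.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# tempdir() stands in for the directory you actually want (assumption for this demo).
setwd(tempdir())

# A bare file name is now resolved against the new working directory.
writeLines("hello", "demo.txt")          # demo.txt is a hypothetical file
in_wd <- file.exists("demo.txt")

# file.path() builds a full path using "/" separators.
full_path <- file.path(tempdir(), "demo.txt")
by_full_path <- file.exists(full_path)

# Restore the original working directory.
setwd(old_dir)

print(c(in_wd = in_wd, by_full_path = by_full_path))
```

Restoring the old directory at the end keeps the rest of a script from silently depending on the temporary location.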

Create a table in the working directory for testing and name it test.xlsx.

x1<-read.table('test.xlsx')
View(x1)

The result for x1:

x1<-read.table('test.xlsx')
Warning messages:
1: In read.table("test.xlsx") : line 1 appears to contain embedded nulls
2: In read.table("test.xlsx") :
  incomplete final line found by readTableHeader on 'test.xlsx'

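The warning "incomplete final line" means that read.table could not find a line terminator at the end of the data it read. The underlying cause is that a .xlsx file is a compressed binary format rather than plain text, which is also why the "embedded nulls" warning appears. read.table() expects delimited text, so it should not be used to read Excel files directly.

Workaround: copy the data into a txt file and name it test.txt.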
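As an alternative to copying by hand, tabular data that is already in R can be written out as tab-separated text and read back. The sketch below is only an illustration: sample_df and demo_table.txt are hypothetical names and values, not the contents of the test.txt used in these notes.

```r
# A small hypothetical table (not the article's data).
sample_df <- data.frame(id = 1:3,
                        label = c("a", "b", "c"),
                        stringsAsFactors = FALSE)

# Write it as plain tab-separated text in the session's temp directory.
txt_file <- file.path(tempdir(), "demo_table.txt")
write.table(sample_df, txt_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Plain text like this is what read.table() is designed to parse.
read_back <- read.table(txt_file, header = TRUE, sep = "\t",
                        stringsAsFactors = FALSE)
print(read_back)
```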

x2<-read.table('test.txt')
print(x2)

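x2 is (screenshot):

Notice that the first line was not read. Why? The answer lies in the comment.char parameter.

The comment.char parameter

This parameter specifies the character that starts a comment, and its default is "#". The line in my txt file that starts with # was therefore treated as a comment and skipped. So let's set comment.char = "" and try again.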

x2<-read.table('test.txt',comment.char = "")

x2 is (screenshot):
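The effect of comment.char can be reproduced without any file by passing the data through the text argument of read.table. The string demo_text below is a hypothetical three-line sample, not the article's test.txt.

```r
# Hypothetical tab-separated sample whose first line starts with "#".
demo_text <- "#\tx\ty\n1\t2\t3\n4\t5\t6"

# Default comment.char = "#": the first line is treated as a comment and dropped.
with_default <- read.table(text = demo_text, sep = "\t")

# comment.char = "" turns comment handling off, so the first line is kept as data.
no_comments <- read.table(text = demo_text, sep = "\t", comment.char = "")

print(nrow(with_default))
print(nrow(no_comments))
```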

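Now I want to use the first line as the header, which requires the header parameter.

The header parameter

The default is FALSE, meaning the first line is not used as the header. To use the first line as the header, set it to TRUE.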

x2<-read.table('test.txt',comment.char = "",header = TRUE)

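In the resulting x2 (screenshot), the # in the header is not a valid column name, so with the default check.names = TRUE it is recorded as X.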
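A short sketch of where the X. comes from, again using a hypothetical inline sample rather than the article's file: with check.names = TRUE (the default) the header fields are passed through make.names(), and setting check.names = FALSE keeps them unchanged.

```r
# Hypothetical sample whose header line begins with "#".
demo_text <- "#\tx\ty\n1\t2\t3\n4\t5\t6"

# Default check.names = TRUE: invalid names are repaired by make.names().
checked <- read.table(text = demo_text, sep = "\t",
                      comment.char = "", header = TRUE)

# check.names = FALSE keeps the header fields exactly as written.
unchecked <- read.table(text = demo_text, sep = "\t",
                        comment.char = "", header = TRUE,
                        check.names = FALSE)

print(names(checked))
print(names(unchecked))
print(make.names("#"))
```

Columns with non-syntactic names such as "#" have to be accessed with backticks or [["#"]], which is one reason the default repairs them.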

To specify column names and row names, use the row.names and col.names parameters.

The row.names and col.names parameters

Take changing the column names as an example:

x2<-read.table('test.txt',comment.char = "",header = TRUE,
               col.names=c("a","b","c"))

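x2 is (screenshot):

The column names were changed successfully.
Why use the function c() here? c() combines its arguments into a vector or list, and it is what I am used to.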
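Since only col.names is demonstrated above, here is a minimal sketch that also shows row.names. The inline sample is hypothetical; row.names = 1 tells read.table to take the row names from the first column.

```r
# Hypothetical sample: a header line plus two data rows.
demo_text <- "id\tx\ty\nr1\t1\t2\nr2\t3\t4"

# col.names replaces the names found in the header line.
renamed <- read.table(text = demo_text, sep = "\t", header = TRUE,
                      col.names = c("a", "b", "c"))

# row.names = 1 uses the first column as row names and removes it from the data.
with_rows <- read.table(text = demo_text, sep = "\t", header = TRUE,
                        row.names = 1)

print(names(renamed))
print(rownames(with_rows))
print(names(with_rows))
```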

Use class() to check the type of the data after reading:

class(x2)
[1] "data.frame"

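So read.table() is mainly for reading tabular data, and the result is an object of class "data.frame".

That concludes the look at read.table.

Besides read.table, the utils package provides the following file-reading functions. Their parameters are similar, but the defaults differ.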

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)

read.csv2(file, header = TRUE, sep = ";", quote = "\"",
          dec = ",", fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)

read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
            dec = ",", fill = TRUE, comment.char = "", ...)
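The differing defaults are easiest to see side by side. This sketch feeds each function a tiny hypothetical inline sample through the text argument; each sample encodes the same values (a = 1.5, b = 2) in the separator and decimal convention that the function expects.

```r
# Comma-separated, "." as the decimal mark.
csv_df <- read.csv(text = "a,b\n1.5,2")

# Semicolon-separated, "," as the decimal mark.
csv2_df <- read.csv2(text = "a;b\n1,5;2")

# Tab-separated, "." as the decimal mark.
delim_df <- read.delim(text = "a\tb\n1.5\t2")

# Tab-separated, "," as the decimal mark.
delim2_df <- read.delim2(text = "a\tb\n1,5\t2")

# All four parse to the same numeric value in column a.
print(c(csv_df$a, csv2_df$a, delim_df$a, delim2_df$a))
```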

2. readLines in the base package

readLines(con = stdin(), n = -1L, ok = TRUE, warn = TRUE,
          encoding = "unknown", skipNul = FALSE)

x3<-readLines('test.txt')
x3
[1] "#\t中文\tEnglish" "1\t2\t3"
[3] "4\t5\t6" "中文\t8\t9"
[5] "13\tEnglish\t9" "13\t14\t%"
[7] "16\t17\t18" "19\t20\t21"
class(x3)
[1] "character"

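For tabular data, readLines returns each line as a single string and does not split it into columns; the tab characters stay in the string and are printed as "\t".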
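If the individual fields are needed, the lines can be split on the tab character afterwards. A minimal sketch, using a hypothetical temporary file rather than the article's test.txt, which also shows the n argument for reading only the first lines:

```r
# Write a small hypothetical tab-separated file to a temporary location.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tann", "2\tbob"), txt_file)

# Read every line; the result is a character vector with one element per line.
all_lines <- readLines(txt_file)

# n limits how many lines are read.
first_line <- readLines(txt_file, n = 1)

# Split each line on the literal tab character to recover the fields.
fields <- strsplit(all_lines, "\t", fixed = TRUE)

print(all_lines)
print(fields[[2]])
```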

3. stri_read_lines in the stringi package

stri_read_lines(con, encoding = NULL, fname = con, fallback_encoding = NULL)

First install the stringi package:

install.packages("stringi")
library(stringi)

x3<-stri_read_lines('test.txt')
x3
[1] "#\t中文\tEnglish" "1\t2\t3"
[3] "4\t5\t6" "中文\t8\t9"
[5] "13\tEnglish\t9" "13\t14\t%"
[7] "16\t17\t18" "19\t20\t21"
class(x3)
[1] "character"

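For tabular data, stri_read_lines behaves the same way: each line comes back as a single string, with the tab characters printed as "\t".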
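A small sketch comparing the two readers on the same file. The file is a hypothetical ASCII-only sample, the encoding is stated explicitly as "UTF-8" as an assumption for the demo, and the stringi call is guarded so that the script still runs where the package is not installed.

```r
# Hypothetical ASCII-only sample file in a temporary location.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tann"), txt_file)

base_lines <- readLines(txt_file)

if (requireNamespace("stringi", quietly = TRUE)) {
  # Declare the input encoding explicitly instead of relying on detection.
  stri_lines <- stringi::stri_read_lines(txt_file, encoding = "UTF-8")
  same <- identical(stri_lines, base_lines)
} else {
  # stringi is not installed, so no comparison is possible.
  same <- NA
}

print(same)
```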

4. Data Input in the xlsx package

First install the xlsx package with install.packages(), then load it with library().

install.packages("xlsx")
library(xlsx)

If Java is not installed on the computer, an error is reported at this point:

Error: package or namespace load failed for ‘xlsx’:
 .onLoad failed in loadNamespace() for 'rJava', details:
  call: fun(libname, pkgname)
  error: JAVA_HOME cannot be determined from the Registry

So Java needs to be installed from the official site https://www.oracle.com/java/technologies/javase-downloads.html.
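Because the error message mentions JAVA_HOME, one thing worth checking from R is whether that environment variable is visible to the session. The sketch below is a diagnostic only, not a confirmed fix for the problem described here: the path in the commented Sys.setenv() line is a placeholder to replace with your own installation, and the read.xlsx() call runs only if the package loads and a test.xlsx file exists in the working directory.

```r
# Inspect the JAVA_HOME environment variable as seen by this R session.
java_home <- Sys.getenv("JAVA_HOME")
if (nzchar(java_home)) {
  message("JAVA_HOME is set to: ", java_home)
} else {
  message("JAVA_HOME is not set in this session.")
}

# Placeholder only: point JAVA_HOME at your own Java installation before loading xlsx.
# Sys.setenv(JAVA_HOME = "<path to your Java installation>")

# requireNamespace() returns FALSE instead of stopping if xlsx or rJava cannot load.
xlsx_ok <- requireNamespace("xlsx", quietly = TRUE)

if (xlsx_ok && file.exists("test.xlsx")) {
  # Read the first worksheet into a data frame.
  x4 <- xlsx::read.xlsx("test.xlsx", sheetIndex = 1)
  print(class(x4))
}
```

A frequently reported cause of rJava load failures, offered here only as a possibility, is a mismatch between a 64-bit R and a 32-bit Java installation or the reverse.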

However, an error was still reported. I will continue this study once I have solved the problem.

Original article: http://outofmemory.cn/langs/795417.html