VAPScript: Reverse-Lookup Reference

Item

Sample

Split a table into front and back parts at a 6:4 ratio

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30
3	32
3	32
4	32
4	40

=>

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30

C1	C2
3	32
3	32
4	32
4	40

Split a table randomly at a 6:4 ratio

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30
3	32
3	32
4	32
4	40

=>

C1	C2
1	10
1	10
2	21
3	32
3	32
4	40

C1	C2
2	20
2	20
3	30
4	32

Split a table into front and back parts at a 6:4 block ratio, treating each key as a unit

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30
3	32
3	32
4	32
4	40

=>

C1	C2
1	10
1	10
2	20
2	20
2	21

C1	C2
3	30
3	32
3	32
4	32
4	40

Split a table randomly at a 6:4 block ratio, treating each key as a unit

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30
3	32
3	32
4	32
4	40

=>

C1	C2
2	20
2	20
2	21
4	32
4	40

C1	C2
1	10
1	10
3	30
3	32
3	32

[Result]

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30
3	32
3	32
4	32
4	40

=>

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30

C1	C2
3	32
3	32
4	32
4	40

[VAPScript]

c1("C1") = {1, 1, 2, 2, 2, 3, 3, 3, 4, 4 };
c2("C2") = {10,10,20,20,21,30,32,32,32,40};
table = cbind(c1,c2);
// Split point: the first 60% of the rows
ratio = 0.6;
nr = as.integer(nrow(table)*ratio);
keycol = {1:nr};                        // row numbers 1..nr
result = sel(row=keycol, table);        // first part
result2 = sel(not row=keycol, table);   // remaining rows

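[Explanation]

First, compute the row count that corresponds to 60% of the table (nr), then fetch rows 1 through nr with sel(row=).
The remaining rows are fetched with sel(not row=).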
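
For readers who want to check the logic outside VAPScript, the steps above can be sketched in Python. This is only an analogue, not VAPScript: the function name split_head_tail is hypothetical, and the sketch assumes that as.integer() truncates toward zero in the same way as Python's int().

```python
def split_head_tail(rows, ratio):
    """Return (first int(len(rows) * ratio) rows, remaining rows)."""
    n_head = int(len(rows) * ratio)  # truncation, assumed to match as.integer()
    return rows[:n_head], rows[n_head:]


# Same (C1, C2) data as the sample table
table = [(1, 10), (1, 10), (2, 20), (2, 20), (2, 21),
         (3, 30), (3, 32), (3, 32), (4, 32), (4, 40)]
head, tail = split_head_tail(table, 0.6)
print(len(head), len(tail))  # 6 4
```

Slicing keeps the original row order in both parts, which matches the result shown above.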

[Result]

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30
3	32
3	32
4	32
4	40

=>

C1	C2
1	10
1	10
2	21
3	32
3	32
4	40

C1	C2
2	20
2	20
3	30
4	32

[VAPScript]

c1("C1") = {1, 1, 2, 2, 2, 3, 3, 3, 4, 4 };
c2("C2") = {10,10,20,20,21,30,32,32,32,40};
table = cbind(c1,c2);
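// Assign each row a random rank, then split on the rank
ratio = 0.6;
ijudge = as.integer(nrow(table)*ratio);    // threshold rank
ctemp = shuffle({1:nrow(table)});          // 1..nrow(table) in random order
result = sel(ctemp(1) <= ijudge, table);   // rows ranked at or below the threshold
result2 = sel(ctemp(1) > ijudge, table);   // all other rows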

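[Explanation]

A numeric column (ctemp) decides which part each row goes to. The table is split in two according to whether the value in that column is at most the threshold (sel( <= , )) or greater than it (sel( > , )).
To make the selection random, shuffle() puts the elements of the numeric column in random order.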
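
The same idea (shuffle the ranks 1..n, then compare each row's rank with a threshold) can be sketched in Python as follows. This is an illustrative analogue rather than VAPScript; split_random and its seed parameter are hypothetical names added only to make the example repeatable.

```python
import random


def split_random(rows, ratio, seed=None):
    """Split rows at random so that int(len(rows) * ratio) rows are picked."""
    rng = random.Random(seed)
    ranks = list(range(1, len(rows) + 1))
    rng.shuffle(ranks)                   # one random rank per row
    threshold = int(len(rows) * ratio)   # plays the role of ijudge
    picked = [row for row, rank in zip(rows, ranks) if rank <= threshold]
    rest = [row for row, rank in zip(rows, ranks) if rank > threshold]
    return picked, rest


table = [(1, 10), (1, 10), (2, 20), (2, 20), (2, 21),
         (3, 30), (3, 32), (3, 32), (4, 32), (4, 40)]
picked, rest = split_random(table, 0.6, seed=0)
print(len(picked), len(rest))  # 6 4
```

Because the ranks are a permutation of 1..n, exactly int(n * ratio) rows fall at or below the threshold, so the sizes of the two parts are fixed and only their membership is random.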

[Result]

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30
3	32
3	32
4	32
4	40

=>

C1	C2
1	10
1	10
2	20
2	20
2	21

C1	C2
3	30
3	32
3	32
4	32
4	40

[VAPScript]

c1("C1") = {1, 1, 2, 2, 2, 3, 3, 3, 4, 4 };
c2("C2") = {10,10,20,20,21,30,32,32,32,40};
table = cbind(c1,c2);
// Split on the unique keys rather than on the rows
ratio = 0.6;
keyColName = "C1";
keyCol = remove_dup(table(keyColName), keyColName);   // unique keys
nr = as.integer(nrow(keyCol) * ratio);                // number of keys in the first part
keyColBefore = sel(row={1:nr}, keyCol);               // keys of the first part
result = sel(belong(table(keyColName), keyColBefore), table);
result2 = sel(not belong(table(keyColName), keyColBefore), table);

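[Explanation]

The table is split so that the keys (C1) are divided roughly 6:4.

First, get keyCol = {1,2,3,4}, the key column (C1) with duplicates removed.
Next, build keyColBefore, the column holding the keys that belong to the first part. With four unique keys, nr = as.integer(4 * 0.6) = 2, so keys 1 and 2 form the first part.
The first table is fetched with sel(belong, ) and the second with sel(not belong, ).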
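
A Python sketch of the same key-based split is shown below for comparison. It is an analogue only: split_by_key_blocks is a hypothetical name, dict.fromkeys stands in for remove_dup(), and set membership stands in for belong().

```python
def split_by_key_blocks(rows, key_index, ratio):
    """Split rows so that the first int(n_keys * ratio) keys go to the first part."""
    # Unique keys in order of first appearance
    unique_keys = list(dict.fromkeys(row[key_index] for row in rows))
    n_head = int(len(unique_keys) * ratio)
    head_keys = set(unique_keys[:n_head])
    head = [row for row in rows if row[key_index] in head_keys]
    tail = [row for row in rows if row[key_index] not in head_keys]
    return head, tail


table = [(1, 10), (1, 10), (2, 20), (2, 20), (2, 21),
         (3, 30), (3, 32), (3, 32), (4, 32), (4, 40)]
head, tail = split_by_key_blocks(table, 0, 0.6)
print(len(head), len(tail))  # 5 5
```

The ratio applies to the number of keys, not rows, so the row counts of the two parts depend on how many rows each key has.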

[Result]

C1	C2
1	10
1	10
2	20
2	20
2	21
3	30
3	32
3	32
4	32
4	40

=>

C1	C2
2	20
2	20
2	21
4	32
4	40

C1	C2
1	10
1	10
3	30
3	32
3	32

[VAPScript]

c1("C1") = {1, 1, 2, 2, 2, 3, 3, 3, 4, 4 };
c2("C2") = {10,10,20,20,21,30,32,32,32,40};
table = cbind(c1,c2);
// Give each unique key a random rank, then attach the rank to every row
keyColName = "C1";
ratio = 0.6;
c1uniq("C1") = remove_dup(table(keyColName), keyColName);   // unique keys
c2rate("C2") = shuffle({1:nrow(c1uniq)});                   // random rank per key
tableJudge = cbind(c1uniq, c2rate);
ijudge = as.integer(nrow(tableJudge)*ratio);                // threshold rank
mergeTable = merge(table, tableJudge, "00", keyColName, "C1");
ic=ncol(mergeTable);                                        // last column holds the rank
result = sel(mergeTable(ic) <= ijudge, table);
result2 = sel(mergeTable(ic) > ijudge, table);

[Alternative solution: when the key column is a string]

c1("C1") = {1, 1, 2, 2, 2, 3, 3, 3, 4, 4 };
c2("C2") = {10,10,20,20,21,30,32,32,32,40};
c1 = as.string(c1);
table = cbind(c1,c2);
//
keyColName = "C1";
ratio = 0.6;
key = table(keyColName);
uniqueKey = counts(key); // remove duplicates
keyLearn = make_sample(uniqueKey, ratio);
keyTest = sel(not belong(uniqueKey, keyLearn), uniqueKey);
result = sel(belong(key, keyLearn), table);
result2 = sel(belong(key, keyTest), table);

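[Explanation]

As a working table, build tableJudge from the key column (C1) and a numeric judgment column (C2). Then merge tableJudge with the source table to build mergeTable, which is used to decide where each row goes.
The two tables are fetched using the numeric judgment column (C2) as the condition.

The alternative solution uses make_sample() to draw a random sample of the keys for the first part.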
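
The rank-per-key approach can be sketched in Python as below. This is an analogue, not VAPScript: split_by_key_random and seed are hypothetical names, and the dictionary rank_of plays the role of tableJudge joined onto the source table.

```python
import random


def split_by_key_random(rows, key_index, ratio, seed=None):
    """Randomly assign whole keys to one of two parts at the given key ratio."""
    rng = random.Random(seed)
    unique_keys = list(dict.fromkeys(row[key_index] for row in rows))
    ranks = list(range(1, len(unique_keys) + 1))
    rng.shuffle(ranks)                          # one random rank per key
    rank_of = dict(zip(unique_keys, ranks))     # key -> rank lookup
    threshold = int(len(unique_keys) * ratio)
    picked = [row for row in rows if rank_of[row[key_index]] <= threshold]
    rest = [row for row in rows if rank_of[row[key_index]] > threshold]
    return picked, rest


table = [(1, 10), (1, 10), (2, 20), (2, 20), (2, 21),
         (3, 30), (3, 32), (3, 32), (4, 32), (4, 40)]
picked, rest = split_by_key_random(table, 0, 0.6, seed=0)
```

Every row with the same key receives the same rank, so a key is never divided between the two parts; which keys land in each part varies with the shuffle.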