u***t 发帖数: 3986 | 1 I have a huge recordset (50 million rows x N columns).
I am going to break the recordset into 100 smaller sub-recordsets of 500,000
rows each by setting the page size:
rs.PageSize = 500000
Now I have 100 pages in this recordset.
Questions:
How do I export these 100 pages into 100 Excel files? In other words, how do I
treat these 100 pages as 100 sub-recordsets?
I am using the method below to load the data into Excel:
objExcelWorkSheet.Range.CopyFromRecordset rs
Anybody?
* I don't want to process records individually; I must handle them as whole
recordsets.
Thanks! |
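The paging idea above (split one big row stream into fixed-size pages and export each page to its own file) can be sketched outside ADO. A minimal Python illustration, assuming CSV output instead of Excel and inventing the function names, since the original uses ADO/VBA:

```python
import csv
from itertools import islice

def iter_pages(rows, page_size):
    """Yield successive fixed-size pages (lists of rows) from any row iterator."""
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page

def export_pages(rows, page_size, name_pattern="page_{:03d}.csv"):
    """Write each page to its own CSV file; return the filenames written."""
    files = []
    for i, page in enumerate(iter_pages(rows, page_size), start=1):
        fname = name_pattern.format(i)
        with open(fname, "w", newline="") as f:
            csv.writer(f).writerows(page)
        files.append(fname)
    return files
```

With `page_size=500_000` over a 50-million-row source this produces the 100 files the post asks for; the last page is simply shorter if the total is not an exact multiple.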
w*r 发帖数: 2421 | 2 Do not use that; paging a recordset with that many records will eat
a lot of memory in the VB engine.
You should consider this instead:
Use a loop with a counter to write the data directly to a CSV file.
When the counter reaches 500K, close the CSV file, reopen another one
with a new file handle, reset the counter to zero, and keep writing.
With single-record operations in the loop it will be slower; however, you
can optimize it further by using a smaller page size to optimize the data
fetch side (handled by ADO.NET automatically), then, in your own
program, using a string buffer to temporarily store text and writing to disk in
"chunks" by flushing the buffer to the file every 100 or 1,000
records. That will improve I/O efficiency.
In this case, fixed-size record text works better because you have better
control over how big the buffer is and can set up appropriate string/stream
writers in your VB code.
|
g***l 发帖数: 18555 | 3 Excel is not meant for storing data. Records in those quantities are exactly why people use databases; if Excel could do the job, wouldn't all the database vendors have gone out of business? |
i****a 发帖数: 36252 | 4 The question is: why do you want to have 100 Excel files with 500K records each?
|
g***l 发帖数: 18555 | 5 Because Excel has a 65,535-record limit.
|
i****a 发帖数: 36252 | 6 Excel 2010's limit is higher (1,048,576 rows per sheet).
But my question is what LZ wants to do with the records in Excel. There may be
a better way than 100 Excel files.
And since LZ asked on the DB board, I assume the data comes from a database. If
LZ has a real need for 100 Excel files, exporting directly from the
database is more efficient.
|
g***l 发帖数: 18555 | 7 Just export to CSV or Access and be done with it. The problem is, with data that big, how are you going to query it? What are they even looking for? They probably don't know themselves. Using Excel to back up millions of rows is a bit ridiculous. |
u***t 发帖数: 3986 | 8 Excel is used to store data from records returned by queries provided by our offshore
resource; after the recordset is written to a file, the buffer is released.
Sequential looping is far slower to execute. That's why I want to
process the comparison using recordsets.
The reason I don't export directly from the DB to a file is that only data
matching certain criteria is needed from the source DB...
Bottom line: looping isn't efficient... |
u***t 发帖数: 3986 | 9 500,000 is the optimal maximum number of rows for one Excel file to hold. Get 500K from the
source, then get 500K from the target, then use a query to compare them and write the
mismatches to a third Excel file; that takes 10 minutes.
With a loop over the source, it takes a whole day.
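The chunked comparison described here (pull a matching page from source and target, diff them, write out the mismatches) can be sketched in Python. The key column and tuple layout are assumptions for illustration; the real data shape is not given in the thread:

```python
def compare_chunks(source_rows, target_rows, key_index=0):
    """Compare two row chunks keyed by the column at `key_index`.
    Return rows missing from one side or differing between sides,
    each tagged with the kind of mismatch."""
    src = {row[key_index]: row for row in source_rows}
    tgt = {row[key_index]: row for row in target_rows}
    mismatches = []
    for key, row in src.items():
        if key not in tgt:
            mismatches.append(("missing_in_target", row))
        elif tgt[key] != row:
            mismatches.append(("differs", row, tgt[key]))
    for key, row in tgt.items():
        if key not in src:
            mismatches.append(("missing_in_source", row))
    return mismatches
```

Dictionary lookups make each 500K-vs-500K page comparison a single pass over each side, which is the speedup over record-by-record looping that the post is after.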
|
g***l 发帖数: 18555 | 10 This kind of thing would make the professionals die laughing. |
u***t 发帖数: 3986 | 11 I have yet to see a solution from you, pro!
【在 g***l 的大作中提到】: This kind of thing would make the professionals die laughing.
|
w*r 发帖数: 2421 | 12 Totally confused... source and target are in different DB instances??? Then you
should consider loading them into the same DB instance and using the DB engine to do the
comparison.
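A minimal sketch of that idea, using SQLite as a stand-in DB engine (the thread never names the actual databases, and the two-column schema is invented for illustration): load both sides into one instance and let a set operation find the differences.

```python
import sqlite3

def db_compare(source_rows, target_rows):
    """Load both row sets into one DB instance and use EXCEPT to find
    rows present on one side but not the other."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE source (id INTEGER, val TEXT)")
    con.execute("CREATE TABLE target (id INTEGER, val TEXT)")
    con.executemany("INSERT INTO source VALUES (?, ?)", source_rows)
    con.executemany("INSERT INTO target VALUES (?, ?)", target_rows)
    only_in_source = con.execute(
        "SELECT * FROM source EXCEPT SELECT * FROM target ORDER BY id"
    ).fetchall()
    only_in_target = con.execute(
        "SELECT * FROM target EXCEPT SELECT * FROM source ORDER BY id"
    ).fetchall()
    con.close()
    return only_in_source, only_in_target
```

Once both tables live in the same engine, the entire 50-million-row comparison is two queries, with no Excel files and no client-side paging at all.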
|
u***t 发帖数: 3986 | 13 Use the same queries from the offshore developers to clone a second DB schema for
testing purposes; would that work?!
|
g***l 发帖数: 18555 | 14 Just take a BACKUP, ZIP it, download it, and RESTORE it; done. Instead you want Excel and importing multiple files, with errors popping up here and there. Laughable.
|
u***t 发帖数: 3986 | 15 The Dev queries are based on business rules.
The Testing queries are also based on business rules.
My point is that using the Dev queries doesn't serve the purpose of data validation.
|
B*****g 发帖数: 34098 | 16 Can you write in Chinese?
|
g***l 发帖数: 18555 | 17 What isn't based on business rules? We're talking about data here. How can you use Excel for bulk data transfer?
Excel is not a database, and multiple files are just asking for trouble.
|