Generate huge reports from BigQuery efficiently


In this blog, we will learn how to generate huge reports from Bigquery efficiently.

BigQuery is a fully managed enterprise data warehouse that helps us to manage and analyze our data. Its serverless architecture allows us to use SQL queries to fetch desired results from the databases which have many tables. It enables us to store massive amounts of data in the tables with ease. This data can help us analyze better and make decisions precisely.

Retrieving huge amounts of data in order to be viewed and analyzed can be done in two different methods. Both ways require us to first fetch our desired data through queries.

Write CSV File

In the first method, after the query is executed successfully and the data is generated. We can write this data in a file preferably a CSV or an XLSX file. This file can be parsed easily for any report generation. Since we are writing data in files in our system it is an expensive operation because it includes I/O operation. So we will have to wait for all the data to be written in the file and then only can we generate a report.

Below is the code to write data in a CSV file:

public static async Task WriteBigQueryDataToCsvFile()
        {
            GetQueryResultsOptions getQueryResultsOptions = new GetQueryResultsOptions();
            getQueryResultsOptions.PageSize = 25000;
            getQueryResultsOptions.StartIndex = 0;
            getQueryResultsOptions.PageToken = null;
            string query = "[insert your select query here]";
            using(var client = await BigQueryClient.CreateAsync("EnterPackageIDHere", credentials))
            {
                BigQueryResults result = await client.ExecuteQueryAsync(query, null);
                using (StreamWriter csvWriter = new StreamWriter("C:\\Windows\\Temp\\SampleReportName.csv", true))
                {
                    var page = result.ReadPage(25000);

                    do
                    {
                        foreach (BigQueryRow row in page.Rows)
                        {
                            foreach (var data in row.RawRow.F)
                            {
                                csvWriter.Write(data.V.ToString());
                            }
                            csvWriter.Write(csvWriter.NewLine);
                        }
                        getQueryResultsOptions.PageToken = page.NextPageToken;
                        getQueryResultsOptions.StartIndex = null;
                        result = client.GetQueryResults(page.JobReference, getQueryResultsOptions);
                        page = result.ReadPage(25000);

                    } while (getQueryResultsOptions.PageToken != null);
                }
}
        }

Export Job

While the second method involves using an export job provided by Google Cloud Platform (GCP), this job helps in exporting data that we queried into several formats including comma-separated file (CSV) format. BigQuery makes it possible to export up to 1GB of data into a single file. If we are exporting more than 1 GB of data, then the data gets split into multiple files, the size of which may vary. This method is faster than the first because it avoids the expensive I/O operation and thus saves time to generate CSV files. These CSV files then can be downloaded and be used for report generation.

Below is the code for the export job:

public static async Task ExportDataToGCSSampleFunction()
        {
            string query = "Select * From SampleTable";
            using (var client = await BigQueryClient.CreateAsync("SamplePackageId", credentials))
            {
                BigQueryResults result = await client.ExecuteQueryAsync(query, null);
                String filePath = "gs://SampleGCSBucket/";

                BigQueryJob bqJob = client.CreateExtractJob(result.TableReference, filePath+ "TableResult.csv");
                BigQueryJob bqJobStatus = await bqJob.PollUntilCompletedAsync();
            }
        }

That’s it for this blog. Hope it helped you to generate reports efficiently.

Leave A Comment

Your email address will not be published. Required fields are marked *