Source: http://www.a1qa.com/blog/testing-big-data-three-fundamental-components

Testing Big Data: three fundamental components

15 July 2014, article by a1qa

Big Data is a big topic in software development today, and quality assurance consulting is no exception. In practice, software testers may not yet fully understand what exactly Big Data is. What testers do know is that they need a plan for testing it.

The problem is the lack of a clear understanding of what to test and how deep a tester should go. Some key questions must be answered before going down this path. Since most Big Data lacks a traditional structure, what does Big Data quality look like? And what are the most appropriate software testing tools?

As a software tester, it is imperative to first have a clear definition of Big Data. Many of us wrongly believe that Big Data is just a large amount of information. For example, a 2-petabyte Oracle database alone does not constitute a Big Data situation, just a high-load one. To be precise, Big Data is a set of approaches, tools and methods for processing high volumes of structured and, most importantly, unstructured data. The key difference between Big Data and "ordinary" high-load systems is the ability to create flexible queries.

The Big Data trend first appeared five years ago in the U.S., when researchers from Google announced their achievement in the scientific journal Nature. Without relying on the results of medical tests, they were able to track the spread of flu in the U.S. by analyzing the number of Google search queries related to influenza-like illness.

Today, Big Data can be described by three "Vs": Volume, Variety and Velocity. In other words, you have to process an enormous amount of data of various formats at high speed. The processing of Big Data, and therefore its software testing process, can be split into three basic components.

The process is illustrated below by an example based on the open source Apache Hadoop software framework:

1. Uploading the initial data to the Hadoop Distributed File System (HDFS).
2. Execution of Map-Reduce operations.
3. Rolling out the output results from the HDFS.

Uploading the initial data to HDFS

In this first step, the data is retrieved from various sources (social media, web logs, social networks etc.) and uploaded to the HDFS, being split into multiple files:

- Verify that the required data was extracted from the original system and that there was no data corruption.
- Validate that the data files were uploaded to the HDFS correctly.
- Check that the files were partitioned and copied to different data units.
- Determine the most complete set of data that needs to be checked.

For a step-by-step validation, you can use tools such as Datameer, Talend or Informatica.

Execution of Map-Reduce operations

In this step, you process the initial data using a Map-Reduce operation to obtain the desired result. Map-Reduce is a data processing concept for condensing large volumes of data into useful aggregated results:

- Check the required business logic on a standalone unit and then on the set of units.
- Validate the Map-Reduce process to ensure that the "key-value" pairs are generated correctly.
- Check the aggregation and consolidation of data after performing the "reduce" operation.
- Compare the output data with the initial files to make sure that the output file was generated and that its format meets all the requirements.

The most appropriate tool for verifying the data is Hive. Testers prepare requests in the Hive (SQL-style) Query Language (HQL) and send them to HBase to verify that the output complies with the requirements. HBase is a NoSQL database that can serve as the input and output for Map-Reduce jobs.
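The checks in this step can be tried out on a single machine before they are run against a cluster. The sketch below is not Hadoop code: it imitates the map, shuffle and reduce phases in memory for a hypothetical page-hit count, then verifies the key-value pairs and compares the aggregated result with an independent calculation. The function names (`map_record`, `shuffle`, `reduce_group`, `run_job`) and the sample log lines are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical input: one web-log line per record, "<user> <page>".
LOG_LINES = [
    "alice /home",
    "bob /home",
    "alice /cart",
    "carol /home",
    "bob /cart",
]


def map_record(line):
    """Map phase: emit one (page, 1) pair per log line."""
    _user, page = line.split()
    return (page, 1)


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_group(key, values):
    """Reduce phase: aggregate all values that share a key."""
    return (key, sum(values))


def run_job(lines):
    """Run the three phases and return both the raw pairs and the result."""
    pairs = [map_record(line) for line in lines]
    grouped = shuffle(pairs)
    result = dict(reduce_group(k, v) for k, v in grouped.items())
    return pairs, result


pairs, result = run_job(LOG_LINES)

# Check 1: every input record produced exactly one well-formed key-value pair.
assert len(pairs) == len(LOG_LINES)
assert all(isinstance(k, str) and v == 1 for k, v in pairs)

# Check 2: the aggregated output matches an independent calculation.
expected = Counter(line.split()[1] for line in LOG_LINES)
assert result == dict(expected)

print(result)
```

Keeping the raw pairs alongside the final result lets a test fail at the phase where the defect was introduced, rather than only at the end of the job.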

You can also use other Big Data processing frameworks as an alternative to Map-Reduce. Spark and Storm are good examples of substitutes for this programming model, as they provide similar functionality and are compatible with the Hadoop ecosystem.

Rolling out the output results from HDFS

This final step includes unloading the data that was generated by the second step and loading it into the downstream system, which may be a repository for data to generate reports or a transactional analysis system for further processing:

- Conduct inspection of data aggregation to make sure that the data has been loaded into the required system and was not distorted.
- Validate that the reports include all the required data, and that all indicators refer to concrete measures and are displayed correctly.
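One way to check that the data was loaded without distortion is to reconcile the job output against what the downstream system actually holds. The following sketch assumes a hypothetical `page_hits` table and uses an in-memory SQLite database purely as a stand-in for the downstream system; the record layout and the helper names `load_downstream` and `reconcile` are illustrative assumptions, not part of any particular tool.

```python
import sqlite3

# Hypothetical output of the Map-Reduce step: (page, hits) records.
JOB_OUTPUT = [("/home", 3), ("/cart", 2), ("/help", 1)]


def load_downstream(rows):
    """Stand-in for the downstream system: an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_hits (page TEXT PRIMARY KEY, hits INTEGER)")
    conn.executemany("INSERT INTO page_hits VALUES (?, ?)", rows)
    conn.commit()
    return conn


def reconcile(job_output, conn):
    """Return a list of human-readable differences between the two sides."""
    expected = dict(job_output)
    actual = dict(conn.execute("SELECT page, hits FROM page_hits"))
    problems = []
    for page in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing downstream: {page}")
    for page in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected downstream: {page}")
    for page in sorted(expected.keys() & actual.keys()):
        if expected[page] != actual[page]:
            problems.append(
                f"value mismatch for {page}: {expected[page]} != {actual[page]}"
            )
    return problems


# A faithful load produces no differences.
clean = load_downstream(JOB_OUTPUT)
assert reconcile(JOB_OUTPUT, clean) == []

# A distorted load (one row dropped, one value changed) is reported.
distorted = load_downstream([("/home", 3), ("/cart", 20)])
print(reconcile(JOB_OUTPUT, distorted))
```

Reporting missing, unexpected and mismatched records separately makes it easier to tell an incomplete load from a transformation defect.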

Test data for a Big Data project can be obtained in two ways: by copying actual production data or by creating data exclusively for testing purposes. Software testers prefer the former, because the conditions are as realistic as possible and it becomes easier to work with a larger number of test scenarios. However, not all companies are willing to provide real data, since they prefer to keep some information confidential. In this case, you must create test data yourself or request artificial data. The main drawback of this scenario is that artificial business scenarios created from limited data inevitably restrict testing, so some defects can be detected only by real users.
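When production data is not available, a small generator can produce artificial records that are at least repeatable from run to run. The sketch below assumes a hypothetical web-log record with `user_id`, `page`, `response_ms` and `sequence` fields; the field names, value ranges and edge cases are illustrative assumptions and would need to be replaced with those of the system under test.

```python
import random

PAGES = ["/home", "/cart", "/help", "/checkout"]  # hypothetical page names


def make_log_records(count, seed=42):
    """Generate reproducible synthetic web-log records for testing."""
    rng = random.Random(seed)  # private generator keeps runs repeatable
    records = []
    for i in range(count):
        records.append(
            {
                "user_id": f"user{rng.randint(1, 50):03d}",
                "page": rng.choice(PAGES),
                "response_ms": rng.randint(5, 2000),
                "sequence": i,
            }
        )
    return records


def add_edge_cases(records):
    """Append records that real data may contain but random data rarely does."""
    edge_cases = [
        # Empty user identifier and zero response time.
        {"user_id": "", "page": "/home", "response_ms": 0, "sequence": -1},
        # Non-ASCII characters in the page name.
        {"user_id": "user001", "page": "/café", "response_ms": 2000, "sequence": -1},
    ]
    return records + edge_cases


data = add_edge_cases(make_log_records(1000))
print(len(data), data[0])
```

A fixed seed means a failing test can be reproduced with exactly the same records, which partly offsets the limited realism of artificial data.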

As speed is one of Big Data's main characteristics, performance testing is mandatory. A huge volume of data and an infrastructure similar to the production infrastructure are usually created for performance testing. Where acceptable, data is copied directly from production.

To determine the performance metrics and to detect errors, you can use, for instance, a Hadoop performance monitoring tool. Performance testing tracks fixed indicators such as operating time and capacity, as well as system-level metrics such as memory usage.
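The indicators named above can be illustrated on a small scale with Python's standard library. The sketch below measures elapsed time, peak memory and throughput for a hypothetical `count_hits` workload running in a single local process; it is only an illustration of the metrics, not a replacement for cluster-level monitoring, and the helper name `measure` and the sample workload are assumptions.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func once and report elapsed time and peak Python memory usage."""
    tracemalloc.start()
    started = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - started
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, {"seconds": elapsed, "peak_bytes": peak}


def count_hits(records):
    """Hypothetical workload: count records per page."""
    counts = {}
    for page in records:
        counts[page] = counts.get(page, 0) + 1
    return counts


records = ["/home", "/cart", "/help"] * 100_000
result, metrics = measure(count_hits, records)

# Throughput is derived from the record count and the operating time.
throughput = len(records) / metrics["seconds"]
print(
    f"{metrics['seconds']:.3f} s, {metrics['peak_bytes']} bytes peak, "
    f"{throughput:,.0f} records/s"
)
```

Note that `tracemalloc` only tracks memory allocated by the Python interpreter and slows execution, so the absolute numbers are useful mainly for comparing runs of the same test.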

To be successful, Big Data testers have to learn the components of the Big Data ecosystem from scratch. Since the market has not yet created fully automated testing tools for Big Data validation, the tester has no option but to acquire the same skill set as the Big Data developer for working with Big Data technologies like Hadoop. This requires a tremendous mindset shift for testers as well as for testing units within organizations. To stay competitive, companies should invest in Big Data-specific training and in developing automation solutions for Big Data validation.

In conclusion, Big Data processing holds much promise for today’s businesses. If you apply the right test strategies and follow best practices, you will improve Big Data testing quality, which will help to identify defects in early stages and reduce overall cost.

You can also read the article on Computer Technology Review.
