8 Interesting facts about data science

According to We Are Social's comprehensive 2016 study of digital, social and mobile usage around the world, of the planet's 7.4 billion inhabitants, 3.4 billion are web users, 2.3 billion are active social media users, and 3.8 billion are unique mobile users. The zettabytes of information these users produce keep growing. Handling information on that scale has pushed data analytics to evolve into a separate field: data science. The new field is still taking shape. It is full of contradictions and provokes debate about its aim, its subject, its form and even its proper name. Its usefulness, however, is less and less in doubt.

Let’s leave our paper tasks to assignment help for a moment, become data scientists for a short while, and explore some interesting facts about data science.

1- Data science needs autonomy of scientists to achieve results

If you simply analyze information, you can easily be mistaken for a statistician. There is a major debate over whether data science is just a part of statistics or a separate discipline that takes a more sophisticated approach and analyzes inputs from a wider range of sources. Please put “+” in the comment field if you think it is an independent subject, or “–” if you consider it a part of statistics, cybernetics or any other science. Your general comments, views and ideas on this dispute are also welcome.

On a more practical level, there is no shared vision either. Who are the data scientists? Should an enterprise have a separate department for data science and analytics, and how would it help business operations?

The famous Harvard Business Review article “Data Scientist: The Sexiest Job of the 21st Century” gives a notion of what skills a data scientist should have and how much value he or she can add if given the right amount of autonomy. The article was published in 2012 and caused a big stir. Its statements and conclusions remain valid today. The market is still in great need of creative, passionate people who are skilled in math, physics, computer science, programming and the specific area they work in. To top it all, they have to burn with a creative flame and be doers rather than just consultants.

When you have such a player on your team, it is crucial to untie his hands to get practical results. In this respect, I like the example of LinkedIn’s “People You May Know.” A scientist rarely knows in advance what information the analysis will yield and how it may be used to benefit the business. That was the case with Jonathan Goldman at LinkedIn. His findings and ideas often met unproductive criticism and rejection. Had the CEO not drawn on his own experience and taken a favorable attitude that empowered his data scientist, users would not have gained a very useful feature, and the company would not have gained a drastic increase in page views.

By the way, the Harvard Business Review article mentioned above brought the term “data scientist” into wide use, naming the profession the sexiest job of the 21st century and stating that a shortage of data scientists could spell trouble for some business areas in the near term.

2- Relatively free but not a freelancer

From today’s perspective, it is extremely hard to work in data science and analytics as a freelancer. The profession requires constant interaction with the business from the inside. A data scientist is personally involved in bringing innovation to the company’s products, customer handling, the efficiency of internal processes, etc. He does not just give advice or show smart presentations. He is among the decision makers.

To do this, a data scientist has to gather information from different departments, communicate with people at various levels of the enterprise's hierarchy and process sensitive business information. Data analysts usually work under an NDA. But even then, companies seldom trust people from outside. When in need, they prefer to hire consulting agencies rather than deal with self-employed individuals.

3- Who are the judges?

A data scientist works with the firm's most precious information, which is not intended for third parties. What, then, should this specialist present in a CV to show his level of expertise and achievements? There are no standard criteria and no single methodology for measuring a scientist's skills. All such evaluations are rather intuitive, and they often amount to self-promotion that is hard to verify or to ambiguous judgments. The situation is complicated by the fact that two scientists using the same set of tools will produce results of different quality.

Employers use scales from 0 to 10 for each skill stated in a job offer. This makes the evaluation more quantitative, but it does not remove the subjectivity. You may also try to chart your skills on a graph like the one shown below, which can help you mark the areas you need to develop.

There is no trusted expert certification for assessing proficiency in data science, as PMP does for project management, for example. However, platforms such as Kaggle, KDD Cup and DrivenData organize data analysis competitions. They take real-life projects provided by existing companies. Participants submit solutions to the tasks and are ranked by how effective those solutions are. Competition in these contests is fierce, and it is hard to get into the top 30. That is why scientists who rank highly, on Kaggle for example, are recognized as skilled professionals in the field. This might be one more reason to participate.

4- Big Data is watching you

Every beginner in data science looks at the growing figures for information generation and consumption around the world and immediately thinks of the Big Data technologies he should master. Hadoop, Spark, MongoDB, Talend and the other names of Big Data tools could demoralize even the most persistent scientist. In practice, not many firms need to analyze terabytes of information at once. Even when they do, most of the input can be split into smaller aggregated parts and analyzed with ordinary tools or packages such as R, Python, XPath, etc.
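
To make the idea of splitting the input into smaller parts concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the column names, the sample records and the helper names `iter_chunks` and `chunked_mean` are hypothetical, and a real project would read from an actual file rather than an in-memory string.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean(rows, column, chunk_size=1000):
    """Average one numeric column while holding only one chunk in memory."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(rows, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical sample standing in for a file too large to load at once.
sample = io.StringIO("user_id,amount\n1,10.0\n2,20.0\n3,30.0\n4,40.0\n")
reader = csv.DictReader(sample)
print(chunked_mean(reader, "amount", chunk_size=2))  # prints 25.0
```

Only running totals are kept between chunks, so memory use depends on the chunk size rather than on the size of the whole data set.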

As they say, the future is already arriving today. In the next five to ten years, the workaround mentioned above may no longer be the most convenient way to do analysis.

Forrester’s TechRadar, using its own methodology, predicts that “the new, evolving tools to manage big data will grow at double-digit rates.” Its analysts evaluated the 10 most promising big data analytics technologies, judging each by its current stage of development and the time it will take to reach the next one. It really might be time to get acquainted with Big Data tools.

5- Data are nothing

Data science and analytics are rich in disputes. One of them concerns the DIKW model (data, information, knowledge, wisdom), which is part of the information management concept and forms the theoretical basis of data science. At the base of the model stands data. Because data are always raw and unstructured, they cannot be used as they are and, in this respect, are deemed to be nothing. The scientist can make decisions only once data have passed through the DIKW stages. First they gain context, or, in the words of scientist G. Harmon, “the empirical perception,” and become information. Then, with our experience, information gains structure and value and becomes knowledge. Finally, the primary information turns into facts and becomes wisdom.

Some opponents argue that no hierarchy of data, information, knowledge and wisdom can exist (R. Capurro). Data is a perplexity, information is the representation of meaning, knowledge is a selection of meaning based on communication, and wisdom is a nonmaterial category, so it is not correct to arrange these notions in a logical structure. Others state that information and knowledge are both knowledge, and that wisdom is the use of knowledge (M. Fricke).

6- Scientists do experiments

With the vast number of tools and technologies available for data science and analytics, an expert can handle calculation errors relatively easily, whereas poor experimental design is a costly mistake. Correcting design defects after the fact is laborious. The scientist should always keep this in mind and actively design the experiment.

Experimental design begins with exploratory data analysis, which aims to find mistakes, verify assumptions and explore relationships between variables. Then a model-based test is conducted. The problem with models is that they rest on assumptions. Mathematically ideal as they are, they may stand rather far from the real state of things, and the greater the distance between reality and assumption, the higher the chance that the experiment will fail. An important routine for a data scientist is therefore to assess how uncertain each assumption is.
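
As a rough illustration of the exploratory step, the Python sketch below flags suspicious values and measures the relationship between two variables. Everything in it is an assumption made for the example: the records are invented, the valid age range of 0 to 120 is arbitrary, and the helper names `find_invalid` and `pearson` are hypothetical. It uses the Pearson correlation as one possible way to explore a relationship, not as the only one.

```python
import math


def find_invalid(values, low, high):
    """Return indexes of entries that are missing or outside [low, high]."""
    return [i for i, v in enumerate(values)
            if v is None or not (low <= v <= high)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical records: customer age and monthly spend.
ages = [25, 31, None, 47, 230, 52]
spend = [20.0, 26.0, 18.0, 41.0, 35.0, 45.0]

# Step 1: find mistakes (a missing age and an impossible one).
bad = find_invalid(ages, low=0, high=120)
print("suspicious age rows:", bad)

# Step 2: explore the relationship on the remaining rows only.
clean = [(a, s) for i, (a, s) in enumerate(zip(ages, spend)) if i not in bad]
xs, ys = zip(*clean)
print("age/spend correlation:", round(pearson(xs, ys), 3))
```

Note that `pearson` divides by zero when either variable is constant, so a production version would need to guard against that case.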

7- Customer first

Let’s look at data science not literally as a science but as a service. The mass market delivers its products directly to the consumer’s door. This accessibility, among other factors, makes customers think about the personalization of the goods and services they receive. This awareness gives new value to the insights collected by data scientists. Consumers expect to be treated as if they were dealing with a family business, as if they knew their vendors well. Recent practice also shows that it is very often more cost-effective to retain existing clients than to attract and keep new ones. Data science can bring value to both sides.

Knowing your customer's preferences, likes and dislikes, frequent transactions and other relevant information is not a new approach. In 1994, an article named “Database Marketing” by J. Berry was published. The paper outlined future trends in data analytics using marketing as an example, but it had a slightly negative connotation. There was a feeling that companies were manipulative, intruded on private information and thought only about their profit while collecting consumer data. The statement that the gathered information was useless came as something of a relief: it arrived in such an unstructured format that decision makers could hardly have used it correctly.

With the new socially oriented business psychology taking shape today, and with the technologies now available for data analysis and interpretation, companies can not only benefit themselves but also build a partner relationship with their customers.

8- Talking about money

According to the IDC and EMC report, the volume of data generated around the world doubles every two years and will reach 40 trillion gigabytes by 2020. However, only a tiny part of it (0.5%) is actually analyzed or put to use. These facts make the data science field a delicious slice of the market pie.
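
To put those figures in perspective, the short Python calculation below works out what 0.5% of 40 trillion gigabytes amounts to. It only restates the numbers quoted above. The conversion to zettabytes assumes decimal units (1 ZB = 10^21 bytes), and the function name `analyzed_volume_gb` is hypothetical.

```python
from fractions import Fraction

TOTAL_GB = 40 * 10**12               # 40 trillion gigabytes, as quoted above
ANALYZED_SHARE = Fraction(5, 1000)   # 0.5%, kept exact to avoid rounding
GB_PER_ZB = 10**12                   # assumption: decimal units, 1 ZB = 10**21 bytes


def analyzed_volume_gb(total_gb, share):
    """Return the volume, in whole gigabytes, covered by the given share."""
    return int(total_gb * share)


analyzed = analyzed_volume_gb(TOTAL_GB, ANALYZED_SHARE)
print(f"total:     {TOTAL_GB // GB_PER_ZB} ZB")       # total:     40 ZB
print(f"analyzed:  {analyzed:,} GB")                  # analyzed:  200,000,000,000 GB
print(f"untouched: {TOTAL_GB - analyzed:,} GB")       # untouched: 39,800,000,000,000 GB
```

In other words, the analyzed share comes to about 200 billion gigabytes, while roughly 39.8 trillion gigabytes remain untouched.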

Another study, by Gartner Inc., shows that more and more companies are taking steps to incorporate data science operations into their business models: 73% of the organizations surveyed have invested or plan to invest in big data.

Wikibon.org’s research projects that big data will be a $50 billion business by 2017. And IBM's “Business Analytics and Optimization for the Intelligent Enterprise” review forecasts that companies can earn 20 times more profit and a 30% higher return with big data, provided they have a well-organized data analytics and development scenario.

Contributed by Alan Davis