This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Wikipedia is great partly because of its rules, made by many sharp people over time. One of these rules is no original research. Wikipedia is not a place to showcase new papers under the guise of citations, since citing a paper suggests that it is a reliable source. The fact that a paper cites other sources does not mean that it can itself be viewed as a secondary source for our purposes; what is wanted is citations one step removed, such as established journals, newspapers, and textbooks: impartial analysts looking objectively at primary sources. In this case, the citation added is a primary source (a PDF of a research paper); I don't see why it is in this article other than to promote this particular paper through search engine optimization.-- Tomwsulcer ( talk) 17:51, 3 February 2014 (UTC)
Headline-1: Big data: are we making a big mistake? March 28, 2014
[The science of 'statistical learning' and 'computer learning' can keep research on track.] — Charles Edwin Shipp ( talk) 01:48, 30 March 2014 (UTC)
Another editor is insisting that this article include a mention of "the Canadian Open Data Experience (CODE) Inspiration Day event held at the University of Waterloo Stratford Campus located in Stratford, Ontario" at which "renowned Data Scientist Hilary Mason spoke about Big Data." I don't see how this adds to a reader's understanding of this topic and remain convinced that it should be removed. Can other editors please comment or contribute to this discussion? Thanks! ElKevbo ( talk) 11:57, 7 April 2014 (UTC)
Running the Duplication Detector report reveals the following:
Comparing documents for duplicated text:
http://en.wikipedia.org/wiki/Big_data
http://www.stanfordlawreview.org/online/privacy-and-big-data/privacy-and-big-data
Downloaded document from http://en.wikipedia.org/wiki/Big_data (239986 characters, 7914 words)
Downloaded document from http://www.stanfordlawreview.org/online/privacy-and-big-data/privacy-and-big-data (71117 characters (UTF8), 5217 words)
Total match candidates found: 1202 (before eliminating redundant matches)
Please run the report itself to see which sentences are exact matches (about 6) and which are close paraphrases (about a dozen or more).
Peaceray ( talk) 02:51, 8 June 2014 (UTC)
I propose adding the CSS product designed by Lablanche & Company, which performs data compression and data encryption in one step. Is this possible?
SL — Preceding unsigned comment added by 90.50.49.149 ( talk) 16:30, 10 January 2015 (UTC)
The CSS product is of interest for data compression and encryption and can be sold to big companies and institutions. This product can generate tens of millions of dollars, so I think you should agree to include it on the Wikipedia big data page (just one sentence): "The start-up Lablanche & Company commercializes a prototype named CSS for big data visualization and data compression/encryption using the recent compressed sensing theory". SL — Preceding unsigned comment added by 90.50.49.149 ( talk) 04:02, 12 January 2015 (UTC)
This company is not unknown: its website is referenced on Google and on the Compressed Sensing Wikipedia page. Fgtyg78
What is your answer to this? Fgtyg78
There is no conflict of interest, because CSS is a unique software prototype for big data problems with an innovative compression/encryption mechanism. This prototype has no equivalent in big companies and institutions. So it must appear in the encyclopedia to inform people about the use of compressed sensing in big data.
Fgtyg78 — Preceding unsigned comment added by 90.50.49.149 ( talk) 11:01, 12 January 2015 (UTC)
This edit request to Big data has been answered.
Would it be possible to add a link to the e-Science page (/info/en/?search=E-Science) in the section discussing research applications of Big Data?
The e-Science page seems to provide more details on the examples mentioned on the Big Data page, and linking to it could eventually help avoid some duplication of material.
Mheikkurinen ( talk) 20:02, 29 January 2015 (UTC)
Technical 13 20:23, 29 January 2015 (UTC)
Hi. A few reverts back, the tone of the text changed entirely, and it now sounds like a poorly written magazine article. Can someone take the article back to a stable point, please? Rui ''Gabriel'' Correia ( talk) 08:10, 29 January 2015 (UTC)
I disagree with these edits, but am not going to engage in edit warring on a subject for which I have no passion. The most I could do would be to slap about 4 or 5 requests for clarification tags — which the editor would most likely remove, as he has twice done with the "tone" tags, without appreciating that these are there to help.
Rui ''Gabriel'' Correia ( talk) 14:34, 29 January 2015 (UTC)
Jugdev, I offered to help, yet you rejected my help, claiming you understood what you were doing. Apparently not. I completely understand what you are trying to do by presenting the most recent developments first, but it does not work that way. Let's look at the first sentence of Elephant, for example: "Elephants are large mammals of the family Elephantidae and the order Proboscidea." Now, if I want to add that the market for ivory is driving the African elephant to extinction, where do I add it? Before the original text (i.e., presenting the most recent information first, as you are doing with big data), or after the existing sentence? Let's take a look:
Which one (1. or 2.) is the more logical sequence?
That is ONE aspect of it. The other is the tone. The tone is going wrong precisely because you are swinging around the logical order of the bits of information. Which is why you need to add "Data has always been Big", otherwise the next sentence hangs, because segments like "aspect that differs now", "compared with the past", trigger in the reader a sense that something is missing. So you patched in the bit about "Data has always been Big" to cover it up (and plagiarised - read below).
You also claim to be familiar with the styleguide and have now had ample opportunity to analyse your edits to see if they comply. It amazes me, then, that you keep claiming that your edit is in line with the styleguide, and yet you have not picked up that there is a problem with "sheer scale" and "super efficient speed". This is partly because you just plagiarised the source, then changed or moved one or two words around: "Data has been “big” all along. What has changed now is not just scale and cross-channel inputs, but the sheer speed and accessibility of data".
Greetings. Rui ''Gabriel'' Correia ( talk) 11:00, 2 February 2015 (UTC)
I don't think it is very productive to have a whole team of editors monitoring one edit by one editor. We have done all that can be considered par for the course; we have pointed out what is deficient about the version the editor would like to use, all to no avail. If said editor cannot grasp a simple thing, such as not starting an article/lede on a minor sentence, then he needs a tutor. I don't know if appointing a tutor is foreseen in the mechanisms to deal with stubborn editors. If not, progressive blocking seems to be the only solution. Regrettably. Rui ''Gabriel'' Correia ( talk) 23:57, 5 February 2015 (UTC)
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Is the opening paragraph a good summary of the Big data topic? Narsil ( talk) 19:16, 2 February 2015 (UTC)
"Data has always been Big." is certainly unencyclopedic in tone, and such conversational style should be avoided. I much prefer the one-sentence version. The current formatting (with bold Big, Data and Big Data in the first, second and third sentences) is clearly not per the manual of style. I think the previous version, which starts "Big data is an all-encompassing", is better. It could possibly be improved by breaking down "difficult to process them using traditional data processing applications.", though with my limited familiarity with the topic I wouldn't be sure how. SPACKlick ( talk) 10:17, 3 February 2015 (UTC)
Thank you for your contribution. Firstly, the above overview by Narsil is a convenient version of events. Please refer to the talk page (pasted below for your convenience) for a more thorough overview and also my criticism of the changes made. In response to your comments, SPACKlick, I loosely agree with your concern regarding the formatting, but in my defense I feel that the formatting may help the article aesthetically by allowing users to identify keywords. However, I disagree with the first point, as the sentences in question happen to be a quotation from a well-regarded publication about Big Data. It is my understanding that the quote evokes a particular frame of mind/thought, which in turn allows the reader to begin grappling with the complex topic. I do not believe that we have enough evidence to completely revert the article. All we have seen is a critique of two sentences that happen to be from a publication that specialises in the subject in question.
Please see a summary of the talk page below: [quoted from User talk:Jugdev#Manual of style ]
Collapsed some material that was copied here from his user talk by User:Jugdev. EdJohnston ( talk) 04:09, 6 February 2015 (UTC)
The following discussion has been closed. Please do not modify it.
I've added a "tone" tag to the page to direct visitors to this discussion. User:Jugdev, please do not remove the tag. The tag is there to indicate that editors disagree about whether the tone is appropriate, and this disagreement clearly exists. Don't remove the tag just because you think the tone is good--we know that! honestly!--the tag is so other editors will come here and give their opinions. If they agree with you about your edits, then they'll say we should remove the tag, and this should get wrapped up sooner. But if you remove the tag yourself, this could be considered disruptive (per WP:DISRUPT) or even edit-warring. Narsil ( talk) 03:15, 4 February 2015 (UTC)
I disagree with the tone tag - just to repeat myself : the paragraph in question is a published quotation from a highly regarded title from the field of big data. -JG ( talk) 08:57, 4 February 2015 (UTC)
Jugdev, writing for Wikipedia is not about collating quotes. It is about making sense of information found in reliable sources, conveying that information in your own words in an encyclopaedic style, and citing the sources consulted. If you are going to use quotations, this must be done sparingly, where applicable and justifiable, but not as the opening of a lede or article. Regards, Rui ''Gabriel'' Correia ( talk) 10:58, 6 February 2015 (UTC)
Rui, I've been told that nothing encapsulates the essence of a debated topic the way a published quotation does. I will find one that's fit for our purpose. -JG ( talk) 11:14, 6 February 2015 (UTC)
The trend to larger data sets equates to additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, prevent diseases, combat crime and so on."
This is badly written. I tried to correct it, but I don't understand what it's trying to say well enough to repair the English.
SPACKlick ( talk) 17:05, 10 February 2015 (UTC)
"When a large amount of related data is in a single data set, it is possible to derive information from it that could not be derived from an equivalent amount of data in smaller data sets. This allows correlations to be found..." But that sentence isn't supported by the citation (in The Economist: Data, data everywhere), which just talks about the total amount of data, and not whether that data is in one data set or many. OTOH, as I read it, the original sentence has the same problem (that it's drawing a conclusion not supported by the source). So I'd just cut out the whole bit about numbers of data sets (one big vs many small), and change it to
The trend to larger data sets allows new correlations to be found to "spot business trends, prevent diseases, combat crime and so on." Narsil ( talk) 19:23, 10 February 2015 (UTC)
Previously the article stated that they filtered 99.999% of data. Upon reading the following thesis further, it looks like they filter more than that. I've updated things accordingly and thrown in what was surely a clumsy citation. Feel free to clean it up, and then delete this talk entry.
https://cds.cern.ch/record/1504817/files/CERN-THESIS-2013-004.pdf
L1 filtering: 40 MHz to ~60-65 kHz (so ~.015% of data retained). L2 filtering: 65 kHz to 6 kHz (so 10% of data retained). L3 filtering: 5-6 kHz to 500-600 Hz (so 10% of data retained). So 99.99995% of data was filtered. — Preceding unsigned comment added by 98.200.115.85 ( talk) 13:34, 24 March 2015 (UTC)
The current article no longer mentions volume, velocity, and variety as ways of characterizing Big Data. Why? I thought the combination was a good way to describe important aspects of Big Data. 108.212.231.175 ( talk) 15:45, 22 March 2014 (UTC) Mark Kerstetter
Variety - The next aspect of Big Data is its variety. This means that the category to which the data belongs is also an essential fact that data analysts need to know. This helps the people who are closely analyzing and working with the data to use it effectively to their advantage, thus upholding the importance of Big Data.
The link to Reference 4 is dead. — Preceding unsigned comment added by 97.119.162.121 ( talk) 23:55, 23 December 2015 (UTC)
Funny though it is, what is the ultimate source of the cartoon? Is it just a WP editor? WP:NOTBLOG... 121.103.176.27 ( talk) 08:05, 13 August 2014 (UTC)
And then a year later the illustration was removed by User:McGeddon with an unfounded allegation in the edit summary ("cut a Wikipedia editor's joke cartoon") and without any discussion here. An editor that only contributes deletions to this article. Great. :-( -- Atlasowa ( talk) 06:55, 3 March 2016 (UTC)
Please, let's go through Further reading section source by source here & not ax the entire section. I am an IT professional & I do not agree with HelpUsStopSpam that all the sources are spam. Please discuss each source first before removing it. Peaceray ( talk) 19:55, 16 September 2016 (UTC)
It appears to me that all of the three variants "Big Data", "Big data" and "big data" are used throughout the text. This should be cleaned up.
Grstein ( talk) 07:08, 16 April 2015 (UTC)
Suggested for inclusion at the bottom of the Big Data definition: the six characteristics of big biomedical and health data:
I found the Critical data studies article while doing new-page patrolling, and it seems to me that it is a rather narrow viewpoint, and should either be expanded with content from the Critique section of this article, or should even be merged into that section. I don't know much about the scholarship in this field, however, so I'm not confident to do this myself. -- Slashme ( talk) 08:59, 21 December 2016 (UTC)
Does the Big data article really need a photo of one of thousands of researchers, such as ... Danah Boyd??? April 2017. — Preceding unsigned comment added by 223.197.149.174 ( talk) 12:22, 13 April 2017 (UTC)
This may or may not be usable, but it is the real history, put together from old slides after Steve Lohr wrote in the NY Times and meetup groups wanted talks. See Big Data - Yesterday Today and Tomorrow (slides) or Big Data - Yesterday, Today and Tomorrow (video at Stanford). Slide 16 shows the use in "Hardware, Wetware, Software", the general-purpose talk I used in 1994-1996 (and maybe a bit during 1993), which was captured on video by University Video Communications in 1996. It was the opening keynote for the TRI-Ada conference, November 1995.
For a few years, most of the "Big Data" use was in my talks. By 1996, it was part of external marketing. Slide 15 shows "Big Data" as part of the SGI booth at SC'96, the supercomputing conference in Pittsburgh. Slides 23-25 show sample slides from 1997, "Big Advantages from Big Tools for Big Data". JohnMashey ( talk) 06:20, 19 May 2017 (UTC)
There seems to be a lot of the same information used within Big Data and Data lake. Are these topics the same? And are any of these different from a data warehouse? -jim 07:45, 5 July 2017 (UTC) — Preceding unsigned comment added by Jwilleke ( talk • contribs)
The "information technology" section probably should be reorganized. Retail and real estate are not subdivisions of information technology.
There probably should be a "finance" section. See alternative data and surveillance capitalism, two not very good articles which relate to that. Crunching on lots of data for finance purposes is a very real activity. Is it covered elsewhere on Wikipedia? Technical analysis covers crunching on pure financial data, but lately there's a trend towards looking at miscellaneous data outside the financial markets for financial purposes. John Nagle ( talk) 20:00, 11 August 2017 (UTC)
The texts on U.S., India, and UK were copied from https://www.ijedr.org/papers/IJEDR1504022.pdf. Copyright issue?-- K3vinvmp ( talk) 19:49, 12 September 2016 (UTC)
This edit request has been answered.
The result of the move request was: not moved ( closed by non-admin page mover) SITH (talk) 14:28, 5 February 2019 (UTC)
Big data → Big Data – It is a name, not just a normal sentence. I can't move the page myself, so this is more of a move request than a discussion.
Bageense ( talk) 13:36, 29 January 2019 (UTC)
Eventually this article will need a history section. I am not sure what that will look like. Here is one treatment of the subject -
Blue Rasberry (talk) 15:30, 5 August 2019 (UTC)
I don't know if this article was ever good enough to deserve a B rating, but it seems to have deteriorated into a lot of jargon that is not well cited. If there are no objections in the next little while, I'm downgrading it to class C. Ethanpet113 ( talk) 22:48, 28 August 2020 (UTC)
This topic describes Big Data as a field of science. Big Data is an object, the field is Big Data Analysis or Big Data Analytics. Big Data Analytics links to this page but maybe if they were two separate pages it would be easier to improve both pages.
softwaretestwriter ( talk) 00:27, 20 January 2021 (UTC) NmuoMmiri
No need for a separate article - add to existing section Pam D 08:34, 2 November 2021 (UTC)
Veracity is also valid outside the big data context. See the recent ISWC 2021 reference that motivated me to create the article. I didn't even know that the term also exists in the context of big data. I also think it is too early to say whether merging makes sense; IMHO it depends on how the article grows. -- WolfgangFahl ( talk) 11:08, 2 November 2021 (UTC)
![]() | This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
Archive 1 | Archive 2 | Archive 3 |
Wikipedia is great partly because of its rules, made by many sharp people over time. One of these rules is no original research; Wikipedia is not a place to showcase new papers under the guise of citations, since it suggests the new paper is a reliable source; the fact that it cites other sources does not mean that it itself can be viewed as a secondary source for our purposes; what is wanted is citations one-source-removed, such as established journals, newspapers, textbooks -- impartial analysts, looking objectively at primary sources. In this case, the citation added is a primary source -- a pdf file of a research paper; don't see why it is in this article other than to promote this particular paper using search engine optimization.-- Tomwsulcer ( talk) 17:51, 3 February 2014 (UTC)
Headine-1: Big data: are we making a big mistake? March 28, 2014
[The science of 'statistical learning' and 'computer learning' can keep research on track.] — Charles Edwin Shipp ( talk) 01:48, 30 March 2014 (UTC)
Another editor is insisting that this article include a mention of "the Canadian Open Data Experience (CODE) Inspiration Day event held at the University of Waterloo Stratford Campus located in Stratford, Ontario" at which "renowned Data Scientist Hilary Mason spoke about Big Data." I don't see how this adds to a reader's understanding of this topic and remain convinced that it should be removed. Can other editors please comment or contribute to this discussion? Thanks! ElKevbo ( talk) 11:57, 7 April 2014 (UTC)
Running the Duplication Detector report reveals the following:
Comparing documents for duplicated text:
http://en.wikipedia.org/wiki/Big_data
http://www.stanfordlawreview.org/online/privacy-and-big-data/privacy-and-big-dataDownloaded document from http://en.wikipedia.org/wiki/Big_data (239986 characters, 7914 words)
Downloaded document from http://www.stanfordlawreview.org/online/privacy-and-big-data/privacy-and-big-data (71117 characters (UTF8), 5217 words)
Total match candidates found: 1202 (before eliminating redundant matches)
Please run the report itself to see which sentences are a exact match (about 6) & which are close paraphrases (about a dozen or more).
Peaceray (
talk)
02:51, 8 June 2014 (UTC)
I proposed to add the CSS product designed by Lablanche & Company for data compression and data encryption in one step. It is possible?
SL — Preceding unsigned comment added by 90.50.49.149 ( talk) 16:30, 10 January 2015 (UTC)
The CSS product has an interest for data compression and encryption and can be sold to big companies and institutions. This product can generate ten of millions of dollards, so i think you should accept to include it in the wilkipedia big data page (just one sentence). "The start-up Lablanche & Company commercializes a prototype named CSS for big data vizualization and data compression/encryption using the recent compressed sensing theory". SL — Preceding unsigned comment added by 90.50.49.149 ( talk) 04:02, 12 January 2015 (UTC)
This company is not unknown because its website is referenced on Google and on the Compressed Sensing wilkipedia page. Fgtyg78
What do you have to answer to this? Fgtyg78
There is not conflict of interest because CSS is a unique software prototype for big data problems with an innovative mechanism of compression/encryption. This prototype has no equivalent in big companies and institutions. So this thing must appear in encyclopedy to inform people on the use of compressed sensing in big data.
Fgtyg78 — Preceding unsigned comment added by 90.50.49.149 ( talk) 11:01, 12 January 2015 (UTC)
![]() | This
edit request to
Big data has been answered. Set the |answered= or |ans= parameter to no to reactivate your request. |
Would it be possible to add link to e-Science page ( /info/en/?search=E-Science) in the section discussing research applications of Big Data?
The e-Science page seems to provide more details of the examples mentioned on the Big Data one, and this linking could eventually motivate avoiding some duplication of the material.
Mheikkurinen ( talk) 20:02, 29 January 2015 (UTC)
{{U|
Technical 13}} (
e •
t •
c)
20:23, 29 January 2015 (UTC)Hi. A few reverts back the tone of the text changed entirely and now sounds like a poorly written magazine article. Can someone take the article back to a stable point please? Rui ''Gabriel'' Correia ( talk) 08:10, 29 January 2015 (UTC)
I disagree with these edits, but am not going to engage in edit warring on a subject for which I have no passion. The most I could do would be to slap about 4 or 5 requests for clarification tags — which the editor would most likely remove, as he has twice done with the "tone" tags, without appreciating that these are there to help.
Rui ''Gabriel'' Correia ( talk) 14:34, 29 January 2015 (UTC)
Jugdev. I offered to help, yet you rejected my help, claiming you understood what you were doing. Apparently not. I completely understand what you are trying to do by presenting the most recent developments first, but it does not work that way. Let's look at the first sentence of Elephant, for example: "Elephants are large mammals of the family Elephantidae and the order Proboscidea." Now, if I want to add that the market for ivory is driving the African elephant to extinction, where do I add this? Before the original text - i.e., present the most recent information as you are doing with big data, or after the existing sentence? Let's take a look:
Which one (1. or 2.) is a most logical sequence?
That is ONE aspect of it. The other is the tone. The tone is going wrong precisely because you are swinging around the logical order of the bits of information. Which is why you need to add "Data has always been Big", otherwise the next sentence hangs, because segments like "aspect that differs now", "compared with the past", trigger in the reader a sense that something is missing. So you patched in the bit about "Data has always been Big" to cover it up (and plagiarised - read below).
You also claim to be familiar with the styleguide and have now had ample opportunity to analyse your edits to see if they comply. It amazes me then that you keep on claiming that you edit is in line with the styleguide and yet you have not yet picked up that there is a problem with "sheer scale" and "super efficient speed". This is partly because you just plagiarised the source, then changed or moved one or two words around "Data has been “big” all along. What has changed now is not just scale and cross-channel inputs, but the sheer speed and accessibility of data".
Greetings. Rui ''Gabriel'' Correia ( talk) 11:00, 2 February 2015 (UTC)
I don't think it is very productive to have a whole team of editors monitoring one edit by one editor. We have done all that can be considered par for the course, we have pointed out what is deficient about the version the editor would like to use, all to no avail. If said editor cannot grasp a simple thing, such as not starting an article/ lede on a minor sentence, then he needs a tutor. I don't know if appointing a tutor is foreseen in the mechanisms to deal with stubborn editors. If not, progressive blocking seems to be the only solution. Regretably. Rui ''Gabriel'' Correia ( talk) 23:57, 5 February 2015 (UTC)
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Is the opening paragraph a good summary of the Big data topic? Narsil ( talk) 19:16, 2 February 2015 (UTC)
Data has always been Big.is certainly unencyclopedic in tone and such conversational style should be avoided. I much prefer the one sentence version. The current formatting (with bold Big, Data and Big Data in the first, second and third sentences) is clearly not per manual of style. I think the previous version which starts
Big data is an all-encompassingis better although could possibly be improved by breaking down "difficult to process them using traditional data processing applications." though with my limited familiarity with the topic I wouldn't be sure how. SPACKlick ( talk) 10:17, 3 February 2015 (UTC)
Thank you for your contribution. Firstly, the above overview by Narsil is a convenient version of events. Please refer to the talk page (pasted below for your convenience) for a more thorough overview and also my criticism of the changes made. In response to your comments SPACKlick, I loosely agree with your concern regarding the formatting, and in defense feel that the formatting may help the article aesthetically by allowing users to identify keywords. I however disagree with the first point, as the sentences in question happen to be a quotation from a well regarded publication about Big Data. It is my understanding that the quote evokes a particular frame of mind/ thought, which in turn allows the reader to begin grappling with the complex topic. I do not believe that we have enough evidence to completely revert the article. All we have seen is a critique of two sentences that happen to be from a publication that specialise on the subject in question.
Please see a summary of the talk page below: [quoted from User talk:Jugdev#Manual of style ]
Collapsed some material that was copied here from his user talk by User:Jugdev. Click to view. EdJohnston ( talk) 04:09, 6 February 2015 (UTC) |
---|
The following discussion has been closed. Please do not modify it. |
|
I've added a "tone" tag to the page to direct visitors to this discussion. User:Jugdev, please do not remove the tag. The tag is there to indicate that editors disagree about whether the tone is appropriate, and this disagreement clearly exists. Don't remove the tag just because you think the tone is good--we know that! honestly!--the tag is so other editors will come here and give their opinions. If they agree with you about your edits, then they'll say we should remove the tag, and this should get wrapped up sooner. But if you remove the tag yourself, this could be considered disruptive (per WP:DISRUPT) or even edit-warring. Narsil ( talk) 03:15, 4 February 2015 (UTC)
I disagree with the tone tag - just to repeat myself : the paragraph in question is a published quotation from a highly regarded title from the field of big data. -JG ( talk) 08:57, 4 February 2015 (UTC)
Jugdev, writing for the Wikipedia is not about collating quotes. It is about making sense of information found in reliable sources, conveying the information found in the sources in your own words in an encyclopaedic style and citing the sources consulted. If you are going to use quotations, this must be done sparingly, where applicable and justifiable, but not as the opening of a lede or article. Regards, Rui ''Gabriel'' Correia ( talk) 10:58, 6 February 2015 (UTC)
Rui, I've been told the nothing encapsulates the essence of a debated topic the way a published quotation does. I will find one that's fit for our purpose. -JG ( talk) 11:14, 6 February 2015 (UTC)
The trend to larger data sets equates to additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, prevent diseases, combat crime and so on."
This is badly written I tried to correct it but I don't understand what it's trying to say enough to repair the English.
SPACKlick (
talk)
17:05, 10 February 2015 (UTC)
"When a large amount of related data is in a single data set, it is possible to derive information from it that could not be derived from an equivalent amount of data in smaller data sets. This allows correlations to be found..."But that sentence isn't supported by the citation (in The Economist: Data, data everywhere), which just talks about the total amount of data, and not whether that data is in one data set or many. OTOH, as I read it, the original sentence has the same problem (that it's drawing a conclusion not supported by the source). So I'd just cut out the whole bit about numbers of data sets (one big vs many small), and change it to
The trend to larger data sets allows new correlations to be found to "spot business trends, prevent diseases, combat crime and so on."Narsil ( talk) 19:23, 10 February 2015 (UTC)
Previously it was described as they filtered 99.999% of data. Upon reading further the following Thesis, it looks like they filter more than that. I've updated things accordingly and thrown in what was surely a clumsy citation. Feel free to clean it up, and then delete this talk entry.
https://cds.cern.ch/record/1504817/files/CERN-THESIS-2013-004.pdf
L1 filtering: 40 MHz to ~60–65 kHz (so ~.015% of data retained). L2 filtering: 65 kHz to 6 kHz (so 10% of data retained). L3 filtering: 5–6 kHz to 500–600 Hz (so 10% of data retained). So 99.99995% of data was filtered. — Preceding unsigned comment added by 98.200.115.85 ( talk) 13:34, 24 March 2015 (UTC)
The current article no longer mentions volume, velocity, and variety as ways of characterizing Big Data. Why? I thought the combination was a good way to describe important aspects of Big Data. 108.212.231.175 ( talk) 15:45, 22 March 2014 (UTC) Mark Kerstetter
Variety: The next aspect of Big Data is its variety. This means that the category to which the data belongs is also an essential fact that data analysts need to know. Knowing it helps the people who closely analyze and work with the data to use it effectively to their advantage, which underscores the importance of Big Data.
The link to Reference 4 is dead. — Preceding unsigned comment added by 97.119.162.121 ( talk) 23:55, 23 December 2015 (UTC)
Funny though it is, what is the ultimate source of the cartoon? Is it just a WP editor? WP:NOTBLOG... 121.103.176.27 ( talk) 08:05, 13 August 2014 (UTC)
And then a year later the illustration was removed by User:McGeddon with an unfounded allegation in the edit summary ("cut a Wikipedia editor's joke cartoon") and without any discussion here. An editor that only contributes deletions to this article. Great. :-( -- Atlasowa ( talk) 06:55, 3 March 2016 (UTC)
Please, let's go through Further reading section source by source here & not ax the entire section. I am an IT professional & I do not agree with HelpUsStopSpam that all the sources are spam. Please discuss each source first before removing it. Peaceray ( talk) 19:55, 16 September 2016 (UTC)
It appears to me that all three variants, "Big Data", "Big data" and "big data", are used throughout the text. This should be cleaned up.
Grstein ( talk) 07:08, 16 April 2015 (UTC)
Suggested for inclusion at the bottom of the Big Data definition: the six characteristics of Big Biomedical and Health Data:
I found the Critical data studies article while doing new-page patrolling, and it seems to me that it presents a rather narrow viewpoint. It should either be expanded with content from the Critique section of this article or even be merged into that section. I don't know much about the scholarship in this field, however, so I'm not confident enough to do this myself. -- Slashme ( talk) 08:59, 21 December 2016 (UTC)
Does the Big data article really need a photo of one of thousands of researchers, such as Danah Boyd? April 2017. — Preceding unsigned comment added by 223.197.149.174 ( talk) 12:22, 13 April 2017 (UTC)
This may or may not be usable, but it is the real history, put together from old slides after Steve Lohr wrote in the NY Times and meetup groups wanted talks. See Big Data - Yesterday, Today and Tomorrow (slides) or Big Data - Yesterday, Today and Tomorrow (video at Stanford). Slide 16 shows the use in "Hardware, Wetware, Software", the general-purpose talk I used in 1994–1996 (and maybe a bit during 1993), which was captured on video by University Video Communications in 1996. It was the opening keynote for the TRI-Ada conference, November 1995.
For a few years, most of the "Big Data" use was in my talks. By 1996, it was part of external marketing. Slide 15 shows "Big Data" as part of the SGI booth at SC'96, the supercomputing conference in Pittsburgh. Slides 23–25 show sample slides from 1997, "Big Advantages from Big Tools for Big Data". JohnMashey ( talk) 06:20, 19 May 2017 (UTC)
There seems to be a lot of the same information in Big Data and Data lake. Are these topics the same? And is either of them different from a data warehouse? -jim 07:45, 5 July 2017 (UTC) — Preceding unsigned comment added by Jwilleke ( talk • contribs)
The "information technology" section probably should be reorganized. Retail and real estate are not subdivisions of information technology.
There probably should be a "finance" section. See alternative data and surveillance capitalism, two not very good articles which relate to that. Crunching on lots of data for finance purposes is a very real activity. Is it covered elsewhere on Wikipedia? Technical analysis covers crunching on pure financial data, but lately there's a trend towards looking at miscellaneous data outside the financial markets for financial purposes. John Nagle ( talk) 20:00, 11 August 2017 (UTC)
The texts on U.S., India, and UK were copied from https://www.ijedr.org/papers/IJEDR1504022.pdf. Copyright issue?-- K3vinvmp ( talk) 19:49, 12 September 2016 (UTC)
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
The result of the move request was: not moved ( closed by non-admin page mover) SITH (talk) 14:28, 5 February 2019 (UTC)
Big data → Big Data – It is a name, not just an ordinary phrase. I can't move the page myself, so this is more of a move request than a discussion. Bageense ( talk) 13:36, 29 January 2019 (UTC)
Eventually this article will need a history section. I am not sure what that will look like. Here is one treatment of the subject:
Blue Rasberry (talk) 15:30, 5 August 2019 (UTC)
I don't know if this article was ever good enough to deserve a B rating, but it seems to have deteriorated into a lot of jargon that is not well cited. If there are no objections in the next little while, I'm downgrading it to class C. Ethanpet113 ( talk) 22:48, 28 August 2020 (UTC)
This topic describes Big Data as a field of science. Big Data is an object; the field is Big Data Analysis or Big Data Analytics. Big Data Analytics links to this page, but if they were two separate pages, it might be easier to improve both.
softwaretestwriter ( talk) 00:27, 20 January 2021 (UTC) NmuoMmiri
No need for a separate article; add it to the existing section. Pam D 08:34, 2 November 2021 (UTC)
Veracity is also valid outside the big data context. See the recent ISWC 2021 reference that was my motivation for creating the article. I didn't even know that the term also exists in the context of big data. I also think it is too early to say whether merging makes sense; IMHO it depends on how the article grows. -- WolfgangFahl ( talk) 11:08, 2 November 2021 (UTC)