Structured vs Unstructured Data 101: Top Guide | Datamation


By Christine Taylor
May 21, 2021

Structured data vs. unstructured data: structured data consists of clearly defined data types with patterns that make them easily searchable, while unstructured data (“everything else”) consists of data that is usually not as easily searchable, including formats like audio, video, and social media postings.

The “versus” in unstructured data vs. structured data does not denote conflict between the two. Customers select one or the other not based on their data structure, but on the applications that use them: relational databases for structured data, and almost any other type of application for unstructured data.

However, there is a growing tension between the ease of analysis on structured data and the more challenging analysis on unstructured data. Structured data analytics is a mature process and technology. Unstructured data analytics is a nascent industry with a lot of new investment in research and development, but it is not yet a mature technology. The structured vs. unstructured data issue within corporations is deciding whether they should invest in analytics for unstructured data, and whether it is possible to aggregate the two into better business intelligence.

What Is Structured Data?
Structured data usually resides in relational databases (RDBMS). Fields store length-delineated data like phone numbers, Social Security numbers, or ZIP codes. Records may also contain text strings of variable length, like names, and these remain simple to search. Data may be human- or machine-generated, as long as it is created within an RDBMS structure. This format is eminently searchable, both with human-generated queries and via algorithms that use data types and field names, such as alphabetical or numeric, currency, or date.

Common relational database applications with structured data include airline reservation systems, inventory control, sales transactions, and ATM activity. Structured Query Language (SQL) enables queries on this type of structured data within relational databases.

Some relational database applications, such as customer relationship management (CRM) systems, store or point to unstructured data. The integration can be awkward at best, since memo fields do not lend themselves to traditional database queries. Still, most CRM data is structured.

What Is Unstructured Data?
Unstructured data is essentially everything else. Unstructured data has an internal structure but is not structured via predefined data models or schema. It may be textual or non-textual, and human- or machine-generated. It may also be stored within a non-relational database, such as a NoSQL database.

Typical human-generated unstructured data includes:
Text files: Word processing, spreadsheets, presentations, emails, logs.
Email: Email has some internal structure thanks to its metadata, and we sometimes refer to it as semi-structured. However, its message field is unstructured and traditional analytics tools cannot parse it.
Social media: Data from Facebook, Twitter, LinkedIn.
Website: YouTube, Instagram, photo sharing sites.
Mobile data: Text messages, locations.
Communications: Chat, IM, phone recordings, collaboration software.
Media: MP3, digital photos, audio and video files.
Business applications: MS Office documents, productivity applications.

Typical machine-generated unstructured data includes:
Satellite imagery: Weather data, land forms, military movements.
Scientific data: Oil and gas exploration, space exploration, seismic imagery, atmospheric data.
Digital surveillance: Surveillance photos and video.
Sensor data: Traffic, weather, oceanographic sensors.

The most inclusive Big Data analysis makes use of both structured and unstructured data.

Structured Vs. Unstructured Data: What’s The Difference?
Besides the obvious difference between storing in a relational database and storing outside of one, the biggest difference between structured and unstructured data is the ease of analysis. Mature analytics tools exist for structured data, but analytics tools for mining unstructured data are nascent and developing.

Users can run simple content searches across textual unstructured data. But its lack of orderly internal structure defeats traditional data mining tools, and the enterprise gets little value from potentially valuable data sources like rich media, network or web logs, customer interactions, and social media data.

On top of this, there is simply much more unstructured data than structured. Unstructured data makes up 80% or more of enterprise data, and is growing at a rate of 55% to 65% per year. Without the tools to analyze this massive data category, organizations are leaving vast amounts of valuable data on the business intelligence table.

Structured data is traditionally easier for Big Data applications to digest, but today’s data analytics solutions are making great strides in the unstructured data area.

How Semi-Structured Data Fits With Structured And Unstructured Data
Semi-structured data maintains internal tags and markings that identify separate data elements, which enables data analysts to determine information grouping and hierarchies. Both documents and databases can be semi-structured. This type of data represents only about 5-10% of the data pie, but it has critical business use cases when combined with structured and unstructured data.

Email is a very common example of a semi-structured data type. Although more advanced analysis tools are necessary for thread tracking, near-duplicate detection, and concept searching, email’s native metadata enables classification and keyword searching without any additional tools.
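As a rough illustration of why email counts as semi-structured, the sketch below uses Python’s standard `email` module to split a message into its header metadata and its free-text body. The sample message, the addresses, and the helper names `split_email` and `subject_matches` are invented for this example and are not taken from any particular analytics product.

```python
from email import message_from_string

# Hypothetical sample message: structured headers followed by a free-text body.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 inventory report
Date: Mon, 28 Feb 2022 09:30:00 +0000

Hi Bob, the warehouse count came in lower than expected. Can we talk today?
"""


def split_email(raw):
    """Return (metadata, body): the tagged header fields and the unstructured text."""
    msg = message_from_string(raw)
    # Header fields are the "structured" part: named, predictable, easy to filter on.
    metadata = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
    # The body is free text with no predefined fields.
    body = msg.get_payload()
    return metadata, body


def subject_matches(metadata, keyword):
    """Simple keyword classification that relies only on the header metadata."""
    return keyword.lower() in metadata["Subject"].lower()


metadata, body = split_email(RAW_MESSAGE)
print(metadata["From"])
print(subject_matches(metadata, "inventory"))
```

Filtering on `From` or `Subject` needs nothing beyond the headers, whereas judging what the body actually says would call for the text-analytics tooling the article describes elsewhere.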
Email is a huge use case, but most semi-structured development centers on easing data transport issues. Sharing sensor data is a growing use case, as are web-based data sharing and transport: electronic data interchange (EDI), many social media platforms, document markup languages, and NoSQL databases.

Examples of Semi-structured Data

Markup language XML
This is a semi-structured document language. XML is a set of document encoding rules that define a human- and machine-readable format. (Although saying that XML is human-readable doesn’t pack a big punch: anyone trying to read an XML document has better things to do with their time.) Its value is that its tag-driven structure is highly flexible, and coders can adapt it to universalize data structure, storage, and transport on the web.

Open standard JSON (JavaScript Object Notation)
JSON is another semi-structured data interchange format. JavaScript is in the name, but JSON is a language-independent format, and most programming languages can parse and generate it. Its structure consists of name/value pairs (also called an object, hash table, etc.) and ordered value lists (also called an array, sequence, or list). Since the structure is interchangeable among languages, JSON excels at transmitting data between web applications and servers.

NoSQL
Semi-structured data is also an important element of many NoSQL (“not only SQL”) databases. NoSQL databases differ from relational databases because they do not separate the organization (schema) from the data. This makes NoSQL a better choice for storing information that does not easily fit into the record and table format, such as text with varying lengths. It also allows for easier data exchange between databases. Some newer NoSQL databases like MongoDB and Couchbase also incorporate semi-structured documents by natively storing them in JSON or a JSON-like format.

In big data environments, NoSQL does not require admins to separate operational and analytics databases into separate deployments. NoSQL is the operational database and hosts native analytics tools for business intelligence. In Hadoop environments, NoSQL databases ingest and manage incoming data and serve up analytic results.
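The contrast between a fixed relational schema and schema-carrying JSON documents can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the standard-library `sqlite3` module stands in for an RDBMS, a plain list of JSON strings stands in for a document store (it is not the MongoDB or Couchbase API), and the table, field names, sample records, and the helper `with_skill` are all invented.

```python
import json
import sqlite3

# Structured: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, zip) VALUES (?, ?, ?)",
    [(1, "Ada", "02139"), (2, "Grace", "10001")],
)
# SQL can filter directly on a named, typed field.
rows = conn.execute(
    "SELECT name FROM customers WHERE zip = ?", ("10001",)
).fetchall()

# Semi-structured: each JSON document carries its own name/value pairs and
# arrays, so two records in the same collection may have different fields.
documents = [
    '{"name": "Ada", "zip": "02139", "skills": ["SQL", "Python"]}',
    '{"name": "Grace", "location": {"city": "New York"}, "skills": ["COBOL"]}',
]
profiles = [json.loads(doc) for doc in documents]


def with_skill(profiles, skill):
    """Return names of profiles listing the skill, tolerating missing fields."""
    return [p["name"] for p in profiles if skill in p.get("skills", [])]


print(rows)
print(with_skill(profiles, "SQL"))
```

The second record has no `zip` field and adds a nested `location` object, which would require a schema change in the relational table but needs no change on the JSON side. The cost is that queries must tolerate missing fields, as the `p.get("skills", [])` call does here.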
These databases are common in big data infrastructure and real-time Web applications like LinkedIn. On LinkedIn, hundreds of millions of business users freely share job titles, locations, skills, and more, and LinkedIn captures this massive data in a semi-structured format. When job-seeking users create a search, LinkedIn matches the query to its massive semi-structured data stores, cross-references data to hiring trends, and shares the resulting recommendations with job seekers. The same process operates with sales and marketing queries in premium LinkedIn services like Sales Navigator. Amazon also bases its reader recommendations on semi-structured databases.

SQL vs. NoSQL
SQL (Structured Query Language) and NoSQL (“not only SQL”) showcase some of the key differences between structured and unstructured data. SQL is almost always used with a relational database, because the structured data it queries can easily be displayed in a way that shows relationships between data entities. NoSQL data, on the other hand, cannot easily be displayed in a traditional table or another relational format, because the mix of unstructured and semi-structured data does not follow a single predefined pattern or schema.

While SQL and other structured setups are often easier to comprehend and manage manually, they do not always offer as much potential for data analysis and manipulation. NoSQL stores and other collections of unstructured data are more difficult to comprehend and analyze, even with some of the strongest tools, but the outcome gives you a wider variety of data types for business intelligence practices. Ultimately, you need both structured and unstructured data, as well as the different formats in which they can be displayed and organized, in order to develop a full picture of your corporate data.

Structured Vs. Unstructured Data: Next Gen Tools Are Game Changers
New tools are available to analyze unstructured data, particularly given specific use case parameters. Most of these tools are based on machine learning. Structured data analytics can use machine learning as well, but the massive volume and many different types of unstructured data require it.
A few years ago, analysts using keywords and key phrases could search unstructured data and get a decent idea of what the data involved. eDiscovery was (and is) a prime example of this approach. However, unstructured data has grown so dramatically that users need to employ analytics that not only work at compute speeds, but also automatically learn from their activity and user decisions. Natural Language Processing (NLP), pattern sensing and classification, and text-mining algorithms are all common examples, as are document relevance analytics, sentiment analysis, and filter-driven Web harvesting.

Unstructured data analytics with machine-learning intelligence allows organizations to:
Analyze digital communications for compliance. Failed compliance can cost companies millions of dollars in fees, litigation, and lost business. Pattern recognition and email threading analysis software searches massive amounts of email and chat data for potential noncompliance. An example in this area is Volkswagen, which might have avoided huge fines and reputational hits by using analytics to monitor communications for suspicious messages.
Track high-volume customer conversations in social media. Text analytics and sentiment analysis let analysts review positive and negative results of marketing campaigns, or even identify online threats. This level of analytics is far more sophisticated than simple keyword search, which can only report basics, like how often posters mentioned the company name during a new campaign. New analytics also include context: Was the mention positive or negative? Were posters reacting to each other? What was the tone of reactions to executive announcements? The automotive industry, for example, is heavily involved in analyzing social media, since car buyers often turn to other posters to guide their car buying experience. Analysts use a combination of text mining and sentiment analysis to track auto-related user posts on Twitter and Facebook.
Gain new marketing intelligence. Machine-learning analytics tools quickly work on massive amounts of documents to analyze customer behavior. A major magazine publisher applied text mining to hundreds of thousands of articles, analyzing each separate publication by the popularity of major subtopics. Then they extended analytics across all their content properties to see which overall topics got the most attention by customer demographic. The analytics ran across hundreds of thousands of pieces of content across all publications, and cross-referenced hot topic results by segment. The result was a rich education on which topics were most interesting to distinct customers, and which marketing messages resonated most strongly with them.

Tools to Use for Structured and Unstructured Data Analytics
No matter what your business specifics are, today’s goal is to tap business value through both structured and unstructured data sets. Both types of data potentially hold a great deal of value, and newer tools can aggregate, query, analyze, and leverage all data types for deep business insight across the universe of corporate data. The following business intelligence tools support structured and unstructured data analytics:
Apache Hadoop
Tableau (Salesforce)
KNIME
Microsoft Power BI
Oracle BI
RapidMiner
SAS Viya and Text Miner
Cogito Semantic Technology
Zoho Analytics
CVAT

Originally published March 28, 2018. Republished with updates on May 21, 2021.


