Social media analytics: a survey of techniques, tools and ...


OPEN FORUM

Social media analytics: a survey of techniques, tools and platforms

Bogdan Batrinca • Philip C. Treleaven

Received: 25 February 2014 / Accepted: 4 July 2014 / Published online: 26 July 2014
© The Author(s) 2014. This article is published with open access at Springerlink.com

Abstract This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and News services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research.
Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirement of an experimental computational environment for social media research and presents as an illustration the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics either in their research or business. The data retrieval techniques that are presented in this paper are valid at the time of writing this paper (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing.

Keywords Social media · Scraping · Behavior economics · Sentiment analysis · Opinion mining · NLP · Toolkits · Software platforms

1 Introduction

Social media is defined as web-based and mobile-based Internet applications that allow the creation, access and exchange of user-generated content that is ubiquitously accessible (Kaplan and Haenlein 2010). Besides social networking media (e.g., Twitter and Facebook), for convenience, we will also use the term ‘social media’ to encompass really simple syndication (RSS) feeds, blogs, wikis and news, all typically yielding unstructured text and accessible through the web. Social media is especially important for research into computational social science that investigates questions (Lazer et al. 2009) using quantitative techniques (e.g., computational statistics, machine learning and complexity) and so-called big data for data mining and simulation modeling (Cioffi-Revilla 2010).

This has led to numerous data services, tools and analytics platforms. However, this easy availability of social media data for academic research may change significantly due to commercial pressures. In addition, as discussed in Sect. 2, the tools available to researchers are far from ideal.
They either give superficial access to the raw data or (for non-superficial access) require researchers to program analytics in a language such as Java.

B. Batrinca · P. C. Treleaven (✉)
Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
e-mail: [email protected]
B. Batrinca
e-mail: [email protected]

AI & Soc (2015) 30:89–116
DOI 10.1007/s00146-014-0549-4

1.1 Terminology

We start with definitions of some of the key techniques related to analyzing unstructured textual data:

• Natural language processing (NLP): a field of computer science, artificial intelligence and linguistics concerned with the interactions between computers and human (natural) languages. Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output.
• News analytics: the measurement of the various qualitative and quantitative attributes of textual (unstructured data) news stories. Some of these attributes are sentiment, relevance and novelty.
• Opinion mining: opinion mining (sentiment mining, opinion/sentiment extraction) is the area of research that attempts to build automatic systems that determine human opinion from text written in natural language.
• Scraping: collecting online data from social media and other Web sites in the form of unstructured text; also known as site scraping, web harvesting and web data extraction.
• Sentiment analysis: the application of natural language processing, computational linguistics and text analytics to identify and extract subjective information in source materials.
• Text analytics: involves information retrieval (IR), lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization and predictive analytics.
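
To make two of the definitions above concrete, the following is a minimal sketch in Python of a word frequency distribution (text analytics) and a naive lexicon-based polarity score (sentiment analysis). It is not a method taken from this paper: the word lists POSITIVE and NEGATIVE, the function names and the sample posts are all hypothetical and chosen only for illustration, and a real study would likely use a curated sentiment lexicon or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, invented for this illustration only.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each token occurs across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def sentiment_score(text):
    """Return (#positive tokens - #negative tokens); > 0 suggests a positive tone."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


if __name__ == "__main__":
    # Made-up sample posts standing in for scraped social media text.
    posts = [
        "I love this phone, great battery!",
        "Awful service. I hate waiting.",
    ]
    print(word_frequencies(posts).most_common(3))
    for post in posts:
        print(sentiment_score(post), post)
```

Counting lexicon hits ignores negation, sarcasm and slang, which is one reason the analysis of social media text remains a research challenge.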
1.2 Research challenges

Social media scraping and analytics provide a rich source of academic research challenges for social scientists, computer scientists and funding bodies. Challenges include:

• Scraping: although social media data is accessible through APIs, due to the commercial value of the data, most of the major sources such as Facebook and Google are making it increasingly difficult for academics to obtain comprehensive access to their ‘raw’ data; very few social data sources provide affordable data offerings to academia and researchers. News services such as Thomson Reuters and Bloomberg typically charge a premium for access to their data. In contrast, Twitter has recently announced the Twitter Data Grants program, where researchers can apply to get access to Twitter’s public tweets and historical data in order to get insights from its massive set of data (Twitter has more than 500 million tweets a day).
• Data cleansing: cleaning unstructured textual data (e.g., normalizing text), especially high-frequency streamed real-time data, still presents numerous problems and research challenges.
• Holistic data sources: researchers are increasingly bringing together and combining novel data sources: social media data, real-time market & customer data and geospatial data for analysis.
• Data protection: once you have created a ‘big data’ resource, the data needs to be secured, ownership and IP issues resolved (i.e., storing scraped data is against most of the publishers’ terms of service), and users provided with different levels of access; otherwise, users may attempt to ‘suck’ all the valuable data from the database.
• Data analytics: sophisticated analysis of social media data for opinion mining (e.g., sentiment analysis) still raises a myriad of challenges due to foreign languages, foreign words, slang, spelling errors and the natural evolution of language.
• Analytics dashboards: many social media platforms require users to write APIs to access feeds or program analytics models in a programming language, such as Java. While reasonable for computer scientists, these skills are typically beyond most (social science) researchers. Non-programming interfaces are required for giving what might be referred to as ‘deep’ access to ‘raw’ data, for example, configuring APIs, merging social media feeds, combining holistic sources and developing analytical models.
• Data visualization: visual representation of data, whereby information is abstracted in some schematic form with the goal of communicating it clearly and effectively through graphical means. Given the magnitude of the data involved, visualization is becoming increasingly important.

1.3 Social media research and applications

Social media data is clearly the largest, richest and most dynamic evidence base of human behavior, bringing new opportunities to understand individuals, groups and society. Innovative scientists and industry professionals are increasingly finding novel ways of automatically collecting, combining and analyzing this wealth of data. Naturally, doing justice to these pioneering social media applications in a few par...